I have encoded the Gender column with OneHotEncoder. I want to apply a log transformation to only the Female column (index 0), but it seems to be applied to all the columns. Why?
My code:
import pandas as p
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.compose import ColumnTransformer
import numpy as n
customer=p.read_csv('/content/Customers.csv')
customer.drop(['CustomerID','Profession','Family Size','Work Experience'],axis=1,inplace=True)
column=ColumnTransformer(
[
('ohe_gender',OneHotEncoder(sparse=False,dtype=n.int32),[0])
],remainder='passthrough'
)
function=ColumnTransformer(
[
('function',FunctionTransformer(n.log1p),[0,1])
],remainder='passthrough'
)
s=column.fit_transform(customer)
function.fit_transform(s)
Output:
array([[0.00000000e+00, 6.93147181e-01, 1.90000000e+01, 1.50000000e+04, 3.90000000e+01],
[0.00000000e+00, 6.93147181e-01, 2.10000000e+01, 3.50000000e+04, 8.10000000e+01],
[6.93147181e-01, 0.00000000e+00, 2.00000000e+01, 8.60000000e+04, 6.00000000e+00],
...,
[0.00000000e+00, 6.93147181e-01, 8.70000000e+01, 9.09610000e+04, 1.40000000e+01],
[0.00000000e+00, 6.93147181e-01, 7.70000000e+01, 1.82109000e+05, 4.00000000e+00],
[0.00000000e+00, 6.93147181e-01, 9.00000000e+01, 1.10610000e+05, 5.20000000e+01]])
After encoding (OHE) before FunctionTransformer the output was
array([[ 0, 1, 19, 15000, 39],
[ 0, 1, 21, 35000, 81],
[ 1, 0, 20, 86000, 6],
...,
[ 0, 1, 87, 90961, 14],
[ 0, 1, 77, 182109, 4],
[ 0, 1, 90, 110610, 52]])
I want to apply the log transformation only to index 0 of the array above, but as the first output shows, it appears to be applied to all the values even though I specified [0] in the column transformer. Why? I expect only the column at index 0 to be log-transformed.
It is transforming the columns you specified: 0 and 1.
Your log transformer specifies: ('function', FunctionTransformer(n.log1p), [0,1])
and those two columns are being transformed.
For example, the first row was:
[ 0, 1, 19, 15000, 39]
And the result was:
[0.00000000e+00, 6.93147181e-01, 1.90000000e+01, 1.50000000e+04, 3.90000000e+01]
If you only want to transform the first column, change the transformer to: ('function', FunctionTransformer(n.log1p), [0]).
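As a minimal sketch of that change, the snippet below applies log1p to column 0 only. It assumes scikit-learn and NumPy are installed; the three rows are copied from the encoded output in the question rather than read from Customers.csv, and the names `encoded`, `log_first` and `result` are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# Three rows copied from the one-hot-encoded output shown in the question
# (illustrative stand-in for the real data).
encoded = np.array([
    [0, 1, 19, 15000, 39],
    [0, 1, 21, 35000, 81],
    [1, 0, 20, 86000, 6],
])

# Only column 0 is listed, so only column 0 goes through log1p;
# every other column is passed through unchanged.
log_first = ColumnTransformer(
    [("function", FunctionTransformer(np.log1p), [0])],
    remainder="passthrough",
)

result = log_first.fit_transform(encoded)
print(result)
```

The result is a float array because the log-transformed column is float, so the untouched columns may still print in scientific notation even though their values are unchanged.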
The number 1.9e+01 is the same as 19; don't be fooled by the scientific notation. You can suppress this default behaviour with numpy.set_printoptions(suppress=True).
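To illustrate the point about notation, here is a small sketch that formats the first output row with scientific notation suppressed. It uses the `numpy.printoptions` context manager so the setting is not changed globally; `numpy.set_printoptions(suppress=True)` has the same effect for the whole session. The variable names are made up for this example.

```python
import numpy as np

# First row of the transformed output from the question.
row = np.array([0.0, 6.93147181e-01, 1.9e+01, 1.5e+04, 3.9e+01])

# 1.9e+01 and 19 are the same value; only the display differs.
same_value = row[2] == 19

# Format the row with scientific notation suppressed, for this block only.
with np.printoptions(suppress=True):
    suppressed_text = str(row)

print(suppressed_text)
```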