Customer Data
The dataset is shown in the attached image; please look at it before reading the problem.
Here I have encoded the Gender column with OneHotEncoder.
Problem: I want to apply a log transformation to only the Female column (index 0), but the log is being applied to more than just that column. Why?
Code:
import pandas as p
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
import numpy as n
import seaborn as sns
import scipy.stats as sci
import matplotlib.pyplot as plt
customer=p.read_csv('/content/Customers.csv')
customer.drop(['CustomerID','Profession','Family Size','Work Experience'],axis=1,inplace=True)
column = ColumnTransformer(
    [
        ('ohe_gender', OneHotEncoder(sparse=False, dtype=n.int32), [0])
    ],
    remainder='passthrough'
)
function = ColumnTransformer(
    [
        ('function', FunctionTransformer(n.log1p), [0, 1])
    ],
    remainder='passthrough'
)
s = column.fit_transform(customer)
function.fit_transform(s)
Output
array([[0.00000000e+00, 6.93147181e-01, 1.90000000e+01, 1.50000000e+04,
        3.90000000e+01],
       [0.00000000e+00, 6.93147181e-01, 2.10000000e+01, 3.50000000e+04,
        8.10000000e+01],
       [6.93147181e-01, 0.00000000e+00, 2.00000000e+01, 8.60000000e+04,
        6.00000000e+00],
       ...,
       [0.00000000e+00, 6.93147181e-01, 8.70000000e+01, 9.09610000e+04,
        1.40000000e+01],
       [0.00000000e+00, 6.93147181e-01, 7.70000000e+01, 1.82109000e+05,
        4.00000000e+00],
       [0.00000000e+00, 6.93147181e-01, 9.00000000e+01, 1.10610000e+05,
        5.20000000e+01]])
Note: after one-hot encoding, and before applying FunctionTransformer, the output was:
array([[ 0, 1, 19, 15000, 39],
[ 0, 1, 21, 35000, 81],
[ 1, 0, 20, 86000, 6],
...,
[ 0, 1, 87, 90961, 14],
[ 0, 1, 77, 182109, 4],
[ 0, 1, 90, 110610, 52]])
I want to apply the log transformation only to index 0 of the array above, but as the first output shows, it appears to be applied to other values as well, although I meant to target only index 0 in the ColumnTransformer. Why?
I hope the issue is clear.
I expect the output to contain the log of only the column at index 0.
In short: apply OHE to Gender, then log-transform only column 0.
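To make the expected result concrete, here is a minimal sketch run on the first three rows of the encoded array shown above. It assumes the column list passed to the second ColumnTransformer is `[0]` (the code above passes `[0, 1]`); the names `encoded`, `log_first`, and `log_female` are made up for illustration only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# First three rows of the one-hot encoded array from the note above
encoded = np.array([
    [0, 1, 19, 15000, 39],
    [0, 1, 21, 35000, 81],
    [1, 0, 20, 86000, 6],
])

# Assumption: only index 0 is listed, so every other column
# is left untouched by remainder='passthrough'
log_first = ColumnTransformer(
    [('log_female', FunctionTransformer(np.log1p), [0])],
    remainder='passthrough',
)

result = log_first.fit_transform(encoded)
print(result)
```

In this sketch only the first column should change (log1p of 0 stays 0, log1p of 1 becomes about 0.693), while the remaining columns keep their values, possibly printed in scientific notation because the array is converted to float.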