So sorry if my description is vague.
I am trying to build a house price prediction system. The data has outliers and is non-Gaussian, and the target y is log-transformed. I standardized the numerical features with StandardScaler() before one-hot encoding the categorical ones. The code looks like this:
numerical_features = df4[['bhk', 'area', 'price_lakhs', 'price_per_sqft']]
categorical_features = df4.select_dtypes(include=['object'])
scaler = StandardScaler()
standardized_features = scaler.fit_transform(numerical_features)
std_df4 = pd.DataFrame(standardized_features, columns=numerical_features.columns)
std_df4.head()
To predict for new values, I use the predict_price() function below. I am having a hard time applying the same preprocessing as in the block above: there I separated the numerical and categorical features, but I can't see how to do the same here. The function gives wrong results, and I suspect the one-hot encoded values in x[3:] are also being scaled, which is not what the block above does. Whatever regressor I use (clf in the code below), clf.predict() returns the same answer for different inputs.
def predict_price(bhk, area, price_per_sqft, type, region):
    house_type_loc_index = np.where(X.columns == 'type_' + type)[0][0]
    print(house_type_loc_index)
    region_loc_index = np.where(X.columns == 'region_' + region)[0][0]
    print(region_loc_index)
    x = np.zeros(len(X.columns))
    x[0] = bhk
    x[1] = area
    x[2] = price_per_sqft
    if house_type_loc_index >= 0:
        x[house_type_loc_index] = 1
    if region_loc_index >= 0:
        x[region_loc_index] = 1
    columns = X.columns
    x = x.reshape(1, -len(columns))
    scaler = StandardScaler()
    standardized_features = scaler.fit_transform(x)
    data = pd.DataFrame(standardized_features, columns = columns)
    print(data)
    ans = clf.predict(data)[0]
    return exp(ans)
I expected the model to predict different values depending on the inputs I pass to the function. I call it as shown below.
predict_price(bhk = 2, area = 2000, price_per_sqft = 35, type = 'Apartment', region = 'Airoli')
The answer I got is: 42.610385322858455
predict_price(bhk = 3, area = 600, price_per_sqft = 70, type = 'Villa', region = 'Vashi')
The answer I got is: 42.610385322858455
I checked further, and every value the model receives is 0 for both of the predict_price() calls above. That is why I strongly suspect I am using StandardScaler() incorrectly in predict_price(). This is the output of print(data):
   bhk  area  price_per_sqft  type_Apartment  type_Independent House
0  0.0   0.0             0.0             0.0                     0.0

   type_Penthouse  type_Studio Apartment  type_Villa  region_Agripada
0             0.0                    0.0         0.0              0.0

   region_Airoli  …  region_Vasai  region_Vashi  region_Vikhroli
0            0.0  …           0.0           0.0              0.0

   region_Ville Parle East  region_Ville Parle West  region_Virar
0                      0.0                      0.0           0.0

   region_Virar West  region_Wadala  region_Worli  region_other
0                0.0            0.0           0.0           0.0
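The all-zero row can be reproduced in isolation. The sketch below is a minimal illustration, not the actual model code: `train_numeric`, `fitted_scaler`, and `new_row` are hypothetical names, and the training values are made up. It compares calling fit_transform() on a single new row (as predict_price() does) with calling transform() on a scaler that was fitted once on training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numeric training data (assumption): columns are bhk, area, price_per_sqft.
train_numeric = np.array([
    [1.0,  450.0, 20.0],
    [2.0,  900.0, 35.0],
    [3.0, 1500.0, 50.0],
    [4.0, 2400.0, 70.0],
])

# Fit the scaler once, on the training data only.
fitted_scaler = StandardScaler().fit(train_numeric)

# A single new observation to predict for.
new_row = np.array([[2.0, 2000.0, 35.0]])

# Refitting on the single row: each column's mean equals its only value,
# so (x - mean) is 0 in every column, whatever the input was.
refit_result = StandardScaler().fit_transform(new_row)

# Reusing the already-fitted scaler: the row is scaled with the
# training mean and standard deviation, so it stays informative.
reused_result = fitted_scaler.transform(new_row)

print(refit_result)   # all zeros
print(reused_result)  # non-zero, depends on the training statistics
```

If this matches what is happening in predict_price(), it would also explain why the one-hot columns come out as 0: they are passed through the same refitted scaler. Note also that the scaler in the first block was fitted on four columns, including price_lakhs, so it could not be applied as-is to the three numeric inputs; a scaler fitted on just bhk, area, and price_per_sqft would be needed, with the one-hot columns left unscaled.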