This might seem trivial, but I don’t understand this problem. I’m building a restaurant recommender for my city using a Kaggle dataset and RandomForestRegressor.
I built the model and now want it to recommend a good restaurant when given 4 parameters: location, approximate cost, type of restaurant, and number of votes. However, it raises a ValueError: X has 8 features, but RandomForestRegressor is expecting 2924 features as input.
This is what I’m trying to run:
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

model = joblib.load('my_model.pkl')
scaler = joblib.load('scaler.pkl')

def preprocess_input(location, type_, cost, votes):
    one_hot_location = [1 if loc == location else 0 for loc in ['Whitefield', 'Koramangala', 'Indiranagar']]
    one_hot_type = [1 if t == type_ else 0 for t in ['Casual Dining', 'Quick Bites', 'Cafe']]
    scaled_features = scaler.transform([[cost, votes]])
    return np.array(one_hot_location + one_hot_type + list(scaled_features[0])).reshape(1, -1)

input_data = preprocess_input('Whitefield', 'Casual Dining', 1000, 500)
prediction = model.predict(input_data)
print(f"Predicted restaurant: {prediction}")
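The width mismatch can be checked before calling predict. This is a minimal sketch on a toy model, not the saved `my_model.pkl`: the toy data and the helper name `check_width` are made up for illustration. The attribute `n_features_in_` is set by scikit-learn estimators when they are fitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the saved model: trained on 10 columns (made-up data).
rng = np.random.default_rng(0)
X_toy = rng.random((50, 10))
y_toy = rng.random(50)
toy_model = RandomForestRegressor(n_estimators=5, random_state=0).fit(X_toy, y_toy)

def check_width(model, row):
    """Return (expected, got) column counts without calling predict."""
    return model.n_features_in_, row.shape[1]

# Same width as the array built by preprocess_input(): 3 + 3 + 2 columns.
short_row = np.zeros((1, 8))
expected, got = check_width(toy_model, short_row)
print(f"model expects {expected} features, input has {got}")
```

Running the same check against the real model should report 2924 expected features, which matches the second dimension of X_train.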
The shapes of the train data:
X_train.shape: (41373, 2924)
y_train.shape: (41373,)
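One way to make a single input row match the training width is to encode it the same way as the training data and then align it to the training columns. The sketch below assumes the 2924 columns came from `pd.get_dummies` on the categorical columns, which the post does not show. The miniature dataset and the column names (`location`, `rest_type`, `cost`, `votes`, `rating`) are hypothetical, and the scaling step is left out to keep it short.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical miniature of the dataset; column names are assumptions.
train = pd.DataFrame({
    "location": ["Whitefield", "Koramangala", "Indiranagar", "Whitefield"],
    "rest_type": ["Casual Dining", "Quick Bites", "Cafe", "Cafe"],
    "cost": [1000, 300, 600, 500],
    "votes": [500, 120, 80, 40],
    "rating": [4.2, 3.8, 4.0, 3.9],
})

categorical = ["location", "rest_type"]
X_train = pd.get_dummies(train.drop(columns="rating"), columns=categorical, dtype=int)
train_columns = X_train.columns  # keep this next to the model, e.g. saved with joblib
model = RandomForestRegressor(n_estimators=5, random_state=0).fit(X_train, train["rating"])

def preprocess_input(location, rest_type, cost, votes):
    row = pd.DataFrame([{"location": location, "rest_type": rest_type,
                         "cost": cost, "votes": votes}])
    encoded = pd.get_dummies(row, columns=categorical, dtype=int)
    # Add the training columns missing from this row (filled with 0)
    # and put all columns in the training order.
    return encoded.reindex(columns=train_columns, fill_value=0)

input_data = preprocess_input("Whitefield", "Casual Dining", 1000, 500)
prediction = model.predict(input_data)
print(input_data.shape, prediction.shape)
```

With the real data, `train_columns` would hold all 2924 names, so the single row is padded to that width. If the model was trained on scaled cost and votes, the same fitted scaler would still need to be applied to those two columns.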
This is what my dataset looks like: [image: dataset preview]
I’m a beginner to this, please help me out! Thanks!