I got a flight price prediction dataset that I wanted to try my machine learning skills on.
I cleaned the data, engineered some new features, and removed others.
I also got some valuable insights out of the data. But when I made predictions and evaluated my model,
these are the results I got, and that was after tuning the model with GridSearchCV:
Regression metrics on the test set
r2: 82.10%
mean_absolute_error: 1229.1407307097613
mean_squared_error: 2933265.159841384
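For reference, numbers like these can be produced with the functions in `sklearn.metrics`. The sketch below is a minimal, hypothetical helper (`regression_report` is not from the notebook) and the toy values are made up for illustration. It also adds RMSE, which is in the same units as `Price`; for the MSE above, the square root comes out at roughly 1,713.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return R^2, MAE, MSE and RMSE for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": r2_score(y_true, y_pred),
        "mean_absolute_error": mean_absolute_error(y_true, y_pred),
        "mean_squared_error": mse,
        # RMSE is in the same units as the target, so it is easier to read than MSE.
        "rmse": float(np.sqrt(mse)),
    }


# Toy values for illustration only, not taken from the flight dataset.
report = regression_report([3000, 5000, 7000, 9000], [3200, 4800, 7500, 8500])
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Comparing MAE and RMSE against the typical ticket price gives a better sense of whether an R^2 of about 0.82 is acceptable than the raw MSE does.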
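import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hyperparameters after tuning
model = RandomForestRegressor(
    max_depth=20,
    max_features='sqrt',
    min_samples_leaf=2,
    min_samples_split=5,
    n_estimators=200
)

# Separate the features from the target
X = df.drop('Price', axis=1)
y = df['Price']

# One-hot encode the categorical columns, then hold out 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    pd.get_dummies(X), y, test_size=0.2, random_state=42
)

model.fit(X_train, y_train)
y_preds = model.predict(X_test)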
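The grid-search tuning mentioned above is not shown in the snippet, so here is a minimal sketch of what such a step can look like. The parameter grid is hypothetical and deliberately small, and the data is synthetic (`make_regression`) rather than the flight dataset; the real search presumably covered more values.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the real notebook would use the encoded flight features.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=10.0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Hypothetical grid; the grid actually searched is not shown in the post.
param_grid = {
    "max_depth": [5, 20],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=42),
    param_grid,
    cv=3,          # 3-fold cross-validation on the training split only
    scoring="r2",
)
search.fit(X_tr, y_tr)

best_params = search.best_params_
test_r2 = search.score(X_te, y_te)  # R^2 of the refit best model on held-out data
print(best_params, round(test_r2, 3))
```

Fitting the search on the training split only keeps the test set untouched, so the final score is not influenced by the tuning.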
I tried modifying the hyperparameters and also removing some outliers:
def remove_outliers_iqr(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

numerical_columns = ['Price', 'Dep_hours', 'Dep_min', 'Arrival_hours', 'Arrival_min', 'Duration_hours', 'Duration_min']

for column in numerical_columns:
    df = remove_outliers_iqr(df, column)
But I still get the same results.
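A quick way to check whether the IQR filter changes anything is to compare row counts before and after it runs. The sketch below repeats the function on a made-up toy frame (the values are not from the flight data). Note also that if `X` and `y` were created before the loop, they would still hold the unfiltered rows, so they would need to be rebuilt from the filtered `df`; whether that is the case here is only a guess from the snippets.

```python
import pandas as pd


def remove_outliers_iqr(df, column):
    """Keep only rows whose value in `column` lies within 1.5 * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]


# Toy data for illustration only: one obviously extreme price.
toy = pd.DataFrame({"Price": [4000, 4200, 4500, 4700, 5000, 60000]})

filtered = remove_outliers_iqr(toy, "Price")
rows_removed = len(toy) - len(filtered)
print(f"rows before: {len(toy)}, after: {len(filtered)}, removed: {rows_removed}")
```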
This is the complete notebook, since I don’t know how to attach the whole code as a notebook here:
https://github.com/jamhus/ztm-course/blob/master/fligt%20prices%20analysis/flight%20prices.ipynb