I have a predefined training set and a predefined test set. The target variable in the training set is booking_bool; the test set does not contain it. I am unsure how to code the random forest classifier so that I get a classification report for the test set, because without the target variable I cannot define a y_test.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report
# Columns excluded from the features in both sets
drop_cols = ['year', 'srch_room_count', 'is_weekend', 'time_of_day', 'season', 'srch_children_count', 'srch_adults_count']
# Training features: also drop the target, click_bool and position
X_train = df_train.drop(['booking_bool', 'click_bool', 'position'] + drop_cols, axis=1)
y_train = df_train['booking_bool']
X_test = df_test.drop(drop_cols, axis=1)
rf = RandomForestClassifier(
    n_estimators=1000,
    max_depth=10,
    min_samples_split=50,
    max_features='sqrt',
    class_weight='balanced',
)
rf.fit(X_train, y_train)
y_pred_train = rf.predict(X_train)
print("Classification report for training set:")
print(classification_report(y_train, y_pred_train))
y_pred_test = rf.predict(X_test)
print(y_pred_test)
For the training set I can define X_train and y_train, but for the test set I cannot define y_test because the target variable is not present, so I cannot create a classification report for it. What is a common way of approaching this problem?
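One commonly used approach, sketched here under assumptions rather than as a definitive answer, is to hold out part of the labelled training data as a validation set and build the classification report on that, since the unlabelled test set can only be used for predictions. The sketch uses synthetic data from make_classification as a stand-in for the real training data, and the names X_fit, X_val, y_fit, y_val and rf_val are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumption: synthetic, imbalanced binary data standing in for X_train / y_train.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Hold out 20% of the labelled rows; stratify keeps the class ratio in both parts.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Same hyperparameters as in the question, with fewer trees to keep the sketch fast.
rf_val = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=50,
    max_features='sqrt',
    class_weight='balanced',
    random_state=0,
)
rf_val.fit(X_fit, y_fit)

# The held-out rows have known labels, so a report can be computed for them.
y_pred_val = rf_val.predict(X_val)
report = classification_report(y_val, y_pred_val)
print("Classification report for validation set:")
print(report)
```

With the real data, X and y would be replaced by X_train and y_train, and the model could afterwards be refit on all training rows before calling predict on X_test.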