I have this code, which raises a ValueError:
y = df['weather type']
# y is an array with the 11 unique values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
# and of an 'object' type:
array([10, 10, 10, ..., 10, 10, 9])
# Encode the target variable (weather type) to numeric values -
# not sure if I should have done this step because it seems to have messed up my target labels?
y_le = LabelEncoder()
y = y_le.fit_transform(y)
# the unique values of y_le.classes_ are '0', '1', '10', '11', '12', '2', '3', '5', '6', '7', '8'
# the unique values of y_val are 0, 1, 3, 4, 5, 6, 7, 8, 9, 10
# Initialize the XGBoost classifier
xgb_model = xgb.XGBClassifier(objective='multi:softmax', num_class=len(y_le.classes_))
# Train the model
xgb_model.fit(X_train, y_train)
# Make predictions on the validation set
y_pred_val = xgb_model.predict(X_val)
# Evaluate the model
# Print classification report and confusion matrix
print("\nClassification Report:\n", classification_report(y_val, y_pred_val, target_names=y_le.classes_))
The value error is as follows:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[292], line 6
2 y_pred_val = xgb_model.predict(X_val)
4 # Evaluate the model
5 # Print classification report and confusion matrix
----> 6 print("\nClassification Report:\n", classification_report(y_val, y_pred_val, target_names=y_le.classes_))
7 #print("\nClassification Report:\n", classification_report(y_val, y_pred_val, labels=range(len(y_le.classes_)), target_names=y_le.classes_))
File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:2332, in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
2326 warnings.warn(
2327 "labels size, {0}, does not match size of target_names, {1}".format(
2328 len(labels), len(target_names)
2329 )
2330 )
2331 else:
-> 2332 raise ValueError(
2333 "Number of classes, {0}, does not match size of "
2334 "target_names, {1}. Try specifying the labels "
2335 "parameter".format(len(labels), len(target_names))
2336 )
2337 if target_names is None:
2338 target_names = ["%s" % l for l in labels]
ValueError: Number of classes, 10, does not match size of target_names, 11. Try specifying the labels parameter
As far as I can see, I have already set target_names=y_le.classes_, yet the report finds only 10 classes while y_le.classes_ has 11 entries.
How can I fix this?
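To make the mismatch concrete, here is a minimal sketch that reproduces the same error on toy data and then follows the hint in the message by passing `labels`. The arrays below are made-up stand-ins, not my real data; the assumption is that one encoded class simply never appears in `y_val` or `y_pred_val`, so the report infers fewer classes than `y_le.classes_` contains.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the target column: four string classes (hypothetical data).
y_all = np.array(["0", "1", "10", "2", "0", "1", "10", "2"], dtype=object)
y_le = LabelEncoder().fit(y_all)  # classes_ holds 4 entries

# Toy validation labels and predictions: encoded class 1 never appears.
y_val = np.array([0, 0, 2, 3, 3])
y_pred_val = np.array([0, 2, 2, 3, 0])

# Find which encoded classes are absent from both arrays.
all_labels = np.arange(len(y_le.classes_))
missing = np.setdiff1d(all_labels, np.union1d(y_val, y_pred_val))
print("Encoded classes absent from validation:", missing.tolist())

# Without `labels`, 3 classes are inferred from the data but 4 names are given.
try:
    classification_report(y_val, y_pred_val, target_names=y_le.classes_)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Listing every encoded label keeps the report rows aligned with target_names.
report = classification_report(
    y_val,
    y_pred_val,
    labels=all_labels,
    target_names=y_le.classes_,
    zero_division=0,  # absent classes get 0 instead of a warning
)
print(report)
```

With `labels` given, the absent class shows up as a row with support 0 rather than causing the error.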
Additionally, my target variable, weather type, has the ‘object’ data type, and I am not sure whether I should convert it to numeric for an XGBoost multi-class classification model.
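For context on the dtype question, the sketch below shows what I think is happening to the labels. It assumes the column holds digits stored as strings (the `weather` series is a hypothetical stand-in for `df['weather type']`). Encoding strings sorts the classes lexicographically, which would explain why '10' comes before '2' in `y_le.classes_`; casting to int first keeps the numeric order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for df['weather type']: digits stored as strings.
weather = pd.Series(["10", "2", "0", "10", "1", "2"], dtype=object)

# Encoding the strings directly sorts the classes lexicographically.
le_str = LabelEncoder().fit(weather)
str_classes = list(le_str.classes_)
print(str_classes)  # ['0', '1', '10', '2']

# Casting to int first gives numerically ordered classes.
le_int = LabelEncoder()
y = le_int.fit_transform(weather.astype(int))
int_classes = le_int.classes_.tolist()
print(int_classes)  # [0, 1, 2, 10]
print(y.tolist())   # [3, 2, 0, 3, 1, 2]

# The encoded values are contiguous integers starting at 0,
# and inverse_transform maps them back to the original labels.
restored = le_int.inverse_transform(y).tolist()
print(restored)     # [10, 2, 0, 10, 1, 2]
```

Either way the encoder returns integer codes from 0 to n_classes - 1, so the encoding step itself does not lose information; only the order of `classes_` differs.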