I have built a pipeline using “MultiOutputClassifier” and “SelfTrainingClassifier” and would now like to build a confusion matrix to see how well the labeling performed. I found that “confusion_matrix” only works for non-multilabel output, which led me to the function “multilabel_confusion_matrix”.
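To illustrate the difference between the two functions, here is a minimal sketch on small made-up 0/1 indicator arrays (the data are hypothetical, not taken from the pipeline in this question). “confusion_matrix” rejects multilabel-indicator targets with a ValueError, while “multilabel_confusion_matrix” returns one 2x2 matrix per label column.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix

# Hypothetical multilabel indicator arrays: 4 samples, 3 labels (one column each).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# confusion_matrix only accepts binary or multiclass targets.
try:
    confusion_matrix(y_true, y_pred)
    single_ok = True
except ValueError as err:
    single_ok = False
    print("confusion_matrix:", err)

# multilabel_confusion_matrix returns an array of shape (n_labels, 2, 2);
# each 2x2 block is laid out as [[tn, fp], [fn, tp]] for one column.
mcm = multilabel_confusion_matrix(y_true, y_pred)
print(mcm.shape)  # (3, 2, 2)
```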
I tried passing my predictions and the known results through that function, and it raised the following error:
numpy.core._exceptions._UFuncNoLoopError: ufunc 'maximum' did not contain a loop with signature matching types (dtype('<U11'), dtype('<U11')) -> None
I read that this might be a type problem, so, following the example on the scikit-learn website, I checked the data types of the NumPy arrays, cast both to int64, and tried again, but received the same error.
Here is my code:
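import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

labels = list(y_test_df.keys())                  # column names: "rpg", "fps", ...
yt = y_test.to_numpy().astype(np.int64)          # true labels, shape (n_samples, n_labels)
y_pred = pipeline.predict(X_test)                # X_test (assumed name): test features, not y_test
y_pred = y_pred.astype(np.int64)
multilabel_confusion_matrix(yt, y_pred, labels=labels)  # raises the UFuncNoLoopError above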
When I inspect the data structures, they look like this:
y_test.shape
>> (60527, 7)
y_test.dtype
>> dtype('int64')
y_pred.shape
>> (60527, 7)
y_pred.dtype
>> dtype('int64')
print(labels)
>> ["rpg", "fps", "rts", "dnd", "aaa", "nfc", "abc"]
Edit:
I have found that if I remove the labels argument, everything runs fine and the confusion matrices are output. My question, though, is: why does passing the labels cause this error?
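For context, the scikit-learn documentation describes “labels” in “multilabel_confusion_matrix” as a list of classes or column indices; for multilabel-indicator input like the arrays above, that means integer column positions, not column names. The error is consistent with NumPy being asked to take the maximum of an array of strings. Below is a minimal sketch, assuming small hypothetical stand-in arrays and a shortened name list, that passes column indices and attaches the names afterwards.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Hypothetical stand-ins for the question's data: column names and 0/1 indicator arrays.
names = ["rpg", "fps", "rts"]
yt = np.array([[1, 0, 1],
               [0, 1, 0],
               [1, 1, 0],
               [0, 0, 1]], dtype=np.int64)
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]], dtype=np.int64)

# For multilabel-indicator targets, `labels` selects columns by integer index,
# so pass positions (0, 1, 2, ...) rather than the column names.
label_idx = list(range(len(names)))
mcm = multilabel_confusion_matrix(yt, y_pred, labels=label_idx)

# Attach the names afterwards: mcm[i] is the [[tn, fp], [fn, tp]] matrix for names[i].
per_label = dict(zip(names, mcm))
for name, m in per_label.items():
    print(name, m.ravel().tolist())

# Selecting a subset also works by index, e.g. only the "fps" column.
fps_only = multilabel_confusion_matrix(yt, y_pred, labels=[names.index("fps")])
print(fps_only.shape)  # (1, 2, 2)
```

Keeping the name-to-matrix mapping outside the metric call means the same index list can be reused for other per-label metrics.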