I’m using Azure AutoML for NLP on a multilabel classification problem, where a single prediction can involve multiple values. In my case, I’m predicting repair codes for automotive units. I’ve condensed the data into two columns. The first is a plain-text document made up of the unit make, model, parts used, and the repair steps the technician noted during the repair. The second is what I’m looking to predict: a list of relevant repair codes. I’ve provided 28,000 training samples and 4,000 validation samples.
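To make the two-column layout concrete, here is a minimal sketch using made-up rows. The column names `text` and `labels`, the repair codes, and the choice to store each label cell as a Python-style list literal are all assumptions for illustration, not the actual dataset. It parses the label column and counts how often each code appears, which is a quick way to spot malformed label cells or codes that occur only a handful of times.

```python
import ast
from collections import Counter

# Hypothetical rows: column names and repair codes are invented for illustration.
rows = [
    {"text": "Make A, Model X. Parts: brake pads. Replaced front pads.",
     "labels": "['BRK01', 'INS02']"},
    {"text": "Make B, Model Y. Parts: oil filter. Changed oil and filter.",
     "labels": "['OIL01']"},
    {"text": "Make A, Model Z. Parts: none. Inspected brakes.",
     "labels": "['INS02']"},
]


def parse_labels(raw):
    """Parse a label cell written as a Python-style list literal."""
    value = ast.literal_eval(raw)
    if not isinstance(value, list):
        raise ValueError(f"expected a list of codes, got: {raw!r}")
    return [str(code) for code in value]


# How many times each repair code occurs across all rows.
label_counts = Counter(
    code for row in rows for code in parse_labels(row["labels"])
)

# How many codes are attached to each row.
labels_per_row = [len(parse_labels(row["labels"])) for row in rows]

print(label_counts.most_common())
print(sum(labels_per_row) / len(rows))
```

On a real dataset, a cell that fails to parse or a long tail of codes that appear only once or twice would both be worth checking before training.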
Whether I use the defaults or tune hyperparameters, I almost always get 0 for accuracy, precision, recall, and of course F1. Log loss is between 33 and 50 in every run. When I get these 0 metrics, I also don’t get any results from the model’s deployed endpoint.
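For reference, here is a small sketch of how these multilabel metrics behave when a model returns an empty label set for every row, which is one situation that yields 0 on all of them at once. Whether that is what is happening in this case is unconfirmed. The data, the function names, and the choice of micro-averaging are assumptions for illustration and may not match how Azure AutoML computes its metrics.

```python
def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall, and F1 over sets of labels."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def exact_match_accuracy(true_sets, pred_sets):
    """Fraction of rows whose predicted label set equals the true set exactly."""
    matches = sum(t == p for t, p in zip(true_sets, pred_sets))
    return matches / len(true_sets)


# Hypothetical ground truth and two hypothetical sets of predictions.
y_true = [{"BRK01", "INS02"}, {"OIL01"}, {"INS02"}]
empty_preds = [set(), set(), set()]  # nothing predicted for any row
partial_preds = [{"BRK01"}, {"OIL01"}, {"INS02", "OIL01"}]

print(micro_prf(y_true, empty_preds), exact_match_accuracy(y_true, empty_preds))
print(micro_prf(y_true, partial_preds), exact_match_accuracy(y_true, partial_preds))
```

With empty predictions every metric is 0, while predictions that are only partly right still score above 0 on the micro-averaged metrics even when exact-match accuracy is low.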
Any idea what I’m missing to improve the prediction metrics for this scenario? Is NLP multilabel the wrong approach altogether?