I have binary data points of dimension 5,000 and am asked to train a model that predicts a binary vector of length 1,000, where each position of the output corresponds to a class. The classes are not mutually exclusive (a multi-label problem).
What I know about the class distribution:
- classes with a small index occur more often
- a sample can belong to several classes
- each sample belongs to only a handful of classes (i.e. the output is “sparse”)
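Because the classes are imbalanced (low indices are frequent, and each sample is in only a few classes), it can help to measure how often each class is positive. A minimal NumPy sketch, assuming the training labels are stacked into a 0/1 array of shape `(n_samples, 1000)`; `class_stats` is a hypothetical helper and the toy labels are made up for illustration.

```python
import numpy as np


def class_stats(labels):
    """Per-class positive frequency and negative-to-positive ratio.

    labels: 0/1 array of shape (n_samples, n_classes).
    """
    labels = np.asarray(labels, dtype=np.float64)
    n_samples = labels.shape[0]
    pos = labels.sum(axis=0)          # number of samples in each class
    freq = pos / n_samples            # fraction of samples in each class
    # Negatives per positive for each class. Classes that never occur are
    # treated as having one positive to avoid dividing by zero.
    pos_weight = (n_samples - pos) / np.maximum(pos, 1.0)
    return freq, pos_weight


# Toy labels: 4 samples, 4 classes (hypothetical data for illustration).
labels = np.array([[1, 0, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 0]])
freq, pos_weight = class_stats(labels)
print(freq)        # fraction of samples in each class
print(pos_weight)  # larger for rarer classes
```

In PyTorch, such a vector could be converted with `torch.as_tensor(pos_weight, dtype=torch.float32)` and passed as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`. Note that up-weighting positives trades more set bits for higher recall, so it pulls in the opposite direction from the "too many bits" problem; whether it helps is something to check on validation data.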
How should I measure and keep track of the loss of my model? I have used multi-layer perceptrons (PyTorch) with cross-entropy (CE) loss, but I find the results hard to interpret. My understanding is that CE loss is meant for the case where there are several classes but exactly one is chosen per sample (single-label classification).
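Since the classes are not exclusive, a common alternative to softmax cross-entropy is to treat each of the 1,000 outputs as its own binary problem: one sigmoid per output and binary cross-entropy (BCE) averaged over all outputs. In PyTorch this corresponds to `torch.nn.BCEWithLogitsLoss` applied to the raw logits with 0/1 float targets of the same shape, instead of `torch.nn.CrossEntropyLoss`. The NumPy sketch below only illustrates what that loss computes; `bce_with_logits` and `constant_baseline_bce` are hypothetical names. For interpretation, it also computes a trivial baseline: a model that ignores its input and predicts the overall fraction of set bits for every entry. A loss that is not clearly below this baseline suggests the model has learned little beyond the label frequency.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (sample, class) entries.

    Every output gets its own sigmoid, so any number of classes can be
    active at once, unlike softmax cross-entropy where the class
    probabilities compete and sum to 1.
    """
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(targets, dtype=np.float64)
    # Numerically stable form of
    # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
    loss = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return loss.mean()


def constant_baseline_bce(targets):
    """Mean BCE of a model that predicts the overall fraction of set bits
    for every entry (this equals the entropy of that fraction)."""
    y = np.asarray(targets, dtype=np.float64)
    p = np.clip(y.mean(), 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


# Hypothetical labels: 8 samples, 1000 classes, 35 bits set per sample.
rng = np.random.default_rng(0)
targets = np.zeros((8, 1000))
for row in targets:
    row[rng.choice(1000, size=35, replace=False)] = 1.0
logits = rng.normal(size=targets.shape)  # stand-in for model outputs
print(bce_with_logits(logits, targets), constant_baseline_bce(targets))
```

A per-class version of the baseline (predicting each class's own frequency) is stronger and may be a better yardstick here, since low-index classes are much more common than the rest.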
Moreover, my predictions have about half of the bits set, whereas I expect only 20 to 50 bits set.
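Besides the choice of loss, the number of set bits depends on how per-class scores are converted to 0/1 values. A minimal sketch of one heuristic, assuming `probs` holds per-class sigmoid outputs of shape `(n_samples, 1000)`: set the bits that pass a threshold, then clamp the count per sample to the 20 to 50 range given above. `decode` and its default arguments are hypothetical choices, not a standard API.

```python
import numpy as np


def decode(probs, threshold=0.5, min_bits=20, max_bits=50):
    """Turn per-class probabilities into one 0/1 vector per sample.

    Bits with probability >= threshold are set, but the number of set bits
    per sample is clamped to [min_bits, max_bits] by keeping the most
    probable classes.
    """
    probs = np.asarray(probs, dtype=np.float64)
    n_classes = probs.shape[1]
    # Class indices of each row, from most to least probable.
    order = np.argsort(-probs, axis=1)
    # Bits passing the threshold per row, clamped to the allowed range.
    k = np.clip((probs >= threshold).sum(axis=1), min_bits, min(max_bits, n_classes))
    out = np.zeros(probs.shape, dtype=np.int8)
    for i, k_i in enumerate(k):
        out[i, order[i, :k_i]] = 1
    return out


# Hypothetical sigmoid outputs for 3 samples and 1000 classes.
rng = np.random.default_rng(0)
probs = rng.random((3, 1000))   # roughly half of these exceed 0.5
bits = decode(probs)
print(bits.sum(axis=1))         # each count lies between 20 and 50
```

Clamping only post-processes the output: if the model assigns high probability to half of the classes, the loss or the threshold is what needs revisiting, and the threshold can be tuned on validation data rather than fixed at 0.5.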
I would be grateful for advice on what to read, what to try, or any other help.
```python
# An example data point:
point = [1, 0, 0, 1, 0, 0, 1, 1, ..., 1, 1, 0, 0, 1, 0, 1]  # length 5000
label = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ..., 0]  # length 1000
# only 20 to 50 entries of label are 1
```
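To make training progress easier to interpret than a raw loss value, one option is to track set-based metrics on a validation set alongside the loss. A minimal sketch, assuming true and predicted labels are 0/1 arrays shaped like `label` above, stacked into `(n_samples, 1000)`; `multilabel_metrics` is a hypothetical helper and the example labels are made up.

```python
import numpy as np


def multilabel_metrics(y_true, y_pred):
    """Summary metrics for 0/1 arrays of shape (n_samples, n_classes).

    Per-sample precision/recall/F1 ignore the many true negatives, so they
    stay informative when only a few percent of the bits are set.
    """
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp = (t & p).sum(axis=1)
    n_pred = p.sum(axis=1)
    n_true = t.sum(axis=1)
    # Define 0/0 as 0 so empty predictions or labels do not produce NaN.
    precision = np.divide(tp, n_pred, out=np.zeros(len(tp)), where=n_pred > 0)
    recall = np.divide(tp, n_true, out=np.zeros(len(tp)), where=n_true > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros(len(tp)), where=denom > 0)
    return {
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "hamming_loss": (t != p).mean(),       # fraction of wrong bits
        "mean_bits_predicted": n_pred.mean(),
        "mean_bits_true": n_true.mean(),
    }


# Hypothetical labels with 35 of 1000 bits set, and an all-zeros prediction.
y_true = np.zeros((2, 1000), dtype=np.int8)
y_true[:, :35] = 1
all_zeros = np.zeros_like(y_true)
print(multilabel_metrics(y_true, all_zeros))
```

The all-zeros prediction gets 96.5% of the bits right (Hamming loss 0.035) yet has an F1 of 0, which is why bit accuracy alone is misleading for sparse labels. If scikit-learn is available, `sklearn.metrics.f1_score(y_true, y_pred, average="samples")` and `sklearn.metrics.hamming_loss` compute comparable quantities.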