I am trying to train a model for a specific task.
Here is a simple description:
image1
image2
Here are screenshots of two different datasets: The data in Image 1 is in the correct order, with no errors and no missing data. The data in Image 2, on the other hand, is disordered, contains noise, and has missing data.
I want to train a model that, when given data of the type shown in Image 2 as input, can return data of the type shown in Image 1.
I tried RF and CNN models, but the results were not what I expected. I suspect this may be due to incorrect label selection.
Actually, in Image 1 the connection between rows is easy to find.
For example,
1 2 3 4
A A-1 B B-1
A2 A2-1 B2 B2-1
A3 A3-1 B3 B3-1
A4 A4-1 B4 B4-1
In data of the type shown in Image 1, A = A2-1 and A2 = A3-1. In other words, the column 1 value of each row equals the column 2 value of the next row.
Therefore, I hope the model can learn this relationship and then identify the correct order in the disordered data (Image 2). Once one correct sequence is determined, the correct and unique order can be obtained by recursion. Since the rows correspond to each other (i.e., A A-1 B B-1 are fixed in the same row), once the sequence of one column is correctly determined, the entire sequence is also correctly determined.
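To make the chaining step concrete, here is a minimal sketch of the non-ML version of the recursion described above. It assumes clean data: each row is a tuple of four values, the values in column 1 are unique, and matches are exact (no noise, no missing rows). The function name `restore_order` and the tuple layout are hypothetical and only used for illustration.

```python
def restore_order(rows):
    """Reorder shuffled rows so that row[i][0] == row[i + 1][1].

    Assumes exact matches and unique column-1 values (hypothetical clean case).
    """
    # Map each column-2 value to the row that contains it.
    by_col2 = {row[1]: row for row in rows}
    col1_values = {row[0] for row in rows}

    # The first row is the only one whose column-2 value is not
    # the column-1 value of some other row.
    starts = [row for row in rows if row[1] not in col1_values]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting row")

    ordered = [starts[0]]
    # Follow the chain: the next row is the one whose column 2
    # equals the current row's column 1.
    while ordered[-1][0] in by_col2 and len(ordered) < len(rows):
        ordered.append(by_col2[ordered[-1][0]])
    return ordered


shuffled = [(3, 2, "B3", "B2"), (1, 0, "B1", "B0"), (2, 1, "B2", "B1")]
print(restore_order(shuffled))
```

With noisy or missing data the exact dictionary lookup no longer works, which is the part I would like the model to handle.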
So I tried to solve the problem with a model. The model trained and did learn something, but nothing useful. I began to suspect that the issue lies in the label selection. (In fact, this problem might be solvable with an algorithm, but I want to use machine learning to accomplish it.)
I hope someone can provide advice on how to choose labels and split the training and validation sets.
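For reference, one possible framing (an assumption on my part, not something I have verified works) is to label pairs of rows: the label is 1 if the second row directly follows the first in the correct order, and 0 otherwise. The split is then done per sequence rather than per pair, so that pairs from the same sequence never appear in both training and validation sets. The helper names `make_pair_examples` and `split_by_sequence` are hypothetical.

```python
import random


def make_pair_examples(ordered_rows):
    """Build (features, label) pairs from one correctly ordered sequence.

    Each row is a tuple; features are the two rows concatenated.
    Label is 1 when row j is the direct successor of row i, else 0.
    """
    examples = []
    n = len(ordered_rows)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            features = ordered_rows[i] + ordered_rows[j]
            label = 1 if j == i + 1 else 0
            examples.append((features, label))
    return examples


def split_by_sequence(sequences, val_fraction=0.2, seed=0):
    """Split whole sequences into train/validation to avoid leakage."""
    ids = list(range(len(sequences)))
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    val_ids = set(ids[:n_val])
    train = [ex for k in ids if k not in val_ids
             for ex in make_pair_examples(sequences[k])]
    val = [ex for k in ids if k in val_ids
           for ex in make_pair_examples(sequences[k])]
    return train, val
```

Note that positives are rare under this scheme (n - 1 out of n(n - 1) pairs for a sequence of n rows), so class imbalance would need attention.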