I am working on a machine learning project where my dataset combines numeric columns with columns containing arrays. The numeric columns (e.g. Mean) hold a single value per row, while the array columns (e.g. Gradient) can hold a variable number of elements per row.
What are the best practices for dealing with this type of input? Can I use numeric columns and array columns simultaneously in a machine learning model? If so, what are the most common strategies for handling this heterogeneity during the preprocessing and training phases?
I would greatly appreciate any suggestions or resources that can help me better understand how to deal with this challenge.
Example:
Input of the model:
| Mean | Gradient |
|---|---|
| 0.5 | [1, 2, 3, 45, 0.2] |
| 1 | [2, 5, 1.2, 5, 0] |
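
To make the question concrete, here is a minimal sketch in plain Python of the kind of input I mean, together with one naive idea I could imagine (zero-padding the arrays to a common length and appending the original length as an extra feature). The third row and the `to_fixed_width` helper are hypothetical and only there for illustration; I do not know whether padding is actually the recommended approach, which is part of what I am asking.

```python
# Rows mirroring the table above. The third row is an assumed example with a
# shorter array, added only to illustrate the variable-length case.
rows = [
    {"Mean": 0.5, "Gradient": [1, 2, 3, 45, 0.2]},
    {"Mean": 1.0, "Gradient": [2, 5, 1.2, 5, 0]},
    {"Mean": 0.7, "Gradient": [4, 1]},  # hypothetical, not from the real data
]


def to_fixed_width(rows, pad_value=0.0):
    """Flatten each row into [Mean, padded Gradient values..., Gradient length].

    Hypothetical helper: pads every Gradient array with `pad_value` up to the
    longest array in `rows`, so all rows end up with the same number of features.
    """
    max_len = max(len(r["Gradient"]) for r in rows)
    features = []
    for r in rows:
        grad = [float(g) for g in r["Gradient"]]
        padding = [pad_value] * (max_len - len(grad))
        # Keep the true length so padded zeros can be told apart from real zeros.
        features.append([float(r["Mean"])] + grad + padding + [float(len(grad))])
    return features


X = to_fixed_width(rows)
for feature_row in X:
    print(feature_row)
```

Every row of `X` then has the same width, so it could in principle be fed to a standard model, but I am unsure whether this loses information or biases the model compared with other approaches.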
Thank you in advance for your help!
I tried searching online, but could not find a real answer.