Problem Description:
I am working on a project where I need to combine textual data (for example, customer reviews) with numerical and categorical data (such as age and product category) to build a predictive model. The project is not specifically about customer reviews; they are just an illustration, and I am still in the early stages of research. My dataset has approximately 1.5 million rows, but only about 1,200 of them have a known target (Y) value.
I am looking for machine learning techniques to handle this, but I want to avoid deep learning for now, mainly because I am unsure whether my dataset is large enough for deep learning models.
My Question:
What machine learning techniques can I use to combine text data with numerical and categorical data without using deep learning? I have looked into stacking, but I am wondering what my other options are for combining these types of data effectively.
Dataset Details:
Approximately 1.5 million rows, of which only about 1,200 have a known target (Y) value.
Text features: similar to customer reviews (just an example).
Numerical/categorical features: age, product categories, etc.
Goal:
I am looking for suggestions on techniques or workflows that could help me combine these data types efficiently without deep learning. Is stacking my best option, or are there other approaches I should consider?
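To make the question concrete, here is a minimal sketch of the simplest alternative to stacking that I am aware of: vectorize the text with TF-IDF, encode the other columns, concatenate everything into one feature matrix, and fit a single linear model. It assumes scikit-learn and pandas; the column names (`review`, `age`, `category`, `y`) and the toy rows are hypothetical placeholders, not my real data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows standing in for the ~1,200 labeled examples.
df = pd.DataFrame({
    "review": [
        "great product, works well",
        "broke after a week",
        "excellent value",
        "terrible, do not buy",
        "works fine",
        "poor quality, broke quickly",
    ],
    "age": [25, 41, 33, 58, 29, 47],
    "category": ["toys", "tools", "toys", "tools", "garden", "garden"],
    "y": [1, 0, 1, 0, 1, 0],
})

# One transformer per data type; the outputs are concatenated column-wise.
features = ColumnTransformer([
    # A scalar column name passes a 1-D series, which TfidfVectorizer expects.
    ("text", TfidfVectorizer(), "review"),
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df.drop(columns="y")
model.fit(X, df["y"])
pred = model.predict(X)  # predictions on the training rows, for illustration only
```

With so few labeled rows, I would evaluate something like this with cross-validation rather than a single split. What I would like to know is whether this kind of feature concatenation, stacking, or some other approach is the better fit.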