I’m trying to train a regression model with Databricks AutoML by running this code:
import databricks.automl
summary = databricks.automl.regress(train_pdf, target_col="target_value", timeout_minutes=100, time_col="open_date")
However, when I run this, AutoML reports that open_date has 20,000+ null values and drops those rows.
This isn’t true: I checked for null values using several different methods and didn’t find any. I also checked the value counts of that column, and every value was a date in the format yyyy-MM-dd. I tried changing the data type from Date to Timestamp, and the same message still appeared. Does anyone know what the issue is?
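A minimal sketch of one way to double-check the column in pandas is below. It assumes train_pdf is a pandas DataFrame; the helper name diagnose_time_col and the sample data are hypothetical and only for illustration. It counts true nulls separately from values that are present but fail to parse as dates, since both would end up as missing after a datetime conversion. Whether AutoML's own check works this way is an assumption, not something confirmed here.

```python
import pandas as pd


def diagnose_time_col(pdf: pd.DataFrame, col: str) -> dict:
    """Summarize nulls and unparseable values in a candidate time column.

    Hypothetical helper for illustration; not part of databricks.automl.
    """
    s = pdf[col]
    # errors="coerce" turns anything that cannot be parsed into NaT
    parsed = pd.to_datetime(s, errors="coerce")
    n_null = int(s.isna().sum())
    return {
        "dtype": str(s.dtype),
        "rows": len(s),
        "nulls": n_null,
        # present in the raw column, but missing after conversion
        "unparseable": int(parsed.isna().sum()) - n_null,
    }


# Made-up sample data: one real null and one invalid calendar date
sample = pd.DataFrame(
    {"open_date": ["2023-01-05", "2023-02-30", None, "2023-03-01"]}
)
report = diagnose_time_col(sample, "open_date")
print(report)
```

Running the same helper as diagnose_time_col(train_pdf, "open_date") would show whether any values are lost during conversion even though a plain null check reports none.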
I don’t believe it’s relevant, but in case it is, I also noticed this:
My data has about 30,000 rows. The message about dropping null values appears after a different message saying that AutoML is using only 30% of the data. That is fine, but I’m not sure why it would report 20,000 rows when 30% of the data is only about 9,000 rows.