I am trying to train a CatBoostClassifier model with catboost_spark, starting from a Pandas DataFrame. All of the examples I've found build the data Pool from dummy data using Vector or VectorAssembler (example 1, example 2). Is there an easy way to use a Pandas df to train a Spark model, or a way to convert a Pandas df into a Pool?
I tried to use the Pandas df directly in the fit function, but it keeps crashing the kernel that I'm using.
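For reference, here is a rough sketch of the VectorAssembler pattern from those examples, adapted to start from a Pandas df by converting it to a Spark DataFrame first. It is a sketch under assumptions, not a confirmed fix: the helper names `feature_columns` and `train_from_pandas` are made up for illustration, the Maven coordinate in `catboost_package` is a placeholder that has to match the installed Spark and Scala versions, and all feature columns are assumed to be numeric.

```python
def feature_columns(columns, label_col):
    """Return every column name except the label, preserving order."""
    return [c for c in columns if c != label_col]


def train_from_pandas(pdf, label_col="label",
                      catboost_package="ai.catboost:catboost-spark_3.3_2.12:1.2.2"):
    """Hypothetical helper: Pandas df -> Spark DataFrame -> Pool -> fitted model.

    `catboost_package` is an assumed placeholder coordinate; pick the one
    that matches your Spark/Scala versions.
    """
    # Imports are local so feature_columns() can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler

    spark = (
        SparkSession.builder
        .master("local[*]")
        .config("spark.jars.packages", catboost_package)
        .getOrCreate()
    )
    # catboost_spark is imported after the session (with the package) exists,
    # following the order used in the CatBoost for Apache Spark examples.
    import catboost_spark

    # Convert the Pandas df to a Spark DataFrame.
    sdf = spark.createDataFrame(pdf)

    # Pack the (numeric) feature columns into a single vector column.
    assembler = VectorAssembler(
        inputCols=feature_columns(list(pdf.columns), label_col),
        outputCol="features",
    )
    assembled = (
        assembler.transform(sdf)
        .withColumnRenamed(label_col, "label")
        .select("features", "label")
    )

    # Pool expects "features" and "label" columns by default.
    pool = catboost_spark.Pool(assembled)
    classifier = catboost_spark.CatBoostClassifier()
    return classifier.fit(pool)
```

With this, something like `model = train_from_pandas(df, label_col="target")` would be the call, where `"target"` stands in for whatever the label column is actually named.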