Code migrated from Python to PySpark runs far too long
I have a Python script that predicts an output using logistic regression. The input file has only 40k rows and 10 columns, which is a relatively small dataset for the Spark engine. This code is part of a pipeline in which the preceding steps run for 1-2 hours (in Python), and the final step predicts an outcome using logistic regression. The Python prediction script runs in only 11 seconds, whereas the equivalent Spark code runs for more than an hour. This is strange!
SPARK_GEN_SUBQ_0 WHERE 1=0, Error message from Server: Configuration schema is not available
I’m trying to read data from the nation table in the sample schema of the Databricks catalog via Spark, but I’m getting this error.