
Tags: python, pyspark, azure-databricks

Python code migrated to PySpark runs far too long

I have a Python script that predicts an output using logistic regression. The input file has only 40k rows and 10 columns, which is a relatively small dataset for the Spark engine. This code is the last step of a pipeline whose preceding stages run for 1-2 hours (in Python), and the end result is an outcome predicted with logistic regression. The Python prediction script runs in only 11 seconds, yet the Spark version of the same code runs for more than an hour. This is strange!