Pandas on Spark API Date Operations
I am using the pandas API on Spark for a data preprocessing script that was originally written in plain pandas. The date operations are very slow, and some are not supported at all. For example, I cannot write df[time_col] + pd.Timedelta(1, unit='D'); instead I had to write the following: df[time_col].apply(lambda x: x + timedelta(days=1)).
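For reference, here is a minimal sketch of the two forms side by side in plain pandas, where both run and give the same values. The column name "event_time" and the sample dates are made up for illustration. This only shows what the two expressions are meant to compute. It says nothing about how either one behaves or performs under the pandas API on Spark.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical column name and sample data, for illustration only.
time_col = "event_time"
df = pd.DataFrame({time_col: pd.to_datetime(["2023-01-01", "2023-01-31"])})

# Vectorized form: the whole column is shifted by one day in a single operation.
vectorized = df[time_col] + pd.Timedelta(1, unit="D")

# Row-wise form: a Python function is called once per element.
row_wise = df[time_col].apply(lambda x: x + timedelta(days=1))

# Element-wise comparison of the two results.
same = (vectorized == row_wise).all()
print(same)
```

The row-wise form calls a Python function for every value, which is the usual reason an apply-based version is slower than a vectorized one.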