Running parallel threads with a PySpark DataFrame
I have to write the same data to two separate data stores. I am using PySpark's foreachPartition to process the data in parallel across partitions, and within each partition I use threads to write the data to both stores concurrently.
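A minimal sketch of the pattern I am describing is below. Here `write_to_store_a` and `write_to_store_b` are hypothetical placeholders for the two real sink clients, and the DataFrame is a dummy one:

```python
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

def write_to_store_a(rows):
    # Hypothetical sink: replace with the real client call for store A.
    for row in rows:
        pass  # e.g. store_a_client.put(row)

def write_to_store_b(rows):
    # Hypothetical sink: replace with the real client call for store B.
    for row in rows:
        pass  # e.g. store_b_client.put(row)

def write_partition(partition):
    # foreachPartition hands us a single-pass iterator, so materialize
    # it once so both writer threads can consume the same rows.
    rows = list(partition)
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [
            executor.submit(write_to_store_a, rows),
            executor.submit(write_to_store_b, rows),
        ]
        # Re-raise any exception thrown inside a writer thread.
        for f in futures:
            f.result()

spark = SparkSession.builder.appName("dual-store-writer").getOrCreate()
df = spark.range(100)  # placeholder DataFrame; substitute the real one
df.foreachPartition(write_partition)
```

The threads run inside each executor task, so the two writes for a given partition overlap in time while Spark still parallelizes across partitions as usual.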