Related Content

Tag Archive for apache-spark, pyspark, apache-spark-sql

Spark job spilling data vs OOM

I am using Spark SQL to run SQL jobs with 10G of executor memory.
While monitoring the job in the Spark UI, I can see that data is being spilled to both disk and memory (expected, since I am doing some explode operations).

How to drop records after date based on condition

I’m looking for an elegant way to drop all records in a DataFrame that occur before the latest occurrence of ‘TEST_COMPONENT’ being ‘UNSATISFACTORY’, based on their ‘TEST_DT’ value for each ID.