PySpark Job Fails with Executor Lost Error on AWS Glue

I’m running a PySpark job on AWS Glue that intermittently fails with an “Executor Lost” error. The job reads data from an S3 bucket, processes it, and writes the output to another S3 location. Its goal is to remove the last partition and consolidate the data into a single Parquet file. Here is the relevant part of my code: