PySpark EOF and CRC Java errors

I am using PySpark for data processing. I have tried it on both Windows 11 and WSL2 with Python 3.10.12, Java 21.0.3 and Java 17, Spark 3.5.1, and winutils and hadoop.dll for Hadoop 3.3.6. When I run my code, I usually hit either an EOF or a CRC Java exception; a few other exceptions also occur, less frequently. Which error occurs varies from run to run, even for the exact same code. I have seen the same errors with different code, and the only way I could get it to run all the way through was to rerun each part in a Jupyter notebook until it completed without an error. Here is the code: