I have a Structured Streaming application in Spark 3, written in Scala 2.12. It reads from a streaming source, applies some transformations, and writes to multiple HDFS paths. After some time, one or two tasks keep running for many hours and writing to one of the HDFS paths stops. The long-running task is the one responsible for writing to that HDFS path.
It writes the offset to the checkpoint and then makes no further progress. Meanwhile, the other paths are not affected.
When the application is restarted, it starts writing data again from where it left off.
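For context, here is a minimal sketch of the kind of pipeline described above, assuming the application starts one independent streaming query per HDFS path, each with its own checkpoint directory. The built-in `rate` source stands in for the real streaming source, and the object name `MultiSinkStreamingSketch`, the sink names `even`/`odd`, the `hdfs:///tmp/...` paths, the placeholder `bucket` transformation and the one-minute trigger are all hypothetical, not taken from the original application.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object MultiSinkStreamingSketch {
  // Hypothetical HDFS base directories; replace with the application's real paths.
  val OutputBase = "hdfs:///tmp/output"
  val CheckpointBase = "hdfs:///tmp/checkpoints"

  // Each sink gets its own output directory and its own checkpoint directory.
  def outputPath(sink: String): String = s"$OutputBase/$sink"
  def checkpointPath(sink: String): String = s"$CheckpointBase/$sink"

  // Starts one independent streaming query that appends Parquet files to HDFS.
  def startSink(name: String, df: DataFrame): StreamingQuery =
    df.writeStream
      .queryName(name)
      .format("parquet")
      .option("path", outputPath(name))
      .option("checkpointLocation", checkpointPath(name))
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-sink-sketch").getOrCreate()

    // Assumption: the built-in "rate" source stands in for the real streaming source.
    val source = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

    // Placeholder transformation; the real transformations are not shown in the question.
    val transformed = source.withColumn("bucket", col("value") % 2)

    val queries = Seq(
      startSink("even", transformed.filter(col("bucket") === 0)),
      startSink("odd", transformed.filter(col("bucket") === 1))
    )
    queries.foreach(q => println(s"Started query ${q.name} (id ${q.id})"))

    // Blocks until any of the queries stops or fails.
    spark.streams.awaitAnyTermination()
  }
}
```

Because each sink in this sketch is a separate query with a separate checkpoint, a stalled write in one query does not block the others, and on restart each query resumes from the offsets recorded in its own checkpoint, which matches the behaviour described above.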
How can I make this application stable?