Write out the peak memory utilization of a PySpark job on EMR to a file
We run a lot of PySpark jobs on EMR. The pipeline is the same for every run, but the inputs can wildly change the peak memory utilization, and that utilization has been growing over time. I would like to automatically write out the peak memory utilization of each step submitted to the EMR cluster. If it matters, we are running in cluster mode with YARN as the cluster manager, and the jobs are submitted as Docker containers.
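
To make the goal concrete, here is a rough, untested sketch of the kind of instrumentation I have in mind: a small script running alongside the step that polls Spark's monitoring REST API (`/api/v1/applications` and `.../executors`) and writes the largest executor memory figure it has seen to a file. The driver/proxy URL, the output path, and the `peakMemoryMetrics` field (which I believe was added to the executors endpoint in Spark 3.0) are assumptions I would need to verify against our cluster; in YARN cluster mode the UI is normally reached through the ResourceManager proxy rather than port 4040 directly.

```python
# Sketch only: poll the Spark driver's REST API while the step runs and
# record the largest executor memory value seen. URLs, output path, and the
# peakMemoryMetrics field are assumptions to verify for our EMR/YARN setup.
import json
import time
import urllib.request

DRIVER_UI = "http://<driver-or-rm-proxy>:4040"   # placeholder, depends on the cluster
OUTPUT_PATH = "/tmp/peak_memory.json"            # hypothetical output location
POLL_SECONDS = 30


def fetch_json(url):
    """GET a Spark REST API endpoint and parse the JSON response."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def poll_peak_memory():
    peak_bytes = 0
    while True:
        try:
            apps = fetch_json(f"{DRIVER_UI}/api/v1/applications")
            for app in apps:
                executors = fetch_json(
                    f"{DRIVER_UI}/api/v1/applications/{app['id']}/executors")
                for ex in executors:
                    # memoryUsed is storage memory currently in use;
                    # peakMemoryMetrics (Spark 3.0+, if present) reports
                    # per-executor peak JVM heap usage.
                    candidates = [ex.get("memoryUsed", 0)]
                    peak_metrics = ex.get("peakMemoryMetrics") or {}
                    candidates.append(peak_metrics.get("JVMHeapMemory", 0))
                    peak_bytes = max(peak_bytes, max(candidates))
            # Rewrite the file on every poll so the last value survives
            # the driver UI disappearing when the step finishes.
            with open(OUTPUT_PATH, "w") as f:
                json.dump({"peak_executor_memory_bytes": peak_bytes}, f)
        except Exception:
            # The driver UI goes away when the application ends; keep the
            # last value that was written and stop polling.
            break
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    poll_peak_memory()
```

Is polling like this a reasonable approach, or is there a better-supported way (a listener, Ganglia/CloudWatch, the YARN ResourceManager API, event logs) to capture per-step peak memory on EMR?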