Context: I have a C# API that sends an HTTP POST request to the /batches route of Livy, and Livy forwards the arguments to my Scala Spark driver. As far as I know, internally Livy runs the spark-submit command within the container of my Spark master.
The body of the request sent to Livy is as follows:
{
  "file": "/opt/scala-apps/spark-driver-assembly-1.0.0.jar",
  "proxyUser": "X",
  "className": "Y",
  "args": ["a gigantic json"],
  "name": "myTest",
  "conf": {
    "spark.sql.broadcastTimeout": "1500",
    "spark.driver.extraJavaOptions": "-Dlog4j.configuration=file:/opt/scala-apps/livy/conf/log4j.properties -Dguid=111 -Dtimestamp=20240424"
  }
}
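For reference, my Scala driver picks the payload up directly from the program arguments, roughly like this (a simplified sketch, not my actual code; the real class behind Y parses the JSON and runs the job):

// Simplified sketch of the driver entry point ("Y" in the request body above).
object Y {
  def main(args: Array[String]): Unit = {
    // Livy forwards "args" through spark-submit, so the gigantic JSON arrives as args(0).
    val payload = args(0)
    println(s"received ${payload.length} characters in args(0)")
    // ... parse the JSON and run the Spark job ...
  }
}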
Issue: The JSON passed to Livy in “args” is quite large (I didn’t include it here because it isn’t necessary). “args” is an array of strings, and I pass only one arg to the Livy route. When I make this request via Postman (to simulate what my API does), I receive the following error:
java.io.IOException: Cannot run program "/opt/spark/bin/spark-submit": error=7, Argument list too long
My interpretation is that the Livy server cannot internally run spark-submit with such a large argument.
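Since, as far as I know, Livy launches spark-submit as a child process, I assume the same failure could be reproduced entirely outside Livy. Here is a minimal sketch of what I mean (the argument size is arbitrary, and the ~128 KiB per-argument limit is something I read about Linux, not something I have confirmed):

import scala.sys.process._
import scala.util.{Failure, Success, Try}

object ReproArgLimit {
  def main(args: Array[String]): Unit = {
    // A single argument well beyond ~128 KiB; on Linux I expect exec to refuse it
    // with the same "error=7, Argument list too long".
    val hugeArg = "x" * 200000
    Try(Seq("/bin/echo", hugeArg).!) match {
      case Success(code) => println(s"echo ran with exit code $code")
      case Failure(e)    => println(s"exec failed: ${e.getMessage}")
    }
  }
}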
HOWEVER, I conducted a test by splitting this JSON into two strings, like this:
{
  "file": "/opt/scala-apps/spark-driver-assembly-1.0.0.jar",
  "proxyUser": "X",
  "className": "Y",
  "args": [
    "a gigantic json (part 1)",
    "a gigantic json (part 2)"
  ],
  "name": "myTest",
  "conf": {
    "spark.sql.broadcastTimeout": "1500",
    "spark.driver.extraJavaOptions": "-Dlog4j.configuration=file:/opt/scala-apps/livy/conf/log4j.properties -Dguid=111 -Dtimestamp=20240424"
  }
}
In other words, “args” now contains two strings. With this change, the error above does not occur.
With this information, I would like some guidance on how to proceed and a possible solution. Keep in mind that the workaround I tested (splitting the JSON into two strings in “args”) is not an acceptable solution for me; I don’t want to do this in my code, for a few reasons.
P.S.: I also have the impression that it might be some configuration of my container’s shell (or the container’s own operating system) that doesn’t accept such large arguments. Where can I find this information and what can I do?
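From what I’ve read, on Linux this would not be a shell setting but two kernel limits: ARG_MAX for the total size of the argument list plus the environment, and MAX_ARG_STRLEN (typically around 128 KiB) for each individual argument string, which would also explain why splitting the JSON into two strings avoids the error. Below is a small sketch (my own assumption, nothing Livy-specific, and it assumes getconf is available in the image) of how I would check the total limit from inside the Spark master container:

import scala.sys.process._

object CheckArgMax {
  def main(args: Array[String]): Unit = {
    // POSIX getconf reports the kernel's combined size limit for argv + environment.
    val argMax = Seq("getconf", "ARG_MAX").!!.trim
    println(s"ARG_MAX inside this container: $argMax bytes")
  }
}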
More info:
My Spark cluster is configured with 1 master and 2 workers (each on a different machine). I am running my application on a Spark Standalone cluster. Spark version is 3.5.
Here is my spark master conf file:
spark.master spark://myServer:7077
spark.sql.caseSensitive false
spark.executor.heartbeatInterval 90000
spark.network.timeout 400000
spark.executor.heartbeat.maxFailures 10
spark.shuffle.registration.timeout 500000
spark.shuffle.push.finalize.timeout 600s
spark.files.fetchTimeout 600s
spark.rpc.lookupTimeout 600s
spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout 600s
spark.eventLog.enabled true
spark.eventLog.dir file:/opt/spark/logs/spark-events/
spark.history.fs.logDirectory file:/opt/spark/logs/spark-events/
spark.executor.logsDirectory /opt/spark/logs
spark.sql.adaptive.enabled true
spark.sql.adaptive.skewJoin.enabled true
spark.sql.adaptive.localShuffleReader.enabled true
spark.sql.adaptive.coalescePartitions.enabled true
spark.executor.memory 5g
spark.executor.cores 2
spark.driver.memory 8g
spark.driver.cores 4
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.shuffleTracking.enabled true
spark.dynamicAllocation.executorIdleTimeout 600s
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.port 7078
spark.blockManager.port 7087
spark.driver.blockManager.port 7011
spark.shuffle.service.enabled true
spark.shuffle.service.port 7337
spark.submit.deployMode client
spark.worker.cleanup.enabled true
Here is my livy.conf file (I tried modifying the “header.size” values, but it did not help):
livy.spark.master = spark://X:7077
livy.spark.deploy-mode = client
# Configure Livy server http request and response header size.
#livy.server.request-header.size = 300000
#livy.server.response-header.size = 300000
livy.server.session.state-retain.sec = 600s
livy.cache-log.size = 1000000
livy.file.local-dir-whitelist = /opt/scala-apps
Here is my docker-compose.yml file:
version: "3.4"
services:
spark_master:
container_name: spark_master
image: apache/spark:3.5.0
stdin_open: true
tty: true
user: root
network_mode: host
environment:
- TZ=Asia/Baghdad
- SPARK_PUBLIC_DNS=X
restart: unless-stopped
volumes:
- volumes
ports:
- ports
entrypoint: y
spark_worker:
container_name: spark_worker
image: apache/spark:3.5.0
stdin_open: true
tty: true
user: root
network_mode: host
environment:
- TZ=Asia/Baghdad
- SPARK_MASTER_ADDRESS=spark://X:7077
- SPARK_WORKER_PORT=7087
- SPARK_PUBLIC_DNS=
restart: unless-stopped
volumes:
- volumes
ports:
- ports
entrypoint: z