The Problem:
I have a DAG where I basically launch a Docker container with a bunch of volumes and environment variables.
One of these environment variables is a JSON string that must be computed with Python code and then supplied to the DockerOperator. This JSON string serves as the configuration for some application.
from datetime import datetime
from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

default_args = {
    'owner': 'airflow',
    'description': 'Test',
    'start_date': datetime(2024, 5, 1),
}

def some_function():
    # Here is the logic to generate the HUGE dynamic JSON string
    json_string = "..."
    return json_string

with DAG('docker_operator_demo', default_args=default_args, schedule_interval="5 * * * *", catchup=False) as dag:
    example_task = DockerOperator(
        task_id='run_my_app',
        image='my_docker_img',
        container_name='my_app',
        api_version='auto',
        auto_remove=True,
        command="echo run_my_app",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
        environment={
            "MY_APP_CONFIG": some_function()
        }
    )

    example_task
The problem is that this JSON string is sometimes huge (200+ keys, where each value is 1000+ characters). When the JSON gets this big, I get the error "argument list too long".
I understand where this error comes from: the environment variable simply exceeds the UNIX limit.
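Just to quantify it, here is a quick back-of-the-envelope check (the keys and values below only mimic the size of my real config, not its content):

import json

# Synthetic stand-in for the real config: 200 keys, each with a 1000-character value
config = {f"key_{i}": "x" * 1000 for i in range(200)}
payload = json.dumps(config)

# Just over 200 KB for a single env var, while Linux commonly caps one
# argument/environment string at MAX_ARG_STRLEN = 131072 bytes (~128 KB)
print(len(payload))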
Possible Solution:
I’ve thought about writing this dynamically computed JSON string to a file and supplying it as a volume mount to the DockerOperator, but I can’t find a proper way to do this.
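Roughly, what I have in mind is the untested sketch below: a small Python task writes the JSON to a file on the host, and the DockerOperator bind-mounts that file into the container via docker.types.Mount. The CONFIG_PATH, the write_config task and the in-container path are all made up here, and it assumes the Airflow worker and the Docker daemon share the same filesystem:

import json
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

# Hypothetical path visible to both the Airflow worker and the Docker daemon
CONFIG_PATH = "/opt/airflow/shared/my_app_config.json"

default_args = {
    'owner': 'airflow',
    'description': 'Test',
    'start_date': datetime(2024, 5, 1),
}

def write_config():
    # Here would be the logic that generates the HUGE dynamic JSON
    config = {"some_key": "some_value"}
    with open(CONFIG_PATH, "w") as f:
        json.dump(config, f)

with DAG('docker_operator_demo', default_args=default_args, schedule_interval="5 * * * *", catchup=False) as dag:
    write_config_task = PythonOperator(
        task_id='write_config',
        python_callable=write_config,
    )

    run_my_app = DockerOperator(
        task_id='run_my_app',
        image='my_docker_img',
        container_name='my_app',
        api_version='auto',
        auto_remove=True,
        command="echo run_my_app",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
        # Bind-mount the generated file instead of passing the JSON as an env var
        mounts=[
            Mount(source=CONFIG_PATH,
                  target="/app/my_app_config.json",
                  type="bind",
                  read_only=True),
        ],
        environment={
            # The application would read its config from this file
            "MY_APP_CONFIG_FILE": "/app/my_app_config.json",
        },
    )

    write_config_task >> run_my_app

But I have no idea whether this is the intended way to do it.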
The Question:
With Airflow, is it possible to dynamically create a file and supply it as a volume to the DockerOperator? If so, can you please share an example?