I have been struggling with this for quite some time.
Problem: I am not able to trigger a dlt job (which lives in its own venv environment) via the Airflow scheduler.
Below are a few points to consider:
- Everything is running in a local Ubuntu environment.
- Airflow is running in its own .venv, and dlt is running in a separate .venv.
- I want to trigger a dlt Python script that reads saved DB credentials from secrets.toml. (There is no way to take the credentials out of secrets.toml and use them directly in the Python code; it is a product limitation, and I have tried several ways.)
- I have tried several Airflow operators such as PythonOperator, ExternalPythonOperator (doesn't seem to exist in my installation), BashOperator, PythonVirtualenvOperator, etc.; none have worked.
- I have also tried creating environment variables and passing them via the Airflow UI (see the sketch after this list); that doesn't work either.
- I also tried pointing the shebang at the venv's Python executable at the top of the script; no help.
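For illustration, this is roughly the shape such environment variables would take, if I understand dlt's environment-variable configuration provider correctly (uppercase keys, config sections separated by double underscores). The concrete names below (SQLSERVER_CONN, DESTINATION__MSSQL__CREDENTIALS, the connection string) are my assumptions for illustration only:

import os

# Hypothetical sketch: names and values are placeholders, not confirmed settings.
# Plain variable that a standalone script can read explicitly via os.environ:
os.environ['SQLSERVER_CONN'] = 'mssql://user:password@host:1433/database'
# dlt-style naming, as I understand dlt's environment-variable provider:
os.environ['DESTINATION__MSSQL__CREDENTIALS'] = 'mssql://user:password@host:1433/database'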
These few things have worked so far:
- If the Python code is standalone and has no dependency on the project's YAML or TOML files (see point 3 above), then it works and Airflow can trigger it.
- I created a bash file containing the path to the venv's Python interpreter and the path to the Python script, so the script runs under the venv interpreter. Executing the bash file directly works, but when I try to trigger this bash script from Airflow, it fails again. (Running the script directly works but running it via Airflow doesn't, which hints that Airflow is using a different user ID than the one I use when running the script myself; could the Airflow user ID be the issue? See the diagnostic sketch after this list.)
- dltHub (the product) has a reference that does this via Google Composer with API keys, but for project security reasons I cannot go that way and need a completely local solution (link shared).
- Since everything is running on Ubuntu, I have also tried passing the DB credentials via local environment variables and via Airflow UI Variables. Both ways, a single standalone Python script works, but as soon as the project's YAML/TOML and other configuration files are involved, it just fails.
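To check the user ID / working directory theory, a minimal diagnostic task like this can be dropped into the DAG (standard library only, nothing dlt-specific; it assumes the same dag object used in my snippets below):

import getpass
import os
import sys

from airflow.operators.python import PythonOperator

def print_runtime_context():
    # Shows which OS user, interpreter and working directory the Airflow
    # worker actually uses, and whether the expected variable is visible.
    print('user:', getpass.getuser())
    print('python:', sys.executable)
    print('cwd:', os.getcwd())
    print('SQLSERVER_CONN set:', 'SQLSERVER_CONN' in os.environ)

diag = PythonOperator(
    task_id='print_runtime_context',
    python_callable=print_runtime_context,
    dag=dag)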
I can share the error messages, but they all revolve around "could not find DB credentials" / missing configuration variables.
My ask is very simple: we should be able to activate and run the project externally. Is there a project / GitHub link I can use for reference?
Sharing the code that I tried:
- DAG via BashOperator:
t1 = BashOperator(
    task_id='python_script',
    bash_command='source /home/azure/Documents/dlthub/.vdlt/bin/activate && python /home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.py',
    env={'SQLSERVER': "{{ var.value.SQLSERVER_CONN }}"},
    dag=dag)
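A variant of this BashOperator attempt that I am considering, on the assumption that dlt resolves the project's .dlt/secrets.toml and config.toml relative to the working directory, is to cd into the project folder and export the variable inside bash_command itself, so that the operator's env argument cannot replace PATH and HOME (paths and variable names are the ones from my setup above):

t1 = BashOperator(
    task_id='python_script',
    bash_command=(
        'cd /home/azure/Documents/dlthub/data_pipeline && '
        'export SQLSERVER_CONN="{{ var.value.SQLSERVER_CONN }}" && '
        '/home/azure/Documents/dlthub/.vdlt/bin/python Ini_LclVrb.py'
    ),
    dag=dag)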
- DAG via PythonOperator (notice run_python_script() has one subprocess.call commented out; those are the two flavors I tried):
def run_python_script():
    # subprocess.call(['source', '/home/azure/Documents/dlthub/.vdlt/bin/activate', '&&', 'python', '/home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.py'], shell=True, env={'SQLSERVER_CONN': "{{ var.value.A_SQLSERVER_CONN }}"})
    subprocess.call(['/home/azure/Documents/dlthub/.vdlt/bin/python', '/home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.py'], env={'SQLSERVER_CONN': "{{ var.value.A_SQLSERVER_CONN }}"})

t1 = PythonOperator(
    task_id='python_script',
    python_callable=run_python_script,
    # op_kwargs={'SQLSERVER_CONN': "{{ var.value.A_SQLSERVER_CONN }}"},
    dag=dag)
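While re-reading this flavor I noticed two likely problems, so here is a corrected sketch for reference: Jinja templates such as {{ var.value.A_SQLSERVER_CONN }} are only rendered in templated operator fields, not inside the callable body, and subprocess's env= replaces the entire environment of the child process. The paths and variable names are the ones from my setup; whether dlt really needs the working directory set to the project folder is my assumption:

import os
import subprocess

from airflow.models import Variable
from airflow.operators.python import PythonOperator

def run_python_script():
    # Fetch the connection string at runtime; Jinja is not rendered here.
    conn = Variable.get('A_SQLSERVER_CONN')
    subprocess.run(
        ['/home/azure/Documents/dlthub/.vdlt/bin/python',
         '/home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.py'],
        # Keep the parent environment (PATH, HOME, ...) and add the variable on top.
        env={**os.environ, 'SQLSERVER_CONN': conn},
        # Run from the project folder, assuming dlt looks for .dlt/ relative to cwd.
        cwd='/home/azure/Documents/dlthub/data_pipeline',
        check=True)

t1 = PythonOperator(
    task_id='python_script',
    python_callable=run_python_script,
    dag=dag)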
- Creating a .sh script and a DAG to trigger the .sh, which in turn runs the .py (when I execute the .sh locally, it works fine):
Bash:
#!/bin/bash
# Path to the Python interpreter in the virtual environment
VENV_PATH="/home/azure/Documents/dlthub/.vdlt/bin/python"
# Path to the Python script to be executed
SCRIPT_PATH="/home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.py"
# Run the Python script with the venv interpreter (no explicit activation needed)
"$VENV_PATH" "$SCRIPT_PATH"
DAG:
t1 = BashOperator(
    task_id='python_script',
    bash_command='/home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.sh',
    dag=dag)
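One wrinkle with this last attempt: as far as I understand, Airflow treats a bash_command that ends in .sh as a Jinja template file and tries to load it, which can fail before the script even runs. The workaround I have seen suggested is a trailing space (or calling bash explicitly), roughly:

t1 = BashOperator(
    task_id='python_script',
    # Trailing space stops Airflow from trying to load the .sh as a Jinja template file.
    bash_command='bash /home/azure/Documents/dlthub/data_pipeline/Ini_LclVrb.sh ',
    dag=dag)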
None of these worked for me.