I know we can submit a Python wheel job through the UI, but is it possible to achieve this with the Airflow DatabricksSubmitRunOperator?
From the docs, it seems that only python_file is supported. Is there a workaround to submit a python_wheel to run?
Thanks in advance.
First of all, you need to store your wheel and your main.py in a DBFS location, and then reference them from your Airflow DAG, as in this example:
<code>
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Cluster spec for the one-time run. Use either "num_workers" or "autoscale",
# not both; autoscale is kept here.
new_cluster = {
    "name": "my_cluster",
    "spark_version": "14.3.....",
    "node_type_id": "Standard_DS3_v2",  # placeholder; pick a node type available in your workspace
    "autoscale": {
        "min_workers": 1,
        "max_workers": 8,
    },
}

spark_python_task_params = {
    "new_cluster": new_cluster,
    # The entry-point script lives on DBFS and runs as a plain Python file.
    "spark_python_task": {
        "python_file": "dbfs:/path/to/your/main.py",
        "parameters": [],
    },
    # Attaching the wheel as a library is what makes its code importable from main.py.
    "libraries": [
        {"whl": "dbfs:/path/to/your/wheel.whl"},
    ],
}

spark_task = DatabricksSubmitRunOperator(
    task_id="run_this_job",
    databricks_conn_id="databricks_default",
    dag=dag,
    json=spark_python_task_params,
)
</code>
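The snippet above references a `dag` object; as a minimal sketch (assuming Airflow 2.4+ and a hypothetical dag_id), the operator would live inside a DAG like this, in which case the `dag=` argument on the operator is no longer needed:
<code>
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_wheel_job",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    spark_task = DatabricksSubmitRunOperator(
        task_id="run_this_job",
        databricks_conn_id="databricks_default",
        json=spark_python_task_params,  # the dict defined in the snippet above
    )
</code>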
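For completeness, here is a minimal sketch of what main.py itself might look like; `my_package` and `main` are hypothetical names standing in for whatever your wheel actually exposes:
<code>
# main.py, stored at dbfs:/path/to/your/main.py
import sys

# Importable only because the wheel is attached via the "libraries" section.
from my_package import main  # hypothetical package / entry point

if __name__ == "__main__":
    # Receives whatever is listed under spark_python_task["parameters"].
    main(sys.argv[1:])
</code>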