I am deploying a Databricks Asset Bundle to run an ETL job. As part of this, I build a small table from some master data in a parquet file that lives in my repo.
I want to pass the file path as a variable defined in my job YAML files, with the current user and bundle target substituted dynamically.
Here is what I tried.

My repo structure:
databricks.yml
masterdata/
├── my_file.parquet
src/
├── my_script.py
resources/
├── my_job.yml
And this code:
file_path = <file_path>  # this is where the job parameter should end up
df = spark.read.parquet(file_path)
and this yaml:
- task_key: station_table_generation
  job_cluster_key: job_cluster
  notebook_task:
    notebook_path: ../src/curated/station.py
  parameters:
    - name: station_file
      default: file:/Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}/files/masterdata/my_file.parquet
I don't know how to access the job parameter from within the notebook code.
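My best guess is that the parameter is exposed to the notebook as a widget, so something along the lines of the sketch below is what I would try; I have not verified that this is correct, and the parameter name station_file is simply the one from my YAML above:

# dbutils and spark are available automatically inside a Databricks notebook
# read the task parameter "station_file", assuming it arrives as a notebook widget
file_path = dbutils.widgets.get("station_file")
df = spark.read.parquet(file_path)

Is that the right way to read the parameter, or does the bundle pass it to the notebook in some other form?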