I am trying to use the SQLServerToBigQuery Flex Template to pull data from a SQL Server database and write it to BigQuery, launching the job via API requests from Airflow. A few issues I have run into so far:
- When I launch the job as documented, I get errors saying the driverJars and driverClassName parameters are illegal, even though the documentation lists them as required
- When I remove those two parameters and launch again, the job fails with a NullPointerException that doesn't say which value it expected to be non-null
- The Dataflow logs expose the username and password in plain text. I tried encrypting them with a KMS key, but then I get permission errors on the key, even though I granted the service account running the Dataflow job the Cloud KMS Admin role on that key.
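For context, here is roughly what I am passing as the template parameters (the parameter names are my reading of the Google-provided JDBC/SQL Server template docs; all hosts, tables, and bucket names below are placeholders):

```python
# Sketch of what ends up in self._parameters. Parameter names are my
# understanding of the JDBC-to-BigQuery style templates; values are placeholders.
parameters = {
    'connectionURL': 'jdbc:sqlserver://my-host:1433;databaseName=mydb',  # placeholder
    'username': 'my-user',          # this is what shows up in the Dataflow logs
    'password': 'my-password',      # tried swapping this for KMS-encrypted values
    'query': 'SELECT * FROM dbo.my_table',
    'outputTable': 'my-project:my_dataset.my_table',
    'bigQueryLoadingTemporaryDirectory': 'gs://my-bucket/tmp',
}
```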
Here is my basic Airflow code:
_request_body = {
    'launchParameter': {
        'jobName': self._jobname,
        'launchOptions': self._dataflowoptions,
        'parameters': self._parameters,
        'containerSpecGcsPath': _dataflow_template,
        'environment': {
            'network': self._dataflowoptions.get('network'),
            'subnetwork': self._dataflowoptions.get('subnetwork'),
            'serviceAccountEmail': hook._get_credentials_email,
            'machineType': self._dataflowoptions.get('machineType', 'n2-standard-2'),
            'maxWorkers': self._dataflowoptions.get('maxNumWorkers', '10'),
            'workerRegion': self._dataflowoptions.get('region'),
            'stagingLocation': self._dataflowoptions.get('stagingLocation'),
            'tempLocation': self._dataflowoptions.get('tempLocation'),
            'kmsKeyName': _kms_key_name,
        },
    },
}
hook.start_flex_template(
    body=_request_body,
    project_id=self._dataflowoptions.get('project'),
    location=self._dataflowoptions.get('region'),
)
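In case it helps with the KMS issue, this is roughly the IAM binding I set up (key, keyring, project, and service account names are placeholders). I granted Admin; from the docs it looks like the Encrypter/Decrypter role may be what the workers actually need, but I have not confirmed that:

```shell
# Sketch of the grant I tried; all names below are placeholders.
# I used roles/cloudkms.admin first; trying the Encrypter/Decrypter role next,
# since Admin manages the key but may not include decrypt permission.
gcloud kms keys add-iam-policy-binding my-key \
    --keyring my-keyring \
    --location us-central1 \
    --project my-project \
    --member "serviceAccount:dataflow-sa@my-project.iam.gserviceaccount.com" \
    --role "roles/cloudkms.cryptoKeyEncrypterDecrypter"
```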
Has anyone else experienced similar issues?