I’m new to GX and I’m trying to test the connections defined in my great_expectations.yml.
This is the function that I’m trying to run under Airflow in a local Docker environment:
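`import logging

import great_expectations as ge
import yaml


def test_ge_data_context_connections():
    context = ge.data_context.DataContext()
    # Test each datasource
    for datasource in context.list_datasources():
        datasource_name = datasource['name']
        logging.info(f"Testing connection for datasource: {datasource_name}...")
        try:
            datasource_config = context.get_datasource(datasource_name)
            datasource_yaml = yaml.dump(datasource_config.config)
            connection_test_result = context.test_yaml_config(datasource_yaml)
        except Exception:
            # Log the failure and re-raise so the Airflow task fails visibly
            logging.exception(f"Connection test failed for datasource: {datasource_name}")
            raise
`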
ERROR:
  File "/home/airflow/.local/lib/python3.11/site-packages/great_expectations/data_context/config_validator/yaml_config_validator.py", line 556, in _test_instantiation_of_misc_class_from_yaml_config
    instantiated_class = instantiate_class_from_config(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/great_expectations/data_context/util.py", line 42, in instantiate_class_from_config
    raise KeyError(
KeyError: "Neither config : ordereddict([('data_connectors', ordereddict([('default_runtime_data_connector_name', ordereddict([('batch_identifiers', ['default_identifier_name']), ('class_name', 'RuntimeDataConnector'), ('module_name', 'great_expectations.datasource.data_connector'), ('name', 'default_runtime_data_connector_name')])), ('s3_bucket_data_connector', ordereddict([('assets', ordereddict([('tasks_data_asset', ordereddict([('base_directory', 's3:///data/DEV/Tasks'), ('class_name', 'Asset'), ('group_names', ['file']), ('module_name', 'great_expectations.datasource.data_connector.asset'), ('pattern', '(.)')]))])), ('bucket', '*'), ('class_name', 'ConfiguredAssetS3DataConnector'), ('module_name', 'great_expectations.datasource.data_connector'), ('name', 's3_bucket_data_connector'), ('prefix', 'data/DEV')]))])), ('execution_engine', ordereddict([('class_name', 'PandasExecutionEngine'), ('module_name', 'great_expectations.execution_engine')])), ('id', None), ('name', 's3_datasource')]) nor config_defaults : {} contains a module_name key."
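Reading the KeyError, the top-level keys of the config it prints appear to be only `data_connectors`, `execution_engine`, `id` and `name`, so the `class_name` and `module_name` keys that the message asks for are not there at the top level. Below is a minimal sketch of how I could check for that in plain Python, with no GX calls. `missing_class_keys` and `with_class_keys` are hypothetical helper names of mine, not part of the Great Expectations API, and the sample dict is abbreviated from the error message:

```python
# Hypothetical helpers, not part of the Great Expectations API.
REQUIRED_KEYS = ("class_name", "module_name")


def missing_class_keys(config: dict) -> list:
    """Return the required top-level keys that are absent from config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def with_class_keys(config: dict, class_name: str, module_name: str) -> dict:
    """Return a copy of config with class_name and module_name added if absent."""
    merged = dict(config)
    merged.setdefault("class_name", class_name)
    merged.setdefault("module_name", module_name)
    return merged


# Top-level keys as they appear in the KeyError (nested values abbreviated).
dumped = {
    "data_connectors": {},
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "id": None,
    "name": "s3_datasource",
}

print(missing_class_keys(dumped))
patched = with_class_keys(dumped, "Datasource", "great_expectations.datasource")
print(missing_class_keys(patched))
```

If the first list is non-empty for the real dumped config, that would match the message. I have not confirmed whether re-adding the keys and passing the result through `yaml.dump` and `test_yaml_config` is the intended way to test a connection.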
great_expectations.yml config that I’m testing:
datasources:
  s3_datasource:
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
    data_connectors:
      default_runtime_data_connector_name:
        name: default_runtime_data_connector_name
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        batch_identifiers:
          - default_identifier_name
      s3_bucket_data_connector:
        name: s3_bucket_data_connector
        class_name: ConfiguredAssetS3DataConnector
        module_name: great_expectations.datasource.data_connector
        assets:
          tasks_data_asset:
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
            base_directory: ${Storage}/Tasks
            pattern: (.*)
            group_names:
              - file
        bucket: ${S3_BUCKET}
        prefix: ${S3_PREFIX}
        creds:
          access_key: airflow
          secret_key: airflow
  data_warehouse:
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      module_name: great_expectations.execution_engine
      connection_string: ${DATA_WAREHOUSE}
    data_connectors:
      default_runtime_data_connector_name:
        name: default_runtime_data_connector_name
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        batch_identifiers:
          - default_identifier_name
      default_inferred_data_connector_name:
        name: default_inferred_data_connector_name
        class_name: InferredAssetSqlDataConnector
        module_name: great_expectations.datasource.data_connector
        include_schema_name: true
        introspection_directives:
          schema_name: "public"
      default_configured_data_connector_name:
        name: default_configured_data_connector_name
        class_name: ConfiguredAssetSqlDataConnector
        module_name: great_expectations.datasource.data_connector
        assets:
          Tasks:
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
            schema_name: "public"
            data_asset_name: "Tasks"
config_variables_file_path: uncommitted/config_variables.yml
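The config relies on `${...}` placeholders (`${Storage}`, `${S3_BUCKET}`, `${S3_PREFIX}`, `${DATA_WAREHOUSE}`), which as far as I understand GX fills in from `uncommitted/config_variables.yml` or from environment variables. To rule out a missing value, here is a small pre-flight sketch that lists placeholders with no matching entry. `unresolved_placeholders` is a hypothetical helper of mine, the config text is a trimmed excerpt, and the variable values are made up for illustration:

```python
import re

# Matches ${NAME} placeholders such as ${S3_BUCKET}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def unresolved_placeholders(config_text: str, variables: dict) -> list:
    """Return placeholder names in config_text that have no entry in variables."""
    names = PLACEHOLDER.findall(config_text)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return [name for name in dict.fromkeys(names) if name not in variables]


# Trimmed excerpt of the placeholders used in great_expectations.yml.
config_text = """
base_directory: ${Storage}/Tasks
bucket: ${S3_BUCKET}
prefix: ${S3_PREFIX}
connection_string: ${DATA_WAREHOUSE}
"""

# Illustrative values only; the real ones live in config_variables.yml.
variables = {"S3_BUCKET": "example-bucket", "S3_PREFIX": "data/DEV"}

print(unresolved_placeholders(config_text, variables))
```

An empty list would mean every placeholder has a value. It says nothing about whether the values themselves are correct.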
I get the same error for both the S3 and PostgreSQL datasources.
Versions:
Airflow 2.9.2
Great Expectations 0.18.16
Python 3.11.9
Thank you
I want to test the connections before I start validating Great Expectations checkpoints.