pandera: 0.18.3
pandas: 2.2.2
python: 3.9/3.11
Hi,
I am unable to set up pandera validation for a pandas DataFrame. It fails with the following error:
File "/anaconda/envs/data_quality_env/lib/python3.9/site-packages/pandera/api/base/schema.py",
line 96, in get_backend
raise BackendNotFoundError(
pandera.errors.BackendNotFoundError: Backend not found for backend, class: (<class 'data_validation.schemas.case.CaseSchema'>,
<class 'pandas.core.frame.DataFrame'>). Looked up the following base
classes: (<class 'pandas.core.frame.DataFrame'>, <class
'pandas.core.generic.NDFrame'>, <class
'pandas.core.base.PandasObject'>, <class
'pandas.core.accessor.DirNamesMixin'>, <class
'pandas.core.indexing.IndexingMixin'>, <class
'pandas.core.arraylike.OpsMixin'>, <class 'object'>)
My folder structure is:
    project/
        data_validation/
            schema/
                case.py
            validation/
                validations.py
        pipeline.py
case.py:
    import pandas as pd
    import pandera as pa

    class CaseSchema(pa.DataFrameSchema):
        case_id = pa.Column(pa.Int)
validations.py:
    import pandas as pd
    from data_validation.schemas.case import CaseSchema

    def validate_case_data(df: pd.DataFrame) -> pd.DataFrame:
        """Validate a DataFrame against the CaseSchema."""
        schema = CaseSchema()
        return schema.validate(df)
pipeline.py:
    import pandas as pd
    from data_validation.validation.validations import validate_case_data

    def validate_df(df: pd.DataFrame) -> pd.DataFrame:
        """Process data, validating it against the CaseSchema."""
        validated_df = validate_case_data(df)
        return validated_df

    df = pd.DataFrame({
        "case_id": [1, 2, 3]
    })
    processed_df = validate_df(df)