I am trying to register a builtin check in pandera, but I’m not exactly sure how it’s supposed to be implemented.
For a “standard” check you can do something like the following.
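import pandas as pd
from pandera import extensions
@extensions.register_check_method()
def regex_check(s: pd.Series, pattern: str) -> pd.Series:
    # Returns a boolean Series: True where the joined, stripped value matches the pattern.
    return s.apply(lambda x: "".join(x)).str.strip().str.fullmatch(pattern)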
Then you can use it in a class.
import pandera as pa
from pandera.typing import Series
class DataSchema(pa.DataFrameModel):
    test: Series[str] = pa.Field(regex_check={"pattern": r"^\d{9}$"})
However, this method of creating checks does not have the capability of returning an error message like a builtin check.
When I attempt to create a builtin check, I get an exception.
raise BuiltinCheckRegistrationError(
pandera.api.extensions.BuiltinCheckRegistrationError: Check 'regex_check' doesn't have a base check implementation. You need to create a stub method in the <class 'pandera.api.checks.Check'> class and then register a base check function implementation with the <class 'pandera.api.checks.Check'>.register_builtin_check_fn method.
See the `pandera.api.base.builtin_checks` and `pandera.backends.pandas.builtin_checks` modules as an example.
The pandera.api.base.builtin_checks module doesn’t appear to exist, but pandera.backends.pandas.builtin_checks does. Even so, I’ve yet to implement it properly.
The author wrote a paper here in which he describes how to extend existing built-in methods. Additionally, there are some code snippets in the repo which show the stubs.
from typing import Any
from pandera.api.checks import Check
@Check.register_builtin_check_fn
def not_equal_to(data: Any, value: Any) -> Any:
    # Stub only: the real implementation is registered per backend.
    raise NotImplementedError
I can’t seem to find anyone actually implementing them. Before this turns into an XY problem: what I am actually looking for is a way to add error messages to the SchemaErrors for the checks. If there is some other method I should be using in the class-based API, let me know.