I’d like to use Pydantic to define and validate an AST of queries that will be applied to a Pandas DataFrame. Here’s my code:
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, Field
import pandas as pd


class ColumnCondition(BaseModel):
    """
    A class that represents a condition that is applied to a single column.
    """

    tag: Literal["ColumnCondition"] = "ColumnCondition"
    column: str = Field(..., title="The name of the column to apply the condition to.")
    operator: Literal["==", "!=", "<", ">", "<=", ">="] = Field(
        ..., title="The operator of the condition."
    )
    value: Optional[str] = Field(None, title="The value to compare the column to.")


class AndCondition(BaseModel):
    """
    A class that represents an 'and' condition that is applied to two or more conditions.
    """

    tag: Literal["AndCondition"] = "AndCondition"
    conditions: List["Condition"]


Condition = Union[ColumnCondition, AndCondition]


class ConditionModel(BaseModel):
    condition: Condition = Field(discriminator="tag")


def get_column_metadata(df: pd.DataFrame) -> dict:
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


if __name__ == "__main__":
    """
    Example
    """
    condition_json = {
        "tag": "AndCondition",
        "conditions": [
            {
                "tag": "ColumnCondition",
                "column": "original_amount.currency",
                "operator": ">=",
                "value": "100",
            },
            {
                "tag": "ColumnCondition",
                "column": "original_amount.currency",
                "operator": "<=",
                "value": "1000",
            },
        ],
    }
    cond = ConditionModel.model_validate({"condition": condition_json})
    print(cond.model_dump_json(indent=2))
This works well, but I have a few questions:
- Is there a way to remove the `ConditionModel` wrapper class? I couldn’t work around it.
- What is the best way to handle the types of values? Should I add another field to the `ColumnCondition` class for its type, or maybe hold a list of columns and types?
- What is the best way to convert such a condition into a string to be used in the `DataFrame.query` method? Should I implement a `__str__` in each class, or maybe write a method that traverses the AST and creates the string?
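
To make the first and third questions concrete, here is a minimal sketch of one direction I have been considering, assuming Pydantic v2. It moves the discriminator onto the union with `Annotated` and validates through a `TypeAdapter` instead of a wrapper model, then walks the tree with a standalone function. The names `condition_adapter` and `to_query` are hypothetical, and the value is inserted into the string verbatim, so this sketch assumes numeric values from a trusted source and does not solve the typing question.

```python
from typing import Annotated, List, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter


class ColumnCondition(BaseModel):
    tag: Literal["ColumnCondition"] = "ColumnCondition"
    column: str
    operator: Literal["==", "!=", "<", ">", "<=", ">="]
    value: Optional[str] = None


class AndCondition(BaseModel):
    tag: Literal["AndCondition"] = "AndCondition"
    conditions: List["Condition"]


# The discriminator is attached to the union itself, so no wrapper model is needed.
Condition = Annotated[
    Union[ColumnCondition, AndCondition], Field(discriminator="tag")
]

# Resolve the "Condition" forward reference now that the alias exists.
AndCondition.model_rebuild()

# Hypothetical name: a reusable validator for the bare union type.
condition_adapter = TypeAdapter(Condition)


def to_query(cond: Union[ColumnCondition, AndCondition]) -> str:
    """Recursively render a condition tree as a query expression string."""
    if isinstance(cond, ColumnCondition):
        if cond.value is None:
            raise ValueError(f"condition on {cond.column!r} has no value")
        # Backticks quote column names that are not valid Python identifiers.
        # Assumption: the value is numeric and trusted, so it is inserted as-is.
        return f"`{cond.column}` {cond.operator} {cond.value}"
    if isinstance(cond, AndCondition):
        return " and ".join(f"({to_query(c)})" for c in cond.conditions)
    raise TypeError(f"unsupported condition type: {type(cond).__name__}")


cond = condition_adapter.validate_python(
    {
        "tag": "AndCondition",
        "conditions": [
            {"tag": "ColumnCondition", "column": "original_amount.currency",
             "operator": ">=", "value": "100"},
            {"tag": "ColumnCondition", "column": "original_amount.currency",
             "operator": "<=", "value": "1000"},
        ],
    }
)
query = to_query(cond)
print(query)
```

The resulting string would then be passed to `df.query(query)`. Keeping the traversal in one function rather than in `__str__` leaves the models free of rendering logic, but I am not sure which is considered more idiomatic.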