I have a Pandas dataframe built like this:
| Fruit | Color | Eaten? | Date Eaten |
|---|---|---|---|
| Apple | Red | Yes | 14-Mar-2024 |
| Apple | Green | No | 14-Mar-2024 |
| Apple | Yellow | Yes | |
| Banana | Red | | |
| Banana | Yellow | Yes | 14-Mar-2024 |
I’m trying to create rules for validating my data, but what counts as valid depends on the values in other columns. For example, “Red” is a valid Color when the Fruit is “Apple” but not when the Fruit is “Banana”, and having a Date Eaten is valid when Eaten? is “Yes” but not when Eaten? is “No”.
I’d like to take a CSV of previously manually verified data, derive a set of rules from it, store those rules in a file for later use, and then use them to check new data. Ideally I’d get back a DataFrame containing every row with invalid data, but I’m not too picky about that.
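To make that workflow concrete, here is a rough sketch of one possible approach, not a recommended or complete solution. It records which value combinations appear in the verified rows for chosen column pairs, stores them as JSON, and flags new rows containing a combination that was never seen. Everything named here is an assumption for illustration: the function names (`learn_rules`, `save_rules`, `load_rules`, `find_invalid`), the `presence_only` option (which reduces a column such as Date Eaten to "blank or not blank" so that new dates are not flagged), the file name `rules.json`, and the sample rows, which are made up. It uses only the standard library and works on plain dictionaries.

```python
import json
import os
import tempfile

MISSING = ""           # blank cells count as a value of their own
PRESENT = "<present>"  # stand-in for "any non-blank value"


def pair_values(row, pair, presence_only):
    """Return the row's values for a column pair as a tuple of strings."""
    values = []
    for col in pair:
        raw = row.get(col)
        text = MISSING if raw is None else str(raw).strip()
        if col in presence_only and text != MISSING:
            text = PRESENT
        values.append(text)
    return tuple(values)


def learn_rules(rows, pairs, presence_only=()):
    """Collect every combination seen in the verified rows, per column pair."""
    return {
        f"{a}|{b}": {pair_values(row, (a, b), presence_only) for row in rows}
        for a, b in pairs
    }


def save_rules(rules, path):
    # JSON has no sets or tuples, so store sorted lists of [a, b] pairs.
    data = {key: sorted(map(list, combos)) for key, combos in rules.items()}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)


def load_rules(path):
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return {key: {tuple(c) for c in combos} for key, combos in data.items()}


def find_invalid(rows, rules, pairs, presence_only=()):
    """Return (row position, column pair) for each combination never seen before."""
    problems = []
    for position, row in enumerate(rows):
        for a, b in pairs:
            key = f"{a}|{b}"
            if pair_values(row, (a, b), presence_only) not in rules.get(key, set()):
                problems.append((position, key))
    return problems


# Hypothetical verified rows (made up for this example).
verified = [
    {"Fruit": "Apple", "Color": "Red", "Eaten?": "Yes", "Date Eaten": "14-Mar-2024"},
    {"Fruit": "Apple", "Color": "Green", "Eaten?": "No", "Date Eaten": ""},
    {"Fruit": "Apple", "Color": "Yellow", "Eaten?": "Yes", "Date Eaten": ""},
    {"Fruit": "Banana", "Color": "Yellow", "Eaten?": "Yes", "Date Eaten": "14-Mar-2024"},
]
# Hypothetical new rows to check.
new_rows = [
    {"Fruit": "Apple", "Color": "Red", "Eaten?": "Yes", "Date Eaten": "15-Mar-2024"},
    {"Fruit": "Banana", "Color": "Red", "Eaten?": "Yes", "Date Eaten": ""},
    {"Fruit": "Apple", "Color": "Green", "Eaten?": "No", "Date Eaten": "14-Mar-2024"},
]

pairs = [("Fruit", "Color"), ("Eaten?", "Date Eaten")]
presence_only = {"Date Eaten"}

# Round-trip through a file to show the rules surviving storage.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rules.json")
    save_rules(learn_rules(verified, pairs, presence_only), path)
    rules = load_rules(path)

problems = find_invalid(new_rows, rules, pairs, presence_only)
print(problems)  # [(1, 'Fruit|Color'), (2, 'Eaten?|Date Eaten')]
```

With pandas, `df.fillna("").to_dict("records")` should produce rows in this shape, and the unique flagged positions could be passed to `df.iloc[...]` to get a DataFrame of the invalid rows. The main limitation of this kind of lookup is that any valid combination missing from the verified data is reported as invalid, so it is only as good as the coverage of the verified CSV.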
I looked into setting up the rules manually, but there are too many possible combinations, so that seemed impractical.
Googling led me to decision trees, which looked promising but seem better suited to making predictions from graphable data than to verifying exact strings.