I am working on a CSV validation script based on The National Archives' CSV Schema (https://blog.nationalarchives.gov.uk/csv-validator-new-digital-preservation-tool/), using Python and Lark. The basics all work: if you declare that a column must be called “name” and not be empty (name: notEmpty), or be called “age” and hold a value between 0 and 120 (age: range(0, 120)), the Lark transformer collects the relevant validator functions from the tree and hands them to the validation script, which confirms that each function returns True for the values in that column.
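A minimal sketch of that arrangement in plain Python, assuming each rule ends up as a one-argument function that takes a cell value and returns True or False. The names not_empty, in_range, column_rules and validate are hypothetical and only stand in for whatever the transformer actually produces:

```python
import csv
import io

# Hypothetical stand-ins for the functions the transformer returns per rule.
def not_empty():
    def validator(value):
        return value != ""
    return validator

def in_range(low, high):
    def validator(value):
        try:
            return low <= float(value) <= high
        except ValueError:
            # Non-numeric cells cannot be inside a numeric range.
            return False
    return validator

# Column name -> list of validators, as collected from the schema.
column_rules = {
    "name": [not_empty()],
    "age": [in_range(0, 120)],
}

def validate(csv_text, rules):
    """Return (line_number, column, value) for every failed check."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        for column, validators in rules.items():
            value = row[column]
            if not all(check(value) for check in validators):
                errors.append((line_number, column, value))
    return errors

sample = "name,age\nAlice,30\n,45\nBob,150\n"
errors = validate(sample, column_rules)
print(errors)
```

Each validator here only ever sees the single cell it is checking, which is exactly the limitation described next.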
However, I cannot get my head around how to support explicit contexts, that is, rules that refer to other values in the CSV. Those values are only ever read by the validator, so they are beyond the scope of the transformer. For a made-up example, let's say I have a CSV of car brands which has the columns “make”, “model” and “color”. I need to be able to say:
color: if($make("Ford") and $model("Model T"), is("black"))
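To pin down the behaviour I am after, here is a rough plain-Python sketch of what that rule should mean at validation time, with no Lark involved. It assumes every check receives both the cell value and the whole row as a dict. The helper names is_, column_is, all_of and if_ are invented for this illustration and are not part of the CSV Schema or of Lark:

```python
# Hypothetical helpers: every check takes the cell value and the whole row.
def is_(expected):
    return lambda value, row: value == expected

def column_is(column, expected):
    # Stands in for $make("Ford"): inspects another column of the same row.
    return lambda value, row: row[column] == expected

def all_of(*checks):
    return lambda value, row: all(check(value, row) for check in checks)

def if_(condition, then):
    # The "then" rule only applies when the condition holds; otherwise pass.
    return lambda value, row: then(value, row) if condition(value, row) else True

color_rule = if_(
    all_of(column_is("make", "Ford"), column_is("model", "Model T")),
    is_("black"),
)

rows = [
    {"make": "Ford", "model": "Model T", "color": "black"},
    {"make": "Ford", "model": "Model T", "color": "red"},
    {"make": "Ford", "model": "Fiesta", "color": "red"},
]
results = [color_rule(row["color"], row) for row in rows]
print(results)
```

Only the second row should fail, because it is a Ford Model T that is not black. The open question is how to get the transformer to build something like color_rule when it never sees the rows.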
In the parser and transformer class I have various expressions, but to give the simplest example, is_expr looks like this:
// Lark EBNF
is_expr: "is(" string_literal ")"
string_literal: /"(\.|[^"])*"/

from lark import Lark, Transformer

class CSVS_Transformer(Transformer):
    ...
    def is_expr(self, tree):
        (tree,) = tree
        def is_validator(value):
            return value == tree
        return is_validator
    ...
I really cannot work out how I could include references to other parts of the CSV when the transformer never sees the CSV file.