I have a business entity called “Groups”. Each group is linked to a PQL-like string called a “rule”. These rules are evaluated to find which products in an e-commerce catalog match the rule, and the matching products are then added to the group. For example:
'NAME CONTAINS ("JACKETS") AND LONG_DESCRIPTION NOT_CONTAINS ("CARDIGANS")'
The business problem is that multiple groups have equivalent or overlapping rules, and the business wants them all joined into a single group. Take the Knit groups as an example:
'(NAME CONTAINS ("sweater","polo neck","jumper") AND LONG_DESCRIPTION CONTAINS ("knit","cashmere"))',
'(NAME CONTAINS ("leggings")) AND (LONG_DESCRIPTION CONTAINS ("knit"))',
'NAME CONTAINS ("knit","KNITTED") OR LONG_DESCRIPTION CONTAINS ("KNIT","KNITTED","CROCHET")',
'NAME CONTAINS ("knit","KNITTED","CROCHET") OR LONG_DESCRIPTION CONTAINS ("KNIT","KNITTED","CROCHET")',
'NAME CONTAINS ("knit","cashmere","knitted","crochet") OR LONG_DESCRIPTION CONTAINS ("knit","knitted","cashmere","crochet")',
'LONG_DESCRIPTION CONTAINS ("KNIT","KNITTED") OR NAME CONTAINS ("KNIT","KNITTED")',
'LONG_DESCRIPTION CONTAINS ("KNIT","CROCHET") OR NAME CONTAINS ("KNIT","CROCHET")'
Ideally, the joined rule would be:
'(NAME CONTAINS ("sweater","polo neck","jumper") AND LONG_DESCRIPTION CONTAINS ("knit","cashmere")) OR (NAME CONTAINS ("leggings")) AND (LONG_DESCRIPTION CONTAINS ("knit")) OR NAME CONTAINS ("knit","KNITTED","CROCHET") OR LONG_DESCRIPTION CONTAINS ("KNIT","KNITTED","CROCHET")'
Clauses that contain an AND operator should stay untouched, because both sides of the clause are required for a product to match. OR clauses, on the other hand, can be collapsed when one rule contains all the values of another rule plus more values inside the parentheses. For example, this rule
'NAME CONTAINS ("knit","KNITTED","CROCHET") OR LONG_DESCRIPTION CONTAINS ("KNIT","KNITTED","CROCHET")'
contains the same information as these two combined:
'LONG_DESCRIPTION CONTAINS ("KNIT","KNITTED") OR NAME CONTAINS ("KNIT","KNITTED")',
'LONG_DESCRIPTION CONTAINS ("KNIT","CROCHET") OR NAME CONTAINS ("KNIT","CROCHET")'
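To make the "same information" idea concrete, here is a minimal Python sketch of the check I have in mind. It assumes a pure OR rule has already been reduced to a mapping from attribute to a set of CONTAINS values, and that matching is case-insensitive (my assumption, based on "knit" and "KNIT" being treated as equal above). The names `normalize`, `covers` and `merge_or_rules` are hypothetical.

```python
from collections import defaultdict


def normalize(values):
    """Lower-case values so "knit" and "KNIT" compare equal (an assumption)."""
    return frozenset(v.lower() for v in values)


def covers(big, small):
    """True if every attribute/value pair of `small` also appears in `big`.

    Both arguments map an attribute name to a set of CONTAINS values and
    stand for a pure OR rule (no AND anywhere in it).
    """
    return all(attr in big and small[attr] <= big[attr] for attr in small)


def merge_or_rules(rules):
    """Union the values per attribute; the result covers every input rule."""
    merged = defaultdict(set)
    for rule in rules:
        for attr, values in rule.items():
            merged[attr] |= values
    return {attr: frozenset(vals) for attr, vals in merged.items()}


wide = {
    "NAME": normalize(["knit", "KNITTED", "CROCHET"]),
    "LONG_DESCRIPTION": normalize(["KNIT", "KNITTED", "CROCHET"]),
}
narrow_a = {
    "LONG_DESCRIPTION": normalize(["KNIT", "KNITTED"]),
    "NAME": normalize(["KNIT", "KNITTED"]),
}
narrow_b = {
    "LONG_DESCRIPTION": normalize(["KNIT", "CROCHET"]),
    "NAME": normalize(["KNIT", "CROCHET"]),
}

print(covers(wide, narrow_a))                        # True
print(merge_or_rules([narrow_a, narrow_b]) == wide)  # True
```

This only works for rules without AND; rules containing AND would have to be carried over unchanged.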
The problems I'm facing are:
1. Nested parentheses: Because these rules are written by commercial agents, they sometimes misuse parentheses in the rule builder, for example:
(NAME CONTAINS ("JACKETS"))
Here the outer parentheses are redundant. In other cases, nested parentheses are required, as in:
'(NAME CONTAINS ("sweater","polo neck","jumper") AND LONG_DESCRIPTION CONTAINS ("knit","cashmere")) OR MATERIAL EQUALS "KNIT"'
They are needed there because clauses joined by an AND operator are more restrictive than those joined by OR.
2. Handling rules with operators: How do I preserve clauses that contain an AND while merging OR clauses that carry redundant information? This is related to the parentheses problem, but with more logic to apply.
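As an illustration of both problems, the following Python sketch tracks parenthesis depth and quotes character by character instead of using regex. It strips parentheses that wrap a whole expression and splits a rule only on operators that sit at depth 0. It assumes balanced parentheses, straight double quotes around values, and operators surrounded by single spaces; `strip_outer_parens` and `split_top_level` are hypothetical names.

```python
def strip_outer_parens(text):
    """Remove parentheses that wrap the whole expression, e.g. (A) -> A."""
    text = text.strip()
    while text.startswith("(") and text.endswith(")"):
        depth, in_quotes = 0, False
        for i, ch in enumerate(text):
            if ch == '"':
                in_quotes = not in_quotes
            elif not in_quotes:
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0 and i < len(text) - 1:
                        return text  # first "(" closes early: not a wrapper
        text = text[1:-1].strip()
    return text


def split_top_level(text, keyword):
    """Split on " keyword " only at depth 0 and outside quoted values."""
    parts, depth, in_quotes, start = [], 0, False, 0
    sep = f" {keyword} "
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif depth == 0 and text.startswith(sep, i):
                parts.append(text[start:i])
                i += len(sep)
                start = i
                continue
        i += 1
    parts.append(text[start:])
    return [strip_outer_parens(p) for p in parts]


rule = ('(NAME CONTAINS ("sweater","polo neck","jumper") AND '
        'LONG_DESCRIPTION CONTAINS ("knit","cashmere")) OR MATERIAL EQUALS "KNIT"')
for part in split_top_level(rule, "OR"):
    print(part)
```

Each returned part is either a single condition or an AND group that can be kept as-is. When the parts are joined again with OR, any part that contains an AND would need to be wrapped in parentheses again.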
Things I have tried:
- Processing rules through regex: This was my first approach: capturing the different parts of a rule, then applying parentheses reduction and operator logic. The problem is keeping some parentheses while deleting others, and especially knowing whether the substring inside a pair of parentheses is another clause or a list of values (i.e., do I have NAME CONTAINS (…) or (… OR NAME CONTAINS () AND …)?).
- Parametrizing: Defining what a rule is with a Rule class that uses regex to split the rule into parts: Operations, Attributes, Values and Operators. I keep running into problems deciding where to split because of the parentheses.
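For reference, this is roughly the structure I imagine the parametrized version producing, sketched in Python as a small tokenizer plus recursive-descent parser. Regex is used only to cut tokens, not to understand nesting. The sketch assumes AND binds tighter than OR, that values use straight double quotes, and that a parenthesis directly after an operation is a value list while any other parenthesis is grouping. `Condition`, `BoolOp`, `Parser` and `parse_rule` are hypothetical names, not part of any existing library.

```python
import re
from dataclasses import dataclass

# One token: a quoted value, a parenthesis or comma, or a bare word.
TOKEN = re.compile(r'\s*("[^"]*"|[(),]|[A-Za-z_][A-Za-z0-9_]*)')


@dataclass(frozen=True)
class Condition:
    attribute: str
    operation: str
    values: tuple


@dataclass(frozen=True)
class BoolOp:
    operator: str  # "AND" or "OR"
    operands: tuple


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_bool(self, operator, parse_operand):
        operands = [parse_operand()]
        while self.peek() == operator:
            self.take()
            operands.append(parse_operand())
        if len(operands) == 1:
            return operands[0]
        return BoolOp(operator, tuple(operands))

    def parse_or(self):
        return self.parse_bool("OR", self.parse_and)

    def parse_and(self):
        return self.parse_bool("AND", self.parse_factor)

    def parse_factor(self):
        if self.peek() == "(":  # grouping parenthesis
            self.take("(")
            node = self.parse_or()
            self.take(")")
            return node  # redundant parentheses simply disappear here
        attribute, operation = self.take(), self.take()
        if self.peek() == "(":  # value list: it follows an operation
            self.take("(")
            values = [self.take()]
            while self.peek() == ",":
                self.take(",")
                values.append(self.take())
            self.take(")")
        else:
            values = [self.take()]  # bare value, e.g. EQUALS "KNIT"
        return Condition(attribute, operation, tuple(v.strip('"') for v in values))


def parse_rule(text):
    parser = Parser(tokenize(text))
    node = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return node


print(parse_rule('(NAME CONTAINS ("JACKETS"))'))
```

With a tree like this, `(NAME CONTAINS ("JACKETS"))` and `NAME CONTAINS ("JACKETS")` parse to the same `Condition`, and AND groups stay together as a single `BoolOp` node. Nested operators of the same kind, such as `(A OR B) OR C`, are not flattened in this sketch.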
What would be your approach to this problem?
Thanks in advance!