I have designed a system for experiment design. Essentially, the user defines a list of properties, each with a Min, Max, and Nominal value, something like below, where the tuple is (min, max, nom). These objects have some other properties as well, but this is the bare minimum to keep it simple.
X = (0, 50, 42.5)
Y = (-50, 0, -42.5)
Z = (-50, 50, 0)
A = (-50, 50, 2.5)
...
From here I can call a variety of functions such as Min(), Max(), Nom(), Random(), etc. to get a dictionary of {X: float, Y: float, Z: float, ...} that I can use elsewhere in the code base.
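A stripped-down sketch of what those objects and functions look like (the class and method names here are illustrative, not my actual implementation, and the real objects carry more state):

```python
import random

class Property:
    """One sweepable property with its declared range and nominal value."""
    def __init__(self, name, min_val, max_val, nom_val):
        self.name = name
        self.min = min_val
        self.max = max_val
        self.nom = nom_val

class Design:
    """Holds the property list and produces {name: float} dictionaries."""
    def __init__(self, props):
        self.props = props

    def Min(self):
        return {p.name: float(p.min) for p in self.props}

    def Max(self):
        return {p.name: float(p.max) for p in self.props}

    def Nom(self):
        return {p.name: float(p.nom) for p in self.props}

    def Random(self):
        return {p.name: random.uniform(p.min, p.max) for p in self.props}
```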
The user can also optionally define constraints, further limiting the values. These are simple expressions: each side supports the standard operators +, -, *, /, ^; the comparison operator is drawn from the set =, !=, <, <=, >, >=; and operands are variable references or numbers.
X >= Y + 5
Z < X - 10
Z >= Y
A < X - .1
A > Y + .1
...
I have already used ply.lex to write a simple lexer for the constraint expressions and can evaluate the expressions to be true or false based on provided dictionaries. A constraint expression is one side of the above constraints, such as (Y + 0.1) or (X - 10).
import ply.lex as lex

tokens = ['NUM', 'PLUS', 'MINUS', 'MULT', 'DIV', 'EXP', 'LPAREN', 'RPAREN', 'SOURCE']

t_PLUS = r'\+'
t_MINUS = r'-'
t_MULT = r'\*'
t_DIV = r'/'
t_EXP = r'\^'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_SOURCE = r'[a-zA-Z]+[a-zA-Z0-9_]*'

def t_NUM(t):
    r'-?[\d.]+(?:e-?\d+)?'
    t.value = float(t.value)
    return t

t_ignore = ' \t[]'

def t_error(t):
    print(f'Illegal character {t.value[0]}')
    t.lexer.skip(1)

lexer = lex.lex()
class ConstraintExpression:
    input_str = None
    tokens = None

    def __init__(self, expr):
        self.input_str = str(expr)
        lexer.input(self.input_str)
        self.tokens = [x for x in lexer]

    def evaluate(self, values):
        if len(self.tokens) == 1:
            return float(self.get_value(self.tokens[0], values))
        postfix_tokens = infix_2_postfix(self.tokens)
        stack = []
        for token in postfix_tokens:
            if token.type == "NUM" or token.type == "SOURCE":
                stack.append(token)
            else:
                val1 = self.get_value(stack.pop(), values)
                val2 = self.get_value(stack.pop(), values)
                # '^' must become '**' before eval(); a bare '^' is bitwise XOR in Python
                op = '**' if token.type == 'EXP' else token.value
                stack.append(str(eval(str(val2) + op + str(val1))))
        return float(stack.pop())
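(infix_2_postfix and get_value aren't shown above; the former is a standard shunting-yard conversion. A self-contained sketch, with a stand-in Token type instead of ply's LexToken, in case it helps frame the question:)

```python
from collections import namedtuple

# Stand-in for ply's LexToken; only .type and .value are used here.
Token = namedtuple("Token", ["type", "value"])

# Shunting-yard precedence table; '^' (EXP) is treated as right-associative,
# everything else as left-associative.
PREC = {"PLUS": 1, "MINUS": 1, "MULT": 2, "DIV": 2, "EXP": 3}

def infix_2_postfix(tokens):
    out, ops = [], []
    for tok in tokens:
        if tok.type in ("NUM", "SOURCE"):
            out.append(tok)
        elif tok.type == "LPAREN":
            ops.append(tok)
        elif tok.type == "RPAREN":
            while ops and ops[-1].type != "LPAREN":
                out.append(ops.pop())
            ops.pop()  # discard the matching LPAREN
        else:  # binary operator
            while (ops and ops[-1].type != "LPAREN"
                   and (PREC[ops[-1].type] > PREC[tok.type]
                        or (PREC[ops[-1].type] == PREC[tok.type]
                            and tok.type != "EXP"))):
                out.append(ops.pop())
            ops.append(tok)
    while ops:
        out.append(ops.pop())
    return out
```

For example, X - 10 * 2 tokenizes to [X, -, 10, *, 2] and converts to the postfix sequence [X, 10, 2, *, -].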
This is all fine and well.
The problem I am running into is there is a requirement of the larger system to support sweeping and DOEs, not just randomization. To that end I need to be able to “solve” the constraints so to speak to figure out what the coerced min/max would be given a set of values for other properties.
Example: everything but X is at nom, and I want to sweep X in 5 steps (6 points). Currently that iterator is something like linspace(0, 50, 6), which returns array([ 0., 10., 20., 30., 40., 50.]). However, the step X=0 violates the constraint A < X - 0.1, since A = 2.5. Logically I can see I would need to rearrange it as X > A + 0.1, meaning the real min for this sweep is 2.6, so it would really be linspace(2.6, 50, 6) -> array([ 2.6, 12.08, 21.56, 31.04, 40.52, 50. ]). This also violates a constraint against Z, so this would be an iterative process. (I have thought about cyclic dependencies but don't have a solution; if anyone has thoughts on that, I'd welcome them.)
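Worked by hand for this example, applying every constraint that involves X in a single pass (with the other variables pinned at nom) gives the fully coerced minimum. The rearrangements are hard-coded below, and automating exactly that step is the open problem:

```python
# Everything except X at its nominal value; X's declared range is (0, 50).
nom = {"Y": -42.5, "Z": 0.0, "A": 2.5}
x_min, x_max = 0.0, 50.0

# X >= Y + 5, Z < X - 10 (i.e. X > Z + 10), A < X - 0.1 (i.e. X > A + 0.1),
# each rearranged by hand into a lower bound on X.
lower_bounds = [nom["Y"] + 5, nom["Z"] + 10, nom["A"] + 0.1]
x_min = max([x_min, *lower_bounds])
print(x_min)  # 10.0 once all three constraints are applied together
```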
This problem gets even worse when we start talking about DOEs, where each variable starts at MIN and sweeps to MAX in STEPS, but it's a full cross product of proposed values. Many of these (often over 50%) are invalid combinations.
X = (0, 10, 20, 30, 40, 50)
Y = (-50, -40, -30, -20, -10, 0)
Z = (-50, -25, 0, 25, 50)
A = (-50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50)
DOE1 = X=0, Y=-50, Z=-50, A=-50
DOE2 = X=10, Y=-50, Z=-50, A=-50
...
DOE6 = X=50, Y=-50, Z=-50, A=-50
DOE7 = X=0, Y=-40, Z=-50, A=-50
DOE8 = X=10, Y=-40, Z=-50, A=-50
...
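Generating and then filtering that cross product is easy enough with itertools.product (the grids and constraint lambdas below simply mirror the example values, they are not the real system's objects); the problem is that this wastes most of the points, which is why I'd rather coerce the per-variable grids up front:

```python
import itertools

# Sweep grids matching the example values above.
grids = {
    "X": [0, 10, 20, 30, 40, 50],
    "Y": [-50, -40, -30, -20, -10, 0],
    "Z": [-50, -25, 0, 25, 50],
    "A": [-50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50],
}

# The constraints from the example, as predicates over a value dictionary.
constraints = [
    lambda v: v["X"] >= v["Y"] + 5,
    lambda v: v["Z"] < v["X"] - 10,
    lambda v: v["Z"] >= v["Y"],
    lambda v: v["A"] < v["X"] - 0.1,
    lambda v: v["A"] > v["Y"] + 0.1,
]

names = list(grids)
points = [dict(zip(names, combo)) for combo in itertools.product(*grids.values())]
valid = [p for p in points if all(c(p) for c in constraints)]
print(f"{len(valid)} of {len(points)} points satisfy all constraints")
```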
I have started exploring SymPy, which seems to be able to do this, but I feel like if I go down that route I need to toss what I've already done with the lexer out the window and fully commit to symbolic formulas over lexing/parsing (I'm OK with this; I just want to make sure it's the right approach). Something like this would allow me to dynamically create, substitute, simplify, and extract the "new" min/max from these equations.
>>> x, y, z, a = symbols("x y z a")
>>> c = [x>=y+1, z<x-10, z>=y, a<x-0.1, a>y+0.1]
>>> c
[x >= y + 1, z < x - 10, z >= y, a < x - 0.1, a > y + 0.1]
>>> c[0].subs(y, -42.5)
x >= -41.5
>>> c[1].subs(z, 0)
0 < x - 10
>>> simplify(c[1].subs(z, 0))
x > 10
>>> simplify(c[1].subs(z, 0)).gts
x
>>> simplify(c[1].subs(z, 0)).lts
10
The refactoring effort would be minimal, as I would only need to change how the ConstraintExpression is structured (a Constraint is just two ConstraintExpressions with an Op), so instead of a constraint expression being a token-backed eval() thing, it would be a SymPy-backed expression thing.
>>> c1 = x
>>> c2 = y+1
>>> c3 = z
>>> c4 = x-10
>>> c1 >= c2
x >= y + 1
>>> c3 < c4
z < x - 10
>>> simplify((c1 >= c2).subs(y, -42.5))
x >= -41.5
>>> simplify((c3 < c4).subs(z, 0))
x > 10
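As a sanity check that this route can reproduce the sweep example, reduce_inequalities can collapse the substituted constraints into a single condition on x (a hedged sketch, not my existing code):

```python
from sympy import symbols, reduce_inequalities

# Hold everything except x at its nominal value and reduce the constraints
# that mention x down to a single condition on x.
x, y, z, a = symbols("x y z a", real=True)
constraints = [x >= y + 5, z < x - 10, z >= y, a < x - 0.1, a > y + 0.1]
nominal = {y: -42.5, z: 0, a: 2.5}

remaining = [c.subs(nominal) for c in constraints if c.has(x)]
coerced = reduce_inequalities(remaining, x)
print(coerced)  # a condition equivalent to x > 10
```

That lower bound of 10 (from Z < X - 10 with Z at its nominal 0) matches the iterative result from the sweep example in one shot.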
Thoughts on this are very much appreciated, as are alternatives; I've been banging my head against this problem for a few weeks now without making headway. I am using Python 3.10.11.
For simple algebraic relationships, AccumBounds can be substituted in for the expressions, and you can check whether all the constraints have been satisfied. If not, you can decide which variable you want to change, omit it during the substitution, and solve the system of inequalities for that variable.
from sympy import *

X = ('X', 0, 50, 42.5)
Y = ('Y', -50, 0, -42.5)
Z = ('Z', -50, 50, 0)
A = ('A', -50, 50, 2.5)

reps = {Symbol(i[0]): AccumBounds(i[1], i[2]) for i in (X, Y, Z, A)}
v = var('X Y Z A')  # rebinds X, Y, Z, A from tuples to Symbols
conditions = Tuple(X >= Y + 5, Z < X - 10, Z >= Y, A < X - 0.1, A > Y + 0.1)
nonneg = Tuple(*[i.gts - i.lts for i in conditions])  # each entry must be nonnegative
>>> any(i.max<0 for i in nonneg.subs(reps))
False
# let's change Y so it gives an invalid condition
>>> reps[Y]=AccumBounds(50,100)
>>> any(i.max<0 for i in nonneg.subs(reps))
True
And now let's see what the critical range is for Y:
>>> Tuple(*[i for i in nonneg.subs({i:j for i,j in reps.items() if i!=Y}) if i.has(Y)])
(-Y + AccumBounds(-5, 45), -Y + AccumBounds(-50, 50), -Y + AccumBounds(-50.1, 49.9))
>>> _.replace(lambda x:x.is_Float, lambda x:Rational(str(x)))
(-Y + AccumBounds(-5, 45), -Y + AccumBounds(-50, 50), -Y + AccumBounds(-501/10, 499/10))
>>> [solve(i)[0] for i in _] # pretend the expressions are less trivial
[AccumBounds(-5, 45), AccumBounds(-50, 50), AccumBounds(-501/10, 499/10)]
>>> Intersection(*[Interval(i.min,i.max) for i in _])
Interval(-5, 45)
So Y must be in that interval.
It's an interesting problem, especially the consideration of multiple variables being out of range and whether an iterative refinement could resolve that. This feels a bit like a linear optimization problem. I wonder if, for any variable whose range includes negative values, you could replace it with a dummy x such that x = variable - min. Then you would have symbols that must all be nonnegative and could attempt to minimize the sum of the symbols under the given constraints. That's pretty uncharted waters for me... it's just an idea.
from sympy.solvers.simplex import lpmin, lpmax
>>> lpmin(sum(v), [i.gts >= i.lts for i in conditions] + [i >= reps[i].min for i in v] + [i <= reps[i].max for i in v])
(-149.900000000000, {A: -49.9000000000000, X: 0, Y: -50, Z: -50})
>>> lpmax(sum(v), [i.gts >= i.lts for i in conditions] + [i >= reps[i].min for i in v] + [i <= reps[i].max for i in v])
(139.900000000000, {A: 49.9000000000000, X: 50, Y: 0, Z: 40})
Notice how the A and Z ranges have been modified to minimize or maximize the sum of the variables.