I have been trying out symbolic regression and was wondering how to use it to approximate a row of Pascal’s triangle, for example. I make the data with:
import math
import numpy as np

def print_pascals_triangle_row(n):
    row = []
    for k in range(n + 1):
        coefficient = math.comb(n, k)
        row.append(coefficient)
    return row

n = 20
pascals_triangle_row = print_pascals_triangle_row(n)
This gives:
[1, 20, 190, 1140, 4845, 15504, 38760, 77520, 125970, 167960, 184756,
 167960, 125970, 77520, 38760, 15504, 4845, 1140, 190, 20, 1]
y = pascals_triangle_row
X = np.array(range(len(y))).reshape(-1, 1)
Set up the model:
from pysr import PySRRegressor
model = PySRRegressor(
    maxsize=15,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
        # ^ Custom operator (julia syntax)
    ],
    # ^ Define operator for SymPy as well
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    # ^ Custom loss function (julia syntax)
)
And finally fit the model:
model.fit(X, y)
This doesn’t give a model that is anywhere near accurate. My guess is that the fundamental reason is that it can’t do x**x, which is needed to approximate the factorial.
Is there any way to allow symbolic regression to use the binary operator ** or otherwise fit a row of Pascal’s triangle?
Now that it has been identified that the question was “how to add pow” (or “why can’t we use pow”, which was crystal clear from the beginning; it’s just that, since simply adding pow works on my computer, I wasn’t sure I had understood the question), and also to be able to publish some code, I am putting my answer here.
Adding pow
For a start, adding pow is as simple as adding "^" to binary_operators:
import numpy as np
import math
from pysr import PySRRegressor

def print_pascals_triangle_row(n):
    row = []
    for k in range(n + 1):
        coefficient = math.comb(n, k)
        row.append(coefficient)
    return row

n = 20
pascals_triangle_row = print_pascals_triangle_row(n)
y = pascals_triangle_row
X = np.array(range(len(y))).reshape(-1, 1)

model = PySRRegressor(
    maxsize=15,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*", "^"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
    ],
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
)
res = model.fit(X, y)
(I used "pow"
in my comment. But it is the exact same. I suspect that reason why it didn’t work at first for you, is because you added it to unary_operator. At least, that is what commented line suggest)
It is worth noting that, doing so, it does use ^ from time to time in intermediate results, but it ends up with solutions that make no use of ^ at all. Which is not that surprising, since:

- After all, pow is already feasible with a combination of exp and some multiplication: 2^x is also exp(x*0.693147). I don’t say exp(x*log(2)), because in that case the regressor would just have used a constant (see the sketch after this list).
- The result it gets without ^ is really not that bad. The maximum error is 536. Sure, 536 is a lot for small y values, but that doesn’t matter for a loss function that only looks at the raw difference, and 536 on the biggest value (which “leads” the loss function) is only 0.3%. For comparison, an implementation of 20!/(n!×(20-n)!) using Stirling’s formula has an error of 2323. And the reason you wanted pow was to allow the regressor to use Stirling’s formula… well, it doesn’t want to, because it found something better than Stirling’s formula.
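A quick numeric sanity check of both points (a sketch; 0.693147 is just ln(2) truncated, and the Stirling variant is the usual sqrt(2*pi*n)*(n/e)^n, so the exact error figure may differ a bit from one implementation to another):

import math

# 1) pow is already reachable through exp and a multiplication: 2**x == exp(x*ln(2))
for x in [0.5, 3.0, 7.2]:
    assert abs(2**x - math.exp(x * 0.693147)) < 1e-3

# 2) Error of a Stirling-based 20!/(n!*(20-n)!) against the exact row
def stirling(n):
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def comb_stirling(n, k):
    if k == 0 or k == n:
        return 1.0
    return stirling(n) / (stirling(k) * stirling(n - k))

errors = [abs(comb_stirling(20, k) - math.comb(20, k)) for k in range(21)]
print(max(errors))  # on the order of the 2323 mentioned above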
Changing loss function
That “-123 instead of 1 is only an error of 124, which is not so bad” may be a good argument for the loss function, but obviously not for a human. So, to take into account that we expect the error to be small in proportion to the expected value, we can change the loss function. For example:
model = PySRRegressor(
    maxsize=15,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
    ],
    elementwise_loss="loss(prediction, target) = (prediction/target - 1)^2",
)
This setting leads to

[1, 18, 185, 1175, 5100, 16229, 39716, 77328, 122614, 160762, 175791, 160762, 122614, 77328, 39716, 16229, 5100, 1175, 185, 18, 1]

on my computer, compared to the target

[1, 20, 190, 1140, 4845, 15504, 38760, 77520, 125970, 167960, 184756, 167960, 125970, 77520, 38760, 15504, 4845, 1140, 190, 20, 1]

So a 10% error for the two “18 instead of 20”, and around 5% or less for all the rest.
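Those percentages are easy to check by hand from the two rows above (plain Python, nothing PySR-specific):

predicted = [1, 18, 185, 1175, 5100, 16229, 39716, 77328, 122614, 160762,
             175791, 160762, 122614, 77328, 39716, 16229, 5100, 1175, 185, 18, 1]
target = [1, 20, 190, 1140, 4845, 15504, 38760, 77520, 125970, 167960,
          184756, 167960, 125970, 77520, 38760, 15504, 4845, 1140, 190, 20, 1]

# Relative error per position, which is what the (prediction/target - 1)^2 loss looks at
relative_errors = [abs(p / t - 1) for p, t in zip(predicted, target)]
print(max(relative_errors))                    # 0.1: the two "18 instead of 20"
print([round(e, 3) for e in relative_errors])  # everything else around 5% or below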
Reducing complexity
It is tempting to think “who can do the most can do the least” and leave maxsize=15. But in reality, if maxsize is too big, it prevents exploring a lot of smaller expressions, so it really helps to reduce it. Besides, you can then increase the number of iterations niterations, since it is also faster.
Adding selfpow
We can add a custom unary operator x↦xˣ. It shouldn’t work (or at least not as intended), since using it would amount to expecting the regressor to rediscover Stirling’s formula, and it won’t, since it can find a better result than Stirling’s formula.
But for the sake of exploring how it can be done, let’s do it:
model = PySRRegressor(
    maxsize=10,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
        "selfpow(x::T) where {T} = x > 0 ? (x^x) : T(NaN)",
    ],
    extra_sympy_mappings={'selfpow': lambda x: x**x if x > 0 else np.nan},
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
)
Don’t be impressed by my handling of the Julia code: I don’t know Julia. This was almost handed to me in an error message, when I first tried selfpow(x) = x^x. It raised an error because that function is not defined on the whole of ℝ (apparently it just tries some random values and catches any error raised to detect that). The error message suggests returning NaN for invalid values, and even gives an example of how to do so for log, such as "log(x::T) where {T} = x > 0 ? log(x) : T(NaN)". So, I just cargo-culted the thing for my x^x.
As expected, with this setting selfpow is not used (except temporarily in intermediate results).
Adding factorial (aka cheating :D)
Likewise, we can add a factorial operator:
model = PySRRegressor(
    maxsize=10,
    niterations=50000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
        "fac(x) = (x>0) ? gamma(x+1) : Float32(1.0)",
    ],
    extra_sympy_mappings={'fac': lambda x: 1},
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
)
Note that the extra_sympy_mappings for fac is wrong. But it doesn’t matter (for me): I just wanted the formula, not a SymPy function. Contrary to what the docs say, I cannot simply skip the SymPy mapping when I don’t care, so I put anything there to make it stop complaining. Not that it would have been hard to code a SymPy equivalent; it would probably have taken me less time than I just took to explain why I didn’t 😀
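For reference, a proper SymPy mapping would have been a one-liner anyway (a sketch; it ignores the x<=0 branch of the Julia definition):

import sympy

# Maps the Julia-side 'fac' operator to gamma(x+1) on the SymPy side
extra_sympy_mappings = {'fac': lambda x: sympy.gamma(x + 1)}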
More importantly, note that I use gamma, not factorial. With factorial I could not get it to use the operator; with gamma it does. gamma(n+1) is the same as factorial(n) when n is an integer.
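That identity is easy to check with Python’s own math module:

import math

# gamma(n+1) coincides with n! for non-negative integers n
for n in range(10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))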
I did try with factorial, which forces me to round, cast to int, then to big, and then cast the result back to Float32, plus handle the case n<0.
But it doesn’t use it. The probable cause (I am speculating here) is that this regression is obviously meant to work on floats, and probably takes derivatives at some point, at least to optimize the constants. But factorial, even once adapted to work on any (rounded) float, is a “stair” function, so its derivative is 0 almost everywhere (in the mathematical sense of “almost”) and not computable elsewhere.
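To illustrate that last point (a Python sketch of the same idea; the operator I actually tried was the Julia equivalent):

import math

# A float "factorial" built by rounding to the nearest integer is a step
# function: nudging x does not change the output, so the derivative is 0
# almost everywhere.
def fac_rounded(x):
    n = round(x)
    return float(math.factorial(n)) if n >= 0 else 1.0

print(fac_rounded(4.8), fac_rounded(5.0), fac_rounded(5.3))  # 120.0 120.0 120.0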
So, I use gamma, which is the extension of factorial to the reals (to the complex numbers, even, but I am only interested in the reals).
And with this setting (with smaller maxsize and bigger niterations, as you can see) I do get the expected result.
What is funny is that I do get a 20!/(20-n)!/n! (once translated). But that is not the best result. The best is (20! / (n!*753640) / (20-n)!)*753640, and in between there are some other strange ways to say 20!/n!/(20-n)!, like sqr(sqrt(...)) and even a (20!/n!/(20-n)!)*1.
All with a loss of 0 up to numerical error (some 10⁻⁵, which, since, as you can see, I am using your original loss, is measured against the raw values, so 10⁻⁵ is very small), except for the last one, which has a loss of exactly 0.
So, obviously, it finds the expected result, and then plays with the numerical conditioning (even though I am sure there is no explicit code in the regressor about that; I am just committing the everyday abuse of ascribing intentions to an AI) to find formulas that reduce the numerical error.
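For completeness, here is a quick check (plain Python; math.gamma standing in for the Julia gamma, and 753640 is the constant it picked on my run) that these expressions really do reproduce the target row:

import math

target = [math.comb(20, k) for k in range(21)]

# 20!/(n!*(20-n)!), written with gamma the way the regressor expresses it
via_gamma = [math.gamma(21) / (math.gamma(n + 1) * math.gamma(20 - n + 1)) for n in range(21)]

# The "best" variant just multiplies and divides by the same constant,
# which only shuffles the floating-point rounding.
c = 753640
via_constant = [math.gamma(21) / (math.gamma(n + 1) * c) / math.gamma(20 - n + 1) * c for n in range(21)]

print(max(abs(a - b) for a, b in zip(via_gamma, target)))     # ~0 up to float error
print(max(abs(a - b) for a, b in zip(via_constant, target)))  # ~0 as well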