I have been trying out symbolic regression and was wondering how to use it to approximate a row of Pascal’s triangle, for example. I make the data with:
import math
import numpy as np

def print_pascals_triangle_row(n):
    row = []
    for k in range(n + 1):
        coefficient = math.comb(n, k)
        row.append(coefficient)
    return row

n = 20
pascals_triangle_row = print_pascals_triangle_row(n)
This gives:
[1, 20, 190, 1140, 4845, 15504, 38760, 77520, 125970, 167960, 184756,
 167960, 125970, 77520, 38760, 15504, 4845, 1140, 190, 20, 1]
y = pascals_triangle_row
X = np.array(range(len(y))).reshape(-1, 1)
Set up the model:
from pysr import PySRRegressor
model = PySRRegressor(
    maxsize=15,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
        # ^ Custom operator (julia syntax)
    ],
    # ^ Define operator for SymPy as well
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    # ^ Custom loss function (julia syntax)
)
And finally fit the model:
model.fit(X, y)
This doesn’t give a model that is anywhere near accurate. My guess is that the fundamental reason is that it can’t do x**x, which is needed to approximate the factorial.
Is there any way to allow symbolic regression to use the binary operator ** or otherwise fit a row of Pascal’s triangle?
Now that it has been identified that the question was “how to add pow” (or “why can’t we use pow”, which was crystal clear from the beginning; it’s just that, since simply adding pow works on my computer, I wasn’t sure I had understood the question), and also to be able to publish some code, I am putting my answer here.
Adding pow
For a start, adding pow is as simple as adding "^" to binary_operators:
import numpy as np
import math
from pysr import PySRRegressor

def print_pascals_triangle_row(n):
    row = []
    for k in range(n + 1):
        coefficient = math.comb(n, k)
        row.append(coefficient)
    return row

n = 20
pascals_triangle_row = print_pascals_triangle_row(n)
y = pascals_triangle_row
X = np.array(range(len(y))).reshape(-1, 1)

model = PySRRegressor(
    maxsize=15,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*", "^"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
    ],
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
)
res = model.fit(X, y)
(I used "pow"
in my comment. But it is the exact same. I suspect that reason why it didn’t work at first for you, is because you added it to unary_operator. At least, that is what commented line suggest)
It is worth noting that, doing so, it does use ^ from time to time in intermediate results, but it ends up with solutions that make no use of ^ at all. Which is not that surprising, since:

- After all, pow is already feasible with a combination of exp and some multiplication: 2^x is also exp(x*0.693147). I don’t say exp(x*log(2)), because in that case the regressor would just have used a constant (see the sketch after this list).
- The result it gets without ^ is really not that bad. The maximum error is 536. Sure, 536 is a lot for small y values, but that doesn’t matter for a loss function that only looks at the raw difference, and 536 on the biggest value (which “leads” the loss function) is only 0.3%. For comparison, an implementation of 20!/(n!×(20-n)!) using Stirling’s formula has an error of 2323. And the reason you wanted pow was to allow the regressor to use Stirling’s formula… well, it doesn’t want to, because it found something better than Stirling’s formula.
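A quick numeric sanity check of both points (a sketch; 0.693147 is just ln(2) truncated, and the Stirling variant is the usual sqrt(2*pi*n)*(n/e)^n, so the exact error figure may differ a bit from one implementation to another):

import math

# 1) pow is already reachable through exp and a multiplication: 2**x == exp(x*ln(2))
for x in [0.5, 3.0, 7.2]:
    assert abs(2**x - math.exp(x * 0.693147)) < 1e-3

# 2) Error of a Stirling-based 20!/(n!*(20-n)!) against the exact row
def stirling(n):
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def comb_stirling(n, k):
    if k == 0 or k == n:
        return 1.0
    return stirling(n) / (stirling(k) * stirling(n - k))

errors = [abs(comb_stirling(20, k) - math.comb(20, k)) for k in range(21)]
print(max(errors))  # on the order of the 2323 mentioned above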
Changing loss function
That “-123 instead of 1 is only an error of 124, which is not so bad” may be a good argument for the loss function, but obviously not for a human. So, to take into account that we expect the error to be small in proportion to the expected value, we can change the loss function. For example:
model = PySRRegressor(
    maxsize=15,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
    ],
    elementwise_loss="loss(prediction, target) = (prediction/target - 1)^2",
)
This setting leads to

[1, 18, 185, 1175, 5100, 16229, 39716, 77328, 122614, 160762, 175791, 160762, 122614, 77328, 39716, 16229, 5100, 1175, 185, 18, 1]

on my computer, compared to the target

[1, 20, 190, 1140, 4845, 15504, 38760, 77520, 125970, 167960, 184756, 167960, 125970, 77520, 38760, 15504, 4845, 1140, 190, 20, 1]

So a 10% error for the two “18 instead of 20”, and around 5% or less for all the rest.
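Those percentages are easy to check by hand from the two rows above (plain Python, nothing PySR-specific):

predicted = [1, 18, 185, 1175, 5100, 16229, 39716, 77328, 122614, 160762,
             175791, 160762, 122614, 77328, 39716, 16229, 5100, 1175, 185, 18, 1]
target = [1, 20, 190, 1140, 4845, 15504, 38760, 77520, 125970, 167960,
          184756, 167960, 125970, 77520, 38760, 15504, 4845, 1140, 190, 20, 1]

# Relative error per position, which is what the (prediction/target - 1)^2 loss looks at
relative_errors = [abs(p / t - 1) for p, t in zip(predicted, target)]
print(max(relative_errors))                    # 0.1: the two "18 instead of 20"
print([round(e, 3) for e in relative_errors])  # everything else around 5% or below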
Reducing complexity
It is tempting to think “who can do the most can do the least” and leave maxsize=15. But in reality, if maxsize is too big, it prevents exploring a lot of smaller expressions, so it really helps to reduce it. Besides, you can then increase the number of iterations niterations, since it is also faster.
Adding selfpow
We can add a custom unary operator x↦xˣ. It shouldn’t work (or at least not as intended), since using it would amount to expecting the regressor to rediscover Stirling’s formula, and it won’t, since it can find a better result than Stirling’s formula.
But for the sake of exploring how it can be done, let’s do it:
model = PySRRegressor(
    maxsize=10,
    niterations=5000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
        "selfpow(x::T) where {T} = x > 0 ? (x^x) : T(NaN)",
    ],
    extra_sympy_mappings={'selfpow': lambda x: x**x if x > 0 else np.nan},
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
)
Don’t be impressed by my handling of the Julia code: I don’t know Julia. This was almost handed to me in an error message, when I first tried selfpow(x) = x^x. It raised an error because that function is not defined on the whole of ℝ (apparently it just tries some random values and catches any error raised to detect that). The error message suggests returning NaN for invalid values, and even gives an example of how to do so for log, such as "log(x::T) where {T} = x > 0 ? log(x) : T(NaN)". So, I just cargo-culted the thing for my x^x.
As expected, with this setting selfpow is not used (except temporarily in intermediate results).
Adding factorial (aka cheating :D)
Likewise, we can add a factorial operator:
model = PySRRegressor(
    maxsize=10,
    niterations=50000,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        "log",
        "exp",
        "inv",
        "square",
        "sqrt",
        "sign",
        "fac(x) = (x>0) ? gamma(x+1) : Float32(1.0)",
    ],
    extra_sympy_mappings={'fac': lambda x: 1},
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
)
Note that the extra_sympy_mappings for fac is wrong. But it doesn’t matter (for me): I just wanted the formula, not a SymPy function. Contrary to what the docs say, I cannot simply skip the SymPy mapping when I don’t care, so I put anything there to make it stop complaining. Not that it would have been hard to code a SymPy equivalent; it would probably have taken me less time than I just took to explain why I didn’t 😀
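For reference, a proper SymPy mapping would have been a one-liner anyway (a sketch; it ignores the x<=0 branch of the Julia definition):

import sympy

# Maps the Julia-side 'fac' operator to gamma(x+1) on the SymPy side
extra_sympy_mappings = {'fac': lambda x: sympy.gamma(x + 1)}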
More importantly, note that I use gamma, not factorial. With factorial I could not get it to use the operator; with gamma it does. gamma(n+1) is the same as factorial(n) when n is an integer.
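That identity is easy to check with Python’s own math module:

import math

# gamma(n+1) coincides with n! for non-negative integers n
for n in range(10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))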
I did try with factorial, which forces me to round, cast to int, then to big, and then cast the result back to Float32, plus handle the case n<0.
But it doesn’t use it. The probable cause (I am speculating here) is that this regression is obviously meant to work on floats, and probably takes derivatives at some point, at least to optimize the constants. But factorial, even once adapted to work on any (rounded) float, is a “stair” function, so its derivative is 0 almost everywhere (in the mathematical sense of “almost”) and not computable elsewhere.
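To illustrate that last point (a Python sketch of the same idea; the operator I actually tried was the Julia equivalent):

import math

# A float "factorial" built by rounding to the nearest integer is a step
# function: nudging x does not change the output, so the derivative is 0
# almost everywhere.
def fac_rounded(x):
    n = round(x)
    return float(math.factorial(n)) if n >= 0 else 1.0

print(fac_rounded(4.8), fac_rounded(5.0), fac_rounded(5.3))  # 120.0 120.0 120.0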
So, I use gamma, which is the extension of factorial to the reals (to the complex numbers, even, but I am only interested in the reals).
And with this setting (with smaller maxsize and bigger niterations, as you can see) I do get the expected result.
What is funny is that I do get a 20!/(20-n)!/n! (once translated). But that is not the best result. The best is (20! / (n!*753640) / (20-n)!)*753640, and in between there are some other strange ways to say 20!/n!/(20-n)!, like sqr(sqrt(...)) and even a (20!/n!/(20-n)!)*1.
All with a loss of 0 up to numerical error (some 10⁻⁵, which, since, as you can see, I am using your original loss, is measured against the raw values, so 10⁻⁵ is very small), except for the last one, which has a loss of exactly 0.
So, obviously, it finds the expected result, and then plays with the numerical conditioning (even though I am sure there is no explicit code in the regressor about that; I am just committing the everyday abuse of ascribing intentions to an AI) to find formulas that reduce the numerical error.
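For completeness, here is a quick check (plain Python; math.gamma standing in for the Julia gamma, and 753640 is the constant it picked on my run) that these expressions really do reproduce the target row:

import math

target = [math.comb(20, k) for k in range(21)]

# 20!/(n!*(20-n)!), written with gamma the way the regressor expresses it
via_gamma = [math.gamma(21) / (math.gamma(n + 1) * math.gamma(20 - n + 1)) for n in range(21)]

# The "best" variant just multiplies and divides by the same constant,
# which only shuffles the floating-point rounding.
c = 753640
via_constant = [math.gamma(21) / (math.gamma(n + 1) * c) / math.gamma(20 - n + 1) * c for n in range(21)]

print(max(abs(a - b) for a, b in zip(via_gamma, target)))     # ~0 up to float error
print(max(abs(a - b) for a, b in zip(via_constant, target)))  # ~0 as well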