My goal is to perform regression over a set of data coming from the “real world” (sensors).
The data is in tabular format. There are 6 independent features with very different ranges, so scaling is necessary. The dependent variable also has a large variability (does it need scaling too?).
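To make the scaling question concrete, here is a minimal sketch of standardizing both the features and the target and then mapping predictions back to physical units. The data is random and purely illustrative, and `fit_standardizer` is a hypothetical helper name; it is not my actual sensor table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the sensor table: 6 features on very different scales.
scales = np.array([1e-3, 1.0, 10.0, 1e2, 1e3, 1e4])
X = rng.normal(size=(200, 6)) * scales
y = 500.0 * rng.normal(size=200) + 1000.0  # target with a large spread


def fit_standardizer(values):
    """Return per-column mean and standard deviation (fit on training data only)."""
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    # Guard against constant columns so the division below stays finite.
    return mean, np.where(std == 0, 1.0, std)


x_mean, x_std = fit_standardizer(X)
y_mean, y_std = fit_standardizer(y)

X_scaled = (X - x_mean) / x_std
y_scaled = (y - y_mean) / y_std

# A model trained on y_scaled predicts in scaled units; invert to get real units.
y_restored = y_scaled * y_std + y_mean
```

One point to keep in mind with this setup: if Y is scaled, the tabulated “a” and “b” have to be applied in physical units (or converted), since they describe the unscaled relation.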
An initial, conventional training run gives weak predictions when one of the variables (let’s say X1) is low in absolute value.
Experts tell me that, in this specific region where X1 is low, the value to be predicted (let’s call it Y) can be approximated by a linear regression. So if all other features are held constant, X1 and Y have a linear dependency of the form Y = a * X1 + b.
The problem is that the coefficients “a” and “b” depend on the other features: a = f(X2,X3,X4,X5)…
Note that I have a table of the coefficients “a” and “b” for several combinations of the 5 other features.
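As an illustration of how such a table could be used between its tabulated combinations, here is a rough sketch using inverse-distance weighting. Everything in it is an assumption for the example: the table values are made up, only 2 of the 5 other features are shown for brevity, and `lookup_coefficients` is a hypothetical helper. Other interpolation schemes would work just as well.

```python
import numpy as np

# Hypothetical coefficient table: one row per combination of the other
# features (2 columns here instead of 5), with the tabulated a and b.
table_features = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
table_a = np.array([1.0, 2.0, 3.0, 4.0])
table_b = np.array([0.5, 0.5, 1.5, 1.5])


def lookup_coefficients(query, features, a_values, b_values, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate of (a, b) at each query point."""
    query = np.atleast_2d(query)
    # Pairwise Euclidean distances, shape (n_query, n_table_rows).
    dist = np.linalg.norm(query[:, None, :] - features[None, :, :], axis=2)
    # Closer table rows get larger weights; eps avoids division by zero.
    weights = 1.0 / (dist ** power + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ a_values, weights @ b_values


a_mid, b_mid = lookup_coefficients([0.5, 0.5], table_features, table_a, table_b)
a_row, b_row = lookup_coefficients([1.0, 0.0], table_features, table_a, table_b)
```

The distances only make sense if the other features are on comparable scales, so the lookup would be done on scaled features.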
I would like to integrate this “physics-informed” linearization into the training process, but how can I do that? I had a look at Physics-Informed Neural Networks (PINNs), but they seem to be oriented toward PDEs only, not closed-form equations like the one I have.
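To show what I mean by integrating the equation into training, here is a minimal sketch of a loss that adds a penalty for deviating from Y = a * X1 + b at points in the low-X1 region. It is written in plain NumPy with a generic `predict` callable; `combined_loss`, `X_low`, `a_low`, `b_low` and `weight` are names I made up for the example, and X1 is assumed to sit in column 0. In a real training loop the same expression would be written in whatever autodiff framework is used.

```python
import numpy as np


def combined_loss(predict, X, y, X_low, a_low, b_low, weight=1.0):
    """Data MSE plus a weighted penalty for deviating from Y = a * X1 + b.

    X_low holds points in the low-|X1| region (X1 in column 0); a_low and
    b_low are the tabulated coefficients at those points. No measured Y is
    needed for X_low.
    """
    data_term = np.mean((predict(X) - y) ** 2)
    y_linear = a_low * X_low[:, 0] + b_low
    physics_term = np.mean((predict(X_low) - y_linear) ** 2)
    return data_term + weight * physics_term


# Toy data (hypothetical): X1 in column 0, one other feature in column 1.
X = np.array([[0.1, 1.0], [0.2, 1.0], [5.0, 1.0]])
y = 2.0 * X[:, 0] + 1.0
X_low = np.array([[0.05, 1.0], [-0.05, 1.0]])
a_low = np.array([2.0, 2.0])
b_low = np.array([1.0, 1.0])


def exact_model(X):
    """Hypothetical model that follows the linear law exactly."""
    return 2.0 * X[:, 0] + 1.0


def biased_model(X):
    """Hypothetical model with a constant offset error."""
    return 2.0 * X[:, 0] + 1.5


loss_exact = combined_loss(exact_model, X, y, X_low, a_low, b_low)
loss_biased = combined_loss(biased_model, X, y, X_low, a_low, b_low)
```

The `weight` argument controls how strongly the expert relation is enforced relative to the measured data.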
For me, a natural thing to do would be to generate fake data in this region using the equation. Would that be considered Physics-Informed Machine Learning? I don’t see the difference between adding fake data and adding a loss term that tries to fulfill the equation.
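For reference, the fake-data option would look roughly like the sketch below: for each row of the coefficient table, sample X1 at low absolute values and compute Y from the equation. The table values, the `x1_max` threshold and the `synthesize_low_x1` helper are all hypothetical, and only 2 of the other features are shown.

```python
import numpy as np

# Hypothetical coefficient table (2 other features shown instead of 5).
table_features = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
table_a = np.array([1.0, 2.0, 3.0, 4.0])
table_b = np.array([0.5, 0.5, 1.5, 1.5])


def synthesize_low_x1(features, a_values, b_values, x1_max, n_per_row, rng):
    """Generate (X, Y) pairs with |X1| <= x1_max from Y = a * X1 + b."""
    n_rows = features.shape[0]
    # One batch of low-|X1| samples per table row.
    x1 = rng.uniform(-x1_max, x1_max, size=(n_rows, n_per_row))
    y = a_values[:, None] * x1 + b_values[:, None]
    # Repeat each table row so it lines up with its X1 samples.
    others = np.repeat(features, n_per_row, axis=0)
    X_syn = np.column_stack([x1.ravel(), others])
    return X_syn, y.ravel()


rng = np.random.default_rng(0)
X_syn, y_syn = synthesize_low_x1(
    table_features, table_a, table_b, x1_max=0.1, n_per_row=5, rng=rng
)
# X_syn and y_syn would then be concatenated with the measured training data.
```

As far as I can tell, with a squared-error loss, training on these points is the same as adding a penalty term evaluated at those same points, which is why I struggle to see the difference.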
Many thanks for your answer,
Have a great day!