Looking at least-squares regression methods in various Python libraries, and based on their documentation and the two threads below (amongst others), is it right to say the following?
- `scipy.linalg.lstsq()` and sklearn's `LinearRegression` are effectively the same, except that `LinearRegression` preprocesses the data. To prevent it doing so, there is a parameter `fit_intercept` that can be set to `False` (see the sketch after this list).
- They may or may not use different routines by default, depending on the version, but if speed isn't an issue (small dataset & powerful computer) this doesn't matter.
- If coefficients should be positive (which is my case), `LinearRegression` has the parameter `positive=True`, and scipy offers a dedicated NNLS routine (`scipy.optimize.nnls`).
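
To show what I mean by "effectively the same", here is a minimal sketch on entirely synthetic data (the array shapes, seed, and coefficients are just illustrative assumptions): fit the same model with `scipy.linalg.lstsq` and with `LinearRegression(fit_intercept=False)` and compare the coefficients.

```python
import numpy as np
from scipy.linalg import lstsq
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
A = rng.random((50, 3))                 # 50 samples, 3 features, all positive (illustrative)
coef_true = np.array([1.5, 0.2, 3.0])
b = A @ coef_true + 0.01 * rng.standard_normal(50)

# Plain least squares via scipy: solves min ||A x - b||_2
coef_scipy, residues, rank, sv = lstsq(A, b)

# sklearn with the intercept switched off, so it fits the same model A @ coef ≈ b
reg = LinearRegression(fit_intercept=False).fit(A, b)

print(coef_scipy)
print(reg.coef_)                        # expected to agree to numerical precision
```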
Remaining question: is there any significant difference between the following, e.g. in penalty function or pre-processing? (My dataset is smallish compared to many, all positive values, and the coefficients ought to be positive, though first tests indicate that this occurs even when it is not specified in the method.)
- `scipy.optimize.nnls`
- `sklearn.linear_model.LinearRegression(positive=True, fit_intercept=False)`
- `np.linalg.lstsq`
- `scipy.linalg.lstsq`
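
To make the comparison concrete, here is a rough side-by-side of the four calls on one small synthetic dataset (all positive values, non-negative true coefficients; the data and seed are only illustrative assumptions, not my actual data):

```python
import numpy as np
from scipy.linalg import lstsq as scipy_lstsq
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
A = rng.random((40, 4))
coef_true = np.array([0.5, 1.0, 0.0, 2.0])          # non-negative by construction
b = A @ coef_true + 0.01 * rng.standard_normal(40)

coef_nnls, rnorm = nnls(A, b)                        # constrained: coefficients >= 0
coef_sk = LinearRegression(positive=True, fit_intercept=False).fit(A, b).coef_
coef_np = np.linalg.lstsq(A, b, rcond=None)[0]       # unconstrained
coef_sp = scipy_lstsq(A, b)[0]                       # unconstrained

for name, c in [("scipy.optimize.nnls", coef_nnls),
                ("LinearRegression(positive=True)", coef_sk),
                ("np.linalg.lstsq", coef_np),
                ("scipy.linalg.lstsq", coef_sp)]:
    print(f"{name:35s} {np.round(c, 4)}")
```

On data like this, where the unconstrained solution already happens to be non-negative, I would expect all four to give essentially the same coefficients, which would match what my first tests indicate.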
The two threads referred to above:

- What is the difference between numpy.linalg.lstsq and scipy.linalg.lstsq?
- Why using the scipy.nnls and the sklearn.linear_models.LinearRegression produces different results? Super Learner question
I have tried `np.linalg.lstsq`, sklearn's `LinearRegression(positive=True)`, and `scipy.optimize.nnls`. Comparing pre-processing (standardizing with sklearn's `StandardScaler`) against no pre-processing, the raw unprocessed data gives more accurate results.
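
Roughly the kind of comparison I mean, sketched on synthetic data (the hold-out split and error metric here are assumptions for illustration, not my actual setup):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
A = rng.random((60, 3))
b = A @ np.array([1.0, 0.5, 2.0]) + 0.05 * rng.standard_normal(60)

A_tr, A_te, b_tr, b_te = train_test_split(A, b, random_state=0)

# Fit on the raw, unscaled data
raw = LinearRegression(positive=True, fit_intercept=False).fit(A_tr, b_tr)
err_raw = mean_squared_error(b_te, raw.predict(A_te))

# Fit on standardized data (note: centring removes the positive offset in the
# features, so with fit_intercept=False the model can no longer match the mean
# of b, which may be part of why the scaled fit looks worse)
scaler = StandardScaler().fit(A_tr)
std = LinearRegression(positive=True, fit_intercept=False).fit(scaler.transform(A_tr), b_tr)
err_std = mean_squared_error(b_te, std.predict(scaler.transform(A_te)))

print("MSE raw:", err_raw, "MSE scaled:", err_std)
```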