Looking at least-squares regression methods in various Python libraries, and going by their documentation and the two threads below (amongst others), is it right to say the following?
- scipy.linalg.lstsq() and sklearn LinearRegression are effectively the same, except that LinearRegression preprocesses the data. To prevent it doing so, there is a parameter fit_intercept that can be set to False (a quick check is sketched after this list)
- They may or may not use different routines by default, depending on the version, but if speed isn't an issue (small dataset and a powerful computer) this doesn't matter
- If the coefficients should be positive (which is my case), LinearRegression has the parameter positive=True, and scipy has a dedicated NNLS routine (scipy.optimize.nnls)
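As a quick check on the first point, here is a minimal sketch (with made-up toy data, not my real dataset) fitting scipy.linalg.lstsq and LinearRegression(fit_intercept=False) on the same matrix; with the intercept disabled, the two should agree to numerical precision:

```python
import numpy as np
from scipy import linalg
from sklearn.linear_model import LinearRegression

# made-up toy data: 20 samples, 3 features
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = X @ np.array([1.5, 0.2, 3.0]) + 0.01 * rng.random(20)

# plain least squares on X as given (no intercept column added)
coef_scipy, *_ = linalg.lstsq(X, y)

# sklearn with fit_intercept=False, so no centring / intercept handling
coef_sklearn = LinearRegression(fit_intercept=False).fit(X, y).coef_

print(coef_scipy)
print(coef_sklearn)  # should match coef_scipy up to numerical precision
```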
Remaining question: is there any significant difference between the following, e.g. in penalty function or pre-processing? (My dataset is smallish compared to many, all positive values, and the coefficients ought to be positive, though first tests indicate this happens even when it isn't specified in the method.) See the comparison sketch after the list.
- scipy.optimize.nnls
- sklearn.linear_model.LinearRegression(positive=True, fit_intercept=False)
- numpy.linalg.lstsq
- scipy.linalg.lstsq
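To compare the candidates directly, a sketch along these lines (the data below is made up to mimic a small, all-positive dataset with positive true coefficients) fits all of them on the same inputs; when the unconstrained least-squares solution is already non-negative, the constrained and unconstrained results should coincide:

```python
import numpy as np
from scipy import linalg, optimize
from sklearn.linear_model import LinearRegression

# made-up small dataset: all-positive values, positive "true" coefficients
rng = np.random.default_rng(1)
X = rng.random((30, 4))
y = X @ np.array([0.5, 1.0, 2.0, 0.1]) + 0.01 * rng.random(30)

coef_np, *_ = np.linalg.lstsq(X, y, rcond=None)    # unconstrained
coef_sp, *_ = linalg.lstsq(X, y)                    # unconstrained
coef_nnls, _ = optimize.nnls(X, y)                  # non-negativity constraint
coef_lr = LinearRegression(positive=True, fit_intercept=False).fit(X, y).coef_

for name, c in [("numpy.linalg.lstsq", coef_np),
                ("scipy.linalg.lstsq", coef_sp),
                ("scipy.optimize.nnls", coef_nnls),
                ("LinearRegression", coef_lr)]:
    print(f"{name:20s}", np.round(c, 4))
```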
What is the difference between numpy.linalg.lstsq and scipy.linalg.lstsq?
Why using the scipy.nnls and the sklearn.linear_models.LinearRegression produces different results? Super Learner question
I have tried np.linalg.lstsq, sklearn LinearRegression(positive=True), and scipy.optimize.nnls. Comparing pre-processing (standardizing with sklearn's StandardScaler) against none, the raw unprocessed data gives more accurate results.
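For that standardization comparison, something like the following sketch (again with made-up stand-in data) shows the two fits side by side; note that StandardScaler centres each column, so standardized features can be negative and the non-negativity constraint then applies to coefficients on a different scale:

```python
import numpy as np
from scipy import optimize
from sklearn.preprocessing import StandardScaler

# hypothetical stand-in for the real all-positive dataset
rng = np.random.default_rng(2)
X = rng.random((30, 4)) + 0.5
y = X @ np.array([0.5, 1.0, 2.0, 0.1])

# non-negative least squares on the raw data
coef_raw, _ = optimize.nnls(X, y)

# same fit after standardizing: columns become zero-mean, so values go
# negative and the fitted coefficients live on a different scale
X_std = StandardScaler().fit_transform(X)
coef_std, _ = optimize.nnls(X_std, y)

print("raw:         ", np.round(coef_raw, 4))
print("standardized:", np.round(coef_std, 4))
```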