I want to fix the betas in a multiple linear regression based on some data I have, but doing so leads to an R-squared value below 0% or above 100% when I use the projection approach described in the Hastie, Tibshirani et al. book.
What is the best way to compute R-squared after fixing the beta values for a multiple linear regression with no intercept?
Load the Data
import numpy as np
import pandas as pd
import statsmodels.api as sm
data = sm.datasets.get_rdataset('iris').data
Define x and y variables:
x = data.iloc[:, 1:4].values
y = data.iloc[:, 0].values
Solve for the betas as in the Hastie, Tibshirani et al. book:
betas = np.linalg.solve(x.T @ x, x.T @ y)
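As a sanity check on the normal-equations solve above, here is a minimal sketch comparing it with `np.linalg.lstsq`. It uses synthetic data (an assumption, so it runs without downloading iris); the names `betas_normal` and `betas_lstsq` are only illustrative.

```python
import numpy as np

# Synthetic stand-in data (assumption: not the iris data from the post)
rng = np.random.default_rng(0)
x = rng.normal(size=(150, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=150)

# Normal equations: solve (X'X) b = X'y
betas_normal = np.linalg.solve(x.T @ x, x.T @ y)

# Least-squares solver, which avoids forming X'X explicitly
betas_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)

same = np.allclose(betas_normal, betas_lstsq)
print(same)
```

Both routes should give the same coefficients when x has full column rank.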
Alternatively, fix the betas based on some understanding of the environment:
alt_betas = [3.7, -10, 45.78]
Now compute R-squared in 3 ways:
- Using statsmodels with no intercept
- Using the projection method for R-squared
- Using the projection method with the fixed betas for R-squared
Statsmodels
sm.OLS(y, x).fit().rsquared * 100
99.61972754365206
Projection
(y @ x @ betas / (y @ y) ) * 100
99.61972754365208
Projection with fixed betas
(y @ x @ alt_betas / (y @ y) ) * 100
511.1237918393523
I understand the result should differ because I’m using different betas, but a value above 100% violates the rule that R-squared should lie between 0 and 1.
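One way to see where the projection shortcut breaks: the identity y'Xb / y'y = 1 - RSS / y'y relies on the normal equations X'y = X'Xb, which hold only for the OLS betas. For any other coefficient vector, a residual-based uncentered R-squared can be computed directly. The sketch below assumes synthetic data and a hypothetical helper `uncentered_r2`; it is not a statsmodels function.

```python
import numpy as np

def uncentered_r2(y, x, betas):
    """Uncentered R^2 = 1 - RSS / (y'y) for arbitrary coefficients, no intercept.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(x, dtype=float) @ np.asarray(betas, dtype=float)
    return 1.0 - (resid @ resid) / (y @ y)

# Synthetic stand-in data (assumption: not the iris data from the post)
rng = np.random.default_rng(0)
x = rng.normal(size=(150, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=150)

ols_betas = np.linalg.solve(x.T @ x, x.T @ y)
alt_betas = np.array([3.7, -10, 45.78])  # fixed betas from the post

r2_ols = uncentered_r2(y, x, ols_betas)        # residual-based, OLS betas
r2_proj = (y @ x @ ols_betas) / (y @ y)        # projection shortcut, OLS betas
r2_alt = uncentered_r2(y, x, alt_betas)        # residual-based, fixed betas

print(r2_ols, r2_proj, r2_alt)
```

With the OLS betas the two formulas agree. With fixed betas the residual-based value can never exceed 1, but it can be negative when the fixed betas fit worse than predicting zero everywhere.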
If I have alternate betas, is there a way to hold them fixed and still use statsmodels OLS to compute the R-squared?
Think of it this way: my use case requires the alternate betas, because I believe they are the true representation of the environment.
Thanks in advance!