I am trying to compute the Beta of my financial data in Excel. My excess stock returns exist in one column and market returns in another. I just want to find the slope of my regression line. The two methods are:
=LINEST(I2:I254,K2:K254)
=(MMULT(MINVERSE(MMULT(TRANSPOSE(K2:K254), K2:K254)),MMULT(TRANSPOSE(K2:K254),I2:I254)))
The second option is the OLS (Ordinary Least Squares) Matrix Form
However I am getting a slight difference between the results of the two. How do I write my matrix multiplication to match the Linest
output?
The clunky method of manually inverting the matrix like that of OP’s method #2 is a badly conditioned problem requiring a much better approach that is numerically stable. The algebra says you can do it that way but in practice a numerical algorithm that aims to get the right answer in all cases has to be a lot smarter.
MINVERSE is subject to numerical instability and rounding errors that make it very difficult if not impossible to get identical answers to a well implemented linear fit least squares method. In some old versions of Excel it used to be the case that LINEST was dodgy and the line fitting in charts was excellent.
Regrettably I notice that on my copy of Excel 2021 the situation is now reversed with LINEST seemingly doing things right and the charts version badly broken. Amusingly it is only the charts linear fit that is so horribly broken – if you try to fit a second order polynomial to these data it gets the answer right!
It was considerably more accurate (although not perfect back in v2010).
This can be most easily demonstrated by the following slightly pathologocial dataset with a large additive constant on the x axis to mess up the condition number of the naive fitting matrix:
x f(x)
100000000 2
100000001 4
100000002 6
100000003 8
100000004 10
100000005 12
100000006 14
100000007 16
100000008 18
100000009 20
LINEST gets the correct answer with no problems.
This is what X-Y charts in XL2021 gets with the same data for a linear fit. Note that the “fitted” line is a mile off the actual data and with the gradient well out! Axis choice is a bit bizarre too…
However, the line fit in the charts function fails dismally (to my surprise it worked OK in some previous versions). It is pot luck whether charts works for any polynomial fit higher than order 3 although it will allow you to choose orders up to 6.
The algebra may be right but the numerical stability certainly isn’t.
For more on the shortcomings of Excel statistics see for example this page on the pros and cons of using Excel for statistics by a rival company or in more detail but behind an Elselvier paywall McCulloch (2008) or for recent improvements Melard (2014). I don’t entirely agree with his assessment there are still some notable bugs. I’d be interested to know at which version of Excel they swapped the linear fitting bugs around. It seems like they can never get both right at the same time 🙁
1
I think the matrix formula in your screenshot is based on zero-mean X
and Y
.
You need to “normalise” your own X and Y by deducting the mean.
Also, you don’t really need Transpose + MMULT since both your X and Y are one-dimension. just sum product in excel will do.
Here is the doc on LINEST
:
https://support.microsoft.com/en-us/office/linest-function-84d7d0d9-6e50-4101-977a-fa7abf772b6d
Method1: using COVARIANCE.P
and VAR.P
to calculate beta:
=COVARIANCE.P(A2:A7,B2:B7)/VAR.P(A2:A7)
(formula in B10)
Method2: calculating covariance and variance by hand:
=SUM((A2:A7-AVERAGE(A2:A7))*(B2:B7-AVERAGE(B2:B7)))/SUM((A2:A7-AVERAGE(A2:A7))^2)
(formula in B11):
The LINEST()
function by default includes an intercept in the regression. If you want to exclude the intercept, you need to specify it in the LINEST()
function:
=LINEST(I2:I254, K2:K254, FALSE)
revanth kumar is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.