Let \( \mathbf{X} \in \mathbb{R}^{n \times d} \) be the matrix of regressors, and let \( \mathbf{Y} \in \mathbb{R}^n \) be the response vector. Consider the linear regression model with parameter vector \( \mathbf{b} \in \mathbb{R}^d \). Find the gradient and Hessian of the least squares (LS) functional with respect to the parameter vector \( \mathbf{b} \).
The linear regression model is
\[
\mathbf{Y} = \mathbf{X}\mathbf{b} + \boldsymbol{\epsilon},
\]
where \( \mathbf{Y} \in \mathbb{R}^n \) is the response vector, \( \mathbf{X} \in \mathbb{R}^{n \times d} \) is the matrix of regressors, \( \mathbf{b} \in \mathbb{R}^d \) is the parameter vector, and \( \boldsymbol{\epsilon} \in \mathbb{R}^n \) is the vector of errors.
I'm stuck, but here is my attempt so far.
The gradient with respect to \( \mathbf{b} \) is
\[
\nabla_{\mathbf{b}} J(\mathbf{b}) = \nabla_{\mathbf{b}} \left( \frac{1}{2} (\mathbf{Y} - \mathbf{X}\mathbf{b})^\top (\mathbf{Y} - \mathbf{X}\mathbf{b}) \right).
\]
Expanding \( J(\mathbf{b}) \),
\[
J(\mathbf{b}) = \frac{1}{2} \left( \mathbf{Y}^\top \mathbf{Y} - 2\,\mathbf{Y}^\top \mathbf{X}\mathbf{b} + \mathbf{b}^\top \mathbf{X}^\top \mathbf{X}\mathbf{b} \right).
\]
The gradient is then
\[
\nabla_{\mathbf{b}} J(\mathbf{b}) = -\mathbf{X}^\top \mathbf{Y} + \mathbf{X}^\top \mathbf{X}\mathbf{b}.
\]
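For the Hessian, I think differentiating the gradient once more should work, since the gradient is linear in \( \mathbf{b} \):
\[
\nabla^2_{\mathbf{b}} J(\mathbf{b}) = \mathbf{X}^\top \mathbf{X},
\]
a constant, symmetric, positive semidefinite \( d \times d \) matrix (positive definite when \( \mathbf{X} \) has full column rank), which would confirm that \( J \) is convex.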
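As a numerical sanity check, here is a minimal sketch (assuming NumPy; the random data, dimensions, and finite-difference comparison are my own illustration, not part of the question) that compares the closed-form gradient and Hessian above against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.standard_normal((n, d))   # regressor matrix
Y = rng.standard_normal(n)        # response vector
b = rng.standard_normal(d)        # parameter vector

def J(b):
    """Least squares functional J(b) = 1/2 * ||Y - Xb||^2."""
    r = Y - X @ b
    return 0.5 * (r @ r)

def grad(b):
    """Closed-form gradient: -X^T Y + X^T X b."""
    return -X.T @ Y + X.T @ X @ b

hess = X.T @ X  # closed-form Hessian, constant in b

# Central finite differences along each coordinate direction.
eps = 1e-6
I = np.eye(d)
grad_fd = np.array([(J(b + eps * e) - J(b - eps * e)) / (2 * eps) for e in I])
hess_fd = np.array([(grad(b + eps * e) - grad(b - eps * e)) / (2 * eps) for e in I])

print(np.allclose(grad(b), grad_fd, atol=1e-6))  # expect True
print(np.allclose(hess, hess_fd, atol=1e-6))     # expect True
```

Because \( J \) is quadratic, the central differences should match the closed forms up to floating-point rounding.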