I am trying to understand linear regression using the maximum likelihood approach. According to Bishop's *Pattern Recognition and Machine Learning*, a target variable y can be modeled by introducing an error term e that is normally distributed with mean 0 and some variance. The relation is then given as y = h(x, w) + e.
I don't understand what e actually means. Also, how does the training data (x1, y1), …, (xn, yn) fit into this equation?