An example in the documentation for scikit-learn's Quantile Regression sets the parameter alpha to zero. The documentation for QuantileRegressor shows the default value as 1.0 and states that it is a regularization constant that multiplies an L1 penalty term.
I do not have an intuitive understanding of what the Lasso is, or what L1 regularization means exactly.
Is there an intuitive explanation of how the parameter alpha relates to these things?
There is a Wikipedia article related to Quantile Regression which is quite detailed. Scanning through it, it looks like alpha is lambda in the section Choice of Regularization Parameter. It may also be referred to as t elsewhere.
My intuition here could be wrong.
My conclusion thus far is that alpha probably only has an effect in regression problems with more than one feature, and that it might be used to select a subset of features, that is, the ones with the most predictive power?
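To check that guess, here is a quick experiment I would expect to show the effect (the data is synthetic and the variable names are my own, so treat it as a sketch): fit QuantileRegressor at a few alpha values and watch the coefficients.

import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # 5 features; only the first two matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

for alpha in [0.0, 0.1, 1.0]:
    model = QuantileRegressor(quantile=0.5, alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 2))

If the L1 penalty works like the Lasso, the coefficients on the three uninformative features should be driven to exactly zero as alpha grows.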
I would recommend that you read up on and understand the simple L1/L2 loss – then it should be clear; you simply add another term to your loss function.
Say you have a loss function like:
loss = F()
where F() is any loss function. Now, if you have some training data, you can overfit it rather easily if you allow the parameters of your model to be as big (or small) as they want in order to fit a perfect “curve”.
Now, if you restrict your model's parameters by saying “your values must be between -5 and 5”, then you limit the potential for overfitting.
A way to do that is simply to add a term to your loss function:
loss_new = F() + alpha * G(parameters)
where G is some function that calculates the size of the parameters – which could, for instance, be the two-norm (the length of your parameter vector):
loss_new = F() + alpha * np.linalg.norm(parameters)  # two-norm of the parameter vector; assumes numpy is imported as np
If you still want to minimize this loss function, you have to find a compromise between a small loss and a small norm. If your alpha is very large, then G() must be very small, so you can control the “complexity” of the model by choosing a smaller or larger alpha.