To my understanding, standardisation is used for robustness. However, I’m having difficulty understanding how to apply standardisation to my data for LASSO and KNN regression, using the cv.glmnet and knnreg functions from the glmnet and caret libraries, and how to compare the regression outputs against each other.
I first tried using standardize = TRUE in the cv.glmnet function for the LASSO regression. As I understand it, this standardises the data for the regression and then ‘de-standardises’ the regression output.
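To illustrate what ‘de-standardising’ means here, this is a minimal base-R sketch. It uses lm on simulated data rather than glmnet, so it only shows how coefficients fitted on standardised predictors map back to the original scale, not the LASSO penalty itself. All variable names and the data are made up for illustration.

```r
set.seed(1)
n <- 100
# Simulated predictors on very different scales (hypothetical data)
x <- cbind(a = rnorm(n, 10, 5), b = rnorm(n, -3, 0.5))
y <- 2 + 1.5 * x[, "a"] - 4 * x[, "b"] + rnorm(n)

# Standardise the predictors, keeping the centring and scaling constants
mu <- colMeans(x)
s <- apply(x, 2, sd)
x.std <- scale(x, center = mu, scale = s)

# Fit on the standardised predictors
b.std <- coef(lm(y ~ x.std))

# Map the coefficients back to the original scale
beta <- b.std[-1] / s
intercept <- b.std[1] - sum(beta * mu)

# For plain least squares this matches a fit on the raw predictors
b.raw <- coef(lm(y ~ x))
print(all.equal(unname(c(intercept, beta)), unname(b.raw)))
```

The glmnet documentation states that coefficients are always returned on the original scale, so with standardize = TRUE no manual back-transformation of this kind should be needed for the LASSO fit.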
When I then used the knnreg function, I first scaled the numerical data with scale(...), then created numerical dummy variables, and then ran the KNN regression. However, the output is still presented in its scaled form, and as a result the two regressions’ betas and MSE cannot be compared.
Is there a way to either ‘de-scale’ the results from the KNN regression or ‘re-scale’ the results from LASSO, or am I looking at this the wrong way altogether?
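For reference, ‘de-scaling’ a prediction could look like the following base-R sketch. It assumes the response was standardised with a known mean and standard deviation; the data and the noisy "predictions" are simulated stand-ins, not output from knnreg.

```r
set.seed(1)
# Hypothetical response in its original units
y <- rnorm(50, mean = 500, sd = 200)
y.mean <- mean(y)
y.sd <- sd(y)
y.scaled <- (y - y.mean) / y.sd

# Stand-in for predictions made on the standardised scale
pred.scaled <- y.scaled + rnorm(50, sd = 0.1)

# Undo the standardisation: multiply by the sd, then add the mean
pred <- pred.scaled * y.sd + y.mean

mse.scaled <- mean((y.scaled - pred.scaled)^2)
mse.orig <- mean((y - pred)^2)

# The two MSEs differ exactly by a factor of sd(y)^2
print(all.equal(mse.orig, mse.scaled * y.sd^2))
```

Under this assumption, an MSE computed on the standardised response can be converted to original units by multiplying by the squared standard deviation used for scaling.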
TIA for any help.
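## Code for CV LASSO regression
library(glmnet)
library(ISLR)  # assumed source of the Hitters data; not loaded in the original code
set.seed(34064064)
Hitters <- na.omit(Hitters)
# Design matrix with dummy-coded factors; drop the intercept column
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary
cv <- cv.glmnet(x, y, lambda = exp(seq(-2, 4, length.out = 30)), nfolds = 10, alpha = 1, standardize = TRUE, type.measure = "mse")
best.lambda <- cv$lambda.min
fit <- glmnet(x, y, lambda = best.lambda, alpha = 1, standardize = TRUE)
y.pred <- predict(fit, newx = x)
training.mse <- mean((y - y.pred)^2)  # MSE on the training data, in original Salary units
print(training.mse)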
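## Code for KNN regression
library(caret)
# Randomly split data frame d into train and test sets; p is the test fraction
fn.split <- function(d, p = 0.2) {
  aux <- seq_len(nrow(d))
  id.test <- sort(sample(aux, size = floor(p * length(aux)), replace = FALSE))
  d.test <- d[id.test, ]
  d.train <- d[-id.test, ]
  return(list(train = d.train, test = d.test))
}
# Create the train and test sets used below (this call was missing from the posted code)
splits <- fn.split(Hitters)
train <- splits$train
test <- splits$test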
# Standardize numeric columns in train dataset
# (Salary is numeric, so the response is standardized here as well)
for (i in 1:ncol(train)) {
  if (is.numeric(train[, i])) {
    train[, i] <- (train[, i] - mean(train[, i])) / sd(train[, i])
  }
}
# Standardize numeric columns in test dataset
# (this uses the test set's own mean and sd, not the training values)
for (i in 1:ncol(test)) {
  if (is.numeric(test[, i])) {
    test[, i] <- (test[, i] - mean(test[, i])) / sd(test[, i])
  }
}
# Create dummy variables
train$League <- ifelse(train$League == "N", 1, 0)
train$NewLeague <- ifelse(train$NewLeague == "N", 1, 0)
train$Division <- ifelse(train$Division == "W", 1, 0)
test$League <- ifelse(test$League == "N", 1, 0)
test$NewLeague <- ifelse(test$NewLeague == "N", 1, 0)
test$Division <- ifelse(test$Division == "W", 1, 0)
knn.reg <- knnreg(Salary ~ ., data = train, k = 20)
pred <- predict(knn.reg, newdata = test)
mse <- mean((test$Salary - pred)^2)  # MSE on test data, in standardized Salary units
print(mse)
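One variation I could try is to standardise only the predictors, using the training set's mean and standard deviation for both sets, and to leave Salary in its original units. A minimal base-R sketch of that idea follows; the small data frame is simulated and merely borrows column names from Hitters, and no KNN fit is included.

```r
set.seed(1)
# Simulated stand-in for the real data (hypothetical values)
d <- data.frame(Salary = rnorm(40, 500, 200),
                Hits = rnorm(40, 100, 30),
                Years = rnorm(40, 7, 3))
id.test <- sample(nrow(d), 8)
train <- d[-id.test, ]
test <- d[id.test, ]

# Scale predictors only; Salary stays in its original units
predictors <- setdiff(names(d), "Salary")
mu <- colMeans(train[predictors])
s <- sapply(train[predictors], sd)

# Apply the training-set constants to both sets
train[predictors] <- as.data.frame(scale(train[predictors], center = mu, scale = s))
test[predictors] <- as.data.frame(scale(test[predictors], center = mu, scale = s))

print(round(colMeans(train[predictors]), 10))
```

With the response left unscaled, a test MSE computed from such data would be in squared Salary units, the same units as the LASSO MSE above.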