I am working in R with an SVM model from the e1071 library, and my question is whether I can identify the samples that the model has misclassified and that are farthest from the SVM hyperplane.
My code is as follows:
best_model <- list(
  model = svm(x = omicDataReduced[, -which(colnames(omicDataReduced) == classVariable)],
              y = omicDataReduced[[classVariable]],
              kernel = best_kernel,
              cost = best_C,
              probability = TRUE,
              decision.values = TRUE),
  error = best_error,
  cost = best_C,
  kernel = best_kernel
)
My dataset has 790 samples and 18,710 predictors.
Edit:
I have discovered that with the ‘decision.values’ attribute, you can get the distance to the hyperplane. My question now is: How can I check that the samples are beyond the margins?
Here’s the code I have so far:
x_all <- omicDataReduced[, -which(colnames(omicDataReduced) == classVariable)]
pred <- predict(best_model$model, x_all, decision.values = TRUE)
# One column for a two-class problem; multi-class models return one column per class pair
decision_values <- attr(pred, "decision.values")
# Compare labels rather than integer codes, in case the factor levels differ
misclassified_indices <- which(as.character(pred) != as.character(omicDataReduced[[classVariable]]))
misclassified_decision_values <- abs(decision_values[misclassified_indices, 1])
n_selected <- round(diagnosticChangeProbability * length(misclassified_indices))
selected_indices <- head(misclassified_indices[order(misclassified_decision_values, decreasing = TRUE)], n_selected)
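One possible way to check the margin condition, sketched here on a toy two-class subset of iris rather than on my real data: as far as I understand, the decision values returned by e1071 (libsvm) are the unnormalised values f(x), so the margins sit at f(x) = +1 and f(x) = -1. Under that assumption, a misclassified sample with |f(x)| > 1 lies beyond the margin on the wrong side. The names toy, fit, dv and far_wrong are made up for this illustration, and the sketch only covers the binary case.

```r
library(e1071)

# Toy two-class data standing in for omicDataReduced (assumption: binary problem)
toy <- droplevels(iris[iris$Species != "setosa", ])
x <- as.matrix(toy[, 1:4])
y <- toy$Species

# A small cost gives a wide, soft margin, so some samples may end up misclassified
fit <- svm(x = x, y = y, kernel = "linear", cost = 0.01)

pred <- predict(fit, x, decision.values = TRUE)
dv <- attr(pred, "decision.values")[, 1]  # single column for two classes

# Misclassified: predicted label differs from the true label
misclassified <- as.character(pred) != as.character(y)

# Beyond the margin: |f(x)| > 1 (assumes the margins are at f(x) = +/-1)
beyond_margin <- abs(dv) > 1

# Misclassified samples beyond the margin, farthest from the hyperplane first
far_wrong <- which(misclassified & beyond_margin)
far_wrong <- far_wrong[order(abs(dv[far_wrong]), decreasing = TRUE)]
```

Using abs(dv) together with the misclassification flag avoids having to know which class the positive sign refers to. The values are not geometric distances (those would need dividing by the norm of the weight vector), but within a single model the ranking is the same.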