I’ve trained a BART model on a binomial classification task. Among the predictors is a factor with 18 levels. They are as follows:
levels(poke$Type.1)
[1] "Bug" "Dark" "Dragon" "Electric" "Fairy" "Fighting"
[7] "Fire" "Flying" "Ghost" "Grass" "Ground" "Ice"
[13] "Normal" "Poison" "Psychic" "Rock" "Steel" "Water"
When I read the BART object, specifically varcount.mean (an array), to determine variable importance, I get the following difficult-to-interpret factor names (note the Type ones):
bart1$varcount.mean
   Type.11    Type.12    Type.13    Type.14    Type.15    Type.16
     1.930      1.825      1.782      1.804      1.983      1.864
   Type.17    Type.18    Type.19   Type.110   Type.111   Type.112
     1.913      0.000      1.950      1.977      1.983      1.987
  Type.113   Type.114   Type.115   Type.116   Type.117   Type.118
     2.105      2.004      1.871      2.102      1.906      2.342
     Total         HP     Attack    Defense    Sp..Atk    Sp..Def
     4.400      2.266      2.415      2.175      2.508      2.652
     Speed Generation
     2.711      2.281
My question is: how would you solve this issue, short of renaming each row manually? Is there an argument I can pass to bart, or a convenient function I can use to rename the output rows in the bart object?
Here is the program I have written:
rm(list=ls())
graphics.off()
library(BART)
library(tidyr)
library(dplyr)
library(ggplot2)
library(ROCR)
pokeB<-read.csv("~/Downloads/Pokemon.csv", header=TRUE)
# Encode the response as 0/1 (1 = legendary). read.csv may parse True/False
# as logical, so compare on the character form to cover both cases.
pokeB$Legend<-as.integer(as.character(pokeB$Legendary) %in% c("True","TRUE"))
poke<-pokeB %>% select(Type.1,Total,HP,Attack,Defense,
Sp..Atk,Sp..Def,Speed,Generation,
Legend)
poke$Type.1<-as.factor(poke$Type.1)
set.seed(1)
train<-sample(1:nrow(poke), nrow(poke)/2)
x<-poke %>% select(-Legend)
y<-poke[,"Legend"]
xtrain<-x[train,]
ytrain<-y[train]
xtest<-x[-train,]
ytest<-y[-train]
bart1<-mc.gbart(xtrain, ytrain, x.test=xtest, type='pbart',
mc.cores=4)
ord1<-order(bart1$varcount.mean,decreasing=T)
vars1<-as.data.frame(bart1$varcount.mean[ord1])
pred1<-ifelse(bart1$prob.test.mean>0.5, 1, 0)
tab1<-table(ytest, as.factor(pred1))
To see the messy factor names output in the bart object, just view vars1. My goal is either to have the output of the bart object preserve the original factor level names, or to edit the vars1 data frame to restore the original level names in a convenient manner (i.e., not going manually row by row).
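One possible approach, sketched under the assumption that BART names each dummy column as the factor name followed by the level's index in levels() order (so Type.11 would be "Bug" and Type.118 would be "Water"): rebuild those names with paste0() and match them against the names in the output. Here relabel_dummies is a hypothetical helper, not part of the BART package, and the small vector vc is only a stand-in for bart1$varcount.mean.

```r
# Hypothetical helper (not part of BART): map dummy names such as "Type.11"
# back to "<factor>: <level>". Assumes the dummy columns are named
# paste0(<factor name>, <level index>) and follow the order of levels().
relabel_dummies <- function(nms, factor_name, lvls) {
  dummy <- paste0(factor_name, seq_along(lvls))  # "Type.11", "Type.12", ...
  idx <- match(nms, dummy)                       # NA for non-dummy columns
  ifelse(is.na(idx), nms, paste0(factor_name, ": ", lvls[idx]))
}

# Toy stand-in for bart1$varcount.mean, using the first three levels only
lvls <- c("Bug", "Dark", "Dragon")
vc <- c(Type.11 = 1.930, Type.12 = 1.825, Type.13 = 1.782,
        Total = 4.400, HP = 2.266)

names(vc) <- relabel_dummies(names(vc), "Type.1", lvls)
vc[order(vc, decreasing = TRUE)]
```

With the real objects, the same call would presumably be names(bart1$varcount.mean) <- relabel_dummies(names(bart1$varcount.mean), "Type.1", levels(xtrain$Type.1)), run before vars1 is built. It is worth checking first that the number of Type.1 entries in the output equals nlevels(xtrain$Type.1), since the index-to-level mapping is an assumption here.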
Thank you for reading