I’m calling XGBoost through the Java library XGBoost4J on a dataset of ~9 million rows for a binary classification problem. About 95% of the rows are the “negative control”, but I’m interested in predicting the “positive control”. In particular, I want to maximize the number of predicted positive controls at a 1% false discovery rate. However, I noticed that only about 1, or at most 2, CPU cores are being used. Is there a way to use all my other cores? (I have about 300 more cores on the cloud VM I’m using!) The parameters I’m using are below.
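To make the goal concrete, here is a minimal sketch of what “maximize the number of predicted positive controls at a 1% false discovery rate” could mean. It assumes FDR is estimated as FP / (TP + FP) over all rows scored at or above a threshold, that higher scores mean “more likely positive”, and that the true label of every row is known (e.g. on a held-out set). The names `FdrCounter` and `maxPositivesAtFdr` are hypothetical and not part of XGBoost4J.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical helper: largest number of positive controls that can be
 * called while keeping FDR = FP / (TP + FP) at or below a target.
 * Labels: true = positive control, false = negative control.
 */
final class FdrCounter {

    private FdrCounter() {}

    static int maxPositivesAtFdr(double[] scores, boolean[] isPositive, double maxFdr) {
        if (scores.length != isPositive.length) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        // Row indices sorted by descending score.
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

        int tp = 0;
        int fp = 0;
        int best = 0;
        for (int k = 0; k < order.length; k++) {
            int idx = order[k];
            if (isPositive[idx]) {
                tp++;
            } else {
                fp++;
            }
            // Only place a cut-off between distinct scores, so rows with
            // exactly equal scores are always called (or rejected) together.
            boolean lastOfTie = k + 1 == order.length || scores[order[k + 1]] != scores[idx];
            // fp / (tp + fp) <= maxFdr, written without division.
            if (lastOfTie && fp <= maxFdr * (tp + fp)) {
                best = Math.max(best, tp);
            }
        }
        return best;
    }
}
```

Since precision = TP / (TP + FP) = 1 − FDR, this count corresponds to the highest-recall point with precision of at least 99% on the precision-recall curve that `aucpr` summarizes. Note that sorting a boxed `Integer[]` of ~9 million indices is memory-hungry; on the full dataset a primitive-array sort would be preferable.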
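```java
import java.util.HashMap;
import java.util.Map;

// XGBoost parameters passed to XGBoost4J (values given as strings)
final Map<String, Object> paramMap = new HashMap<>();
paramMap.put("eta", "0.005");                        // learning rate (step-size shrinkage)
paramMap.put("verbosity", "1");                      // 1 = print warnings
paramMap.put("objective", "binary:logistic");        // binary classification, outputs probabilities
paramMap.put("eval_metric", "aucpr");                // area under the precision-recall curve
paramMap.put("maximize_evaluation_metrics", "true"); // higher aucpr is better
paramMap.put("subsample", "0.9");                    // sample 90% of rows for each tree
paramMap.put("colsample_bytree", "0.9");             // sample 90% of columns for each tree
paramMap.put("seed", "0");                           // random seed
```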
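Before changing any XGBoost parameters, it may be worth confirming how many processors the process can actually see. A minimal sketch, assuming a plain JVM on Linux: `Runtime.getRuntime().availableProcessors()` is standard Java and, on recent JDKs, may reflect container CPU limits or CPU affinity. `OMP_NUM_THREADS` is the standard OpenMP environment variable for capping thread counts; whether the XGBoost4J native build in use honors it is an assumption, not something confirmed here. The class name `CoreCheck` is hypothetical.

```java
/** Hypothetical diagnostic: what CPU resources does this JVM process see? */
final class CoreCheck {

    private CoreCheck() {}

    /** Number of processors the JVM reports as available to this process (always >= 1). */
    static int jvmVisibleCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("JVM sees " + jvmVisibleCores() + " available processors");
        // Standard OpenMP variable; assumed (not verified) to influence the
        // thread count of XGBoost's OpenMP-based native code.
        String omp = System.getenv("OMP_NUM_THREADS");
        System.out.println("OMP_NUM_THREADS = " + (omp == null ? "(not set)" : omp));
    }
}
```

If this reports only 1 or 2 processors on a VM with ~300 cores, the restriction probably sits in the environment (container quota or CPU affinity) rather than in the parameter map above.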
(Note: I don’t have any GPUs on this VM. But if it can’t parallelize across CPU cores, I’m guessing it won’t do any better on a GPU...)