I’m trying to run some models on HPC GPUs using CmdStan and OpenCL. I’m using this Docker image so that all the dependencies are set up correctly.
Just to keep things simple, I’m using the same Stan model from the tutorial and this R code, also from the tutorial:
library(cmdstanr)
n <- 250000
k <- 20
X <- matrix(rnorm(n * k), ncol = k)
y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
mdata <- list(k = k, n = n, y = y, X = X)
options(cmdstanr_verbose = TRUE)
mod_cl <- cmdstan_model("bernoulli_logit_glm.stan",
                        cpp_options = list(stan_opencl = TRUE))
fit_cl <- mod_cl$sample(data = mdata, chains = 4, parallel_chains = 4,
                        opencl_ids = c(0, 0), refresh = 0)
Compilation seems to succeed, but when sampling begins, I get the following error for every chain:
Chain 1 Unrecoverable error evaluating the log probability at the initial value.
Chain 1 Exception: compile_kernel: calculate : Unknown error -11 (in '/tmp/RtmpFWYvx3/model-187133cf9b179.stan', line 14, column 2 to column 57)
Warning: Chain 1 finished unexpectedly!
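For reference when reading that message: as far as I can tell from the Khronos cl.h header, status -11 is CL_BUILD_PROGRAM_FAILURE, which would point at the OpenCL kernel failing to build on the device rather than at the device not being found. A small lookup sketch in R follows; the table is only a subset of the codes, and decode_opencl_status() is a hypothetical helper, not part of cmdstanr or Stan.

```r
# Subset of OpenCL status codes as defined in the Khronos cl.h header.
opencl_status <- c(
  "0"   = "CL_SUCCESS",
  "-1"  = "CL_DEVICE_NOT_FOUND",
  "-2"  = "CL_DEVICE_NOT_AVAILABLE",
  "-3"  = "CL_COMPILER_NOT_AVAILABLE",
  "-5"  = "CL_OUT_OF_RESOURCES",
  "-6"  = "CL_OUT_OF_HOST_MEMORY",
  "-11" = "CL_BUILD_PROGRAM_FAILURE"
)

# Hypothetical helper: translate a numeric status from a log line into its name.
decode_opencl_status <- function(code) {
  name <- opencl_status[as.character(code)]
  if (is.na(name)) "unknown (not in this subset)" else unname(name)
}

print(decode_opencl_status(-11))
```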
Any help would be appreciated.
Additional details that might not be relevant.
- The cluster I’m on has singularity available, so my job file looks like:
singularity run --nv ~/stan-opencl_latest.sif Rscript fit_model.R
- Using the image this way results in a read-only file system, and since I couldn’t figure out Docker volumes, I’ve copied the contents of /usr/share/.cmdstan to my writable file system, rebuilt it, and have set_cmdstan_path(CMDSTANPATH) in my R script. This has worked fine for any non-GPU models I’ve tried fitting.
- When I run clinfo --list in a job sent to the cluster, I get back
Platform #0: NVIDIA CUDA
`-- Device #0: Tesla P100-PCIE-12GB
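Assuming opencl_ids is read as c(platform_id, device_id), which is how I understand the cmdstanr documentation, the clinfo output above corresponds to the c(0, 0) used in the sampling call. A minimal sketch of that mapping is below; parse_clinfo_ids() is a hypothetical helper written for illustration, and it only picks the first device listed.

```r
# Hypothetical helper: turn `clinfo --list` text into a (platform, device) pair.
parse_clinfo_ids <- function(lines) {
  platform <- NA_integer_
  for (line in lines) {
    if (grepl("^Platform #[0-9]+", line)) {
      # Remember the most recent platform header.
      platform <- as.integer(sub("^Platform #([0-9]+).*", "\\1", line))
    } else if (grepl("Device #[0-9]+", line) && !is.na(platform)) {
      # Return the first device found under that platform.
      device <- as.integer(sub(".*Device #([0-9]+).*", "\\1", line))
      return(c(platform, device))
    }
  }
  stop("no OpenCL device found in clinfo output")
}

# The output quoted above, pasted in as a character vector.
clinfo_out <- c("Platform #0: NVIDIA CUDA",
                " `-- Device #0: Tesla P100-PCIE-12GB")
ids <- parse_clinfo_ids(clinfo_out)
print(ids)
```

Inside a job, the same helper could be fed from system2("clinfo", "--list", stdout = TRUE), and the result passed as opencl_ids = ids.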