What is fundamentally different about the `tensorflow/tensorflow` images vs the `nvidia/cuda` Docker images from the perspective of GPU support? I don’t care about e.g. the Python stuff. For example, I would have thought the nvidia image already includes CUDA and cuDNN, so why does the tensorflow image appear to install them too? I’m asking so I can direct users how to use my library outside a Docker container, or make my own minimal image.
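To make the comparison concrete, here is one way to check what each image actually ships. This is a sketch assuming both images are Ubuntu-based with `ldconfig` available (which these tags are); library names and paths may vary by tag:

```sh
# List cuDNN/cuBLAS libraries registered with the dynamic linker in each image.
# The expectation being tested: no hits in the plain nvidia/cuda base image,
# but several in the tensorflow image, which installs them on top.
docker run --rm nvidia/cuda:12.3.0-base-ubuntu22.04 \
  sh -c 'ldconfig -p | grep -iE "cudnn|cublas" || echo "none found"'
docker run --rm tensorflow/tensorflow:latest-gpu \
  sh -c 'ldconfig -p | grep -iE "cudnn|cublas" || echo "none found"'
```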
More context
I want to work out what dependencies I need to run the XLA CUDA PJRT plugin. I can get it working with the `tensorflow/tensorflow:latest-gpu` image on Docker Hub, which uses this Dockerfile. That Dockerfile is based on the `nvidia/cuda:12.3.0-base-ubuntu22.04` image. My workflow fails inside a plain `nvidia/cuda:12.3.0-base-ubuntu22.04` container with the error `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR`, and it also fails with the `runtime` and `devel` variants of that image.
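My working assumption is that the nvidia/cuda variants ship the CUDA toolkit pieces but not cuDNN, so something like the following, run inside the container or as a `RUN` step in a Dockerfile, might be the missing piece. The package name `libcudnn8` follows NVIDIA’s apt repo naming and is an assumption to verify against the tensorflow Dockerfile; the nvidia/cuda images already have NVIDIA’s apt repo configured, which is presumably how that Dockerfile installs it:

```sh
# Hypothetical fix: install cuDNN from the NVIDIA apt repo that the
# nvidia/cuda images come with. Exact package name/version is an
# assumption -- check the tensorflow Dockerfile for what it pins.
apt-get update && \
apt-get install -y --no-install-recommends libcudnn8 && \
rm -rf /var/lib/apt/lists/*
```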