Basic Goal
I am training a machine learning model by pulling features from a database, and I need to persist the model (TensorFlow protobuf format) in Google Cloud Storage buckets for later serving. The architecture is very simple.
Library Dependency
I am using Cloud Storage FUSE to interface with the Google Cloud Storage bucket, and I am containerising the application in a Docker image. These are the relevant lines in the Dockerfile:
# The base image
ARG DISTRIBUTION=debian
FROM $DISTRIBUTION
...
# Install cloud storage fuse
# All steps share one RUN so that the exported GCSFUSE_REPO variable stays visible
RUN export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s` \
 && echo "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt $GCSFUSE_REPO main" | tee /etc/apt/sources.list.d/gcsfuse.list \
 && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /usr/share/keyrings/cloud.google.asc \
 && apt-get --assume-yes update \
 && apt-get --assume-yes --no-install-recommends install gcsfuse
This all works, except that the container has to run with the privileged flag, which
- some runtimes may disallow, e.g. if I am running on Kubernetes?
- I understand to be a security risk, although I am not sure why.
Question
But I want gcsfuse, as it provides a clean, local-filesystem-like interface to cloud storage: I can almost forget that the files are persisted in the cloud, and it would be easy to swap in a different filesystem in future if necessary, which gives ideal interface segregation.
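To illustrate the kind of interface I mean, here is a minimal sketch, not my actual training code. The names `MODEL_ROOT`, `persist_model` and `load_model` are hypothetical, and the mount point is an assumption: with gcsfuse, `MODEL_ROOT` would point at the directory where the bucket is mounted, while the code itself only does ordinary file I/O.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical setting: with gcsfuse this would be the bucket's mount point.
# It falls back to the system temp directory so the sketch runs anywhere.
MODEL_ROOT = Path(os.environ.get("MODEL_ROOT", tempfile.gettempdir()))


def persist_model(payload: bytes, name: str, root: Path = MODEL_ROOT) -> Path:
    """Write serialized model bytes under `root` using plain file I/O."""
    target = root / name
    # Create intermediate directories such as a version folder if missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def load_model(name: str, root: Path = MODEL_ROOT) -> bytes:
    """Read the serialized model bytes back from `root`."""
    return (root / name).read_bytes()
```

Nothing here knows about Cloud Storage, so swapping the backing filesystem would only mean changing what is mounted at `MODEL_ROOT`.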
So what's the way out? Is there any way to use gcsfuse without privileged mode, or to give the container just enough permission to interface with Cloud Storage?