When I launch a Docker container with the sysbox runtime and share a render node via --device=/dev/dri/renderD128, sysbox-fs floods its log and its CPU usage goes high. I enabled debug logging and I see this:
time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"
time="2024-06-01 03:05:54" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"
time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys, flags: 0x8, root: /, cwd: /"
time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"
time="2024-06-01 03:05:54" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"
If I restart the sysbox-fs service, the issue goes away temporarily for the containers that are already deployed (although I can no longer docker exec into the running containers afterwards). As soon as I deploy a new container, the issue starts again, either while sharing devices or from somewhere else (?).
What causes the infinite loop of unmount calls on /run/systemd/mount-rootfs/sys/devices/virtual, and why does it go away when sysbox-fs is restarted?
Log File: sysbox-fs.log
(After some researching…)
I can see a lot of "Received umount syscall from pid 1092497" messages for different targets, and they all seem to complete fine. I searched for the first occurrence of umount in the log file and traced every umount call from there:
time="2024-06-01 02:57:40" level=debug msg="Received umount syscall from pid 1092497"
time="2024-06-01 02:57:40" level=debug msg="target: /sys/fs/cgroup/unified, flags: 0x8, root: /, cwd: /var/labsdata"
time="2024-06-01 02:57:40" level=debug msg="Received mount syscall from pid 1092497"
time="2024-06-01 02:57:40" level=debug msg="source: cgroup2, target: /sys/fs/cgroup/unified, fstype: cgroup2, flags: 0xe, data: , root: /, cwd: /var/labsdata"
time="2024-06-01 02:57:40" level=debug msg="Received umount syscall from pid 1092497"
time="2024-06-01 02:57:40" level=debug msg="target: /sys/fs/cgroup/unified, flags: 0x8, root: /, cwd: /var/labsdata"
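To make this kind of tracing repeatable, here is a minimal sketch that tallies umount requests per (pid, target). It assumes the message format is exactly what the excerpts above show, and that each "Received umount syscall" line is followed by its own "target:" line; interleaved output from concurrent handlers could break that pairing. The function name count_umounts and the SAMPLE list are made up for illustration; in practice you would pass the lines of sysbox-fs.log instead.

```python
import re
from collections import Counter

# Message formats copied from the log excerpts; anything else is ignored.
RECEIVED = re.compile(r'msg="Received umount syscall from pid (\d+)"')
# Mount requests log 'msg="source: ..., target: ..."', so they do not match here.
TARGET = re.compile(r'msg="target: ([^,]+), flags: ')


def count_umounts(lines):
    """Count umount requests per (pid, target) pair."""
    counts = Counter()
    pending_pid = None  # pid from the most recent "Received umount" line
    for line in lines:
        match = RECEIVED.search(line)
        if match:
            pending_pid = match.group(1)
            continue
        match = TARGET.search(line)
        if match and pending_pid is not None:
            counts[(pending_pid, match.group(1))] += 1
            pending_pid = None
    return counts


# Sample lines taken from the first excerpt in this post.
SAMPLE = [
    'time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"',
    'time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"',
    'time="2024-06-01 03:05:54" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"',
    'time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"',
    'time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys, flags: 0x8, root: /, cwd: /"',
    'time="2024-06-01 03:05:54" level=debug msg="Received umount syscall from pid 1098145"',
    'time="2024-06-01 03:05:54" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"',
]

for (pid, target), n in count_umounts(SAMPLE).most_common():
    print(f"pid {pid}: {n} umount call(s) on {target}")
```

A pid/target pair whose count keeps growing between two runs on the same log is a likely candidate for the looping caller.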
From line 7379 of the log file we can see the first occurrence of a umount call to /run/systemd/mount-rootfs/sys/devices/virtual that gets ignored, and from then on it is just an infinite loop. For every container I deploy with a device, this adds up and the log file fills with these messages; I have to turn off debug logging, otherwise it consumes a lot of storage. It never stops, and it only happens if I pass --device=/dev/dri/renderD128. With the little knowledge I have, my understanding is that these endless umount calls are related to the device I passed, which somehow causes an infinite loop.
time="2024-06-01 02:58:30" level=debug msg="Received umount syscall from pid 1098145"
time="2024-06-01 02:58:30" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"
time="2024-06-01 02:58:30" level=debug msg="Requested ReadDirAll() on directory /sys/kernel/mm/hugepages (req ID=0x1454)"
time="2024-06-01 02:58:30" level=debug msg="Executing ReadDirAll() for req-id: 0x1454, handler: SysKernel, resource: hugepages"
time="2024-06-01 02:58:30" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"
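To locate where the ignored unmounts begin and how fast they repeat, a small sketch like the following could be run over the log. It assumes the "Ignoring unmount of sysbox-fs managed submount at ..." message format shown above; summarize_ignored and SAMPLE are hypothetical names used only for this example.

```python
import re
from collections import Counter

# Matches the timestamp and the path of an ignored unmount, as seen in the log.
IGNORED = re.compile(
    r'time="([^"]+)" level=debug '
    r'msg="Ignoring unmount of sysbox-fs managed submount at ([^"]+)"'
)


def summarize_ignored(lines):
    """Return (line number of the first ignored unmount, counts per (second, path))."""
    first_line = None
    per_second = Counter()
    for lineno, line in enumerate(lines, start=1):
        match = IGNORED.search(line)
        if not match:
            continue
        if first_line is None:
            first_line = lineno  # 1-based, like an editor's line numbers
        per_second[(match.group(1), match.group(2))] += 1
    return first_line, per_second


# Sample lines taken from the excerpt just above.
SAMPLE = [
    'time="2024-06-01 02:58:30" level=debug msg="Received umount syscall from pid 1098145"',
    'time="2024-06-01 02:58:30" level=debug msg="target: /run/systemd/mount-rootfs/sys/devices/virtual, flags: 0x8, root: /, cwd: /"',
    'time="2024-06-01 02:58:30" level=debug msg="Requested ReadDirAll() on directory /sys/kernel/mm/hugepages (req ID=0x1454)"',
    'time="2024-06-01 02:58:30" level=debug msg="Executing ReadDirAll() for req-id: 0x1454, handler: SysKernel, resource: hugepages"',
    'time="2024-06-01 02:58:30" level=debug msg="Ignoring unmount of sysbox-fs managed submount at /run/systemd/mount-rootfs/sys/devices/virtual"',
]

first, rate = summarize_ignored(SAMPLE)
print("first ignored unmount at line", first)
for (second, path), n in sorted(rate.items()):
    print(f"{second}: {n} ignored unmount(s) of {path}")
```

Run against the full log, the per-second counts give a rough measure of how tight the loop is, which may help when comparing containers started with and without the --device flag.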
I went through the code at https://github.com/nestybox/sysbox-fs/blob/master/nsenter/utils.go. This file could potentially enter a cleanup loop that repeatedly sends unmount calls, which are then ignored by the seccomp handler, as shown in the log. The ignore happens here: https://github.com/nestybox/sysbox-fs/blob/4c2bc153f33af1bd30a227a14ecfc8174ff280d5/seccomp/umount.go#L128
Can we skip unmounting these devices, since the calls are certain to be ignored by seccomp anyway, and so save a lot of CPU? Is my understanding of what is going on correct? If so, how can this issue be solved?
PS: I also raised an issue in the official repo: https://github.com/nestybox/sysbox/issues/808. I am still asking here because I am willing to fix this issue myself; it has been 3 days and no one has answered there.