Linux LSOF is causing 100% CPU load inside a container in some configurations
Daniel Nashed – 2 October 2024 07:10:53
This hit me a couple of times and now I found the root cause.
I am still thinking about how to best solve it for everyone. But here are the current facts.
LSOF is the Linux tool to list open files and is used by NSD.
It runs into issues when the security limits allow too many open files.
Background
The Linux security limits prevent processes from using too many resources.
Traditionally the security limit for the number of open files was set quite low, at 1024.
But some applications like Domino need a lot of open files, because each lib, each open file and each network connection is a file under Linux.
The recommended maximum number of open files for the Domino user "notes" is 80000.
Usually you set this in /etc/security/limits.conf.
The container image sets this limit correctly. But it turns out that Docker, or more precisely containerd, sets a much higher limit in some cases.
If Domino switched the user like on a normal server, the security limits would be enforced and set to 80000.
But because the container is already started with the unprivileged user "notes", the user is not switched.
Therefore the security limits are not changed inside the container, and it keeps whatever limit it was started with.
Usually the security limit for the number of open files should look like this by default:
ulimit -n
1048576
In certain Docker / containerd configurations the container runs with a limit that is about 1024 times higher:
ulimit -n
1073741816
This causes the high CPU usage with LSOF.
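To see which limit a container actually runs with, it helps to print both the soft and the hard limit. The following is only a minimal sketch, assuming a Bash-compatible shell inside the container and a Linux /proc file system:

```shell
# Soft limit (the value a plain "ulimit -n" reports)
ulimit -Sn

# Hard limit (the ceiling an unprivileged user cannot raise)
ulimit -Hn

# The same information as the kernel reports it for the current process
if [ -r /proc/self/limits ]; then
  grep 'Max open files' /proc/self/limits
fi
```

If both values show something in the range of 1073741816 instead of 1048576, the container is running with the very high limit described above.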
Solution
A first solution would be to explicitly set the number of open files when starting the container.
docker run --rm -it --ulimit nofile=1048576:1048576 ubuntu bash
But it would probably be better to set the security limits before starting Domino inside the container, to be on the safe side.
If the container launches with too high a limit, an unprivileged user can still lower the limits, but not increase them.
I am currently testing whether we should reduce the number of open files to 1048576.
The only software currently affected is LSOF. But you never know and this would be a good defensive change.
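The idea of lowering the limit before Domino starts could be sketched roughly like this. It is not the actual container entrypoint: NOFILE_CAP and cap_nofile are hypothetical names chosen for illustration, and the cap of 1048576 is simply the value currently being tested.

```shell
#!/bin/bash
# Sketch: cap the open-file limit before starting Domino in the container.
# NOFILE_CAP and cap_nofile are hypothetical names, not part of any image.

NOFILE_CAP=1048576

# Print the limit to use: the cap if the current value ($1) is higher than
# the cap ($2) or "unlimited", otherwise the current value unchanged.
cap_nofile() {
  if [ "$1" = "unlimited" ] || [ "$1" -gt "$2" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}

cur_soft=$(ulimit -Sn)
cur_hard=$(ulimit -Hn)
new_soft=$(cap_nofile "$cur_soft" "$NOFILE_CAP")
new_hard=$(cap_nofile "$cur_hard" "$NOFILE_CAP")

# Lower the soft limit first so it never exceeds the new hard limit.
# Limits are only ever lowered here, which an unprivileged user may do.
if [ "$new_soft" != "$cur_soft" ]; then ulimit -Sn "$new_soft"; fi
if [ "$new_hard" != "$cur_hard" ]; then ulimit -Hn "$new_hard"; fi

echo "open files: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"

# Start Domino from this shell so it inherits the lowered limits.
```

Because an unprivileged user cannot raise the hard limit again, the cap should be chosen generously. Limits that are already at or below the cap are left untouched.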