Linux tools for performance tracing and optimization
Daniel Nashed – 12 February 2026 10:28:24
This week I am working on an application that forwards logs received from the Domino server through a pipe on stdin.
The application uses multiple threads to queue, process and push changes to different targets.
To ensure stdin is processed immediately, separate threads handle the different targets and the retries when pushing directly to a remote server (like Loki via its REST API).
This is a C++ helper process sitting next to Domino, not Domino itself.
Tight loops with the right timing, wake-ups, mutexes and queues are really important for a process like this.
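To make the thread layout a bit more concrete, here is a minimal C++ sketch of the general pattern, assuming one queue per target. A reader thread only reads lines and fans them out, while one worker thread per target drains its own queue and does the (possibly slow) pushing. All names (LineQueue, ReaderLoop, WorkerLoop) are hypothetical; this is not the code of the actual application.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue, one instance per target.
class LineQueue {
public:
    void Push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            items_.push_back(std::move(line));
        }
        cv_.notify_one();  // wake the worker only when there is new data
    }

    void Close() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a line is available; returns nullopt once closed and drained.
    std::optional<std::string> Pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        std::string line = std::move(items_.front());
        items_.pop_front();
        return line;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<std::string> items_;
    bool closed_ = false;
};

// Reader thread: only reads input and fans lines out to the target queues,
// so reading the input never waits for a slow target.
void ReaderLoop(std::istream& in, const std::vector<LineQueue*>& targets) {
    std::string line;
    while (std::getline(in, line)) {
        for (LineQueue* q : targets) {
            assert(q != nullptr);
            q->Push(line);
        }
    }
    for (LineQueue* q : targets) q->Close();  // lets the workers drain and exit
}

// Worker thread: one per target. The sink is where pushing (and any retry
// logic) would live in a real forwarder.
template <typename Sink>
void WorkerLoop(LineQueue& q, Sink sink) {
    while (auto line = q.Pop()) sink(*line);
}
```

For a real pipe you would pass std::cin to ReaderLoop. A slow remote endpoint then only delays its own worker, not the reading of stdin.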
Last night I was looking into optimizing the performance of the code. It turned out that changing the timing of some of the inner loops reduced the CPU load and the number of system calls.
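Why loop timing matters can be illustrated with a small sketch comparing two ways for an idle worker to wait. Polling with a short sleep wakes up on every interval (on Linux each sleep typically ends up in a nanosleep/clock_nanosleep system call), while waiting on a condition variable blocks in the kernel (on Linux via futex) until it is notified or a long timeout expires. The names PollingLoop, BlockingLoop and Signal are hypothetical, and the 1 ms and 1 s values are arbitrary assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

// (1) Polling: wake up every 'interval' and check a flag.
//     Returns how often the loop woke up, even though no work arrived.
int PollingLoop(const std::atomic<bool>& stop, std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    int wakeups = 0;
    while (!stop.load()) {
        std::this_thread::sleep_for(interval);  // one sleep system call per iteration
        ++wakeups;
    }
    return wakeups;
}

// (2) Blocking: sleep on a condition variable until notified, with a long
//     timeout as a safety net (e.g. for periodic retry handling).
struct Signal {
    std::mutex mtx;
    std::condition_variable cv;
    bool stop = false;
};

int BlockingLoop(Signal& sig, std::chrono::milliseconds timeout) {
    int wakeups = 0;
    std::unique_lock<std::mutex> lock(sig.mtx);
    while (!sig.stop) {
        sig.cv.wait_for(lock, timeout);  // returns on notify, timeout or spuriously
        ++wakeups;
    }
    return wakeups;
}
```

Running both for the same 50 ms of idle time, the polling loop wakes up dozens of times while the blocking loop typically wakes up once, when it is told to stop.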
Linux has awesome tools to analyze performance. strace is also useful for tracing system calls in general, but in my context I am using it to summarize the calls, not to trace each of them.
There are two great tools that help to understand what a process is doing: perf and strace.
In contrast to top and other tools, the focus is not on how much total CPU time, how many context switches etc. a process uses.
It's more about which fraction of the resources is used by which code area.
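The "fraction per code area" idea can be shown with a toy calculation. A sampling profiler like perf periodically records which function is currently running; the Overhead column is then roughly the share of samples (more precisely, of the sampled event count) that landed in each function. A minimal sketch computing such percentages from a list of sample records; OverheadBySymbol is a hypothetical name, not part of perf.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Each entry is the function that was running when a sample was taken.
// Overhead per function = samples in that function / total samples * 100.
std::vector<std::pair<std::string, double>>
OverheadBySymbol(const std::vector<std::string>& samples) {
    assert(!samples.empty());  // expects at least one sample
    std::map<std::string, int> counts;
    for (const auto& s : samples) ++counts[s];

    std::vector<std::pair<std::string, double>> result;
    for (const auto& [symbol, n] : counts)
        result.emplace_back(symbol, 100.0 * n / samples.size());

    // Highest overhead first, like a perf top listing.
    std::sort(result.begin(), result.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    return result;
}
```

For the samples fopen, fopen, PushThread, fopen it returns fopen with 75% and PushThread with 25%.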
The following is a quick example of how this looks for my application, which has multiple threads and multiple worker loops.
The breakdown in the profile helps to understand what is going on. For larger applications, the analysis needs to be narrowed down to a specific process and/or to the short time window in which a specific problem occurs.
These two tools can also be useful for other applications. But of course it is difficult to optimize if you can't change the code.
For a layered application like Domino, you can't change the lower layers of the code.
But you do have influence on your own applications written with the C-API, LotusScript etc.
Analysis at the Linux level is often only helpful for understanding which resources are used, so that you know where to continue tracing what your application is doing at higher levels.
In that case the Linux tools give you an idea of what to look for at the higher levels and let you measure improvements when optimizing code.
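To measure improvements at the process level, the totals that top-like tools report (CPU time, context switches) can also be read from inside a process with getrusage(). A minimal sketch, assuming you take one snapshot before and one after a workload; UsageSnapshot, TakeSnapshot and Delta are hypothetical names.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>

// Process-wide totals: CPU time split into user/system, and
// voluntary/involuntary context switches.
struct UsageSnapshot {
    double user_sec;
    double sys_sec;
    long voluntary_cs;
    long involuntary_cs;
};

UsageSnapshot TakeSnapshot() {
    rusage ru{};
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return {
        ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6,
        ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6,
        ru.ru_nvcsw,
        ru.ru_nivcsw,
    };
}

// Difference between two snapshots, e.g. before and after a workload.
UsageSnapshot Delta(const UsageSnapshot& a, const UsageSnapshot& b) {
    return {b.user_sec - a.user_sec, b.sys_sec - a.sys_sec,
            b.voluntary_cs - a.voluntary_cs, b.involuntary_cs - a.involuntary_cs};
}
```

Comparing the deltas before and after a change (for example a different loop timing) shows whether CPU time and context switches actually went down, while perf and strace show where they come from.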
In my case, I was looking into the C/C++ code itself and the system calls it performs.
With perf you see your application code together with the library functions that lead to system calls (like fopen or __clock_nanosleep in the output below). strace focuses on the system calls only.
perf top -p 1853196
Samples: 21 of event 'task-clock:uppp', 4000 Hz, Event count (approx.): 3463221 lost: 0/0 drop: 0/7
Overhead  Shared  Symbol
  29.71%  domfwd  [.] sccp
  14.51%  domfwd  [.] __stdio_close
  14.51%  domfwd  [.] fopen
   7.43%  domfwd  [.] __clock_nanosleep
   4.84%  domfwd  [.] LoadPidMap(char const*, std::unordered_map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<
   4.84%  domfwd  [.] PushThread(void*)
   4.84%  domfwd  [.] __fstatat
   4.84%  domfwd  [.] __intscan
   4.84%  domfwd  [.] __lockfile
   4.84%  domfwd  [.] __stdio_read
   4.84%  domfwd  [.] std::basic_filebuf<char, std::char_traits<char> >::basic_filebuf()
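In this profile a large share of the samples falls into opening, reading and closing files (fopen, __stdio_close, __stdio_read, basic_filebuf) and into LoadPidMap. Purely as an illustration, and not necessarily what the author changed: assuming the file behind a function like LoadPidMap changes rarely, a common approach is to cache the parsed result and re-read the file only when its modification time changes. PidMapCache and the "pid name" file format are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <unordered_map>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical cache for a "pid name" mapping file. Parses the file only
// when its modification time changes instead of on every call.
class PidMapCache {
public:
    explicit PidMapCache(fs::path path) : path_(std::move(path)) {
        assert(!path_.empty());
    }

    const std::unordered_map<int, std::string>& Get() {
        std::error_code ec;
        auto mtime = fs::last_write_time(path_, ec);  // a stat() instead of open/read/close
        if (!ec && (!loaded_ || mtime != mtime_)) {
            Reload();
            mtime_ = mtime;
            loaded_ = true;
        }
        return map_;  // on error, keep serving the last good map
    }

    int reloads() const { return reloads_; }

private:
    void Reload() {
        std::unordered_map<int, std::string> fresh;
        std::ifstream in(path_);
        int pid;
        std::string name;
        while (in >> pid >> name) fresh[pid] = name;
        map_.swap(fresh);
        ++reloads_;
    }

    fs::path path_;
    fs::file_time_type mtime_{};
    std::unordered_map<int, std::string> map_;
    bool loaded_ = false;
    int reloads_ = 0;
};
```

One caveat: modification times have limited granularity, so two rewrites within the same timestamp tick would be missed. Also comparing the file size, or watching the file with inotify, are possible alternatives.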
strace -c -p 1853196
strace: Process 1853196 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.65    0.008708          10       862           write
 25.78    0.003583           4       754           futex
 11.57    0.001608           9       168           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.013899           7      1784           total
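In this summary write is the most frequent and most expensive system call. Purely as an illustration, and not what the author did, one common way to reduce the number of write calls is to collect lines and write them in one call. A minimal sketch using POSIX write(); BatchWriter and its flush threshold are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Hypothetical batching writer: collects lines and issues one write() per
// batch instead of one write() per line.
class BatchWriter {
public:
    BatchWriter(int fd, std::size_t flush_bytes) : fd_(fd), flush_bytes_(flush_bytes) {
        assert(flush_bytes_ > 0);
    }
    ~BatchWriter() { Flush(); }

    void Add(const std::string& line) {
        buf_ += line;
        buf_ += '\n';
        if (buf_.size() >= flush_bytes_) Flush();
    }

    // Writes everything buffered; loops because write() may be partial.
    bool Flush() {
        std::size_t off = 0;
        while (off < buf_.size()) {
            ssize_t n = ::write(fd_, buf_.data() + off, buf_.size() - off);
            if (n < 0) {
                if (errno == EINTR) continue;
                buf_.erase(0, off);  // keep only the unwritten rest for a retry
                return false;
            }
            off += static_cast<std::size_t>(n);
            ++write_calls_;
        }
        buf_.clear();
        return true;
    }

    int write_calls() const { return write_calls_; }

private:
    int fd_;
    std::size_t flush_bytes_;
    std::string buf_;
    int write_calls_ = 0;
};
```

Batching trades latency for fewer system calls, so a real forwarder would also flush on a timer (for example when a worker's wait times out) to avoid holding log lines back.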