Daniel Nashed 30 April 2012 10:39:06
We have been tracing this problem together with IBM for half a year and finally got a solution for a performance problem that blocked us from migrating 8500 users from Groupwise to Domino.
When we first started to run large-scale imports to Domino (a load that creates many documents with attachments in parallel from multiple workstations), I discovered that some transactions were slowed down by 100 ms or even a multiple of 100 ms.
Part of the problem was caused by issues introduced with the newer kernel that SuSE is using in SLES 11. The new process scheduler CFS, which I have blogged about earlier, needs some "tuning" to work nicely with Domino.
SLES 11 SP2 even ships with a 3.0 kernel with an updated CFS implementation that needs a different tuning setting than SP1 (you need to run echo NO_FAIR_SLEEPERS > /sys/kernel/debug/sched_features after each reboot).
In addition, a disk I/O issue is fixed in the shipping 3.0 kernel, which also reduces the I/O.
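Because the sched_features setting is lost on reboot, it has to be re-applied every time the server starts. The following is only a minimal sketch of how that could be scripted, not a tested production script: the function name apply_sched_feature and the SCHED_FEATURES variable are made up for illustration, and it assumes debugfs is mounted at /sys/kernel/debug and the script runs as root.

```shell
#!/bin/sh
# Hypothetical helper: write one scheduler feature flag to sched_features.
# SCHED_FEATURES can be overridden, e.g. to try the function on a scratch file.
SCHED_FEATURES="${SCHED_FEATURES:-/sys/kernel/debug/sched_features}"

apply_sched_feature() {
    feature="$1"
    # The file only exists if debugfs is mounted, and only root may write to it.
    if [ ! -w "$SCHED_FEATURES" ]; then
        echo "cannot write $SCHED_FEATURES (debugfs mounted? running as root?)" >&2
        return 1
    fi
    echo "$feature" > "$SCHED_FEATURES"
}

# Example call, to be placed in whatever boot-time hook the system provides:
# apply_sched_feature NO_FAIR_SLEEPERS
```

Checking for a writable file first gives a readable error message instead of a bare shell redirection failure when debugfs is not mounted.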
Still, those settings did not completely solve the problem, and we continued tracing. I have built test tools to do all sorts of load tests, locally and remotely, to figure out what could cause the delay.
And it turned out that there was an issue with the IOCP handling (sys-epoll) which caused delays in the thread pool coordination.
Last week we got a hotfix (853FP1HF85) containing a changed behaviour in the way the worker threads are coordinated (SPR PHEY8RJHXR).
This SPR is already submitted for 8.5.4 and should hopefully make it into 8.5.3 FP2.
In testing, the response time for a normal workload was five times lower than without the fix!
In my testing with an attachment workload I had even better results.
Load Test Environment
- 30 threads
- 100 documents
- attachment size 2 MB
- create 8.5.3 mailfiles from template
- search for all documents after writing them
nshload -t30 -w100 -a /largefile.txt -s servername
- SLES 11 SP2 64bit Domino 8.5.3 FP1
- fast quad-core CPU, fast, local RAID10 disks, 32 GB RAM
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
- amgr disabled
- transaction logging enabled
- NO_FAIR_SLEEPERS for all tests except one
- noop scheduler instead of CFQ for disk devices
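The noop setting from the list above is applied per block device through sysfs. This is a rough sketch under a few assumptions: the function name set_io_scheduler and the SYSFS_BLOCK variable are invented for illustration, the device name sda in the example call is a placeholder for the actual Domino data disk, and the script has to run as root.

```shell
#!/bin/sh
# Hypothetical helper: select the I/O scheduler for one block device.
# SYSFS_BLOCK can be overridden, e.g. to try the function on a scratch directory.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

set_io_scheduler() {
    dev="$1"
    sched="$2"
    file="$SYSFS_BLOCK/$dev/queue/scheduler"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (wrong device name? running as root?)" >&2
        return 1
    fi
    # On a real system the kernel rejects names it does not know.
    echo "$sched" > "$file"
}

# Example calls (sda is a placeholder device name):
# cat /sys/block/sda/queue/scheduler   # the active scheduler is shown in [brackets]
# set_io_scheduler sda noop
```

Like the scheduler feature flag, this setting does not survive a reboot, so it would need to be re-applied at boot.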
First I tested without the network interface, creating the documents locally as a reference.
Then I tested without the fix but with NO_FAIR_SLEEPERS set.
After that I did a test with the hotfix.
And I got the following results:
| | no network | without fix | fix + fair sleeper | fix + no fair sleeper |
|---|---|---|---|---|
| elapsed time (sec) | 56 | 828 | 78 | 68 |
| response time client | | | | |
Response time improvement: 12 times faster (reduced to 8%)
Elapsed time improvement: 16 times faster (reduced to 6%)
We have a dramatic performance improvement with the hotfix.
You can also see that the NO_FAIR_SLEEPERS is helpful but the real performance boost we get is because of the hotfix :-)
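To make the split between the two effects concrete, the elapsed times from the table (828, 78 and 68 seconds) can be turned into ratios. This is just an illustrative calculation on those three published numbers; the helper name speedup is made up for the example.

```shell
#!/bin/sh
# speedup OLD CUR: how many times faster CUR is than OLD,
# and CUR expressed as a percentage of OLD (both in seconds).
speedup() {
    awk -v old="$1" -v cur="$2" \
        'BEGIN { printf "%.1fx faster, reduced to %.0f%%\n", old / cur, 100 * cur / old }'
}

speedup 828 78   # without fix -> fix + fair sleeper:       10.6x faster, reduced to 9%
speedup 78 68    # fix + fair sleeper -> no fair sleeper:    1.1x faster, reduced to 87%
```

The hotfix alone accounts for roughly a factor of ten in elapsed time, while switching off fair sleepers on top of it adds about another 13%.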
Our current nightly migration runs for 24 hours. With this fix I would assume that it would be done in at most 1/5 of the time (a really conservative estimate, provided the migration tool does not cause other delays).
So we would be done in 5 hours instead of 24 hours.
This solves our performance issues during migration and also improves response time during normal operations dramatically.
I have also done another test creating documents with attachments in parallel with 100 threads.
There was no response time slow down with the nshopen tests. I got the same 3 ms average response time.
The test creating 100 documents with 100 threads and a 1.5 MB attachment in parallel took 187 seconds!!!
If you are not running into critical performance issues you should wait for the next fixpack to ship.
In case you are having issues you should first look into the other performance optimizations I have posted (CFS changes, switching from CFQ to the noop disk I/O scheduler).
If that does not help you could open a PMR to get the hotfix :-)
I am looking forward to the Domino performance team shipping a new performance benchmark comparing Linux with Windows in the future :-)