New Performance Problem with Domino on Windows 2008 64bit
One example is the large amount of memory used by the Windows cache manager, which competes with memory allocations by Domino. To prevent the cache manager from using too much physical memory, IBM uses the system call SetSystemFileCacheSize(..) (ref: http://msdn.microsoft.com/en-us/library/aa965240%28VS.85%29.aspx). But it turned out that this only fixed part of the underlying issue. To understand what is going wrong, you have to understand how the Windows Cache Manager works internally.
How the Windows Cache Manager Works
The Windows Cache Manager uses virtual memory to store the cached pages of each open file, mapped in blocks of 256 KB. Allocated pages stay in virtual memory for a while, depending on multiple factors. The most important factor is whether the file is still held open by any running process. The Virtual Memory Manager manages the physical memory used for the cache. The amount of physical memory is the working set of the cache manager, which is specified via the SetSystemFileCacheSize Windows API call.
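To get a feeling for the numbers, here is a small sketch of the block arithmetic. It only models the 256 KB block size mentioned above; the function names are my own, and the assumption that a file is mapped in whole blocks is a simplification, not a description of the actual Windows implementation.

```python
VIEW_SIZE = 256 * 1024  # 256 KB block size, as described in the text


def views_needed(file_size_bytes: int) -> int:
    """Number of 256 KB blocks needed to map a file completely."""
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    # Ceiling division: a partially filled last block still counts as one block.
    return -(-file_size_bytes // VIEW_SIZE)


def cache_virtual_bytes(file_sizes) -> int:
    """Virtual memory consumed if every listed file is cached completely."""
    return sum(views_needed(size) for size in file_sizes) * VIEW_SIZE
```

For example, calling cache_virtual_bytes with the sizes of all NSF files on a server gives a rough upper bound for the virtual memory the cache would need if every file was read completely while it stays open.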
Domino 8.5.x sets this value to 30% of the physical memory by default, which is a good starting point. On the other hand, if the virtual memory used by the file-system cache grows larger and larger, the Virtual Memory Manager will throw many pages out of physical memory, while the data of the open files still stays in the virtual memory of the file-system cache.
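As an illustration of what such a limit looks like, here is a minimal sketch that computes 30% of the physical memory and passes it to SetSystemFileCacheSize via ctypes. This is not how Domino does it internally. The helper names, the choice of the hard-limit flag and the explicit minimum value are my assumptions, and the call needs a process that holds the privilege to increase quotas.

```python
import ctypes
import sys

# Flag value from the Windows SDK headers; using the hard maximum here is an assumption.
FILE_CACHE_MAX_HARD_ENABLE = 0x1


def cache_limit_bytes(physical_bytes: int, percent: int = 30) -> int:
    """Working-set limit for the file cache as a percentage of physical memory."""
    return physical_bytes * percent // 100


def apply_cache_limit(min_bytes: int, max_bytes: int) -> None:
    """Hypothetical helper: set the system file cache working-set limits (Windows only)."""
    if sys.platform != "win32":
        raise OSError("SetSystemFileCacheSize is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    set_size = kernel32.SetSystemFileCacheSize
    # BOOL SetSystemFileCacheSize(SIZE_T Minimum, SIZE_T Maximum, DWORD Flags)
    set_size.argtypes = [ctypes.c_size_t, ctypes.c_size_t, ctypes.c_ulong]
    set_size.restype = ctypes.c_int
    if not set_size(min_bytes, max_bytes, FILE_CACHE_MAX_HARD_ENABLE):
        raise ctypes.WinError(ctypes.get_last_error())
```

On a server with 10 GB of physical memory, cache_limit_bytes(10 * 1024**3) returns 3221225472 bytes, which is 3 GB.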
IMHO this is not how a cache manager should be implemented. Under normal conditions this is still not a big issue, because the file-system cache should not grow too much. But it becomes an issue when a lot of data needs to be cached. The biggest limitation we ran into is the 1 TB limit for the virtual memory used by the cache manager.
For example, if you have a lot of open files with a lot of read activity (for example during a Domino backup), the pages that are read stay in the virtual memory of the cache manager. The total limit is 1 TB, even though the physical memory might be only a couple of GB.
So even when the data stays in the virtual memory of the cache manager, it might already have been removed from physical memory by the Virtual Memory Manager, because the working set of the cache manager is far smaller than its virtual memory.
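The mismatch between virtual memory and working set can be put into numbers with a few lines. This is only a back-of-the-envelope sketch; the function name and the 4 GB working set in the usage note are made up for illustration.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def max_resident_fraction(cache_virtual_bytes: int, working_set_bytes: int) -> float:
    """Upper bound for the share of cached virtual memory that can be in physical memory."""
    if cache_virtual_bytes <= 0:
        return 1.0  # nothing cached, so nothing can be missing from physical memory
    return min(1.0, working_set_bytes / cache_virtual_bytes)
```

With the cache at the 1 TB limit and a hypothetical working set of 4 GB, max_resident_fraction(TIB, 4 * GIB) gives 0.00390625. In other words, less than 0.4% of what the cache manager still tracks can actually be in physical memory.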
What can go wrong with the cache manager
There have been a couple of issues that caused high CPU load when Windows needed to scan through all the virtual memory pages. Some of the general issues have been addressed in Windows 2008 R2. But even Win2008 R2 still needs hotfixes, which you have to apply manually. And even with all the fixes, the file-system cache can become very ineffective. Depending on which fixes you have installed, the problem shows up in different ways.
Either you have a temporary high CPU load (without the fixes) or at some point the I/O rate drops dramatically.
What happens in case of Domino
It turned out that Microsoft uses different cache strategies depending on how a file is opened. Domino 7 and higher (including the current 8.5.2 version) uses the following flag to open all NSF files:
FILE_FLAG_RANDOM_ACCESS (0x10000000)
Access is intended to be random. The system can use this as a hint to optimize file caching.
(reference: http://msdn.microsoft.com/en-us/library/aa363858%28VS.85%29.aspx)
IBM is using this flag to reduce the amount of read-ahead caching, because NSF files are read in a more or less random way. But it turned out that on Windows 2008 64bit this flag causes the file blocks that are read to stay in the cache until the file is closed.
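If you want to experiment with the hint yourself, here is a minimal sketch of opening a file with and without it. It is not Domino code. It relies on Python's os.O_RANDOM, which exists only on Windows and, to my understanding, is translated by the C runtime into FILE_FLAG_RANDOM_ACCESS; on other platforms the sketch falls back to a plain open. The function names are my own.

```python
import os


def open_flags(random_hint: bool) -> int:
    """Build os.open() flags, optionally with the random-access hint."""
    # O_BINARY and O_RANDOM exist only on Windows; elsewhere they fall back to 0.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    if random_hint:
        flags |= getattr(os, "O_RANDOM", 0)
    return flags


def read_block(path: str, offset: int, length: int, random_hint: bool) -> bytes:
    """Read one block from a file that is opened with or without the hint."""
    fd = os.open(path, open_flags(random_hint))
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)
```

Reading the same large file once with random_hint=True and once with random_hint=False while watching the system cache counters is one way to compare the two behaviours.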
Because Domino keeps files open in the Database Cache (dbcache) for performance reasons, it takes quite a long time until the cache is released. Especially during a backup of a larger Domino server, the file-system cache grows dramatically and causes performance issues.
I have done a lot of testing with different ways of opening files. It turned out that once the first file handle is opened with FILE_FLAG_RANDOM_ACCESS, all following reads of that file by all other processes also stay in the cache for a long time. This explains the high cache usage we have seen with Domino, also in earlier versions.
The Microsoft documentation about the flag is not really clear, and the behaviour changed with Windows 2008 64bit. Even after reading the Windows Internal documentation this is not really clear.
But it turned out that even with all the fixes in Windows 2008 the file-system cache still works very ineffectively when FILE_FLAG_RANDOM_ACCESS is used.
Solution
In our customer situation we got a hotfix that disables the flag when Notes databases are opened. All performance tests have shown that this is not just the solution for our particular issue with backup performance, but also increases performance in other situations. IBM is currently checking whether this flag can be disabled in a future fixpack of Domino. For now we got SPR #KBRN899NF6 and a hotfix that provides a notes.ini variable to disable FILE_FLAG_RANDOM_ACCESS.
Update: It is important that you enable the following notes.ini parameter once you have installed the hotfix:
Disable_Random_RW_File_ATTR=1
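If you manage many servers, a quick check whether the parameter is really set can be scripted. The following is only a sketch: it assumes that notes.ini consists of simple name=value lines and that parameter names can be compared case-insensitively, and the function names are my own.

```python
def notes_ini_value(text: str, key: str):
    """Return the value of a notes.ini parameter, or None if it is not set."""
    wanted = key.lower()
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Lines without "=" (such as the [Notes] header) are skipped.
        if sep and name.strip().lower() == wanted:
            return value.strip()
    return None


def random_access_disabled(text: str) -> bool:
    """True if the parameter from the hotfix is set to 1."""
    return notes_ini_value(text, "Disable_Random_RW_File_ATTR") == "1"
```

You would read the notes.ini of each server into a string and pass it to random_access_disabled.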
If you are running on Win2008 with a data volume above 500 GB, you should request this fix. Hopefully we get this change into the next fixpack.
Support is working on a technote but I wanted to give you a heads up and provide some details about the background of the problem.
Let me know if you have any questions or feedback.
-- Daniel