Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...


Daniel Nashed

Performance impact of private on first use views/folders

Daniel Nashed – 22 December 2009 19:44:30

The following issue has existed for a very long time, but it was not discovered until last year. We found it at two customers with performance problems by analyzing annotated client_clock data, and it took a while to convince IBM support that this can have a significant performance impact for customers, especially over WAN connections.

In the meantime, the problem has been described in a technote (http://www.ibm.com/support/docview.wss?uid=swg21322578) under SPR# RSTN7K2EM4, based on information from our PMR. But I want to give some additional background that we figured out.

-- Daniel


Technical background:

Notes/Domino uses an internal cache for design elements to avoid the overhead of searching for them every time. The design cache is used by API functions such as NIFFindDesignNote. This works locally and, especially important, also for remote transactions: the client invokes a single transaction to request the NoteID of the design element.

The design cache is based on a kind of hash table keyed by the names of the design elements. If you use private on first use views/folders, at some point (depending on the length of the design element names and some other details) the design cache overflows because of internal structural limits and is disabled completely. As described in the technote, this happens at about 40 entries.
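The overflow behaviour can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions: the real design cache layout is internal to Notes/Domino and is not described here, so the class name `DesignCache`, the byte budget `capacity` and the per-entry cost are all hypothetical. What the model does show is the property described above: once the names no longer fit, the whole cache is switched off, so even lookups that used to hit now miss.

```python
# Hypothetical model only: the real design cache layout is internal to
# Notes/Domino. Capacity and per-entry cost below are illustrative assumptions.


class DesignCache:
    """Name-keyed cache that disables itself completely once it overflows."""

    def __init__(self, capacity=1024):  # hypothetical byte budget
        self.capacity = capacity
        self.used = 0
        self.entries = {}  # design element name -> NoteID
        self.enabled = True

    def add(self, name, note_id):
        """Store an entry; return False if the cache is (or just became) disabled."""
        if not self.enabled:
            return False
        cost = len(name.encode("utf-8")) + 4  # assumed: name bytes + 4-byte NoteID
        if self.used + cost > self.capacity:
            # Overflow: drop everything, not just the entry that did not fit.
            self.entries.clear()
            self.used = 0
            self.enabled = False
            return False
        self.entries[name] = note_id
        self.used += cost
        return True

    def find(self, name):
        """Return the cached NoteID, or None on a miss or when disabled."""
        if not self.enabled:
            return None
        return self.entries.get(name)
```

In this model, adding a handful of long private view names to `DesignCache(capacity=100)` disables it, after which `find` returns None even for names cached earlier, so every lookup has to fall back to the design collection.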

Once this has happened, the Notes client opens the design collection (a special kind of view) and reads through it step by step, requesting the data buffer by buffer. If you are lucky, the design element is in the first part of the collection, but most of the time it is in the middle or at the end. The more design elements the database has, the more reads the client has to perform. And when a design element cannot be found at all, the full design collection is always read.
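A sketch of this fallback path, assuming the design collection can be modelled as an ordered list of (name, NoteID) pairs delivered in fixed-size buffers. The function `find_by_scanning` and the parameter `entries_per_buffer` are hypothetical; in the real trace the buffers are bounded by roughly 64 KB of data, not by an entry count. The sketch counts one READ_ENTRIES round trip per buffer and shows why a missing element costs a full scan.

```python
def find_by_scanning(design_entries, name, entries_per_buffer):
    """Simulate the client-side fallback: read the design collection buffer by buffer.

    design_entries: list of (name, note_id) tuples in collection order (assumed model).
    Returns (note_id or None, number of READ_ENTRIES requests needed).
    """
    reads = 0
    for start in range(0, len(design_entries), entries_per_buffer):
        reads += 1  # one READ_ENTRIES round trip per buffer
        for entry_name, note_id in design_entries[start:start + entries_per_buffer]:
            if entry_name == name:
                return note_id, reads
    return None, reads  # not found: the whole collection has been read
```

For example, with 1,000 entries and 100 entries per buffer, the first element costs one read, the last one ten reads, and a name that does not exist also costs ten reads. The OPEN_COLLECTION and CLOSE_COLLECTION calls come on top of that.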

Here is a real-world example from a customer application, showing what the transactions look like when the design cache works and when it is broken.

With design cache

(3-507 [331]) FINDDESIGN_NOTES: 0 ms. [34+16=50] 

Without design cache
 
(3-299 [309]) OPEN_COLLECTION(REPC1256E62:004B8651-NTFFFF0020,0000,0000): 0 ms. [42+34=76] 
(4-299 [310]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65148=65224] 
(5-299 [311]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 16 ms. [76+65388=65464] 
(6-299 [312]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+63918=63994] 
(7-299 [313]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65506=65582] 
(8-299 [314]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65498=65574] 
(9-299 [315]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 15 ms. [76+64920=64996] 
(10-299 [316]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+63688=63764] 
(11-299 [317]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+64522=64598] 
(12-299 [318]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65402=65478] 
(13-299 [319]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65478=65554] 
(14-299 [320]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 16 ms. [76+65296=65372] 
(15-299 [321]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65238=65314] 
(16-299 [322]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65386=65462] 
(17-299 [323]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+9480=9556] 
(18-299 [324]) CLOSE_COLLECTION(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [12+0=12] 
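To tally a trace like the one above, a small parser helps. This is a sketch, assuming the client_clock line layout shown here and assuming the bracketed figures mean bytes sent + bytes received = total bytes; `summarize` and `LINE_RE` are hypothetical names, not part of any Notes tool.

```python
import re

# Matches client_clock lines such as
#   (4-299 [310]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65148=65224]
# Assumption: the bracket holds bytes sent + bytes received = total bytes.
LINE_RE = re.compile(
    r"^\(\S+ \[\d+\]\)\s+(\w+)(?:\(([^)]*)\))?:\s*(\d+)\s*ms\.\s*"
    r"\[(\d+)\+(\d+)=(\d+)\]"
)


def summarize(log_text):
    """Return (transaction_count, total_bytes, total_ms, per_operation_counts)."""
    count = total_bytes = total_ms = 0
    per_op = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a client_clock transaction line
        op = m.group(1)
        count += 1
        total_ms += int(m.group(3))
        total_bytes += int(m.group(6))
        per_op[op] = per_op.get(op, 0) + 1
    return count, total_bytes, total_ms, per_op
```

Fed the 16 lines of the broken case, `summarize` reports 16 transactions, 856,020 bytes and 47 ms (1 OPEN_COLLECTION, 14 READ_ENTRIES, 1 CLOSE_COLLECTION); the working case is a single FINDDESIGN_NOTES with 50 bytes.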

Instead of one transaction transferring 50 bytes of data, the error case needs 16 transactions, 14 of which receive up to 64 KB of data each.
In my case the test was performed on a local machine without any network delay.
But in typical customer environments over a WAN, each of those transactions may take 100-200 ms or more.

And this happens for every requested design element (depending on how it is queried).
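The WAN impact is simple arithmetic. A minimal sketch, assuming every transaction costs one full network round trip and ignoring bandwidth and server processing time; the transaction counts 1 and 16 come from the trace above, the 100 and 200 ms round-trip times from the estimate above, and `network_wait_ms` is a hypothetical helper.

```python
WITH_CACHE = 1      # FINDDESIGN_NOTES only
WITHOUT_CACHE = 16  # OPEN_COLLECTION + 14 x READ_ENTRIES + CLOSE_COLLECTION


def network_wait_ms(transactions_per_lookup, lookups, rtt_ms):
    """Pure round-trip wait; ignores bandwidth limits and server processing time."""
    return transactions_per_lookup * lookups * rtt_ms


if __name__ == "__main__":
    for rtt in (100, 200):
        print(f"RTT {rtt} ms: with cache {network_wait_ms(WITH_CACHE, 1, rtt)} ms, "
              f"without cache {network_wait_ms(WITHOUT_CACHE, 1, rtt)} ms")
```

At 100 ms per round trip a single lookup goes from 100 ms to 1,600 ms, at 200 ms from 200 ms to 3,200 ms, and that is before the roughly 856 KB transferred in the broken case are taken into account. Multiply by the number of design elements the application looks up.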


How to find if a database has lost the design cache

You can see it by checking the client_clock data. I also wrote a small tool that scans databases to check whether the design cache is still in place.
If you want to test individual databases, I can send you the lite version of this tool.
I wrote another tool that scans whole servers and can also delete private on first use view/folder entries.
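The scan tools mentioned here are not shown. As a rough stand-in for the manual client_clock check, the following sketch flags every database for which the client opened the design collection itself. It assumes, based on the trace above, that a collection target ending in `-NTFFFF0020` is the design collection (presumably NOTE_ID_SPECIAL | NOTE_CLASS_DESIGN, i.e. 0xFFFF0020, in C API terms); `suspect_databases` is a hypothetical helper. Treat hits as candidates to verify, not as proof.

```python
import re

# Assumption: "-NTFFFF0020" marks the design collection, as in the trace above.
# The captured token is the database identifier exactly as client_clock prints it.
DESIGN_COLL_RE = re.compile(r"OPEN_COLLECTION\(([^()\s,]+)-NTFFFF0020")


def suspect_databases(log_text):
    """Return identifiers of databases whose design collection the client opened."""
    return sorted(set(DESIGN_COLL_RE.findall(log_text)))
```

Applied to the broken trace above it returns `["REPC1256E62:004B8651"]`; a trace that only contains FINDDESIGN_NOTES returns an empty list.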


How to recover

Once the design cache is gone, it is quite difficult in most cases to enable it again. First you have to get rid of the private on first use folder/view instances and then compact the database. Sometimes this re-enables the design cache, but in many cases you have to create a new instance of the database.


How to avoid the problem

Currently the only solution is to avoid private on first use views/folders, because in larger applications you will always have more than 40 instances of a private on first use view/folder.

There is one workaround, available as a hotfix, that avoids the client transactions for searching the design collection.
The hotfix ignores the fact that the database has lost its design cache and lets the server search the design collection without sending all the entries to the client.
But this is clearly just a temporary workaround that can also have side effects (the hotfix is currently still in testing).

What will be done in the future

There is no simple fix for this issue. According to the current information we got from IBM support, a full solution that prevents the design cache from breaking requires a change that cannot be made in the current code stream.


I hope this post helps customers and other Business Partners with their applications.

Feel free to contact me offline to get a copy of the lite version of the analysis tool.
