Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...

performance impact with private on first use views/folders

Daniel Nashed  22 December 2009 20:44:30

The following issue has existed for a very long time, but it was not discovered until last year. We found it at two customers with performance problems by analyzing annotated client_clock data, and it took a while to convince IBM support that this can have a significant performance impact for customers, especially over WAN connections.

The problem has since been described in a technote (http://www.ibm.com/support/docview.wss?uid=swg21322578) / SPR# RSTN7K2EM4, based on information from our PMR, but I want to share some additional background we figured out.

-- Daniel


Technical background:

Notes/Domino uses an internal cache for design elements to avoid the overhead of searching for them every time. The design cache is used by API functions such as NIFFindDesignNote. This works locally and especially for remote transactions: the client issues a single transaction to request the NoteID of the design element.

The design cache is based on a kind of hash table keyed by the names of the design elements. If you use private on first use views/folders, at some point (depending on the length of the design element names and some other details) the design cache overflows because of internal structure limits and is completely disabled. As described in the technote, this happens at about 40 entries.
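
To make the "overflow disables everything" behavior concrete, here is a toy model in Python. It is not Domino's data structure: the byte budget, the per-entry overhead and the design element names are made-up assumptions chosen only to show that a fixed-size, name-keyed cache which switches itself off on overflow also loses the entries that were perfectly cacheable before.

```python
class ToyDesignCache:
    """Hypothetical model of a name-keyed design cache with a fixed size.

    Domino's real structure is internal; budget_bytes and entry_overhead are
    made-up illustration values, not Domino limits.
    """

    def __init__(self, budget_bytes=2048, entry_overhead=16):
        self.budget_bytes = budget_bytes
        self.entry_overhead = entry_overhead
        self.used_bytes = 0
        self.entries = {}
        self.disabled = False

    def add(self, name, note_id):
        if self.disabled:
            return
        # Longer design element names consume more of the fixed budget.
        cost = len(name.encode("utf-8")) + self.entry_overhead
        if self.used_bytes + cost > self.budget_bytes:
            # Overflow: no eviction, the whole cache is switched off.
            self.entries.clear()
            self.disabled = True
            return
        self.entries[name] = note_id
        self.used_bytes += cost

    def lookup(self, name):
        """Return the cached NoteID, or None on a miss or when disabled."""
        if self.disabled:
            return None
        return self.entries.get(name)


cache = ToyDesignCache()
cache.add("Main View", 0x20A)
print(cache.lookup("Main View"))  # 522
# Each private-on-first-use instance adds another design note (hypothetical names).
for i in range(60):
    cache.add(f"My Private View (user{i:02d})", 0x1000 + i)
print(cache.disabled)             # True
print(cache.lookup("Main View"))  # None
```

In the toy, the point of failure depends only on the invented budget and name lengths; the real threshold of roughly 40 entries comes from the technote.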

Once this has happened, the Notes client opens the design collection (a kind of special view) and reads through it step by step, requesting one buffer of data at a time. If you are lucky, the design element is in the first part of the collection, but most of the time it is in the middle or at the end. The more design elements the database has, the more reads the client has to perform. And when a design element cannot be found at all, the full design collection is always read.
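
The cost of this fallback can be sketched with simple arithmetic. The model below assumes a constant number of entries per reply buffer and a hypothetical database with 1400 design collection entries; in reality the number of entries per buffer varies with the size of each entry.

```python
import math


def read_entries_calls(position, total_entries, entries_per_buffer):
    """READ_ENTRIES round trips needed to reach a design element.

    position: 0-based index of the element in the design collection,
    or None if the element does not exist (the whole collection is read).
    entries_per_buffer: assumed constant here; in reality it depends on the
    size of each entry and the reply buffer size.
    """
    if position is None:
        return math.ceil(total_entries / entries_per_buffer)
    return position // entries_per_buffer + 1


def lookup_transactions(position, total_entries, entries_per_buffer, cache_ok):
    """Client/server transactions for one design element lookup."""
    if cache_ok:
        return 1  # a single FINDDESIGN_NOTES request
    # OPEN_COLLECTION + n x READ_ENTRIES + CLOSE_COLLECTION
    return 2 + read_entries_calls(position, total_entries, entries_per_buffer)


# Hypothetical database: 1400 design collection entries, 100 per buffer.
for pos in (0, 700, 1399, None):
    print(pos, lookup_transactions(pos, 1400, 100, cache_ok=False))
# prints: 0 3 / 700 10 / 1399 16 / None 16
```

With a working cache every lookup is one transaction regardless of the element's position; without it, the count grows linearly with where the element sits in the collection.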

Here is a real-world example from a customer application showing what the transactions look like when the design cache works and when it is broken.

With design cache

(3-507 [331]) FINDDESIGN_NOTES: 0 ms. [34+16=50] 

Without design cache
 
(3-299 [309]) OPEN_COLLECTION(REPC1256E62:004B8651-NTFFFF0020,0000,0000): 0 ms. [42+34=76] 
(4-299 [310]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65148=65224] 
(5-299 [311]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 16 ms. [76+65388=65464] 
(6-299 [312]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+63918=63994] 
(7-299 [313]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65506=65582] 
(8-299 [314]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65498=65574] 
(9-299 [315]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 15 ms. [76+64920=64996] 
(10-299 [316]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+63688=63764] 
(11-299 [317]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+64522=64598] 
(12-299 [318]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65402=65478] 
(13-299 [319]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65478=65554] 
(14-299 [320]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 16 ms. [76+65296=65372] 
(15-299 [321]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65238=65314] 
(16-299 [322]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65386=65462] 
(17-299 [323]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+9480=9556] 
(18-299 [324]) CLOSE_COLLECTION(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [12+0=12] 
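
When comparing traces like these, it helps to total them up instead of eyeballing them. The following Python sketch parses client_clock lines in the format shown above and summarizes transactions, bytes and milliseconds. The regular expression is fitted to these sample lines only, and reading the bracket as [bytes sent + bytes received = total] is an assumption.

```python
import re
from typing import NamedTuple

# Assumption: the bracket reads [bytes sent + bytes received = total bytes].
CLOCK_LINE = re.compile(
    r"\(\d+-\d+ \[\d+\]\)\s+(?P<name>\w+)(?:\((?P<args>[^)]*)\))?:\s*"
    r"(?P<ms>\d+)\s*ms\.\s*\[(?P<sent>\d+)\+(?P<recv>\d+)=(?P<total>\d+)\]"
)


class Txn(NamedTuple):
    name: str
    args: str
    ms: int
    sent: int
    received: int
    total: int


def parse_client_clock(text):
    """Turn client_clock lines into Txn records; other lines are skipped."""
    txns = []
    for line in text.splitlines():
        m = CLOCK_LINE.search(line)
        if m:
            txns.append(Txn(m["name"], m["args"] or "", int(m["ms"]),
                            int(m["sent"]), int(m["recv"]), int(m["total"])))
    return txns


def summarize(txns):
    return {
        "transactions": len(txns),
        "total_bytes": sum(t.total for t in txns),
        "total_ms": sum(t.ms for t in txns),
        "read_entries": sum(t.name == "READ_ENTRIES" for t in txns),
    }


WITH_CACHE = "(3-507 [331]) FINDDESIGN_NOTES: 0 ms. [34+16=50]"

WITHOUT_CACHE = """\
(3-299 [309]) OPEN_COLLECTION(REPC1256E62:004B8651-NTFFFF0020,0000,0000): 0 ms. [42+34=76]
(4-299 [310]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65148=65224]
(5-299 [311]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 16 ms. [76+65388=65464]
(6-299 [312]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+63918=63994]
(7-299 [313]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65506=65582]
(8-299 [314]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0ms. [76+65498=65574]
(9-299 [315]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 15 ms. [76+64920=64996]
(10-299 [316]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+63688=63764]
(11-299 [317]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+64522=64598]
(12-299 [318]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65402=65478]
(13-299 [319]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65478=65554]
(14-299 [320]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 16 ms. [76+65296=65372]
(15-299 [321]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65238=65314]
(16-299 [322]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+65386=65462]
(17-299 [323]) READ_ENTRIES(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [76+9480=9556]
(18-299 [324]) CLOSE_COLLECTION(REPC1256E62:004B8651-NTFFFF0020): 0 ms. [12+0=12]
"""

print(summarize(parse_client_clock(WITH_CACHE)))
# {'transactions': 1, 'total_bytes': 50, 'total_ms': 0, 'read_entries': 0}
print(summarize(parse_client_clock(WITHOUT_CACHE)))
# {'transactions': 16, 'total_bytes': 856020, 'total_ms': 47, 'read_entries': 14}
```

For this one lookup, the broken case moves roughly 856 KB instead of 50 bytes.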

Instead of one transaction transferring 50 bytes of data, the error case needs 16 transactions, 14 of which each receive up to 64 KB of data.
My test was performed on a local machine without any network delay.
In a typical customer WAN, however, each of those transactions can take 100-200 ms or more.

And this can happen for every requested design element (depending on how it is queried).
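
A back-of-the-envelope estimate shows what this means on a WAN. The sketch below uses the transaction counts and byte totals from the log above; the 150 ms round-trip time and the 2 Mbit/s link are assumptions, and the model ignores server processing time, TCP windowing and compression, so treat the result as an order of magnitude only.

```python
def estimated_wait_s(transactions, total_bytes, rtt_ms, bandwidth_mbit):
    """Rough network wait for one lookup: one round trip per transaction
    plus the time to push the bytes through the link. Ignores server time,
    TCP windowing and compression, so treat it as an order of magnitude."""
    latency_s = transactions * rtt_ms / 1000
    transfer_s = total_bytes * 8 / (bandwidth_mbit * 1_000_000)
    return latency_s + transfer_s


# Figures from the log above; 150 ms RTT and 2 Mbit/s are assumptions.
print(round(estimated_wait_s(1, 50, 150, 2), 2))       # 0.15
print(round(estimated_wait_s(16, 856020, 150, 2), 2))  # 5.82
```

Under these assumptions a single design element lookup goes from a fraction of a second to several seconds, and that cost is paid again for each affected lookup.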


How to find out whether a database has lost its design cache

You can see it in the client_clock data. I also wrote a small tool that scans databases to check whether the design cache is still in place.
If you want to test individual databases, I can send you the lite version of this tool.
I wrote another tool that scans whole servers and can also delete private on first use view/folder instances.
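
If you only have client_clock output at hand, a rough first check is to look for the client opening the design collection, as in the broken trace above. This Python heuristic is not the tool described here; it assumes that NTFFFF0020 in the log is the Notes C API's NOTE_ID_SPECIAL (0xFFFF0000) combined with NOTE_CLASS_DESIGN (0x0020), i.e. the design collection, and that the text before the dash identifies the database.

```python
import re
from collections import Counter

# Assumption: NOTE_ID_SPECIAL (0xFFFF0000) | NOTE_CLASS_DESIGN (0x0020)
# addresses the design collection, matching the NTFFFF0020 in the log.
DESIGN_COLLECTION = "NTFFFF0020"
OPEN_RE = re.compile(r"OPEN_COLLECTION\((?P<db>[^-]+)-(?P<note>NT[0-9A-F]+)")


def design_collection_scans(clock_text):
    """Count design collection opens per database in client_clock output.

    A heuristic only: other operations can open the design collection too,
    so a non-zero count is a hint to investigate, not proof of a lost cache.
    """
    scans = Counter()
    for m in OPEN_RE.finditer(clock_text):
        if m["note"] == DESIGN_COLLECTION:
            scans[m["db"]] += 1
    return scans


log = """\
(3-507 [331]) FINDDESIGN_NOTES: 0 ms. [34+16=50]
(3-299 [309]) OPEN_COLLECTION(REPC1256E62:004B8651-NTFFFF0020,0000,0000): 0 ms. [42+34=76]
"""
print(design_collection_scans(log))  # Counter({'REPC1256E62:004B8651': 1})
```

Repeated design collection scans for the same database during normal use are the pattern worth following up with a proper check.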


How to recover

Once the design cache is gone, it is quite difficult in most cases to get it enabled again. First you have to get rid of the private on first use folder/view instances and then compact the database. This may re-enable the design cache, but in many cases you have to create a new instance of the database.


How to avoid the problem

Currently the only solution is to avoid private on first use views/folders, because larger applications will always end up with more than 40 private on first use view/folder instances.
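
The reason larger applications hit the limit quickly is simple multiplication: instances grow with the number of such views/folders times the number of users. The sketch below assumes every user opens every private on first use view/folder and that each private copy is stored in the database; the threshold of about 40 is the approximate figure from the technote, and the real point of failure also depends on name lengths.

```python
APPROX_LIMIT = 40  # approximate figure from the technote; depends on name lengths


def pofu_instances(pofu_elements, users):
    """Private-on-first-use instances, assuming every user opens every such
    view/folder and each private copy is stored in the database."""
    return pofu_elements * users


def at_risk(pofu_elements, users, limit=APPROX_LIMIT):
    return pofu_instances(pofu_elements, users) >= limit


print(pofu_instances(3, 20), at_risk(3, 20))  # 60 True
print(pofu_instances(1, 25), at_risk(1, 25))  # 25 False
```

Even a modest application with three such views and twenty users is already past the approximate limit.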

There is one workaround, delivered as a hotfix, that avoids the client transactions for searching the design collection.
The hotfix ignores the fact that the database has lost its design cache and lets the server search the design collection without sending all the entries to the client.
But this is clearly just a temporary workaround that can also have side effects (the hotfix is currently still being tested).

What will be done in the future

There is no simple fix for this issue. According to the current information we got from IBM support, a full solution that prevents the design cache from breaking requires a change that cannot be made in the current code stream.


I hope this post helps customers and other Business Partners with their applications.

Feel free to contact me offline to get a copy of the lite version of the analysis tool.

Comments

1. Jamie Magee  29.12.2009 3:06:16

Wow, this has huge implications. I'm dealing with a large application with over 150 private-on-first-use view and folder instances. Thanks for the work on this. I would be interested in sharing findings with you.

Another alternative to private views/folders is to write LS code that uses the SelectionFormula and other NotesView properties to create and manage multiple near-copies of shared views, each one intended for a specific user, including readers if desired. This emulates the functionality of private views, but with many shared views. If each view contains a mutually exclusive set of documents, having dozens or even hundreds of such shared views actually performs better than a single shared view of all the documents.

2. Turtle  03.01.2010 9:51:31

Yeah, it's strange, you'd think that really low-level issues like this would have been found years ago, but over the years I've found some really glaring issues that exist way, way down in the system... things that couldn't be fixed easily short of an ODS or other major update. And you call IBM, and after a few months, and maybe after a few beers with the appropriate developer at Lotusphere, you find that yeah, it really IS a problem, but it'll be a couple of years before a usable fix will be out there.

Hence, the workarounds.

In this instance, well, I don't have any real Notes client population, so, I'm OK with ignoring it right now.

3. Andrew Magerman  19.01.2010 9:02:06

Hi Daniel,

Thanks for the tip.

By the way, I wonder whether you are aware of a small tool that Jaimie and I wrote, which parses the debug file into something more palatable:

{ Link }

If you do not object, I will include your tip (with a reference to this page) in the next version of the tool, which shows much more precisely the nature of the loaded/read design note (i.e. whether it is a navigator, a profile document, or something else).