Daniel Nashed 25 October 2012 10:53:53
We ran into this issue at a customer. When one server was down, the other cluster mate became very slow to respond at some point.
For troubleshooting, we started in the early morning with one server, and until a certain time all users were happy.
But suddenly the response times went up dramatically. When we brought up the 2nd server, it also had slow response times for a while.
It turned out that the backend code that tries to find the matching replica had issues when too many users invoked the function at the same time.
When looking into an NSD, we found a call stack matching an existing SPR.
The good news is that this problem has been addressed in 8.5.3 along with a directory assistance failover issue.
We had a production load test at that customer after upgrading to 8.5.3 confirming that this resolved our failover issues.
So, among the other issues fixed in 8.5.3, this is a good reason to upgrade to the current 8.5.3 fix pack if you are running a Domino cluster.
For the Directory Assistance issue my preference would be to use local directories only.
This gives you better performance in directory lookups (I don't know any customer with dedicated directory servers just for DA).
And if you want to reuse existing DA configurations on different servers, you cannot leave the server name in the DA config document empty, but you can use a "*" to specify the local server.
Reference for both issues:
SPR# SWAS8GGHMC: Make cluster failover more efficient, the existing API resolved the failover replica IDs in an extremely inefficient fashion causing server outages
SPR# JMOE5WQL5N - Fixes a behavior in Directory Assistance - When set up as a configuration directory, and using DA cluster feature, directory assistance will not find alternate replica.