Configuring SAI / Loadmon for fast servers
Daniel Nashed – 12 August 2009 12:15:52
As I mentioned in an earlier post, the SAI (Server Availability Index) has been fixed in 8.5. But in some situations the SAI still does not work as expected, especially on very fast servers.
The main reason is that the transactions measured for the SAI are timed in microseconds, and the way loadmon (the module in Domino that calculates the data for the SAI) works can still cause quite odd effects.
Loadmon calculates the current average for 12 key transaction types (see sample below). By default this average covers 5 intervals of 15 seconds each. For each transaction type it needs at least 5 transactions in an interval for that type to be relevant.
The current average is then compared against the lowest average ever calculated for this transaction type, and an Expansion Factor (XF) is calculated (how many times longer it took than in the fastest interval).
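The per-transaction XF described above can be sketched in a few lines of Python. This is only an illustration of the ratio, not loadmon's actual code: the function name, the use of None for "not enough data" (the server stats show 0 in that case) and the exact handling of the 5-transaction threshold are assumptions.

```python
from typing import Optional

# Assumption taken from the text: fewer than 5 transactions in an
# interval make a transaction type irrelevant for that interval.
MIN_TRANSACTIONS = 5


def expansion_factor(running_avg_us: float, min_avg_us: float,
                     count: int) -> Optional[float]:
    """Return how many times slower the current average is than the
    lowest average ever recorded, or None when there is too little data."""
    if count < MIN_TRANSACTIONS or min_avg_us <= 0:
        return None
    return running_avg_us / min_avg_us


# CLOSE_DB from the untuned example: 972.6316 us now, 34.3667 us at best
xf = expansion_factor(972.6316, 34.3667, 38)
print(round(xf, 2))  # 28.3
```

The result matches the XF column for CLOSE_DB in the untuned example (28,30...).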
Side note: The minimum average transaction times are kept in memory and written to the loadmon.ncf file on shutdown. That's why you need to remove this file while the server is shut down whenever you change the loadmon notes.ini variables.
For that reason my Unix/Linux start script has an option to remove the loadmon.ncf at every restart.
The XF per transaction is then merged into a mixed XF for all transaction types, which leads to the SAI via another formula: SAI = 100 * (1 - log(XF) / log(2) / n), where n is the transinfo range (6 by default).
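To make the formula easier to follow, here is a small Python sketch that applies it to the two ExpansionFactor values from the examples at the end of this post. The function name is made up, and how loadmon merges the per-transaction XF values into the mixed XF is not covered here.

```python
import math


def availability_index(xf: float, transinfo_range: int = 6) -> float:
    """SAI = 100 * (1 - log(XF) / log(2) / n), the formula quoted above."""
    return 100.0 * (1.0 - math.log(xf) / math.log(2) / transinfo_range)


print(round(availability_index(15.0937731906097)))  # 35 (untuned example)
print(round(availability_index(1.68301058631922)))  # 87 (tuned example)
print(round(availability_index(1.0)))               # 100
print(round(availability_index(64.0)))              # 0
```

With the default range of 6, an XF of 1 gives an SAI of 100, and an XF of 2^6 = 64 brings the formula down to 0.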
This algorithm works fine as long as your server transactions are never so incredibly fast that they cannot be compared to normal transaction times.
Because the time is measured in microseconds, it can happen, for example at night, that transactions take only around 50 microseconds.
If you compare a 1 millisecond transaction (normal operation) with a 50 microsecond transaction (which happens when the server is totally idle under very special conditions, e.g. on a cluster partner), that's not really a fair comparison.
So what happens is that even if a normal transaction takes around 1 ms, it already appears slow to the loadmon calculation, even though it is very fast.
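A quick worked example with these numbers, using the SAI formula from above and assuming for simplicity that this single ratio ends up as the mixed XF (with the default transinfo range of 6):

```latex
\[
\mathrm{XF} = \frac{1000\ \mu\mathrm{s}}{50\ \mu\mathrm{s}} = 20,
\qquad
\mathrm{SAI} = 100\left(1 - \frac{\log_2 20}{6}\right)
\approx 100\,(1 - 0.72) = 28
\]
```

So a server answering in 1 ms would report an SAI of only about 28.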
Below I added an example of an untuned loadmon with some real world data from a brand new and fast Domino 8.5 server on current hardware.
So there are two approaches to solve this issue. One would be to change the transinfo range to reduce the effect that the difference between the normal and the extremely fast values has.
But this might not be desirable, because it could lead to odd behaviour, or you might not be able to tune it completely.
The second approach is new in 8.5. You can set the following two new notes.ini parameters:
Server_MinPossibleTransTime=1500
Server_MaxPossibleTransTime=20000000
Those two parameters (in microseconds) allow you to specify the minimum and maximum values used for the individual transactions. This example sets the minimum to 1.5 ms and the maximum to 20 seconds.
Transactions that take less time are set to 1.5 ms. Transactions that take longer are set to 20 seconds.
This ensures that individual very short or very long transactions do not have too high an impact on the SAI calculation.
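In other words, each measured time is clamped into the configured range. A minimal Python sketch of that behaviour as described above (the function and constant names are illustrative; how Domino applies the limits internally is not documented):

```python
MIN_POSSIBLE_US = 1_500        # Server_MinPossibleTransTime=1500
MAX_POSSIBLE_US = 20_000_000   # Server_MaxPossibleTransTime=20000000


def clamp_transaction_time(elapsed_us: float,
                           lower: float = MIN_POSSIBLE_US,
                           upper: float = MAX_POSSIBLE_US) -> float:
    """Raise too-short times to the lower bound and cap too-long ones."""
    return max(lower, min(elapsed_us, upper))


print(clamp_transaction_time(50))          # 1500
print(clamp_transaction_time(4_000))       # 4000
print(clamp_transaction_time(45_000_000))  # 20000000
```

With the lower bound in place, a 50 microsecond transaction from an idle night can no longer pull the minimum average below 1.5 ms.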
The second example below was generated with 1 ms as the lower boundary (Server_MinPossibleTransTime=1000) and 20 seconds as the upper boundary.
Those two notes.ini parameters are not yet well documented. You will only find them in the public SPR description (of 7.0.4, because 8.5 was a feature release with no SPR list).
Another side note: The following two examples were generated with a tool I developed a long time ago, when 6.0.2 shipped and we needed to debug the SAI.
You can get the values from the server stats when you enable debug_loadmon=1. My trace tool takes snapshots of the stats at an interval of seconds and writes them into a Notes database so that they can be annotated.
-- Daniel
Untuned Loadmon Example (times in microseconds, decimal comma)
Transaction | RunningAvgTime | MinAvgTransTime | CurrentTransCount | XF |
CLOSE_DB | 972,6316 | 34,3667 | 38 | 28,3015978355199 |
DB_INFO_GET | 63,3333 | 48,4000 | 6 | 1,30853994490358 |
DB_READ_HIST | 138,0000 | 110,8333 | 1 | 0 |
DB_REPLINFO_GET | 64,0000 | 44,8333 | 7 | 1,4275092936803 |
DB_WRITE_HIST | 0,0000 | 117,6250 | 0 | 0 |
GET_OBJECT_SIZE | 0,0000 | 65,1667 | 0 | 0 |
GET_SPECIAL_NOTE_ID | 54,0000 | 46,2951 | 1 | 0 |
OPEN_DB | 5.564,3953 | 131,8000 | 43 | 42,21847760878 |
OPEN_NOTE | 4.638,4054 | 114,9697 | 74 | 40,3445910327829 |
READ_OBJECT | 86,2593 | 62,6250 | 27 | 1,3773933614253 |
SERVER_AVAILABLE_LITE | 56,4167 | 47,0000 | 12 | 1,20035460992908 |
NIF_OPEN_NOTE | 0,0000 | 106,6607 | 0 | 0 |
Types with non-zero XF / sum of XF | 7 | 116,178463687021 |
AvailabilityIndex | 35 |
ExpansionFactor | 15,0937731906097 |
Tuned Loadmon Example (times in microseconds, decimal comma)
Transaction | RunningAvgTime | MinAvgTransTime | CurrentTransCount | XF |
CLOSE_DB | 1.463,7308 | 1.000,0000 | 130 | 1,4637307692308 |
DB_INFO_GET | 1.000,0000 | 1.000,0000 | 5 | 0 |
DB_READ_HIST | 1.372,7485 | 1.000,0000 | 163 | 1,3727484662577 |
DB_REPLINFO_GET | 1.000,0000 | 1.000,0000 | 74 | 1 |
DB_WRITE_HIST | 1.000,0000 | 1.000,0000 | 65 | 1 |
GET_OBJECT_SIZE | 0,0000 | 1.000,0000 | 0 | 0 |
GET_SPECIAL_NOTE_ID | 1.000,0000 | 1.000,0000 | 15 | 1 |
OPEN_DB | 3.754,4135 | 1.000,0000 | 133 | 3,7544135338346 |
OPEN_NOTE | 1.870,0294 | 1.000,0000 | 34 | 1,8700294117647 |
READ_OBJECT | 1.000,0000 | 1.000,0000 | 48 | 1 |
SERVER_AVAILABLE_LITE | 1.000,0000 | 1.000,0000 | 8 | 1 |
NIF_OPEN_NOTE | 0,0000 | 1.000,0000 | 0 | 0 |
Types with non-zero XF / sum of XF | 9 | 13,4609221810878 |
AvailabilityIndex | 87 |
ExpansionFactor | 1,68301058631922 |