Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...

 

Configuring SAI / Loadmon for fast servers

Daniel Nashed  12 August 2009 12:15:52

As I mentioned in an earlier post, the SAI (Server Availability Index) was fixed in 8.5. In some situations, however, the SAI still does not work as expected, especially on very fast servers.

The main reason is that the transaction times used for the SAI are measured in microseconds, and the way loadmon (the Domino module that calculates the data for the SAI) works can still cause quite odd effects.

Loadmon calculates the current average for 12 key transaction types (see the examples below). By default the average covers 5 intervals of 15 seconds each. A transaction type needs at least 5 transactions in an interval to be considered relevant.
The current average is then compared against the lowest average ever calculated for this transaction type, and an Expansion Factor (XF) is calculated (how many times longer it took than in the fastest interval).
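A minimal sketch of the per-type XF calculation in Python, using two rows from the untuned example further down. The function name, the `min_count` parameter, and returning 0 for a type with too few transactions are assumptions made for illustration (the example tables show 0 in that case); this is not the actual loadmon code.

```python
def expansion_factor(running_avg_us, min_avg_us, trans_count, min_count=5):
    """Return how many times slower the current average is than the best one.

    running_avg_us: current average transaction time in microseconds
    min_avg_us:     lowest average ever recorded for this transaction type
    trans_count:    number of transactions seen in the current window
    min_count:      assumed relevance threshold (the text says at least 5)
    """
    if trans_count < min_count or min_avg_us <= 0:
        # Too few samples (or no baseline yet): report no XF for this type.
        return 0.0
    return running_avg_us / min_avg_us


# CLOSE_DB from the untuned example: 972.6316 us against a 34.3667 us minimum.
print(expansion_factor(972.6316, 34.3667, 38))

# DB_READ_HIST had only one transaction, so it does not contribute.
print(expansion_factor(138.0, 110.8333, 1))
```

The first call yields roughly 28.3, which matches the XF shown for CLOSE_DB in the untuned example.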

Side note: The minimum average transaction times are stored in memory and written to the loadmon.ncf file on shutdown. That is why you need to remove this file while the server is shut down whenever you change the loadmon notes.ini variables.
For that reason my Unix/Linux start script has an option to remove loadmon.ncf at every restart.

The XF per transaction type is then merged into a combined XF for all transaction types, which leads to the SAI via another formula: SAI = 100 * (1 - log(XF) / log(2) / n), where n is the transinfo range (6 by default).
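The formula can be checked against the two examples at the end of this post. The sketch below is a direct transcription into Python; the function name is made up for illustration, and no clamping to the 0..100 range is attempted because the text does not describe any.

```python
import math


def availability_index(xf, transinfo_range=6):
    """SAI = 100 * (1 - log(XF) / log(2) / n), as given in the text.

    xf must be greater than 0; transinfo_range is n (6 by default).
    """
    return 100.0 * (1.0 - math.log(xf) / math.log(2) / transinfo_range)


# Combined ExpansionFactor values from the untuned and tuned examples.
for xf in (15.0937731906097, 1.68301058631922):
    print(round(availability_index(xf)))
```

Rounded to whole numbers this gives 35 and 87, the AvailabilityIndex values reported in the untuned and tuned examples.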

This algorithm works fine as long as your server transactions are never so incredibly fast that they cannot be compared to normal transaction times.
Because the time is measured in microseconds, it can happen, for example at night, that transactions take only around 50 microseconds.

If you compare a 1 millisecond transaction (normal operation) with a 50 microsecond transaction (which happens when the server is totally idle under very special conditions, e.g. on a cluster partner), that is not really a fair comparison: it already results in an XF of 20.
So even if a normal transaction takes only around 1 ms, it already appears slow to the loadmon calculation, even though it is very fast.
Below I added an example of an untuned loadmon with some real-world data from a brand new and fast Domino 8.5 server on current hardware.

So there are two approaches to solve this issue. One is to change the transinfo range to reduce the effect of the difference between normal and extremely fast values.
But this may not be desirable, because it could lead to odd behaviour, or you might not be able to tune it completely.
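To get a feeling for the first approach, the following sketch applies the SAI formula to the 1 ms against 50 microseconds comparison described above and varies the transinfo range. It assumes, for simplicity, that this single ratio is the combined XF; the range values 8 and 10 are arbitrary illustration values, not recommendations.

```python
import math


def sai_for_range(xf, transinfo_range):
    """SAI formula from the text with an explicit transinfo range n."""
    return 100.0 * (1.0 - math.log2(xf) / transinfo_range)


normal_us = 1000.0   # a normal 1 ms transaction
fastest_us = 50.0    # the fastest average ever seen on an idle server
xf = normal_us / fastest_us  # 20.0

for n in (6, 8, 10):
    print(f"range={n}: SAI={sai_for_range(xf, n):.1f}")
# range=6: SAI=28.0
# range=8: SAI=46.0
# range=10: SAI=56.8
```

A larger range softens the effect, but a perfectly healthy server still reports an index far below 100, which illustrates why this approach alone may not tune the problem away.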

The second approach is new in 8.5. You can set the following two new notes.ini parameters:

Server_MinPossibleTransTime=1500
Server_MaxPossibleTransTime=20000000

Those two parameters (in microseconds) let you specify the minimum and maximum values used for individual transactions. This example sets the minimum to 1.5 ms and the maximum to 20 seconds.

Transactions that take less time are counted as 1.5 ms. Transactions that take longer are counted as 20 seconds.
This ensures that individual very short or very long transactions do not have too much impact on the SAI calculation.
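The effect of the two limits can be sketched as a simple clamp. The constants mirror the notes.ini values shown above; the function name is hypothetical, and applying the limits to each individual transaction time before averaging is an assumption about how loadmon uses them internally.

```python
MIN_US = 1500        # Server_MinPossibleTransTime from the example above
MAX_US = 20_000_000  # Server_MaxPossibleTransTime from the example above


def clamp_trans_time(elapsed_us, min_us=MIN_US, max_us=MAX_US):
    """Force a measured transaction time (microseconds) into [min_us, max_us]."""
    return max(min_us, min(elapsed_us, max_us))


idle = clamp_trans_time(50)       # 50 us on an idle server becomes 1500
normal = clamp_trans_time(1000)   # 1 ms in normal operation becomes 1500
print(normal / idle)              # 1.0 instead of 20.0 without the limits

print(clamp_trans_time(45_000_000))  # a 45 s outlier is capped at 20000000
```

With the lower limit in place, the idle-time minimum and the normal-load average end up on the same scale, so the resulting XF stays close to 1.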

The second example below has been generated with 1 ms as the lower boundary and 20 seconds as the upper boundary.

Those two notes.ini parameters are not yet well documented. You will only find them in the public SPR description (of 7.0.4, because 8.5 was a feature release with no SPR list).

Another side note: The following two examples were generated with a tool I developed a long time ago, when 6.0.2 shipped and we needed to debug the SAI.
You can get the values from the server stats when you enable debug_loadmon=1. My trace tool takes a snapshot of the stats at second intervals and writes them into a Notes database so that they can be annotated.


-- Daniel


Untuned Loadmon Example


Times are in microseconds. Values use a decimal comma and a dot as thousands separator.

Transaction            RunningAvgTime  MinAvgTransTime  CurrentTransCount  XF
CLOSE_DB               972,6316        34,3667          38                 28,3015978355199
DB_INFO_GET            63,3333         48,4000          6                  1,30853994490358
DB_READ_HIST           138,0000        110,8333         1                  0
DB_REPLINFO_GET        64,0000         44,8333          7                  1,4275092936803
DB_WRITE_HIST          0,0000          117,6250         0                  0
GET_OBJECT_SIZE        0,0000          65,1667          0                  0
GET_SPECIAL_NOTE_ID    54,0000         46,2951          1                  0
OPEN_DB                5.564,3953      131,8000         43                 42,21847760878
OPEN_NOTE              4.638,4054      114,9697         74                 40,3445910327829
READ_OBJECT            86,2593         62,6250          27                 1,3773933614253
SERVER_AVAILABLE_LITE  56,4167         47,0000          12                 1,20035460992908
NIF_OPEN_NOTE          0,0000          106,6607         0                  0
                                                        7                  116,178463687021

AvailabilityIndex      35
ExpansionFactor        15,0937731906097


Tuned Loadmon Example


Times are in microseconds. Values use a decimal comma and a dot as thousands separator.

Transaction            RunningAvgTime  MinAvgTransTime  CurrentTransCount  XF
CLOSE_DB               1.463,7308      1.000,0000       130                1,4637307692308
DB_INFO_GET            1.000,0000      1.000,0000       5                  0
DB_READ_HIST           1.372,7485      1.000,0000       163                1,3727484662577
DB_REPLINFO_GET        1.000,0000      1.000,0000       74                 1
DB_WRITE_HIST          1.000,0000      1.000,0000       65                 1
GET_OBJECT_SIZE        0,0000          1.000,0000       0                  0
GET_SPECIAL_NOTE_ID    1.000,0000      1.000,0000       15                 1
OPEN_DB                3.754,4135      1.000,0000       133                3,7544135338346
OPEN_NOTE              1.870,0294      1.000,0000       34                 1,8700294117647
READ_OBJECT            1.000,0000      1.000,0000       48                 1
SERVER_AVAILABLE_LITE  1.000,0000      1.000,0000       8                  1
NIF_OPEN_NOTE          0,0000          1.000,0000       0                  0
                                                        9                  13,4609221810878

AvailabilityIndex      87
ExpansionFactor        1,68301058631922




