Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...

 

Daniel Nashed

 

Linux I/O Performance Tweak

Daniel Nashed  23 April 2012 04:51:09


During a performance troubleshooting session at a customer site we found a very interesting tuning option that can boost your storage performance dramatically.
This setting is really helpful for SAN environments, SSD disks, virtual servers and also physical servers using RAID storage.

Current Linux versions use an I/O scheduler called CFQ (Completely Fair Queuing) by default.
This scheduler is helpful when you have a simple local disk where Linux can optimize the I/O requests by re-ordering them.

But this re-ordering can introduce latency with modern I/O devices.
If you have a caching RAID controller, a SAN or an SSD, the hardware back-end is far more effective at optimizing the I/O.
Most disk sub-systems also include a large amount of cache to optimize the I/O.
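
You can check which scheduler is currently active for a device by reading its queue settings in sysfs. The active scheduler is shown in square brackets (sda below is just an example device name):

cat /sys/block/sda/queue/scheduler

On a typical system the output looks like "noop deadline [cfq]", which means CFQ is currently active.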

Changing the I/O scheduler from "CFQ" to "noop" can dramatically reduce I/O latency.
The "await" in the iostat output shows the time needed for an I/O request from the application to the queue, to the disk and back to the application.
Even if the service time of the device (svctm) is good, the await can be a lot higher because CFQ tries to re-order the I/O requests.

If you switch from "CFQ" to "noop" all I/O requests are sent to the I/O device directly without any additional delay.
You will notice that the await is dramatically reduced.
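
To watch the effect, run the extended iostat output while the server is under load and compare the "await" and "svctm" columns before and after the change (the one second interval is just an example):

iostat -x 1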

We have seen good results with SAN environments and also physical disks.

Here are the results of two tests that we did on a larger physical machine with RAID 10 disks.

First Test

I used 80 threads to read 32000 docs each (80 separate local databases on the server with small documents) 

The result was as follows: 
51 sec with CFQ scheduler 
28 sec with noop scheduler 
19 sec all data in cache. 

Second Test

80 threads creating 2000 docs each 
with noop 42 sec 
with cfq 132 sec 

You can see from the results that there is a dramatic difference in performance between CFQ and noop. 
The server had a lot of RAM and repeating the test did not show any read I/O. 
So all data was served from cache in this case! That means the 19 seconds is without physical disk I/O, just the processing time in the back-end. 

Side note for File-System Cache

Even in larger environments caching can affect the results. So for all tests besides the last one we flushed the file-system cache. 

Here is the cache flush command on Linux 
echo 3 > /proc/sys/vm/drop_caches 

So if you do your own testing you should flush the file-system cache between runs to get comparable results, as shown in the sketch below.
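
A minimal sketch of such a test run could look like this. The device name (sda) and the workload script (run_workload.sh) are placeholders for your own environment and benchmark:

# flush the file-system cache so the run starts cold
sync
echo 3 > /proc/sys/vm/drop_caches

# select the scheduler to test (noop or cfq)
echo noop > /sys/block/sda/queue/scheduler

# time your own workload and watch "iostat -x 1" in a second terminal
time ./run_workload.sh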


Here is a short extract from an older IBM Linux Redbook that gives a great summary of the background:

Select the right I/O elevator in kernel 2.6 
For most server workloads, the complete fair queuing (CFQ) elevator is an adequate choice 
as it is optimized for the multiuser, multiprocess environment a typical server operates in. 
However, certain environments can benefit from a different I/O elevator.

Intelligent disk subsystems 
Benchmarks have shown that the NOOP elevator is an interesting alternative in high-end 
server environments. When using IBM ServeRAID or TotalStorage DS class disk 
subsystems, the lack of ordering capability of the NOOP elevator becomes its strength. 
Intelligent disk subsystems such as IBM ServeRAID and TotalStorage DS class disk 
subsystems feature their own I/O ordering capabilities. Enterprise class disk subsystems 
may contain multiple SCSI or FibreChannel disks that each have individual disk heads and 
data striped across the disks. It would be very difficult for an operating system to anticipate 
the I/O characteristics of such complex subsystems correctly, so you might often observe 
at least equal performance at less overhead when using the NOOP I/O elevator.

Virtual machines 
Virtual machines, regardless of whether in VMware or VM for zSeries®, may only 
communicate through the virtualization layer with the underlying hardware. Hence a virtual 
machine is not aware of the fact if the assigned disk device consists of a single SCSI 
device or an array of FibreChannel disks on a TotalStorage DS8000. The virtualization 
layer takes care of necessary I/O reordering and the communication with the physical 
block devices. Therefore, we recommend using the NOOP elevator for virtual machines to 
ensure minimal processor overhead.


How do you change the settings?

You can change the setting dynamically for testing with this manual command (example for the hda device):

echo noop > /sys/block/hda/queue/scheduler
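
If you want to apply it to all disks of a server for testing, a small loop over sysfs works as well. This is just a sketch and the sd* pattern may need to be adjusted to your device naming (hd*, dm-*, ...):

# set noop for all sd* devices and show the result
for dev in /sys/block/sd*/queue/scheduler; do
  echo noop > "$dev"
  echo "$dev: $(cat "$dev")"
done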

If you want to permanently change the scheduler for all block devices you can set it as a boot option.

Edit /boot/grub/grub.conf and add elevator=noop to the kernel line.
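
For example the kernel line could look like this afterwards (the kernel version and root device are only placeholders, your existing line will differ):

kernel /vmlinuz-2.6.32 ro root=/dev/sda1 elevator=noop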


Conclusion

You should really look into this setting and try it in your environment.
Changing the setting dynamically in a production environment does not cause any issues, and you can see with "iostat -x 1" how the "await" changes.

I am interested in your feedback from your environments.

-- Daniel

Comments

1 Simon  25.04.2012 1:47:29  Linux I/O Performance Tweak

Very interesting with regards to Linux VMs in general, I use them a lot. Will have a play. Thanks!

2 xose  08.05.2012 11:36:59  Linux I/O Performance Tweak

how about deadline ?

3 Daniel Nashed  08.05.2012 13:30:43  Linux I/O Performance Tweak

@xose, deadline is better than cfq but best would be noop.

the system directly sends the request to the hardware and the hardware does the optimization.

to reduce the latency you should switch to noop.

-- Daniel

4 Ninke Westra  04.02.2013 17:09:38  Linux I/O Performance Tweak

Is there an equivalent for Windows, or is this not an issue?

5 David Fry  21.03.2013 12:50:23  Linux I/O Performance Tweak

How did you put workload on the disk? What command did you use? (I would like to be able to run the command to compare noop against cfq).

Thanks

6 Michelangelo  13.05.2013 15:10:49  Linux I/O Performance Tweak

Daniel,

on my systems (SuSe EL), this kernel parameter is not configured. Do you know if there is a default or if it is equivalent to noop?

7 Greg  11.02.2014 11:01:35  Linux I/O Performance Tweak

Hello Daniel,

I know this is an old post but your tests are absolutely correct. If you have Active-Active storage, please ALWAYS check and use the ALUA multipathing policy, never Round Robin or MRU. If you have no possibility to set up ALUA, or you do not have Active-Active storage, please use Fixed.

Great article.

Cheers

8 Ed H  27.10.2014 8:17:37  Linux I/O Performance Tweak

I have seen this elevator=noop pay off for many SLES 11 Linux installs under VMware.

I also wonder, like Ninke Westra, if there is an equivalent setting in the Windows Server OS?

9 Daniel Nashed  29.10.2014 18:49:47  Linux I/O Performance Tweak

Hi Ed,

on the Windows side we don't have a setting like this.

This is only relevant on Linux.

-- Daniel

10 Bill Z  31.01.2015 6:33:41  Linux I/O Performance Tweak

For a server, many people use it at the same time. Someone said CFQ can make disk bandwidth shared among the different processes, so I think CFQ for a server still makes sense.

11 Daniel Nashed  09.02.2015 10:20:23  Linux I/O Performance Tweak

@Bill, no, IMHO CFQ makes no sense in any modern server or workstation environment.

Intelligent disks can handle the I/O much better than the queue re-ordering of an OS that tries to guess the disk layout.

Sending the requests directly to the disk is the best solution, also for physical disks.

Even if it is not a RAID, the disk still has a cache that might already make operations faster.

And CFQ cannot help for SSDs, RAID disks and in any virtualized environment.

VMware recommends "noop" in some tuning papers as well.

Holding back I/O requests causes delays in most cases. The disk sub-system can better handle serving those requests in almost any case.

-- Daniel

12 Vlastimil Zach  25.04.2016 13:05:28  Linux I/O Performance Tweak

On the Windows side we have the Subsystem Device Driver Device Specific Module (SDDDSM) for each storage system, which includes:

- Enhanced data availability

- Dynamic I/O load-balancing across multiple paths

- Automatic path failover protection

- Concurrent download of licensed internal code

- Path-selection policies for the host system
