Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...

Linux I/O Performance Tweak

Daniel Nashed  23 April 2012 06:51:09


During a performance troubleshooting session at a customer site, we found a very interesting tuning option that can boost your storage performance dramatically.
This setting is really helpful for SAN environments, SSD disks, virtual servers, and also physical servers using RAID storage.

Current Linux versions use an I/O scheduler called CFQ (Completely Fair Queuing) by default.
This scheduler is helpful for simple local disks, where Linux can optimize the I/O by re-ordering the requests.

But this re-ordering can introduce latency with modern I/O devices.
If you have a caching RAID controller, a SAN, or SSD disks, the hardware back end is far more effective at optimizing the I/O.
Most disk subsystems also include a large amount of cache to optimize the I/O.

Changing the I/O scheduler from "CFQ" to "noop" can dramatically reduce I/O latency.
The "await" column in the "iostat -x" output shows the average time in milliseconds for an I/O request to be served, including both the time it waits in the queue and the time the disk needs to service it.
Even if the service time of the device (svctm) is good, the await can be a lot higher because CFQ queues the requests in order to re-order them.
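
To see how much of the await is queueing delay rather than device service time, the two iostat columns can be compared directly. The following is a minimal sketch, assuming a sysstat version whose "iostat -x" output still contains both an await and an svctm column (newer releases dropped svctm) and a locale that uses "." as the decimal separator. The function name queue_delay is hypothetical. It locates the columns by header name, because the column order differs between sysstat versions.

```shell
#!/bin/sh
# queue_delay (hypothetical helper): reads "iostat -x" output on stdin and
# prints, per device line, await, svctm and their difference. The
# difference approximates the time a request spent waiting in the queue.
# Assumes the output contains both an "await" and an "svctm" column.
queue_delay() {
  awk '
    # Header line ("Device:" in older, "Device" in newer sysstat):
    # remember which fields hold await and svctm.
    $1 ~ /^Device/ {
      a = 0; s = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "await") a = i
        if ($i == "svctm") s = i
      }
      next
    }
    # Only lines with enough fields to contain both columns are reported;
    # this skips the shorter avg-cpu lines and blank lines between reports.
    a && s && NF >= a && NF >= s {
      printf "%s await=%s svctm=%s queue=%.2f\n", $1, $a, $s, $a - $s
    }
  '
}
```

For example, LC_ALL=C iostat -x 1 5 | queue_delay prints one line per device and interval; keep in mind that the first iostat report shows averages since boot.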

If you switch from "CFQ" to "noop", I/O requests are passed to the device in the order they arrive (noop only merges adjacent requests), without any additional delay for re-ordering.
You will notice that the await is dramatically reduced.

We have seen good results in SAN environments and also with physical disks.

Here are two tests that we ran on a large physical machine with RAID 10 disks.

First Test

I used 80 threads to read 32,000 documents each (80 separate local databases on the server, containing small documents).

The results were as follows:
51 sec with the CFQ scheduler
28 sec with the noop scheduler
19 sec with all data in cache

Second Test

80 threads creating 2,000 documents each:
42 sec with the noop scheduler
132 sec with the CFQ scheduler

The results show a dramatic difference in performance between CFQ and noop: with CFQ, the read test took about 1.8 times as long (51 vs. 28 sec) and the create test about 3.1 times as long (132 vs. 42 sec).
The server had a lot of RAM, and repeating the read test did not show any read I/O.
So in this case all data was served from cache! That means the 19 seconds were measured without any physical disk I/O; they are just the processing time in the back end.

Side Note on the File-System Cache

Even in larger environments, caching can affect the results. So for all tests except the last one, we flushed the file-system cache.

Here is the command to flush the cache on Linux (run as root; the sync first writes dirty pages to disk so that they can be dropped as well):
sync
echo 3 > /proc/sys/vm/drop_caches

So if you do your own testing, you should flush the cache before each run as well.
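
For repeatable comparisons, each run can select the scheduler and flush the cache before the workload starts. The following is a minimal sketch of such a run; run_test and the two override variables SYSFS_BLOCK and DROP_CACHES are hypothetical names, and the command you pass in stands for your own test workload. It reports whole seconds, which matches the granularity of the results above.

```shell
#!/bin/sh
# run_test (hypothetical helper): time one test run under a given scheduler.
# Usage: run_test DEVICE SCHEDULER COMMAND [ARGS...]
# Needs root when the default sysfs/procfs paths are used. The two path
# variables (hypothetical) can be overridden, e.g. to try it on plain files.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}

run_test() {
  dev=$1
  sched=$2
  shift 2
  # Select the I/O scheduler for this device.
  echo "$sched" > "$SYSFS_BLOCK/$dev/queue/scheduler" || return 1
  # Write dirty pages to disk first so that they can be dropped as well,
  # then drop the page cache, dentries and inodes.
  sync
  echo 3 > "$DROP_CACHES" || return 1
  # Run the workload and report the elapsed wall-clock time in seconds.
  start=$(date +%s)
  "$@" || return 1
  end=$(date +%s)
  echo "$sched: $((end - start)) sec"
}
```

For example, run_test sda cfq ./my_test followed by run_test sda noop ./my_test, where ./my_test is a placeholder for your test program.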


Here is a short extract from an older IBM Linux Redbook that gives a great summary of the background:

Select the right I/O elevator in kernel 2.6
For most server workloads, the complete fair queuing (CFQ) elevator is an adequate choice as it is optimized for the multiuser, multiprocess environment a typical server operates in. However, certain environments can benefit from a different I/O elevator.

Intelligent disk subsystems
Benchmarks have shown that the NOOP elevator is an interesting alternative in high-end server environments. When using IBM ServeRAID or TotalStorage DS class disk subsystems, the lack of ordering capability of the NOOP elevator becomes its strength. Intelligent disk subsystems such as IBM ServeRAID and TotalStorage DS class disk subsystems feature their own I/O ordering capabilities. Enterprise class disk subsystems may contain multiple SCSI or FibreChannel disks that each have individual disk heads and data striped across the disks. It would be very difficult for an operating system to anticipate the I/O characteristics of such complex subsystems correctly, so you might often observe at least equal performance at less overhead when using the NOOP I/O elevator.

Virtual machines
Virtual machines, regardless of whether in VMware or VM for zSeries®, may only communicate through the virtualization layer with the underlying hardware. Hence a virtual machine is not aware of the fact if the assigned disk device consists of a single SCSI device or an array of FibreChannel disks on a TotalStorage DS8000. The virtualization layer takes care of necessary I/O reordering and the communication with the physical block devices. Therefore, we recommend using the NOOP elevator for virtual machines to ensure minimal processor overhead.


How do you change the settings?

You can change the setting dynamically for testing with the following command, run as root (example for device hda; replace it with your own device name, such as sda):

echo noop > /sys/block/hda/queue/scheduler
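
To verify the change, read the same sysfs file: the kernel lists the available schedulers there and marks the active one in brackets, for example "noop deadline [cfq]". The following minimal sketch prints the active scheduler for every block device; active_scheduler, show_schedulers, and the SYSFS_BLOCK override are hypothetical names.

```shell
#!/bin/sh
# active_scheduler (hypothetical helper): reads a queue/scheduler line such
# as "noop deadline [cfq]" on stdin and prints the bracketed (active) name.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# show_schedulers (hypothetical helper): print "<device> <active scheduler>"
# for every block device. SYSFS_BLOCK is an assumed override for testing;
# it defaults to /sys/block.
show_schedulers() {
  for f in "${SYSFS_BLOCK:-/sys/block}"/*/queue/scheduler; do
    # Skip the unexpanded pattern when no device matches.
    [ -r "$f" ] || continue
    dev=${f%/queue/scheduler}
    dev=${dev##*/}
    echo "$dev $(active_scheduler < "$f")"
  done
}
```

Running show_schedulers before and after the echo command should show the device switching from cfq to noop.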

If you want to change the scheduler permanently for all block devices, you can set it as a kernel boot option.

Edit /boot/grub/grub.conf and append elevator=noop to the kernel line. The new setting takes effect after the next reboot.
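
Because the boot option only takes effect after a reboot, it is worth confirming that the running kernel received it. The kernel exposes its command line in /proc/cmdline; the minimal sketch below extracts the elevator= value from it. The function name boot_elevator is hypothetical, and the optional argument only exists to make the function easy to try out.

```shell
#!/bin/sh
# boot_elevator (hypothetical helper): print the value of the elevator=
# parameter from the kernel command line, or "not set" if it is missing.
# Reads /proc/cmdline unless a command line string is passed as $1.
boot_elevator() {
  cmdline=${1:-$(cat /proc/cmdline)}
  # Split the command line into words and look for elevator=<name>.
  for arg in $cmdline; do
    case $arg in
      elevator=*) echo "${arg#elevator=}"; return 0 ;;
    esac
  done
  echo "not set"
  return 1
}
```

After the reboot, boot_elevator should print noop; the per-device scheduler files under /sys/block show which scheduler each device actually uses.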


Conclusion

You should really look into this setting and try it in your environment.
Changing the setting dynamically in a production environment does not cause any issues, and you can watch with "iostat -x 1" how the "await" values change.

I am interested in feedback from your environments.

-- Daniel