Linux I/O Performance Tweak
During a performance troubleshooting session at a customer, we found a very interesting tuning option that can boost your storage performance dramatically.
This setting is really helpful for SAN environments, SSD disks, virtual servers, and also physical servers using RAID storage.
Current Linux versions use an I/O scheduler called CFQ (Completely Fair Queuing) by default.
This scheduler is helpful when you have a simple local disk, where Linux can optimize the I/O requests by re-ordering them.
But it can introduce latency with modern I/O devices.
If you have a caching RAID controller, a SAN, or an SSD, the hardware back-end is far more effective at optimizing the I/O.
Most disk sub-systems also include a large amount of cache to optimize the I/O.
Changing the I/O scheduler from "CFQ" to "noop" can dramatically reduce I/O latency.
The "await" in the iostat output shows the time needed for an I/O request from the application to the queue, to the disk and back to the application.
Even if the service time of the device is good (svctm) the await could be a lot higher because of CFQ that tries to re-order the I/O requests.
If you switch from "CFQ" to "noop" all I/O requests are send to the I/O device directly without any additional delay.
You will notice that the await is dramatically reduced.
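If you want to watch this on your own machine, a minimal check is to run iostat in a second terminal while your workload is active. This assumes the sysstat package that provides iostat is installed and that your device shows up as sda:

# extended device statistics, refreshed every second;
# compare the "await" and "svctm" columns for the sda line
iostat -x 1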
We have seen good results with SAN environments and also physical disks.
Here is an example of two tests that we did on a larger physical machine with RAID 10 disks.
First Test
I used 80 threads to read 32,000 docs each (80 separate local databases on the server with small documents).
The result was as follows:
51 sec with the CFQ scheduler
28 sec with the noop scheduler
19 sec with all data in cache
Second Test
80 threads creating 2,000 docs each:
42 sec with the noop scheduler
132 sec with the CFQ scheduler
You can see from the results that there is a dramatic difference in performance between CFQ and noop.
The server had a lot of RAM, and repeating the test did not show any read I/O.
So all data was served from cache in this case! That means the 19 seconds is without any physical disk I/O, just the processing time in the back-end.
Side Note on the File-System Cache
Even in a larger environment, caching can affect the results. So for all tests besides the last one, we flushed the file-system cache.
Here is the cache flush command on Linux:
echo 3 > /proc/sys/vm/drop_caches
So if you do your own testing, you should flush the file-system cache before each test run to get comparable results (see the sketch below).
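The tests above used local Domino databases, so they are not easy to reproduce as-is. As a rough substitute, here is a minimal sketch of a comparison run, assuming the fio benchmark tool is installed; the device name sda and the directory /local/testdata are placeholders you have to adjust, and the script has to run as root:

#!/bin/sh
# Hypothetical comparison script - not the exact workload used for the tests above.
DEV=sda                 # block device under test (placeholder)
DIR=/local/testdata     # directory on that device (placeholder)

for SCHED in cfq noop; do
    echo $SCHED > /sys/block/$DEV/queue/scheduler
    # flush the file-system cache so both runs start cold
    sync
    echo 3 > /proc/sys/vm/drop_caches
    echo "Running with $SCHED scheduler:"
    fio --name=randread --directory=$DIR --rw=randread --bs=4k \
        --size=1g --numjobs=8 --group_reporting
done

The numbers fio reports (bandwidth, IOPS, and latency) give you a picture comparable to the await values discussed above.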
Here is a short extract from an older IBM Linux Redbook that gives a great summary of the background:
Select the right I/O elevator in kernel 2.6
For most server workloads, the complete fair queuing (CFQ) elevator is an adequate choice
as it is optimized for the multiuser, multiprocess environment a typical server operates in.
However, certain environments can benefit from a different I/O elevator.
Intelligent disk subsystems
Benchmarks have shown that the NOOP elevator is an interesting alternative in high-end
server environments. When using IBM ServeRAID or TotalStorage DS class disk
subsystems, the lack of ordering capability of the NOOP elevator becomes its strength.
Intelligent disk subsystems such as IBM ServeRAID and TotalStorage DS class disk
subsystems feature their own I/O ordering capabilities. Enterprise class disk subsystems
may contain multiple SCSI or FibreChannel disks that each have individual disk heads and
data striped across the disks. It would be very difficult for an operating system to anticipate
the I/O characteristics of such complex subsystems correctly, so you might often observe
at least equal performance at less overhead when using the NOOP I/O elevator.
Virtual machines
Virtual machines, regardless of whether in VMware or VM for zSeries®, may only
communicate through the virtualization layer with the underlying hardware. Hence a virtual
machine is not aware of the fact if the assigned disk device consists of a single SCSI
device or an array of FibreChannel disks on a TotalStorage DS8000. The virtualization
layer takes care of necessary I/O reordering and the communication with the physical
block devices. Therefore, we recommend using the NOOP elevator for virtual machines to
ensure minimal processor overhead.
How do you change the settings?
You can change the setting dynamically for testing with this manual command (example for the hda device):
# echo noop > /sys/block/hda/queue/scheduler
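To verify the change, read the same sysfs file back; the active scheduler is shown in square brackets. The exact list of available schedulers depends on your kernel, so treat the output below only as an illustration:

# cat /sys/block/hda/queue/scheduler
[noop] anticipatory deadline cfq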
If you want to permanently change the scheduler for all block devices, you can set it as a boot option.
Edit /boot/grub/grub.conf and add elevator=noop to the kernel line.
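For example, a legacy GRUB kernel line with the option added would look roughly like this (the kernel version and root device are placeholders, not your actual configuration):

# /boot/grub/grub.conf (excerpt, placeholder values)
title Linux
    root (hd0,0)
    kernel /vmlinuz-<version> ro root=/dev/sda2 elevator=noop
    initrd /initrd-<version>.img

On newer distributions that boot with GRUB 2, the kernel parameters are maintained in /etc/default/grub (GRUB_CMDLINE_LINUX) instead, and you need to regenerate the GRUB configuration after editing that file.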
Conclusion
You should really look into this setting and try it in your environment.
Changing the setting dynamically in a production environment does not cause any issues, and you can see with "iostat -x 1" how the "await" changes.
I am interested in your feedback from your environments.
-- Daniel
Comments [13]
1. Simon - 25.04.2012 1:47:29
Very interesting with regards to Linux VMs in general, I use them a lot. Will have a play. Thanks!
2. xose - 08.05.2012 11:36:59
How about deadline?
3. Daniel Nashed - 08.05.2012 13:30:43
@xose, deadline is better than CFQ, but the best would be noop.
The system sends the requests directly to the hardware, and the hardware does the optimization.
To reduce latency you should switch to noop.
-- Daniel
4. Ninke Westra - 04.02.2013 17:09:38
Is there an equivalent for Windows, or is this not an issue?
5. David Fry - 21.03.2013 12:50:23
How did you put workload on the disk? What command did you use? (I would like to be able to run the command to compare noop against cfq).
Thanks
6. Michelangelo - 13.05.2013 15:10:49
Daniel,
on my systems (SuSe EL), this kernel parameter is not configured. Do you know if there is a default or if it is equivalent to noop?
7. Greg - 11.02.2014 11:01:35
Hello Daniel,
I know this is an old post, but your tests are absolutely correct. If you have Active-Active storage, please ALWAYS check and use the ALUA multipathing policy, never Round Robin or MRU. If you have no possibility to set up ALUA, or you do not have Active-Active storage, please use Fixed.
Great article.
Cheers
8. Ed H - 27.10.2014 8:17:37
I have seen this elevator=noop setting pay off for many SLES 11 Linux installs under VMware.
I also wonder, like Ninke Westra, if there is an equivalent setting in the Windows Server OS?
9. Daniel Nashed - 29.10.2014 18:49:47
Hi Ed,
on the Windows side we don't have a setting like this.
This is only relevant on Linux.
-- Daniel
10. Bill Z - 31.01.2015 6:33:41
For a server, many people use it at the same time. Someone said CFQ can share the disk bandwidth among the different processes, so I think CFQ still makes sense for a server.
11. Daniel Nashed - 09.02.2015 10:20:23
@Bill, no, IMHO CFQ does not make sense in any modern server or workstation environment.
Intelligent disks can handle the I/O much better than the queue re-ordering of an OS that tries to guess the disk layout.
Sending the requests directly to the disk is the best solution, also for physical disks.
Even if it is not a RAID, the disk still has a cache that might already make operations faster.
And CFQ cannot help with SSDs, RAID disks, or in any virtualized environment.
VMware recommends "noop" in some tuning papers as well.
Holding back I/O requests causes delays in most cases. The disk sub-system can serve those requests better in almost any case.
-- Daniel
12. Vlastimil Zach - 25.04.2016 13:05:28
On the Windows side we have the Subsystem Device Driver Device Specific Module (SDDDSM) for each storage system, which includes these modules:
- Enhanced data availability
- Dynamic I/O load-balancing across multiple paths
- Automatic path failover protection
- Concurrent download of licensed internal code
- Path-selection policies for the host system
13. Torsten Link - 07.11.2024 9:04:13
Are these settings still relevant these days? One of my customers experiences very high Wait I/O values on all of their Domino servers running on Linux and sometimes this seems to really impact the performance and availability of the servers.
Do you have any update on this for current versions of Linux (e.g. SLES 15FP6 with Domino 14)?