Raj2796's Blog

February 18, 2010

VMware vSphere PVSCSI white paper and article on interrupt coalescing

Filed under: pvscsi,vmware — raj2796 @ 10:51 am

As a final post on PVSCSI, I’ve stumbled upon the VMware PVSCSI Storage Performance study paper:
vsp_4_pvscsi_perf.pdf

Also the VMware VROOM! page

And lastly, a more detailed explanation of why PVSCSI is not always best:

PVSCSI and Low IO Workloads
Filed under: Uncategorized — Tags: pvscsi, storage, vmkernel — Scott @ 10:46 am

Scott Sauer recently asked me a tough question on Twitter. My roaming best practices talk includes the phrase “do not use PVSCSI for low-IO workloads”. When Scott saw a VMware KB echoing my recommendation, he asked the obvious question: “Why?” It took me a couple of days to get a sufficient answer.

One technique for storage driver efficiency improvements is interrupt coalescing. Coalescing can be thought of as buffering: multiple events are queued for simultaneous processing. For coalescing to improve efficiency, interrupts must stream in fast enough to create large batch requests. Otherwise the timeout window will pass with no additional interrupts arriving. This means the single interrupt is handled as normal but after a useless delay.
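To make the timeout-window argument concrete, here is a minimal Python sketch of a coalescing policy. It is a toy model, not VMware's driver logic: the function name `coalesce`, the 100 µs window, and the batch size of 8 are all assumptions chosen for illustration. An interrupt fires either when the batch is full or when the window expires after the first pending completion.

```python
def coalesce(completion_times_us, window_us, max_batch):
    """Toy model of interrupt coalescing (illustrative assumptions only).

    completion_times_us: sorted IO completion times, in microseconds.
    An interrupt fires when max_batch completions are pending, or when
    window_us has elapsed since the first pending completion.
    Returns (interrupt_count, mean_added_delay_us).
    """
    if not completion_times_us:
        return 0, 0.0

    interrupts = 0
    total_delay = 0
    pending = []

    def fire(at):
        # Deliver one interrupt for everything pending at time `at`.
        nonlocal interrupts, total_delay, pending
        total_delay += sum(at - p for p in pending)
        interrupts += 1
        pending = []

    for t in completion_times_us:
        # The window expired before this completion arrived.
        if pending and t - pending[0] >= window_us:
            fire(pending[0] + window_us)
        pending.append(t)
        # The batch is full: no reason to keep waiting.
        if len(pending) == max_batch:
            fire(t)

    if pending:
        fire(pending[0] + window_us)

    return interrupts, total_delay / len(completion_times_us)


fast = list(range(0, 10_000, 10))        # 1,000 completions, one every 10 us
slow = list(range(0, 1_000_000, 1_000))  # 1,000 completions, one every 1 ms

print(coalesce(fast, window_us=100, max_batch=8))  # (125, 35.0)
print(coalesce(slow, window_us=100, max_batch=8))  # (1000, 100.0)
```

In this model the fast stream needs only 125 interrupts for 1,000 completions, while the slow stream still takes 1,000 interrupts and each one is delayed by the full window. That is the "useless delay" described above.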

An intelligent storage driver will therefore coalesce at high IO but not low IO. In the years we have spent optimizing ESX’s LSI Logic virtual storage adapter, we have fine-tuned the coalescing behavior to give fantastic performance on all workloads. This is done by tracking two key storage counters:

* Outstanding IOs (OIOs): Represents the virtual machine’s demand for IO.
* IOs per second (IOPS): Represents the storage system’s supply of IO.

The robust LSI Logic driver increases coalescing as OIOs and IOPS increase. No coalescing is used with few OIOs or low throughput. This produces efficient IO at large throughput and low latency IO when throughput is small.

Currently the PVSCSI driver coalesces based on OIOs only, and not throughput. This means that when the virtual machine is requesting a lot of IO but the storage is not delivering, the PVSCSI driver is coalescing interrupts. But without the storage supplying a steady stream of IOs there are no interrupts to coalesce. The result is a slightly increased latency with little or no efficiency gain for PVSCSI in low throughput environments.
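The difference between the two policies can be sketched as a pair of decision functions. This is a simplification under stated assumptions: the function names and both thresholds are hypothetical, the real drivers scale coalescing gradually rather than switching it on and off, and the actual values used by ESX are not given in the article.

```python
# Hypothetical thresholds, for illustration only; not real driver values.
OIO_THRESHOLD = 4
IOPS_THRESHOLD = 2000


def coalesces_on_oio_and_iops(oio, iops):
    """LSI Logic style: coalesce only when demand AND supply are high."""
    return oio >= OIO_THRESHOLD and iops >= IOPS_THRESHOLD


def coalesces_on_oio_only(oio):
    """Current PVSCSI style: coalesce whenever demand is high."""
    return oio >= OIO_THRESHOLD


workloads = {
    "idle VM, slow storage": (1, 300),
    "demanding VM, slow storage": (32, 300),
    "demanding VM, fast storage": (32, 20000),
}

for name, (oio, iops) in workloads.items():
    print(name,
          "| OIO+IOPS policy:", coalesces_on_oio_and_iops(oio, iops),
          "| OIO-only policy:", coalesces_on_oio_only(oio))
```

Only the middle case differs: with many outstanding IOs but slow storage, the OIO-only policy coalesces even though there is no stream of completions to batch.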

LSI Logic is so efficient at low throughput levels that there is no need for a special device driver to improve efficiency. The CPU utilization difference between LSI and PVSCSI at hundreds of IOPS is insignificant. But at massive amounts of IO–where 10-50K IOPS are streaming over the virtual SCSI bus–PVSCSI can save a large number of CPU cycles. Because of that, our first implementation of PVSCSI was built on the assumption that customers would only use the technology when they had backed their virtual machines by world-class storage.

But VMware’s marketing engine (me, really) started telling everyone about PVSCSI without the right caveat (“only for massive IO systems!”). So everyone started using it as a general solution. This meant that in one condition–slow storage (low IOPS) with a demanding virtual machine (high OIOs)–PVSCSI has been inefficiently coalescing IOs, resulting in performance slightly worse than LSI Logic.

But now VMware’s customers want PVSCSI as a general solution and not just for high IO workloads. As a result we are including advanced coalescing behavior in PVSCSI for future versions of ESX. More on that when the release vehicle is set.

PVSCSI In A Nutshell

If you plodded through the above technical explanation of interrupt coalescing and PVSCSI I applaud you. If you just want a summary of what to do, here it is:

* For existing products, only use PVSCSI against VMDKs that are backed by fast (greater than 2,000 IOPS) storage.
* If you have installed PVSCSI in low IO environments, do not worry about reconfiguring to LSI Logic. The net loss of performance is very small. And clearly these low IO virtual machines are not running your performance-critical applications.
* For future products*, PVSCSI will be as efficient as LSI Logic for all environments.

(*) Specific product versions not yet announced.
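The first two points of the summary can be written as a small helper. This is only a sketch of the rule of thumb for existing products: the function name `recommend_adapter` and its parameters are hypothetical, and the 2,000 IOPS figure is the one quoted in the list above.

```python
FAST_STORAGE_IOPS = 2000  # rule-of-thumb threshold quoted in the summary


def recommend_adapter(measured_iops, current_adapter=None):
    """Suggest a virtual SCSI adapter for a VMDK (hypothetical helper).

    measured_iops: IOPS the backing storage actually delivers.
    current_adapter: adapter already configured, if any.
    """
    if measured_iops > FAST_STORAGE_IOPS:
        return "PVSCSI"
    if current_adapter == "PVSCSI":
        # Already installed on slow storage: the loss is very small,
        # so reconfiguring is not worth the effort.
        return "PVSCSI"
    return "LSI Logic"


print(recommend_adapter(15000))                          # PVSCSI
print(recommend_adapter(400))                            # LSI Logic
print(recommend_adapter(400, current_adapter="PVSCSI"))  # PVSCSI
```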

The original article can be found here at Scott Drummonds' site. Apologies for spelling your name wrong earlier 😛

2 Comments »

  1. Thanks for the incoming link. But a minor correction: I am Scott Drummonds, not Scott Sauer.

    Comment by Scott — February 18, 2010 @ 4:40 pm | Reply

    • My bad! Sorry about the typo; you’re correctly referenced as the author now 🙂

      Comment by raj2796 — February 18, 2010 @ 4:49 pm | Reply

