Sometimes the performance of physical servers, PCs and laptops slows to a crawl. No matter what you do, it takes half an eternity to open some files. The slowdown is tied to the architecture of the Windows operating system. The OS becomes progressively slower the longer it is used and the more it is burdened with added software and large volumes of data.

In the old days, the solution was easy – defragment the hard drive. However, many production servers can’t be taken offline to defragment, and many laptops only have solid state drives (SSDs) that don’t submit to defragmentation. So is there any hope?

Condusiv has solved these dilemmas in the soon-to-be-released version of Diskeeper®. With over 100 million licenses sold, Diskeeper has been the undisputed leader for decades when it comes to keeping Windows systems fragment free and performing well. And with Diskeeper 16 coming out soon, feedback from beta testers is that it goes way beyond a mere incremental release with a few added bells and whistles. Instead, the consensus among them is that it is a “next generation” release that does more than keep Windows systems running like new: it actually makes them perform faster than new.

How is this being achieved? The company had been perfecting two technologies within its portfolio and is now bringing them together – fragmentation prevention and DRAM caching.

On the one side, the idea is that you prevent fragmentation before data is written to a production server. This is a lifesaver for IT administrators who need to immediately boost the performance of critical applications like MS-SQL running on physical servers. Diskeeper keeps systems running optimally with its patented fragmentation prevention engine that ensures large, clean, contiguous writes from Windows, eliminating the small, tiny writes that rob performance with “death by a thousand cuts” by inflating IOPS and stealing throughput.

But that’s only the half of it. A little known fact about Condusiv is that it is also a world leader in caching. In addition to its incredible work on Diskeeper, the Condusiv development team has evolved a unique DRAM caching approach that has been implemented via OEM partners for several years. So popular has this technology become that the company has sold over 5 million caching licenses tied to ultrabooks, and it is now being made available commercially.

Soon to be released Diskeeper 16’s DRAM caching electrifies performance:

Dave Lewis sent in a question, “There is such a quandary about disk fragmentation in the VMware environment. One says defrag and another says never. Who's right? This has been a hard subject to track and define.”

I’m going to debunk “defragging” in a minute, but if you read VMware’s own best practice guide on improving performance (found here), page 17 reveals “adding more memory” as the top recommendation while the second most important recommendation is to “defrag all guest machines.”

VMware is clearly aware that fragmentation impacts performance, but the real question is how relevant defragging is in today’s environment, with sophisticated storage services and new media like flash that should never be defragged. First of all, no storage administrator would defrag an entire “live” disk volume without the tedious task of taking it offline, due to the impact that change block activity has on services like replication and thin provisioning, which means the problem goes ignored on HDD-based storage systems. Second, organizations that utilize flash can do nothing about the write amplification issues from fragmentation or the resulting slow write performance from a surplus of small, fractured writes.

The beauty behind V-locity® I/O reduction software in a virtual environment is that fragmentation is never an issue because V-locity optimizes the I/O stream at the point of origin to ensure Windows executes writes in the most optimum manner possible. This means large, contiguous, sequential writes to the backend storage for every write and subsequent read. This boosts the performance of both HDD and SSD systems. As much as flash performs well with random reads, it chokes badly on random writes. A typical SSD might spec random reads at 300,000 IOPS but drop to 23,000 IOPS when it comes to writes due to erase cycles and housekeeping that goes into every write. This is why some organizations continue to use spindles for write heavy apps that are sequential in nature.

When most people think of fragmentation, they think in terms of it being a physical layer issue on a mechanical disk. However, in an enterprise environment, Windows is abstracted from the physical layer. The real problem is an IOPS inflation issue where the relationship between I/O and data breaks down and there ends up being a surplus of small, tiny I/O that chews up performance no matter what storage media is used on the backend. Instead of utilizing a single I/O to process a 64K file, Windows will break that down into smaller and smaller chunks, with each chunk requiring its own I/O operation to process.
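To make the IOPS inflation arithmetic concrete, here is a minimal sketch. It assumes, purely for illustration, that a file is split into equal-sized fragments and that each fragment costs exactly one I/O operation; the fragment sizes and the `io_count` helper are hypothetical and are not taken from any Condusiv or Windows API.

```python
import math

def io_count(file_bytes: int, fragment_bytes: int) -> int:
    """I/O operations needed if every contiguous fragment costs one I/O."""
    return math.ceil(file_bytes / fragment_bytes)

FILE_SIZE = 64 * 1024  # the 64K file used as the example in the text

# Hypothetical fragment sizes, chosen only to show the trend.
for frag_kb in (64, 16, 4):
    ops = io_count(FILE_SIZE, frag_kb * 1024)
    print(f"{frag_kb:>2} KB fragments -> {ops} I/O operation(s)")
```

Under these assumptions, the same 64K of data costs 1 I/O when written contiguously, 4 I/Os in 16 KB pieces, and 16 I/Os in 4 KB pieces. The data has not grown; only the operation count has.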

This is bad enough if one virtual server is being taxed by Windows write inefficiencies and sending down twice as many I/O requests as it should to process any given workload…now amplify that same problem happening across all the VMs on the same host and there ends up being a tsunami of unnecessary I/O overwhelming the host and underlying storage subsystem.

As much as virtualization has been great for server efficiency, the one downside is how it adds complexity to the data path. This means I/O characteristics from Windows that are much smaller, more fractured, and more random than they need to be. As a result, performance suffers “death by a thousand cuts” from all this small, tiny I/O that gets subsequently randomized at the hypervisor.
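The randomizing effect at the hypervisor can be illustrated with a toy model. The sketch below assumes four VMs that each issue perfectly sequential block addresses in their own region of the disk, and it uses strict round-robin interleaving as a stand-in for the hypervisor. Real hypervisors schedule I/O in more complex ways, so treat the names and numbers here as hypothetical.

```python
from itertools import chain, zip_longest

def sequential_fraction(lbas):
    """Fraction of requests whose block address directly follows the previous one."""
    if len(lbas) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur == prev + 1)
    return hits / (len(lbas) - 1)

def vm_stream(start, length):
    """One VM writing `length` consecutive blocks starting at `start`."""
    return list(range(start, start + length))

def blend(streams):
    """Round-robin interleave of all VM streams (simplified hypervisor model)."""
    merged = chain.from_iterable(zip_longest(*streams))
    return [lba for lba in merged if lba is not None]

# Four hypothetical VMs, each sequential within its own disk region.
streams = [vm_stream(vm * 1_000_000, 1000) for vm in range(4)]
blended = blend(streams)

print("per-VM sequential fraction:", sequential_fraction(streams[0]))
print("blended sequential fraction:", sequential_fraction(blended))
```

In this model each VM is fully sequential on its own, yet no two adjacent requests in the blended stream are sequential. That is the pattern the storage backend actually sees.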

So instead of taking VMware’s recommendation to “defrag,” take our recommendation to never worry about the issue again and put an end to all the small, split I/Os that are hurting performance the most.

After having chatted with 50+ customers over the last three months, I’ve heard the same five questions enough times to turn them into a blog entry, and a lot of them have to do with flash:

1. Do Condusiv products still “defrag” like in the old days of Diskeeper?

No. Although users can use Diskeeper to manually defrag if they so choose, the core engines in Diskeeper and V-locity have nothing to do with defragmentation or physical disk management. The patented IntelliWrite® engine inside Diskeeper and V-locity adds a layer of intelligence into the Windows operating system, enabling it to improve the sequential nature of I/O traffic with large contiguous writes and subsequent reads, which benefits performance on both SSDs and HDDs. Since I/O is being streamlined at the point of origin, fragmentation is prevented from ever becoming an issue in the first place. Although SSDs should never be “defragged,” fragmentation prevention has enormous benefits. This means processing a single I/O to read or write a 64KB file instead of needing several I/Os. This alleviates IOPS inflation of workloads to SSDs and cuts down on the number of erase cycles required to write any given file, improving write performance and extending flash reliability.

2. Why is it more important to solve Windows write inefficiencies in virtual environments regardless of flash or spindles on the backend?

Windows write inefficiencies are a problem in physical environments but an even bigger problem in virtual environments, because multiple instances of the OS are sitting on the same host, creating a bottleneck or choke point that all I/O must funnel through. It’s bad enough if one virtual server is being taxed by Windows write inefficiencies and sending down twice as many I/O requests as it should to process any given workload…now amplify that same problem happening across all the VMs on the same host and there ends up being a tsunami of unnecessary I/O overwhelming the host and underlying storage subsystem. The performance penalty of all of this unnecessary I/O is further exacerbated by the “I/O Blender” that mixes and randomizes the I/O streams from all the VMs at the hypervisor before sending a very random pattern out to storage, the exact type of pattern that chokes flash performance the most: random writes. V-locity’s IntelliWrite® engine writes files in a contiguous manner, which significantly reduces the amount of I/O required to write or read any given file. In addition, IntelliMemory® caches reads from available DRAM. With both engines reducing I/O to storage, the usual 80K I/O that storage needs to process 1GB drops to 60K I/O at a minimum, and often to 50K or 40K I/O. This is why the typical V-locity customer sees anywhere from 50-100% more throughput regardless of flash or spindles on the backend: all the optimization is occurring where I/O originates.
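The I/O counts quoted above translate into average I/O sizes and percentage reductions as follows. This is a back-of-the-envelope sketch only: it assumes 1GB means 2^30 bytes and that every I/O carries the same amount of data, and the helper names are hypothetical.

```python
GIB = 1024 ** 3  # assumption: "1GB" in the text is treated as 2**30 bytes

def avg_io_size_kib(total_bytes: int, io_count: int) -> float:
    """Average payload per I/O, in KiB, if the data is spread evenly."""
    return total_bytes / io_count / 1024

def reduction_pct(baseline: int, optimized: int) -> float:
    """Percentage drop in I/O count relative to the baseline."""
    return (baseline - optimized) / baseline * 100

BASELINE = 80_000  # I/Os per GB quoted in the text as the usual requirement

for optimized in (60_000, 50_000, 40_000):
    print(
        f"{optimized:>6} I/Os: "
        f"{reduction_pct(BASELINE, optimized):.1f}% fewer I/Os, "
        f"avg size {avg_io_size_kib(GIB, optimized):.1f} KiB "
        f"(baseline {avg_io_size_kib(GIB, BASELINE):.1f} KiB)"
    )
```

Under these assumptions, going from 80K to 60K, 50K, or 40K I/Os per GB corresponds to 25%, 37.5%, and 50% fewer operations, with the average I/O growing from roughly 13 KiB to roughly 17, 21, and 26 KiB respectively.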

VMware’s own “vSphere Monitoring and Performance Guide” calls for “defragmentation of the file system on all guests” as its top performance best-practice tip after adding more memory. When it comes to V-locity, nothing ever has to be “defragged” since fragmentation is prevented from ever becoming a problem in the first place.

3. How does V-locity help with flash storage?

One of the most common misconceptions is that V-locity is the perfect complement to spindles, but not for flash. That misconception couldn’t be further from the truth. The fact is, most V-locity customers run V-locity on top of a hybrid (flash & spindles) array or all-flash array. And this is because without V-locity, the underlying storage subsystem has to process at least 35% more I/O than necessary to process any given workload.

As much as virtualization has been great for server efficiency, the one downside is the complexity introduced to the data path, resulting in I/O characteristics that are much smaller, more fractured, and more random than they need to be. This means flash storage systems are processing workloads 30-50% slower than they should because performance is suffering death by a thousand cuts from all this small, tiny, random I/O that inflates IOPS and chews up throughput. V-locity streamlines I/O to be much more efficient, so twice as much data can be carried with each I/O operation. This significantly improves flash write performance and extends flash reliability with reduced erase cycles. In addition, V-locity establishes a tier-0 caching strategy using idle, available DRAM to cache reads. As little as 3GB of available memory drives an average of 40% reduction in response time (see source). By optimizing writes and reads, V-locity drives down the amount of I/O required to process any given workload. Instead of needing 80K I/O to process a GB of data, users typically only need 50K I/O or sometimes even less.

For more on how V-locity complements hybrid storage or all-flash storage, listen to the following OnDemand Webinar I did with a flash storage vendor (Nimble) and a mutual customer who uses hybrid storage + V-locity for a best-of-breed approach for I/O performance.

No. V-locity dynamically uses what Windows sees as available and throttles back if an application requires more memory, ensuring there is never an issue of resource contention or memory starvation. V-locity even keeps a buffer so there is never a latency issue in serving back memory. ESG Labs examined the last 3,500 VMs that tested V-locity and noted a 40% average reduction in response time (see source). This technology has been battle-tested over 5 years across millions of licenses with some of the largest OEMs in the industry.

5. What is the difference between V-locity and Diskeeper?

Diskeeper is for physical servers while V-locity is for virtual servers. Diskeeper is priced per OS instance while V-locity is now priced per host, meaning V-locity can be installed on any number of virtual servers on that host. Diskeeper Professional is for physical clients. The main feature difference is whereas Diskeeper keeps physical servers or clients running like new, V-locity accelerates applications by 50-300%. While both Diskeeper and V-locity solve Windows write inefficiencies at the point of origin where I/O is created, V-locity goes a step beyond by caching reads via idle, available DRAM for 50-300% faster application performance. Diskeeper customers who have virtualized can opt to convert their Diskeeper licenses to V-locity licenses to drive value to their virtualized infrastructure.

Stay tuned for the next major release of Diskeeper, coming soon, which may inherit similar functionality from V-locity.

We commonly say that the expected gain from V-locity® I/O reduction software is 50-300% faster application performance, but 50-300% is quite a range. Where a workload lands in that range depends on how badly the system is taxed by the I/O inefficiencies in virtual environments that V-locity streamlines. While some workloads experience 300% throughput gains, other workloads in the same environment see 50% gains.

While there is already plenty of V-locity performance validation represented in 15 published case studies that all reveal a doubling in VM performance, we wanted to get an idea of what V-locity delivers on average across a large scale. So we decided to take off our “rose-colored” glasses of what we think our software does and handed over the last 3,450 VMs that tested V-locity to ESG Labs, who examined the raw data from over 100 sites and published the findings in this report.

Here are the key findings:

· Reduced read I/O to storage. ESG Lab calculated 55% of systems saw a reduction of 50% in the number of read I/Os that get serviced by the underlying storage.

· Reduced write I/O to storage. As a result of I/O density increases, ESG Lab witnessed a 33% reduction in write I/Os across 27% of the systems. In addition, 14% of systems experienced a 50% or greater reduction in write I/O from VM (virtual machine) to storage.

· Increased throughput. ESG Lab witnessed throughput performance improvements of 50% or more for 43% of systems, while 29% of systems experienced a 100% increase in throughput, and as much as 300% increased levels of throughput for 8% of systems.

· Decreased I/O response time. ESG Lab calculated that systems with 3GB of available DRAM achieved a 40% reduction in response time across all I/O operations.

· Increased IOPS. ESG Lab found that 25% of systems saw IOPS increase by 50% or more.

The key take-away from this analysis is demonstrating the sizeable performance loss virtualized organizations suffer in regard to I/O inefficiencies that can be easily solved by V-locity streamlining I/O at the guest level on Windows VMs. Whereas most organizations typically respond to I/O performance issues by taking the brute-force approach of throwing more expensive hardware at the problem, V-locity demonstrates the efficiencies organizations achieve at a fraction of the cost of new hardware by simply solving the root-cause problem first.

Over the last year, 2,654 IT Professionals took our industry-first I/O Performance Survey, which makes it the largest I/O performance survey of its kind. The key findings from the survey reveal an I/O performance struggle for virtualized organizations as 77% of all respondents indicated I/O performance issues after virtualizing. The full 17 page report is available for download at http://learn.condusiv.com/2015survey.html.

Key findings in the survey include:

- More than 1/3rd of respondents (36%) are currently experiencing staff or customer complaints regarding sluggish applications running on MS SQL or Oracle

- Nearly 1/3rd of respondents (28%) are so limited by I/O bottlenecks that they have reached an "I/O ceiling" and are unable to scale their virtualized infrastructure

- In the coming year, to remediate I/O bottlenecks, 25% plan to purchase a new SAN, 8% plan to purchase a hyper-converged appliance, 10% will purchase SAS spindles, 16% will purchase server-side SSDs, 8% will purchase PCIe flash cards, 27% will purchase storage-side SSDs, and 35% will purchase nothing in the coming year

- Over 1,000 applications were named when asked to identify the top two most challenging applications to support from a systems performance standpoint. Everything in the top 10 was an application running on top of a database

- 71% agree that improving the performance of one or two applications via inexpensive I/O reduction software to avoid a forklift upgrade is either important or urgent for their environment

As much as virtualization has provided cost savings and improved efficiency at the server level, those cost savings are typically traded off for backend storage infrastructure upgrades to handle the new IOPS requirements from virtualized workloads. This is due to I/O characteristics that are much smaller, more fractured, and more random than they need to be. The added complexity that virtualization introduces to the data path via the “I/O blender” effect, which randomizes I/O from disparate VMs, and the amplification of Windows write inefficiencies at the logical disk layer together erode the relationship between I/O and data, generating a flood of small, fractured I/O. This compounding effect between the I/O blender and Windows write inefficiencies creates “death by a thousand cuts” for system performance, producing the perfect trifecta for poor performance: small, fractured, random I/O.

Since native virtualization out of the box does nothing to solve this problem, organizations are left with little choice but to accept the loss of throughput from these inefficiencies and to overbuy and overprovision for performance from an IOPS standpoint, since they are twice as IOPS-dependent as they actually need to be…except for Condusiv customers who are using V-locity® I/O reduction software to see 50-300% faster application performance on the hardware they already have by solving this root-cause problem at the VM OS layer.

Note - Respondents from companies with fewer than 100 employees were excluded from the results, so results would not be skewed by the low end of the SMB market.