No matter how much you tune and optimize your application workload, there's always something else that can take performance a little further, besides, of course, replacing the physical hardware with something a little more modern and faster.

Whatever direction you go, application performance should never come at the expense of data resiliency and data integrity. In my benchmarks, Erasure Coding supporting two simultaneous drive failures, Checksumming, Compression, and Deduplication have always been On.

Because the previous benchmarks were designed to demonstrate Datrium's enterprise-grade Tier 1 storage solution, I always kept VMware VM memory at 1GB. I followed an existing EDB Postgres Advanced Server Performance on EMC XtremIO paper to give us comparable results. That's the correct way to run a storage benchmark if what you are trying to demonstrate is raw storage performance. By adding RAM to the VM you promote data caching, alleviating pressure on the storage data path.

Now, without changing the physical hardware, I have run the same PostgreSQL benchmark with 100GB of RAM assigned to the VM to demonstrate that RAM does indeed play a significant role in benchmarks, mainly by hiding the raw storage performance.

However, for this run, I also optimized the PostgreSQL configuration parameters for OLTP workloads with a large quantity of RAM, following PgTune recommendations. Furthermore, I split the PostgreSQL database across three vDisks and three PVSCSI controller devices with LVM for better queue depth handling, and finally changed the filesystem from EXT3 to XFS, along the lines of the sketch below.
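For readers who want to replicate the layout, the striped LVM plus XFS setup looks roughly like this (a minimal sketch; the device names, stripe size, and mount point are assumptions, not my exact lab values):

```
# Create LVM physical volumes on the three vDisks, one per PVSCSI controller
pvcreate /dev/sdb /dev/sdc /dev/sdd
vgcreate pg_vg /dev/sdb /dev/sdc /dev/sdd

# Stripe the logical volume across all three PVs (--stripes 3) so IO is
# spread over the three PVSCSI controllers and their independent queues
lvcreate --name pg_lv --stripes 3 --stripesize 64 --extents 100%FREE pg_vg

# XFS instead of EXT3 for the database filesystem
mkfs.xfs /dev/pg_vg/pg_lv
mount /dev/pg_vg/pg_lv /var/lib/pgsql
```

Each PVSCSI adapter has its own queue, so striping across three of them multiplies the effective queue depth available to PostgreSQL.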

While I indeed made changes to PostgreSQL and virtual components, the physical hardware and the software-defined-storage layer remained untouched.

When compared with my last run on Datrium DVX 4.x:

- Transactions per Second (TPS) increased by 15.78%
- Average Read Latency remained steady at 0.3 ms
- Average Application Write Latency decreased by 20.41%

Here is the performance improvement evolution across all my PgBench benchmarks.

Here is the screenshot of the latest pgbench results:

Conclusion

This exercise demonstrates that there are always ways to optimize workloads further. In my particular case, I could still examine IO block sizes, ensuring that the disks and LVM (PV and VG) are configured for the correct application IO block size, but instead, I took a 'default' config approach. I could also manually change the number of concurrent threads in the Datrium DVX software to use more CPU than the default 20%, or even increase the number of vCPUs assigned to the VM.

Whatever direction you go, it is important to ensure that there isn’t a trade-off between performance, data resiliency, and data integrity. Unless you are running a home-lab, your organization’s data is just too important to optimize for performance without considering the consequences.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Software-Defined-Storage architectures are pervasive in the datacenter, and they fully leverage the improvements in server-side microprocessor architectures. As with most software-defined-storage solutions, Datrium DVX gains performance not only with faster servers and processors, but also with every new software release.

I recently posted an extensive benchmark of PostgreSQL running on the Datrium DVX platform (here) with only 1GiB RAM to force the database to stress the storage layer instead of caching data in memory. If you want to understand how DAVG and Application latency differ from SAN Controller latency, read my previous article here.

With Datrium DVX 4.0 about to hit GA, I decided to run the exact same pgbench workload with a pre-GA release of the 4.0 software and see if, beyond all the new features highlighted here, we would also see a performance increase for this workload.

I made sure that the infrastructure, virtual machine, and benchmark options were all exactly the same. To be more precise, I have not modified this lab environment since I ran the previous benchmark; therefore, the only change is the software upgrade from 3.x to 4.x.

The verdict:

- Transactions per Second (TPS) increased by 15.52%
- Average Read Latency remained the same at 0.3 ms
- Average Application Write Latency decreased by 14.04%

Here is the screenshot of pgbench, and here you can find the screenshots for the previous run.

This performance increase is workload dependent. However, Datrium DVX 4.0 has been further optimized for large datasets with high throughput and high I/O count. Some of the application datasets we have been testing and pushing in our Solutions lab are between 15 and 20TiB with extremely demanding I/O patterns. More on that later!

That's just another important value of software-defined architectures: as microprocessors gain in performance and vendors keep improving their software, we will just continue accelerating workloads, delivering lower latencies and providing users with better experiences.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

We just had the opportunity to present the Datrium DVX solution at the latest Storage TechFieldDay. It was a memorable day, and I suggest that anyone looking at HyperConvergence pay attention to OpenConvergence as a fundamental shift in the way convergence is being done: more flexibility, more performance, more scalability, and seamless integration with public clouds.

The series of four videos goes from high-level product architecture to detailed information about LFS (Log Structured FileSystem) and is full of demos. Enjoy!

Here is what one of the attendees, Glen Dekhayser (@gdekhayser), Field CTO @Red8IT, had to say about Datrium.

Maslow's Hierarchy of Needs describes a series of universal needs that act as motivators for human behavior. From a patient experience perspective, that would include things like easy access to healthcare services and a focus on staying patient-centered.

However, nowadays healthcare delivery-of-care is hugely dependent on digital systems such as Electronic Medical Records and Picture Archiving Systems. Keeping these life-critical systems running 24/7 is a monumental task for Healthcare IT. Moreover, managing the security of Protected Health Information (PHI) is likely the most critical competency that Healthcare IT needs to assimilate nowadays.

From a systems and infrastructure standpoint, the Maslow Hierarchy for Healthcare IT equates to the following:

Step 0 – Data Security

Protecting PHI when data is at rest has become a top priority for healthcare organizations. However, despite growing awareness, encryption of data in-flight is consistently overlooked. In most healthcare IT systems we only see data-at-rest encryption, and the only attack vector that addresses is 'physical' access to drives, which also happens to be the lowest stated aspect of HIPAA.

Nowadays, data-in-flight is most vulnerable to perpetrators who can tap into network connections, given the widespread use of IP network protocols; security measures for data in storage come to nothing if in-flight data is not safeguarded as well.

Datrium’s Blanket encryption is an industry-first, providing end-to-end encryption with always-on full data services, such as compression, deduplication, and erasure coding.

Moreover, Datrium is the only converged platform with FIPS 140-2 validation from the Computer Security Resource Center (CSRC) of the National Institute of Standards and Technology (NIST).

Step 1 – System Resiliency

System Resiliency is the ability to maintain systems running even in the event of a fault. Infrastructure resiliency comes in two flavors: data integrity and data durability.

Datrium takes data integrity very seriously and has worked hard to build a system that delivers the highest levels of data integrity and durability.

It starts with Double Fault Tolerance Erasure Coding designed to protect against two simultaneous disk failures but also covers in-line integrity verification and self-healing.

A traditional array can only protect the integrity of data as it receives it (dutifully safeguarding data that may already have been corrupted by intervening network or host-side problems). Datrium DVX protects your data 'before' storing it locally or sending it over the network.

Step 2 – Data Protection

Datrium DVX offers primary storage performance (see Step 3 below), but the system is similarly designed to provide integrated copy-data management, including application-centric snapshot and replication features that allow an entire application suite to be restarted "instantly" with a single click.

When snapshots are stored within the primary storage system, no restore from a secondary platform is required before an application can be restarted from a point in time before the corruption or data loss. And in the event of a site failure, the replica DVX can bring applications back online without waiting for restore operations to complete. A pair of DVX systems can handle data protection, snapshots and disaster recovery for multiple locations allowing for fast and efficient recovery.

For healthcare providers, the entire data lifecycle must be protected. DVX natively encrypts all data in-transit to remote sites or to the public cloud using SSL/TLS eliminating the need for costly VPN services.

Step 3 – Performance

For life-critical systems such as Electronic Health and Electronic Medical Records (EHR/EMRs), application performance and response times are essential to keep the delivery of care smooth, as it should be.

In partnership with Dell and IOmark.org, Datrium DVX has been officially validated as the fastest and most scalable converged platform in the world. Datrium delivers 5X more performance than the previous All-Flash record and 10x more performance than the previous hyperconverged record. Datrium DVX also has the lowest latency across all audited platforms!

As an example, National Physician Services provides ambulatory services with Allscripts TouchWorks, where 1.1 million appointments are scheduled and managed every year, amounting to approximately 110,000 transactions per day. The system also serves six AHS entities and supports over 1,000 physicians, while the Datrium infrastructure powers half a billion dollars of financial transactions each year.

Step 4 – Applications

Finally, the clinical applications themselves. Recognizing that healthcare applications are effectively responsible for fast-tracking or potentially delaying the delivery of care, they are the most valuable piece of the puzzle.

Most ISVs make an enormous effort to ensure applications are always-on, but that also makes them strict and cautious about the infrastructure and systems supporting their databases and other parts of the application stack.

It is vital that Health IT infrastructures supporting clinical applications can efficiently deliver on application requirements.

This article was first published by Andre Leibovici (@andreleibovici) at datrium.com

Cutting to the chase, I want to share some benchmark numbers I have been able to run in our Solutions lab and demonstrate how Datrium DVX compares to other published figures. While some may claim that benchmarks can be gamed (and they can), I tried to stick to a simple formula that can be easily repeated by anyone on any platform for comparable results. Furthermore, the more hardware you throw at the problem, the more performance you will get, but generally if you fix as many variables as possible, the results should be within a reasonable margin.

This blog post is about PostgreSQL performance on Datrium, but I do make direct comparisons with results published by other vendors. If you don’t like reading competitive pieces of evidence, stop here. You have been warned!

If you don't know how the Datrium architecture works, I recommend watching this video from Clint Wyckoff. In a Datrium system, data nodes are used for storing durable data, while a copy of the data is stored on host flash. All read IO is local to the host with intrinsic data locality, while write IO is stored on the host flash and also on the data node(s) using Erasure Coding (N+2 parity). Furthermore, all IO operations are compressed and deduplicated by default – no checkboxes.

** PostgreSQL utilizes all allocated memory and uses shared_buffers to cache as much data as possible. Since we're aiming to demonstrate storage performance, I limited VM memory to 1 GB to force PostgreSQL to hit the storage device as much as possible.

** These PostgreSQL parameters can be changed to improve performance; however, doing so makes it possible to lose data whenever a sudden shutdown occurs. Some vendors that perform data integrity checks recommend turning these settings off for better performance. I chose NOT to turn them off during this benchmark.
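The parameters in question aren't listed here, but the durability-related settings typically relaxed in storage benchmarks are the ones below. This is a sketch of a postgresql.conf fragment, and naming these specific parameters is my assumption; as noted, I left them at their safe values:

```
# postgresql.conf -- durability settings often disabled in benchmarks
# (kept ON for this benchmark; turning them off risks data loss on a crash)
fsync = on                 # flush WAL to stable storage at commit
synchronous_commit = on    # wait for the WAL flush before acknowledging commits
full_page_writes = on      # protect against torn pages after a crash
```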

pgbench 9.2 was used to create the database and run the benchmark. pgbench results are shown as TPS or transactions per second. In the XtremIO paper, they executed a read-only and an OLTP-B mixed workload (read/write). I decided to skip the read-only benchmark because it’s useless for production environments. I used the same commands used in the XtremIO white paper to produce the benchmark. The commands are as follows:

Run the pgbench database initialization. The following command loads a pgbench database using a scale factor of 7500, vacuums the resulting data, and then indexes it. It creates a database of approximately 113 GB in size:

```
# pgbench -i -s 7500 --index-tablespace=foobar --tablespace=foobar foobar
```
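The paper's mixed-workload run command isn't reproduced above; for reference, an OLTP-B style read/write run looks like the sketch below, where the client count, thread count, and duration are illustrative assumptions rather than the paper's exact values:

```
# Mixed read/write (TPC-B-like) run: -c clients, -j worker threads,
# -T duration in seconds, -r reports per-statement client-side latency
pgbench -c 64 -j 8 -T 3600 -r foobar
```

pgbench prints TPS, and with -r it reports per-statement latency as measured at the client, which is the end-to-end application latency discussed below.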

How does Datrium DVX compare?

I have not seen a single vendor benchmark that executed pgbench and reported the real end-to-end application latency. All the papers I have found report array controller latencies, and there's a big reason for that! There is an enormous latency difference depending on where latency is measured. Application latency, measured by the application itself, is what matters at the end of the day, so I'm not hiding it.

Latencies shown by XtremIO are not real application latencies, but rather the latency measured at the array controller. Moreover, I found a Gotcha in their performance numbers.

| | TPS | AVG Read Latency (ms) | AVG Write Latency (ms) |
| --- | --- | --- | --- |
| XtremIO | 7,642 | ~0.2 | ~0.4 (not real) |
| Datrium | 10,673 | ~0.2 | ~4.1 |

Granted, I chose to compare with XtremIO because it's probably the lowest-latency storage solution for raw performance when discussing a single-host deployment. Also, the white paper does not specify the data protection RAID level used, which makes me wonder whether they were actually using RAID 6 (disk striping with double parity). Finally, as with any SAN, the more hosts and VMs you add, the less performance each application gets.

The Gotcha!

The XtremIO paper states the following: "We ran the following pgbench command to generate a mixed workload with a 2:1 read/write ratio" (page 16). However, the results table (page 19) demonstrates that Read IOPS is 4.7X higher than Write IOPS – that's 80R:20W!

Where is the 2:1?

I want to believe that there is a genuine mistake in the report and that the authors were not trying to game the results. Either way, it's only fair to say that the specified latency numbers are not real or valid.

Table from XtremIO paper

When I ran the same pgbench command on Datrium, the results were consistently 70R:30W. We can clearly see that XtremIO handled ~8,000 Write IOPS at peak, while Datrium absorbed 16,523 Write IOPS at peak – more than double the amount (see below).

This other paper, for the VMAX 250F All Flash with 32 SSDs, achieved 11,757 TPS in a RAID 5 (3+1) configuration with a 96 GB memory VM. The paper does not clarify whether compression was enabled during the tests, but no serious enterprise SAN array promotes RAID 5 for data protection nowadays. Lower data resiliency and memory caching both play out in performance benchmarks. Moreover, latencies are again measured at the array controller.

Datrium always uses N+2 parity erasure coding to mitigate against any two simultaneous drive or block failures, while still providing compression and deduplication.

How about HyperConverged?

I would love to compare Datrium to Converged or HyperConverged solutions, but vendors seem hesitant to report their real performance numbers, and when they do, they do not provide enough information for a decent comparison.

I did, however, find Nutanix numbers (here) provided by user jcytam that I used as general guidance. I replicated the pgbench benchmark as closely as I could, using the same VM configuration (8 vCPU and 24 GB RAM), the same pgbench command described in the post, and the same pgbench major release. Unfortunately, the Replication Factor (akin to RAID) was not specified.

In Nutanix, warm read IO comes from SSDs/RAM and write IO goes to SSDs. That said, this is not an official Nutanix benchmark and should not be seen as official numbers – many factors can influence a benchmark.

Further down in this blog post I measure Datrium DVX with Samsung PMA SSDs.

I could not find pgbench benchmarks for VMware VSAN, Hyperflex or Simplivity.

Benchmark Tuning

The XtremIO comparison above was done without any tuning on my part, and the XtremIO paper does not indicate whether any PostgreSQL, VMware, or Linux tuning was applied. So, I decided to do some simple tuning while keeping all the declared configuration the same – that means no changes to VM memory or CPU.

Note that I have also run pgbench with lots of memory, CPU cores, and higher shared_buffers, and I got to multiple hundreds of thousands of TPS – however, that means nothing because it doesn't demonstrate the storage performance capability.

I also implemented the changes recommended by PgTune according to my environment, along the lines of the sketch below.
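For reference, a PgTune-style OLTP profile for a 1 GB VM looks roughly like the fragment below; the exact values are illustrative assumptions, not my precise configuration:

```
# postgresql.conf -- PgTune-style OLTP values for a 1 GB RAM VM (illustrative)
max_connections = 100
shared_buffers = 256MB              # ~25% of RAM
effective_cache_size = 768MB        # ~75% of RAM; planner hint, not an allocation
work_mem = 2MB                      # per-sort/hash working memory
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9  # spread checkpoint IO over the interval
wal_buffers = 8MB
```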

Let’s look at the new results.

| | TPS | AVG Read Latency (ms) | AVG Write Latency (ms) |
| --- | --- | --- | --- |
| XtremIO | 7,642 | ~0.2 | ~0.4 (not real) |
| Datrium (default) | 10,673 | ~0.2 | ~4.1 |
| Datrium (tuned) | 14,504 | ~0.3 | ~5.7 |

Just to reinforce the point that the latencies shown by XtremIO are not real application latencies, but rather the latency measured at the array controller: in the image below, from the Datrium benchmark, disk latency at the vSphere VM level never goes above 4 ms (lower than the ~5.7 ms at the application level), and for the most part stays below 3 ms.

If I were to measure latency at the data node, it would be much lower still, but that would be meaningless. I suggest that vendors always provide real application latencies – it's only fair to customers.

Scalability

Based on the workload generated by pgbench using a single VM and a single host, Datrium DVX tells me that I could add another 29 servers with the equivalent workload and results before needing to add another data node to the pool, totaling 435,120 TPS (14,504 TPS x 30 hosts; see image below).

Up to 10 data nodes can be part of a data pool, in which case we would have approximately 4.3 million TPS.

When it comes to scaling performance, no vendor can beat Datrium – look at this 3rd party IOmark audited benchmark with 128 servers, or the review by Storage Review.

SATA SSD vs NVMe

As NVMe approaches price parity with SATA SSDs, we will start seeing greater adoption of the technology, and Datrium is well positioned to support NVMe – customers have been utilizing NVMe on hosts for over a year.

Since I ran the benchmark on a host with two NVMe SSDs, I decided to run the same workload on another host with two SATA SSDs to understand the difference, because I thought readers would ask about it.

This host has an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, and the SSDs are inexpensive ($0.5/GB) Samsung PMA drives. Looking at the results, it's clear that the lower-grade SSDs don't provide the same performance as the NVMe devices, and we also notice a bump in write latency.

That said, the performance numbers are outstanding for pgbench running with 1GB RAM on cheap commodity flash with N+2 parity erasure coding, while still providing compression and deduplication. Repeat that 30 times until you reach the data node boundary, and then add more data nodes, up to ten.

| | TPS | AVG Read Latency (ms) | AVG Write Latency (ms) |
| --- | --- | --- | --- |
| Datrium NVMe (tuned) | 14,504 | ~0.3 | ~5.7 |
| Datrium SATA SSD (tuned) | 10,319 | ~0.3 | ~8.6 |

I can't stress enough that all the performance numbers presented in this blog post were generated on a single Supermicro server with two flash devices and one Datrium data node (F12X2) with 12 SSDs. The list price for a data node and a host license is sub-$150K, and it scales to 435,120 TPS based on this same workload.

Conclusion

We have to remember that if we throw memory, host flash, and CPU at the problem, and make changes to shared_buffers on PostgreSQL, it is possible to get to hundreds of thousands of TPS from a single VM with the same pgbench workload. I could have added up to 16 NVMe devices on the host to distribute the load and get more parallelism, but it is too easy (and costly) to solve performance problems by throwing hardware at them.

I didn't run this benchmark to prove that PostgreSQL does an outstanding job caching and managing data in memory, or that newer Intel processors are faster, but to show Datrium's raw storage performance.

I also know that comparing benchmarks can lead to endless debates, so I invite vendors to run this very benchmark and share their numbers. I can provide the source VM, pre-configured; you just run the benchmark. I also invite vendors to demonstrate their numbers with Erasure Coding (or equivalent data protection), Deduplication, and Compression ENABLED, like Datrium.

To me, the exciting part is to see how well storage systems handle benchmarks when all parts are moderately equal. Datrium is on par with any enterprise-grade Tier 1 storage solution, providing industrial-strength data resiliency, data reduction, and scalability. Datrium's scalability, up to 18 million IOPS and 256 GB/s random write throughput, is unmatched in the industry.

My Rant – Over the last few days, I've spent many hours poring over storage benchmarks from various vendors, and honestly, what's up with benchmarks that do not use production-grade conditions to demonstrate performance numbers? Some papers appear to purposely hide details to keep other vendors from replicating their benchmark, while others game their numbers to look good. As an industry, we need to be better than that!

As a next step, I am planning to run the same benchmark with Red Hat Enterprise Virtualization. I will also run a scale-out pgbench benchmark with VMs on multiple servers – adding up to 2,000 snapshots per VM. pgbench-tools is also an option.

If you would like to see a specific benchmark on Datrium, let me know and we will do everything possible to run it – that is part of my team's charter at Datrium – and we shall not hide or lie about performance numbers.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

It makes HCI expensive and not cost competitive!

Now that we got that out of the way, let me explain. It is not that HCI solutions cannot provide higher levels of availability – most of them can – but vendors frequently steer customers toward a resiliency factor that makes them look cost-effective from a financial viewpoint.

Data durability in the presence of failures is table stakes for any organization, and failure tolerance is achieved by data redundancy in some fashion. One way to achieve redundancy is with mirroring. You can mirror 2-way or 3-way.

At any scale and seriousness, you have to do 3-way replication, or you are rolling the dice on data loss. The reason is not so much that you will lose two drives at the same time. What is much more common is the following scenario: one drive fails, and the system starts re-mirroring data from the remaining copy. All it takes is a sector read error (also known as a Latent Sector Error, or LSE), and you have now lost data.

Over the past two decades, most of the industry has moved beyond 1FT (i.e., 1-drive failure tolerance). Examples of 1FT are RAID-5 in SAN arrays, RF2 from Nutanix, and FTT=1 from VMware VSAN. No serious enterprise SAN array promotes 1FT. However, most HCI vendors still, by default, recommend 1FT. These HCI vendors have regressed on resiliency in order to make their products look financially viable.

You can't argue with the math: if you do not have 2-drive failure tolerance, the chance of data loss is an astonishing 0.5% per year. Gartner also recommends 3-way replication.

“Use traditional three-way mirroring with RF3 when best performance, best availability and best reprotection time of data are all equally important.” [1]

THE REAL REASON, THE COST!

Implementing HCI using 3-way replication incurs over 300% capacity overhead once you account for the capacity required to protect and re-protect data. Furthermore, there's a higher minimum number of hosts required for cluster protection.

Here is how the capacity math works out. Take a 7-host cluster sized for N+2 availability, leaving 5 hosts (7 - 2) to provide ~118TB of usable capacity. To determine the capacity required per host with 3-way mirroring: 118TB [usable required across the cluster] / 5 [hosts online] * 3 [three-way mirroring overhead] = ~71TB per host.

At this point, one may be making the correlation between the raw capacity and the usable capacity required to ensure that data can always be fully re-protected.

A 3x figure is typically bandied about for FTT=2 or RF3; however, that assumes 100% utilization and no ability to re-protect data. In reality, the system requires 497TB (71TB * 7 hosts) of total capacity to provide 100TB of usable capacity – an overhead of 4.97x.

On the host count issue, there is a minimum number of hosts required to provide the additional availability and re-protection with a 3-way mirror – and it’s higher than with 2-way mirror. HCI vendors have different architectures, but the math works similarly.

The additional cost of servers and storage capacity is seen as a deal breaker by many organizations considering HCI.

3-WAY MIRRORING

Storing three copies of data with 2FT (examples include RF3 from Nutanix and FTT=2 from VMware VSAN), using dual-parity schemes such as RAID 6 in arrays, or using erasure-coding techniques that tolerate two failures improves reliability significantly.

To lose data in a system that tolerates two drive failures, there needs to be either three simultaneous drive failures, two drive failures plus an LSE, or one drive failure plus LSEs in both redundant copies of the same chunk. All of these are very improbable events, and the chance of data loss drops by many orders of magnitude.

Datrium has done exhaustive studies with its own data, and also with public studies and data from Google, Facebook, Nutanix, NetApp, and Backblaze. These studies were done by engineers with PhDs on the topic of disk failures; this is serious work. Furthermore, the math was recently extended to flash drives, and the results do not look any better – SSDs encounter Latent Sector Errors (also called uncorrectable errors) at an alarming rate.

I would not be writing all this if Datrium did not implement 2FT by default. Datrium uses Double Fault Tolerance Erasure Coding, a Log-Structured Filesystem (LFS), and in-line integrity verification and healing. I also wrote an article discussing our data integrity methodology, but the most important thing to know is that the system protects customer data with higher levels of resiliency at a highly competitive price point, even against HCI RF2 implementations.

CONCLUSION

This article is not meant to say that HCI is not good, nor is it picking on any specific vendor; rather, it is pointing at any storage vendor that offers lower levels of data resiliency in exchange for a better solution cost.

HCI provides numerous benefits to enterprises, but with power comes responsibility, and IT teams are responsible for the data in their organizations.

I believe that anyone reading this article will agree that 3-way mirroring is better than 2-way mirroring, so as an industry we should all be advocating for better resiliency, even if the solution costs a little more. At this point we are probably entering the Risk Management realm: "you pay your money, you take your chances".

My recommendation to you:

If you are considering Datrium, Nutanix, VMware VSAN, or any Tier-1 storage solution, always ensure that you are comparing apples-to-apples and that your data is going to be protected with best of breed resiliency and data integrity.

Hey, I’m just one opinion here. Do you agree? Disagree? Let me know what you think.

Most of my life I have been closer to applications and primary storage, and later on, when managing technology teams, I always counted on well-trained data protection professionals to handle such a critical role in organizations. One thing was always clear to me – backup is equivalent to insurance. It is a tool that is there for when sh*t hits the fan. {pardon the expression}

Other than Data Domain adding deduplication and, more recently, backup vendors using an HCI approach to store backups in scale-out clusters, not much has changed since my customer days. Some vendors are now starting to tinker with the cloud as a tape replacement, but for the most part, the cost is still prohibitive due to the high capacity requirements for incremental, daily, weekly, and monthly full backups.

I may not be a data protection authority, but I do understand RPO, RTO and the business impacts.

RPO (Recovery Point Objective) refers to the amount of data at risk. It is determined by the amount of time between data protection events and reflects the amount of data that could be lost during disaster recovery; for example, hourly snapshots imply an RPO of one hour.

RTO (Recovery Time Objective) relates to downtime. The metric refers to the amount of time it takes to recover from a data loss event and return to service; in other words, it is the amount of time the system's data is unavailable or inaccessible, preventing normal service.

Zero RTO!

The team at Datrium is knee-deep in data protection technologies – the Data Domain founding team is part of Datrium's founding team. They made data protection an integral part of Datrium DVX, including primary storage (BTW, the fastest and most scalable in history) and, most exciting of all, Zero RTO!

Zero RTO means that when an application or VM restore is needed, the restore is instantaneous – a single click and ZERO waiting to restore the application to a consistent state. Please note that I am not talking about reverting a snapshot that is co-located on the same storage sub-system the application is running on (like VMware vSphere and HCI snapshots) – this is a restore from data at rest on different media than your primary storage. Let's have a look at how this works!

There are two important things to know about how Datrium DVX works:

1. The DVX Hyperdriver Software on each compute node (hypervisor) manages all active data for the VMs within that host. It provides scalable IO performance, availability, and data management capabilities.

2. The DVX Data Pool provides persistence and resiliency for a durable copy of all data in the cluster. In normal operation it is write-only, but it also provides streaming read performance for flash uploads as well as cluster coordination for simple management.

The DVX maintains two distinct “namespaces” (filesystem metadata about VMs and datastore-files).

The “datastore” contains the current, live version of all VMs and files. This is what the hypervisor “sees.” The hypervisor management tool (vCenter for VMware, or RHEV Manager for RedHat) allows you to browse the contents of the live datastore at any time. The contents of these files always contain the most recently written data.

The "snapstore", on the other hand, contains previous point-in-time snapshots of the live datastore as it existed previously. Every time a protection group causes a snapshot to be taken, entries are made in the snapstore with the contents of every file in the live datastore at that instant in time.

Datrium uses a "redirect on write" (ROW) technique to store incoming data. New data is always written to new locations (vs. copy-on-write techniques that can introduce delays as changes are copied). Because only changes are stored in a snapshot, and because DVX only stores compressed and deduplicated data, snapshots consume relatively little capacity.

If, for example, a given VM's protection policy causes snapshots to be taken every hour and retained for two days, then the snapstore would contain up to 48 different versions of this VM's files. These policies can be overlapped, delivering an RPO of 10 minutes and retaining up to 2,000 snapshots per VM.

Datrium DVX provides two uses for the point-in-time copies of VMs and the datastore: restoring/reverting VMs/files in the live datastore, and creating net new VMs/files (cloning).

Restoring/reverting replaces the state of live VMs or datastore-files with the state from the point-in-time when the snapshot was taken. It instantly “rolls back” VMs or datastore-files to a previous point in time.

Cloning is the process of taking a point-in-time snapshot and creating a net new VM that is immediately populated with the state contained in the snapshot. It is an instantaneous way to create one or more copies of existing VMs and applications.

In contrast, conventional backup tools that understand VMs (many don't know VM-level objects) need to restore/copy the data from an external repository (such as a NAS) via a proxy server. Because the primary storage and the backup tool operate disjointly and don't possess global deduplication awareness, the entire VM dataset has to be restored, causing the system to transfer large sums of data and taking hours, sometimes days, to complete restores, depending on storage and link performance.

It is crucial to understand that Datrium DVX global deduplication enables the system to upstream only differential data from the data pool to compute nodes when a restore is triggered – and that happens both synchronously and asynchronously, allowing applications to restart instantly.

The ability to instantly "roll back" or clone a VM from the data pool delivers Zero RTO, and that is what matters at the end of the day when trying to restore systems. In many cases, even a thirty-minute delay in restoring systems can cause organizations millions of dollars in damages.

Please note that there are situations where you may need to work with both Datrium and the leading backup vendors, especially with heterogeneous on-premises storage or sub-5-minute remote-site RPO requirements.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

We built the Best Storage Platform in the World. Now, 2018 comes down to Business Execution.

In the book Built to Last, Jim Collins and Jerry Porras beautifully describe that ‘building a visionary company requires one percent vision and 99 percent alignment‘.

Execution failures more often than not disrupt even the most successful companies with the most advanced and visionary products. Business execution is possibly more important than having a good product, or being in the right market at the right time. Datrium is at one of these crucial but glorious moments – and I'm thrilled to be part of it.

Our engineering team, under the leadership of Hugo Patterson and his VMware, Netapp & Data Domain teams, spent almost four years building what I can unequivocally call the Best Storage Platform for VMs in the World.

We built the fastest and most scalable converged primary and secondary storage platform in the world – one that enables organizations to grow their data centers at their own pace ('pay-as-you-go'), brings their existing datacenter investments into the solution, and provides the simplest way to make use of data and cloud services. All with a better TCO/ROI than HCI.

That’s a lot of mumbo-jumbo to explain in a paragraph, so look at this:

Ok, so we have the hottest converged storage platform on the market. Could I be delusional? I don’t think so…

I was just talking to Hugo about our customers and platform benefits, and he goes… at this point there's nobody in our rear-view mirror who can do what Datrium DVX delivers today from a scalability, features, or performance perspective anytime soon. Yes, I agree!

So What’s next?

From product and vision perspectives, I have written about the new Cloud DVX, Backup-as-a-Service, CloudView, and Orchestrated Disaster Recovery. NVMe-over-Fabric is also set to change the game once again, and Datrium DVX is in a unique position to leverage the new tech.

But 2018 doesn't belong to tech innovation alone.

In 2018 Datrium is firing on all cylinders across marketing, channel, and sales. We have a ridiculously good track record of winning nearly 100% of all Proof-of-Concepts (POCs) against traditional SAN and HCI vendors. In all honesty, when we lose it's not because of product, performance, features, capabilities, or roadmap – it's because of perceived company viability. Yes, we are a newish tech vendor, and some companies are just averse to change, but slowly and surely we are getting into bigger customers – including a few Global 100 accounts.

We have an incredible win rate against the top two HCI vendors – we win two out of every three opportunities against them. POC! POC! POC!

We have an incredible ecosystem of partners that support us in sales and marketing. We went from 6 to over 30 technology partners in less than six months, including some of the top technology and application vendors in the world – and with a few of them we are going deeper on the relationship.

We have a remarkable partnership with Dell EMC for delivering a turn-key solution to our customers. Together with Intel, we have set the world record for the most scalable and fastest storage platform on the planet – by miles from the previous world record.

We have initiated our international operations and will expand fast. By the way, we are hiring AEs/SEs, Inside Sales, and Channel leaders in the US, and very soon in EMEA.

There's so much more I could say about the work happening in Operations, Sales, Marketing, and adjacent areas. Datrium has hands-down the best product (thanks, Engineering!), but now it's time to make sure that we have the 99 percent business alignment and execution.

Finally, we would not even be here if it wasn't for our awesome customers who have been supporting and believing in the Datrium team and tech since the early days, since before I joined the company. To you, thank you!

This article was first published by Andre Leibovici (@andreleibovici) at LinkedIn

While setting up a couple of AWS EC2 instances a few days ago, I noticed that despite the higher-level abstraction and orchestration provided by cloud services, I still needed to understand quite a lot about application behavior to properly stand up a solution that would cater to my business. Luckily, I was just playing around with some open-source software and the configuration did not matter that much, but it could have been very different if I were dealing with production systems and applications.

When setting up EC2 instances you need to provide networking information and lots of detail about how storage will perform, including volume types, HDD vs. SSD, and the number of IOPS expected, as the example below shows.
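Provisioning the same thing through the AWS CLI makes the point even more clearly: you must commit to a size, a volume type, and an IOPS figure up front (the values below are arbitrary examples, not recommendations):

```
# Provisioned-IOPS (io1) SSD volume: size, type, and IOPS must all be
# decided before you know the application's actual Day 2 behavior
aws ec2 create-volume \
    --availability-zone us-east-1a \
    --size 500 \
    --volume-type io1 \
    --iops 5000
```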

I know many infrastructure admins who would not even know where to start configuring these settings; not because they do not understand the features and metrics, but because they do not own the apps, they are not the DBAs, and they do not know what the requirements will be on Day 2. The easiest way to mitigate the complexity is to select best-of-breed for every storage option and pay the price.

AWS EC2 Volume Configuration

On the other hand, HCI vendors sell themselves as on-premises datacenter simplification solutions, and in some sense they do simplify, because they remove complex 3-tier SAN configurations (zoning, masking, LUNs, RAID, etc.). However, when you look under the covers, you will notice that most HCI solutions on the market were originally architected to test market potential, and for almost all of them, enterprise data services have been implemented as an afterthought.

Truth be told, when data services are added as a bolt-on, as an afterthought, it becomes challenging to integrate new features and services efficiently and in a meaningful way while eliminating complexity – and a quick look at HCI solutions on the market today will prove this point: they have plenty of knobs and checkboxes for turning features on and off.

I used this picture before in a different article, but I think it is priceless because it demonstrates very well the complexity of HCI solutions.

In the world of private and hybrid clouds, self-service portals, and higher-level orchestration services, it does not make sense to ask users to identify, and in most cases make assumptions about, application and data behavior. Users should not have to care whether the data is dedupable, or about RF2 vs. RF3 vs. Erasure Coding, or whether the compression delay should be 30 or 60 minutes, or whether checksumming, erasure coding, and compression should remain enabled for a given application.

If you thought EC2 was complex at the beginning of this article, you are probably now thinking that EC2 is a piece of cake compared to HCI solutions. We don't worry about all that when using the public cloud, so why should we when using private clouds?

Datrium DVX has virtually no knobs that need to be adjusted or configured, and yet presents an extensive list of data services, such as Deduplication, Compression, Erasure Coding, Checksumming, End-to-End Encryption, Replication, Snapshotting, Cloning, Compression over Wire, etc. Also, there's no choosing of fast or slow media – all data follows the application and will always reside on locally attached Flash/NVMe for best performance.

From an implementation perspective, the filesystem always uses distributed Erasure Coding for reliable data protection against at least two simultaneous disk failures (or all but one server in a cluster), and the software stack uses no more than 20% of host CPU to deliver all data services, Always-On and Always In-Line. In the spirit of openness and truth, there is exactly one knob – FIPS 140-2 Encryption. (BTW, it is the only converged solution certified with FIPS 140-2.)

Datrium allows for much better scalability, resiliency, and performance than HCI, and is simpler than EC2 – the architecture is a game changer in the datacenter space. If you are curious about how Datrium works under the covers, watch this video with Devin Hamilton and Alastair Cooke – it is excellent!

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Nowadays when it comes to home automation, the possibilities are practically endless, and virtually every device in a home can be connected and automated with a little bit of patience and some extra money to spend. There are proprietary and open-source platforms and accessories to choose from and deciding on how to go about automating your home can be a daunting task for newbies.

I went through the home automation challenge myself with my new home, and I will try to make it easier for you to understand why and how I did it. Before I get started, I would like to clarify some of the premises that have driven my decisions.

1 – My family, for good or bad, is locked into the Apple ecosystem. My wife and I own iPhones; our daughter has an iPad; we all have MacBook Pros, Apple TVs, and an Apple Watch; and the Apple AirPort Extreme is the home router. Not surprisingly, we are also on iTunes and Apple Music. So, whatever solution I chose would have to be compatible with the Apple ecosystem.

2 – I have enough devices at home, so a critical metric was to add as few new devices as possible, including bridges, sensors, and cameras.

3 – Alarm Sensors – Fortunately, I moved into new home construction that has been entirely cabled with cat5e cables for PoE (Power Over Ethernet). Furthermore, all doors and windows have been pre-wired with sensors connected to the alarm system.

4 – Cameras – When it comes to cameras, I wanted to make sure I didn't have to introduce an NVR in my home, and ensure that the streaming went directly to the cloud. From my perspective, this is also more secure in case of a home invasion, as an NVR containing the footage can simply be taken out of the home.

5 – DoorBell – All doorbells have somewhat the same functionality. More on that later.

6 – Lights and Switches – While there are many accessories to control lighting, in my opinion the approach of replacing sockets and bulbs is somewhat counter-intuitive. I wanted to make sure that whatever I did was a solution that doesn't require buying expensive replacements from the same vendor. This drove me to replace the house switches with smart switches that can operate with any bulbs.

7 – Other Devices – I wish I could connect all devices, but not all devices are ready for that. That said, I also managed to automate my TVs, the Roomba, and even the garage door (more on that later).

Now that I have revealed my fundamental premises, I need to be clear that cost was never a concern for my automation project. I made decisions merely on the features and benefits that I wanted delivered to my home. In some cases, I chose not to adopt a more expensive accessory or bridge, but that wasn't based on cost. So, if you are building a budget-conscious home automation project, this likely is not going to work entirely for you. That said, keep reading, as you will see that I also used open-source components that you can (as I did) efficiently run on a cheap Raspberry Pi 3.

Choosing the Platform

I looked at many platforms, read lots of reviews, and watched plenty of YouTube videos, but for the sake of the family learning how to operate all these smart devices, I ended up choosing the Apple HomeKit platform. Please note that Apple HomeKit may not be the most feature-rich platform, but the native iOS integration wins for me. Furthermore, anything I configured for my home would automatically be exposed to my Apple Watch.

The Apple HomeKit

Apple HomeKit has been out for a little while, but for the most part vendor adoption has been low, due to a special Apple-specified MFi coprocessor required to authenticate, connect, and transfer information securely, and also a closed API that didn't allow vendors to achieve the desired outcome for their devices. So, there aren't many accessories out there to choose from.

Recently Apple has dropped that requirement and introduced software authentication for its system with iOS 11, and things are slowly changing, with more accessories compatible with HomeKit being made available.

Apple HomeKit is native to iOS, and you can use an Apple TV 4th Gen or an iPad as the home hub. In HomeKit you can configure accessories, rooms, and scenes, combine accessories, switch them on/off, and create automation rules {if/then/when} for accessories and sensors. Using the Apple TV as the bridge, I didn't have to buy and deploy yet another device – other proprietary platforms would likely require a new bridge.

For my automation, as an example, I have a simple rule that turns the entrance overhang lights on for 5 minutes whenever my Ring Doorbell detects motion. Another nice one automatically turns the Christmas tree lights on every day at sunset and off at 11 PM.

With the platform defined, it was time to choose accessories – in my case, they all must be Apple HomeKit compatible if I want them to be part of the automation rules and managed from a single pane of glass.

Something I found out later is that even with HomeKit-compatible accessories, not all functions and features are exposed to HomeKit, and some devices still need to be configured using their native iOS app. However, after the initial configuration, most of what needs to be done can be performed via HomeKit.

Smart Security Cameras

At this point in time (Dec/2017), the only HomeKit-enabled camera available with WiFi support is the Logitech Circle 2. The Circle 2 does not natively support PoE, but I didn't want to drill holes for power or have to re-cable only for the cameras.

My solution was to buy a PoE splitter with a USB connector, the Circle 2's standard power connector. Using this method I was able to feed power to the cameras over the built-in cat5e cables while sending the stream via WiFi through the Apple AirPort Extreme to the Logitech Circle 2 cloud.

I deployed Circle 2 cameras inside and outside my home, and I really like the functionality, including configurable motion zones, smart person detection, and the family-favorite Day Brief feature, where we can see everything that happened during the last 24 hours in 60 seconds – mostly the cat and squirrels having fun in the backyard.

Live Web View

Logitech Circle 2 Zone Editor (web only)

The camera has a high price tag and a subscription model, where the freemium tier only gives you 24 hours of cloud storage and no smart features. Definitely not cheap, but it works really well.

[Automation] I automated the home lighting based on the cameras' motion sensors, so whenever a camera detects motion it starts recording, and the lights for the areas being recorded are automatically turned on. This enables better recording clarity and hopefully scares off anyone willingly trying to get into my property.

Video streaming is also available from AppleTV or from any iOS device part of the Family Sharing.

Smart Lighting

Lighting is probably the most widely available category for Apple HomeKit, but most of the solutions are socket- or bulb-based. I decided to take the grassroots approach and replace the in-wall switches with Lutron Caséta Wireless switches that fully integrate with Apple HomeKit. Lutron and Leviton are the two most prominent brands in the in-wall switch automation market.

Unfortunately, with Lutron I had to buy and deploy their home bridge, but it was worthwhile because a single Lutron bridge commands many switches and integrates seamlessly with Apple HomeKit. In any case, that was the first extra piece of hardware I had to add to my home.

Lutron has wireless switches with on/off, dimmers, and remote controls for 1-way, 3-way, and multi-location deployments. Please note that most switches will require ground and neutral wiring, which may be an issue in older buildings.

The only downside of using in-wall switches is that you may need to be the electrician and do the work yourself, including re-wiring, but there are YouTube videos that explain the process in detail. I did it myself, and while it took some time to learn all the different configurations (my home has a few 3-way and multi-location switches), after a while it gets straightforward. The upside of in-wall switches is that they are a set-and-forget solution, not requiring specific bulbs or sockets.

[Automation] After configuring the switches, I created a HomeKit scene, "Goodnight", where I simply say "Siri, Goodnight" and all the lighting is set the way I want. Also, as part of the Goodnight scene, HomeKit checks that the front door and the garage doors are closed and turns off the TVs. I also have a movie scene that sets the lighting and sound for the perfect movie experience. (My next venture will be a home theater with Dolby Atmos.)

[Automation] Another automation that I like is the ability to turn multiple switches on and off whenever any one switch is used. As an example, I have various switches controlling the back porch lighting in different rooms, and now any of them will trigger all the lights at the same time.

Smart Thermostat

There are a few thermostat options for Apple HomeKit; you can check and manage temperature from your phone, and most will use house proximity to adapt the temperature to your preference and location. The most widely adopted are the Nest and the Honeywell.

While I really liked the Nest, it was not compatible with HomeKit, so an essential checkbox in my project was not being ticked. I then watched and read reviews about how good the Honeywell Lyric Round Wi-Fi was with its geofence feature – and it also integrates with HomeKit. To be honest, Apple HomeKit support was the deciding factor for the thermostat.

I followed the Lyric instructions (there's some manual wiring required), and in no time I had installed the accessory and the app, and integrated it with HomeKit.

Now our home always has our desired temperature using both cooling and heating, and at night it brings the temperature down by 2 degrees automatically, just so we can have a cozy sleep. Yes, I can also change the temperature from my couch, but that is never needed.

The Lyric also knows when we are not home based on our phones' geolocation and will not waste energy while we are away, but it likewise knows when we have crossed the pre-defined geo-boundaries and will make sure it turns itself on when we are getting back home. Most smart thermostats work similarly.

Smart Lock

I initially wanted the August to manage my door lock, but I soon decided it was better to get something a little more robust and permanent that included the actual deadbolt, so I looked at Kwikset and Schlage. The problem was that my door has a 2-piece deadbolt, and removing the original deadbolt would leave a mark on the external part of the door. Because of that, I ended up going back to the August, which I have been pleased with.

The August integrates with HomeKit using the Apple TV and enables you to open and close the door using your iPhone or Apple Watch (yes, I use that when I go for a run). I can also check the status of the door from iOS, and the door will auto-lock after 5 minutes in case someone leaves it open.

The August can provide temporary keys to people, but for that I would have to buy their August Connect bridge – yet another device in my home. I decided not to.

[Automation] I also implemented some automation rules for the August. As an example, whenever the door unlocks at night, the front overhang lights are turned on automatically for 5 minutes. And as with any Apple HomeKit accessory, status alerts are always configurable.

Apple HomeKit Automation Rules for my Home

Non-Apple HomeKit Compatible Devices

So far, despite some trade-offs and, in the case of the in-wall switches, some manual work, all the devices were Apple HomeKit compatible, and their installation and configuration were pretty straightforward. However, I still had accessories that I wanted to connect to HomeKit, so I started looking for a solution.

The accessories included a Ring Doorbell, a couple of Samsung TVs, the Roomba, and a Chamberlain MyQ garage door (this last one does support HomeKit, with yet another bridge).

I soon learned about the Homebridge project, a lightweight NodeJS server you can run on your home network that emulates the iOS HomeKit API. I looked at the plug-ins available, and they would solve the rest of my automation project while making everything seem native to iOS and HomeKit.

I first thought about buying an Intel NUC for Homebridge, but it felt like bringing an automatic weapon to a sword fight. I then discovered that a couple of people had ported Homebridge to ARM and that it could run on a Raspberry Pi. I immediately went to the Datrium marketing department (the company I work for) and asked for one of the RPi customer giveaways from conferences – and to my luck, they had a spare one. With a little more research I learned that there was a Homebridge Docker image and that someone had already created a Dockerfile for it. BINGO!

(I am not going into configuring Homebridge here as it will undoubtedly require a dedicated article)
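Still, just to show how lightweight the setup is, a minimal invocation looks something like the sketch below, assuming a community ARM image such as oznu/homebridge (the image name and host path are assumptions, not my exact setup):

```
# Run Homebridge in Docker on the Raspberry Pi; host networking is needed
# so HomeKit's mDNS/Bonjour discovery can see the bridge on the LAN
docker run -d --net=host \
    -v /home/pi/homebridge:/homebridge \
    oznu/homebridge
```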

Ring Doorbell (w/ Homebridge)

I had been waiting for a vendor to release a doorbell that could integrate with Apple HomeKit, but despite announcements, no vendor has released anything to date. I really had just a couple of options, the August and the Ring, but neither would integrate with HomeKit – well, at least until Homebridge. I acquired the Ring Doorbell Pro.

[Automation] As part of the automation rules, whenever Ring detects front door motion it turns the front lights on, but we have also been using the Ring doorbell and the August smart lock to let the cleaners into the house, so we don't have to give them a set of keys.

There is also an option to integrate the Ring camera using the Homebridge FFmpeg plug-in, but I have not done that yet. If someone rings the door, we use the Ring app to answer, and I think it will stay this way because it also allows two-way communication. Ideally, the doorbell and the smart lock would be integrated into the same app, but that is not the case. I'm hopeful that August will solve this problem soon, as they make both a smart lock and a doorbell.

Ring works well, but whenever an Apple HomeKit-enabled doorbell becomes available, I’m likely to replace it. It seems that the Ring 2 hardware is ready for it, but the capability is not enabled.

More Integrations (w/ Homebridge)

Additional integrations and smart automation in my home using Homebridge include the Roomba vacuum cleaner, the garage door, and the Samsung TVs. I have added them all to the Goodnight scene, and now the system makes sure everything is off, closed, and locked as we go to bed.

The Alarm System

No top-tier alarm monitoring companies on the market support Apple HomeKit yet; they all have their own iOS and Android apps. There’s a Homebridge plug-in for SimpliSafe, but I have not tested it.

Because of the lack of HomeKit support, and because my home was already fully cabled with in-wall sensors for windows and doors, I didn’t have much of an option here. I’m just using a standard monitored alarm system from a top-tier vendor. They have an iOS app, but unfortunately it’s not integrated with HomeKit.

UPDATE – The Honeywell Controller alarm system supports HomeKit.

What’s next and what I’ve learned

All this home automation is expensive, and the benefits will differ for each and every one of you. In my case, it all started with securing and monitoring the home, but it soon became a hobby, and I had a lot of fun doing it. As for next steps, I still have lights to add to my Scenes, and I would also like to replace the smoke and carbon monoxide detectors with smart ones that can let me know about an incident even when I am not home. I would also like to experiment with music following me across rooms.

There are ways to make all this automation cheaper, but being in the Apple ecosystem, HomeKit was the right choice for me. Open Source platform options will also work, but based on my experience they require far more engineering work than proprietary platforms such as Apple HomeKit, Wink, Samsung SmartThings, and others. Even getting Homebridge to work properly with all devices was a PITA. Moving forward I expect all accessories to be able to speak most platform languages, removing the need for dedicated bridges.

Do you have your own experience automating your home? Let us know what worked and what didn’t.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

This is a minor update to the VDI Calculator, adding capability and support for Intel Xeon processors with 20 cores. I’m adding that mostly because Datrium now has a Compute Node (CN) that supports the Intel Xeon Gold 6148, 20 cores/socket, 2.4GHz (specs). That said, this new option provides support for hardware from any vendor with 20-core processors.

Datrium’s Blanket encryption is already an industry-first, providing software-based (without hardware dependencies) end-to-end encryption. Datrium’s client software runs as part of the hypervisor and is uniquely able to provide cluster-wide encryption domain with full data services, such as compression, de-duplication, and erasure coding.

The encryption covers ESXi, RedHat Enterprise Virtualization and CentOS KVM host RAM buffers, the host SSDs, the data nodes HDDs and SSDs, the data in-flight between hosts and data nodes, and also the data stored in data nodes NVRAM.

Protecting data-at-rest has become a top priority for organizations. However, despite growing awareness, encryption of data in-flight is consistently overlooked. Nowadays, in-flight data is most vulnerable to perpetrators who can tap into network connections, given the widespread use of IP network protocols; security measures for data in storage come to nothing if in-flight data is not safeguarded as well.

NIST and CMVP

Datrium is now certified by NIST (National Institute of Standards and Technology) under the Cryptographic Module Validation Program (CMVP). The CMVP validates cryptographic modules to Federal Information Processing Standard (FIPS) 140-2, Security Requirements for Cryptographic Modules, and other FIPS cryptography-based standards. Federal agencies accept modules approved as conforming to FIPS 140-2. Learn more about it here on the NIST website.

With this certification, Datrium becomes the first and only converged or hyperconverged platform with a cryptographic module officially certified by NIST for FIPS 140-2. Furthermore, a quick search on the NIST website shows all supported FIPS algorithms and also the extent of the test configurations, including x86, AIX, and ARM platforms.

For more information and implementation details, read the white paper on Datrium Blanket encryption (here).

Update: I had only considered primary storage converged solutions for this article. Upon an additional search on the NIST website, I found that Cohesity (secondary storage converged) is also FIPS 140-2 certified.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

We have partnered with @vBrownBag and @thectoadvisor to demonstrate the simplicity of the Datrium solution. In this video, Devin Hamilton, our Director of System Engineering, talks with Alastair Cooke about Datrium DVX Architecture and Open Convergence overview.

Open Convergence is really, kind of, the combination of all the greatest things that SAN ever provided, all the greatest things that HCI ever provided, and none of their pitfalls. – Devin Hamilton

In all honesty, this is a must-watch video interview for those not yet versed in Open Convergence. Not many in the industry have the depth of knowledge and the tremendous presentation skills that Devin has.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Storage arrays and HCI systems come in all shapes and sizes with vastly different architectures tailored to a variety of use cases. One thing they all have in common is that, sooner or later, storage drives, disks or SSDs, will fail. Entire drives may become unavailable, or they may silently lose a few sectors to what is known as Latent Sector Errors (LSEs). LSEs and silent corruptions injected by faulty hardware or software are increasingly common despite built-in drive ECC.

A storage system must remain available and continue to serve data in the face of drive failures and LSEs. In our view, any enterprise storage system must tolerate the failure of at least two drives, which will most often manifest as the failure of one entire drive plus an LSE on a second drive discovered during a drive rebuild.

Datrium DVX relies on Erasure Coding during its normal operation and exposes no controls to disable EC or change the level of data protection. It does so without any performance sacrifices for workloads heavy on small random writes and overwrites.

The technical paper below describes in detail the data protection modes and how DVX with built-in Erasure Coding achieves 1.8M IOPS for 4K random writes on a system configured with 10 hard disk-based Data Nodes, which exceeds the performance of most all-flash arrays.

[click image to open paper]

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

We have partnered with @vBrownBag and @thectoadvisor to demonstrate the simplicity of the Datrium solution. In this video, Lakshmi Bairavasundaram talks with Alastair Cooke about The Science and Statistics of Failures and Partial Failures. This is a must-watch video for those interested in the inner workings of drives and SSDs and their failure rates.

Datrium announcements are often characterized by technology improvements as an evolution of the shipping platform. However, we are on a journey to deliver the best practical solutions that bridge the operational gaps between on-premise and public clouds, moving organizations to a world where the underlying nuts and bolts are not relevant to achieving the best business outcomes.

As part of this mission, we have already simplified on-premise infrastructure, collapsing and eliminating enterprise storage silos, and removing day-2 configuration complexity and maintenance requirements. Yes, Datrium DVX does not require the application-specific knowledge that other converged and hyperconverged products demand from users and admins to enable/disable data services such as compression, dedupe, erasure coding, and replication.

Welcome The Cloud DVX!

The Cloud DVX is a zero-administration Software-as-a-Service piece of the overall Datrium platform solution that lives in the cloud (AWS). As part of the service offering, Datrium manages service availability, automated software upgrades, as well as proactive support and self-healing functions related to Datrium and AWS resources.

The Cloud DVX is the brains for on-premise DVX instances. The software is built on the same split-provisioning foundation as the on-premise DVX, enabling massive scalability of compute or capacity independently and on-demand. Furthermore, the same superpowers of the Log-Structured Filesystem (LFS) are behind the Cloud DVX.

In this post, I highlight the three initial use-cases that are being delivered in the next few months or are part of a short-term product roadmap.

Cloud is the New Tape – Backup-as-a-Service

Traditionally, IT organizations take incremental and differential snapshots and backups of running systems and store an extra copy on on-premise secondary storage for quick retrieval (low RTO); later, the same data is archived to tape for long-term retention.

Cloud DVX BaaS offers a self-managed (CrashPlan-like) solution that supports multi-site, multi-system, and multi-object end-to-end global deduplication with full data efficiency and encryption on the wire and at rest. Also, because the service supports end-to-end encryption, there is no need to add a separate VPN and incur the related AWS charges.

Datrium Cloud Groups

Because the Cloud DVX provides direct-to-host restore, the management (and cost) of an additional on-premises backup or cloud gateway device is eliminated, further simplifying public cloud backup and recovery.

One-click setup with AWS

Multi-site, -system, -object global dedupe

Full data efficiency on wire, at-rest

Forever incremental native backups

VM and vDisk granular recovery

Built-in E2E encryption (no VPN charges)

Direct cloud-to-host restore

Global Catalogue

Automated Self-Healing

Automated Upgrades

Proactive Support

How does it work?

Data is stored in AWS S3 and is priced by AWS based on capacity used and on put and get operations, for both metadata and data. For this reason, being globally dedupe-aware and never sending or receiving the same data blocks twice is extremely important to maintaining the cost-effectiveness of the solution. Datrium DVX can do that even when end-to-end encryption is in use.

Cloud DVX has been designed to do incrementals-forever remote backups. The only reason to send full backups is when new VMs have been created or during an initial seeding. However, even in this case, global dedupe is employed to send just the missing pieces. The same logic applies when recovering data: only the missing pieces are sent back, making the experience faster and cheaper.
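The fingerprint-exchange idea behind this can be sketched as follows – my own illustration of dedupe-aware replication, not Datrium’s actual protocol or block granularity:

```python
# Illustrative sketch: ship only the blocks the remote side does not
# already have, keyed by content fingerprint (not Datrium's real protocol).
import hashlib

BLOCK_SIZE = 4096  # assumed block granularity for this toy example

def fingerprints(data: bytes):
    """Split data into fixed-size blocks and fingerprint each one."""
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        yield hashlib.sha256(block).hexdigest(), block

def replicate(snapshot: bytes, remote_store: dict):
    """Transfer only the blocks missing on the remote side."""
    sent = skipped = 0
    for fp, block in fingerprints(snapshot):
        if fp in remote_store:      # remote already has it: dedupe hit
            skipped += 1
        else:                       # a genuinely missing piece: send it
            remote_store[fp] = block
            sent += 1
    return sent, skipped

cloud = {}                          # stands in for the S3-backed store
print(replicate(b"A" * 65536, cloud))                 # (1, 15): identical blocks dedupe away
print(replicate(b"A" * 65536 + b"B" * 8192, cloud))   # (1, 17): only the new block travels
```

The same membership test works in reverse for restores, which is why only missing pieces ever cross the wire in either direction.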

As we all know, AWS services sometimes go down. So, we use AWS Lambda to monitor the Cloud DVX services, continually looking for anomalies and issues, and self-healing the system. Datrium’s Lambda monitoring software detects and rectifies issues by restarting services using different resources, therefore masking eventual AWS issues. Finally, the admin can also select the desired AWS region to store snapshots via the Datrium GUI.

Check out the newly updated website with all the info on Cloud Backup as a Service, and watch the video below demonstrating the AWS setup on Datrium taking only 60 seconds. It’s that simple!

With v1.0, up to 30TB raw (global pre-dedupe) and 4 DVX systems are supported, but numbers will increase with upcoming releases.

Want more?

Read this blog post by Brian Biles, “What’s Wrong With Cloud Backup?”. Honestly, not many people know more about data protection than Brian, a founder of Data Domain and Datrium.

CloudView – The Cloud is the System, One thing to Manage

The Cloud DVX also provides the single GUI for on-premise, private, and public DVX systems, offering one unique and straightforward cloud service for multiple sites, systems, or departments. The name of this service is CloudView.

CloudView collects data from DVX systems every few minutes to perform data analytics, correlation, and root-cause analysis across all connected systems. Furthermore, based on that data, CloudView will promote dynamic workloads, placing the right data in the right place. Additionally, CloudView will become the API gateway for communicating with multiple DVX deployments.

If you liked Nimble InfoSight, you’ll love CloudView.

(ETA) H1 2018

Single File Restore – Power to the Users

Today organizations rely on numerous vendors to allow users to perform Single File Restore, as opposed to restoring a whole virtual disk (VMDK). Datrium SFR works in tandem with the Cloud DVX, enabling guest objects, such as Word documents, to be seamlessly retrieved from any Datrium storage tier, including cloud, back to the user operating environment.

Using powerful analytics and a global catalog, the Cloud DVX will enable advanced searches, and over time it will provide additional insights into app-specific requirements and recovery modes, such as for Exchange and SQL.

We will soon provide more information on how DVX Single File Restore works.

(ETA) 2018

IR2 or Instant Remote ReStart – Orchestrated Disaster Recovery

Many convergence and hyperconvergence vendors are trying to solve the hybrid cloud puzzle, but they are all focused on providing a single cloud and single hypervisor solution. We are not building a hypervisor. We are not building a custom cloud. We aim to deliver a feature-rich turn-key orchestration solution for the multi-hypervisor and multi-cloud world, be it VMware, RHEV, Azure, AWS, GCP, or VMware Cloud on AWS.

The Cloud DVX is the brains and the witness agent for the recovery orchestration engine, and provides the automation framework used to restore Protection Groups (PGs), VMs, and Datastore Files in a systematic and repeatable way.

Datrium customers will be able to select VMs for protection and choose a pre-defined or create a new runbook automation (RBA) template; the applications are then replicated in the background, ready to be recovered on AWS in the event of a site failure.

We will soon provide more information about IR2, but it encompasses DR Test and Reporting, Failover and Failback, Planned Migrations and Data Cloud Workload Mobility.

If you are familiar with VMware SRM (Site-Recovery-Manager), think of that for multi-DVX systems and multi-Cloud.

The fundamental IR2 orchestration roadmap entails:

From on-prem to on-prem

From on-prem to cloud

From cloud back to on-prem

With v1.0, the focus is on orchestrating and automating the registration and power-on of VMs, and supporting changes to the VMs (such as network mappings) that are necessary to bring them up in the target environment. DR orchestration will initially be available between on-premise DVX systems across multiple sites, soon followed by AWS as a target.

(ETA) H2 2018

There is lots of goodness coming from the Cloud DVX, but there are also many more features and enhancements to the on-premise Datrium DVX software that will be available in forthcoming releases. This is a fantastic technology product, from a fantastic company, at a fantastic time. You should check it out!

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

We have partnered with @vBrownBag and @thectoadvisor to demonstrate the simplicity of the Datrium solution. In this video, Boris Weissman talks with Alastair Cooke about the Datrium I/O architecture. This is a must-watch video for those interested in the inner workings of the unique Datrium technology.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

This is a really short post. It always seems easy to write about a given technology and say it is faster, better, and simpler. Is it really? We have partnered with @vBrownBag and @thectoadvisor to prove that we are not selling hot air balloons. Tomorrow (11-16-17 @ 9 AM PST) you will have the opportunity to see live how simple and fast Datrium DVX technology is and how Open Convergence drastically simplifies datacenter operations and scalability.

What lies beyond hyperconvergence? Last week Mike McLaughlin and I had the wonderful opportunity to present on the ActualTech Media Converged and Hyperconverged Webinar. The team at ActualTech Media is doing a phenomenal job, and the event had over 330 attendees with many great questions and many companies asking for direct follow-up with Datrium.

As part of the webinar, Cloudistics and Unitrends were also presenting. Watch the Datrium session, where I take you on a datacenter evolution journey, from SAN to HCI to OCI, and Mike finishes with a great demo of Datrium DVX scalability and simplicity.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

A couple of weeks back I had the wonderful opportunity to present Datrium tech and the Open Convergence evolution alongside my colleague Mike McLaughlin and the Citrix Ready team. Along with LoginVSI we crafted and validated a reference architecture for 6,000 XenDesktop virtual desktops running on a single Datrium platform while enjoying the benefits of all-flash performance at a cost of $9 per month per user over a three-year ROI (list prices).

Watch the video Unlock Flash-based Economics and User Experience for VDI with Open Converged.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Learning new technology is exciting, and I recently came across a Datrium feature that I believe no one else in primary storage delivers today. It is solving a cumbersome problem for organizations with large amounts of data that need to be seeded for disaster recovery purposes.

Seeding DR datacenters can be an arduous and lengthy procedure even for the most advanced storage platforms, especially when the amount of data vastly exceeds the bandwidth available between datacenters. Simply put, all data from site A must travel to site B before the next snapshots can start replicating and incremental data can start flowing.

In the past, the simplest way to achieve that was to place legacy disk arrays (SAN) side-by-side, replicate the data, physically move one array to the DR site, and reinitiate replication from the last snapshot. This method surely isn’t the desired solution for the problem because it requires transporting heavy, big, and delicate equipment.

In the case of hyperconverged (HCI) platforms this problem is exacerbated given that data lives on different servers. HCI requires transporting dozens of servers between sites.

Data replication over-the-wire (WAN) is a more popular option, but depending on the amount of data to be replicated and the bandwidth available, the initial seeding process can take days, weeks, or even years. Networking vendors introduced de-duplication over-the-wire, reducing the amount of data transmitted and therefore the time spent executing the full initial DR seeding. However, that worked only for small and similar datasets due to the massive amount of metadata and computing necessary to maintain coherence on both sides of the network.

More recently we started to see storage vendors implementing native de-duplication over-the-wire, replacing specialized WAN deduplication appliances; Datrium also offers the technology.

That said, the amount of data in enterprises has grown exponentially over the last decade, and even with de-duplication over-the-wire, migrating terabytes or petabytes of data is a monumental task even for links with ample bandwidth. A mere 100 TB transmitted over a 200 Mbps WAN takes 46 days; 1 petabyte takes over a year. AWS, knowing about these challenges, created Snowball, a solution that syncs your data on-premises and then ships it to their datacenters.
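Those figures are easy to verify with back-of-envelope math (ideal line rate, no protocol overhead):

```python
# Back-of-envelope check of the transfer times quoted above.
def transfer_days(data_tb: float, link_mbps: float) -> float:
    bits = data_tb * 1e12 * 8           # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6)  # ideal line rate, no overhead
    return seconds / 86400

print(round(transfer_days(100, 200)))    # ~46 days for 100 TB at 200 Mbps
print(round(transfer_days(1000, 200)))   # ~463 days (over a year) for 1 PB
```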

The Datrium Way

With Datrium, data is ALWAYS globally deduplicated, compressed, and erasure coded, and blocks are uniquely and logically aggregated. Because of that, the system has a full understanding of the existing and missing data blocks on both datacenter sites, primary and DR, allowing administrators to load any data, even old data from old backups, onto the DR site; Datrium recognizes the differences. From there, the system transfers only missing or different data, and when the initial seeding is complete, the regular snapshot replication cycle initiates.

Here are the five different ways users can do the initial seeding with Datrium:

Side-by-side like “legacy” arrays (or over a fast network link).

Restore from an existing Veeam/Commvault/Cohesity/NetBackup/ARCserve disk backup on a remote/DR site. The backup can even be stale.

Restore from a tape backup on a remote site. The backup could be stale.

Copy from an existing LUN mirror of a legacy array to a Datrium DVX on the DR site. The LUN mirror could also be stale.

Ship a USB drive from the source to the destination.

In all cases, the remote site can be seeded with somewhat stale data, and Datrium replication figures out what’s missing and transfers only that data incrementally. This also means that the primary site does not need to be stopped and, in the case of some backups already present on the DR site, the primary site is not affected by the initial seeding.

While global deduplication does reduce storage space, there are other benefits too. Technology foundations do matter!

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Beyond Hyperconvergence w/ David Davis

David Davis (@davidmdavis) and James Green (@jdgreen) from ActualTech Media were at Datrium HQ last week to talk about Open Convergence. Check out this short interview!

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

“Datrium is the most scalable, fastest, and lowest-latency storage solution (converged or not) on the market, beyond doubt.”

We already knew that our solution is really competent, but there’s nothing like a 3rd party audited benchmark to convince even the most suspicious greybeards. The results are mind-blowing!

In partnership with Dell and IOmark.org, we have been able to validate that Datrium achieved 8,000 VMs on a single Datrium converged platform running 60 servers and 10 data nodes. (Press Release here)

Let’s stop here for a brief moment because I need to make sure that you understand Datrium’s capabilities and how these tests were executed before we compare to other IOmark results.

1 – The purpose of IOmark.org is to create and maintain benchmarks that allow users to accurately and fairly characterize storage systems running application workloads. IOmark is a storage benchmark, not a synthetic one: it replays real application workload traces. So the usual “that’s an artificial hero number” does not apply here; this is for real. The benchmark reports the “number of VMs supportable by the system,” not just peak throughput. It also runs for a few hours (dataset population and then the run, with vMotions, etc. going on in the background), so it reflects steady-state performance.

3 – The servers were Dell C6320 using previous-generation E5-2697v4 Intel Broadwell CPUs with 4 x 1.92 TB Samsung SSDs (PM863a), and each server was running the VMware vSphere hypervisor, though the same results would be achieved with Red Hat Virtualization (RHEV). The data cluster was formed of 10 nodes with 12 x 1.94 TB SATA SSDs (F12X2).

Ganesh Venkitachalam will soon be publishing the detailed story, results, and configuration, but keep in mind that this audited benchmark establishes a real baseline for apples-to-apples comparisons of storage subsystems and converged platforms.

This is an actual real-world system that anyone can buy today from Datrium without much thought, not a benchmark hero setup, and the fact that we did it with SATA and cheap Samsung SSDs (PM863a) is impressive.

Until Datrium, the highest IOmark audited benchmark was the IBM V9000 AFA with 1,600 IOmark VMs, and the highest hyperconverged solution was VMware VSAN with Intel Optane SSDs achieving 800 IOmark VMs. Datrium delivers not only 5X more performance than the previous IBM record and 10x more performance than the previous hyperconverged record but also has the lowest latency across all audited platforms. See for yourself all previously audited IOmark results here.

Interesting tidbit: The IBM V9000 benchmark was using only RAID 5 protection without Dedupe or Compression. The VSAN benchmark was using Intel Optane SSDs and only 1FT protection. Datrium was audited with (N+2) Erasure Coding protection, Dedupe, Compression, Encryption and cheaper SATA SSDs.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Three months ago we released the 3.0 version of our software (here) and announced a boatload of new features, including Red Hat and CentOS virtualization support, Container Persistent Volumes, and an incredible 18 million IOPS and 8 GB/s random write throughput. Then last month we announced Oracle RAC support (here) with vSphere multi-writer VMDKs.

Today we are making another incredible product announcement, introducing release 3.1: Zero VM Downtime even when all drives in a host fail, accelerating performance-intensive workloads (even further), and extending All Flash beyond primary storage. Furthermore, we are updating our hardware platform, both data and compute nodes. (Press Release here)

If you are not yet familiar with Datrium and Open Convergence, at the 10,000-foot view we offer a back-end node for persistent data and backups (aka data node) and a front-end compute node where applications run with data locality and flash performance (these can also be your existing servers). Datrium is amazingly simple, without the complex set of dials and knobs commonly found in private cloud solutions.

Updated Compute Node CN2100

Updated Data Node D12X4B

The New All-Flash Data Node F12X2

IOmark Performance Benchmark

StorageReview.com Performance Benchmark

Zero VM Downtime

Peer Cache Protection

Let’s get to the announcements….

Updated Compute Node CN2100

As always, you may bring your own servers into the solution, but if you prefer you can use Datrium’s compute node with pre-validated performance numbers. The new hardware platform uses new Intel Skylake processors, adds NVMe support, and also packs 2x25 GbE or 4x10 GbE.

Updated Data Node D12X4B

You may mix and match different data node generations as part of the same cluster. The new disk-based data node (internally codenamed Arrow) is highly performant and cost-optimized, offering up to 18 million IOPS, 200 GB/s read throughput, 8 GB/s random write throughput, and ~100 TB of effective capacity. The new model also comes with 25 GbE support.

How does the Datrium D12X4B compare to…

To put this into perspective, using a 70:30 read-to-write split ratio with 8K block sizes gives us a direct comparison to XtremIO. Using this configuration, DVX will do 3.3M IOPS; 3.7x better performance than the largest XtremIO. XtremIO Specifications (here).

You may compare with 32KB reads where DVX will deliver 6.25M IOPS; 17x better than the largest Pure Storage FlashArray. Pure Storage Specifications (here).

You may compare with nominal read IOPS, likely 4KB, where DVX will deliver 18M IOPS; 2.7x better than the largest EMC VMAX All Flash. EMC VMAX All Flash Specifications (here).

You may compare with nominal read IOPS, likely 4KB, where DVX will deliver 18M IOPS; 1.8x better than the largest SolidFire All Flash array. SolidFire All Flash Specifications (here).

The New All-Flash Data Node F12X2

For those highly intensive workloads with extreme random-write requirements, we now offer All-Flash Data Nodes (internally codenamed Flarrow). All-Flash data nodes allow up to a blazing-fast 20 GB/s of random write throughput when four or more nodes are clustered together – that’s 2X the write throughput of the disk-based data node.

The ideal use cases include large IoT and Oracle RAC deployments, and anywhere ultra-low and predictable latencies on cache misses, host failures, cold boots, or otherwise degraded conditions are an absolute requirement. These beasts improve write bandwidth and extend host resilience.

Note that due to data locality most read IO operations happen at the compute nodes themselves using local SSD or NVMe. Therefore 200 GB/s read throughput with up to 128 servers remains the same as the disk-based performance.

The F12X2 offers 15 TB usable, ~50 TB effective capacity and enables you to add additional capacity quickly. The new model also comes with 25 GbE support.

Given the price point, and combined with Datrium native data deduplication, compression, and inline erasure coding, the F12X2 lets customers efficiently extend the use of flash beyond primary storage while providing the lowest possible latency, using NVMe for primary and SAS/SATA flash for secondary – 16 GB/s random write throughput.

IOmark Performance Benchmark – Mind-Blowing 10x!

“Datrium is the most scalable, fastest, and lowest-latency storage solution (converged or not) on the market, beyond doubt.”

We already knew that our solution is really competent, but there’s nothing like a 3rd party audited benchmark to convince even the most suspicious greybeards. The results are mind-blowing!

In partnership with Dell and IOmark.org, we have been able to validate that Datrium achieved 8,000 VMs on a single Datrium converged platform running 60 servers and 10 data nodes.

Until Datrium, the highest IOmark audited benchmark was the IBM V9000 AFA with 1,600 IOmark VMs, and the highest hyperconverged solution was VMware VSAN with Intel Optane SSDs achieving 800 IOmark VMs. Datrium delivers not only 5X more performance than the previous IBM record and 10x more performance than the previous hyperconverged record but also has the lowest latency across all audited platforms.

StorageReview.com benchmark is coming too!!

We also reached out to the team at StorageReview.com and asked them to run one of their famous benchmarks. I won’t steal their thunder, so let’s wait for the verdict.

Zero VM Downtime

Another improvement in the 3.1 release is the ability to continue running VMs and applications at high performance on a host even when one, many, or all local SSDs have failed (Peer Cache protection). Datrium DVX can continue to serve I/O from flash-based secondary storage and from other hosts in the cluster with the performance necessary to keep mission-critical applications running until the VMs vMotion to a new host and the local caches warm up, or until the failed SSDs are replaced.

Peer Cache Protection

In DVX, we hold all data in use on flash on the host, and we guide customers to size host flash to hold all data for the VMDKs. With always-on dedupe/compression for host flash as well, this is feasible – with just 2 TB of flash on each host and 3X-5X data reduction you can have 6-10 TB of effective flash. (DVX supports up to 16 TB of raw flash on each host.) Experience proves this is in fact what our customers do: by and large, our customers configure sufficient flash on the host and get close to a 100% hit rate on host flash.
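As a back-of-envelope illustration of that sizing guidance (my own sketch, with illustrative numbers):

```python
# Quick sizing sanity check: effective host flash is raw flash times the
# data reduction ratio; a working set that fits should see ~100% local hits.
def fits_in_host_flash(raw_flash_tb, reduction, working_set_tb):
    return working_set_tb <= raw_flash_tb * reduction

print(fits_in_host_flash(2, 3, 6))    # True: 2 TB raw at 3x covers a 6 TB working set
print(fits_in_host_flash(2, 5, 12))   # False: 12 TB exceeds the 10 TB effective flash
```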

However, in most instances, due to data reduction benefits, customers decide to have only 1 or 2 flash devices on each server, because that’s more than enough from a capacity and performance standpoint. With previous releases of DVX, if the last available flash device failed, the workload would stop, and applications would have to be restarted, manually or via HA, on a different host.

With DVX 3.1 we are introducing the ability to utilize Peer Cache – the flash devices of other hosts – to keep the workload running even if the last available flash device fails, and without impacting application performance until new SSDs are in place. As with any array, reads now have to traverse the network, introducing some additional latency since East <-> West traffic replaces local reads; but in this case, DVX would simply be working like any other SAN.
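To make the read path concrete, here is my own simplified model of the tiered lookup described above – local flash first, then a peer host’s flash, then the durable copy on the data nodes. This is an illustration only, not Datrium’s actual implementation:

```python
# Simplified model of the DVX read path with Peer Cache (illustration only).
def read_block(fp, local_flash, peer_flashes, data_pool):
    if fp in local_flash:                # normal case: ~100% local flash hits
        return local_flash[fp], "local-flash"
    for peer in peer_flashes:            # local SSDs failed or cold:
        if fp in peer:                   # serve from a peer host's flash
            return peer[fp], "peer-cache"
    return data_pool[fp], "data-node"    # durable, erasure-coded copy
```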

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Valar Morghulis – Myth and Fact of Single Failure Tolerance

“Valar Morghulis” is the tenth and final episode of the second season of Game of Thrones. Valar Morghulis is a common greeting in Braavos, meaning “all men must die” in High Valyrian. It is meant in the sense of “all men must (eventually) die,” sooner or later.

Creating a fictitious parallel, it is plausible to say that your data will (eventually) die. To be more precise, when using Single Failure Tolerance (1FT, also known as FTT1 or RF2) there is an extremely high data-loss probability of 0.49% in one year, or 1.95% over the 4-year life of a system with once-a-month disk scrubbing. Less frequent scrubbing increases the data-loss probability to as high as 18.6% in the fourth year of a system’s life.
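As a quick sanity check, the 4-year figure follows directly from compounding the annual probability:

```python
# Compounding a 0.49% annual data-loss probability over a 4-year life.
p_year = 0.0049
p_4yr = 1 - (1 - p_year) ** 4
print(f"{p_4yr:.2%}")   # ~1.95%, matching the quoted 4-year figure
```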

It is a myth that 1FT, a configuration commonly quoted by hyperconverged vendors, can handle one disk failure.

You don’t need to take my word for it; instead, read the study “Single Failure Tolerance (1FT): Myth and Fact” by Lakshmi N. Bairavasundaram, Zhe Wang, and R. Hugo Patterson. The trio has enough credentials and credibility to make you think twice before using 1FT for your HCI or converged platform deployment. Download the technical paper here.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Data Integrity Should Be Taken Very Seriously

Now let’s be open and talk about Data Integrity

Datrium takes data integrity very seriously and has worked hard to build a system that delivers the highest levels of data integrity and durability. To achieve this, Datrium has pursued a three-prong strategy: architecture, in-line integrity checks, and rigorous testing.

Architecture

The Datrium DVX is designed to minimize the risk of systemic or software issues impacting data integrity. Architectural features incorporated for this purpose include:

– Content-Addressed Data

All data is addressed by a cryptographic-strength fingerprint very early in the write pipeline (prior to encryption) to uniquely identify each grouping of data. This is the strongest possible check that the content of the data matches what was stored. In addition, such data never changes; if it does, it is new data with a new and unique fingerprint. Thus, errors resulting from races to update a particular location in the storage system are eliminated, because there are no such updates: all newly written data is written to a new location.
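A toy sketch of the content-addressing idea (my illustration, not Datrium code): each block is keyed by its own SHA-256 fingerprint, so there is no such thing as updating a block in place.

```python
# Toy content-addressed store: data is keyed by its own fingerprint, so a
# block can never be "updated in place" -- new content means a new key.
import hashlib

store = {}

def put(block: bytes) -> str:
    fp = hashlib.sha256(block).hexdigest()
    store[fp] = block          # writing identical content twice is a no-op
    return fp

fp1 = put(b"hello")
fp2 = put(b"hello!")           # changed content => brand-new address
assert fp1 != fp2 and store[fp1] == b"hello"
```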

– Log-Structured Filesystem (LFS)

The log-structured layout ensures that new data is written to new locations. Even in the case of an overwrite, the old data remains in the system until the space reclamation process runs. This ensures that no needed data is inadvertently overwritten or lost. In addition, writes are batched into whole, erasure-coded stripes, so there is no chance of data loss due to a partially updated stripe.
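Here is a minimal sketch of the log-structured idea (again my own illustration): every write appends, a logical overwrite simply re-points an index entry, and the old extent survives until space reclamation.

```python
# Minimal log-structured sketch: writes only ever append.
log = []                # append-only device
index = {}              # (file, offset) -> position in the log

def write(file, offset, data):
    log.append(data)                    # new data, new location
    index[(file, offset)] = len(log) - 1

write("vm1.vmdk", 0, b"old")
write("vm1.vmdk", 0, b"new")            # logical overwrite
assert log[index[("vm1.vmdk", 0)]] == b"new"
assert b"old" in log                    # old version intact until reclamation
```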

– Double Fault Tolerance Erasure Coding

The system is designed to protect against two simultaneous disk failures. Thus, even when one disk fails, the system maintains redundancy that can be used to recover from any integrity fault discovered by the In-Line Integrity Verification described below.

– In-line Integrity Verification and Healing

All data stored in the DVX is encapsulated in data structures that indicate which data it is and include a data integrity checksum. On every read from the system, these fields are double-checked to ensure that the data returned from the storage device is the requested data and that its integrity is intact. If the data is found not to be correct, the system will, inline, use the redundancy provided by erasure coding to rebuild the missing data and deliver it to the requesting VM. If the data integrity check fails when reading from the cache on the Compute Node, the hyperdriver will request the correct data from the Storage Pool on the Data Node cluster.
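The verify-on-read idea can be sketched as follows (illustration only; the actual erasure-coding rebuild is hidden behind a callback):

```python
# Sketch of verify-on-read: every block carries a checksum plus an identity
# tag; a mismatch triggers repair from redundancy instead of returning bad data.
import hashlib, zlib

def read_verified(fp, device, rebuild_from_redundancy):
    record = device[fp]                          # {"data": ..., "crc": ...}
    ok = (zlib.crc32(record["data"]) == record["crc"]
          and hashlib.sha256(record["data"]).hexdigest() == fp)
    if ok:
        return record["data"]
    return rebuild_from_redundancy(fp)           # heal inline, then serve
```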

A traditional array can only protect the integrity of data it receives (dutifully safeguarding data that may have been corrupted by intervening network or host-side problems). DVX protects your data before storing it locally or sending it over the network.

– Rigorous Stress Testing

Datrium also uses a battery of automated tests to make sure every code change maintains the high standard for data integrity. Our engineers run some of these tests before code check-in. Continuous integration and test infrastructure runs the full battery of tests against the full body of code on a non-stop basis. The in-line integrity verification described above flags any data integrity issues that may have cropped up so that we can find and fix any bug.

In addition to the suite of automated tests, there is a set of what we call soak tests. All during development, and for an additional week after development completes but before the new code is released, the software is subjected to a set of soak tests designed to put the system under long-term stress similar to different kinds of workloads in a production environment.

These soak tests include the following.

1 – Oracle RAC

SLOB running on 3 large hosts with four Oracle VMs at a 70/30 read/write ratio and approximately 90K IOPS.

5 – Citrix VDI

6 – Veeam Backup

7 – General Stress

Several DVX systems with workloads running on hundreds of guest VMs, simulating workloads such as cache torture, boot and login storms, a VDI workload based on the SNIA evaluation of VDI workloads, FIO, VDBench, and in-house tools.

There can never be enough testing and validation when dealing with data integrity, but it is comforting to know that the Datrium engineering team treats it with the utmost seriousness and has designed automated stress tests that run continuously for the severest of scenarios.

During VMworld US I had the opportunity to chat in Portuguese with Valdecir Carvalho (@homelaber) from homelaber.com.br. Valdecir has been doing a great job with the VMUG in Sao Paulo, Brazil, and was curious about the Open Convergence innovation that Datrium is bringing to the market and the ways OCI differs from HCI. I had a blast and a hard time giving an interview in Portuguese, but the outcome was excellent. Valdecir, thanks for the opportunity to talk to your readers.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

During VMworld, I had the opportunity to present at theCube on SiliconANGLE TV. Of course, Craig and I chatted about Datrium and Open Convergence technology. If you are interested in learning about OCI and want to understand the differences between HCI and OCI, I recommend you watch this video.

“Datrium is the hottest startup in Silicon Valley right now.” – John Furrier from SiliconANGLE

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

Last week Datrium announced (here) a partnership with Oracle and support for Oracle RAC (Real Application Clusters). As part of the announcement, we stated that the main reasons for running RAC on Datrium are:

Cloning, Snapshots, and Replication are fast and free, and this enables cloning of production databases for test/dev use,

Mixed-use clusters, no silos: you can have your oracle DBs in the same cluster as your test/dev or any other workloads,

Always-on inline erasure coding even for database in use: do not compromise data safety, and do not pay the penalty of 3-way replication as with HCI offerings,

All data services are always-on, including compression – which means you save on software licensing costs when compared to 3rd party software for compressing your Oracle databases,

That is all good, but to make sure we are correct and to give additional assurance to customers, we contracted Oracle specialist firm House of Brick to test our Oracle RAC solution and provide an independent verdict. It turns out we were not only right, but House of Brick found even more benefits, including saving $15,000 per processor in ASO licensing when utilizing Datrium blanket encryption.

Check out their assessment of the Datrium DVX platform and also see the failure scenarios they have tested below.

The Datrium Solutions team is launching an Amazon Alexa-enabled Flash Briefing with weekly updates. Every week we will have the latest news and interesting topics related to Datrium, such as papers, blog articles, and event updates.

To receive the updates, you need an Amazon Echo or the Amazon app for your iPhone or Android.

Subscribe to the Datrium Skill for Alexa (here) and just say “Alexa, what’s my Flash Briefing?”.

Also, make sure you check the Alexa cards with links to the mentioned articles.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

VSS (Volume Shadow Copy Service) is a Microsoft Windows service that allows backup and snapshot applications to “quiesce” applications – i.e., put them in a consistent on-disk state before taking a backup. Application-consistent snapshots are used to ensure on-disk consistency for applications, and they can also help reduce recovery time for database applications.

In addition to the service itself, there are three essential types of components, all residing within the Windows guest VM: Writer, Requester, and Provider.

Datrium has created its own VSS Requester and VSS Provider to perform native instant app-consistent snapshots for Microsoft workloads.

With this first release of the Datrium VSS Provider, we support Windows Server 2008 R2 and above, and Microsoft SQL Server 2005 onwards. Microsoft Exchange and Active Directory Controllers will be supported in the future.

Due to the native integration with the Datrium platform, where both the VM metadata and the data live, Datrium eliminates VM stun times and drastically reduces the application performance dip. As a result, admins can take VM-level pause-less snapshots of applications with high change rates and at greater frequencies for more granular recovery points; up to 1.2 million snapshots are supported.

To enable VSS, you need to install Datrium’s Requester and Provider within the Windows guest VM. The installer can be downloaded from the DVX UI, and it will verify that VMware Tools is installed in the VM.

The second step involves creating a Protection Group (PG) for which application-consistent snapshots are enabled. If for any reason the Windows guest does not quiesce the application using VSS, then a crash-consistent snap of the VM is automatically taken, and the system marks the PG Snap and the VM Snap (so that they are visible to the admin as not application-consistent).

DVX maintains two distinct “namespaces.” The “datastore” contains the current, live version of all VMs and files. This is what the hypervisor sees. The contents of these files always contain the most recently written data.

The “snapstore” on the other hand contains previous point-in-time snapshots of the live datastore as it existed earlier. Every time a protection group causes a snapshot to be taken, entries are made in the snapstore with the contents of files/VMs that are subscribed by the protection group, at that instant in time.

DVX uses a “redirect on write” (ROW) technique to store incoming data. New data is always written to new locations (vs. copy-on-write techniques that can introduce delays as changes are copied). Because only changes are stored in a snapshot, and because DVX only stores compressed and deduplicated data, snapshots consume relatively little capacity.
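A simplified sketch of redirect-on-write (my own illustration): the snapshot is just a frozen copy of the block map, and live writes always land in fresh locations, so they never disturb it.

```python
# Simplified redirect-on-write sketch: snapshots freeze the block map.
blocks = {}                    # location -> data (content never rewritten)
live_map = {}                  # logical block -> location

def write(lba, data, loc):
    blocks[loc] = data         # redirect: always a new location
    live_map[lba] = loc

write(0, b"v1", "locA")
snapshot = dict(live_map)      # point-in-time view: a cheap map copy
write(0, b"v2", "locB")        # the live datastore moves on

assert blocks[snapshot[0]] == b"v1"   # snapstore still sees v1
assert blocks[live_map[0]] == b"v2"   # datastore sees the latest write
```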

As I mentioned, Datrium created its own VSS Provider instead of leveraging the VMware vSphere VSS, and due to this native integration with the “datastore” and the “snapstore,” the solution eliminates VM “stuns” and application performance slumps. A “stun” operation means that the execution of the VM is paused at an instruction boundary to allow in-flight disk I/Os to complete. Most HCI/SDS vendors use VMware VSS.

Both the “datastore” and the “snapstore” live on the Data Node cluster. Data Nodes are dense, cost-optimized, fully-redundant x86-based storage appliances for data persistence, and data is always erasure-coded, compressed, and globally deduplicated (using host CPU resources). Furthermore, each snapshot is a logical and self-sufficient copy of a set of VMs (and other artifacts) and serves as a full backup, in some cases eliminating the need for 3rd party backup tools, especially if snapshots are being replicated to a different Datrium cluster.

The built-in data protection also eliminates the CPU and RAM tax imposed by 3rd party backup/recovery software that competes with executing VMs for resources – as well as their separate training, maintenance, and license fees.

Below is an example of Datrium VSS provider quiescing a Microsoft SQL Server and snapshotting the VM.

VM Configuration

16 core, 64GB, 10 virtual disks (2x500GB, 8x 40GB)

Number of VM stuns:

VMWare VSS – 3 VM stuns

Datrium VSS – No VM stun

Duration of Application Performance Dip:

VMWare VSS – 8-10 minutes

Datrium VSS – up to 10 seconds.

Now, watch this 1-minute demo of Datrium VSS.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

I am proud of the work that the vBrownBag crew has been doing over the years. From their early beginnings podcasting and webcasting, and trying to help outcast VMworld presenters… to the most recent VMworld, when sessions were featured in the conference schedule-builder and live webcasting was available.

The team has grown and evolved over the years, but the community focus has remained the same, even if the team understands that to make the venture successful some level of partnership with vendors is required. Well done and congratulations to all involved. Check out their website here: vbrownbag.com

During VMworld, I presented a session on the topic of ‘An Introduction to OpenConvergence’. I tried, as much as possible, to stay away from vendor or product marketing, spending time discussing the evolution of the data center and how HCI is morphing into a more scalable and simpler architecture.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

In this version of the Datrium DVX, we are demonstrating archival of snapshots from the on-prem DVX to AWS S3 without ever leaving the vCenter UI. This gives our customers the ability to protect their infrastructure data both in the on-prem DVX cloud and off-site in AWS DVX cloud repositories.

The solution uses forever-incremental snapshots, compression, and global deduplication end-to-end, which saves both storage and network costs – especially relevant in the cloud where network egress costs can add up quickly.

Datrium will manage the solution on AWS end-to-end, as a service. An on-prem DVX can store VM or file snapshots in AWS S3 for long-term archival, and easily retrieve them as needed.

* This feature is a technology preview

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

We have recently released a white paper on Microsoft SQL Server. The purpose of this paper is to demonstrate the “real-world” performance of a virtualized Microsoft SQL Server running on the Datrium platform; and I emphasize “real-world” because absolute performance, more often than not, is not the only design consideration.

The Benchmark Setup

For this benchmark, we used a Datrium CN2000 server as the compute node, with an Intel Xeon E5 running at a 2.4 GHz clock speed.

The first step of any database benchmarking effort should be to decide what you want to measure. Are you trying to compare “raw performance” or “real-world performance”? Quite often, people who want to test raw performance desire to measure and compare extremely low-level metrics such as IOPS. However, industry-standard benchmarks are designed to approximate real-world workloads, where performance is measured in a more meaningful way, such as Transactions per Minute (TPM).

For testing the system, we used HammerDB, an open-source database benchmarking tool widely used in the industry for both transactional and analytic scenarios. We chose the TPC-C benchmark to build a roughly 500GB database of 5,000 warehouses and used more than 500 concurrent users to drive the database transactions.

For those not familiar with TPC-C, it simulates online transaction processing (OLTP) workloads. The HammerDB workload simulated roughly a 70:30 split of read and write transactions at an 8KB block size.

We also chose to perform the tests with a SQL Server VM that has 12 vCPUs and 256 GB RAM, because we realized that this configuration would be most similar to the vast majority of SQL deployments.

The Datrium Config

While the minimum configuration is one compute node and one data node, the Datrium DVX solution allows scalability up to 128 compute nodes and 10 data nodes in a single system. In this benchmark, we used a single compute node and a single data node.

If the choice for this benchmark had been to scale out with multiple MSSQL servers and multiple databases, the system-wide performance and capacity would be up to 200 GB/s of read bandwidth, 18 million IOPS, 10 GB/s of write throughput, and 1.7 petabytes (PB) of effective capacity.

Furthermore, with Datrium, data services are always on by default, including checksumming, deduplication, compression, and erasure coding. Therefore, during the benchmark, these features were turned on and working as usual.

However, before I go into the performance numbers, it is important to remember that the Datrium architecture enables performance isolation between compute nodes, keeping reads local and, most importantly, eliminating writes across servers. This eliminates possible noisy-neighbor issues and creates more predictable service levels – unlike HCI solutions.

Benchmark Results

The picture below shows the HammerDB SQL Server performance numbers achieved in Fast Mode. Fast Mode is the regular operation mode for a Datrium DVX compute node, where a maximum of 20% of host CPU utilization is allocated to storage IO and data services. Datrium also provides the ability to enable Insane Mode, allowing compute nodes to utilize up to 40% of host CPU to improve storage IO operations. However, this benchmark is not using Insane Mode; we are using Fast Mode.

The SQL database achieved over 2 million transactions per minute with an average read latency of 0.8 ms and a local flash read hit rate of 100%. The average write latency was just 1.9 ms.

When talking about database performance, TPM is important, but perhaps more important is keeping latencies low for both reads and writes.

The next test covers what we call the worst-case scenario. We define worst-case as when space reclamation and snapshots are enabled and in use to protect VMs and applications. Generally, space reclamation processes kick in when the data cluster reaches 75% capacity utilization. This test is important because it enables us to understand if and how performance is affected by common daily tasks. To that end, we created a protection group to protect the SQL VM, with multiple native snapshots being created during the performance benchmark.

The graph below shows real-time statistics captured while space reclamation and snapshots are in use to protect the SQL Server VM. In this test, the counter achieved more than 1.8 million TPM, and we can see that SQL performance is not much affected by those operations. The average read latency was just 0.9 ms, and the local flash read hit rate was also 100%. The average write latency was just slightly above 2 ms.

The results demonstrate that Datrium can host Microsoft SQL Server databases with sub-millisecond latencies for read operations. A single Microsoft SQL Server VM with 12 vCPUs and 256 GB of RAM can achieve 2 million TPM before reaching 100% VM vCPU utilization.

It is important to remember that this benchmark was not designed to drive maximum IOPS or maximum HammerDB TPM consistently. The focus was to create a real-world performance benchmark, similar to what most organizations would have in-house. As you can see, the Datrium DVX platform provides a solution that is fast and efficient, but most importantly, predictable for running Microsoft SQL workloads.

I recently published a couple of blog posts covering all the features made available with the latest Datrium DVX software release. This is an aggregation of those posts for easy reading.

Red Hat Virtualization (RHV) support

Linux Bare-Metal (RHEL and CentOS) support

Docker Persistent Volumes (Virtualized and Bare-Metal)

Full Data Services for Containers

Split Provisioning (128 servers and 10 data nodes)

ZeroConf

Cloud Scale (18 Million IOPS and 200GB/s)

Instant, Application Consistent Snapshots (Zero VM stun)

Do you know Datrium Open-Convergence?

From an architectural perspective, the best way to describe this game-changing tech is to visualize all active data, both VMs and Containers, serviced with data locality using internal flash (SSD and NVMe) on each server. At the same time, a protection copy of the data is hosted in clustered data nodes with distributed erasure coding. Each server runs the DVX hyperdriver software responsible for IO processing and enterprise data services.

One of the advantages of the architecture is that servers are stateless, and losing any given number of servers doesn’t impact data protection, availability, or SLAs. On the other hand, data nodes are highly available and protected with active/standby controllers, mirrored NVRAM, and hot-plug drives.

Lastly, when applications move between servers or when a failover happens, the DVX software instantly populates the data on the target server. The DVX software uses other servers as the source before pulling data from the data cluster, guaranteeing flash-to-flash performance whenever possible. Nevertheless, because of the native global deduplication, it is likely that most fingerprinted data is already available on the target server.

For official information on features and time frame refer to the official Datrium Press Release (here).

Red Hat Virtualization (RHV)

Datrium customers now can deploy Red Hat Virtualization (RHV) and inherently get data service benefits, including Flash and NVMe IO acceleration and end-to-end blanket encryption.

Red Hat is the world’s leading provider of open source solutions and has been named a Visionary in the 2016 Gartner’s Magic Quadrant for x86 Server Virtualization Infrastructure.

Besides enabling the use of data services, one of the biggest benefits of Datrium’s multi-hypervisor implementation is the ability to use the same DVX system to support RHV and VMware vSphere deployments concurrently.

Datrium is now certified by Red Hat. By supporting RHV we are not only providing choice to customers, but also paving a path to support the entire Red Hat stack and application partner ecosystem, including OpenStack, OpenShift, and CloudForms, providing a unified and consistent set of management capabilities across:

Red Hat Virtualization, VMware vRealize, and Microsoft Hyper-V(*).

Private cloud platforms based on OpenStack®.

Public cloud platforms like Amazon Web Services and Microsoft Azure.

While Datrium works independently of CloudForms, it does enable multiple virtualization platforms to run across the same DVX system, eliminating silos and complexity, and in some cases allowing easy workload migration between hypervisors.

An interesting fact about RHV is that it has record-setting SPECvirt_sc2013 benchmark results, including highest overall performance and the highest number of well-performing VMs on a single server.

Linux administrators see datastores as local NFS mounts, and the mounts are backed by the DVX hyperdriver (manually installed in each server with the 3.0 release) responsible for enabling IO acceleration and data services.

With this release, Datrium provides support for KVM and Containers, but other use-cases may be supported in upcoming releases, including Splunk, SAP, Hadoop and more.

Containers Persistent Volumes (Bare-Metal and Virtualized)

Containers are ephemeral, and files and services running inside a Container will not exist outside its lifetime. However, many applications require the ability to persist user session activity, making some aspects of the application stateful. Enterprises want persistent storage for Containers, and they also want to use the same infrastructure to manage dockerized and traditional workloads during the application lifecycle, development and production.

No more choosing between bare-metal and hypervisor

Containers and VMs used together provide a great deal of flexibility in deploying and managing apps.

Organizations usually start their Containers journey running apps in VMs for the added flexibility provided by virtualization stacks. However, as soon as the application lifecycle and methodology are fully defined, organizations move their production Containers environment to bare-metal to recover additional performance, avoiding the 9-15% CPU overhead created by the virtualization stack.

Datrium supports Docker persistent volumes for both virtualized and bare-metal deployments, while still providing IO optimization, acceleration and data services, including end-to-end encryption, snaps, replication and more. Using Datrium’s approach to Containers, the development lifecycle is streamlined and automated much more easily because the drift between environments (Dev, QA, Staging, Pre-Prod, and Prod) is minimal.

Image courtesy of Docker Website

Data Services and Protection for Containers

Although some may argue that Containers should remain ephemeral, in my experience working with enterprises there is a clear need for maintaining persistence across sessions for some applications and datasets, and there is also an enormous need to protect the data in persistent volumes.

With Datrium, persistent volumes cloned on one server can be immediately used on another, across both virtual and bare-metal deployments.

A significant challenge with Containers, however, is that they represent an order of magnitude more objects to manage than virtual machines, especially when persistent volumes are implemented. DVX 3.0 addresses this challenge with a combination of powerful search capabilities, the ability to create logical groups of Containers (called a Protection Group) aligned to applications, and the assignment of protection policies to those groups for instant recovery, archive, DR and more.

In other words, all data services typically used with virtual machine workloads, such as snaps, cloning, replication, and blanket encryption, are now also available for Containers at the granular Container level, and the Datrium GUI makes it easy to understand and monitor.
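To make the Protection Group idea concrete, below is a minimal sketch of how such grouping and policy assignment could be modeled. Every name and field here is a hypothetical illustration, not Datrium’s actual API or schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProtectionPolicy:
    # Hypothetical policy knobs, loosely mirroring the snapshot/replication
    # services described above; not Datrium's real schema.
    snapshot_every_minutes: int
    retain_snapshots: int
    replicate_to: Optional[str] = None   # e.g. a remote DVX for DR

@dataclass
class ProtectionGroup:
    name: str
    container_patterns: List[str]        # group membership by name pattern
    policy: ProtectionPolicy

# Group all containers of one application and protect them as a unit.
web_app = ProtectionGroup(
    name="web-frontend",
    container_patterns=["web-*", "nginx-*"],
    policy=ProtectionPolicy(snapshot_every_minutes=15,
                            retain_snapshots=96,
                            replicate_to="dr-site-dvx"),
)
```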

Split Provisioning

Allow me to provide some background to Open Convergence and the problems that we are effectively solving with the new architecture.

The SAN proposition

In legacy storage arrays, all the CPU intensive data management (deduplication, compression, erasure coding, etc.) is carried out on the array side, by the controllers. These controllers are often sized for maximum performance, but as the solution scales with more servers, each host will get fewer IOPS and less storage capacity.

Scaling storage arrays means either attaching multiple disk shelves to the same controllers, thereby bottlenecking the controllers, or doing a controller head-swap, which requires downtime and is expensive. Some storage arrays use a scale-out approach with multiple controllers, but because data management happens at those beefy controllers, it quickly becomes a costly proposition.

Finally, another option is to adopt a multi-array strategy, leading to storage silos, complex management, lack of global deduplication, and coordination failures.

The HCI proposition

HCI places compute and storage together, and as you scale one dimension, you also scale the other (lockstep provisioning), not allowing for independent scaling. Some vendors provide storage-only nodes, but those also come with additional and unnecessary compute.

Additionally, HCI vendors do not allow different hardware vendors as part of the same solution or cluster, preventing the reuse of existing servers and the repurposing of existing storage investments – all of which equates to vendor lock-in.

When it comes to the data path and IO traffic, data being written always goes across servers and the network, and there is a lot of traffic between servers (east <> west), in many instances creating noisy neighbor issues when heavy workloads impact lighter workloads.

Finally, at scale, multiple clusters are formed due to the requirement to create multiple failure domains, and also due to the cost of creating additional replicas of the data for resiliency – the larger the cluster, the higher the possibility of a double or triple failure.

The Open Convergence (OCI) proposition

DVX is a scale-out system where capacity can be scaled by adding Data Nodes and performance can be scaled by adding Compute Nodes. With the DVX 3.0 payload, we now support a maximum of 10 data nodes. This translates to more than 1PB of effective usable capacity (300 TB of usable capacity before data reduction). This hyperscale approach eases administrative tasks and reduces the cost for private clouds at scale.
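As a quick sanity check of those capacity figures, the arithmetic is below. The 30 TB-per-data-node figure is inferred from the 10-node/300 TB numbers above, and the data-reduction ratio is an assumed, workload-dependent multiplier.

```python
def dvx_capacity_tb(data_nodes, usable_tb_per_node=30, reduction=3.5):
    """Back-of-the-envelope capacity model; both defaults are assumptions
    derived from (or consistent with) the figures quoted in the text."""
    usable = data_nodes * usable_tb_per_node
    effective = usable * reduction
    return usable, effective

usable, effective = dvx_capacity_tb(10)
print(f"{usable} TB usable, ~{effective:.0f} TB effective")
# 300 TB usable, ~1050 TB effective -- consistent with 'more than 1PB'
```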

With the DVX 3.0 payload, we support a single Datastore spanning all data nodes. This creates a single global namespace and a single deduplication storage pool. Each data node has dual NVRAM, dual controllers, redundant data network links (2-4 depending on the model), redundant power supplies, and so on, so that there is no single point of hardware failure within the data node cluster.

The DVX architecture is server powered, with all data management functions (replication, encryption, fingerprinting, compression, space reclamation, erasure coding, drive rebuild) being carried out on Compute Nodes. The compute available for storage data services scales out as more compute nodes are added. You can add data nodes to add capacity to the system. As more data nodes are added, write performance also increases linearly, since we get more disks and more network links resulting in more storage pool write bandwidth. The increased pool read bandwidth also helps with increased space reclamation performance and increased drive rebuild performance.

Split Provisioning Architecture – No Bottlenecks

Each data node brings drives, and they are pooled together in a single drive pool; disks are uniformly distributed across data nodes, increasing the NVRAM bandwidth available to VMs because the system aggregates NVRAM across data nodes.

Data stripes are broken into data chunks that are distributed to the storage pool using a Layout Map. The Layout Map solves two problems: (1) it ensures that data is distributed evenly across all data nodes, and (2) in the event of a disk failure and during a rebuild, it spreads the load across all disks and hosts.

When a new data node is added, and more disks are added to the drive pool, the data is rebalanced, and the rebalancing also scales as data nodes are added. The data stripe is written via distributed erasure coding, making sure that there are two EC parity chunks to tolerate two simultaneous drive failures in the drive pool. More parity chunks may be added in the future if there is a need to tolerate more concurrent drive failures. Rebuild times decrease linearly as data nodes are added.
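The toy sketch below illustrates the two Layout Map goals just described: spread each stripe’s chunks, including the two erasure-coding parity chunks, across distinct disks in the pool so that both capacity and rebuild load are shared evenly. The 8-data-plus-2-parity stripe width is an assumption for the example, and the deliberately simplistic placement function is not Datrium’s actual algorithm.

```python
import hashlib

def layout_map(stripe_id, num_chunks, nodes, disks_per_node):
    # Enumerate every (node, disk) slot in the pool, then rotate the list
    # by a per-stripe hash so that, over many stripes, chunks land evenly
    # across all disks and all nodes.
    pool = [(n, d) for n in range(nodes) for d in range(disks_per_node)]
    h = int.from_bytes(hashlib.sha256(str(stripe_id).encode()).digest()[:4], "big")
    start = h % len(pool)
    rotated = pool[start:] + pool[:start]
    return rotated[:num_chunks]      # num_chunks distinct disks

# One stripe of 8 data chunks + 2 parity chunks (tolerates two drive failures).
print(layout_map(stripe_id=42, num_chunks=10, nodes=5, disks_per_node=12))
```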

ZeroConf

Datrium DVX now uses the mDNS multicast protocol to support ZeroConf. ZeroConf is a set of technologies where, when a device is plugged into a network, a unique IP is assigned automatically and resolves to a known hostname within that local subnet. ZeroConf can also be extended to provide a means to discover the services available on each device.

With Datrium, ZeroConf is used during initial deployment to connect to a DVX on the local network with a public hostname and to configure the system. It also provides node discovery in the cluster and lists all the available nodes on the local network.
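For readers unfamiliar with mDNS-based discovery, here is a minimal sketch using the third-party python-zeroconf package (pip install zeroconf). The `_dvx._tcp.local.` service type is purely hypothetical; it only illustrates how a client can browse for nodes announcing themselves on the local subnet.

```python
import time
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

class NodeListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)   # resolves IP, port, etc.
        print(f"discovered node: {name} -> {info}")

    def remove_service(self, zc, type_, name):
        print(f"node gone: {name}")

    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, "_dvx._tcp.local.", NodeListener())
time.sleep(5)    # listen briefly for announcements, then clean up
zc.close()
```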

Cloud Scale

Ok, here is where things start to get interesting. The Datrium architecture has been designed from the ground up for thousands of drives and hundreds of hosts. With the DVX 3.0 payload, the solution achieves mind-boggling numbers, scaling 10X, up to 128 compute nodes, 10 data nodes, a 1.7 Petabyte data pool, 18 Million IOPS, and more than 8GB/sec write throughput.

Performance

The system-wide performance is a combination of the number of compute nodes and data nodes in the platform. The numbers below demonstrate some of the incredible internal benchmarks done with 2x-compressible, undedupable data.

How does Datrium compare to…

To put this into perspective, a 70:30 read-to-write ratio with 8K block sizes gives us a direct comparison to XtremIO. Using this configuration, DVX will do 3.3M IOPS; 3.7x better performance than the largest XtremIO. XtremIO Specifications (here).

You may compare with 32KB reads where DVX will deliver 6.25M IOPS; 17x better than the largest Pure Storage FlashArray. Pure Storage Specifications (here).

You may compare with nominal read IOPS, likely 4KB, where DVX will deliver 18M IOPS; 2.7x better than the largest EMC VMAX All Flash. EMC VMAX All Flash Specifications (here).

You may compare with nominal read IOPS, likely 4KB, where DVX will deliver 18M IOPS; 1.8x better than the largest SolidFire All Flash array. SolidFire All Flash Specifications (here).

In all honesty, the only thing that is in our league might be EMC Scale IO, and performance is their only metric. But if you care about data services like VM awareness, deduplication, compression, snapshots, cloning, erasure coding and replication, Datrium DVX is the only solution that can get the job done.

Bear in mind that Datrium is a hybrid platform, not an All Flash system like the above arrays, and Datrium data nodes use 7,200 RPM hard disks for durable storage. That is truly mind-boggling! The secret sauce comes from designing a brand-new log-structured file system that treats the entire filesystem as a log: erasure-coded objects are written exactly once in an append-only log, never overwritten, only deleted. It is a difficult problem to implement a distributed, scale-out LFS – especially when you consider how to reclaim space – but it gives you several excellent properties. Read more in this post.

If any of the vendors mentioned above disagree, or if you have a more current specs sheet, please let me know, and I will happily correct the disparity.

HOW ABOUT 100% RANDOM WRITE LARGE SPAN WORKLOADS?

(This is what separates the men from the boys)

We could not find a single vendor that would publish such workload numbers because it’s a very difficult workload. Our performance engineering teams have been hard at work trying to push the system as much as they can. The picture below demonstrates the DVX 100% random-write performance without gaming the results, like writing to a small file in NVRAM or such.

490 VMs

100% 32KB random writes

2.1x compressible

Undedupable data

35 compute hosts

10 data nodes

Large span (1TB per VM) with roughly half petabyte of logical LBA span

Those with any enterprise storage experience will agree that 8.5 GBps of Random-Write Throughput with 1.5ms application latency is an astonishing achievement, especially if one considers that all data is being checksummed, deduplicated, compressed and erasure coded inline to disk.
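To put that throughput figure in IOPS terms, the unit conversion uses only the numbers quoted above:

```python
throughput_bps = 8.5e9        # 8.5 GBps aggregate random-write throughput
io_size_bytes = 32 * 1024     # 32KB writes
vms = 490

total_iops = throughput_bps / io_size_bytes
print(f"~{total_iops/1000:.0f}K write IOPS system-wide")     # ~259K
print(f"~{total_iops/vms:.0f} write IOPS sustained per VM")  # ~529
```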

Please note that I refer to application latencies as seen by ESX because this is the latency that is measured from the ESX level perspective, all the way through the network and storage stack, and back. Storage arrays normally report only one component of this latency, namely internal array latency that excludes the network and client storage protocol overheads.


System Maximums

Drive Rebuilds

Another interesting point is that the bigger the solution, the more throughput and NVRAM are available, and the faster the drive rebuilds. The time to rebuild failed drives decreases more than linearly.
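A simplified model shows why: chunks are spread across the whole pool, so every surviving disk contributes bandwidth to a rebuild. All numbers below are illustrative assumptions, and this toy model only captures the linear part of the improvement; the text notes the real system does better than that.

```python
def rebuild_time_hours(data_nodes, disks_per_node=12, failed_disk_tb=8,
                       per_disk_rebuild_mbps=40):
    # Every surviving disk in the pool participates in the rebuild, so
    # aggregate rebuild bandwidth grows with the number of data nodes.
    surviving_disks = data_nodes * disks_per_node - 1
    aggregate_mbps = surviving_disks * per_disk_rebuild_mbps
    return (failed_disk_tb * 1e6) / aggregate_mbps / 3600   # TB -> MB -> hours

for n in (1, 2, 5, 10):
    print(f"{n} data node(s): ~{rebuild_time_hours(n):.2f} h to rebuild one drive")
```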

Instant, Application Consistent Snapshots

VSS (Volume Shadow-Copy Service) is a Microsoft Windows service that allows backup and snapshot applications to “quiesce” guest applications, i.e., put them in a consistent on-disk state, before taking a backup. Datrium has created its own VSS Provider to implement native, instant, app-consistent snapshot capabilities for Microsoft workloads.

This first release of the VSS Provider supports Windows Server 2008 R2 onwards and Microsoft SQL Server 2005 onwards. Additional Microsoft applications such as Exchange and AD Controllers will be enabled in the future, after testing.

Due to the native integration with the Datrium platform, where both the VM metadata and the data lives, Datrium can eliminate VM stun times and drastically reduce the application performance dip. As a result, IT admins can take VM-level pause-less snapshots of applications with high change rates and at greater frequencies for more granular recovery.

Datrium DVX 3.0 supports up to 1.2M snapshots in a 10 data node configuration, and 300K with a single data node.

Below is an example of Datrium VSS provider quiescing a Microsoft SQL Server and snapshotting the VM.

VM configuration:

16 cores, 64GB RAM, 10 virtual disks (2x500GB, 8x40GB)

Number of VM stuns:

VMware VSS – 3 VM stuns

Datrium VSS – No VM stun

Duration of Application Performance Dip:

VMware VSS – 8-10 minutes

Datrium VSS – up to 10 seconds.

Peer Cache Mode

In DVX, we hold all data in use on flash on the host. Moreover, we guide customers to size host flash to hold all data for the VMDKs. With always-on dedupe/compression for host flash as well, this is feasible – with just 2TB flash on each host and 3X-5X data reduction you can have 6-10TB of effective flash. (DVX supports up to 16TB of raw flash on each host). Experience proves this is in fact what our customers do: by and large, our customers configure sufficient flash on the host and get close to 100% hit rate on the host flash.
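The flash-sizing arithmetic in that paragraph is easy to verify; the 3X-5X reduction ratios are the workload-dependent range quoted above.

```python
def effective_flash_tb(raw_flash_tb, reduction):
    # Dedupe/compression multiplies the usable capacity of host flash.
    return raw_flash_tb * reduction

for ratio in (3, 5):
    print(f"2TB raw at {ratio}x reduction -> {effective_flash_tb(2, ratio)}TB effective")
# 6TB and 10TB, matching the 6-10TB range above

def working_set_fits(working_set_tb, raw_flash_tb, reduction=3):
    """Rough sizing check: do the VMDKs fit on host flash after reduction?
    (Illustrative only; assumes the conservative 3x end of the range.)"""
    return working_set_tb <= effective_flash_tb(raw_flash_tb, reduction)
```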

However, in most instances, thanks to data reduction benefits, customers decide to have only 1 or 2 flash devices on each server, because that is more than enough from a capacity and performance standpoint. With previous releases of DVX, if the last available flash device failed the workload would stop, and applications would have to be restarted, manually or via HA, on a different host.

With DVX 3.0 we are introducing the ability to utilize Peer Cache, the flash devices on other hosts, to keep the workload running even if the last available flash device fails, and without drastically impacting application performance until new flash devices are installed. As with any array, read IOs now have to traverse the network, and there is some additional latency, given that we would be introducing East <-> West traffic for reads instead of serving them locally. But in this case, DVX is working just like any other SAN.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

A couple of times I have been asked about caching performance with Datrium DVX. It is intuitive for people without an understanding of the technology to think that DVX performance is contingent entirely on the host cache, and that if nodes go down the caches would have to be rewarmed, or full cache integrity checks would be necessary, as in ZFS with L2ARC. In the L2ARC case, performance can drag for hours until the cache is repopulated.

That is not the case with DVX…

In DVX, we hold data in-use on flash on the host. Moreover, we guide customers to size host flash to hold all data for the VMDKs. With always-on dedupe/compression for host flash as well, this is feasible – with just 2TB flash on each host and 3X-5X data reduction you can have 6-10TB of effective flash capacity. (DVX supports up to 16TB of raw flash on each host). Experience proves this is in fact what our customers do: by and large, our customers configure sufficient flash on the host and get close to 100% hit rate on the host flash.

With any array, you have to traverse the network. However, with any modern SSD, the network latency can be an order of magnitude higher than the device access latency. Flash does belong in the host, especially if you are talking about NVMe drives with sub-50usec latency.

What about a host failure?

Because all data is fingerprinted and globally deduplicated, when a VM moves between servers there is a very high likelihood that most data blocks for similar VMs (Windows/Linux OS, SQL Servers, etc.) are already present on the destination server, and data movement will not be necessary for those blocks.

Flash-to-Flash

DVX also uses a technology we call F2F (flash-to-flash): the target host fetches data from other hosts’ flash and moves the data over to the destination host if necessary. DVX can read data from host RAM, host flash, or Data Node drives (or, during failures, from Data Node NVRAM), and it optimises reads to retrieve data from the fastest media. You lose data locality for the period during which this move happens, but it is restored reasonably quickly.
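Below is a schematic of that read-path preference, with RAM, local flash, peer flash (F2F) and the data nodes modeled as simple lookup tables. It illustrates the ordering described above, not the actual DVX code path.

```python
def read_block(fingerprint, ram, local_flash, peer_flash, data_node):
    tiers = (("ram", ram), ("local-flash", local_flash),
             ("peer-flash", peer_flash), ("data-node", data_node))
    for tier_name, tier in tiers:            # fastest media first
        data = tier.get(fingerprint)
        if data is not None:
            if tier_name in ("peer-flash", "data-node"):
                local_flash[fingerprint] = data   # restore data locality
            return tier_name, data
    raise KeyError(fingerprint)

ram, local_flash, peer_flash, data_node = {}, {}, {"abc": b"block"}, {}
print(read_block("abc", ram, local_flash, peer_flash, data_node))
# ('peer-flash', b'block') -- and the block is now cached in local flash
```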

VM Restart and VMotion

In the uncommon case, i.e., after a VM restart on a new host (or after a vMotion) when the data is not available in any of the hosts’ flash, DVX performance will be more like a conventional array (or some other HCI systems without data locality). However, the DVX optimises the storage format in the Data Node drive pool for precisely this situation. VMs tend to read vDisks in large consecutive clumps, usually reading data that was written together. Large clumps of the most current version of the vDisk are stored together. These are uploaded to the host as a contiguous stream upon the request of any individual block, providing a significant degree of read-ahead as a vDisk is accessed. Subsequent reads of the same blocks, of course, will be retrieved from local flash rather than from the Data Node.
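The clump effect can be shown with a toy read path: a single miss uploads a whole contiguous clump, so nearby blocks become local-flash hits. The clump size below is an arbitrary illustration, not a Datrium constant.

```python
CLUMP_BLOCKS = 256   # illustrative clump size

vdisk = {b: f"data-{b}" for b in range(1024)}   # blocks on the Data Node
local_flash = {}

def fetch_clump(block):
    # Return the whole clump containing `block`, as one contiguous stream.
    start = (block // CLUMP_BLOCKS) * CLUMP_BLOCKS
    return {b: vdisk[b] for b in range(start, start + CLUMP_BLOCKS) if b in vdisk}

def read(block):
    if block not in local_flash:               # miss: upload the whole clump
        local_flash.update(fetch_clump(block))
    return local_flash[block]

read(3)                        # one miss uploads blocks 0-255
assert 200 in local_flash      # ...so a nearby read is already local
```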

Peer Cache

Furthermore, DVX also has a critical feature called ‘Peer Cache‘ that adds enhanced resiliency to the platform and protects applications even if the last local flash device fails.

The DVX worst case is someone else’s best case.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.


It’s been an exciting week at Datrium with the release of our new payload (here) that delivers incredible performance numbers for virtualized and bare-metal applications. Yesterday I also gave a presentation on Real-World MS SQL Server Performance Benchmark running on Datrium DVX – my bit starts at 27:25. Enjoy!

Last month Datrium announced the first part of the DVX 3.0 payload, and I blogged about it here; today we are announcing the second part. In the first announcement, Datrium took us where no HCI vendor has been before, offering not only a multi-hypervisor platform (VMware vSphere, Red Hat Virtualization and open-source KVM on CentOS hosts), but also adding support for bare-metal containers on Linux hosts, both with granular data management.

Red Hat Virtualization (RHV) support

Linux RHEL and CentOS hosts running bare-metal containers

Full Data Services for Containers & KVM virtual machines

Today we are introducing some real awesomeness with mind-boggling performance!

Split Provisioning (128 servers and 10 data nodes)

ZeroConf

Cloud Scale (18 Million IOPS and 200GB/s)

Instant, Application Consistent Snapshots (Zero VM stun)


There is a lot more goodness coming from Datrium in the next few months, and all types of organizations are already realizing the benefits of Open Convergence, and at the same time recognizing the shortcomings and disadvantages of legacy architectures, both SAN and HCI.

Another VMworld is upon us, and I couldn’t be more excited about attending the conference with Datrium. Datrium technology excites me as much as Hyperconvergence did five years ago, and VDI did ten years ago… and Datrium is going big time for this VMworld.

Suffice it to say that the team at Datrium formed part of the founding teams of VMware and Data Domain. Data Domain revolutionized the enterprise storage industry by introducing deduplication technology, and VMware needs no introduction. Diane Greene and Mendel Rosenblum, VMware founders themselves, are early investors in Datrium. Maybe they will do a cameo appearance?!

We have recently made some awesome announcements as part of our Lithium (code name) payload, including RedHat RHV support, Docker Volumes, and Data Services for Containers (see all here). However, what is to come over the next couple of weeks will be far bigger than these previous announcements. We will announce something never seen before in the tech industry!

It will be Galore!!!

But the announcements are only in the next couple of weeks, so for now, let’s talk about VMworld. Please, come and learn about Open Convergence; it’s new, it’s exciting, it’s disruptive!!!!

Before VMworld 2017

vExpert Community Webinar – Buckle up for an exclusive live webcast for the vExpert community to share some news Galore! from Datrium. It will be massive!

vExperts, vExperts, vExperts, vExperts !!! We have plenty of gifts for you! – To show our appreciation for your dedication to being the best in the biz, the first 300 certified 2017 vExperts to register using the link below will receive one cool custom ARDUBOY!

During VMworld 2017

#VMunderground – The fun starts on Sunday at 1 pm, at the Beerhaus (at the Park), next to New York New York, and Datrium is one of the event sponsors alongside Veeam, VTUG, Uila, and DellEMC. I will be there, so come by, say hi, and support the vCommunity.

DATRIUM VMworld 2017 vCast – This is an exclusive on-demand webcast with Brian Biles, CEO and Founder of Datrium, direct from the VMworld show floor! Also, win, win, win: register for the webcast, and you will be automatically entered for a chance to win a cool custom Datrium Raspberry Pi3!

All Datrium and Open Convergence Sessions

At the end of the day, it is all about the conference sessions and the networking. We have plenty of Datrium sessions discussing not only our incredible product and features but also addressing the overall effort in creating a new breed of converged architecture that we call Open Convergence. We have sessions from Monday through Thursday.

If you just want an introduction to Open Convergence, to understand the fundamental architectural changes and how to benefit from them, my session will cover that without (I’ll try my best) any marketing slides or pitch. Come by Thursday morning to this community session – it’s only 15min, and I’ll do my best to keep within the allotted timeframe. This session will also have a live webcast, and I’ll share the link as soon as I have it.

Recently there’s been an exchange between industry luminaries on data reduction, data locality, and data protection. Howard Marks wrote a thoughtful piece <here> that expands on VSAN’s approach to data locality. Josh Odgers, the brilliant blogger now at Nutanix, responded with some notes <here> that in my opinion struggled to close the issue. They both seem to be reaching for a silver bullet that’s not there.

The objective of this blog post is to demonstrate that Data Locality is essential for enhancing application performance, and to explain how it is possible to solve the application locality and management complexity dilemmas seamlessly while still delivering high performance and data reduction benefits.

When it comes to performance, just get out of the way of Intel. If you can let server hardware serve applications as fast as possible and remain stateless, performance will be as good as possible, and what’s left is making administration simple.

When the Datrium team set out to build the best converged system possible, they considered many of these issues. In addition to all this, one overriding concern we had was simplicity – figuring out which features to enable, and when, is a complete waste of time. If features are toggled on a per-VM or per-some-group-of-objects basis, then the complexity truly becomes unmanageable. It is simply not possible to track 1000s of VMs and figure out what needs to be enabled and when. So: All features must be On all the time.

Incidentally, this is one area where modern arrays like Pure Storage nailed it, but most HCI vendors have checkboxes galore. If you can also add capacity or bandwidth/IOPS at will, then you will have solved a real problem at scale.

Let’s look at how Datrium figures in each one of these angles.

The fundamentals: a true log-structured filesystem

The fundamental technology that enables features such as data reduction, data locality, and data protection in a Datrium DVX system is a Log Structured Filesystem, first described by Mendel Rosenblum (who is incidentally one of our investors, and one of the founders of VMware).

In a nutshell, an LFS works by treating the entire filesystem as a log: objects are written exactly once in an append-only log, never overwritten, only deleted. Implementing a distributed, scale-out LFS is a difficult problem – especially when you consider how to reclaim space – but it gives you several excellent properties.

It lets us handle variable-sized objects – if a 4K block is compressed to 3172 bytes, saving that 25% of space is a Good Thing. “Normal” 4K-based block allocation techniques will not let you save that space, but appending a 3.1K object to a log is as simple as appending a 4K object; there is no difference.
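A minimal append-only log makes the point concrete: variable-sized objects are appended exactly once, indexed by offset and length, and never overwritten, so a 3.1K object costs exactly 3.1K of log space. This is a sketch of the general LFS idea, not Datrium’s filesystem.

```python
import os, tempfile

class TinyLog:
    """Append-only log of variable-sized objects, located by (offset, length)."""
    def __init__(self, path):
        self.path, self.index = path, {}
        open(path, "ab").close()

    def append(self, object_id, payload: bytes):
        with open(self.path, "ab") as f:
            offset = f.tell()        # always the current end of the log
            f.write(payload)
        self.index[object_id] = (offset, len(payload))

    def read(self, object_id):
        offset, length = self.index[object_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

log = TinyLog(os.path.join(tempfile.mkdtemp(), "demo.log"))
log.append("blk1", b"x" * 3172)    # the 3.1K compressed block from the text
log.append("blk2", b"y" * 4096)    # a full 4K block right behind it
assert log.read("blk1") == b"x" * 3172
```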

Garbage Collection, an inherent part of any LFS, is also a natural opportunity for deduplication. As live data is copied forward to reclaim space, duplicate as well as dead data can be left behind. Moreover, the more data that can be left behind, the more efficient the process.

You can compute parity to tolerate N failures once, write a whole stripe from the get-go, and never modify the stripe – you do not have to incur the complexity or cost of reading cold data, erasure coding it, and re-writing stripes. Unless you have a true append-only LFS, doing distributed erasure coding can get very challenging, which is why most HCI systems do not have always-on EC.

Flash is great at random reads but not random writes. LFS converts random writes to sequential writes, which is ideal for both flash and disk.

Of course, these are fundamental properties of a filesystem. It is next to impossible to change a filesystem at such a fundamental level once it is implemented. You have to figure out most of the requirements up front, and most current HCI products were not built for storage efficiency from the beginning – erasure coding, dedupe, compression, and encryption were layered on (or sometimes not) with bad side effects.

Compression

Compression ratios are, of course, workload dependent. Once you have an LFS, compression is in fact quite straightforward to implement: you are only appending variable-sized objects to a log. The trick is to find a compression algorithm that quickly gives up if the data is incompressible. We use a modified version of Google’s snappy. It is very CPU efficient, thanks to new Intel instructions, and has a reasonable compression ratio.
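The give-up behavior can be sketched as follows. The production code uses a modified snappy that bails out while compressing; this stdlib-only sketch approximates the idea with zlib and an arbitrary ratio threshold.

```python
import os
import zlib

RATIO_THRESHOLD = 0.9   # illustrative: keep the compressed form only if it saves >10%

def maybe_compress(block: bytes):
    compressed = zlib.compress(block, 1)          # fast, low-effort level
    if len(compressed) < len(block) * RATIO_THRESHOLD:
        return "compressed", compressed
    return "raw", block                           # incompressible: store as-is

print(maybe_compress(b"A" * 4096)[0])        # 'compressed'
print(maybe_compress(os.urandom(4096))[0])   # 'raw' -- random data won't shrink
```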

Here’s where skeptics come out and say “but you are taking up CPU even if the workload is incompressible”. The answer is – Who Cares? We measured workloads with an internal option that we specifically added for measurement purposes. The difference is truly in the noise – maybe 1-2% CPU savings. That is it! What do you care about, managing options on 1000s of objects or a couple of percent of CPU utilization, which Intel will make up shortly after this article is written?

Deduplication

Here’s how deduplication works at a high level: if there are multiple references to the same piece of data, the filesystem keeps one copy of the data. All references to the data will then use this same copy.

Dedup is very workload dependent. With VDI, you get 10X+ deduplication. With a log-processing workload, you get almost no dedupe beyond the OS image. Deduplication systems may have some overhead in fingerprinting data, but much more in keeping deduplication tables up to date; that is the fundamental reason some vendors only implement post-process deduplication.

With content addressing, a data block is immutable (new data = new content = new fingerprint), and it does not belong to any particular object. This saves lots of bookkeeping. Replicating at an object granularity is irrelevant in such a system – there’s just a pool of content shared by whatever file object wants to use it, including snapshots and clones. DVX doesn’t have refcounts and that also greatly simplifies both cloning and snapshotting. In particular, DVX doesn’t have to update a bunch of ref counts just to create a clone.
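A few lines of code capture the essence of content addressing: the fingerprint is the block’s identity, so writing duplicate content is a no-op and no per-object reference counts are needed; files simply hold fingerprints. SHA-256 is used here only for illustration, as the text does not name the production fingerprint function.

```python
import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}                      # fingerprint -> immutable data

    def put(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(fp, data)      # duplicate content is a no-op
        return fp                             # callers keep only the fingerprint

store = DedupStore()
a = store.put(b"windows-os-block")
b = store.put(b"windows-os-block")            # a second VM's identical block
assert a == b and len(store.blocks) == 1      # one copy, two references
```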

Suffice it to say that given the nature of our filesystem, deduplication is Always-On. Computing a fingerprint is almost free with new Intel instructions, especially if you can do it in one pass along with compressing the data. After computing the fingerprint, the cost is the same whether a piece of data has duplicates or not. There is little to no performance loss because of any of this, as is proven by a few array vendors. With deduplication and compression always enabled, Pure is killing it with good performance in the Tier-0/Tier-1 market.

We will publish a detailed DVX performance white paper with our numbers later this year, and you can see for yourself. We have already published read IOPS/bandwidth numbers before, though: with deduplication and compression enabled, and with undedupable random data, we achieve about 140K IOPS (4K random read) and 1.5GBps bandwidth (32K random read) from a single host. We used undedupable data in the above benchmark so that there is zero gaming of the results. Note that in a DVX system the read bandwidth scales with the number of hosts, given that reads are local due to data locality.

As you can see, there is no reason to tweak four kinds of knobs for performance and data reduction depending on workload, if you have the right kind of filesystem.

Erasure Coding

Data durability in the presence of failures is table stakes for any storage system. Failure tolerance is achieved by redundancy in some fashion. One way to achieve redundancy is with mirroring. You can mirror 2-way (RF=2, FTT=1) or 3-way (RF=3, FTT=2). At any scale and seriousness, you have to do 3-way replication, or you are rolling the dice on data loss.

The reason is not that you will lose 2 disk drives at the same time. What is much, much more common is the following scenario: 1 drive fails, and the system starts re-mirroring data from the remaining drive. For this re-mirroring, you have to read from the remaining drive. All it takes is a sector read error from the remaining drive, and you have now lost data. NetApp has published extensive studies that demonstrate this problem. The summary of that study: 5% to 20% of all disks in the study had at least one sector read error. So, if you are mirroring, choose RF=3/FTT=2 if you care about your data.

The problem with 3-way mirroring is that you now have 3X the overhead. Enter Erasure Coding. At a high level, Erasure Coding tolerates 2 drive failures (or 1 drive failure and a sector read error in the remaining drive). This is achieved by computing Error Correcting Codes that tolerate 2 failures. With a good implementation, you can tolerate 2 drive failures with an overhead of 25% or so. Which is way better than 3X.
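The overhead comparison is simple arithmetic. Both schemes below tolerate two failures; the 8+2 stripe width is an assumed example consistent with the ‘25% or so’ figure above.

```python
def redundancy_overhead(data_units, redundancy_units):
    # Extra raw capacity consumed per unit of user data.
    return redundancy_units / data_units

print(redundancy_overhead(1, 2))   # 3-way mirroring: 2.0 extra (3X raw total)
print(redundancy_overhead(8, 2))   # 8+2 erasure coding: 0.25 (25% overhead)
```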

Many HCI vendors have this as a checkbox with various caveats because doing distributed Erasure Coding is a hard problem. As Howard Marks points out, Erasure Coding messes with locality. Also, HCI vendors’ implementation of EC works only with write-cold data or on all-flash systems because there might be some read-modify-write involved which tanks performance. In some cases, even if you enable the “Erasure Code” checkbox, the system cannot actually Erasure Code the data – which means that you cannot bank on a 20% overhead, it might, in fact, be 3X overhead even if you have the box checked. This is one aspect that a few array vendors like Pure nailed: Erasure Coding is on by default, always on, and the overhead for durability is 20-30%.

DVX is similar to arrays in this regard – the only data durability method offered (it is not optional) is 2-drive failure tolerance using Erasure Coding. All data is always Erasure Coded and stored in a data node cluster. The DVX software computes the codes and writes the parity stripes directly from the host to the data nodes. Because of the write-once nature of LFS, there is no read-modify-write issue. Hosts are thus stateless. The data node itself has no single point of failure, so you are covered there as well. As a nice side effect, when a host fails, there are none of the re-replication nightmares.

On to data locality and how it interacts with Erasure Coding.

Data Locality

As Howard Marks points out, 3-way replication is expensive, and Erasure Coding in an HCI system shmears data all over the nodes, which makes data locality arguments problematic. Data Locality is another place where the DVX system is fundamentally different from both arrays and traditional HCI. By data locality, we mean the data residing on flash on the host where a VM is running.

The key word here is host flash. In a DVX, we hold all data in use on a host on flash on the host. Moreover, we guide customers to size host flash to hold all data for the VMDKs. With always-on dedupe/compression for host flash as well, this is totally feasible – with just 2TB flash on each host and 3X-5X data reduction you can have 6-10TB of effective flash. (DVX supports up to 16TB of raw flash on each host). Experience proves this is in fact what our customers do: by and large, our customers configure sufficient flash on the host and get close to 100% hit rate on the host flash.

With any array, you have to traverse the network. However, with any modern SSD, the network latency can be an order of magnitude higher than the device access latency. Flash really does belong in the host, especially if you are talking about NVMe drives with sub-50usec latency. IOPS and throughput can be improved by buying bigger and bigger controllers, but there is no way out for latency: you need the flash on the host or suffer the consequences.

What about VMotion (live migration)? When a VM is VMotioned to another host, the DVX uses a technology we call F2F (flash-to-flash) – the destination host will fetch data from the source host flash and move the VM’s data over to the destination host. You lose data locality for the period during which this move happens, but it is restored reasonably quickly as the workload continues to run on the destination host. However, VMware DRS does not do VMotions every few minutes – even at the most aggressive level DRS has hysteresis, and VMs move on average once or twice a day at most. This means that in the common case, data locality really does help reduce network traffic and latencies hugely.

In the uncommon case i.e., after a VMotion, the DVX performance will be more like an array (or some other HCI systems without data locality). That is, the DVX worst case is someone else’s best case. Note that we preserve data locality even as data protection uses erasure coding – this is the key point.

Finally, because all data is fingerprinted and globally deduplicated, when a VM moves between servers there is a very high likelihood that most data blocks for similar VMs (Windows/Linux OS, SQL Servers, etc.) are already present on the destination server and data movement will not be necessary for those blocks.

Conclusion

With the right approach, you can solve all constraints: Datrium has always-on Data Reduction features like arrays, always-on Erasure Coding like arrays, Data Locality like the best HCI systems out there, and incremental scaling of capacity and IOPS/bandwidth like no one else. I’m sorry if this reads like a commercial, but it is in fact true.

What an amazing ride it has been so far; Datrium has been everything I hoped it would be, from technology and engineering to sales and management teams. This is my first Datrium “Beyond Marketing” series post, and I am covering the release of DVX 3.0.

The first part of this release takes Datrium where no hyperconvergence vendor has been before, offering support for multi-hypervisors (vSphere, Red Hat Virtualization and CentOS) and Linux bare-metal workloads, and support for bare-metal containers with granular data management.

Red Hat Virtualization (RHV) support

Linux Bare-Metal (RHEL and CentOS) support

Docker Persistent Volumes (Virtualized and Bare-Metal)

Full Data Services for Containers

Do you know Datrium Open-Convergence?

From an architectural perspective, the best way to describe this game changing tech is to visualize all active data, both VMs and Containers, serviced with data locality and using internal flash (SSD and NVMe) on each server. At the same time, a protection copy of the data is hosted in clustered data nodes with distributed erasure coding. Each server runs the DVX hyperdriver software responsible for IO processing and enterprise data services.

One of the advantages of the architecture is that servers are stateless, and losing any given amount of servers doesn’t impact data protection, availability, or SLAs. On the other hand, data nodes are highly available and protected with active/standby controllers, mirrored NVRAM, and hot-plug drives.

Lastly, when applications move between servers or when a failover happens, the DVX software instantly uploads the data to the target server. The DVX software uses other servers as the source before pulling data from the data cluster, guaranteeing flash to flash performance whenever possible. Nevertheless, because of the native global deduplication is it likely that most fingerprinted data is readily available on the target server.

For official information on features and time frame refer to the official Datrium Press Release (here).

Red Hat Virtualization (RHV) Partnership & Certification

Datrium customers now can deploy Red Hat Virtualization (RHV) and inherently get all Datrium data service benefits, including Flash and NVMe IO acceleration and end-to-end blanket encryption.

Red Hat is the world’s leading provider of open source solutions and has been named a Visionary in the 2016 Gartner’s Magic Quadrant for x86 Server Virtualization Infrastructure.

Besides enabling the use of data services, one of the biggest benefits of Datrium’s multi-hypervisor implementation is the ability use of the same DVX system for supporting concurrently RHV and VMware vSphere deployments.

Datrium is now certified by Red Hat and providing support for RHV we are providing choice to customers, but also paving a path to support the entire Red Hat stack and application partner ecosystem, including OpenStack, OpenShift, and CloudForms, providing a unified and consistent set of management capabilities across:

Red Hat Virtualization, VMware vRealize, and Microsoft Hyper-V(*).

Private cloud platforms based on OpenStack®.

Public cloud platforms like Amazon Web Services and Microsoft Azure.

Whilst Datrium works independent from CloudForms, it does enable multiple virtualization platforms to run across the same DVX system, eliminating silos and complexity, and in some cases enabling easy workload migration between hypervisors.

An interesting fact about RHV is that it has record-setting SPECvirt_sc2013 benchmark results, including highest overall performance and the highest number of well-performing VMs on a single server.

Linux administrators see datastores as local NFS mounts, and the mounts are backed by the DVX hyperdriver (manually installed in each server with 3.0 release) responsible for enabling IO acceleration and data services.

With this release, Datrium provides support for KVM and Containers, but other use-cases may be supported in upcoming releases, including Splunk, SAP, Hadoop and more.

Containers Persistent Volumes (Bare-Metal and Virtualized)

Containers are ephemeral, and files and services running inside a Container will not exist outside its lifetime. However, many applications require the ability to persist user session activity, making some aspects of the application stateful. Enterprises want persistent storage for Containers, and they also want to use the same infrastructure to manage dockerized and traditional workloads during the application lifecycle, development and production.

No more choosing between bare-metal and hypervisor

Containers and VMs used together provide a great deal of flexibility in deploying and managing apps.

Organizations usually start their Containers journey running apps in VMs for the added flexibility provided by virtualization stacks. However as soon the application lifecycle and methodology are fully defined, organizations move their production Containers environment to bare-metal to harvest additional performance, reducing the (9-15%) CPU overhead created by the virtualization stack.

Datrium supports Docker persistent volumes for both virtualized and bare-metal deployments, while still providing IO optimization, acceleration and data services, including end-to-end encryption, snaps, replication and more. Using Datrium’s approach to Containers the development lifecycle is streamlined and automated much more easily because the drift between environments (Dev, QA, Staging, Pre-Prod, and Prod) is minimal.

Image courtesy of Docker Website

Data Services and Protection for Containers

Albeit some may argue that Containers should remain ephemeral, in my experience working with enterprises, there is a clear need for maintaining persistence across sessions for some applications and datasets, but also there is an enormous need to protect data in persistent volumes.

With Datrium persistent volumes may be cloned on one server can be immediately used on another, between both virtual and bare-metal deployments.

A significant challenge with Containers, however, is that it represents an order of magnitude more objects to manage than virtual machines, especially when persistent volumes are implemented. DVX 3.0 addresses this challenge with a combination of powerful search capabilities, the ability to create logical groups of Containers (called a Protection Group) aligned to applications, and assignment of protection policies to those groups for instant recovery, archive, DR and more.

In other words, all data services typically used with virtual machine workloads, such as snaps, cloning, replication, and blanket encryption, are now also available at the granular Container level, and the Datrium GUI makes them easy to understand and monitor.

Design Thinking is the idea that products and interfaces should be designed around the needs of the people who will be using them.

However, it is often technical capabilities or market opportunities that either push or pull the development process. In the case of enterprise software, the complexity of the product, the size of the budgets, and time constraints confound things even more.

We have all witnessed enterprise software products becoming disarrayed and cluttered over time, with intricate dials and knobs making the lives of users painful. Even the most advanced Design Thinking oriented companies end up over time with a complex set of dials and knobs, commonly pushed or pulled by the product development process, and design teams have no alternative but to accommodate them as part of the overall user experience.

What does that have to do with SDS and HCI?

Most hyperconverged solutions on the market today were originally architected to test market potential (initial market analysis had indicated tremendous potential), and for almost all of them, enterprise data services were implemented as an afterthought. Furthermore, 'special' customers often make demands that are not well thought out and force vendors to implement features as a matter of priority. Trying to meet all these varied requirements often results in complex products that require 'special consultants' to come in and tune the system to make the features work correctly.

Up and above the stack

In the world of private and hybrid clouds, self-service portals, and higher-level orchestration services, it does not make sense to require users to identify, and in most cases make assumptions about, application and data behavior. Users should not have to care whether application data is dedupable, whether to pick RF2 vs. RF3 vs. Erasure Coding, whether the compression delay should be 30 or 60 minutes, or whether checksumming, erasure coding, and compression should remain enabled for a given application. We don't worry about any of that when using the public cloud, so why should we when using private clouds?

Truth be told, when data services are added as a bolt-on, as an afterthought, it becomes challenging to efficiently integrate new features and services in a meaningful way that embraces Design Thinking and eliminates complexity; a quick look at HCI solutions on the market today proves this point.

In the SAN world, things are no easier, and perhaps even more complicated: components such as LUNs, zoning, masks, WWPNs, and RAID groups are a perpetual struggle for users.

Enterprise Software does not have to be Complicated

When file systems are built from the ground up to support data services, they are designed to keep those services running in-line, all the time, while still providing the best resiliency, tolerance to failures, and durability.

However, for vendors that developed such services as an afterthought and need to maintain and support both new and old modes, it is challenging to seamlessly fold these services into their existing journaling file systems without making huge compromises to performance, stability, resiliency, or user experience, and in many instances doing so requires enormous expertise.

Zero-Touch

Datrium was built from the ground up to support enterprise data services. We built the Datrium file system from scratch with data services and Design Thinking in mind, and many of the architectural choices were made from the start to remove unnecessary and complicated decision-making.

Datrium DVX has virtually no knobs that need to be adjusted or configured, and yet presents an extensive list of quintessential enterprise data services, such as Deduplication, Compression, Erasure Coding, Checksumming, End-to-End Encryption, Replication, Snapshotting, Cloning, Compression over Wire, etc.

From an implementation perspective, the file system always uses distributed Erasure Coding for reliable data protection against at least 2 simultaneous disk failures (comparable to RF3 or FTT2), and the software stack uses no more than 20% of host CPU to deliver all data services, Always-On and Always In-Line.

In a spirit of openness and truth, there is only one knob in the Datrium DVX that users may adjust: the Encryption ON/OFF switch and its FIPS compliance selection mode.

Managing in the Design Thinking world

Design Thinking elements are evident throughout the platform, and one of my favorite features, besides zero-touch config, is the ability to create dynamically binding Protection Groups.

Protection Groups control scheduling, retention, and replication policies, and always create snapshots consistently and atomically, removing the burden on the end user by eliminating the typical cumbersome steps. As an example, a 3-tiered application might run in three different VMs that all need to be snapped together (not one by one) for application consistency, and all of this happens automatically without the user having to explicitly configure additional checkboxes.

Moreover, Protection Groups dynamically bind arbitrary collections of VM, vDisk, and file objects and apply scheduling, retention, and replication policies to them. As an example, a Protection Group may be set with a VM naming pattern, and any new VM created matching that pattern is dynamically bound to the Protection Group. In private clouds or deployments with hundreds or thousands of VMs, this assures that applications and data are invariably protected, without user interaction.
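For illustration only, such a dynamically binding group could be expressed roughly like this (a hypothetical command and flags written in the style of the DVX CLI; the real syntax lives in the product documentation):

> protection-group create sql-prod --vm-name-pattern "sql-prod-*" --schedule hourly --retention 7d
# hypothetical syntax; any VM later created matching "sql-prod-*" would be bound automatically

The point is the pattern, not the syntax: protection follows a naming convention, so new workloads inherit a policy the moment they are created.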

Conclusion

We are not blind to the fact that we have to keep improving. Most of the user experience today is driven by consumer products, and enterprise software vendors must incorporate Design Thinking into the development process. Having made the initial architectural efforts necessary to implement features that just work, and avoiding knobs as much as possible, we have paved the way to an uncomplicated system that is ready to manage applications and workloads at scale.

I am happy to announce that Datrium is now certified as Citrix Ready® and the solution is readily available in Citrix Ready Marketplace (here).

The Citrix Ready program helps customers identify third-party solutions that are recommended to enhance virtualization, networking, and cloud computing solutions from Citrix. As part of the program, Datrium has completed a rigorous process to ensure its Open Convergence platform, Datrium DVX Rackscale, is compatible with Citrix XenDesktop®.

Datrium DVX separates on-host IO performance from off-host durable data, so speed and persistent capacity can each be provisioned independently, efficiently scaling resources with the addition of new compute or data nodes. The separation of data between the host active data and durable copy allows each host to remain stateless and to operate independently from the others, allowing for better resource isolation and overall availability. More detail on the architecture is in this white paper.

More specifically, when looking at the Citrix provisioning mechanisms, PVS and MCS, Datrium helps reduce the total cost per desktop, enhances security with end-to-end encryption, provides effortless management, increases performance, and implements business continuity for Citrix servers and dedicated virtual desktops.

Bring Your Own Server

– An organization may obtain new servers from Datrium or from their preferred vendor, or they may use existing servers that are not quite ready to be retired. No hardware vendor lock-in.

No HDD Requirement / Minimal SSD

– In VDI deployments, Data Avoidance and Reduction ratios are very high given that virtual desktops are near-exact copies of each other, so large storage capacity is not a great need for the desktops themselves. Datrium requires just enough flash capacity on each desktop host for the de-duplicated active working set, eliminating unnecessary storage components from servers, reducing costs, and making them stateless. This strategy benefits MCS deployments, but also brings the data services, simplicity, and virtual desktop acceleration to PVS deployments, while keeping costs under control.
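To put rough, purely illustrative numbers on this: if 100 desktops share a 40GB base image and each accumulates about 2GB of unique active data, the de-duplicated working set a host needs on flash is on the order of 40GB + 100 × 2GB, roughly 240GB, instead of the 100 × 42GB, roughly 4.2TB, of raw capacity the same desktops would consume without data avoidance.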

Data Optimization

– Datrium DVX natively implements data avoidance techniques such as zero cloning and VAAI support, coupled with data reduction methods such as deduplication, compression, and erasure coding. For MCS provisioning, the end result is extra capacity for durable data and additional performance for active data on servers. PVS deployments also benefit from the performance for active data on servers.

End-to-End Encryption

– Rest easy knowing that all enterprise data is protected from network sniffers with AES-XTS-256 end-to-end software encryption (at-host, in-flight, and at-rest) without any performance hit to your end-users' experience, for both MCS and PVS.

Data Locality

– Data locality provides increased performance for Read IOs by maintaining all active data blocks belonging to virtual desktops on flash in the servers where the desktops are currently running, which vastly helps MCS deployments.

Write Cache Placement

– For Citrix PVS deployments, Datrium DVX addresses the management and placement of the Write Cache, making it straightforward and easy by providing a single convenient datastore across all servers in a cluster.

LACP, a vendor-independent standard which stands for Link Aggregation Control Protocol, is defined in IEEE 802.1ax (formerly 802.3ad). In a nutshell, it allows one to aggregate multiple network connections in parallel to increase throughput beyond what a single connection could sustain, and to provide redundancy in case one link goes down. LACP links need to be manually configured on the physical network switch to make both links appear as one logical aggregated link. MAC addresses from the host side can then appear on both links simultaneously without the switch freaking out and thinking there's a loop on the network. – Wen Yu
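For context, this is the kind of manual host-side configuration LACP typically requires (a sketch using nmcli on Linux; interface names are illustrative), on top of a matching port-channel configuration on the physical switch:

$ nmcli con add type bond con-name bond0 ifname bond0 mode 802.3ad
$ nmcli con add type bond-slave ifname eth0 master bond0
$ nmcli con add type bond-slave ifname eth1 master bond0
# the switch ports facing eth0/eth1 must also be grouped into an LACP port-channel

As described below, Datrium's Adaptive Pathing achieves link redundancy and aggregate bandwidth without any of this bonding configuration.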

Datrium DVX is split between hosts and data nodes, and it disaggregates performance from capacity (for an understanding of the DVX architecture, read this). This approach means that, unlike traditional storage arrays, DVX has complete control over both ends of the communication stack: hosts and data nodes.

Performance

DVX manages and routes storage traffic at a higher layer in the networking stack rather than relying on a network link-level protocol. DVX data nodes actively heartbeat hosts using each of the network interfaces and can detect which interfaces have good connectivity to hosts. (Each data node has two controllers in an active/passive mode, and each controller presents two network interfaces.)

When a host has functioning network paths to both interfaces on the data node's active controller, DVX automatically spreads the host traffic across the controller interfaces, increasing the available bandwidth across all hosts in the cluster, for a total of 20Gbps of throughput for each data node.

Benefit: Improved network bandwidth

The performance increase itself is dependent on IO patterns. However, internal FIO tests with three hosts and 64KB block IO demonstrated up to 65% performance improvement for sequential write workloads and a 53% improvement for random write workloads. For small 4KB random IO, the performance improvement expected is lower due to the nature of the workload.

Note that Adaptive Pathing is not multi-pathing: an individual host is still limited to the bandwidth of a single active interface in its network team/bond. Adaptive Pathing eliminates the need for network bonding of the data interfaces on the controller; it spreads the data traffic from the hosts across the data node controller interfaces, resulting in greater aggregate bandwidth to the controllers, but it does not increase the bandwidth available to an individual host.

Reliability

DVX data nodes do not route storage traffic through the passive controller's interfaces, but they do use those interfaces to monitor connectivity to all compute hosts and network interfaces.

If a host is only able to talk to one of the network interfaces on the active data node controller, the DVX software automatically routes all host traffic to the functional interface. Likewise, if a data node network interface can communicate to some of the hosts, but not others, each host will communicate to the most appropriate controller based on its unique connectivity status.

The intelligence is built into the DVX software stack and at any point in time compute hosts may be using different paths to transport data, even in the event of network connectivity failures, always automatically choosing the best network path.

If a DVX data node detects that the passive controller can talk to all the hosts that have connectivity to the active controller, plus additional hosts that the active controller cannot reach, then an automatic failover is triggered to increase host-controller connectivity.

Monitoring

The DVX platform fully understands the network connectivity and topology, and the GUI provides administrators with insights into networking issues and connectivity statuses such as redundant or degraded components, hosts and data nodes.

Not only will DVX route around network issues when they occur, it also makes it easier to resolve those issues and return the network to full redundancy.

Benefit: Continuous network monitoring

Conclusion

Simple and easy! Just make sure that all network interfaces are properly cabled, add two new IP addresses, and Adaptive Pathing is enabled. Datrium DVX 2.0 Adaptive Pathing gives customers more bandwidth, improved network resiliency, and increased availability with zero management overhead.

Today I am releasing version 7.2 of the VDI Calculator, and the key feature is the addition of support for Workspot.

Workspot is a multi-tenant cloud-first solution, where the operational components of management, brokering, load balancing and more are all included in the cloud service. Their solution can be deployed on Microsoft Azure or using traditional datacenter constructs, such as SAN, or HCI platforms such as Nutanix, VSAN, and Datrium.

A couple of other bug fixes and enhancements have been added to this release, but nothing deserving a special mention.

I am starting to discuss some of the features and design aspects that got me enthusiastic about Datrium's technology. A feature that has been driving many interesting conversations with customers is the new Blanket Encryption. First, let's rewind a bit.

(If you only want to find out about the product features and skip my long-winded write-up, just skip to the bottom of the article.)

Gemalto’s Breach Level Index (BLI) tracks publicly disclosed breaches across the globe, measuring their severity via a multidimensional index based on factors including the number of compromised records, the source and the type of breach. In March 2017, Gemalto released the BLI findings revealing that 1,792 data breaches led to almost 1.4 billion data records being compromised worldwide during 2016, an increase of 86% compared to 2015.

These are mind-blowing numbers!

Most security breaches exploit human frailties, and CIOs need to educate their user populations on good security practices. That said, according to Gemalto’s research, only 4.2% of breaches were “Secure Breaches” where encryption was used, and the stolen data was rendered useless.

Folks, only 4.2%!

Protecting data ‘At-Rest’ has become a top priority for organizations. However, despite growing awareness, encryption of data ‘In-Flight’, as it moves across the network, is consistently overlooked. Given the widespread use of IP network protocols, In-Flight data is nowadays most vulnerable to perpetrators able to tap into network connections; security measures for data in storage come to nothing if In-Flight data is not properly guarded as well.

Because Datrium’s FE (FrontEnd client) runs as part of the hypervisor, Datrium is the only (…and please, correct me if I’m wrong) convergence solution that provides a cluster-wide encryption domain for data In-Flight and At-Rest while still providing the benefits of data reduction. Data is reduced, compressed, and then encrypted as soon as it is created in host RAM, before it is written to the host flash or transmitted, fully encrypted, to the data node. The design leverages resources on the ESX servers for most of the work and scales in line with the architecture.

While some app-based or OS-based encryption solutions offer an In-Flight encryption capability, they eliminate all data reduction optimizations (storage or transfer over WAN) as they randomize the blocks before they can be data-reduced.
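A quick way to see why: once data is encrypted, otherwise identical blocks look random and neither compress nor deduplicate. This small demonstration uses only standard tools (openssl and gzip; the cipher and password are arbitrary choices for the example):

$ head -c 64M /dev/zero > plain.img
$ gzip -c plain.img | wc -c                                    # tiny output: plaintext reduces well
$ openssl enc -aes-256-ctr -pass pass:demo -in plain.img | gzip -c | wc -c   # ~64MB: ciphertext does not
$ openssl enc -aes-256-ctr -pass pass:demo -in plain.img | sha256sum
$ openssl enc -aes-256-ctr -pass pass:demo -in plain.img | sha256sum         # different digest each run

The two digests differ because each encryption uses a fresh random salt, so even identical data encrypts to different blocks, defeating deduplication. By reducing before encrypting, Datrium keeps both capabilities.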

It is important to recognize that software-defined-storage stacks use host CPU cycles to provide such services, as there can be performance implications when enabling data services.

AES – Advanced Encryption Standard specified in FIPS 197

FIPS – Federal Information Processing Standard

XTS-AES – Mode of operation specified in IEEE Std. 1619-2007 and approved in NIST SP 800-38E, with one additional requirement on the lengths of the data units.

FIPS 140-2 is a requirement to achieve compliance with the HIPAA standard to protect Healthcare data. Already mandated by the U.S. Department of Defense (DoD) for encryption, FIPS 140-2 is a robust security solution that reduces risk without increasing costs.

The FIPS-validated mode provides acceptable performance but imposes some additional CPU load on the Controllers, while the FIPS-approved (or fast) mode causes less than 5% performance degradation but may not be considered FIPS-validated. Neither mode has a significant impact on data reduction.

> datastore encryption set --fips-mode [approved|validated]

Cross-Cluster and Cross-Site Replication

Datrium implements native one-click SSL/TLS encryption to secure data replication traffic between clusters and datacenters. The session encryption is selective and may be enabled on a per-Protection Group basis, so you only encrypt replication for the VMs you deem necessary, or just enable it for everything!

When replicating between clusters, the source and destination DVX data nodes have different AES-XTS-256 encryption keys. The source is responsible for decrypting the data and encrypting the resulting data stream with an SSL/TLS session key. The destination DVX data node is responsible for decrypting the data stream with the SSL/TLS session key and re-encrypting the result with the destination's own AES-XTS-256 key. This ensures that the AES-XTS-256 keys used by a DVX cluster never leave that DVX.


This release supports:

Instant enable/disable encryption

Encrypt data In-Use, In-Flight, and At-Rest at cluster-wide and full-stack levels

In the current release, only new data is encrypted upon turning the feature on, although some old data may also be opportunistically encrypted during backend tasks, such as maintenance processes. Additionally, an external key manager is not supported in this first release.

Final Thoughts

Hackers have already breached internet-connected camera systems, smart TVs, and even baby monitors. It is dangerous to think that they aren't already mining your organization's data. Independent of the preferred security approach, CIOs must take action before they become the next target. Robert Mueller, ex-FBI Director, once said: “There are only two types of companies: those that have been hacked, and those that will be.”

This is all heavy and concerning stuff, so I thought I would end this blog post with something a little more humorous.

Eric Siebert has opened this year's Top vBlog voting, with sponsorship support from Turbonomic. This blog, myvirtualcloud.net, has ranked in 14th place for the last four years, after climbing from 17th and 39th in the years before.

All bloggers do an excellent job, using their personal time to share experiences and challenges with the broader community. This year I also found myself in a work and technology transition, moving companies and trying to talk about and demonstrate different viewpoints and technologies.

The datacenter technology world is at a major inflection point, where many distinct and complementary technologies are competing for awareness as organizations move into the public and hybrid infrastructure world, and where application deployment models are drastically improving and reducing IT friction. It is a superb time to be in technology! If you like the content I have been publishing, please consider voting for this blog.