Category Archives: Scale-out architecture

[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

Since Storage Field Day 15 3 weeks ago, the thoughts of the session with Huawei lingered. And one word came to describe Huawei Dorado V3, their flagship All-Flash storage platform is SPEED.

My conversation with Huawei actually started the night before our planned session at their Santa Clara facility the next day. We had a evening get-together at Bourbon Levi’s Stadium. I was with my buddy, Ammar Zolkipli, who coincidentally was in the Silicon Valley for work. Ammar is from Hitachi Vantara Japan, and has been a good friend of mine for over 17 years now.

Shortly, the Huawei team arrived to join the camaraderie. And we introduced ourselves to Chun Liu, not knowing that he is the Chief Architect at Huawei. A big part of that evening was our conversation with him. Ammar and I have immersed in the Oil & Gas EP (Exploration & Production) data management and petrotechnical applications when he was in Schlumberger and after that a reseller of NetApp. I was a Consulting Engineer with NetApp back then. So, the 2 of us started blabbering (yeah, that would be us when we get together to talk technology).

I observed that Chun was very interested to find learn about real world application use cases that would push storage performance to its limits. And I guessed that the best type of I/O characteristics would be small block, random I/O and billions of them, with near-real time latency. After that evening I did some research and could only think of a few, such as deep analytics or some applications with needs for Monte Carlo simulations. Oh, well, maybe I would share that with Chun the following day.

The moment the session started, it was already about the speed prowess of Huawei Storage. It was like the greyhounds unleashed going after the rabbit. In the lineup, the Dorado series stood out.

Share this:

[Preamble: I was a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation were paid for by GestaltIT, the organizer and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

Cohesity SpanFS impressed me. Their filesystem was designed from ground up to meet the demands of the voluminous cloud-scale data, and yes, the sheer magnitude of data everywhere needs to be managed.

We all know that primary data is always the more important piece of data landscape but there is a growing need to address the secondary data segment as well.

Like a floating iceberg, the piece that is sticking out is the more important primary data but the larger piece beneath the surface of the water, which is the secondary data, is becoming more valuable. Applications such as file shares, archiving, backup, test and development, and analytics and insights are maturing as the foundational data managementframeworks and fast becoming the bedrock of businesses.

The ability of businesses to bounce back after a disaster; the relentless testing of large data sets to develop new competitive advantage for businesses; the affirmations and the insights of analyzing data to reduce risks in decision making; all these are the powerful back engine applicability that thrust businesses forward. Even the ability to search for the right information in a sea of data for regulatory and compliance reasons is part of the organization’s data management application.

Share this:

[Preamble: I am a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation are paid for by GestaltIT, the organizer and I am not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

The magic is happening.

When Dropbox first entered into the market which eventually termed as BYOD (Bring your Own Device), it was a phenomenon. There was nothing else that matched its simplicity and ease-of-use. A file uploaded into the cloud was instantaneously available on the tablets and smart phones. It was on every storage vendor’s presentation slides, using Dropbox as the perennial name dropping tactic to get end users buy-in.

Today, Dropbox is beyond BYOD and EFSS. They are a full fledged collaboration platform that includes project management, project workflow, file versioning, secure file transfer, smart file synchronization and Dropbox Paper. And they offer comprehensive plans from Basic, Plus and Professional to Business and Enterprise. Their upcoming IPO, I am sure, will give them far greater capital to expand, and realize their full potential as the foremost content-based collaboration platform in the world.

Dropbox began their exodus from AWS a couple of years ago. They wanted to control their destiny and have moved more than 500PB into their own private data center for their customer data. That was half-an-exabyte, people! And two years later, they saved $75million of operating costs after they exited AWS. Today, they have more than 1 Exabyte of customer data! That is just incredible.

And Dropbox’s storage architecture started with a simple foundational design called “Magic Pocket“. Magic Pocket is a “fixed-length, immutable” block storage layer.

Share this:

[Preamble: I am a delegate of Storage Field Day 15 from Mar 7-9, 2018. My expenses, travel and accommodation are paid for by GestaltIT, the organizer and I am not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

I have been called a dinosaur. We storage networking professionals and storage technologists have been called dinosaurs. It wasn’t offensive or anything like that and I knew it was coming because the writing was on the wall, … or is it?

The cloud and the breakneck pace of all the technologies that came along have made us, the storage networking professionals, look like relics. The storage guys have been pigeonholed into a sunset segment of the IT industry. SAN and NAS, according to the non-practitioners, were no longer relevant. And cloud has clout (pun intended) us out of the park.

I don’t see us that way. I see that the Storage Dinosaurs are evolving as well, and our storage foundational knowledge and experience are more relevant that ever. And the greatest assets that we, the storage networking professionals, have is our deep understanding of data.

A little over a year ago, I changed the term Storage in my universe to Data Services Platform, and here was the blog I wrote. I blogged again just before the year 2018 began.

Share this:

Data storage silos everywhere. The early clarion call was to eliminate IT data storage silos by moving to the cloud. Fast forward to the present. Data storage silos are still everywhere, but this time, they are in the clouds. I blogged about this.

Object Storage was all the rage when it first started. AWS, with its S3 (Simple Storage Service) offering, started the cloud storage frenzy. Highly available, globally distributed, simple to access, and fitted superbly into the entire AWS ecosystem. Quickly, a smorgasbord of S3-compatible, S3-like object-based storage emerged. OpenStack Swift, HDS HCP, EMC Atmos, Cleversafe (which became IBM SpectrumScale), Inktank Ceph (which became RedHat Ceph), Bycast (acquired by NetApp to be StorageGrid), Quantum Lattus, Amplidata, and many more. For a period of a few years prior, it looked to me that the popularity of object storage with an S3 compatible front has overtaken distributed file systems.

What’s not to like? Object storage are distributed, they are metadata rich (at a certain structural level), they are immutable (hence secure from a certain point of view), and some even claim self-healing (depending on data protection policies). But one thing that object storage rarely touted dominance was high performance I/O. There were some cases, but they were either fronted by a file system (eg. NFSv4.1 with pNFS extensions), or using some host-based, SAN-client agent (eg. StorNext or Intel Lustre). Object-based storage, in its native form, has not been positioned as high performance I/O storage.

Share this:

[Preamble: I was a delegate of Storage Field Day 14 from Nov 8-10, 2017. My expenses, travel and accommodation were paid for by GestaltIT, the organizer and I was not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

They came out of stealth in August 2016 and have been making waves with very impressive stats. When E8 was announced, their numbers were more than 10 million IOPS, with 100µsecs for reads and 40µsecs for writes. And in the SFD14 demo, they reached and past the 10 million IOPS numbers.

The design philosophy of E8 Storage is different than the traditional dual controller scale-up storage architecture design or the multi-node scale-out cluster design. In fact, from a 30,000 feet view, it is quite similar to a “SAN-client” design advocated by Lustre, leveraging a very high throughput, low latency network.

Share this:

[Preamble: I am a delegate of Storage Field Day 14. My expenses, travel and accommodation are paid for by GestaltIT, the organizer and I am not obligated to blog or promote the technologies presented at this event. The content of this blog is of my own opinions and views]

I am here at the Commvault GO 2017. Bob Hammer, Commvault’s CEO is on stage right now. He shares his wisdom and the message is clear. IT to DT. IT to DT? Yes, Information Technology to Data Technology. It is all about the DATA.

The data landscape has changed. The cloud has changed everything. And data is everywhere. This omnipresence of data presents new complexity and new challenges. It is great to get Commvault acknowledging and accepting this change and the challenges that come along with it, and introducing their HyperScale technology and their secret sauce – Universal Dynamic Index.

Share this:

I have known of RDMA (Remote Direct Memory Access) for quite some time, but never in depth. But since my contract work ended last week, and I have some time off to do some personal development, I decided to look deeper into RDMA. Why RDMA?

In the past 1 year or so, RDMA has been appearing in my radar very frequently, and rightly so. The speedy development and adoption of NVMe (Non-Volatile Memory Express) have pushed All Flash Arrays into the next level. This pushes the I/O and the throughput performance bottlenecks away from the NVMe storage medium into the legacy world of SCSI.

Most network storage interfaces and protocols like SAS, SATA, iSCSI, Fibre Channel today still carry SCSI loads and would have to translate between NVMe and SCSI. NVMe-to-SCSI bridges have to be present to facilitate the translation.

In the slide below, shared at the Flash Memory Summit, there were numerous red boxes which laid out the SCSI connections and interfaces where SCSI-to-NVMe translation (and vice versa) would be required.

“No, we are not a storage company anymore. We are a data management company now.“

I was fairly surprised about the time it took for that mindset shift messaging from storage to data management. I am sure that NetApp has been doing that for years internally.

To me, the writing has been in the wall for years. But weak perception of storage, at least in this part of Asia, still lingers as that clunky, behind the glassed walls and crufty closets, noisy box of full of hard disk drives lodged with snakes and snakes of orange, turquoise or white cables. 😉

The article may come as a revelation to some, but the world of storage has changed indefinitely. The blurring of the lines began when software defined storage, or even earlier in the form of storage virtualization, took form. I even came up with my definition a couple of years ago about the changing face of storage framework. Instead of calling it data management, I called the new storage framework, the Data Services Platform.

So, this is my version of the storage technology platform of today. This is the Data Services Platform I have been touting to many for the last couple of years. It is not just storage technology anymore; it is much more than that.

Share this:

[Preamble: I was a delegate of Storage Field Day 12. My expenses, travel and accommodation were paid for by GestaltIT, the organizer and I was not obligated to blog or promote the technologies presented in this event]

When it comes to large scale storage capacity requirements with distributed cloud and on-premise capability, object storage is all the rage. Amazon Web Services started the object-based S3 storage service more than a decade ago, and the romance with object storage started.

Today, there are hundreds of object-based storage vendors out there, touting features after features of invincibility. But after researching and reading through many design and architecture papers, I found that many object-based storage technology vendors began to sound the same.

At the back of my mind, object storage is not easy when it comes to most applications integration. Yes, there is a new breed of cloud-based applications with RESTful CRUD API operations to access object storage, but most applications still rely on file systems to access storage for capacity, performance and protection.

These CRUD and CRUD-like APIs are the common semantics of interfacing object storage platforms. But many, many real-world applications do not have the object semantics to interface with storage. They are mostly designed to interface and interact with file systems, and secretly, I believe many application developers and users want a file system interface to storage. It does not matter if the storage is on-premise or in the cloud.

Let’s not kid ourselves. We are most natural when we work with files and folders.

Implementing object storage also denies us the ability to optimally utilize Flash and solid state storage on-premise when the compute is in the cloud. Similarly, when the compute is on-premise and the flash-based object storage is in the cloud, you get a mismatch of performance and availability requirements as well. In the end, there has to be a compromise.

Another “feature” of object storage is its poor ability to handle transactional data. Most of the object storage do not allow modification of data once the object has been created. Putting a NAS front (aka a NAS gateway) does not take away the fact that it is still object-based storage at the very core of the infrastructure, regardless if it is on-premise or in the cloud.

Resiliency, latency and scalability are the greatest challenges when we want to build a true globally distributed storage or data services platform. Object storage can be resilient and it can scale, but it has to compromise performance and latency to be so. And managing object storage will not be as natural as to managing a file system with folders and files.