About cfheoh

I am a technology blogger with 20+ years of IT experience. I write heavily on technologies related to storage networking and data management because that is my area of interest and expertise. I introduce technologies with the objective of getting readers to *know the facts*, and use that knowledge to cut through the marketing hype, FUD (fear, uncertainty and doubt) and other fancy stuff. Only then will there be progress.
I am involved in SNIA (Storage Networking Industry Association) and as of October 2013, I have been appointed the SNIA South Asia & SNIA Malaysia non-voting representative to the SNIA Technical Council. I was previously the Chairman of SNIA Malaysia until Dec 2012.
As of August 2015, I am returning to NetApp to be the Country Manager of Malaysia & Brunei. Given my present position, I am not obligated to write about my employer and its technology, but I am indeed subject to the company's Social Media Guidelines. Therefore, I would like to make a disclaimer: what I write is my personal opinion, and mine alone. I am responsible for what I say and write, and this statement indemnifies my employer from any damages.

In my last entry, I mentioned that eBay now has 100TB of Nimbus storage, and every single TB of it is on SSDs. The full details of how the deal was thrashed out, beating the competition from the incumbents NetApp and 3PAR, are detailed here.

The significance of the deal was how an all-SSD system was able to out-price storage arrays built on a hybrid of spinning disks and SSDs.

The Nimbus news just obliterated the myth that SSDs are expensive. If you do the math, perhaps the biggest cost in the entire storage system is not the SSDs. It could be the way some vendors structure their software licensing scheme, or a combination of licenses, support and so on.

Just last week, we were out there discussing hard disks and SSDs. The crux of the discussion was pricing, and the customer we were speaking to was perplexed that the typical SATA disks from vendors such as HP, NetApp and so on cost a lot more than the enterprise HDDs and SSDs you get from the distributors. Sometimes it is a factor of 3-4x.

I was contributing my side of the story: one unit of 1TB SATA (mind you, an enterprise-grade HDD from Seagate) from a particular vendor would cost about RM4,000 to RM5,000. The usual story we were trained to tell when we worked for vendors was, “Oh, these disks had to be specially provisioned with our own firmware, and we can monitor their health with our software and so on ….”. My partner chipped in and cleared the BS smokescreen: basically, the high-priced disks come with high margins for the vendor to feed the entire backline of the storage product, from sales, to engineers, to engineering and so on. He hit the nail right on the head, because I believe a big part of the margin of each storage system goes back to feed the vendor’s army of people behind the product.

In my research, a 2TB enterprise-grade SATA HDD in Malaysia is approximately RM1,000 or less. A similar SAS HDD would be slightly higher, by 10-15%, while an enterprise-grade SSD is about RM3,000 or less. And this is far less than what is quoted by the vendors of storage arrays.

Of course, the question would be, “can the customer put in their own hard disks, or ask the vendor to purchase cheaper hard disks from a cheaper source?” Apparently not! Unless you buy a low-end NAS from the likes of NetGear, Synology, Drobo and many other low-end storage systems. But you can’t bet your business and operations on the reliability of these storage boxes, can you? Otherwise, it’s your head on the chopping block.

Eventually, customers will demand such a “feature”. They will want to put in their own hard disks (with proper qualification from the storage vendor) because they will want cheaper HDDs or SSDs. It is already happening with some enterprise storage vendors, but these vendors are not well known yet. It is happening though. I know of one vendor in Malaysia who could do such a thing …


There has been a slew of SSD news in the storage blogosphere with the big one from eBay.

eBay has just announced that it has 100TB of SSDs from Nimbus Data Systems. On top of that, OCZ, SanDisk and STEC, all major SSD manufacturers, have announced a whole lot of new products, with the PCIe SSD cards leading the way. The most interesting thing is that the $/GB factor has gone down significantly, getting very close to the $/GB of spinning disks. This is indeed good news for the industry, because SSDs deliver low latency, high IOPS, low power consumption and many other new benefits.

Side note: As I am beginning to understand more about SSDs, I found out that NAND flash SSDs have latency in the microseconds, compared to spinning HDDs, which have latency in the milliseconds range. In addition, DRAM SSDs have latency in the nanoseconds range, which is essentially memory-type access. DRAM SSDs are, of course, more expensive.

SSDs are coming into the mainstream very soon, and this will inevitably drive a new generation of applications and accelerate growth in knowledge acquisition. We are already seeing the decline of Fibre Channel disks and the rise of SAS and SATA disks, but SSDs in enterprise storage, as far as I am concerned, bring forth two new challenges which we, as professionals and users in the storage networking environment, must address.

These challenges can be simplified to:

Are we ready?

Where is the new bottleneck?

To address the first challenge, we must understand the second challenge first.

In system architectures, we know of various performance bottlenecks that exist in the CPU, memory, bus, bridge, buffer, I/O devices and so on. In order to deliver the data to be processed, we have to view the data block/byte service request in its entirety.

When a user requests a file, this is a service request. The end objective is that the user is able to read and write the file he/she requested. The time taken from the beginning of the request to the end of it is known as the service time, in which latency plays a big part. We assume that the file resides in a NAS system in the network.

The request for the file begins at the file system layer of the host the user is accessing, crosses user and kernel space, moves through the device driver of the NIC, through the TCP/IP stack (which has its own set of buffer overheads and so on), and onto the physical wire. From there it moves through the NAS system, with its RAID layer, file system and so on, until it reaches the requested file. Note that I have shortened the entire process for simplicity, but it shows that the service request passes through a whole lot of components in order to complete.
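To make the idea concrete, here is a toy latency budget for that path, sketched in Python. Every number is an assumption for illustration, not a measurement; the point is that the service time is the sum of all the hops, and the slowest hop dominates.

```python
# A rough, illustrative latency budget for the file service request
# described above. All figures are assumptions, not measurements.
latency_us = {
    "host file system + syscall": 10,
    "NIC driver + TCP/IP stack": 50,
    "network wire + switching": 100,
    "NAS TCP/IP + file system": 60,
    "RAID layer": 30,
    "disk I/O (7,200 RPM HDD)": 10_000,  # milliseconds-class spinning disk
}

def service_time_us(budget):
    """Total service time is the sum of every hop in the request path."""
    return sum(budget.values())

def bottleneck(budget):
    """The slowest component dominates the service time."""
    return max(budget, key=budget.get)

total = service_time_us(latency_us)
print(f"total: {total} us, bottleneck: {bottleneck(latency_us)}")
```

With these assumed numbers, the spinning disk alone accounts for almost the entire service time, which is exactly the point made in the next paragraph.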

Bottlenecks exist everywhere within the service request path, and the path is also subject to external factors related to that service request. For a long, long time, I/O has been the biggest bottleneck in the processing of the service request, because it is usually, and almost always, the slowest component in the entire scheme of things.

The introduction of SSDs will improve I/O performance tremendously, into the micro- or even nanoseconds range, putting it in almost equal performance terms with other components in the system architecture. The buses and bridges in computer systems could be the new locations where the bottleneck of a service request exists. Hence we have to use this understanding to change the modus operandi of existing types of applications such as databases, email servers and file servers.

The usual tried-and-tested best practices may have to be changed to adapt to the shift of the bottleneck.

So, we have to equip ourselves with what SSDs are doing and will do to the industry. We have to be ready, and take advantage of this “quiet” period to learn more about SSD technology and what the experts are saying. I found a great website that introduces and discusses SSDs in depth. It is called StorageSearch, and it is what I consider the best treasure trove of SSD information on the web right now. It is run by a gentleman named Zsolt Kerekes. Go check it out.

Yup, we must get ready for when SSDs hit the mainstream, and ride the wave.


Backup is a necessary evil. In IT, every operator, administrator, engineer, manager, and C-level executive knows that you have got to have backup. When it comes to the protection of data and information in a business, backup is the only way.

Backup has also become the bane of IT operations. Every product out there in the market is trying to cram as much production data into the backup as possible, just to fit within the backup window. We only have 24 hours in a day, so there is no way the backup window can be increased unless:

You reduce the size of the primary data to be backed up – think compression, deduplication, archiving

You replicate the primary data to a secondary device and backup the secondary device – which is ironic because when you replicate, you are creating a copy of the primary data, which technically is a backup. So you are technically backing up a backup

You speed up the transfer of primary data to the backup device
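A quick back-of-envelope calculation (with assumed, illustrative numbers) shows why the options above matter:

```python
# Backup window arithmetic with assumed figures, for illustration only:
# how data reduction brings a backup back inside a 24-hour window.
def backup_hours(data_tb, throughput_mb_s, reduction_ratio=1.0):
    """Hours to back up data_tb terabytes at throughput_mb_s MB/s,
    after shrinking the data by reduction_ratio (e.g. 5.0 = 5:1 dedupe)."""
    data_mb = data_tb * 1_000_000          # decimal TB -> MB
    return (data_mb / reduction_ratio) / throughput_mb_s / 3600

# 50TB over a single 400 MB/s stream blows past a 24-hour window ...
full = backup_hours(50, 400)
# ... while 5:1 dedupe/compression brings it comfortably back inside.
reduced = backup_hours(50, 400, reduction_ratio=5.0)
print(round(full, 1), round(reduced, 1))
```

The same arithmetic explains the appeal of the third option: doubling the transfer rate halves the hours, but data reduction often buys far more than that.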

Either way, IT operations are trying to overcome the challenges of the backup window. And the whole purpose of backup is to be cock-sure that data can be restored when it comes to recovery. It’s like insurance: you pay the premium so that you can use the insurance facility to recover in times of need. We have heard that analogy many times before.

On the flip side of the coin, a snapshot is also a backup. Snapshots are point-in-time copies of the primary data, and many a time, snapshots are taken and then used as the source of a “true” backup to a secondary device, be it disk-based or tape-based. However, snapshots have suffered the perception of being a pseudo-backup, until the last couple of years.

Here is some food for thought …

WHAT IF we eliminate backing up data to a secondary device?

WHAT IF IT operations are ready to embrace snapshots as the true backup?

WHAT IF we rely on snapshots for backup and replicated snapshots for disaster recovery?

First of all, it will solve the perennial issues of backing up to a “secondary device”. The operative words here are “secondary device”, because that secondary device is usually external to the primary storage.

Tape subsystems and tapes are constantly ridiculed as the culprits of missed backup windows. Duplication after duplication of the same set of files in every backup set triggered the adoption of deduplication solutions from Data Domain, Avamar, PureDisk, ExaGrid, Quantum and so on. Networks are also blamed, because network backup runs through the LAN. LANless backup uses another conduit, usually Fibre Channel, to transport data to the secondary device.

If we eliminate the “secondary device” and perform backup in the primary storage itself, then networks are no longer part of the backup. There is no need for deduplication because the data could already have been deduplicated and compressed in the primary storage.

Note that what I have suggested is to backup, compress and dedupe, AND also restore from the primary storage. There is no secondary storage device for backup, compress, dedupe and restore.
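For illustration, here is a minimal sketch of the dedupe principle at work in primary storage: blocks with identical content are stored once and referenced by their content hash. This is a toy model, not any vendor's actual implementation.

```python
# Toy block-level deduplication: identical blocks are stored once,
# and the logical data stream is just a sequence of hash references.
import hashlib

class DedupeStore:
    def __init__(self):
        self.blocks = {}   # content hash -> block bytes (stored once)
        self.refs = []     # logical sequence of block references

    def write(self, block: bytes):
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)  # physical copy only if new
        self.refs.append(digest)

    def read(self):
        # Reassemble the logical data from the referenced blocks
        return b"".join(self.blocks[d] for d in self.refs)

store = DedupeStore()
for block in [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]:
    store.write(block)

print(len(store.refs), "logical blocks,", len(store.blocks), "stored")
```

Four logical blocks collapse into two physical ones, and a restore simply walks the reference list, which is why restoring directly from deduplicated primary storage is feasible.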

Wouldn’t that paint a better way of doing backup?

Snapshots will be the only mechanism for backup. Snapshots are quick, usually taking minutes, sometimes seconds. Most snapshot implementations today are space-efficient, consuming storage only for delta changes. The primary device will compress and dedupe, depending on the data’s characteristics.
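The reason snapshots are quick and space-efficient can be shown in a few lines: a snapshot only captures references to existing blocks, and new space is consumed only for blocks that change afterwards. A toy model, not any particular vendor's implementation:

```python
# Toy copy-on-write-style snapshot: taking a snapshot copies only the
# block map (references), never the data; old blocks survive because
# the snapshot still references them after new writes land.
class Volume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block number -> data
        self.snapshots = []

    def snapshot(self):
        # Cheap: capture the current block map; no data is copied.
        self.snapshots.append(dict(self.blocks))

    def write(self, blkno, data):
        # Only the delta consumes new space; the snapshot keeps
        # its reference to the old version of the block.
        self.blocks[blkno] = data

vol = Volume({0: "alpha", 1: "beta"})
vol.snapshot()
vol.write(1, "gamma")

print(vol.blocks[1], vol.snapshots[0][1])
```

The live volume sees the new data while the snapshot still reads the old version, which is exactly the point-in-time property that makes snapshots usable as backups.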

For DR, snapshots are shipped to a remote storage system of equal prowess at the DR site, where the snapshot can be rebuilt and kept ready to become primary data when required. NetApp SnapVault is one example. ZFS snapshot replication is another.

And when it comes to recovery, quick restores of primary data will be from snapshots. If the primary storage goes down, clients and host initiators can be rerouted quickly to the DR device for services to resume.

I believe that with the convergence of multi-core processing power, 10GbE networks, SSDs and very large capacity drives, we could be seeing a shift in the backup design model, and possibly in the entire IT landscape. Snapshots could very likely replace traditional backup in the near future, and the secondary device may become a thing of the past.


There have been a lot of questions about Solid State Drives (SSDs), aka Enterprise Flash Drives (EFDs) to some vendors. Are they less reliable than our 10K or 15K RPM hard disk drives (HDDs)? I was asked this question on stage while presenting the topic of Green Storage 3 weeks ago.

Well, the usual answer from the typical techie is … “It depends”.

We all fear the unknown, and given the limited knowledge we have about SSDs (they are fairly new in the enterprise storage market), we tend to be drawn more to the negatives than the positives of what SSDs are and what they can be. I, for one, believe that SSDs have more positives, and over time, we will grow to accept that this is all part of the IT evolution. IT has always evolved into something better, stronger, faster, more reliable and so on. As famously quoted by Jeff Goldblum’s character Dr. Ian Malcolm in the movie Jurassic Park, “Life finds a way …”. IT will always find a way to be just that.

SSDs are typically categorized into MLCs (multi-level cells) and SLCs (single-level cells). They typically have predictable life expectancies, ranging from tens of thousands of writes to more than a million writes per drive. This, by no means, is a measure of the reliability of SSDs versus HDDs. However, SSD controllers and drives employ various techniques to enhance the durability of the drives. A common method is to balance I/O accesses across the disk blocks to adapt to I/O usage patterns, which can prolong the lifespan of the blocks (and subsequently the drive itself) and also ensure the performance of the drive does not lag, since the I/O is more “spread out” across the drive. This is known as the “wear-leveling” algorithm.
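A bare-bones sketch of the wear-leveling idea: steer each write to the least-worn block so that program/erase cycles spread evenly. Real SSD controllers are far more sophisticated (free-block pools, static wear-leveling, garbage collection); this only illustrates the principle.

```python
# Minimal wear-leveling sketch: always pick the block with the fewest
# program/erase cycles for the next write, so wear spreads evenly.
def pick_block(erase_counts):
    """Choose the least-worn block for the next write."""
    return min(erase_counts, key=erase_counts.get)

erase_counts = {blk: 0 for blk in range(4)}   # 4-block toy flash device
for _ in range(100):                          # simulate 100 writes
    blk = pick_block(erase_counts)
    erase_counts[blk] += 1

print(erase_counts)   # wear is even: each block erased 25 times
```

Without the `min` selection (e.g. always writing to block 0), one block would absorb all 100 cycles and wear out far earlier than the rest — the exact failure mode wear-leveling exists to prevent.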

Most SSDs proposed by enterprise storage vendors are MLCs, to meet the market's price-per-IOPS and price-per-GB demands, because SLCs are definitely more expensive for their higher durability. MLCs also have a higher BER (bit-error rate); it is known that MLCs have 1 bit error per 10,000 writes while SLCs have 1 bit error per 100,000 writes.

But the advantages of SSDs clearly outweigh those of HDDs. Fast access (much lower latency) is one of the main advantages. Higher IOPS is another. SSDs can provide from several thousand IOPS to more than 1 million IOPS, compared to enterprise HDDs: a typical 7,200 RPM SATA drive has less than 120 IOPS, while a 15,000 RPM Fibre Channel or SAS drive ranges from 130-200 IOPS. That IOPS advantage is definitely a vast differentiator when comparing SSDs and HDDs.
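Those HDD IOPS figures fall out of simple physics: a spinning disk serves roughly one random I/O per (average seek time + half a rotation). The seek times below are typical published figures, assumed here for illustration:

```python
# Estimate random-I/O IOPS for a spinning disk from its mechanics.
# avg_seek_ms values are typical published figures (assumptions).
def hdd_iops(rpm, avg_seek_ms):
    # Average rotational latency is half of one full rotation.
    half_rotation_ms = (60_000 / rpm) / 2
    return 1000 / (avg_seek_ms + half_rotation_ms)

sata_7200 = hdd_iops(7200, avg_seek_ms=8.5)    # ~79 IOPS
sas_15000 = hdd_iops(15000, avg_seek_ms=3.5)   # ~182 IOPS
print(round(sata_7200), round(sas_15000))
```

SSDs have no seek or rotation terms at all, which is why their IOPS figures sit three to four orders of magnitude higher.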

We are also seeing both drive-format and card-format SSDs in the market. The drive-format types typically come in 2.5″ and 3.5″ profiles, and they tend to fit into enterprise storage systems as “disk drives”. They are known for providing capacity. On the other hand, there are also card-format SSDs that fit into a PCIe slot in host systems. These tend to address the performance requirements of systems and applications. The well-known PCIe vendors are Fusion-io, which is in the high-end performance market, and NetApp, which peddles the PAM (Performance Access Module) card in its filers. The PAM card has since been renamed FlashCache. Rumour has it that EMC will be coming out with a similar solution soon.

Another thing to note is that SSDs can be read-biased or write-biased. Most SSDs in the market tend to be read-biased, published with high read IOPS rather than write IOPS. Therefore, we have to be prudent and know what is out there. This means that some solutions, such as the NetApp FlashCache, are more suitable for read-heavy I/O than write-heavy I/O. FlashCache addresses a large segment of the enterprise market, because most applications are heavier on reads than writes.

SSDs have been positioned as the Tier 0 layer in the Automated Storage Tiering segment of enterprise storage. Vendors such as Dell Compellent, HP 3PAR and EMC (with FAST v2) position themselves with enhanced tiering techniques to automate LUN and sub-LUN tiering, and customers have been lapping up this feature like little puppies.

However, an up-and-coming segment for SSD usage is positioning SSDs as extended read or write caches to the existing memory of the systems. NetApp’s FlashCache is a PCIe solution that is basically an extended read cache. An interesting feature of Oracle Solaris ZFS, called the Hybrid Storage Pool, allows the creation of read and write caches using SSDs. The Sun fellas even came up with cool names – ReadZilla and LogZilla – for these Hybrid Storage Pool features.

Basically, I have poured out what I know about SSDs (so far), and I intend to learn more. SNIA (the Storage Networking Industry Association) has a Technical Working Group for Solid State Storage. I advise readers to check it out.


Have you heard about Silent Data Corruption (SDC)? It’s everywhere and yet in the storage networking world, you can hardly find a storage vendor talking about it.

I did a paper for MNCC (the Malaysian National Computer Confederation) a few years ago, and one of the examples I used was what was found at CERN. CERN, the European Center for Nuclear Research, published a paper in 2007 describing the issue of SDC. Later, in 2008, they found approximately 38,000 corrupted files in the 15,000TB of data they had generated. SDC is therefore very real, and yet to the people in the storage networking industry, where data matters most, it is one of the least talked-about issues.

What is Silent Data Corruption? Every computer component we use is NOT perfect. It could be the memory; it could be the network interface cards (NICs); it could be the hard disk; it could also be the bus, the file system, or the data block structure. Any computer component, whether hardware or software, that deals with bits of data is subject to the concern of SDC.

Data corruption happens all the time. It is when a bit or a set of bits is changed unintentionally due to various reasons. Some of the reasons are listed below:

Hardware errors

Data transfer noise

Electromagnetic Interference (EMI)

Firmware bugs

Software bugs

Poor electrical current distribution

Many more …

And that is why there are published statistics for some hardware components, such as memory, NICs and hard disks, and even for protocols such as Fibre Channel. These published statistics express the BER, or bit-error rate: the occurrence of an erroneous bit in every billion or trillion bits transferred or processed.

And it is also why there are inherent mechanisms within these channels to detect data corruption. We see them all the time in things such as checksums (CRC32, SHA1, MD5 …), parity and ECC (error correction code). Because we can detect them, we see errors and warnings about their existence.
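To see one of those mechanisms in action: flip a single bit and the CRC32 checksum no longer matches. Silent corruption is precisely the case where no such check sits along the path, so the flip goes unnoticed.

```python
# A checksum catching (non-silent) corruption: a one-bit flip changes
# the CRC32, so the error is detected rather than silently accepted.
import zlib

data = bytearray(b"storage networking data block")
checksum = zlib.crc32(data)       # checksum of the good data

data[0] ^= 0x01                   # a single-bit flip, e.g. from EMI
assert zlib.crc32(data) != checksum   # the CRC detects the change

data[0] ^= 0x01                   # repair: flip the bit back
print(zlib.crc32(data) == checksum)   # prints True
```

Where no checksum is computed and compared — or where the corruption happens after the checksum is verified — the flipped bit sails through, and that is the "silent" in silent data corruption.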

However, SILENT data corruption does not appear as errors and warnings, and it does OCCUR! And this problem is getting more and more prevalent in modern-day disk drives, especially solid state drives (SSDs). As manufacturers come out with more compact, higher-capacity, higher-performance drives, the cell geometry of SSDs is becoming smaller and smaller. This means each cell has a smaller area in which to contain the electrical charge and maintain the bit value, either a 0 or a 1. At the same time, the smaller cell is more sensitive and susceptible to noise, electrical charge leakage and interference from nearby cells, as some SSDs have different power modes to address green requirements.

When such things happen, a 0 can look like a 1 or vice versa, and if the error is undetected, this becomes silent data corruption.

Most common storage networking technologies, such as RAID and file systems, were introduced in the ’80s and ’90s, when disks were 9GB or 18GB and FastEthernet was the standard for networking. Things have changed at a very fast pace, and data growth has been phenomenal. We need to look at storage vendors’ technology more objectively now, and get more in-depth about issues such as SDC.

SDC is very real, but until and unless we learn and equip ourselves with the knowledge, don’t just take things from vendors verbatim. Find out … and be in control of what you are putting into your IT environment.


I was chatting with a friend yesterday, and we were discussing virtualization and cloud, the biggest things happening in the IT industry right now. We were talking about the arrival of VMware vSphere 5 and the cool stuff VMware is bringing into the game, pushing the technology juggernaut farther and farther ahead of its rivals Hyper-V, Xen and VirtualBox.

And in the technology section of the newspaper yesterday, I saw news of the Jaring OneCloud offering, and one of the local IT players has just brought in Joyent. Fantastic stuff! But for us in IT, we have been inundated with cloud, cloud and more cloud. The hype, the fuss and the reality. It’s all there, but back to our conversation. We realized that virtualization and cloud aren’t much without storage, the cornerstone of both. And in the storage networking layer, there are the data management piece, the information infrastructure piece and so on, and yet … why are there so few storage networking professionals out there in our IT scene?

I have been lamenting this for a long time, because we have been facing this problem for a long time: a shortage of qualified and experienced storage networking professionals. There are plenty of jobs out there, but not enough resources to meet the demand. As SNIA Malaysia Chairman, it is my duty to work with my committee members from HP, IBM, EMC, NetApp, Symantec and Cisco to create the awareness and, more importantly, the passion to bring the voices of the local IT storage networking professionals together. It has been challenging, but my advice to all those people out there is: “Why be ordinary when you can become extra-ordinary?”

We have to make others realize that storage networking is what makes virtualization and cloud happen. Join us at SNIA Malaysia and be part of something extra-ordinary. Storage networking IS the foundation of virtualization and cloud. You can’t exclude it.


As far as how next-generation storage networks will look, 10Gigabit Ethernet (10GbE) is definitely the strongest candidate for the storage network. This is made possible by key enhancements to Ethernet that allow for greater reliability and performance. These enhancements go by several names, such as Data Center Ethernet (a term coined by Cisco) and Converged Enhanced Ethernet (CEE), but probably the more widely used term is DCB, or Data Center Bridging.

Ethernet, so far, has never failed to deliver, and as far as I am concerned, Ethernet will rule for the next 10 years or more. Ethernet has evolved over several generations, from Ethernet running at 10Mbit/s to FastEthernet, then Gigabit Ethernet and now 10Gigabit Ethernet. Pretty soon, it will be looking at 40Gbit/s and 100Gbit/s. It is a tremendous protocol, able to evolve and adapt to modern data networks.

Before 10GbE, packet delivery was on a best-effort basis, but today’s networks demand scalability, security, performance and, most of all, reliability. Since the advent of DCB, 10GbE has been fortified with these key technologies:

iWARP – Support for iWARP is crucial for RDMA (Remote Direct Memory Access). RDMA, in a nutshell, reduces the overhead of the typical networking buffer-to-buffer copy by bypassing these bottlenecks and placing the data blocks, and their bits and bytes, directly into the access points of the corresponding requesting node.

Low-latency cut-through switching at Layer 2, by reading just the header of the packet instead of its full length. The information contained in the header is sufficient to make a switching/forwarding decision.

Energy efficiency, by introducing a low-power idle state and other implementations which make power consumption more proportional to the network utilization rate.

Shortest-path adaptive routing protocols for Ethernet forwarding. TRILL (Transparent Interconnection of Lots of Links) is one of the implementations. Lately, OpenFlow has been jumping on the bandwagon as a viable option, but I need to check out OpenFlow support with 10GbE and DCB.

FCoE (Fibre Channel over Ethernet) is all the rage these days, and 10GbE has the ability to carry Fibre Channel traffic. This has sparked an initial frenzy among storage vendors.

Of course, last but not least, we are already seeing the sunset of Fibre Channel. While 8Gbps FC has been out for a while, its adoption rate seems to have stalled. Many vendors and customers are at the 4Gbps range, playing a wait-and-see game. 16Gbps FC has been talked about, but it seems that all the fireworks are with 10Gigabit Ethernet right now. It will rule …


What do you think of Dell acquiring Force10? My first reaction was surprise, very surprised.

I was in the middle of a conversation with a friend when the RSS feed popped up in front of me – “Dell acquiring Force10”! I cut that conversation short to read the rest of the details … wow, that’s a good buy!

With all the rumors flying around that Brocade was the most obvious choice, Force10 was out of the blue for me. As the euphoria settled down, I thought Dell had made a very smart move. Brocade, unfortunately, is still pretty much a Fibre Channel company, with 75% of its business relying heavily on Fibre Channel and FCoE. Even though Brocade has Foundry now, it has not strongly asserted itself as a front runner and innovator in 10Gigabit Ethernet.

Meanwhile, Force10 has been an up-and-coming force (pun intended) to be reckoned with, strengthening its position as a 10GbE player in the market. And with 10GbE now, and 40GbE or 100GbE coming in the next 2-3 years, Force10 will be riding the wave of the future. Dell can only benefit from that momentum.

Dell has been very, very aggressive in pushing itself into the enterprise storage space. From its acquisition of EqualLogic in 2007, to Exanet, Ocarina and Compellent last year, there is no doubt that Dell wants this space badly.

The first challenge for Dell is to put its story together and convince the customers that they are no longer Dell, the PC/laptop direct seller, but a formidable company capable of providing enterprise solutions, services and support.

The second challenge, an even bigger one, is Dell itself: its culture and its mindset must change. The game has changed; the rules have changed. The enterprise is a totally different ballgame. Is Dell ready? Is Dell ready to change itself?


Before you start thinking that I am ripping off Lady Gaga, this blog’s name, “Storage Gaga”, is NOT from Lady Gaga. It’s from Queen’s song “Radio Ga Ga”, which I happened to be listening to in my car.

Why Ga Ga? “Gaga”, in the Free Dictionary (link: http://www.thefreedictionary.com/gaga), means crazy over something (at least one of its meanings, anyway). That’s what I am. Since leaving my last job – which was on Tuesday (July 19th 2011) this week – I want to do more for storage networking and data management. I want to share the things I find out, the information I have learned, and so on.

So watch this space for more info … more on the way.

p/s. This rainy morning, I am going to arrange and organize all my computer books. It’s going to be fun!