Storage Highlights of 2009

It's the end of the year and that means it's time to either make predictions for the coming year or review the highlights from the past year. This article takes a look at the cool things that happened around storage in the past year and perhaps hints at some things in the coming year.

EXOFS – Object File Systems Come to Linux

Up to this point you have probably been using block-based file systems, whether they use tree-type structures or log structures. Object-based file systems, while they have been under study for some time, are only recently becoming production oriented. In version 2.6.30 of the kernel, the first object-based file system, ExoFS, was added.

Object based file systems are a bit different in that the data is broken into chunks. Each chunk is assigned an object ID which is how the file system refers to the data. The drive itself hides the details of the storage of the data from the file system (in general). EXOFS sits on top of an OSD initiator by using the OSD T-10 standard SCSI command set (OSD = Object Storage Device).
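To make the chunk-and-object-ID idea concrete, here is a minimal Python sketch. It is a toy model only: the class names, the 4-byte chunk size, and the in-memory dictionary are all assumptions made for illustration, and none of it reflects how ExoFS or a real OSD is implemented.

```python
import itertools

CHUNK_SIZE = 4  # bytes per object; tiny on purpose so the example is readable


class ToyObjectDevice:
    """Hypothetical stand-in for an object storage device: it maps object
    IDs to data and keeps the details of the layout to itself."""

    def __init__(self):
        self._objects = {}
        self._next_id = itertools.count(1)

    def create(self, data: bytes) -> int:
        oid = next(self._next_id)
        self._objects[oid] = data
        return oid

    def read(self, oid: int) -> bytes:
        return self._objects[oid]


class ToyObjectFS:
    """File system layer: it only remembers which object IDs make up a file."""

    def __init__(self, device):
        self.device = device
        self.files = {}  # file name -> ordered list of object IDs

    def write_file(self, name: str, data: bytes) -> None:
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        self.files[name] = [self.device.create(chunk) for chunk in chunks]

    def read_file(self, name: str) -> bytes:
        return b"".join(self.device.read(oid) for oid in self.files[name])


fs = ToyObjectFS(ToyObjectDevice())
fs.write_file("hello.txt", b"object storage")
```

The point of the split is that `ToyObjectFS` never sees where the bytes live; it only holds object IDs and asks the device for them.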

ExoFS is still under heavy development and is being used as an underlying file system for pNFS. But it has reached a stage where you can test it and begin to learn more about object-based file systems (hint – they don’t look too different from tree-based file systems but under the covers they are really cool).

FS-Cache and Cache-FS

In version 2.6.30 of the kernel, a client-side caching system for network file systems such as AFS and NFS was added. FS-Cache is the interface between the file system and the cache, allowing the file system to be cache agnostic. CacheFS itself is the caching back end for FS-Cache.

Using FS-Cache and Cache-FS with something like NFS can be effective in some cases. As described in an earlier article, the key requirement is that the data is already cached on the client. For example, the data has to be created on the client or copied to the client, which puts it into the client cache. Then if the data is accessed again it is read from the cache rather than from the server.
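The behavior described above is essentially a read-through cache. The sketch below shows the idea in Python under some loud assumptions: `ToyServer` and `CachingClient` are made-up names, the cache is a plain dictionary, and cache coherency (checking whether the server copy changed) is ignored entirely. It is not the FS-Cache API.

```python
class ToyServer:
    """Pretend file server that counts how many reads actually reach it."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]


class CachingClient:
    """Hypothetical client that fills a local cache on first access."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:      # cache miss: go to the server
            self.cache[path] = self.server.read(path)
        return self.cache[path]         # served from the local copy


server = ToyServer({"/data/results.dat": b"42"})
client = CachingClient(server)
first = client.read("/data/results.dat")    # goes over the "network"
second = client.read("/data/results.dat")   # comes from the client cache
```

Only the first read touches the server, which is why the cache helps only when the same data is accessed more than once on the same client.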

POHMELFS

Distributed file systems are pretty much the standard today. Putting your storage in a single location and then sharing it over a network with other servers or clients is basically the norm. NFS and CIFS are the most common ways to share data and most operating systems support them (thanks to *nix and Samba) but even these protocols have limitations.

In version 2.6.30 of the kernel (aka “chock-full-of-filesystems”), the long-time kernel hacker Evgeniy Polyakov contributed a new distributed file system called POHMELFS. It is a bit different from NFS and has some unique features:

One of the most important attributes is the ability to write to multiple servers and balance reading between multiple servers.

POHMELFS has a local coherent cache for data and metadata (this basically adds some of the features of FS-Cache and CacheFS to the network file system.)

POHMELFS is designed to have a flexible object architecture that is optimized for network processing. Network processing is a potential weak point for distributed file systems since file systems can be “chatty” and create many small messages that are not always optimal for networks. The design of the object architecture allows for very long paths to the objects and the ability to remove arbitrary size directories with a single network command.

The server portion is multi-threaded and scalable and, perhaps more importantly, is in user space. There is only a driver for POHMELFS in the kernel. The client and the server are all in user-space and interact with the driver.

It has the ability for the clients to dynamically add or remove servers from a working set.

POHMELFS is also designed for strong authentication with the possibility of data encryption in the network channel.

It has extended attribute support.

It can do read-only mounts and also has the ability to limit the maximum size of the exported directory.
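The first feature in the list, writing to multiple servers and balancing reads between them, can be sketched in a few lines of Python. This is only an illustration: the class name, the dictionaries standing in for servers, and the simple round-robin read policy are my assumptions, not a description of how POHMELFS actually chooses a server.

```python
import itertools


class ToyReplicatedClient:
    """Hypothetical client: every write goes to all servers, and reads
    rotate across them so no single server handles every read."""

    def __init__(self, server_names):
        self.servers = {name: {} for name in server_names}
        self._rotation = itertools.cycle(list(server_names))
        self.read_log = []  # which server answered each read

    def write(self, path, data):
        for store in self.servers.values():
            store[path] = data

    def read(self, path):
        name = next(self._rotation)
        self.read_log.append(name)
        return self.servers[name][path]


client = ToyReplicatedClient(["s1", "s2", "s3"])
client.write("/projects/notes.txt", b"hello")
results = [client.read("/projects/notes.txt") for _ in range(4)]
```

Because every server holds a copy, any of them can answer a read, and that is what makes the balancing possible in the first place.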

However, be warned that POHMELFS is very much a work in progress. It is in the staging area of kernel drivers and the developer has announced that POHMELFS will be ported to use a distributed hash table. But it is definitely something to watch.

Rise in the Popularity of SSD’s

Solid State Drives (SSDs) are probably the overall hottest technology in storage. You can find them in many laptops and netbooks and there are a huge number of SSDs available for desktops and servers (just take a look at a site such as newegg). Of course they range in price from under a hundred dollars (such as the new Kingston SSDNow V 40GB MLC SSD which, for a while, was under $100 after rebate) to the PCIe Fusion-IO 640GB ioDrive which is about $19,200.

However, while SSD’s sound like the best thing since sliced bread (or perhaps bread itself), there are some limitations to them as explained in a recent article. Fundamentally, SSD’s have limitations that stem from the technology and how the chips that make up the drive are constructed. In particular,

NAND Flash cells have a limited number of erase/program cycles before they can no longer retain data

NAND Flash cells can read a byte at a time or read/write a page at a time, but an entire block must be erased if one cell in the block is erased

Seek times for NAND Flash chips are much lower than those of hard drives
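The erase-block limitation is easier to appreciate with a toy model. The Python sketch below assumes a block of only 4 pages and a made-up `ToyFlashBlock` class; real drives have far larger blocks and controllers that avoid rewriting in place, so treat this as an illustration of the cost rather than of how any drive behaves.

```python
PAGES_PER_BLOCK = 4  # assumption for the example; real blocks hold many more pages


class ToyFlashBlock:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means "erased"
        self.erase_count = 0
        self.pages_programmed = 0

    def program(self, index, data):
        if self.pages[index] is not None:
            raise ValueError("page must be erased before it is programmed again")
        self.pages[index] = data
        self.pages_programmed += 1

    def erase(self):
        # Erasing works only on the whole block, never on a single page.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

    def rewrite_in_place(self, index, data):
        # Naive update: save everything, erase the block, program it all back.
        saved = list(self.pages)
        saved[index] = data
        self.erase()
        for i, page in enumerate(saved):
            if page is not None:
                self.program(i, page)


block = ToyFlashBlock()
for i in range(PAGES_PER_BLOCK):
    block.program(i, f"v1-page{i}")
programmed_before = block.pages_programmed
block.rewrite_in_place(2, "v2-page2")
extra_programs = block.pages_programmed - programmed_before
```

Changing one page cost one block erase plus a program operation for every page in the block, and each of those erases counts against the limited erase/program cycles mentioned above.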

To overcome some of the limitations drive manufacturers have developed some adaptations.

Wear Leveling (techniques for ensuring that the NAND cells have approximately the same number of rewrite cycles).

Over-provisioning (reserving some space for the drive’s use – i.e. not accessible to the user)

Write Amplification Avoidance

Internal RAID

Data Coalescence

TRIM command (helping rewrite performance)
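Of these adaptations, wear leveling is the simplest to sketch. The Python below assumes the most basic possible policy, always rewriting into the block with the fewest erases so far; the `ToyWearLeveler` name and the policy are assumptions for illustration, and real controllers use more elaborate schemes.

```python
class ToyWearLeveler:
    """Hypothetical wear leveler that tracks an erase count per block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        # Index of the least-worn block; ties go to the lowest index.
        return min(range(len(self.erase_counts)),
                   key=self.erase_counts.__getitem__)

    def rewrite(self):
        block = self.pick_block()
        self.erase_counts[block] += 1
        return block


leveler = ToyWearLeveler(num_blocks=4)
chosen = [leveler.rewrite() for _ in range(10)]
spread = max(leveler.erase_counts) - min(leveler.erase_counts)
```

With this policy the erase counts of any two blocks never differ by more than one, which is the "approximately the same number of rewrite cycles" goal in miniature.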

File systems are still being “tweaked” to work better with current SSD’s. While some people think that SSD’s will evolve to behave more like hard drives so file systems don’t need to be “tweaked”, for the next period of time file systems will have to be tweaked to take advantage of the unique features of SSD’s. Good examples of file systems with SSD adaptations are NILFS2 and Btrfs.

While I don’t like to make predictions, 2010 could be a breakout year for SSD’s (given my track record for predictions, now that I’ve said it, it won’t come true; I just hope I don’t get sued).

Cloud Storage on the Rise

The latest Gartner hype curve has Cloud Computing at the peak of “inflated expectations.” However, it doesn’t differentiate between cloud computing itself and cloud storage (you have to pay for that level of differentiation). Needless to say, cloud storage is probably in a similar location on the hype curve but the prospect of using Cloud Storage for real storage solutions is on the rise (even if it is a little over-hyped).

Cloud Storage has many other names such as Content Addressable Storage (CAS), On-Line Archiving, and even the less common Unstructured Data Storage, but the premise is basically the same regardless of the name. The concept is that you put your data on Cloud Storage along with some basic policies and then allow the system to manage your data based on those policies while guaranteeing some sort of data protection. Common commercial examples of cloud storage are Caringo and Parascale. These solutions rely on the use of replication rather than just RAID for data resiliency (i.e. the ability to tolerate errors) since they can easily consist of many petabytes in a single name space.

No solution is perfect and cloud storage has its challenges but ideally Cloud Storage has the following attributes:

Ease of Management

Self-replicating

Self-healing

Self-balancing

Some of the challenges they face include:

Security (always an issue and not necessarily a cloud storage specific issue)

Data integrity (making sure the stored data is “correct”)

Power (since you have copies you will have extra storage which adds power)

Replication time and costs (how fast can you replicate data since this can be important to data resiliency)

Cost (how much extra money do you have to pay to buy the extra storage for copies)

Reliability
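The cost and power items above both come down to the same arithmetic: keeping copies multiplies the raw capacity you have to buy and run. A small Python sketch, where the function names, the 100 TB data set, the 3 copies, and the price per terabyte are all made-up inputs for illustration and not real figures:

```python
def raw_capacity_tb(usable_tb, replicas):
    """Raw capacity needed when every byte is stored `replicas` times."""
    return usable_tb * replicas


def extra_cost(usable_tb, replicas, cost_per_tb):
    """Money spent on capacity beyond the single original copy."""
    extra_tb = raw_capacity_tb(usable_tb, replicas) - usable_tb
    return extra_tb * cost_per_tb


usable = 100  # hypothetical data set size in TB
raw = raw_capacity_tb(usable, replicas=3)
extra = extra_cost(usable, replicas=3, cost_per_tb=100.0)  # hypothetical price
```

Three copies of 100 TB means buying 300 TB of raw storage, so two thirds of the purchase (and of the power draw) goes to the copies.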

Cloud Storage is definitely an option for backups of your home systems if you have a reasonable network to/from your house. Using it in an enterprise situation requires careful examination of the risks and payoffs including the option of a private cloud storage system. But the potential is there if you understand the issues. 2009 saw a great deal of growth in this area – let’s just hope that hype doesn’t slow it down too much.

10GigE is Coming Rapidly

Why talk about a network technology in a storage column? The answer is fairly evident – networks are an integral part of almost all enterprise class storage systems. Fibre Channel (FC) networks as part of Storage Area Networks (SANs) are very popular and have very good performance and great management and monitoring tools but their evolution in terms of performance and even price/performance has not been on par with Ethernet. However, up until 10GigE, FC has always been faster, so despite the lower price of GigE, many people have not used it for SANs.

10GigE has been out for a few years providing similar performance to FC but its price has always been much higher than that of FC. However, during 2009, the prices of 10GigE NICs and, perhaps more importantly, larger port count 10GigE switches have reached a point where 10GigE can be practically deployed on a per-server basis. Companies such as Arista have brought the per port price of 10GigE switches down to a level that is very competitive with FC. Does this mean that FC will disappear? Not likely since FC is widely deployed and the cost for a wholesale switchover can be prohibitive. But as new servers are deployed 10GigE is a very attractive option since it has a price and price/performance advantage, in many cases, over FC. In addition you can also combine the storage traffic with normal communication traffic into a single network connection, saving money.

One Prediction for 2010

While I said that I would not make any predictions for 2010 since this is really a look back at 2009, I think I will go out on a limb and make one prediction for 2010. My prediction is that people everywhere will rise up and rename the iSCSI nodes “client” and “server” as God intended them to be instead of silly names such as “target” and “initiator” that no one except those who are fond of the arcane origins of names is apt to use. I mean really… “target” and “initiator”? Why not “bob” and “snorkle”?

So I am predicting that the forces of good and of light will overcome the oppressive forces of imperialism, darkness, and evil in 2010 to create an easy to understand iSCSI terminology.