Yet Another AoE vs. iSCSI Opinion (YAAVIO)

As if there wasn’t already enough discussion surrounding AoE vs. iSCSI in mailing lists, forums and blogs, I am going to add more baseless opinion to the existing overwhelming heap of information on the subject. I’m sure this will be lost in the noise but after having implemented AoE with CORAID devices and iSCSI with an IBM (well, LSI) device and iSCSI with software targets in the past I feel I finally have something share.

This isn’t a technical analysis. I’m not dissecting the protocols nor am I suggesting implementation of either protocol for your project. What I am doing is sharing some of my experiences and observations simply because I can. Read on, brave souls.

Background

My experiences with AoE and iSCSI are limited to fairly small implementations by most standards. Multi-terabyte and mostly file serving with a little bit of database thrown in there for good measure. The reasoning behind all the AoE and iSCSI implementations I’ve setup is basically to detach storage from physical servers to achieve:

Independently managed storage that can grow without pain

High availability services front-end (multiple servers connecting to the same storage device(s))

There are plenty of other uses for these technologies (and other technologies that may satisfy these requirement), but that’s where I draw my experiences from. I’ve not deployed iSCSI or AoE for virtual infrastructure which does seem to be a pretty hot topic these days, so if that’s what you’re doing, your mileage will vary.

Performance

Yeah, yeah, yeah, everyone wants the performance numbers. Well, I don’t have them. You can find people comparing AoE and iSCSI performance elsewhere (even if many of the tests are flawed). Any performance numbers I may accidentally provide while typing this up in a mad frenzy are entirely subjective and circumstantial… I may not even end up providing any! Do you own testing, it’s the only way you’ll ever be sure.

The Argument For or Against

I don’t really want to be trying to convince anyone to use a certain technology here. However, I will say it: I lean towards AoE for the types of implementations I mentioned above. Why? One reason: SIMPLICITY. Remember the old KISS adage? Well, kiss me AoE because you’ve got the goods!

iSCSI can do a lot of things, in a lot of different situations. iSCSI is routable in layer 3 by nature. AoE is not. iSCSI has a behemoth sized load of options and settings that can be tweaked for any particular implementation needs. iSCSI has big vendor backing in both the target and the initiator markets. Need to export an iSCSI device across a WAN link? Sure, you can do it, never mind that the performance might be less than optimal but the point is it’s not terribly involved or “special” to route iSCSI over a WAN because iSCSI is designed from the get-go to run over the Internet. While AoE over a WAN has been demonstrated with GRE, it’s not inherent to the design of AoE and never will be.

So what does AoE have that iSCSI doesn’t? Simplicity and less overhead. AoE doesn’t have myriad of configuration options to get wrong, it’s really so straight forward that it’s hard to get it wrong. iSCSi is easy to get wrong. Tune your HBA firmware settings or software initiator incorrectly (and the factory defaults can easily be “wrong” for any particular implementation) and watch all hell be unleashed before your eyes. If you’ve ever looked at the firmware options provided to by QLogic in their HBAs and you’re not an iSCSI expert, you’ll know what I’m talking about.

Simplicity Example: Multipath I/O

A great example of AoE’s simplicity vs. iSCSI is when it comes to multipath I/O. Multipath I/O is defined as utilizing multiple paths to the same device/LUN/whatever to gain performance and/or redundancy. This is generally implemented with multiple HBAs or NICs on the initiator side and multiple target interfaces on the target side.

With iSCSI, every path to the same device provides the operating system with a separate device. In Linux, that’ll be /dev/sdd, /dev/sde, /dev/sdf, etc. A software layer (MPIO) is required to manage I/O across all the devices in an organized and sensible fashion.

While I’m a fairly big fan of the latest device-mapper-multipath MPIO layer in modern Linux variants, I find AoE’s multipath I/O method much, much better for the task of providing multiple paths to a storage device because it has incredibly low overhead to setup and manage. AoE’s implementation has the advantage that it doesn’t need to be everything to every storage subsystem, which fortunately or unfortunately device-mapper-multipath has to be.

The AoE Linux driver totally abstracts multiple paths in a way that iSCSI does not by handling all the multipath stuff internally. The host is only provided with a single device in /dev that is managed identically to any other non-multipath device. You don’t even need to configure the driver in any special way, just plug in the interfaces and go! That’s a long shot from what is necessary with MPIO layers and iSCSI.

There’s nothing wrong about device-mapper-multipath and it is quite flexible, but it certainly doesn’t have the simplicity of AoE’s multipath design.

Enterprise Support

Enterprise support is where iSCSI shines in this comparison. Show me a major storage vendor that doesn’t have at least one iSCSI device, even if they are just rebranded. Ok, maybe there are a few vendors out there without an iSCSI solution but for the most part all the big boys are flaunting some kind of iSCSI solution. NetApp, EMC, Dell, IBM, HDS and HP all have iSCSI solutions. On the other hand, AoE only has only a single visible company backing it at the commercial level: CORAID, a spin-off company started by Brantley Coile (yeah, the guy who invented the now-Cisco PIX and AoE). I’m starting to see some Asian manufacturers backing AoE on the hardware level but when it comes to your organization buying rack mount AoE compatible disk trays, CORAID is the only vendor I would suggest at this time.

This isn’t so fantastic for getting AoE into businesses but it’s a start. With AoE in the Linux kernel and Asian vendors packing AoE into chips things will likely pickup for AoE from an enterprise support point of view: It’s cheap, it’s simple and performance is good.

Conclusion

AoE rocks! iSCSI is pretty cool too, but I’ve certainly undergone much worse pain working with much more expensive iSCSI SAN devices vs the CORAID devices. And no performance benefit that I could realize with moderate to heavy file serving and light database workloads. I like AoE over iSCSI but there are plenty of reasons not to like it as well.