What To Do When Your “Core” Infrastructure Services Aren’t In Your “Core”?

Okay. I am teh lam3r. I'd be intellectually dishonest if I didn't post this, and it's likely I'll revise it once I get to think about it more, but I've got to get it down. Thanks to an innocent tweet from @botchagalupe I had an aneurysm (er, epiphany). Sort of.

A little light went on in my head this morning regarding how the cloud, or more specifically layers of clouds and the functions they provide (à la SOA), dramatically affect the changing landscape of what we consider "core infrastructure services," our choices on architecture and service provisioning, and how and from where those services are provided.

Specifically, the synapse fired on the connection between two views of Infrastructure 2.0: the one usually talked about, where the enterprise evolves from the inside out, and the deployment of services constructed from scratch to play in the cloud.

You've no doubt seen discussions from Greg Ness (InfoBlox) and Lori MacVittie (f5) regarding their interpretation of Infrastructure 2.0 and the notion that by decoupling infrastructure services from their physical affinity we can actually "…enable greater levels of integration between the disparate layers of infrastructure: network, application, the endpoint, and IP address management, necessary to achieve interconnectedness."

Totally agree. Been there, done that, bought the T-Shirt, but something wasn't clicking as it relates to what this means relative to cloud.

I was slurping down some java this morning and three things popped into my head as I was flipping between Twitter and Google Reader wondering about how I might consider launching a cloud-based service architecture and what impact it would have on my choices for infrastructure and providers.

Here are the three things that I started to think about with regard to what "infrastructure 2.0" might mean to me in this process, beyond the normal criteria related to management, security, scalability, etc.

1. I always looked at these discussions of Infrastructure 2.0 as ideation/marketing by vendors on how to take products that used to function in the "Infrastructure 1.0" dominion, add a service control plane/channel and adapt them for the inside-out version of the new world order that is cloud. This is the same sort of thing we've dealt with for decades and was highlighted when one day we all discovered the Internet and had to connect to it (although in that case we had standards!)

2. Clouds are often discussed either in a microcosmic vacuum or in lofty, fluffy immensity, and it makes it hard to see the stratosphere for the cirrocumulus. Our "non-cloud" internal enterprises today are conglomerates of technology integration with pockets of core services which provide the underpinnings for much of what keeps the machinery running. Cloud computing is similar in approach, but in this regard it brings home again the point that there is no such thing as "THE Cloud." Rather, the overarching integration challenge lies in the notion of overlays or mash-ups of multiple clouds, their functions, and their associated platforms and APIs.

3. Further, and as to my last blog post on private clouds and location independence, I really do believe that the notion of internal versus external clouds is moot, but that the definitional nuance of public versus private clouds (and their requisite control requirements) is quite important. Where, why, how and by whom services are provided becomes challenging because the distinction between inside and out can be really, really fuzzy, even more so if you're entirely cloud based in the first place.

For some reason, my thinking never really coalesced around what relevance these three points have to the delivery of a service (and thus layers of applications) in a purely cloud-based architecture built from scratch without the encumbrance of legacy infrastructure solutions.

I found this awesome blog post from Mike Brittain via a tweet from @botchagalupe titled "How we built a web hosting infrastructure on EC2" and even though the article is a fascinating read, the single diagram in the post hit me like a hammer in the head…and I don't know why it did, because it's not THAT profound, but it jiggled something loose that is probably obvious to everyone else already:

Do you see the first three layers? Besides the "Internet" as the transport, you'll see two of the most important service delivery functions staring back at you: Akamai's "Site Accelerator Proxy" CDN/Caching/Optimization offering and Neustar's "UltraDNS" distributed, topologically intelligent DNS services.

The reason the light bulb went on for me is that I found that I was still caught in the old school infrastructure-as-a-box line of thought when it came to how I might provide the CDN/Caching and distributed DNS capabilities of my imaginary service.

It's likely I would have dropped right to the weeds and started thinking about which geographic load balancers (boxes) and/or proxies I might deploy somewhere and how (or if) they might integrate with the cloud "hosting/platform provider" to give me the resiliency and dynamic capabilities I wanted, let alone firewalls, IDP, etc.

Do I pick a provider that offers as part of the infrastructure a specific hardware-based load-balancing platform? Do I pick one that can accommodate the integration of software-based virtual appliances? Should I care? With the cloud I'm not supposed to, but I find that I still, for many reasons, good and bad, do.

I never really thought about simply using a cloud-based service as a component in a mash-up of services that already does these things in ways that would be much cheaper, simpler, more resilient and more scalable than I could construct with "infrastructure 1.0" thinking. Heck, I could pick 2 or 3 of them, perhaps.

That being said, I've used outsourced "cloud-based" email filtering, vulnerability management, intrusion detection & prevention services, etc., but there are still some functions that for some reason appear to be sacrosanct in the recesses of my mind.

I think I always just assumed that the stacking of outsourced (commoditized) services across multiple providers would be too complex, but in reality it's not very different from my internal enterprise, which has taken decades to mature many of these functions (and consolidate them).

Despite the relative immaturity of the cloud, it's instantly benefited from this evolution. Now, we're not quite all the way there yet. We still are lacking standards and that service control plane shared amongst service layers doesn't really exist.

I think it's a huge step to recognize that it's time to get over the bias of applying so-called "infrastructure 1.0" requirements to the rules of engagement in the cloud by recognizing that many of these capabilities don't exist in the enterprise, either.

Now, it's highly likely that the two players above (Neustar and Akamai) use the same boxes that *I* might have chosen anyway, but it's irrelevant. It's all about the service and engineering enough resiliency into the design (and choices of providers) such that I mitigate the risk of perhaps not having that "best of breed" name plate on a fancy set of equipment somewhere.

I can't believe the trap I fell into in terms of my first knee-jerk reaction regarding architecture, especially since I've spent so much of the last 5 years helping architect and implement "cloud" or "cloud-like" security services for outsourced capabilities.

So anyway, you're probably sitting here saying "hey, idiot, this is rather obvious and is the entire underlying premise of this cloud thing you supposedly understand inside and out." That comment would be well deserved, but I had to be honest and tell you that it never really clicked until I saw this fantastic example from Mike.

I'm glad to see you've reached this level of understanding. It was frustrating having to separate out the wheat of your insightful security commentary from the chaff of the rigid approach that you were taking.
Now if you'll start to evaluate the actual level of assurance (and level of successful industry adoption) provided by some of the network security products that you believe are indispensable to nearly all service architectures, you will reach the next level of Zen.

I imagine that Lori will jump in with a similar slant to this. As good as Akamai and UltraDNS are, they are not in the same league as having a traffic manager sitting in front of the web servers.
For example, Mike speaks of adding further web servers manually, and it only taking 20 minutes. This is the sort of task that should be automated and triggered by increased traffic. Then, as the load falls off again, the servers need to be removed, taking advantage of the billing model EC2 uses. This is the type of solution that can be coded in a traffic manager easily.
Otherwise I like the model and think it is a great starting point.
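
To make the idea of traffic-triggered scaling concrete, here is a minimal sketch of the decision logic only. It is not how any particular traffic manager or EC2 does it; the function names, the per-server capacity figure and the min/max bounds are all assumptions made up for illustration, and the actual launching or terminating of instances is left out.

```python
import math


def desired_server_count(requests_per_sec, capacity_per_server,
                         min_servers=2, max_servers=20):
    """Return how many web servers the current load calls for.

    capacity_per_server, min_servers and max_servers are hypothetical
    tuning values, not figures from Mike's post.
    """
    needed = math.ceil(requests_per_sec / capacity_per_server)
    # Clamp so we never drop below a redundant pair or run away on a spike.
    return max(min_servers, min(max_servers, needed))


def scaling_action(current_servers, requests_per_sec, capacity_per_server,
                   **bounds):
    """Decide whether to add servers, remove servers, or hold steady."""
    target = desired_server_count(requests_per_sec, capacity_per_server,
                                  **bounds)
    delta = target - current_servers
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    # Traffic climbs: two servers at an assumed 200 req/s each are not enough.
    print(scaling_action(2, 900, 200))
    # Traffic falls off: shed servers so we stop paying for idle hours.
    print(scaling_action(5, 100, 200))
```

A real controller would also want some damping (a cool-down period or averaged load) so that a brief spike doesn't add and remove servers in quick succession.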

"hey, idiot, this is rather obvious and is the entire underlying premise of this cloud thing you supposedly understand inside and out." ._.;
I'm interested in the opinions you revise based on your new understanding.

@Cory, well…sometimes it really is hard to see the forest for the trees and put on my architect hat without pre-pending (security) in front of it.
@Nick I think it's important to try and not get sucked into the best-of-breed vs. good enough argument because as it relates to where we are currently in regards to the evolution of cloud and the underlying premise of infrastructure abstraction, you don't necessarily get the choice of which (or any) traffic manager sitting in front of your web servers.
Now, folks like GoGrid not only provide "real traffic managers" but also make sure to brand them as f5…
So just to reiterate my point, I'm not arguing that this is a "better" solution than what we do in existing infrastructure/architecture, I'm just saying it opened my eyes as to how I better start thinking about getting from point A to point B.
/Hoff

@Anthony I think it's going to help me really lay out not only the problems…but also the solutions to some of them in more meaningful ways. I'm SO tired of simply harping on the problems and not properly seeing both sides of the coin so as to highlight potential solutions that are balanced between both camps.
We'll see.
/Hoff

I accidentally hit post before I was done writing.
Thank you for your intellectual honesty. Very refreshing. You could have just had the "ah-ha!" moment and carried on without letting on that you were missing the point.
The wonderful thing about cloud infrastructure is that we can separate business application from infrastructure application and obtain portability like we've never had. We could deploy in house on our own infrastructure, on a customer's infrastructure or provide the service using cloud infrastructure.
The concept of deploying to the cloud should ("should") make software architects and developers build more modular applications that are ready to scale out of the box. This requires thinking about cleaner implementations up front and not doing things "the hard way" as many of my peers are prone to do. The hard way includes tying yourself to a particular platform and not having automated configuration and deploy procedures.
A long-lived production environment probably has a huge number of slap-it-in bug fixes to help the software cope with faulty environmental assumptions. When we deploy to a cloud we don't have or want that luxury. We want transient instances that are entirely self-sufficient. New instance, deploy software, run, shutdown. No human necessary. No layers of crazy glue to keep the thing running. Apps designed for the cloud FTW.

Unfortunately, the GoGrid solution (which I have used and like) does not expose the best bits of the F5s. Just the simple load balancing, session persistence etc. Not iRules (as long-winded as they are), and they definitely will not be exposing the API on a piece of shared infrastructure.
It's not about best of breed, it's about functionality that people have become used to using to solve their application deployment problems. Why should the fact that the deployment is "somewhere" else mean they have to forgo using those features? They won't; they will take that functionality with them into the cloud, and deploy it the same way they do their web apps: software installed onto standard server builds.

@Nick: I understand what you mean about the limited functions exposed by GoGrid, but surely they could extend the offering with an abstracted capability without having to expose the entire API's feature set?
Regardless, while f5 is certainly a market leader in the space, they're not the only player that "…people are used to."
Further, we don't have feature parity in the cloud for many/most things we do in the enterprise and according to all the buzz, that doesn't seem to deter adoption.
In reality it's a demand-driven supply coupled with capability maturity…you can bet we'll see more feature rich enablement make its way to providers, whether they brand with certain infrastructure or not.
To your last point, you can't "take that functionality with them into the cloud" if it's physically impossible to do so. Again, if the point is to abstract the infrastructure, you can certainly expect like functionality, but it may be impossible to demand brand equity or exact functional duplication of infrastructure beyond OS.
This is what I was getting to in my post. If I were to select a cloud provider because they ran on f5 or Cisco (remember the "Powered by Cisco" program?) then have I really removed affinity to my hardware? Am I truly abstracted from the infrastructure?
Software installed onto standard server builds is NOT the same thing as software installed onto standard server builds connected to "standard" load balancers, connected to "standard" routers, connected to "standard" switches…unless of course you see the collision of convergence and grok the vision of Cisco's new blade server/network switch/virtualization offering…
I think this is drifting a little off topic, but I do appreciate the comments.
/Hoff

And going even further, this automated process doesn't necessarily need to deal with just one service provider: you could add web servers from each of several providers according to traffic characteristics and pricing models. Prices might be so dynamically affected by variable demand that they constantly float up and down, with a management system checking on the fly which provider in a pool is offering the best price and then getting the resources from that one.
Wow! Can we call that "dynamic cloud services procurement"?
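
As a rough illustration of what that management system's core decision might look like, here is a small sketch that fills a request for servers from the cheapest quotes first. Everything in it is hypothetical: the `Quote` structure, the provider names and the prices are invented for the example, and a real system would have to fetch live prices and account for things like data transfer costs and latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """A point-in-time offer from one provider (all fields hypothetical)."""
    provider: str
    price_per_hour: float
    available: int  # how many servers this provider can supply right now


def allocate(quotes, servers_needed):
    """Greedily take servers from the cheapest quotes until the need is met.

    Returns a dict mapping provider name to the number of servers to request.
    Raises ValueError if the pool as a whole cannot cover the need.
    """
    plan = {}
    remaining = servers_needed
    for quote in sorted(quotes, key=lambda q: q.price_per_hour):
        if remaining == 0:
            break
        take = min(quote.available, remaining)
        if take > 0:
            plan[quote.provider] = take
            remaining -= take
    if remaining > 0:
        raise ValueError(f"pool is short by {remaining} servers")
    return plan


if __name__ == "__main__":
    pool = [
        Quote("provider-a", 0.12, 3),
        Quote("provider-b", 0.10, 2),
        Quote("provider-c", 0.15, 10),
    ]
    # Cheapest provider is exhausted first, then the next cheapest.
    print(allocate(pool, 4))
```

Re-running the allocation whenever prices move is what would make the procurement "dynamic."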

Glad you enjoyed the post and that you found it helpful. It was definitely a first-round attempt at our cloud infrastructure.
We took a few shortcuts (esp. RRDNS) in the first shot, knowing that load balancing could be slotted in later. At the time, a load balancing service seemed like an obvious service for Amazon to later add. They have announced that they will indeed be adding that service soon, and I'm looking forward to trying it out.
In the meantime, the distribution of connections coming from Akamai's CDN nodes was enough to spread load very evenly across our web servers. I'm running a similar project using Panther CDN now and am seeing similar results: RRDNS is "good enough" for my needs. I've looked at proxies like HAProxy that could be run in place of RRDNS, but at the expense of two more (redundant) EC2 instances. When Amazon's load balancing service becomes available, I'll look at that, too.
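
For anyone unfamiliar with why plain round robin can be "good enough," here is a toy sketch of the idealized behavior: each new connection goes to the next server in rotation. The server names are made up, and this deliberately ignores what makes real RRDNS messier (resolver caching, TTLs, clients that pin to one address), so treat it as an illustration of the principle rather than a model of DNS.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(servers, connection_count):
    """Hand out connections to servers in strict rotation and tally them."""
    rotation = cycle(servers)
    return Counter(next(rotation) for _ in range(connection_count))


if __name__ == "__main__":
    # Hypothetical pool of three web servers receiving ten connections.
    tally = round_robin_assign(["web1", "web2", "web3"], 10)
    for server, count in sorted(tally.items()):
        print(server, count)
```

With many independent sources of connections (such as a CDN's edge nodes), the counts per server stay within one of each other in this idealized model, which is the even spread described above.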