Archive

The premise of my Commode Computing presentation was to reinforce that we desperately require automation in all aspects of “security” and should work toward leveraging APIs in stacks and products to enable not only control but also audit and compliance across physical and virtualized solutions.

There are numerous efforts underway that underscore both this need and the industry’s response to such. Platform providers (virtualization and cloud) are leading this charge given that much of their stacks rely upon automation to function and the ecosystem of third party solutions which provide value are following suit, also.

Most of the work exists around ensuring that the latest virtualized versions of products/solutions are API-enabled while the CLI/GUI-focused configuration of older products rely in many cases still on legacy management consoles or intermediary automation and orchestration “middlemen” to automate.

Here’s a great example of how one might utilize (Perl) scripting and RESTful APIs against VMware’s vShield Edge solution to provision, orchestrate and even audit firewall policies using their API. It’s a fantastic write-up from Richard Park of SourceFire (h/t to Davi Ottenheimer for the pointer):

Here is an overview of how to use perl code to work with VMware’s vShield API.

vShield App and Edge are two security products offered by VMware. vShield Edge has a broad range of functionality such as firewall, VPN, load balancing, NAT, and DHCP. vShield App is a NIC-level firewall for virtual machines.

We’ll focus today on how to use the API to programatically make firewall rule changes. Here are some of the things you can do with the API:

In my Commode Computing talk, I highlighted the need for security automation through the enablement of APIs. APIs are centric in architectural requirements for the provisioning, orchestration and (ultimately) security of cloud environments.

So there’s a “dark side” with the emergence of APIs as the prominent method by which one now interacts with stacks — and it’s highlighted in VMware’s vCloud Director Hardening Guide wherein beyond the normal de rigueur deployment of stateful packet filtering firewalls, the deployment of a Web Application Firewall is recommended.

Why? According to VMware’s hardening guide:

In summary, a WAF is an extremely valuable security solution because Web applications are too sophisticated for an IDS or IPS to protect. The simple fact that each Web application is unique makes it too complex for a static pattern-matching solution. A WAF is a unique security component because it has the capability to understand what characters are allowed within the context of the many pieces and parts of a Web page.

I don’t disagree that web applications/web services are complex. I further don’t disagree that protecting the web services and messaging buses that make up the majority of the exposed interfaces in vCloud Director don’t require sophisticated protection.

This, however, brings up an interesting skill-set challenge.

How many infrastructure security folks do you know that are experts in protecting, monitoring and managing MBeans, JMS/JMX messaging and APIs? More specifically, how many shops do you know that have WAFs deployed (in-line, actively protecting applications not passively monitoring) that did not in some way blow up every app they sit in front of as well as add potentially significant performance degradation due to SSL/TLS termination?

Whether you’re deploying vCloud or some other cloud stack (I just happen to be reading these docs at the moment,) the scope of exposed API interfaces ought to have you re-evaluating your teams’ skillsets when it comes to how you’re going to deal with the spotlight that’s now shining directly on the infrastructure stacks (hardware and software) their private and public clouds.

Many of us have had to get schooled on web services security with the emergence of SOA/Web Services application deployments. But that was at the application layer. Now it’s exposed at the “code as infrastructure” layer.

Think about it.

/Hoff

[Update 6/7/11 – Here are two really timely and interesting blog posts on the topic of RESTful APIs:

It’s often said that in order to make money, you have to spend money; invest in order to succeed.

However, when faced with the realities of budget shortfalls and unsavory economic pressure, it seems the easiest thing to do is simply hunt around for top-line blips on the budget radar and kill them, regardless of the long term implications that ceasing investment in efficiency and transparency programs have in reducing bottom line pain.

Cutting these funds would also shut down the Fedspace federal social network and, notably, the FedRAMP cloud computing cybersecurity programs. Unsurprisingly, open government advocates in the Sunlight Foundation and the larger community have strongly opposed these cuts.

Did you catch the last paragraph? They’re kidding, right?

After demonstrable empirical data that shows how Vivek Kundra and his team’s plans for streamlining government IT is already saving money (and will continue to do so,) this is what we get? Slash and burn? I attempted to search for the investment made thus far in FedRAMP using the IT Dashboard, but couldn’t execute an appropriate search. Anyone know that answer?

Now, I’m not particularly fond of how the initial FedRAMP drafts turned out, but I’m optimistic that the program will evolve, will ultimately make a difference and lead to more assured, cost-efficient and low-friction adoption of Cloud Computing. We need FedRAMP to succeed and we need continued investment in it to do so. Let’s not throw the baby out with the Cloud water.

We literally cannot afford for FedRAMP (or these other transparency programs) to be cut — these are the very programs that will lead to long term efficiency and fiscally-responsible programs across the U.S. Government. They will ultimately make us more competitive.

Every day for the last week or so after their launch, I’ve been asked left and right about whether I’d spoken to CloudPassage and what my opinion was of their offering. In full disclosure, I spoke with them when they were in stealth almost a year ago and offered some guidance as well as the day before their launch last week.

Disappointing as it may be to some, this post isn’t really about my opinion of CloudPassage directly; it is, however, the reaffirmation of the deployment & delivery models for the security solution that CloudPassage has employed. I’ll let you connect the dots…

Specifically, in public IaaS clouds where homogeneity of packaging, standardization of images and uniformity of configuration enables scale, security has lagged. This is mostly due to the fact that for a variety of reasons, security itself does not scale (well.)

In an environment where the underlying platform cannot be counted upon to provide “hooks” to integrate security capabilities in at the “network” level, all that’s left is what lies inside the VM packaging:

If we focus on the first item in that list, you’ll notice that generally to effect policy in the guest, you must have a footprint on said guest — however thin — to provide the hooks that are needed to either directly effect policy or redirect back to some engine that offloads this functionality. There’s a bit of marketing fluff associated with using the word “agentless” in many applications of this methodology today, but at some point, the endpoint needs some sort of “agent” to play*

So that’s where we are today. The abstraction offered by virtualized public IaaS cloud platforms is pushing us back to the guest-centric-based models of yesteryear.

This will bring challenges with scale, management, efficacy, policy convergence between physical and virtual and the overall API-driven telemetry driven by true cloud solutions.

Finally, since I used them for eyeballs, please do take a look at CloudPassage — their first (free) offerings are based upon leveraging small footprint Linux agents and a cloud-based SaaS “grid” to provide vulnerability management, and firewall/zoning in public cloud environments.

/Hoff

* There are exceptions to this rule depending upon *what* you’re trying to do, such as anti-malware offload via a hypervisor API, but this is not generally available to date in public cloud. This will, I hope, one day soon change.

There are generally three dichotomies of thought when it comes to the notion of how much security should fall to the provider of the virtualization or cloud stack versus that of the consumer of their services or a set of third parties:

The virtualization/cloud stack provider should provide a rich tapestry of robust security capabilities “baked in” to the platform itself, or

The virtualization/cloud stack provider should provide security-enabling hooks to enable an ecosystem of security vendors to provide the bulk of security (beyond isolation) to be “bolted on,” or

The virtualization/cloud stack provider should maximize the security of the underlying virtualization/cloud platform and focus on API security, isolation and availability of service only while pushing the bulk of security up into the higher-level programatic/application layers, or

So where are we today? How much security does the stack, itself, need to provide. The answer, however polarized, is somewhere in the murkiness dictated by the delivery models, deployment models, who owns what part of the real estate and the use cases of both the virtualization/cloud stack provider and ultimately the consumer.

I’ve had a really interesting series of debates with the likes of Simon Crosby (of Xen/Citrix fame) on this topic and we even had a great debate at RSA with Steve Herrod from VMware. These two “infrastructure” companies and their solutions typify the diametrically opposed first two approaches to answering this question while cloud providers who own their respective custom-rolled “stacks” at either end of IaaS and SaaS spectrums such as Amazon Web Services and Salesforce bringing up the third.

As with anything, this is about the tenuous balance of “security,” compliance, cost, core competence and maturity of solutions coupled with the sensitivity of the information that requires protection and the risk associated with the lopsided imbalance that occurs in the event of loss.

There’s no single best answer which explains why were have three very different approaches to what many, unfortunately, view as the same problem.

Today’s “baked in” security capabilities aren’t that altogether mature or differentiated, the hooks and APIs that allow for diversity and “defense in depth” provide for new and interesting ways to instantiate security, but also add to complexity, driving us back to an integration play. The third is looked upon as proprietary and limiting in terms of visibility and transparency and don’t solve problems such as application and information security any more than the other two do.

Will security get “better” as we move forward with virtualization and cloud computing. Certainly. Perhaps because of it, perhaps in spite of it.

One thing’s for sure, it’s going to be messy, despite what the marketing says.

DevOps — what it means and how it applies — is a fascinating topic that inspires all sorts of interesting reactions from people, polarized by their interpretation of what this term really means.

At CloudCamp Denver, adjacent to Gluecon, Aaron Peterson of OpsCode gave a lightning talk titled: “Operations as Code.” I’ve seen this presentation on-line before, but listened intently as Aaron presented. You can see John Willis’ version on Slideshare here. Adrian Cole (@adrianfcole) of jClouds fame (and now Opscode) and I had an awesome hour-long discussion afterwards that was the genesis for this post.

“Operations as Code” (I’ve seen it described also as “Infrastructure as Code”) is really a fantastically sexy and intriguing phrase. When boiled down, what I extract is that the DevOps “movement” is less about developers becoming operators, but rather the notion that developers can be part of the process whereby they help enable operations/operators to repeatably and with discipline, automate processes that are otherwise manual and prone to error.

[Ed: great feedback from Andrew Shafer: “DevOps isn’t so much about developers helping operations, it’s about operational concerns becoming more and more programmable, and operators becoming more and more comfortable and capable with that. Further, John Allspaw (@allspaw) added some great commentary below – talking about DevOps really being about tools + culture + communication. Adam Jacobs from Opscode *really* banged out a great set of observations in the comments also. All good perspective.]

Automate, automate, automate.

While I find the message of DevOps totally agreeable, it’s the messaging that causes me concern, not because of the groups it includes, but those that it leaves out. I find that the most provocative elements of the DevOps “manifesto” (sorry) are almost religious in nature. That’s to be expected as most great ideas are.

In many presentations promoting DevOps, developers are shown to have evolved in practice and methodology, but operators (of all kinds) are described as being stuck in the dark ages. DevOps evangelists paint a picture that compares and contrasts the Agile-based, reusable componentized, source-controlled, team-based and integrated approach of “evolved” development environments with that of loosely-scripted, poorly-automated, inefficient, individually-contributed, undisciplined, non-source-controlled operations.

You can see how this might be just a tad off-putting to some people.

In Aaron’s presentation, the most interesting concept to me is the definition of “infrastructure.” Take the example to the right, wherein various “infrastructure” roles are described. What should be evident is that to many — especially those in enterprise (virtualized or otherwise) or non-Cloud environments — is that these software-only components represent only a fraction of what makes up “infrastructure.”

The loadbalancer role, as an example makes total sense if you’re using HAproxy or Zeus ZXTM. What happens if it’s an F5 or Cisco appliance?

What about the routers, switches, firewalls, IDS/IPS, WAFs, SSL engines, storage, XML parsers, etc. that make up the underpinning of the typical datacenter? The majority of these elements — as most of them exist today — do not present consistent interfaces for automation/integration. Most of them utilize proprietary/closed API’s for management that makes automation cumbersome if not impossible across a large environment.

Many will react to that statement by suggesting that this is why Cloud Computing is the great equalizer — that by abstracting the “complexity” of these components into a more “simplified” set of software resources versus hardware, it solves this problem and without the hardware-centric focus of infrastructure and the operations mess that revolves around it today, we’re free to focus on “building the business versus running the business.”

I’d agree. The problem is that these are two different worlds. The mass-market IaaS/PaaS providers who provide abstracted representations of infrastructure are still the corner-cases when compared to the majority of service providers who are entering the Cloud space specifically focused on serving the enterprise, and the enterprise — even those that are heavily virtualized — still very dependent upon hardware.

This is where the DevOps messaging miss comes — at least as it’s described today. DevOps is really targeted (today) toward the software-homogeneity of public, mass-market Cloud environments (such as Amazon Web Services) where infrastructure can be defined as abstract component, software-only roles, not the complex mish-mash of hardware-focused IT of the enterprise as it currently stands. This may be plainly obvious to some, but the messaging of DevOps is obscuring the message which is unfortunate.

DevOps is promoted today as a target operational end-state without explicitly defining that the requirements for success really do depend upon the level of abstraction in the environment; it’s really focused on public Cloud Computing. In and of itself, that’s not a bad thing at all, but it’s a “marketing” miss when it comes to engaging with a huge audience who wants and needs to get the DevOps religion.

You can preach to the choir all day long, but that’s not going to move the needle.

My biggest gripe with the DevOps messaging is with the name itself. If you expect to truly automate “infrastructure as code,” we really should call it NetSecDevOps. Leaving the network and security teams — and the infrastructure they represent — out of the loop until they are either subsumed by software (won’t happen) or get religion (probable but a long-haul exercise) is counter-productive.

Take security, for example. By design, 95% of security technology/solutions are — by design — not easily automatable or are built to require human interaction given their mission and lack of intelligence/correlation with other tools. How do you automate around that? It’s really here that the statement I’ve made that “security doesn’t scale” is apropos. Believe me, I’m not making excuses for the security industry, nor am I suggesting this is how it ought to be, but it is how it currently exists.

Of course we’re seeing the next generation of datacenters in the enterprise become more abstract. With virtualization and cloud-like capabilities being delivered with automated provisioning, orchestration and governance by design for BOTH hardware and software and the vision of private/public cloud integration baked into enterprise architecture, we’re actually on a path where DevOps — at its core — makes total sense.

I only wish that (NetSec)DevOps evangelists — and companies such as Opscode — would address this divide up-front and start to reach out to the enterprise world to help make DevOps a goal that these teams aspire to rather than something to rub their noses in. Further, we need a way for the community to contribute things like Chef recipes that allow for flow-through role definition support for hardware-based solutions that do have exposed management interfaces (Ed: Adrian referred to these in a tweet as ‘device’ recipes)

The guys from SearchCloudComputing gave me a ring and we chatted about CloudAudit. The interview that follows is a distillation of that discussion and goes a long way toward answering many of the common questions surrounding CloudAudit/A6. You can find the original here.

What are the biggest challenges when auditing cloud-based services, particularly for the solution providers?

Christofer Hoff:: One of the biggest issues is their lack of understanding of how the cloud differs from traditional enterprise IT. They’re learning as quickly as their customers are. Once they figure out what to ask and potentially how to ask it, there is the issue surrounding, in many cases, the lack of transparency on the part of the provider to be able to actually provide consistent answers across different cloud providers, given the various delivery and deployment models in the cloud.

How does the cloud change the way a traditional audit would be carried out?

Hoff: For the most part, a good amount of the questions that one would ask specifically surrounding the infrastructure is abstracted and obfuscated. In many cases, a lot of the moving parts, especially as they relate to the potential to being competitive differentiators for that particular provider, are simply a black box into which operationally you’re not really given a lot of visibility or transparency.
If you were to host in a colocation provider, where you would typically take a box, the operating system and the apps on top of it, you’d expect, given who controls what and who administers what, to potentially see a lot more, as well as there to be a lot more standardization of those deployed solutions, given the maturity of that space.

How did CloudAudit come about?

Hoff: I organized CloudAudit. We originally called it A6, which stands for Automated Audit Assertion Assessment and Assurance API. And as it stands now, it’s less in its first iteration about an API, and more specifically just about a common namespace and interface by which you can use simple protocols with good authentication to provide access to a lot of information that essentially can be automated in ways that you can do all sorts of interesting things with.

How does it work exactly?

Hoff: What we wanted to do is essentially keep it very simple, very lightweight and easy to implement without cloud providers having to make a lot of programmatic changes. Although we’re not prescriptive about how they do it (because each operation is different), we expect them to figure out how they’re going to get the information into this namespace, which essentially looks like a directory structure.

This kind of directory/namespace is really just an organized repository. We don’t care what is contained within those directories: .pdf, text documents, links to other websites. It could be a .pdf of a SAS 70 report with a signature that refers back to the issuing governing body. It could be logs, it could be assertions such as firewall=true. The whole point here is to allow these providers to agree upon the common set of minimum requirements.
We have aligned the first set of compliance-driven namespaces to that of theCloud Security Alliance‘s compliance control-mapping tool. So the first five namespaces pretty much run the gamut of what you expect to see most folks concentrating on in terms of compliance: PCI DSS, HIPAA, COBIT, ISO 27002 and NIST 800-53…Essentially, we’re looking at both starting with those five compliance frameworks, and allowing cloud providers to set up generic infrastructure-focused type or operational type namespaces also. So things that aren’t specific to a compliance framework, but that you may find of interest if you’re a consumer, auditor, or provider.

Who are the participants in CloudAudit?

Hoff: We have both pretty much the largest cloud providers as well as virtualization platform and cloud platform providers on the planet. We’ve got end users, auditors, system integrators. You can get the list off of the CloudAudit website. There are folks from CSC, Stratus, Akamai, Microsoft, VMware, Google, Amazon Web Services, Savvis, Terrimark, Rackspace, etc.

What are your short-term and long-term goals?

Hoff: Short-term goals are those that we are already trucking toward: to get this utilized as a common standard by which cloud providers, regardless of location — that could be internal private cloud or could be public cloud — essentially agree on the same set of standards by which consumers or interested parties can pull for information.

In the long-term, we wish to be able to improve visibility and transparency, which will ultimately drive additional market opportunities because, for example, if you have various levels of authentication, anywhere from anonymous to system administrator to auditor to fully trusted third party, you can imagine there’ll be a subset of anonymized information available that would actually allow a cloud broker or consumer to poll multiple cloud providers and actually make decisions based upon those assertions as to whether or not they want to do business with that cloud provider.

…It gives you an opportunity to shop wisely and ultimately compares services or allow that to be done in an automated fashion. And while CloudAudit does not seek to make an actual statement regarding compliance, you will ultimately be provided with enough information to allow either automated tools or at least auditors to get back to the business of auditing rather than data collection. Because this data gathering can be automated, it means that instead of having a PCI audit once every year, or every 6 months, you can have it on a schedule that is much more temporal and on-demand.

What will solution providers and resellers be able to take from it? How is it to their benefit to get involved?

Hoff: The cloud service providers themselves, for the most part, are seeing this as a tremendous opportunity to not only reduce cost, but also make this information more visible and available…The reality is, in many cases, to be frank, folks that make a living auditing actually spend the majority of their time in data collection rather than actually looking at and providing good, actual risk management, risk assessment and/or true interpretation of the actual data. Now the automation of that, whether it’s done on a standard or on an ad-hoc basis, could clearly put a crimp in their ability to collect revenues. So the whole point here is their “value-add” needs to be about helping customers to actually manage risk appropriately vs. just kind of becoming harvesters of information. It behooves them to make sure that the type of information being collected is in line with the services they hope to produce.

What needs to be done for this to become an industry standard?

Hoff: We’ve already written a normative spec that we hope to submit to the IETF. We have cross-section representation across industry, we’re building namespaces, specifications, and those are not done in the dark. They’re done with a direct contribution of the cloud providers themselves, because they understand how important it is to get this information standardized. Otherwise, you’re going to be having ad-hoc comparisons done of services which may not portray your actual security services capabilities or security posture accurately. We have a huge amount of interest, a good amount of participation, and a lot of alliances that are already bubbling with other cloud standards.

Cloud computing changes the game for many security services, including vulnerability management, penetration testing and data protection/encryption, not just audits. Is the CloudAudit initiative a piece of a larger cloud security puzzle?

Hoff: If anything, it’s a light bulb in the darkness. For us, it’s allowing these folks to adjust their tools to be able to consume the data that’s provided as part of the namespace within CloudAudit, and then essentially in the same way, we suggest human auditors focus more on interpreting that data rather than gathering it.
If gathering that data was unavailable to most of the vendors who would otherwise play in that space, due to either just that data not being presented or it being a violation of terms of service or acceptable use policy, the reality is that this is another way for these tool vendors to get back into the game, which is essentially then understanding the namespaces that we have, being able to modify their tools (which shouldn’t take much, since it’s already a standard-based protocol), and be able to interpret the namespaces to actually provide value with the data that we provide.
I think it’s an overall piece here, but again we’re really the conduit or the interface by which some of these technologies need to adapt. Rather than doing a one-off by one-off basis for every single cloud provider, you get a standardized interface. You only have to do it once.

Where should people go to get involved?

Hoff: If people want to get involved, it’s an open project. You can go to cloudaudit.org. There you’ll find links about us. There’ll be a link to the farm. The farm itself is currently a Google group, which you can sign up for and participate. We have calls every Monday, which are posted on the farm and tell you how to connect. You can also replay the last of the many calls that we’ve had already as we record them each time so that people have both the audio and visual versions of what we produce and how we’re going about this, and it’s very transparent and very open and we enjoy people getting involved. If you have something to add, please do.

Update 2/1/10: The A6 effort is in full-swing. You can find out more about it at the Google Groupshere.

A few weeks ago I penned a blog discussing an idea I presented at a recent Public Sector Cloud gathering that later inherited the name “Audit, Assertion, Assessment, and Assurance API (A6)”

The case for A6 is straightforward:

…take the capabilities of something like SCAP and embed a standardized and open API layer into each IaaS, PaaS and SaaS offering [Ed: At the API layer of each deployment model] to provide not only a standardized way of scanning for network vulnerabilities, but also configuration management, asset management, patch remediation, compliance, etc.

This way you win two ways: automated audit and security management capability for the customer/consumer and a a streamlined, cost effective, and responsive way of automating the validation of said controls in relation to compliance, SLA and legal requirements for service providers.

Much discussion ensued on Twitter and via email/blogs explaining A6 in better detail and with more specificity.

The idea has since grown legs and I’ve started to have some serious discussions with “people” (*wink wink*) who are very interested in making this a reality, especially in light of business and technical use cases bubbling to the surface of late.

To that end, Ben (@ironfog) has taken the conceptual mumblings and begun work on a RESTful interface for A6. You can find the draft documentation here. You can find his blog and awesome work on making A6 a reality here. Thank you so much, Ben.

NOTE: The documentation/definitions below are conceptual and stale. I’ve left them here because they are important and relevant but are likely not representative of the final work product.

I’m thinking of pulling together a more formalized working group for A6 and push hard with some of those “people” above to get better definition around its operational realities as well as understand the best way to create an open and extensible standard going forward.

Please See the follow-on to this post: http://www.rationalsurvivability.com/blog/?p=1276

Update: Wow, did this ever stir up an amazing set of commentary on Twitter. No hash tag, unfortunately, but comments from all angles. Most of the SecTwits dropped into “fire in the hole” mode, but it’s understandable. Thank you @rybolov (who was there when I presented this to the gub’mint and @shrdlu who was the voice of, gulp, reason

This lead to Craig’s excellent idea around solving a problem related to not being able to perform network-based vulnerability scans of Cloud-hosted infrastructure due to contractual and technical concerns related to multi-tenancy. Specifically, Craig lobbied to create an open standard for vulnerability scanning API’s (an example I’ve been using in my talks for quite some time to illustrate challenges in ToS, for example.) It’s an excellent idea.

So I propose — as I did to a group of concerned government organizations yesterday — that we take this concept a step further, beyond just “vulnerability scanning.”

Let’s solve BOTH of the challenges above with one solution.

Specifically, let’s take the capabilities of something like SCAP and embed a standardized and open API layer into each IaaS, PaaS and SaaS offering (see the API blocks in the diagram below) to provide not only a standardized way of scanning for network vulnerabilities, but also configuration management, asset management, patch remediation, compliance, etc.

This way you win two ways: automated audit and security management capability for the customer/consumer and a a streamlined, cost effective, and responsive way of automating the validation of said controls in relation to compliance, SLA and legal requirements for service providers.

Since we just saw a story today titled “Feds May Come Up With Cloud Security Standards” — why not use one they already have in SCAP to suggest we leverage it to get even better bang for the buck from a security perspective. This concept extends well beyond the Public sector and it doesn’t have to be SCAP, but it seems like a good example.

Of course we would engineer in authentication/authorization to interface via the APIs and then you could essentially get ISV’s who already support things like SCAP, etc. to provide the capability in their offerings — physical or virtual — to enable it.

We’re not reinventing the wheel and we have lots of technology and standardized solutions we can already use to engineer into the stack.