Category Archives: Modeling

This post is a farewell to Microsoft’s Oslo project. Not primarily because the name Oslo will be phased out at the next PDC as Doug Purdy reports. More importantly because Doug’s post puts Oslo so far on my back burner that I am unlikely to ever get back to it.

I have mostly been tracking Oslo (the modeling part of it, i.e. the “D” and then “M” stacks and associated tools) from the point of view of its applicability to IT management and Microsoft’s DSI effort. That made sense at first and I found some encouragements in this direction by (see the partial transcript of David Chappell’s video). But from the start there were also many signs of a very different vision for Oslo, as a software development tool with only a peripheral relationship to IT management.

As time went by it became clear that that vision was the real one and Doug’s announcement that Microsoft will “merge the Data Programmability team (EDM, EF, Astoria, XML, ADO.NET, and tools/designers) and the Oslo team (Quadrant, Repository, M) together” is the final confirmation.

M is still a very interesting piece of technology and I’d like to keep track of it, but it’s now too far removed from my focus area for me to justify paying much attention to it in the foreseeable future. Good luck to the team.

In the meantime, this brings back the question of what’s the current modeling story in the System Center team. SDM? SML? Some M-based replacement? Back to CIM? I am just asking because they had been a lot more public and proactive on this than other management vendors, but this has stopped abruptly.

[UPDATED 2009/8/28: Doug responded with enough of a teaser (“some things coming out of incubation in the management space”) to give me a justification to keep at least an eye on Oslo. At the very least I’ll keep monitoring his blog, waiting for this DSI link to emerge.]

What benefits does REST provide for configuration management (in traditional data centers and in Clouds)?

Part 1 of the “REST in practice for IT and Cloud management” investigation looked at Cloud APIs from leading IaaS providers. It examined how RESTful they are and what concrete benefits derive from their RESTfulness. In part 2 we will now look at the configuration management domain. Even though it’s less trendy, it is just as useful, if not more, in understanding the practical value of REST for IT management. Plus, as long as Cloud deployments are mainly of the IaaS kind, you are still left with the problem of managing the configuration of everything that runs of top the virtual machines (OS, middleware, DB, applications…). Or, if you are a glass-half-full person, here is another way to look at it: the great thing about IaaS (and host virtualization in general) is that you can choose to keep your existing infrastructure, applications and management tools (including configuration management) largely unchanged.

At first blush, REST is ideally suited to configuration management.

The RESTful Cloud APIs have no problem retrieving resource descriptions, but they seem somewhat hesitant in the way they deal with resource-specific actions. Tim Bray described one of the challenges in his well-considered Slow REST post. And indeed, applying REST to these “do something that may take some time and not result exactly in what was requested” scenarios is a lot less straightforward than when you’re just doing document/data retrieval. In contrast you’d think that applying REST to the task of retrieving configuration data from a CMDB or other configuration store would be a no-brainer. Especially in the IT management world, where we already have explicit resource models and a rich set of relationships defined. Let’s give each resource a URI that responds to HTTP GET requests, let’s turn the associations into hyperlinks in the resource presentation, let’s mint a MIME type to represent this format and we are out of the office in time for a 4:00PM tennis game when all the courts are available (hopefully our tennis partners are as bright as us and can get out early too). This “work smarter not harder” approach would allow us to present this list of benefits in our weekly progress report:

-1- A URI-based scheme makes the protocol independent of the resource topology, unlike today’s data stores that usually struggle to represent relationships between stores.

-2- It is simpler to code against than CIM-over-HTTP or WS-Management. It is cross-platform, unlike WMI or JMX.

-3- It makes it trivial to browse the configuration data from a Web browser (the resources themselves could provide an HTML representation based on content-type negotiation, or a simple transformation could generate it for the Web browser).

-4- You get REST-induced caching and scalability.

In the shower after the tennis game, it becomes apparent that benefit #4 is largely irrelevant for IT management use cases. That the browser in #3 would not be all that useful beyond simple use cases. That #2 is good for karma but developers will demand a library that hides this benefit anyway. And that the boss is going to say that he doesn’t care about #1either because his product is “the single source of truth” so it needs to import from the other configuration store, not reference them.

Even if we ignore the boss (once again) it only leaves #1 as a practical benefit. Surprise, that’s also the aspect that came out on top of the analysis in part 1 (see “the API doesn’t constrain the design of the URI space” highlight, reinforced by Mark’s excellent comment on the role of hypertext). Clearly, there is something useful for IT management in this “hypermedia” thing. This will largely be the topic of part 3.

There are also quite a few things that this RESTification of the configuration management store doesn’t solve:

-1- The ability to query: “show me all the WebLogic instances that run on a Windows host and don’t have patch xyz applied”. You don’t have much of a CMDB if you can’t answer this. For an analogy, remember (or imagine) a pre-1995 Web with no search engine, where you can only navigate by starting from your browser home page and clicking through static links step by step, or through bookmarks.

-2- The ability to retrieve the configuration change history and to compare configurations across resources (or to a reference configuration).

This is not to say that these two features cannot be built on top of a RESTful IT resource model. Just that they are the real meat of configuration management (rather than a simple resource-by-resource configuration browser) and that your brilliant re-architecture hasn’t really helped in addressing them. Does a RESTful foundation make these features harder to build? Not necessarily, but there are some tricky aspects to take care of:

-1- In hypermedia systems, the links are usually part of the resource representation, not resources of their own. In IT management, relationships/associations can have their own lifecycle and configuration properties.

-2- Be careful that you can really maintain the address of a resource. It’s one thing to make sure that a UUID gets maintained as a resource configuration changes, it’s another to ensure that a dereferenceable URI remains unchanged. For example, the admin server of a cluster may move over time from one node to another.

More fundamentally, the ability to deal with multiple resources at the same time and/or to use the model at different levels of granularity is often a challenge. Either you make your protocol more complex to account for this or your pollute your resource model (with a bunch of arbitrary “groups”, implicit or explicit).

We saw this in the Cloud APIs too. It typically goes something like this: you can address an individual server (called “foo”) by sending requests to http://Cloudprovider.com/server/foo. Drop the “foo” part of the URL and now you can address all the servers, for example to retrieve their configuration or possibly to reboot them. This gives me a way of dealing with multiple resources at time, but only along the lines pre-defined by the API. What if I want to deal only with the servers that host nodes of a given cluster. Sorry, not possible. What if the servers have different hosts in their URIs (remember, “the API doesn’t constrain the design of the URI space”)? Oops.

WS-Management, in the SOAP world, takes this one step further with Selectors, through which you can embed some kind of query, the result of which is what you are addressing in your message. Or, if all you want to do is GET, you can model you entire datacenter as one giant virtual XML doc (a document which is never assembled in practice) and use WSRF/WSDM’s “QueryExpression” or WS-Management’s “FragmentTransfer” to the same effect. BTW, I have issues with the details of how these mechanisms work (and I have described an alternative under the motto “if you are going to suffer with WS-Addressing, at least get some value out of it”).

These are all non-RESTful atrocities to a RESTafarian, but in my mind the Cloud REST API reviewed in part 1 have open Pandora’s box by allowing less-qualified URIs to address all instances of a class. I expect you’ll soon see more precise query parameters in these URIs and they’ll look a lot like WS-Management Selectors (e.g. http://Cloudprovider.com/server?OS=Linux&CPUType=X86). Want to take bets about when a Cloud API URI format with an embedded regex first arrives?

When you need this, my gut feeling is that you are better off not worrying too much about trying to look RESTful. There is no shame to using an RPC pattern in the right circumstances. Don’t be the stupid skier who ends up crashing in a tree because he is just too cool for the using snowplow position.

One of the most common reasons to deal with multiple resources together is to run queries such as the “show me all the WebLogic instances that run on a Windows host and don’t have patch xyz applied” example above. Such a query mechanism recently became a DMTF standard, it’s called CMDBf. It is SOAP-based and doesn’t attempt to have anything to do with REST. Not that it didn’t cross the mind of a bunch of people, lead by Michael Coté when CMDBf first emerged (read the comments too). But as James Governor rightly predicted in the first comment, Coté heard “dick” from us on this (I represented HP in CMDBf and ended up being an editor of the specification, focusing on the “query” part). I don’t remember reading the entry back then but I must have since I have been a long time Coté fan. I must have dismissed the idea so quickly that it didn’t even register with my memory. Well, it’s 2009 now, CMDBf v1 is a DMTF standard and guess what? I, and many other SOAP-the-world-till-it-shines alumni, are looking a lot more seriously into what’s in this REST thing (thus this series of posts for me). BTW in this piece Coté also correctly predicted that CMDBf would be “more about CMDB interoperation than federation” but that didn’t take as much foresight (it was pretty obvious to me from the start).

Frankly I am still not sure that there is much benefit from REST in what CMDBf does, which is mostly a query interface. Yes the CMDBf query and its response go over SOAP. Yes in this case SOAP is mostly a useless wrapper since none of the implementations will likely support any WS-* SOAP header (other than paying the WS-Addressing tax). Sure we could remove it and send plain XML over HTTP. Or replace the SOAP wrapper with an Atom wrapper. Would it be anymore RESTful? Not one bit.

And I don’t see how to make it more RESTful. There are plenty of things in the periphery the query operation that can be made RESTful, along the lines of what I described above. REST could make the discovery/reconciliation tasks of the CMDB more efficient. The CMDBf query result format could be improved so that from the returned elements I can navigate my way among resources by following hyperlinks. But the query operation itself looks fundamentally RPCish to me, just like my interaction with the Google search page is really an RPC call that happens to return a Web page full of hyperlinks. In a way, this query (whether Google or CMDBf) can at best be the transition point from RPC to REST. It can return results that open a world of RESTful requests to you, but the query invocation itself is not RESTful. And that’s OK.

In part 3 (now available), I will try to synthesize the lessons from the Cloud APIs (part 1) and configuration management (this post) and extract specific guidance to get the best of what REST has to offer in future IT management protocols. Just so you can plan ahead, in part 4 I will reform the US health care system and in part 5 I will provide a practical roadmap for global nuclear disarmament. Suggestions for part 6 are accepted.

It doesn’t look like much. But it’s an major step for application-driven IT management (and eventually BSM).

Take a SCA component. Follow the SCA-defined component-to-composite and service-to-reference relationships upwards and eventually you’ll get to top level application services that have a decent chance of mapping well to business-relevant activities (e.g. order processing). Which means that the metrics of these services (e.g. availability, response time) are likely to be meaningful and important to the line of business. Follow the same SCA relationships downward and you’ll end up (in a SCA-based infrastructure like Oracle SOA Suite 11G), with target components that are meaningful to the IT administrator. Which means that their metrics and configuration settings (like “inMemoryOptimization”) are tracked and controlled by IT. You now have a direct string of connections between this configuration setting and a business relevant metric. You can navigate the connection in both directions: downward/reactive (“my service just went down, what changed in the infrastructure”) versus upward/proactive (“my service is always slow, what can I do to optimize the execution”).

Of course these examples are over-simplistic (and the title of this post is a bit too lyrical, on account of this). Following these SCA relationships in brute-force fashion will yield tens of thousands of low-level configuration settings for any top-level service, with widely differing importance and impact (not to mention that they interact). You need rules to make sense of this. Plus, configuration-based models are a complement to runtime transaction discovery, not a replacement (unless your model of the application includes every single line of code). But it’s not that often that you can see a missing link snap into place that clearly.

What this shows is the emergence of a common set of entities between the developer’s model and the IT admin model. And if the application was developed correctly, some of the entities in the developer’s model correspond to entities in the mental model of the application user and the line of business manager. SCA is the skeleton for this. Attaching configuration to SCA components puts muscle on the bone.

The road to BSM is paved with small improvements in the semantic alignment between IT infrastructure and application services. A couple of years ago, I tried to explain why SCA is very relevant for IT management. Now we can see it.

I have given up, at least for now, on understanding what Microsoft wants Oslo (and more specifically the “M” part) to be. I used to pull my hair reading inconsistent articles and interviews about what M tries to be (graphical programming! DSL! IT models! generic parser! application components! workflow! SOA framework! generic data layer! SQL/T-SQL for dummies! JSON replacement! all of the above!). Douglas Purdy makes a valiant 4-part effort (1, 2, 3, 4) but it’s still not crisp enough for my small brain. Even David Chapell, explainer extraordinaire, seems to throw up his hands (“a modeling platform that can be applied in lots of different ways”, which BTW is the most exact, if vague, description I’ve heard). Rather than articles, I now mainly look at the base specifications and technical documents that show what it actually is. That’s what I did when the Oslo SDK first came out last year. A new technical document came out recently, an update to the MGraph Object Model so I took another a look.

And it turns out that MGraph is… RDF. Or rather, “RDF minus entailment”. And with turtle as the base representation rather than an add-on.

Look at section 3 (“RDF concepts”) in this table of content from W3C. It describes the core RDF concepts. Keep the first five concepts (sections 3.1 to 3.5) and drop the last one (“3.6: Entailment”). You have MGraph, a graph-oriented object model.

On top of this, the RDF community adds reasoning capabilities with RDF entailment, RDFS, OWL, SWRL, SPIN, etc and a variety of engines that implement these different levels of reasoning.

Microsoft, on the other hand, seems to ignore that direction. Instead, it focuses on creating a good mapping from this graph object model to programming languages. In two directions:

from programming languages to the graph model: they make it easy for you to create a domain-specific language (DSL) that can easily be turned into M instances.

from the graph model to programming languages: they make it easy for you to work on these M instances (including storing them) using the .NET technology stack.

So, if Microsoft is indeed reinventing RDF as the title of this entry provocatively suggests, then they are taking an interesting detour on the way. Rather than going straight to “model-based inferencing”, they are first focusing on mapping the core MGraph concepts to programming (by regular developers) and user interactions (with regular users). Something that for a long time had not gotten much attention in the RDF world beyond pointing developers to Jena (though it seemed to have improved over the last few years with companies like TopQuadrant; ironically, the Oslo model browser/editor is code-named “Quadrant”).

Whether the Oslo team sees the inferencing fun as a later addition or something that’s not needed is another question, on which I don’t see any hint at this point.

I hope they eventually get to it. But I like the fact that they cleanly separate the ability to represent and manipulate the graph model from the question of whether instances can be inferred. We could use such a reusable graph representation mechanism. Did CMDBf, for example, really have to create a new graph-oriented metamodel and query language? I failed to convince the group to adopt RDF/SPARQL, but I may have been more successful if there had been a cleanly-separated “static” version of RDF/SPARQL, a way to represent and query a graph independently of whether the edges and nodes in the graph (and their types) are declared or inferred. Instead, the RDF stack has entailment deeply embedded and that’s very scary to many.

But as much as I like this separation, I can’t help squirming when I see the first example in the MGraph document:

That last line is an eyesore to anyone who has been anywhere near RDF. I have just declared that Rich and Jenn are one another’s spouse, why do I have to add a line that says that they have spouses? What I want is to say that participation in a “Spouse” relationship entails membership in the “hasSpouse” class. And BTW, I also want to mark the “Spouse” relationship as symmetric so I only have to declare it one way and the inverse can be inferred.

So maybe I don’t really know what I want on this. I want the graph model to be separated from the inference logic and yet I want the syntactic simplicity that derives from base entailments like the example above. Is June too early to start a Christmas wish list?

While I am at it, can we please stop putting people’s ages in the model rather than their dates of birth? I know it’s just an example, but I see it over and over in so many modeling examples. And it’s so wrong in 99% of cases. It just hurts.

There are other things about MGraph edges that look strange if you are used to RDF. For example, edges can be labeled or not, as illustrated on this first example of the graph model:

In this example, “Age” is a labeled edge that points to the atomic node “42”, while the credit score is modeled as a non-atomic node linked from the person via an unlabeled edge. Presumably the “credit score” node is also linked to an atomic node (not shown) that contains the actual score value (e.g. “800”). I can see why one would want to call out the credit score as a node rather than having an edge (labeled “credit score”) that goes to an atomic node containing the actual credit score value (similar to how “age” is handled). For one thing, you may want to attach additional data to that “credit score” node (when was it calculated, which reporting agency provided it, etc) so it helps to have it be a node. But making this edge unlabeled worries me. Originally you may only think of one possible relationship type between a person and a credit score (the person has a credit score). But other may pop up further down the road, e.g. the person could be a loan agent who orders the credit score but the score is about a customer. So now you create a new edge label (“orders”) to link the loan agent person to the credit score. But what happens to all the code that was written previously and navigates the relationship from the person to the score with the expectation that the score is about the person. Do you think that code was careful to only navigate “unlabeled” edges? Unlikely. Most likely it just grabbed whatever credit score was linked to the person. If that code is applied to a person who happens to also be a loan agent, it might well grab a credit score about other people which happened to be ordered by the loan agent. These unlabeled edges remind me of the practice of not bothering with a “version” field in the first version of your work because, hey, there is only one version so far.

The restriction that a node can have at most one edge with a given label coming out of it is another one that puzzles me. Though it may explain why an unlabeled edge is used for the credit score (since you can get several credit scores for the same person, if you ask different rating agencies). But if unlabeled edges are just a way to free yourself from this restriction then it would be better to remove the restriction rather than work around it. Let’s take the “Spouse” label as an example. For one thing in some countries/cultures having more than one such edge might be possible. And having several ex-spouses is possible in many places. Why would the “ex-spouse” relationship have to be defined differently from “spouse”? What about children? How is this modeled? Would we be forced to have a chain of edges from parent to 1st child to next sibling to next sibling, etc? Good luck dealing with half-siblings. And my model may not care so much about capturing the order (especially if the date of birth is already captured anyway). This reminds me of how most XML document formats force element order in places where it is not semantically meaningful, just because of XSD’s bias towards “sequence”.

Having started this entry by declaring that I don’t understand what M tries to be, I really shouldn’t be criticizing its design choices. The “weird” aspects I point out are only weird in the context of a certain usage but they may make perfect sense in the usage that the Oslo team has in mind. So I’ll stop here. The bottom line is that there are traces, in M, of a nice, reusable, graph-oriented data model with strong bridges (in both directions) to programming languages and user interfaces. That is appealing to me. There are also some strange restrictions that puzzle me. We’ll see where this goes (hopefully this article, “Designing Domains and Models Using M” will soon contain more than “to be submitted” and I can better understand the M approach). In any case, kuddos to the team for being so open about their work and the evolution of their design.

Before rushing to standardize “Cloud APIs”, let’s take a look back at the previous attempt to tackle the same problem, which is one of IT management integration and automation. I am referring to the definition of specifications that attempted to use the then-emerging SOAP-based Web services framework to easily integrate IT management systems and their targets.

Leaving aside the “Cloud” spin of today and the “Web services” frenzy of yesterday, the underlying problem remains to provide IT services (mostly applications) in a way that offers the best balance of performance, availability, security and economy. Concretely, it is about being able to deploy whatever IT infrastructure and application bits need to be deployed, configure them and take any required ongoing action (patch, update, scale up/down, optimize…) to keep them humming so customers don’t notice anything bothersome and you don’t break any regulation. Or rather so that any disruption a customer sees and any mandate you violate cost you less than it would have cost to avoid them.

The realization that IT systems are moving more and more towards distributed/connected applications was the primary reason that pushed us towards the definition of Web services protocols geared towards management interactions. By providing a uniform and network-friendly interface, we hoped to make it convenient to integrate management tasks vertically (between layers of the IT stack) and horizontally (across distributed applications). The latter is why we focused so much on managing new entities such as Web services, their execution environments and their conversations. I’ll refer you to the WSMF submission that my HP colleagues and I made to OASIS in 2003 for the first consistent definition of such a management framework. The overview white paper even has a use case called “management as a service” if you’re still not convinced of the alignment with today’s Cloud-talk.

Of course there are some differences between Web service management protocols and Cloud APIs. Virtualization capabilities are more advanced than when the WS effort started. The prospect of using hosted resources is more realistic (though still unproven as a mainstream business practice). Open source component are expected to play a larger role. But none of these considerations fundamentally changes the task at hand.

Let’s start with a quick round-up and update on the most relevant efforts and their status.

Protocols

WSMF (Web Services Management Framework): an HP-created set of specifications, submitted to the OASIS WSDM working group (see below). Was subsumed into WSDM. Not only a protocol BTW, it includes a basic model for Web services-related artifacts.

WS-Manageability: An IBM-led alternative to parts of WSDM, also submitted to OASIS WSDM.

WSDM (Web Services Distributed Management): An OASIS technical committee. Produced two standards (a protocol, “Management Using Web Services” and a model of Web services, “Management Of Web Services”). Makes use of WSRF (see below). Saw a few implementations but never achieved real adoption.

OGSI (Open Grid Services Infrastructure): A GGF (the organization now known as OGF) standard to provide a service-oriented resource manipulation infrastructure for Grid computing. Replaced with WSRF.

WSRF: An OASIS technical committee which produced several standards (the main one is WS-ResourceProperties). Started as an attempt to align the GGF/OGSI approach to resource access with the IT management approach (represented by WSDM). Saw some adoption and is currently quietly in use under the cover in the GGF/OGF space. Basically replaced OGSI but didn’t make it in the IT management world because its vehicle there, WSDM, didn’t.

WS-Management: A DMTF standard, based on a Microsoft-led submission. Similar to WSDM in many ways. Won the adoption battle with it. Based on WS-Transfer and WS-Enumeration.

WS-ResourceTransfer (aka WS-RT): An attempt to reconcile the underlying foundations of WSDM and WS-Management. Stalled as a private effort (IBM, Microsoft, HP, Intel). Was later submitted to the W3C WS-RA working group (see below).

WSRA (Web Services Resource Access): A W3C working group created to standardize the specifications that WS-Management is built on (WS-Transfer etc) and to add features to them in the form of WS-RT (which was also submitted there, in order to be finalized). This is (presumably) the last attempt at standardizing a SOAP-based access framework for distributed resources. Whether the window of opportunity to do so is still open is unclear. Work is ongoing.

WS-ResourceCatalog : A discovery helper companion specification to WS-Management. Started as a Microsoft document, went through the “WSDM/WS-Management reconciliation” effort, emerged as a new specification that was submitted to DMTF in May 2007. Not heard of since.

CMDBf (Configuration Management Database Federation): A DMTF working group (and soon to be standard) that mainly defines a SOAP-based protocol to query repositories of configuration information. Not linked with (or dependent on) any of the specifications listed above (it is debatable whether it belongs in this list or is part of a new breed).

Modeling

DCML (Data Center Markup Language): The first comprehensive effort to model key elements of a data center, their relationships and their policies. Led by EDS and Opsware. Never managed to attract the major management vendors. Transitioned to an OASIS member section and died of being ignored.

SDM (System Definition Model): A Microsoft specification to model an IT system in a way that includes constraints and validation, with the goal of improving automation and better linking the different phases of the application lifecycle. Was the starting point for SML.

SML (Service Modeling Language): Currently a W3C “proposed recommendation” (soon to be a recommendation, I assume) with the same goals as SDM. It was created, starting from SDM, by a consortium of companies that eventually submitted it to W3C. No known adoption other than the Eclipse COSMOS project (Microsoft was supposed to use it, but there hasn’t been any news on that front for a while). Technically, it is a combination of XSD and Schematron. It appears dead, unless it turns out that Microsoft is indeed using it (I don’t know whether System Center is still using SDM, whether they are adopting SML, whether they are moving towards M or whether they have given up on the model-centric vision).

CML (Common Model Library): An effort by the SML authors to create a set of model elements using the SML metamodel. Appears to be dead (no news in a long time and the cml-project.org domain name that was used seems abandoned).

SDD (Solution Deployment Descriptor): An OASIS standard to define a packaging mechanism meant to simplify the deployment and configuration of software units. It is to an application archive what OVF is to a virtual disk. Little adoption that I know of, but maybe I have a blind spot on this.

OVF (Open Virtualization Format): A recently released DMTF standard. Defines a packaging and descriptor format to distribute virtual machines. It does not defined a common virtual machine format, but a wrapper around it. Seems to have some momentum. Like CMDBf, it may be best thought of as part of a new breed than directly associated with WS-Management and friends.

This is not an exhaustive list. I have left aside the eventing aspects (WS-Notification, WS-Eventing, WS-EventNotification) because while relevant it is larger discussion and this entry to too long already (see here and here for some updates from late last year on the eventing front). It also does not cover the Grid work (other than OGSI/WSRF to the extent that they intersect with the IT management world), even though a lot of the work that took place there is just as relevant to Cloud computing as the IT management work listed above. Especially CDDLM/CDL an abandoned effort to port SmartFrog to the then-hot XML standards, from which there are plenty of relevant lessons to extract.

The lessons

What does this inventory tell us that’s relevant to future Cloud API standardization work? The first lesson is that protocols are easy and models are hard. WS-Management and WSDM technically get the job done. CMDBf will be a good query language. But none of the model-related efforts listed above seem to have hit the mark of “doing the job”. With the possible exception of OVF which is promising (though the current expectations on it are often beyond what it really delivers). In general, the more focused and narrow a modeling effort is, the more successful it seems to be (with OVF as the most focused of the list and CML as the other extreme). That’s lesson learned number two: models that encompass a wide range of systems are attractive, but impossible to deliver. Models that focus on a small sub-area are the way to go. The question is whether these specialized models can at least share a common metamodel or other base building blocks (a type system, a serialization, a relationship model, a constraint mechanism, etc), which would make life easier for orchestrators. SML tries (tried?) to be all that, with no luck. RDF could be all that, but hasn’t managed to get noticed in this context. The OVF and SDD examples seems to point out that the best we’ll get is XML as a shared foundation (a type system and a serialization). At this point, I am ready to throw the towel on achieving more modeling uniformity than XML provides, and ready to do the needed transformations in code instead. At least until the next window of opportunity arrives.

I wish that rather than being 80% protocols and 20% models, the effort in the WS-based wave of IT management standards had been the other way around. So we’d have a bit more to show for our work, for example a clear, complete and useful way to capture the operational configuration of application delivery services (VPN, cache, SSL, compression, DoS protection…). Even if the actual specification turns out to not make it, its content should be able to inform its successor (in the same way that even if you don’t use CIM to model your server it is interesting to see what attributes CIM has for a server).

It’s less true with protocols. Either you use them (and they’re very valuable) or you don’t (and they’re largely irrelevant). They don’t capture domain knowledge that’s intrinsically valuable. What value does WSDM provide, for example, now that’s it’s collecting dust? How much will the experience inform its successor (other than trying to avoid the WS-Addressing disaster)? The trend today seems to be that a more direct use of HTTP (“REST”) will replace these protocols. Sure. Fine. But anyone who expects this break from the past to be a vaccination against past problems is in for a nasty surprise. Because, and I am repeating myself, it’s the model, stupid. Not the protocol. Something I (hopefully) explained in my comments on the Sun Cloud API (before I knew that caring about this API might actually become part of my day job) and something on which I’ll come back in a future post.

Another lesson is the need for clear use cases. Yes, it feels silly to utter such an obvious statement. But trust me, standards groups still haven’t gotten this. It’s not until years spent on WSDM and then WS-Management that I realized that most people were not going after management integration, as I was, but rather manageability. Where “manageability” is concerned with discovering and monitoring individual resources, while “management integration” is concerned with providing a systematic view of the environment, with automation as the goal. In other words, manageability standards can allow you to get a traditional IT management console without the need for agents. Management integration standards can allow you to coordinate your management systems and automate their orchestration. WS-Management is for manageability. CMDBf is in the management integration category. Many of the (very respectful and civilized) head-butting sessions I engaged in during the WSDM effort can be traced back to the difference between these two sets of use cases. And there is plenty of room for such disconnect in the so-loosely-defined “Cloud” world.

We have also learned (or re-learned) that arbitrary non-backward compatible versioning, e.g. for political or procedural reasons as with WS-Addressing, is a crime. XML namespaces (of the XSD and WSDL types, as well as URIs used in similar ways in specifications, e.g. to identify a dialect or profile) are tricky, because they don’t have backward compatibility metadata and because of the practice to use organizations domain names in the URI (as opposed to specification-specific names that can be easily transferred, e.g. cmdbf.org versus dmtf.org/cmdbf). In the WS-based management world, we inherited these problems at the protocol level from the generic WS stack. Our hands are more or less clean, but only because we didn’t have enough success/longevity to generate our own versioning problems, at the model level. But those would have been there had these models been able to see the light of day (CML) or see adoption (DCML).

There are also practical lessons that can be learned about the tactics and strategies of the main players. Because it looks like they may not change very much, as corporations or even as individuals. Karla Norsworthy speaks for IBM on Cloud interoperability standards in this article. Andrew Layman represented Microsoft in the post-Manifestogate Cloud patch-up meeting in New York. Winston Bumpus is driving the standards strategy at VMWare. These are all veterans of the WS-Management, WSDM and related wars collaborations (and more generally the whole WS-* effort for the first two). For the details of what there is to learn from the past in that area, you’ll have to corner me in a hotel bar and buy me a few drinks though. I am pretty sure you’d get your money’s worth (I am not a heavy drinker)…

In summary, here are my recommendations for standardizing Cloud API, based on lessons from the Web services management effort. The theme is “focus on domain models”. The line items:

Have clear goals for each effort. E.g. is your use case to deploy and run an existing application in a Cloud-like automated environment, or is it to create new applications that efficiently take advantage of the added flexibility. Very different problems.

If you want to use OVF, then beef it up to better apply to Cloud situations, but keep it focused on VM packaging: don’t try to grow it into the complete model for the entire data center (e.g. a new DCML).

Complement OVF with similar specifications for other domains, like the application delivery systems listed above. Informally try to keep these different specifications consistent, but don’t over-engineer it by repeating the SML attempt. It is more important to have each specification map well to its domain of application than it is to have perfect consistency between them. Discrepancies can be bridged in code, or in a later incarnation.

As you segment by domain, as suggested in the previous two bullets, don’t segment the models any further within each domain. Handle configuration, installation and monitoring issues as a whole.

Don’t sweat the protocols. HTTP, plain old SOAP (don’t call it POS) or WS-* will meet your need. Pick one. You don’t have a scalability challenge as much as you have a model challenge so don’t get distracted here. If you use REST, do it in the mindset that Tim Bray describes: “If you’re going to do bits-on-the-wire, Why not use HTTP? And if you’re going to use HTTP, use it right. That’s all.” Not as something that needs to scale to Web scale or as a rebuff of WS-*.

Beware of versioning. Version for operational changes only, not organizational reasons. Provide metadata to assert and encourage backward compatibility.

This is not a recipe for the ideal result but it is what I see as practically achievable. And fault-tolerant, in the sense that the failure of one piece would not negate the value of the others. As much as I have constrained expectations for Cloud portability, I still want it to improve to the extent possible. If we can’t get a consistent RDF-based (or RDF-like in many ways) modeling framework, let’s at least apply ourselves to properly understanding and modeling the important areas.

In addition to these general lessons, there remains the question of what specific specifications will/should transition to the Cloud universe. Clearly not all of them, since not all of them even made it in the “regular” IT management world for which they were designed. How many then? Not surprisingly (since IBM had a big role in most of them), Karla Norsworthy, in the interview mentioned above, asserts that “infrastructure as a service, or virtualization as a paradigm for deployment, is a situation where a lot of existing interoperability work that the industry has done will surely work to allow integration of services”. And just as unsurprisingly Amazon’s Adam Selipsky, who’s company has nothing to with the previous wave but finds itself in leadership position WRT to Cloud Computing is a lot more circumspect: “whether existing standards can be transferred to this case [of cloud computing] or if it’s a new topic is [too] early to say”. OVF is an obvious candidate. WS-Management is by far the most widely implemented of the bunch, so that gives it an edge too (it is apparently already in use for Cloud monitoring, according to this press release by an “innovation leader in automated network and systems monitoring software” that I had never heard of). Then there is the question of what IBM has in mind for WS-RT (and other specifications that the WS-RA working group is toiling on). If it’s not used as part of a Cloud API then I really don’t know what it will be used for. But selling it as such is going to be an uphill battle. CMDBf is a candidate too, as a model-neutral way to manage the configuration of a distributed system. But here I am, violating two of my own recommendations (“focus on models” and “don’t isolate config from other modeling aspects”). I guess it will take another pass to really learn…

[UPDATED 2009/5/7: Senior moment! When writing this entry I forgot that I wrote an earlier entry (in late 2007) specifically to describe the difference between “manageability” and “management integration”. So here it is, if you care for more details on this topic.]

Version 10.2.0.5 of OCAMM (Oracle Composite Application Monitor and Modeler) was recently released. This is the ex QuickVision product from the ClearApp acquisition last year. It is doing very well in the Enterprise Manager portfolio. In this new release, it has grown to cover additional middleware and application products. So now, in addition to the existing support (J2EE, BPEL, Portal…), it lets you model and monitor deployments of the Oracle Service Bus (OSB) as well as AIA process integrations. Those metadata-rich environments fit very naturally in the OCAMM approach of discovering the distributed model from metadata and then collecting and reporting usage metrics across the whole system in the context of that model.

For a complete list of improvements over release 10.2.0.4.2, refer to the release notes, where we see this list of new features:

Oracle Service Bus Support: Oracle Service Bus (formerly called Aqualogic Service Bus) versions 2.6, 2.6.1, 3.0, and 10gR3 are now supported.

I am glad to see that, as it inches towards standardization in the DMTF, the CMDBf specification is getting more visibility. Forrester’s Glenn O’Donnell recently wrote very positively about it on his blog, presenting it as a key enabler for a federation of MDRs (Management Data Repositories, a term introduced by the CMDBf specification so don’t look for it in ITIL). He argues this is the only way (rather than a single data store) to fulfill the ITIL-defined role of a CMS. Rob England (the IT Skeptic) has also shared his thoughts about CMDBf and they were noticeably less enthusiastic, to say the least. While Glenn calls the specification “profound”, Rob calls it “the most over-hyped vendor marketing smokescreen ever”. There is plenty of room in between them, which is where I sit. As I explained before, it does have real value (as a query language/protocol for system integration) but is nowhere near providing “federation” capabilities.

I am happy to see Glenn approve of CMDBf and I agree with him that accurate specialized MDRs are more useful than a single store that attempts to capture all the relevant data. As Glenn puts it, “pockets of the truth are far superior to unified ambiguity”. But I wasn’t very comfortable with the tone of his article, which seemed to almost encourage the proliferation of these MDRs. Maybe he was just trying to present a clean break with the “one big CMDB” approach and overreached. Or maybe I am just not reading properly.

Because while I agree that the answer is not “one and only one store” I also don’t want to loose the value of having as much unification of the IT model as possible. Both at the data level (i.e. same metamodel/model, consistent retention/roll-up policies…) and the access level (i.e. in the same physical store, with shared access control, accessible using a well-known DSL for data manipulation…). Metamodel transformation and model bridging are costly (in accuracy, maintenance, reliability). If your CMS does more than just support a “model navigation” GUI it may then need to run large queries that go across several portions of your IT model, including multiple different domains (e.g. a compliance rule kicked-off at the app level based on the type of data it manipulates that ends up having to look at the physical location of the servers running the hypervisors for the virtual machines that power the app). Through such global queries you can apply configuration rules, do impact analysis, event correlation, provide context to your transaction tracing, etc. No consolidation means no such queries (or a very limited subset). Considering the current state of federation, there is a lot more that you can do with your CMS if you have a very small number of MDRs rather than a sea of “federated” MDRs. This is why, as Oracle acquires IT management companies, we deliberately integrate their repositories with Enterprise Manager.

[UPDATED 2009/4/8: More, along the same line, from Glenn and his co-author Carlos Casanova available here. And my CMDBf partner-in-crime Van Wiles also responded to Glenn, bringing a BMC perspective.]

The mini-scandal of last week was the manifesto-gate. The mini-scandal of this week is shaping out to be the Ulitzer-gate (if you want to make sure not to miss next week’s IT scandal, subscribe to the Register feed, ferreting these out and adding a bass-heavy soundtrack is their specialty).

Turns out I am one of these Ulitzer “unaware authors” through two articles I wrote a while ago for the Web services Journal, a paper publication by Sys-con (based on a request from HP PR) and a blog post I allowed Sys-con to republish. Looks like Ulitzer and Sys-con are one and the same. Three articles, spaced two years apart. That’s enough to earn me a dedicated home page at Ulitzer and a rank of 1,000 among their more than 6,000 authors. Makes you wonder how much the 5,000 “authors” behind me have (unknowingly) produced… Whatever. At least it’s all content that I authorized Sys-con to use, not something that was lifted from my blog as apparently happened to others.

Turns out the oldest of these articles (“From Web Services Management to Utility Computing” , from 2004) is not that different from the recently-published (and amply maligned) Open Cloud Manifesto. I described my article at the time as “an attempt to explain how the different efforts going on in the industry around Web services, grid, SOA management, virtualization, utility computing, <insert your favorite buzzword>, fit together to provide organizations with the flexibility and efficiency they need from their IT in order to thrive.”

It ends with “while it would be easier to develop an end-to-end model specific to one company’s offering, standardization allows the integration of the management capabilities of all the components that compose enterprise services. We must keep the pressure on vendors to deliver modular and composable specifications (for format, function, and protocol) that expose management capabilities of infrastructure services, applications, and business processes in such a way that these capabilities can be composed by the next generation of management applications.”

Sure it has a lot more emphasis on WS-* specs than is compatible with the current zeitgeist, and it uses the now-obsolete term of “utility computing” rather than the nebulous alternative currently en vogue, but isn’t the main message there?

Just to be clear, I am not laying pretentious claims of prescience and vision (at least not in this entry). There are plenty of documents (e.g. from the Grid community) that make the same points in more eloquent terms and starting many years prior. It’s just fun to see this link from today’s scandal to the one from last week.

for old time sake, here is the content of the 2004 article:

From Web Services Management to Utility Computingby William Vambenepe

Enterprise services are created by combining infrastructure services, applications, and business processes. To be able to adapt quickly to business changes, enterprise IT must evolve from management of individual resources to management of interrelated services. This will be achieved through the development of composable and modular standards that expose the management capabilities of the building blocks of enterprise services. The Web services platform is an enabler of this transformation: a Web services-based management infrastructure provides a channel that is appropriate for dynamic resource provisioning, allocation, and configuration – often called utility computing.

We can consider this management infrastructure as a four-layered architecture. Starting at the foundation layer, the work on the base Web services infrastructure is far from over. First, until WSDL 2.0 is widely deployed, designers have to compose around the deficiencies of WSDL 1.1, such as the lack of portType inheritance. Second, there is still no standard for referencing Web services. Finally, key specifications such as WSRF (Web Services Resource Framework) and WSN (Web Services Notification), without which people were left to reinvent Web services interfaces to access stateful resources, have only recently reached the standards community. These issues are being resolved and a set of building blocks for accessing resources through an SOA (service-oriented architecture) is shaping up. It is critical that these building blocks be modular and composable to allow incremental adoption and separation of concerns.

Moving from the foundation to the management protocol layer, the OASIS WSDM (Web Services Distributed Management) technical committee, through its MUWS (Management Using Web Services) specification, is the key articulation point between the base Web services architecture and utility computing. Both the IT management community and the Grid community rely on MUWS. It defines how to express and exercise manageability capabilities through Web services, putting in place a management channel that is more interoperable and accessible than ever before.

Next is the modeling layer. Information models need to be composed so that a service can be represented based on the services that it is assembled from, be they peer or infrastructure services. Since these will be described by different models, the management channel (MUWS) needs to be model-agnostic in order to support a model-centric architecture. For example, CIM (Common Information Model) is a model that focuses on concrete resources. The DMTF WS-CIM subgroup must now open CIM to the Web services platform by developing a standard way to expose CIM-modeled resources through MUWS. Other models provide representations for service security, service-level agreements (SLA), etc. Only by composing these models will, for example, an auction service SLA be adequately managed as it depends on a combination of the performance of the servers on which the service runs, the application server that hosts it, the other services (authentication, billing, etc.) that it makes use of, and the business process engine that controls the bidding. Once this model-centric architecture is in place, management actions can be policy-driven through explicit constraints.

Finally, at the top layer, the architecture includes a set of common services for utility computing. They are being defined collaboratively by DMTF (Utility Computing working group) and GGF (OGSA working group).

All the pieces are falling into place but much remains to be done to allow comprehensive management of enterprise services in a model-centric way through Web services standards. While it would be easier to develop an end-to-end model specific to one company’s offering, standardization allows the integration of the management capabilities of all the components that compose enterprise services. We must keep the pressure on vendors to deliver modular and composable specifications (for format, function, and protocol) that expose management capabilities of infrastructure services, applications, and business processes in such a way that these capabilities can be composed by the next generation of management applications. These applications will use this to synchronize business and IT and to capitalize on change.

The recent announcement of the Sun Cloud, and more specifically its API is a good occasion to think about how much simplicity we really want in our datacenter automation mechanisms. The Sun API is very simple and its authors are proud of that fact. Indeed they should be proud of avoiding unneeded complexity. They have probably also kept out (at least so far), some needed complexity.

First, let’s focus on the important part:

It’s not REST that matters, it’s the rest

Most of the comments on the API focus on the fact that it’s RESTful. The authoritative source on this is Tim Bray’s description of the API, which he helped shape. But Tim is very down-to-earth about the reasons to use REST:

Why REST? · It’s a sensible question. The chief virtue of RESTful interfaces is massive scaling. But gimme a break, these are data-center management operations; a typical transaction frequency would be a single-digit number per week, with the single digit often being “0”, and it wouldn’t be surprising if a big multi-cluster staged-boot operation had a latency of minutes. The data-center controls are unlikely to be a bottleneck.

Why, then? Simply because we wanted a bits-on-the-wire interface. APIs, in the general case, suck; and are really hard to make portable. Bits-on-the-wire are ultimately flexible and interoperable. If you’re going to do bits-on-the-wire, Why not use HTTP? And if you’re going to use HTTP, use it right. That’s all.

The use of REST is not a fundamental characteristic of the API. In other words, if this API turns out to be useful I can rewrite it as a SOAP API and it would still be useful. Unless the SOAP API is made purposely complicated, it would only be marginally harder to use, not fundamentally less useful.

In fact, we may find out. If the rumor is confirmed and IBM decides to Tivolify (rather than kill) the Sun Cloud, the whole thing can be refactored as WS-RT/XML/XQuery (and maybe WS-ResourceCatalog) in five days, four of which would be spent capturing, sedating and restraining Tim Bray (and his “spec machete”) with the last one used for coding.

In the case of the Sun Cloud API, REST makes the API simpler in the same way that a keyless system makes a car easier to operate. You don’t have to fumble for they key, but you still need to know to parallel park, change a tire and operate the stereo.

By using REST, the Sun team has kept away some arbitrary complexity (e.g. fine-grained PUT; instead Sun decides what are the two valid sets of input parameters to create a cluster). But that’s only a small percentage of the potential complexity of the system. Not to mention that most developer will use libraries rather than on-the-wire protocols so they won’t see any difference. Instead, the real deal is:

The model

By “the model” I mean both the resource model and the capabilities of the resources. For capabilities, I don’t care whether a virtual machine can be started via an HTTP GET request on a URL that ends with ?control=start, or via a SOAP message with the wsa:Action header set to http://iloveclouds.com/vm/start or via an RPC call to a Start(…) method. I just care that the model includes the capability to start a VM. And the list of states a VM can be in.

Look at a datacenter today. Make an inventory of all the networking equipment, storage, servers, hypervisors, operating systems and infrastructure services that it contains. Consider all the configuration settings of all these resources (as they would be represented in a complete, authoritative and consistent CMDB, that most elusive creature). Add to it all the controls and APIs they expose. That’s a lot of data, even if you don’t consider the applications layer. That’s a few orders of magnitude larger than what the model in the Sun Cloud API can describe. That gap (between our CMDB model and the Sun Cloud model) is what we should look at and analyze. Why are they so far apart? How big is the ideal datacenter automation and virtualization model?

Among other things, these hundreds of configuration settings in your current datacenter are used to optimize deployments. No-one would miss the pain of dealing with the optimizations if they went away, but we would miss the performance benefits they bring. So what replaces them if the model is too simple to support any tweak? Is the infrastructure behind the API auto-optimized, based on actual application patterns? Now that would be real progress towards simplicity and may allow us to rely on an API as simple as the Sun API. But the industry has been trying to do this with little success for a long time. I expect incremental, not radical, progress on this. Alternatively, does Cloud Computing change the economics to the point where performance optimizations through configurations are no longer cost-efficient, where scaling out is the answer? Hard to make this a general statement, considering how difficult it remains for many applications to scale out. And this sounds very SUV-like in these footprint-aware times (we see how well the “stretch the hood and add two cylinders to the engine” approach worked for Detroit).

Sun might very well have this covered under the hood. But I don’t know that I want to assume that they have an auto-optimizing system just because they produced an API that would benefit from having it underneath.

Not to mention that not all configuration tweaks have to do with performance optimization. Some of them are driven by licensing, organizational, risk and compliance considerations. If auto-detecting an application performance profile is hard, try auto-detecting its regulatory requirements.

Complexity with a purpose

The right place to be, between the “omniscient CMDB model” and the “Sun Cloud model” is somewhere in the middle, with a couple of incrementally complex layers. Of course they are so far apart that saying “somewhere in the middle” is a cope-out. The current level of complexity is very hard to manage by humans (assisted by processes and tools, e.g. ITIL) and impossible to really automate. A lot of the complexity and variability is arbitrary rather than flexibility-inducing. We need to reduce this (all-out standardization is one way, stack integration is another). But the simplicity of the model in the Sun Cloud API is too extreme. Look at Amazon EC2. Everyone lauds the simplicity of the APIs and everyone, in the same breath, asks for more options (different instance types, availability zones, reserved instances…). Amazon (and Sun too, I assume) is taking the eminently rational approach of starting from simple and adding complexity (sorry, flexibility) as needed. That’s great. Just don’t get too enamored with the initial simplicity.

[UPDATED 2009/3/20: James Governor lauds the simplicity of Amazon’s cloud offering. If I understand him correctly, he sees simplicity as coming not just from “few options” but also from backward compatibility with current app infrastructure. That second part is what William Louth criticizes in his comment below. At the very least I like to keep the two separated: “how intrinsicly simple is it” and “how backward compatible is it” even though both can be seen as providing the benefit of simplicity.]

The DMTF CMDBf working group has recently published an updated draft of its specification. The final version should follow soon and I don’t expect major changes so now is not a bad time to start thinking about what this baby can do.

Since CMDBf stands for “configuration management database federation”, you might think the obvious answer to the “what can it do” question is “build a federation of configuration management databases”. Except it’s not. Despite its name, CMDBf provides little support for federation unless you take a very loose definition of the term. The specification gives you a query language and a very simple registration interface, with a sprinkle of metadata to improve interoperability. The query language lets you talk to a CMDB to retrieve information on configuration items (CIs) that it knows about. The registration interface lets you keep a CMDB informed of changes to CIs that it may care about. If you want to build on top of this a real federation, one that scales to the type of environment that CMDBs are used for today, you have to go further than what the specification provides. What CMDBf does give you is some amount of integration between CMDBs (at the protocol level at least, not at the model level). It may not sound like much but it is a lot of progress on the current situation and the right incremental step, whether you are aiming for true federation as the end goal or not.

That’s the “a lot less than you think” part. So, what’s the “a lot more than you think” part? Good stuff all around:

CMDBf provides a metamodel that is well-suited for complex IT systems and it provides an elegant graph-oriented query language on top of it. The most convenient representation for an IT system is neither “one big XML document” nor “a sea of nodes and edges”. CMDBf gives you a middle ground: a graph model with XML leaf nodes. So you can precisely model the relationships between your IT elements using explicit relationships (with their own records), but you can also attach a well-understood piece of XML to an item as a record without having to break that XML into a bunch of tiny relationships.

I am pretty sure there are other domains, beyond IT systems, for which this would be useful. It will be interesting to see if the CMDBf specification gets considered outside of its intended scope. But these domains are more likely to end up using RDF/OWL/SPARQL instead. Not everyone has made the leap from XML as a tool to XML as a religion, which made CMDBf necessary for us. But let’s not veer into another rant.

Let’s go back instead to describing how useful CDMBf can be to IT systems management, independently of any “federation” objective. Let me put it this way: if one was to create from scratch a configuration store for IT systems they should strongly consider the CMDBf conceptual model as the base metamodel. And something along the lines of the CMDBf Query (though not necessarily through its XML serialization) as the native query language for it. Most CMDBf implementers of course are not in this situation. Rather than writing the store from scratch they will create a CMDBf wrapper/interface on their current CMDB. And that’s fine too. CMDBf will work well as an interoperability protocol. Putting aside my gripes about XPath overuse, CMDBf strikes a reasonable balance that makes it implementable on top of any back-end technology (relational, XML, RDF, in-memory objects, bags of name-value pairs…). And the query patterns it supports map well to CMDB-to-CMDB integration use cases. But it is underselling it, in my view, to restrict it to this over-the-wire interoperability scenario. CMDBf also provides a very useful foundation for local access to the CMDB. CMDBf graph queries can support powerful visualization of the content of the CMDB. They can support the definition of configuration rules. They can support in-depth inspection of relationships (e.g. fault tree).

And that may jsut be the beginning. It could take three directions after v1:

The first one, as always for a standard, is that it is ignored and becomes irrelevant. I have to reluctantly list this one first, because it is statistically the most likely for a new standard. Especially one that is not a ratification of an existing de facto standard. And one that threatens an important control point for vendors. A slight variation on this scenario is for CMDBf to succeed from a marketing perspective, as a checkmark that most vendors tick, but not as a true technology. This is the “smokescreen” scenario from Mr. Skeptic. One scenario that worries me is that CMDBf could fail because of the poor models of the CMDBs that implement it. If your IT model is not granular enough or if it matches the UI of your application more than the semantics of the IT components, then CMDBf will expose these shortcomings and probably be blamed for them (with bad models, “shoot the messenger” becomes “shoot the protocol”).

The second possible direction is that CMDBf provides enough value in integrating CMDBs that people want more and challenge the group to deliver on the “f” part, federation. That could take the form of a combination of:

better integration with other protocols (mostly from the WS-Management family, like WS-Enumeration and WS-Eventing),

some optimizations in the query mechanism for distributed queries (e.g. data partition rules).

The third possible direction (not exclusive) is for CMDBf to become the basis for a standard rule language for IT models. Yeah, another one (remember SML?). SPIN and SML show us how a generic query language can be used to support configuration rules. I very much like SPIN but it requires adopting RDF as a metamodel, which is a hard sell in XML-land. SML suffers technically from being too reliant on an inappropriate validation tool (XSD) and treating relationships as a second thought rather than an integral part of the model. Which is fine in many areas (EMF does it too), but not, in my view, when modeling IT systems.

If we are not going to use RDF/SPIN then let’s copy them. We can use the CMDBf metamodel (graph-based) where SPIN uses RDF. We can use the CMDBf query language (graph-oriented) where SPIN uses SPARQL. Since CMDBf queries use XPath, we see some commonalities with SML (which uses XPath through Schematron). But in CMDBf XPath is scoped to the leaf nodes of the graph, not the entire model as it is in SML. In other words, SML adds relationship traversal to XPath, while CMDBf adds XPath to its relationship-aware queries. It’s a matter of who’s on top. It sounds academic but it isn’t.

Does the industry really want standardized, re-usable configuration rules? SML/CML seem to say no. The push towards Cloud interop, on the other hand, begs for it. At least if you believe in programming your environment in a way that is partialy declarative rather than entirely procedural.

[UPDATED 2009/3/5: Rob England (a.k.a. Mr. Skeptic as I refer to him above) provides a geek-to-English translation for this post. Neat!]

I don’t get it. I just read Reuven Cohen’s description of the Unified Cloud Interface project that he recently started. It’s nothing less than using RDF to create “a Semantic Cloud Infrastructure capable of adapting to a variety of methodologies / architectures and completely agnostic to any specific API or platform being described.”

What made me fall off my chair is the methodology/architecture part of this statement. It’s hard enough (but doable) to use RDF to map philosophically similar APIs. It’s a non-starter to use it to bridge architectural and methodological differences. I have spent a fair amount of time looking at Semantic Web technologies in the context of modeling IT systems (see the “semantic tech” category of this blog). While I think they would be a great foundation I don’t see them ever coming anywhere near what Reuven describes.

But to be fair, I am not sure what he really is describing. There are a few overly ambitious proclamations like the one above and this paragraph:

The key drivers of a unified cloud interface (UCI) is “One abstraction to Rule them All” – an API for other API’s. A singular abstraction that can encompass the entire infrastructure stack as well as emerging cloud centric technologies through a unified interface. What a semantic model enables for UCI is a capability to bridge both cloud based API’s such as Amazon Web Services with existing protocols and standards, regardless of the level of adoption of the underlying API’s or technology. The goal is simple, develop your application once, deploy anywhere at anytime for any reason.

But in his piece you’ll also find CIM being cited as an example. There are good things to be said about CIM, but it certainly is not “a dynamic computing model that can, under certain conditions, be ‘trained’ to appropriately ‘learn’ the meaning of related cloud & infrastructure resources” (or, in the case of CIM, computer system resources). Good luck “training” CIM to “learn” anything. It’s CIM that’s going to train you to do it its way, period.

The CIM example (and other standards he lists) paints the picture of defining a standard API for Cloud Computing and forcing all providers to use it. That’s the conventional approach to universality. If that’s what UCI is after then it is technically achievable. And RDF might be a very good technical foundation for it. Whether anyone can pull this off politically and commercially at this stage is a different question of course. In any case, such an effort would have nothing to do with magically wrapping whatever API each provider has defined and whatever architecture/methodology they chose.

And further down we see a sketch of another, much more modest, vision, when Reuven talks about how “these web resources could just as easily be ‘cloud resources’ or API’s” which seems to represent a whole API as an RDF resource. Sure, then you can use RDF/OWL to capture versioning information between them, backward compatibility etc. Probably very useful, but that’s a very different scope.

So which is it? Reuven is a thought leader in Cloud Computing, so I want to think I am missing his point.

So far, I haven’t seen any Cloud taxonomy that is reasonably complete and has received broad support. Shouldn’t we first try to come up with a human-readable taxonomy before we try to turn it into a machine-readable ontology? In my previous post I explicitly stayed away from being pedantic about the difference between the terms, but the confusion between a taxonomy and an ontology seems to be part of what’s going on here.

The sad thing is that they (you know, them) will point to this as a proof that Semantic Web technologies don’t work.

Or maybe I’ve just set myself up for a generous portion of humble pie on April 2nd (when Reuven says an “initial functional draft UCI implementation, taxonomy and ontology” will be unveiled). I’d love to be surprised. And my ego has taken worse hits before.

[UPDATED 2009/2/10: You should read Steve Oberlin’s take on this overall taxonomy/ontology discussion. He knows the topic, carefully reads the posts that he comments on, packs a healthy dose of skepticism and takes the time to explain what taxonomies and ontologies are, which was overdue. Plus, I just love sites that don’t feel the need to use decorative pictures. His doesn’t have a single image file which means that even if he didn’t have superb credentials (which he does) he’d get my respect by default. A blog to watch.]

Is it a matter of picking one of these? And if we do, how are we better of?

Let’s look at some related domains that are a bit more mature to see how they handled this and what benefits they got out of it. The three most relevant comparison I can come up with are ITSM, Grid Computing and SOA.

ITSM has ITIL, a key part of which is its glossary which provides value in and of itself, even if you don’t explicitly use the rest of the framework. Is this what the Cloud taxonomy should aim for? I contend that the Cloud field is not nearly mature enough to aim for this.

Grid Computing has OGSA, which started as a white paper containing a service model (section 6.1). It was later expanded (within GGF, a standards organization specific to Grid Computing) into this document and was used as a basis for many standard technical specification. Is the goal of the Cloud Ontology to follow a similar path? OGSA was more than a glossary though, it included a vision, goals and architectural recommendations. Sure you can start with the glossary, but I am willing to bet that a stand-alone glossary would change significantly if/when a Cloud Computing architecture is defined based on it. And there isn’t today, for Cloud Computing, the singularity of vision that the Globus gang brought to Grid Computing and that is necessary for an architecture.

SOA was never as structured. There was an early attempt, called the Web Services Architecture at W3C. Like OGSA, this is a high level architecture, not just a taxonomy. In any case, the WSA work didn’t really end up having much of an influence in SOA (or even Web services) practice. Then came the SOA Reference Model, an OASIS standard. Of the three (ITIL, OGSA, SOA-RM) this is what I think is closest to what a Cloud taxonomy should aim for at this point. An ITIL-like glossary would be too brittle (if it has any depth) and it’s too early for an OGSA-like architecture.

Again, I was not at the meeting and only know what was blogged about. This is not a criticism of the proposal, just some thoughts about how it might compare to similar efforts.

Side note #1: I prevented myself from going off about the difference between a controlled vocabulary, a glossary, a taxonomy and an ontology. These differences are not germane to the present discussion. And I am not wearing my pedantic hat today (for once). If some want to call a wedding cake an ontology, who am I to stop them?

Side note #2: Speaking of trying to be less pedantic in 2009, this entry represents my capitulation: I finally created a “Cloud Computing” category in this blog, rather than my prefered designation, “Utility Computing”, which I have been using so far. A small step for me, a microscopic step for humanity.

[UPDATED 2009/1/28: We’re getting there (fast!). Via John, this updated taxonomy graph from Chris Hoff is going in the right direction, I think. Next step: more text describing the relationships between the boxes to have more of a model.]

[UPDATED 2009/1/30: If you think that any single-vendor taxonomy will be suspicious, that a multi-vendor taxonomy won’t happen and that customers are too busy to do it, then you may be asking “aren’t analysts supposed to do this?”. Help may be on the way.]

Back when I was at HP and we got involved with what turned into SML (now a W3C candidate recommendation), we tried to make a case for the specification to be based on RDF/OWL rather than XML/XSD/Schematron. It was a strange situation from a technical perspective because RDF is a better foundation for an IT model than XML, but on the other hand XSD/Schematron is a better choice for validation than OWL. OWL is focused on inference, not validation (because of both fundamental design choices, e.g. the open world assumption, and language expressiveness limitations).

So our options were to either use the right way to represent the system (RDF) combined with the wrong way to capture constraints (OWL) or to use the wrong way to represent the system (XML) combined with the right way to constrain it (mostly Schematron, with some limited help from XSD). At the end, of course, this subtle technical debate was crushed under the steamroller of vendor politics and RDF never got a fair chance anyway.

The point of this little background story is to describe the context in which I read this announcement from Holger Knublauch of TopQuadrant: the new version of their TopBraid Composer tool introduces SPIN, a way to complement OWL with a SPARQL-based constraint checking and inference mechanism.

This relates to SML in two ways.

First, there are similarities in the approach: Schematron leverages the XPath language, used to query XML, to create validation rules. SML then marries Schematron with XSD, for a more powerful validation mechanism. Compare this to SPIN: SPIN leverages the SPARQL query language, used to query RDF, to create validation/inference rules. SPIN also marries this with OWL, for a more powerful validation/inference mechanism.

But beyond the mirroring structures of SPIN and SML, the most interesting thing is that it looks like SPIN could nicely solve the conundrum, described above, of RDF being the right foundation for modeling IT systems but OWL being the wrong constraint mechanism. SPIN may do a better job than SML at what SML is aiming to do (validation rules). And at the same time, you get “for free” (or as close to “for free” as you can get with software, which is still far from “free”) a pretty powerful inference mechanism. The most powerful I know of, short of using a general programming language to capture your inference rules (and good luck with maintaining these rules).

This may sound like sci-fi, but it’s the next logical step for IT configuration standardization. Let’s look at where we are today:

SML (at W3C) is an attempt to standardize the expression of constraints.

CMDBf (at DMTF) is standardizing how the model content is queried (and, to some limited extent at this point, federated).

But once you tackle reconciliation, you are already half-way into inferencing territory. At least if you want to reconcile between models, not just between instances expressed in the same model. Because the models may not be defined at the same level of granularity, and before you can reconcile items you need to infer finer-grained entities in your coarser-grained model (or vice-versa) so that you can reconcile apples with apples.

Today, inferencing for IT models is done as part of the “discovery packs” that you can buy along with your IT management model repository. But not very well, in general. Because the way you write such a discovery module for the HP Universal CMDB is very different from how you write it for the BMC CMDB, IBM’s CCMDB or as a plug-in for Oracle Enterprise Manager or Microsoft System Center. Not to mention the smaller, more specialized, players. As a result, there is little incentive for 3rd party domain experts to put work into capturing inference rules since the work cannot be widely leveraged.

I am going a bit off-topic here, but one interesting thing about standardization of inferencing for IT management, if it happens, is that it is going to be very hard to not use RDF, OWL and some flavor of SPARQL (SPIN or equivalent) there. And once you do that, the XML-based constraint mechanisms (SML or others) are going to be in for a rough ride. After resisting the RDF stack for constraints, queries and basic reconciliation (because the added value was supposedly not “worth the cost” for each of these separately), the XML dam might get a crack for inferencing. And once RDF starts to trickle through that crack, the whole dam is going to come down in a big wave. Just to be clear, this is a prophetic long-term vision, not a prediction for 2009 (unfortunately).

In the meantime, I’d like to take this SPIN feature a… spin (sorry) when I find some time. We’ll see if I can install the new beta of TopBraid composer despite having used up, a year ago, my evaluation license of the earlier version of the product. Despite what I had hopped at some point, this is not directly applicable to my current work, so I am not sure I want to buy a license. But who knows, SPIN may turn out to be the change that eventually puts RDF back on my “day job” list (one can dream)…

It’s also nice that Holger took the pain to deliver SPIN not just as a feature of his product but also as a stand-alone specification, which should make it pretty easy for anyone who has a SPARQL engine handy to support it. Hopefully the next step will be for him to clarify the IP terms for the specification and to decide whether or not he wants to eventually submit it for standardization. Maybe to the W3C SML working group? :-) I’d have a hard time resisting joining if he did.

Whether you call it a CMDB or some other name, any repository of IT model elements has the problem of establishing whether two entities are the same or not. Here is a quick map of the problem space.

Why there is no “one true solution”

There is no “true” answer to the “sameness” question. The following example illustrates this, even though it is not necessarily representative of datacenter scenarios. Ask any gamer to tell you the history of that 3-year-old PC under their desk. The power cord might be the only original piece left after they’ve upgraded the RAM, video card, sound card, hard disk(s), DVD drive and power supply. Not to mention the tragic overclocking accident that took the life of the motherboard/CPU. After the upgrade/replacement of each of these parts, the user still thought of the machine as the same PC, just upgraded/fixed. But how can it be the same as at the beginning if pretty much every single part has changed? And when time came to reinstall Windows, the registration probably failed because Microsoft decided that the same license was being used for a new machine. Sameness is in the eye of the beholder.

And it’s not just a hardware problem. When you upgrade your Oracle Database and start using a new ORACLE_HOME, it may feel like the same database to most users (including the applications that talk to the database) but a more executable-centric view might conclude that it is a new database.

Defining what makes an IT element unique is not a matter of truth. It’s a matter of usefulness for a given purpose. When trying to establish this for your model, if the conversation ever veers philosophical, you’re off track. This is engineering, not science. “A and B are the same” should be understood to be a shortcut for “it makes sense for my purpose to consider A and B to be the same”. Of course things become complicated when “my purpose” encompasses a whole set of use cases (add “management” after each of: performance, compliance, configuration, change, asset, business service, business transaction…).

How the problem arises

It can arise over time. For example the management agent has to be reinstalled and it forgets the id that had been assigned by the server. When it comes back up, it reports what looks like a new item. But you want to reconcile it with the historical data that came from the agent’s previous incarnation.

It can arise because you have different discovery channels for the same item. For example, a BPEL engine reports to the management server the processes it supports and that model includes the external Web services (partnerLink) invoked by the processes, thereby creating items for these external services in your repository. But some of those external services may be running on servers which you also monitor and the services (and more generally the applications that deliver them) may be separately discovered by the agents on these hosts, resulting in a potential duplicate representation in the repository.

Or the problem can be a result of the integration of IT management products. For example, that Dell server in my asset management system may be the same as the Linux host that runs my production database and appears in Oracle Enterprise Manager.

Fixing the mess

There are two stages to this:

First, you need some level of model alignment. In the general case, the different items that you are trying to reconcile are not expressed in the same model. The view of the server coming from the asset management system does not necessarily contain the same data as its view in your operations console. One contains the lease expiration date, the other one contains the amount of space left on the disk. Some data may be in both (e.g. number of CPUs, host name) but not necessarily in fields of the same name. Or with the same granularity (ownerName versus ownerFirstname + ownerLastname). Not to mention type system differences (but if the items are already in the same repository you have presumably already forced some level of metamodel alignment). In short, you first have the challenge of model transformation, a more general problem. With the advantage that the entire model does not need to be translated for item reconciliation, only the subset of data needed to establish “sameness”: the identifying properties. And in some cases (e.g. when a standard model is used or when two instances of the same agent report on the item), the items to reconcile are already described in the same model and this step can be skipped.

Once the necessary level of model alignment has taken place (if needed) so that items can be compared, the real task of reconciliation takes place, based on domain knowledge. It could be through a set of scripts (Python’s mix of simplicity, portability, broad array of libraries and ease of integration make it shine in this usage). It could be through some kind of reconciliation taxonomy, like this draft that IBM has contributed to the Eclipse COSMOS project. Or through metadata such as WSDM’s correlatable properties. [BTW, as the spec editor I got to insert dubious cultural references in the specification (see the <print:PrinterResourcePropDoc> example in section 5.3.3.1), but let me assure you that I have since matured… ;-)]

These are not the only ways to reconcile items, but they are the approaches that can be followed based on just the data in the repository. Beyond that, you can run a dummy transaction and trace it (if possible) across different management systems to reconcile entities between them. There are plenty of other domain-specific tricks, depending on the item type (I remember a machine room, back in the days when each server had a CD drive, where a script to open the CD tray was used to allow the operator to put a sticker on the correct machine). In general, these approaches play on external variables that are not directly part of the model of the item and yet can be influenced through it. Similar to how the bulb temperature is used in this famous brain teaser. I guess the IT equivalent would be to load-stress an application and use IPMI to see which CPUs register a rise in temperature (note: not a recommended approach in production systems…).

Coming back to the IT model repository, you also need to have plumbing in place to deal with the result of the reconciliation: requests and data may come in that reference either one of the reconciled items and you need to be able to deal with that split personality, while providing a unified view in the general case. You also need to be ready to deal with potential data discrepancy between the items (either automatically or through of process that involves humans, but this is out of scope for this entry).

Preventing the mess from happening

Can’t we just prevent the problem from occurring in the first place? To some extent yes. The main way to prevent it is to not reconcile what doesn’t need to be. This may sound heretical in these days of “single source of truth” and “end to end visibility” but reconciliation of key connection points is often enough. You may not need to have one single model that contains everything from your company’s employee directory to the fan speed of all your servers. It’s a matter of delivering on use cases, not hoarding data.

When you do want to consolidate and reconcile, one approach is to standardize on natural IDs for items of different types. But this requires domain experts to carefully select identifying (and therefore immutable) properties of the different object types, which sounds a lot easier than it is. And it requires convincing others to adopt this approach, an even harder task. But as the proverb (almost) goes, one ounce of convention is worth one pound of reconciliation.

[Note: Whenever you talk about item reconciliation, the topic of correlating events is not far behind. It is assisted by a solid underlying IT model, but it has challenges of its own, so I’ll consider this out of scope for this discussion.]

If you’ve seen a lotofnewsarticles about HP’s IT management software this week (e.g. through Cote or Doug) it’s because the company held its Software Universe conference in Vienna this week and timed a bunch of announcements and PR events to match.

Most of the articles linked above just paraphrase the press releases and talking points. So if you’re going to get the company line, might as well get it straight from the horse’s mouth. Which we can now do through a new HP blog about BSM. The first article was penned by Mike Shaw and that’s enough for me to want to subscribe (I worked with Mike a few times when I was at HP and he is very sharp). I think Mike also wrote the other entries but since they are not signed (and the account name, “adsey007”, is pretty opaque) I am not sure. In any case, they are pretty good. This one gives an overview of the Vienna announcements. The next one describes in more details the OMi product. I am not in position to know how well it works but, according to the article, OMi takes the important step of modeling and managing events in the context of the overall model in the CMDB. Such that the event management features (e.g. correlation) can use the already-discovered relationships between the IT elements involved in the events (e.g. dependencies). The article also implies that the CMDB has been integrated with NNM (OpenView), Service Manager (Peregrine) and Server Automation (Opsware). Which is a lot of progress in 16 months since I left HP, so I am taking it with a grain of salt (we all know there are different levels of integration). The press release says that the CMDB is now integrated with 17 HP BTO applications, so you may need a whole salt shaker. In any case it’s great to see that Ramin and team are forging ahead, delivering products and driving the integration of the BTO portfolio.

The last paragraph (“OMi actually sits on top of existing HP Operations Manager installations…”) is intriguing and may provide a clue about the depth of the integration. In any case, OMi is something to keep an eye on as it is positioned to leverage a lot of the key strengths of the HP BTO portfolio.

BTW, this OMi product has nothing to do with this OMI which was a precursor to WSMF, WSDM and WS-Management. And which most people currently working in HP Software have never heard of.

Microsoft’s PDC is taking place this week and more details were shared with the attendees about project Oslo, an effort announced last year to drastically improve the use of models across the application lifecycle. Some code is available (I think the Quadrant code is only for PDC attendees but the Oslo SDK is available to everyone). I am not at PDC, I didn’t see any presentation and I didn’t download any code. But Microsoft has also posted technical details on MSDN and, as far as I am concerned, that’s the most time-effective way to spend a couple of hours learning about Oslo. BTW, the way they share these early design descriptions and accept to make their evolution public is admirable.

For those who only want to spend 10 minutes rather than 2 hours, here are the thoughts that came to my mind as I was reading.

Overall I am somewhat underwhelmed, but not necessarily in a bad way. I know that’s a little schizophrenic so let me explain. After hearing a lot about how Oslo was the next big thing in modeling, it is a little surprising to read a document that can be summarized as “modeling is good, so go create some SQL tables and store them in a RDBMS”. That’s the underwhelming part. But on the other hand, it is more down to earth and practically-minded than I feared. And this is just a summary, in truth there is more than just “use SQL”.

Half of the MSDN documentation basically explains how to use SQL Server to store application models (as of today, the “Developing Models for the Metadata Store” section has only one sub-section, “SQL Server Guidelines for Modeling in the Oslo Repository“). Does this mean that all .NET applications will eventually have to carry with them a deployment of SQL Server 2008 even if they don’t use it to store the their operational data? Sure there are a few extra repository services (e.g. finer-grained change auditing) but most Oslo repository services are generic SQL Server features. That section has quite a lot of T-SQL, but it’s pretty readable. It also has a lot of dependencies on following naming conventions which makes me think that directly creating T-SQL code is not the best approach.

Fortunately there is an alternative, the “M” language. It’s a schema language with a built-in constraint mechanism. I found it more data-oriented (as opposed to resource-oriented) than I expected. Even though “each model is really a set of data structures, relationships, and constraints in serialized form“, there is a lot more support for data structures and constraints than for relationships. It’s just a foreign key. Relationships aren’t items and don’t have any property (or “field” as they’re called in “M”). For example, the relationship between a student’s enrollment record and a given class can’t have, as property, the grade that the student got for that class (as in the example in section 4.1.4 of the second LC of SML). To model this in “M” you need to create another item (e.g. “courseEnrollment”) and have a relationship from the student to that item and another one from that item to the “course” item itself. Or to replace the foreign key in the student table with a complex structure that contains both the foreign key and the properties of the relationship. At the end it has the same expressiveness potential, but in a less streamlined form. I assume Microsoft took this approach for performance reasons.

I am going on a limb here, but it may also be a difference between development-time concerns and operation-time concerns. During development (all the way to testing and packaging), you can still mostly get away with a relatively simple containment structure. You care about the components of your application and how they are packaged inside or next to one another. Sure you care about who calls who outside of the deployment unit but that’s not as core a concern as getting your class dependencies right, your tests in order and your installer configured. In fact, some of the “who calls who” bindings will be only be realized at runtime. Oslo, at least so far, clearly seems more focused on development time than operations so support for a relationship-rich model may not seem critical. At operations time, on the other hand, you don’t really care so much about how things were packaged before installation. You care a lot more about who invokes who (especially for modern distributed applications), what the network layout is, what resources a ticket is attached to, etc. The model looks a lot more like a graph with complex relationships. Something that “M” doesn’t seem ideally suited for.

Except for this caveat, I like “M”. It’s not anti-XML (you can represent values as XML if you’d like) but it avoids the “the answer is XML/XSD what is the question” approach to modeling that is sometimes a little too prevalent. “M” is a much better schema language for IT systems than XSD. I especially like its approach to types. A value is not intrinsically of a given type. A type is a condition that you happen to meet or not at the current time (“take heart little field, you can be anything you want when you grow up”). As such, you can be of several types at the same time. Refined types are potatoes inside potatoes (not sure if “M” supports definition of types as unions and/or intersection of existing types, for intersection I want to write something like”type NewType : OldType1 where this in OldType2” but there is no “this” in “M”). That approach to types (and the way constraints leverage types) is reminiscent of RDF/OWL. It’s a classification more than a typification, but I understand why they didn’t want to call it “class”. The similarities with RDF/OWL don’t go any further. As I wrote earlier, “M”is very data-focused and not resource-focused: as far as I can tell “M” types are defined syntactically, not semantically (the semantics come as a consequence). For example, I don’t think that you can assert that a given item representing a person is of type “friendly” if there is no corresponding data in the item. You’d have to first create a boolean field called “friendly” and define that those that have that field set to “true” are of type “friendly”. Unlike in RDF/OWL where you can just assert that a subject is “friendly”.

Here is another reason why you can’t have “semantics-only” types: “if you do not specify the type of a field or value, M infers a type for it“. Two things don’t sound quite right to me here. First a detail: the sentence (like others in the doc) talks about “the” type of a field of value, while there can be more than one. More importantly, what’s the point of this feature? How does it help me to have my IRC nickname classified as a post code or as a password just because it happens to be made of a compatible combination of letters and numbers? Maybe it makes sense as a storage optimization, but why does it make sense to expose this to the user?

I also like the way “extents” work. The current description of that feature is pretty limited, but based on how it is used in other parts I think one of its usages is to support a non-OO equivalent to inheritance: create two extents, one for the “superclass” and one for the “subclass” where each only contains the properties/fields defined at that level. You should get both of them in order to have the full picture (all the fields). This is, if I understand it correctly, similar to something I have been (unsuccessfully so far because “XML doesn’t do it this way”) trying to sell to the DMTF CMDBf working group: model inheritance through a set of non-overlapping records rather than dealing with a type hierarchy on record types. It’s not just that it makes relational storage easier (even though it does and that’s probably why “M” does it this way), it also makes your query/select operations a lot easier to specify and implement.

All in all (and without having gone through the exercise of defining actual models in “M”), it seems like a fine schema language (except that its dependency on the CLR base types is unpractical for users outside of the Microsoft universe) but I am not sure if it is beefy enough to be a good IT management metamodel. When the document says that “the Oslo repository provides open and flexible access to the data it contains, which enables direct access to SQL Server views of the underlying data. There are no complex data access layers or APIs” it sounds better than saying “it’s just SQL, so map your model to it and if you want relationships or type inheritance just build it on top of it and quit whining”. But it is an admission of limitation at the same time as a claim of simplicity. I also smell an assumption that LINQ will provide enough hand-holding that non-SQL-savvy developers will be ok. We’ll see.

And then there is MGrammar. Things get a little confusing at that point if you try to relate MGrammar to “M”. Actually, the FAQ states that “the M language consists of three parts: MGraph, MSchema and MGrammar“. This came a bit as a surprise to me since at that point I had finished reading (not in details but not too quickly either) the “M” documentation and I hadn’t seen these names mentioned once. Looks like there is some documentation consistency issues here, but that’s hardly surprising considering this is a “hyper-early (pre-alpha)” release as Doug Purdy puts it.

I think that everything that I have referred to as “M” above is MSchema.

MGrammar is something different altogether: it’s the source of the Domain Specific Language (DSL) references we’ve been hearing in relationship with Oslo. Technically, MGrammar is a BNF on steroids plus an automatically generated parser for your syntax. Cute. I assume that “M” (i.e. MSchema) is built as MGrammar-defined DSL but I am not sure why I would care. I am all for reuse and if someone at Microsoft thought that there was something reusable in the way they defined MSchema then it’s a good thing to expose this tool. But where does it come into play in application modeling? The last thing I want is people inventing completely independent languages to describe different domains. I am all for specialization, but a common underlying metamodel is pretty nice when you have to make sense of a whole system. I don’t see any such commonality in MGrammar: as far as I can tell it can be used to define anything from PostScript to sonnets.

From the FAQ, the connection point between MGrammar and MSchema is MGraph (MGrammar languages are parsed into an MGraph, MSchema “builds on MGraph”). That’s nice, but since neither the MSchema nor the MGrammar documentation mention MGraph I don’t really know what to make of this. David Chappell’s white paper also mentions MSchema and MGrammar but not MGraph. The introduction to the MGrammar Language Specification states that “the data that results from Mg [a.k.a. MGrammar] processing is compatible with Mg’s sister language, The Oslo Modeling Language, M, which provides a SQL-compatible schema and query language that can be used to further process the underlying information“. Compatible? I need more information here. In any case, MGrammar sounds like a fun project for a techie. Who am I to deny Microsoft engineers their fun. Jokes aside, I am probably missing something here seeing how prevalent the DSL message is in all discussions of Oslo. Look at the “highlights of this book” section for the upcoming Oslo/M book from the creators of the “M” language: half of it is about the DSL support and there must be a reason beyond pure geekery. As a side note, if you buy this book you need to understand what little shelf life it will have (I can give you a good price on a lightly-used Hailstorm/”.Net my services” specification book).

Aside from the “M” language itself, there are a few models described in the documentation. One corresponds to BPMN (actually, it says that it “closely aligns with” BPPMN 1.1, does this imply that they are not quite the same?). The fact that this model supports imports from Visio is a nice feature.

The Application model (one of the places where you can see “extents” in action) scares me a little bit because I doubt that two different people would use the same “extents” to describe the same software elements. Unless of course that’s being done for them by a pre-defined mapping to their development framework (.NET) enacted by their common development tool (Visual Studio). Which may be the assumption. Yet, the Application model is defined in generic terms, not Microsoft-specific (with a couple of slip-ups, like a WebApplicationModule being defined as a “Web application (module) implemented by IIS or WAS“. Maybe I’ll feel better about the generic applicability of this Application model when I see a full-fledged description (e.g. including relationship semantics as captured in foreign key field names) and an example.

At the bottom of that Application model, there is a lonely “Manageable” type to use if you have a LifecycleState field. This reinforces my impression that despite the claims to link development time with operational time, a lot of the focus to date has been on the former rather than the latter.

The ServiceModel model will look familiar to people familiar with SCA and is presumably complementary to the WorkflowModel and WorkflowServiceModel models, both of which are directly mapped to Windows Workflow Foundation. I guess that’s where Oslo and Dublin touch one another. I am still glad they are now clearly separated.

There is also a “Quadrant” model which concerns me a bit (it seems to be used to store customization of the Quadrant UI which, while convenient to store straight in the repository, doesn’t strike me as necessarily belonging there).

At this point, the question is not whether Microsoft can build Oslo as it is currently defined. SQL Server 2008 already exists, the usage guidelines aren’t unrealistic and even the “M-to-T-SQL” translation doesn’t seem too hard for Microsoft to implement (the SDK presumable already contains an implementation). I have no doubt they can deliver the system they describe. What I don’t know is whether and how it will be actually useful.

Describing “M” in details is good. Describing how the repository is implemented on top of SQL Server 2008 is interesting but not so relevant. What I’d like to see is a description of how all this gets used. How does it change the Visual Studio experience? How does it change the installation process/format? How does it support round-tripping between lifecycle stages (e.g. if the developer changes the workflow model, does that original BPMN model get consequently updated)? How does it relate to SLAs and policies? How does it apply to application monitoring? How does it apply to configuration management, to the change process? Etc. In short, what’s the Oslo ecosystem going to be.

These questions aren’t completely ignored in the MSDN documentation, but they are dispensed with in a couple of pages: “Application Development and Lifecycle Improvements” and “IT Operations Benefits“. The former states, for example, that “having the Oslo repository act as a central location for these models also enables a connection between the design and implementation models. This connection helps prevent these models from becoming disconnected during the development process“. Which all sounds good but is just a set of assertions that we have heard many times before (not just from Microsoft). How do “M” and the Oslo repository really make this true?

On the “IT Operations Benefits” side, things are equally blurry: “the Oslo repository can store all types of machine and application configuration data. When consistently updated, this configuration data is a catalog of the current state of all monitored machines and applications in the environment“. Notice the “when consistently updated” hand wave. That’s kind of the crux if you really want to manage across the lifecycle. How will they achieve this consistency? By centralizing all changes through a model-driven controller a la SDM/SML? Through ongoing discovery and/or change notifications? By relying on good old ITIL/MOF processes?

The FAQ declares that “having a common approach does not necessarily correlate to one physical store, but more of a federated model and we believe that some of the new Repository, along with existing investments in both Configuration Management Database (CMDB) and Team Foundation Server (TFS), will form the foundation for a common Microsoft metadata strategy and should be supported across our set of products“. OK, but who is the source of truth for application configuration data? The Oslo repository or the CMDB? Is one the desired state and the other the observed state? Does the CMDB go back to simply being a Service Desk (and if so, does the Oslo repository take on the responsibility to enforce change processes, something that requires more than the security model in Oslo)? If the CMDB is still going to use SML as its metamodel, how do you efficiently federate across such different metamodels as SML (i.e. XSD + schematron + relationships) and “M”?

Lots of questions remaining. What will Oslo have turned into in a few years? A business process design/implementation/monitoring suite (there is a strong workflow feel to many parts)? A generic drag-and-drop programming environment (“the fact that entire features are already described by models means that for a wide array of application and component categories you can start using visual tools to design and implement your components“)? A control center for end to end application management? All of the above? Nothing?

This was just a quick brain dump after reading the documents. Actually, I just realized it somehow got pretty long (congrats if you’re still reading). I hope this post is not too disorganized. Oslo is an interesting effort, but, as Microsoft is first to admit, it’s at a very early stage. I am just surprised that this first release spends so much time on the “how” rather than the “what”. Maybe it’s just because I only got my information from the MSDN documentation. We’ll see when more content from PDC finds its way online. I just want the slides, watching recorded presentations is rarely time-efficient (and you can expect them to require Silverlight).

Speaking of Silverlight, there is this new site on Oslo if you think watching some videos is worth installing Silverlight. Those screenshots don’t motivate me sufficiently.

[UPDATED 2008/10/30: Rather than going to bed I Googled around a bit and found a post by Martin Fowler that answers some of my questions about MGrammar, MGraph and MSchema. MGraph is for instances, MSchema is for types. It answers some plumbing question, but I still have questions about expected usage and relevance to applications modeling.]

[UPDATED 2008/10/30: I also found the recordings and slides from past PDC sessions. Nice job Microsoft for this quick turnaround time, even if you require Sliverlight and/or the PPTX viewer. The sessions are:

The first two sessions (deliverd Tuesday) have a replay and slides, the others should, I assume, follow soon.]

[UPDATED 2008/11/3: A nice overview of Oslo by Aaron Skonnard. Unlike most other Oslo articles over the last week, this one tries to paint the (yet-to-be-realized) full picture of the Oslo ecoystem. He mentions that “other Microsoft products and technologies are expected to build on Oslo to provide other runtimes. A few that have already been announced include Microsoft System Center (Operations Manager) and Team Foundation Server (TFS) in Visual Studio Team System”. It’s interesting that he qualifies System Center to be more specifically “operations manager” rather than “configuration manager” but I wouldn’t read too much into it at this point.]

Good news. The Oslo code name now specifically refers to Microsoft’s new modeling technologies (the part that I and, presumably, readers of this blog care about) and not the workflow/biztalk stuff that was always mixed in (to the point where some Oslo stories only mentioned workflow).

[UPDATED 2008/10/10: Now this is getting silly. Yet another name change. It’s not “D” it’s “M”. Whatever. Isn’t the whole point of code names that it doesn’t matter what they are: just pick one and stick with it until you release and then you can come up with the final name? I am not going to do another post just for this like a groupie tracking every news item, however irrelevant, about his/her favorite band. Which, for the record, is not the position I am in wrt to Oslo (at least until I know what it really is). Oh, and their graphical modeling tool is now called Quadrant. I am sure the TopQuadrant folks (creator of the TopBraid RDF/OWL/SPARQL editor which is in a very related domain) will appreciate.]

The SML working group at W3C has published the “last call” working draft of version 1.1 of the SML and SML-IF (“IF” stands for “interchange format”) specifications. You have until October 3rd to tell them what you think.

With all the Oslo fun, the OMG embrace and the silence from System Center there are more questions than answers about the use of SML at Microsoft. But the Eclipse COSMOS project (IBM and friends) is, as far as I know, valiantly going forward with the store/validator implementation. Which may or may not be the same codebase as what was used for the recent CMDBf interop demo (I am not sure how the SML and CDMBf implementations in COSMOS are articulated).

The COSMOS group also recently published an overview of SML. It doesn’t try to tell you why you’d want to use SML but it’s a good and succint description of what SML is technically (from an XML developer’s perspective).

Spoiler alert: if you like to learn things the hard way, don’t follow this link. It points to a clear description of all the problems, frustrations, disillusions and “ah ah!” moments that are ahead of you as you start to use XML and grow into an expert.

If, on the other hand, you like to be fully prepared and informed when you choose a technology and if you don’t mind sacrificing some adventure and excitement in the process, then you owe it to yourself to read Erik Wilde and Robert Glushko’s XML Fever article. Even if you already consider yourself an XML expert. Especially if you do.

I knew I would like it when I read this in the introduction:

Advanced strains of XML fever often take hold after exposure to the proliferation of more complex and esoteric XML-based technologies layered on top of it. These advanced diseases are harder to catch, but they are also harder to remedy because people who have caught these advanced strains tend to congregate with others with the same diseases and they are continually reinfecting each other.

Oh yes they do. And they speak with such authority that they infect others around them. People who don’t even understand these “more complex and esoteric XML-based technologies” end up being convinced of their magical properties and the need to use them.

I am not going to attempt to summarize the article because it is too tightly packed with great content to be summarized without being butchered. The “tree trauma” section alone could probably save the world billions of dollars in lost productivity if it was widely read. I’ll just quote a few sections to motivate you to go read the whole thing.

Tree tremors. Whereas tree trauma (discussed earlier) is a basic strain of XML fever caused by the various flavors of trees in XML technologies, tree tremors are a more serious condition afflicting victims trying to manage data in XML that is not inherently tree-structured. The most common causes are data models requiring nontree graph structures and document models needing overlapping structures. In both cases, mapping these models to XML’s tree model results in XML structures that cannot conveniently represent the application-level model.

(…)

The choice of schema languages, however, is more often determined by available tool support and acquired habits than by a thorough analysis of what would be the most appropriate language.

(…)

Triple shock. While RDF itself is simple, large datasets easily contain millions of triples (for truly large datasets this can go up to billions), and managing and querying such a big dataset can become a considerable challenge. If the schema of these large datasets is simple, but ontology overkill has set in and it has been reformulated as an ontology, handling this dataset may become considerably harder, without any immediate benefit.

This is true not just for RDF (a graph model that can be serialized in XML) but for any non-tree model that can be serialized in XML (which is to say any model one can think of). Including every graph model.

Maybe it would help if the article stated more clearly that it’s ok to serialize such a model as XML (e.g. for transmission) as long as you don’t process it (at the application level) as XML. As long as it gets accessed using an API and concepts that are aligned with the semantics of the model.

Imagine that you are receiving an RDF dataset over the wire. You could (if your app runs on the network card rather than in CPU) process it as a bunch of electrical impulses, but that wouldn’t be very convenient. You could process it as a bunch of bits, but that’s still hard. You could process it as a character stream but that’s not that much better. You could process it as XML but that’s still no great. Or you could process it as RDF triplets and be home on time to have dinner with your family. It’s not the fact that it is represented as XML at some point that’s the problem, it’s the fact that your application processes it as XML. Said in another way, just because it makes sense to store it or to send it over the network in XML doesn’t mean that you have to process it as XML in your application.

There is at least one more problem (not covered by the article) that people will eventually run into. You’d think that XML technologies are a consistent and complementary set. Not true. The lack of consistency is illustrated by the “tree trauma” section of the article. But there is also a complementarity problem, in the sense that there are large gaps between the specifications, as anyone who has tried to serialize an XPath nodeset has found out.

As the article points out, all this doesn’t mean that XML is bad or useless. XML technologies can be very useful, but for not for all tasks.