March 2009 Archives

There is a lot of talk in the Geneva forum about caching and avoiding unnecessary round trips to the STS when calling downstream Web services. After reading all of it, I thought that reusing the same ChannelFactory&lt;T&gt; object would result in a single call to the STS for a token and that subsequent calls made with that factory would reuse the SAML token. I thought that a service type like this would suffice:
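The original listing didn't survive, but the idea was a client type along these lines, where the factory is created once and reused across calls (the IEchoService contract and endpoint name are illustrative placeholders, not from the original post):

```csharp
using System.ServiceModel;

[ServiceContract]
public interface IEchoService
{
    [OperationContract]
    string Echo(string message);
}

public class EchoServiceClient
{
    // Created once and reused for every call, in the hope that the issued
    // token fetched from the STS would be reused as well.
    private static readonly ChannelFactory<IEchoService> factory =
        new ChannelFactory<IEchoService>("echoEndpoint");

    public string Echo(string message)
    {
        IEchoService channel = factory.CreateChannel();

        try
        {
            return channel.Echo(message);
        }
        finally
        {
            ((IClientChannel)channel).Close();
        }
    }
}
```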

It turns out this is not correct, however. Every time a channel is created, invoking operations on that proxy results in a call to the STS for a security token (which is really a few service calls for policy and whatnot). At first, I thought the way to avoid this was to reuse the channel rather than the channel factory. This may work as long as the channel remains open; once it is closed, it is disposed, and subsequent calls will fail. I don't know if it is bad practice in this case to keep the channel open continuously, but, as a rule, holding onto resources longer than absolutely necessary is to be avoided. Working off this assumption, I sought an alternative that would avoid the extra calls to the STS without having to hold onto an open channel object.

A couple of colleagues told me that I needed to cache the SAML token outside of the channel and retrieve it from there by adding an interceptor to the WCF pipeline. They referred me to Cibrax's blog, where he describes the process. After reading that article and Eric Quist's post, I understood that hooking into WCF to replace the default SAML token provider with one that does caching requires the definition of three classes:

CacheClientCredentials - Plumbing

CacheClientCredentialsSecurityTokenManager - Plumbing

CacheSecurityTokenProvider - Actual meat

The code that Cibrax and Eric wrote is really good, and it provides an excellent starting point for doing this in Geneva; however, I needed to tweak their code a bit to get it to work with this new framework. Specifically, I had to make the following changes:

CacheClientCredentials

Must inherit from FederatedClientCredentials

Must override CloneCore and return a new CacheClientCredentials object

May provide a static method for configuring a channel factory object to use CacheClientCredentials (similarly to FederatedClientCredentials - in beta 1 at least)
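Putting those three changes together, the plumbing class ends up looking something like this. This is a sketch against Geneva beta 1; aside from the members called out in the bullets above, the details (constructors, the ConfigureChannelFactory helper) are my own illustration:

```csharp
using System.ServiceModel;
using System.ServiceModel.Description;
using System.IdentityModel.Selectors;

public class CacheClientCredentials : FederatedClientCredentials
{
    public CacheClientCredentials() { }

    protected CacheClientCredentials(CacheClientCredentials other)
        : base(other) { }

    // Required so that WCF's cloning of behaviors yields our type,
    // not the base FederatedClientCredentials.
    protected override ClientCredentials CloneCore()
    {
        return new CacheClientCredentials(this);
    }

    // Hand WCF our token manager, which in turn hands out the
    // caching token provider.
    public override SecurityTokenManager CreateSecurityTokenManager()
    {
        return new CacheClientCredentialsSecurityTokenManager(this);
    }

    // Convenience method mirroring the one on FederatedClientCredentials
    // (in beta 1 at least): swap the default credentials behavior for ours.
    public static void ConfigureChannelFactory<T>(ChannelFactory<T> factory)
    {
        factory.Endpoint.Behaviors.Remove(typeof(ClientCredentials));
        factory.Endpoint.Behaviors.Add(new CacheClientCredentials());
    }
}
```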

        // Only add the token to the cache if caching has been turned on in web/app.config.
        if (CacheIssuedTokens)
        {
            TokenCacheHelper.AddToken(cacheKey, securityToken);
        }
    }

    return securityToken;
}

private static bool IsSecurityTokenExpired(SecurityToken serviceToken)
{
    return DateTime.UtcNow >= serviceToken.ValidTo.ToUniversalTime();
}

~CacheSecurityTokenProvider()
{
    if (!disposed)
        ((IDisposable)this).Dispose();
}

void IDisposable.Dispose()
{
    innerProvider.Close();
    disposed = true;
}
}

Conclusion

Now, having said all this and listed a bunch of code, let's step back for a moment. Caching of SAML tokens is a must to avoid extra round trips to the STS. Doing so requires a bunch of plumbing, and everyone will absolutely require this functionality. So, why are we each doing this ourselves? Geneva is a framework, and as such, IMO, it should be providing things like caching of SAML tokens for us. The Geneva team has said that support for caching of tokens is a top priority, so hopefully the code above will be obsolete very soon.

In what's turning out to be the third in an unplanned series on Geneva-related terminology, I'll describe what a bootstrap token is. You are likely to come across this term as soon as you try to delegate identities from front-end services or Web sites to back-end systems. When a request containing a SAML token is handled by a front-end Web site that has been configured to use Geneva, the framework will convert that token into claims and put them in a cookie (or cookies), allowing them to be reused during the session. Similarly, when a request containing a SAML token is presented to a front-end WCF service that uses Geneva FX, the framework will convert the SAML token into claims and put them in a Secure Context Token (SCT) to be used for the lifetime of the session. With these claims, the front-end Web app or service (RP) can make authorization decisions about whether or not the caller is allowed to perform an operation, see a particular page element, or whatever.

This is good enough until the front-end has to call a back-end system as the original user. To do this, the front-end needs to provide an STS with the original caller's token. Unlike the claims implementation initially provided with WCF, the Geneva Framework keeps the original SAML token around for just this purpose. When the front-end RP needs to retrieve a delegation token (or ActAs token, as it is also called), it switches from the role of the RP and takes on that of the subject in the familiar WS-Trust communication triangle. When it does this, it provides the STS with the token of the original caller in the RST it sends. The original token that the front-end received is what is referred to as a bootstrap token, because it is later sent to the STS to get an ActAs token, bootstrapping the delegation process. This process is shown in the following sketch.

Anyone who knows me or reads this blog regularly can tell you that I'm totally sold on cloud computing. Recently, I wrote a little about MarkLogic, blogged about IaaS, and got to thinking: MarkLogic should build an infrastructure service that runs in Amazon's cloud, scales to Internet levels, and comes with a pay-per-use sales model.

MarkLogic's native XML database is proven, mature, and has been deployed at many household-name companies; however, it isn't cheap. It is competitively priced, but still prohibitively expensive for many companies. The product is licensed by CPU socket (IIRC), and a license to run on a one-CPU-socket machine costs something like $10,000. That's a lot of zeros for small and medium businesses, especially when trusted and better-known relational products like MySQL, PostgreSQL, and SQL Server are available at no cost or for pennies on the dollar compared to MarkLogic. Switching from a tried-and-true relational database to an unknown database product from a lesser-known source sounds really risky to many purse-string holders. If MarkLogic provided its wares as a service that organizations could consume over the Internet on a pay-per-use basis, reluctant companies would be able to use MarkLogic's product in an actual program that goes to production, allowing naysayers to see that this alternative has a lot of benefits and that the risks are not as great as they fear.

As the software's creator, who better to create this infrastructure service? They have already used it to build a highly scalable and highly available system called MarkMail that is currently indexing and searching over 7,000 newsgroups. For example, they know the caching and proxying problems caused by Squid, which they used in an early release of the newsgroup indexing service. They solved those problems, made the system scale, and could do the same with a general-purpose data storage service, I'm sure.

If you survey the landscape of infrastructure services running in the cloud today, your choices are few. You have Amazon's SimpleDB, Microsoft's SQL Data Services (eventually), or Google's App Engine datastore. (There might be others, but I don't know of them.) These services all have different, proprietary interfaces. If MarkLogic were to create an alternative to these, its interface would be standard XQuery 1.0. I don't know about you, but I would find a standardized API more appealing from both a business and a development point of view. From the former perspective, it would assure me that I'm not writing code that locks me into one vendor's service; from the latter, it would allow me to use existing knowledge, libraries, and tools rather than forcing me to use what the vendors provide or to create my own.

Considering some of the recent partnerships that MarkLogic's competitors IBM and Oracle have made with Amazon, I wonder if MarkLogic can afford not to, at the very least, offer a cloud-compatible license and a pricing model that allows for elastic scalability. While this would be a great first step, I'm convinced that a full-blown data storage service would be adopted by many companies trying to store large amounts of semi-structured content with Internet-sized demands. If MarkLogic does go after this market, running it in Amazon's cloud rather than someone else's would benefit others running in Amazon's data centers, because transfer of data sent within Amazon's private network is free. If MarkLogic does not provide a data storage service that is powered by a native XML datastore, supports a standardized query interface, scales to Internet levels, and is highly available, redundant, and performant, then someone else should, using an open source alternative. I would be very interested in such a service regardless of who provides it.

Microsoft announced at PDC '08 that .NET 4.0 would include an implementation of WS-Discovery. Conformance to this protocol allows service consumers to locate providers dynamically at run-time. I'm sure I don't have to tell you that this capability is often needed when implementing large-scale connected systems. These new discovery capabilities provide two ways to locate services:

Using a known, centralized repository (what Microsoft and others are calling the "managed model")

Using ad-hoc discovery wherein services broadcast their arrival and departure from the network

The latter is restricted to a single subnet IINM and is analogous to DHCP.
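For the ad-hoc case, the announced API looks roughly like this; this is a sketch based on the pre-release bits, so namespaces and types may well change before .NET 4.0 ships, and the IEchoService contract is a placeholder of my own:

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Discovery;

[ServiceContract]
interface IEchoService
{
    [OperationContract]
    string Echo(string message);
}

class Probe
{
    static void Main()
    {
        // Probe the local subnet over UDP multicast for endpoints
        // implementing the IEchoService contract.
        var discoveryClient = new DiscoveryClient(new UdpDiscoveryEndpoint());
        FindResponse response =
            discoveryClient.Find(new FindCriteria(typeof(IEchoService)));

        // List the addresses of whatever announced itself.
        foreach (EndpointDiscoveryMetadata endpoint in response.Endpoints)
        {
            Console.WriteLine(endpoint.Address);
        }
    }
}
```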

Even with these improvements, I would advise against using WCF Discovery for the following reasons:

Discoverable services cannot be hosted in WAS.

There is no Windows Server role that can be added to provide a centralized repository for discovery.

It depends on .NET 4.0 which currently has no ship date (to my knowledge).

Because there is no server role, we will all end up building the equivalent of such a role. Then, Microsoft will come along with Windows Server vNext and, ta-da, it will have a repository role for run-time service discovery, just like what happened with WF and Dublin: a bunch of us wrote complex workflow hosts and suffered through all the pain points just to have Microsoft come along afterwards with its own enterprise-caliber workflow host. Lastly, the dependence on .NET 4.0 means that your shipping date has to be really soft. Given all this, I can't see why anyone would seriously consider using WCF Discovery at this time unless they are in the service repository business, already have a .NET-based product shipping in 2011 (or so), and need to augment it to conform to WS-Discovery.

What do you do today if you need discovery capabilities like those that will eventually be provided by WCF Discovery? (For goodness sake, don't even think about UDDI; that is for design-time discoverability, not run-time!) I would look to one of the established vendors in this space. I've heard from colleagues that Software AG's run-time repository is unusable, so axe them from your list. Based on my research and product evaluation, I would suggest checking out SOA Software or AmberPoint. From a cursory investigation that I did in H1 of 2008, I would lean toward AmberPoint; their run-time repo seemed to be a real differentiator.

If I haven't persuaded you, I know that Microsoft is looking for customers to participate in their TAP program for WCF Discovery.

I was talking with Darrell Johnsrud the other day about cloud computing, and he shared an insight about Platform as a Service (PaaS) that I think is spot on. He said that the current services that the different cloud vendors are providing are very generic and cater to a wide array of problem domains. If you think about Amazon's, Google's, and Microsoft's services, they are very universal. Broadly speaking, you have data storage (SimpleDB, SQL Services), computational capacity (EC2, GAE, Azure Compute Services), storage (S3, GAE, and Azure Storage Service), queuing (SQS), service bus, workflow, access control (Microsoft's .NET Services), and others. All of these types of services are needed by many, many systems in various domains. The capabilities provided by these services are necessary for almost all enterprise-caliber solutions.

That said, however, the platform needed to support the applications of a specific domain is a lot more specialized. Yes, TV stations and financial institutions need data storage, compute capacity, and storage, but they also need low-level services that are specific to their business arena. For example, the platforms built to support TV stations and financial institutions will each include a transfer service. The platform that solves the problems of broadcasters will need a service that transfers digital files (e.g., movie clips, audio files, teleprompter script documents, etc.). On the other hand, the transfer service included in any platform built to meet the needs of banks will move money from one account to another using various means (e.g., wire, ACH, etc.).

As cloud computing takes shape, I believe that companies will come along with specialized platform services like these transfer services that support the applications of specific domains, scale to Internet levels, and have a pay-per-use business model. Because of this, I think it is important for practitioners to begin differentiating between these and the lower-level ones provided by cloud vendors. The base services are what constitute Infrastructure as a Service (IaaS), and the market-specific ones form a PaaS offering. Using this distinction, we can clearly say that services that cater to various domains are (in the cloud computing context) infrastructure services, and those that support a wide array of applications tailored to meet the needs of a particular market are platform services. This specialization, Darrell said, and I agree, will naturally come about over time.