Authenticating Users from Passive IPs in Rich Client Apps – via ACS

It’s been a couple of years that we released the first samples showing how to take advantage of ACS from Windows Phone 7 applications; the iOS samples, released in the Summer, and the Windows8 Metro sample app last Fall demonstrated that the pattern applies to just any type of rich clients.

Although we explored the general idea at length, and provided many building blocks which enable you to put it in practice without knowing all that much about the gory details, apparently we never really provided a detailed description of what is going on. The general developer populace (rightfully) won’t care about the mechanics, but I am receiving a lot of details questions from some of the identirati, who might end up carrying this approach even further, to everybody’s advantage.

I am currently crammed in a small seat in cattle class, inclusive of crying babies and high-pitch barking dogs (yes, plural), on a Seattle-Paris flight that is bringing me to a 1-week vacation at home. It’s hardly the best situation to type a lengthy post, but if I don’t do it I know this will but me for the entire week and I don’t want it to distract me from the mega family reunion and the 20th anniversary party of my alma mater so my dear readers, wear confortable clothes and gather around, because in the next few hours (for the writer, at least) we are going to shove our hands deep into the guts of this pattern and shine our bright spotlight pretty much in every fold. Thank you Ping Identity for the noise-cancelling headset you got me during last Cloud Identity Summit, without which this post would not have been possible

The General Idea

Rich clients. How do they handle authentication? Let’s indulge our example bias and pick some notable case from the traditional world.

Take an email client, like Outlook: in the most general case, Outlook won’t play games and will simply ask you for your credentials. The same goes for SQL Management Studio, Live Writer, FileZilla, Messenger. That’s because your mail server, SQL engine, blog engine, FTP server and Live ID all have programmatic endpoints that accept raw credentials.

A slightly less visible case is the use of Office apps within the context of your domain, in which you are silently authenticated and authorized: however I assure that you are not using your midichlorians to submit that file share to your will. It’s simply that you are operating in one environment fully handled by your network software, where there is an obvious, implicit authority (the KDC) which is automatically consulted whenever you do anything that requires some privilege. There is an unbroken chain of authentication which starts when you sign in the workstation and unfolds all the way to your latest request, and it is all supported by programmatic authentication endpoints.

All this has worked pretty well for few decades, and still enjoys very wide application: however it does not cover all the authentication needs of today’s scenarios. Here there’s a short list of situations in which the approach falls short:

No endpoints for non-interactive authentication. Nowadays some of the most interesting user populations can be found among the users of social networks, web applications and similar. Those guys built a business on people navigating with their browsers to their pages, or on people navigating with their browsers to others’ pages which end up calling their API. Although the meteoric rise of the mobile app is shifting things around, the basic premises are sound. Those applications and web sites might provide endpoints accepting raw credentials, but more often than not they will expose their authentication capabilities via browser-based flows: that works well when you are integrating with them from a web application, as the uses would consume it from a browser. Things get less clear when the calling application is a rich client.:Manufacturing web requests and scraping the responses is technically feasible, but extremely brittle and above all a very likely a glaring violation of the terms and conditions of the target IP. My advice is, don’t even try it. There are other solutions.

Changing or hard requirements on authentication mechanics.When you drive authentication with raw credentials from a rich client, you bear the burden of making things happen: creating the credential gathering experience, dispatching the credentials t the IP according to the protocol of choice & interpret the response, and so on. If you just hurl passwords from A to B, or you take advantage of an underlying Kerberos infrastructure, that’s not that bad: but if you need to do more complicated stuff, things get tougher. Say that your industry regulations impose you to use, in addition to username and password, a one-time key sent to the user’s mobile phone": now you need to create the appropriate UI elements, drive the challenge/response experience, and adapt the protocol bits accordingly. What happens if in 1 month the law changes, and now your IP imposes you to use a hard token or a smart card as well? That means writing new infra code, and redeploy it to all your clients. Not the happiest place to be.

Consent experiences. More and more often, the authentication experience is compounded with some kind of consent experience: not only the user needs to authenticate with the IP, he or she also needs to decide what information and resources should be made available to the application triggering the authentication flow. Needless to say, each IP can have completely different needs here: for example, different resource types require different privileges. Icing on the cake, those things are as fluid as mercury and will change several times per year. Creating a general-purpose rich UI to accommodate for all that is perhaps not impossible, but would require a gargantuan all-up effort for which there’s simply no pressure for. Why bother, when a browser can easily render your latest consent experience no matter how often you change it?

Chaining authoritiesThis is subtler. Say that the service you want to call from your rich client trusts A; however your users do not have an account with A. They do have an account with B, and A trusts B. Great! The transitive closure of trust relationship will pull you out of trouble, right? B can act as federation provider, A as identity provider, and the user gets to access the service. Only, with rich clients it’s not that simple. In the passive case, you can take the user to a rise and redirect him/her as many times as needed, each leg rendering its own UI, gathering its own specific set of info using whatever protocol flavor it deems appropriate, without requiring changes in the browser or in the service. In the rich client you have to muscle through every leg: every call to an STS entails knowing exactly the protocol options they use, every new credential type must be accounted for in the client code: doing dynamic stuff is possible (some of the oldest readers know what I am talking about) but certainly non-trivial.

Ready for some good news? All of the above can be solved, or at least mitigated, by a bit of inventiveness. Rich clients render their UI with their own code, rather than feeding it to a browser: but if the browser works so well for the authentication scenarios depicted above, why not opening one just when needed for driving he authentication phase? The idea is sound, but as usual the devil is in the details. With some exceptions, the protocols used for browser based authentication aim at delivering a token (or equivalent) to one application on a server; if we want this impromptu browser trick to work, we need to find a way of “hijacking” the token toward the rich client app before it gets sent to the server.

The good news keep coming. ACS offers a way for you to use it from within a browser driven by a rich client, and faithfully deliver tokens to the calling client app while still maintaining the usual advantages it offers: outsourcing of trust relationships, support for many identity provider types, claims transformation rules, lightweight token formats suitable for REST calls, and so on.

If all you care is knowing that such method exists, and you are fine with using the samples we provide without tweaking them, you should stop reading now.

If you decide to push farther, here there’s what to expect. The way in which you use ACS from a browser in a rich client is based on a relatively minor tweak on how we handle WS-Federation and home realm discovery. Since I can’t assume that you are all familiar with the innards of WS-Federation (which you would, if you would have taken advantage of yesterday’s 50% off promotion from O’Reilly ) I am going to give you a refresher of the relevant details. Done then, I’ll explain how the method works in term of the delta in respect to traditional ws-fed.

An ACS & WS-Federation Refresher

The figure below depicts what append during a classic browser-based authentication flow taking advantage of ACS.

In order to keep things manageable I omitted the initial phase, in which an unauthenticated request to the web app results in a redirect to the home realm discovery page: I start the flow directly from there. Furthermore, instead of showing a proper HTML HRD page I use a JSON feed, assuming that the browser would somehow render it for the user so that its corresponding entries will be clickable. Hopefully things will get clearer once I get in the details.

Home Realm Discovery

Let’s say that you want to help the user to authenticate with a given web application, protected by ACS. ACS knows which identity providers the application is willing to accept users from, and knows how to integrate with those. How do you take advantage of that knowledge? You have two easy ways:

you can rely on one page that ACS automatically generates for you, which presents to the user a list of the acceptable identity providers. That’s super-easy, and that’s also what I show in the diagram above in step 1. However it is not exactly the absolute best in term of experience; ACS renders the bare minimum for the user to make a choice, and the visuals are – how should I put it? – a bit rough. And that’s by design.

Another alternative is to obtain the list of identity providers programmatically. ACS offers a special endpoint you can use to obtain a JSON feed of identity providers trusted for a given relying party, inclusive of all the coordinates required for engaging each of them in one authentication flow. Once you have that list, you can use that info to create whatever authentication experience and/or look&feel you deem appropriate.

The second alternative is actually pretty clever! Let’s take a deeper look. How do you obtain the JSON feed for a given RP? You just GET the following:

The first part is the resource itself, the IP feed; it follows the customary rule for constructing ACS endpoints, the namespace identifier (bold) followed by the ACS URL structure. The green part specifies that we want to integrate ACS and our app using WS-Federation. The last highlighted section identifies which specific RP (among the ones described in the target namespace) we want to deal with. What do we get back? The following:

Well, that’s clearly meant to be consumed by machines: however JSON is clear enough for us to take this guy apart and understand what’s there.

First of all, the structure: every IP gets a name (useful for presentation purposes), a login URL (more about that later), a rarely populated logout URL, the URL of one image (again useful for presentation purposes) and a list of email suffixes (longer conversation, however: useful if you know the email of your user and you want to use it to automatically pair him/her to the corresponding IP, instead of showing all the list).

The login URL is the most interesting property. Let’s take the first one:

This is the URL used to sign in using Live. The yellow part, together with various other hints (wreply, wctx, etc) suggests that the integration with Live is also based on WS-Federation. This is what is often referred to as a deep link: it contains all the info required to go to the IP and then get back to the ACS address that will take care of processing the incoming token and issue a new token for the application. The part highlighted green shows such endpoint. Note the wsfederation entry in the wctx context parameter, it will come in useful later. You don’t need to grok al the details here: suffice to say that if the user clicks on this link he’ll be transported in a flow where he will authenticate with live id, will be bounced back to ACS and will eventually receive the token needed to authenticate with the application. Al with a simple click on a link.

Now that’s a much longer one! I am sure that many of you will recognize the OpenID/attribute exchange syntax, which happens to be the redirect-based protocol that ACS uses to integrate with Yahoo. The value of the return_to parameter hints at how ACS processes the flow: once again, notice the wsfederation string; and once again, all it takes to authenticate is a simple click on the link.

Google also integrates with ACS via OpenID, hence we can skip it. How about Facebook, though?

Yet another integration protocol: this time it’s OAuth2. The flow leverages one Facebook app I created for the occasion, as it’s standard procedure with ACS. Another link type, same behavior: for the user, but also for the web app developer, it’s just a matter of following a link and eventually an ACS-issued token comes back.

Getting Back a Token

Alrighty, let’s say that the user clicks on one of the login URL links. In the diagram that’s step 2. In this case I am showing a generic IP1. As you know by now, the IP can use whatever protocol ACS and the IP agreed upon. In the diagram I am using WS-Federation, which is what you’d see if IP1 would be an ADFS2 or Live ID.

WS-Federation uses an interesting way of returning a token upon successful authentication. It basically sends back a form, containing the token and various ancillary parameters; it also sends a Javascript fragment that will auto-post that form to the requesting RP. That’s what happens in step 2 (IP to ACS) and 3 (ACS to the web application). Let’s take a closer look to the response returned by ACS:

Now THAT’s definitely not meant to be read by humans. I did warn you at the beginning of the post, didn’t I

Come on, it’s not that bad: let me be your Virgil here. As I said above, in WS-Federation tokens are returned to the target RP by sending back a form with autopost: and that’s exactly what we have here. Lines 3-27 contain a form, which features a number of input fields used to do things such as signaling to the RP the nature of the operation (line 4: we are signing in) and transmitting the actual bits of the requested token (line 5 onward).

Lines 28-30 contain the by-now-famous autoposter script. And that’s it! The browser will receive the above, sheepishly (as in passively) execute the script and POST the content as appropriate.
The RP will likely have interceptors like WIF which will recognize the POST for what it is (a signin message), find the token, mangle it as appropriate and authenticate (or not) the user. All as described in countless introductions to claims-based identity.

Congratulations, you now know much more about how ACS implements HRD and how WS-Fed works than you’ll ever actually need. Unless, of course, you want to understand in depth how you can pervert that flow into something that can help with rich clients. …not from a Jedi

The Javascriptnotify Flow

Let’s get back to the rich client problem. We already said that we can pop out a browser from our rich client when we need to – that is to say when we have to authenticate the user - and close it once we are done. The flow we have just examined seems almost what we need, both for what concerns the HRD (more about that later) and the authentication flow. The only thing that does not work here is the last step.
Whereas in the ws-fed flow the ultimate recipient of the token issuance process is the entity that requires it for authentication purposes, that is to say the web site, in the rich client case it is the client itself that should obtain the token and store it for later use (securing calls to a web service). It is a bit if in the WS-Federation case the token would stop at the browser, instead of being posted to the web site. Here, let me steal my own thunder: we make that happen by providing in ACS an endpoint which is almost WS-Federation, but in fact provides a mechanism for getting the token where/when we need it in a rich client flow. We call it javascriptnotify. Take a look at the diagram below.

That looks pretty similar to the other one, with some important differences:

there is a new actor, the rich client application. Although the app is active before and after the authentication phase, here we represent things only form the moment in which the browser pops out (identified in the diagram by the browser control entry, which can be embedded or popped out as dialog according to the style of the apps in the target platform) to the moment in which the authentication flow terminates

the diagram does not show the intended audience of the token, the web service. Here we focus just on the token acquisition rather than use (which comes later in the app flow), whereas in the passive case the two are inextricably entangled.

Here I did in step 1 the same HRD feed/page simplification I did above; I’ll get back to it at the end of the post. The URl we use for getting the feed, though, has an important difference:

The URI of the resource is the same, and the realm of the target service is obviously different; the interesting bit here is the protocol parameter, which now says “javascriptnotify” instead of “wsfederation”. Let’s see if that yields to differences in the actual feed:

The battery is starting to suffer, so I have to accelerate a bit. I am not extracting the Login URLs of the various IPs, but I highlighted the places where ws-federation has been substituted by javascriptnotify (facebook follows a different approach, more about is some other time: but you can see that the two redirect_uri are different).

The integration between the IP and ACS – step 2 - goes as usual, modulo a different value in some context parameter; I didn’t show it in details earlier, I won’t show it now. What is different, though, is that coming back from the IP there’s something in the URL which tells ACS that the token should not be issued via WS-Federation, but using Javascriptnotify. That will influence how ACS sends back a token, that is to say the return portion of leg 3. Earlier we got a form containing the token, and an autoposter script; let’s see what we get now.

I hope you’ll find in you to forgive the atrocious formatting I am using, hopefully that does not get in the way of making my point.

As you can see above, the response from ACS is completely different; it’s a script which substantially passes a string to whomever in the container implements a handler for the notify event. And the string contains.. surprise surprise, the token we requested. If our rich client provided a handler for the notify event, it will now receive the token bits for storage and future use. Mission accomplished: the browser control can now be closed and the rich native experience can resume, ready to securely invoke services with the newfound token.

The notifi-ed string also contains some parameter about the token itself (audience, validity interval, type, etc). Those parameters can come in handy to know what the token is good for, without the need for the client to actually parse and understand the token format (which can be stored and used as amorphous blob, nicely decoupling the client from future updates in the format or changes in policy).

A bit more on token formats. ACS allows you to define which token format should be used for which RP, regardless of the protocol. When obtaining tokens for REST services, you normally want to get tokens in a lightweight format (as you’ll likely need to use them in places – like the HTTP headers - where long tokens risk being clipped). In this example, in fact, I decided to use SWT tokens for my service urn: testservice. However ACS would have allowed me to send back a SAML token just as well. One more point in favor of keeping the client format-agnostic, given that the service might change policy at any moment.

That’s it for the flow! I might be biased, but IMHO it’s not that hard; in any case, I am glad that the vast majority of developers will never have to know things at this level of detail.

If you are interested in taking a look at code that handles the notify, I’d suggest getting the ACS2+WP7 lab and taking a look at Labs\ACS2andWP7\Source\Assets\SL.Phone.Federation\Controls\AccessControlServiceSignIn.xaml.cs, and specifically to SignInWebBrowserControl_ScriptNotify.Now that you know what it’s supposed to do, I am sure you’ll find it straightforward.

A Note About HRD on Rich Clients

Before closing and getting some sleep, here there’s a short digression on home realm discovery and rich clients.

At the beginning of the WS-Federation refresher I mentioned that web site developers have the option of relying on the precooked HRD page, provided by ACS for development purposes, or take the matter in their own hands and use the HRD JSON feed to acquire the IPs coordinates programmatically and integrate tem in their web site’s experience.

A rich client developer has even more options, which can be summarized in the following;

Provide a native HRD experience. This is what we demonstrate in all of our rich client +ACS samples such as the WP7 and the Windows8 Metro ones: we acquire the JSON IP feed and we use it to display the possible IPs using the style and controls afforded by the target platform. When the user makes a choice, we open a browser (or conceptual equivalents, let’s not get too fiscal here) and point it to the LoginUrl associated to the IP of choice.
There are many nice things to be said about this approach, the main one being that you have the chance of leveraging the strengths of your platform of choice. The other side of the coin is that you might need to write quite a lot of code. Also: if you are writing multiple versions of your client, targeting multiple platforms, you’ll have to rethink the code of the HRD experience just as many times.

Drive the HRD experience from the browser. Of course you can always decide to just follow the passive case more closely, and handle the HRD experience in the browser as well. There are multiple ways of doing so, from driving it with a page served from some hosted location to generating the HTML from the JSON IPs feed directly on the client side. The advantage is that the structure of your app can be significantly simplified, with all the authentication code nicely ring-fenced in the browser control sub-system. The down side is that you’ll likely not blend with the target platform as well as you could if you’d use its visual elements & primitives directly.

Well folks, I have a confession to make. Right now I am still writing from a plane, and there are still babies crying around, but this time it’s the Paris-Seattle: the vacation is over and I am coming back. I didn’t finish this post on my way in, the magic noise cancelling headset fought bravely but in the end the babies-dogs combined attack could not be contained. Believe it or not, I actually managed to enjoy my vacation without thinking about this: however as soon as I got on the return plane I ALT-TABbed my way to Live Writer (left open for the whole week) and finalized. Good, because I caught few bugs that eluded by stressed self of one week ago but were super-evident (“risplenda come un croco in polveroso prato” – from memory! No internet on transatlantic flights) to the rested self of time present.
I am not sure how generally useful this post is going to be. Once again, to be sure; this is NOT for the general-purpose developer. However I do know that this will provide some answers to very specific questions I got; and If you got this far, however, something tells me that “general-purpose” is not the right label for you As usual: if you have questions, write away!