Patching PicketLink to support multiple LDAP stores

The PicketLink framework[1] provides identity management (IDM)[2] to applications based on different identity providers. PicketLink offers support for a number of different identity store back-ends like LDAP or RDBMSes.

We are successfully using PicketLink in several internal and external applications and it is also a foundation for many other frameworks like seam-security[3] or the GateIn[4] portal server.

The Problem

One of our customer applications is based on a customized version of GateIn 3.1.0 and uses PicketLink internally to load users from an LDAP directory. The application is facing an international rollout that requires it to talk to an additional LDAP store. However, this feature is not yet fully supported in the PicketLink version (1.1.5.CR01) used in GateIn 3.1.0. The application depends on a few quirks of GateIn 3.1.0 and our customizations to it, so an update to a newer version of PicketLink or even GateIn is out of the question: the risk of breaking changes or, worse, subtle incompatibilities is too high.

To solve that problem, we explored enhancing PicketLink 1.1.5.CR01 to support multiple LDAP stores and using that patched version in GateIn. We successfully created a fork of 1.1.5 that sequentially queries multiple LDAP servers, and we made the modifications available on GitHub. This blog post describes the changes we made to PicketLink and the motivation behind them.

PicketLink

The fact that PicketLink does not support multiple LDAP stores was not clear from the beginning. The configuration file (usually picketlink-idm-config.xml) happily accepts an arbitrary number of LDAP stores. The following listing shows a snippet from a valid configuration file with multiple stores defined:
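The original listing is not preserved here; the following fragment is a reconstruction for illustration only. The element names are assumptions based on the PicketLink 1.x configuration schema as we recall it, and the store ids are made up:

```xml
<!-- Reconstructed sketch; element names follow the PicketLink 1.x schema as we
     recall it, and the store ids are purely illustrative. -->
<identity-store-mappings>
    <identity-store-mapping>
        <identity-store-id>LDAPStore1</identity-store-id>
        <identity-object-types>
            <identity-object-type>USER</identity-object-type>
        </identity-object-types>
    </identity-store-mapping>
    <identity-store-mapping>
        <identity-store-id>LDAPStore2</identity-store-id>
        <identity-object-types>
            <!-- same type as above: both stores land in the same map bucket -->
            <identity-object-type>USER</identity-object-type>
        </identity-object-types>
    </identity-store-mapping>
</identity-store-mappings>
```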

Yet, there is a caveat. Only the last configured identity-store is used by PicketLink. So our first task was to understand where and why the other configured stores get lost.

PicketLink uses an ObjectTypeIdentifier to tell what kind of object it is dealing with. This ObjectTypeIdentifier is not much more than a simple String. For users, it is usually configured to be ‘USER’, but this can be changed in the configuration file. During bootstrap, PicketLink creates a HashMap from these ObjectTypeIdentifiers to the store configured for each specific type. And of course, if two stores define the same ‘identity-object-type’, they fall into the same bucket of the map, every subsequent store overwriting the previous one.
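The effect can be illustrated with a few lines of plain Java. This is a simplified model, not the actual PicketLink bootstrap code; the class and method names are invented for the demonstration:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in (not the real PicketLink classes) for how the bootstrap
// maps identity-object-types to identity stores: equal keys share a bucket,
// so each later store silently replaces the earlier one.
public class StoreMappingDemo {

    /** cfg[i][0] = identity-object-type, cfg[i][1] = store id. */
    public static Map<String, String> buildMappings(String[][] storeConfigs) {
        Map<String, String> typeToStore = new HashMap<>();
        for (String[] cfg : storeConfigs) {
            typeToStore.put(cfg[0], cfg[1]); // overwrites any earlier store for the same type
        }
        return typeToStore;
    }
}
```

With two stores both configured for the ‘USER’ type, only the last one survives in the map.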

The naive approach of just defining two different identity-object-types, one per store, does not work. When PicketLink queries a user, it looks up the user type and only queries the identity store that matches the ‘UserType’. The second store with a different ‘identity-object-type’ is never queried.

But this is exactly the entry point we decided to expand upon. The class PersistenceManagerImpl has a number of methods that eventually query an identity store. Three of those methods are actually used by our application: the query for a single user, the query for a list of users, and the query for the count of users (the last call returns only the result set size instead of the list).

We modified each of those methods. Instead of querying only the store that matches the configured ‘ObjectType’ for users, the methods now query all configured identity stores sequentially and merge the results. The map containing the identity stores is easily available at runtime, so the modifications are simple and straightforward.

One optimization we made was to shortcut subsequent queries once a user has been found. This implies the assumption that a user is unique across all LDAP stores – in our use case, this assumption is sound.

The following listing shows, as an example, the changes we made to the ‘findUser’ method. The ‘identityStoreMappings’ contain the configured ObjectTypeIdentifiers as keys. We iterate over those keys and create a new SimpleIdentityObjectType from each key. PicketLink then chooses the store to query based on the ObjectType and therefore queries a different store each time. Finally, we collect and accumulate the results.
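The original listing is not preserved; the following self-contained sketch models the idea. The Store interface, the String user, and the method names are stand-ins for PicketLink's IdentityStore and IdentityObject types, not the 1.1.5 API:

```java
import java.util.Map;

// Simplified sketch of the patched findUser: iterate over the keys of the
// identity-store mappings and query the store behind each key until a user
// is found. Store and findUser are illustrative stand-ins.
public class MultiStoreFindUser {

    public interface Store {
        String findUser(String name); // returns null when the user is unknown
    }

    public static String findUser(String name, Map<String, Store> identityStoreMappings) {
        for (Map.Entry<String, Store> entry : identityStoreMappings.entrySet()) {
            // In the real patch, a new SimpleIdentityObjectType is built from each
            // key so PicketLink routes the query to the matching store.
            String user = entry.getValue().findUser(name);
            if (user != null) {
                return user; // shortcut: a user is assumed unique across all stores
            }
        }
        return null;
    }
}
```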

This works fine for querying users. A first victory, but not yet the final result. Although PicketLink can now query for users, it is still unable to fetch their attributes, like names or mail addresses. Attribute lookup happens in another class: FallbackIdentityStoreRepository. The method ‘getAttributes(IdentityStoreInvocationContext invocationContext, IdentityObject identity)’ is responsible for that, and we extended it in a similar fashion. First, we let PicketLink decide which store to search for attributes. If the result set is not empty, we have obviously found the user and can proceed as normal. This query can also be executed against a ‘Hibernate Store’ or any other store. If we were unable to find attributes that way, we iterate over all configured ‘identity-store-mappings’ again, this time making sure that we do not query the store used in the previous query a second time. If the user is present in any of the stores, we query that store for the attributes, hopefully finding them and proceeding as normal.

Modified FallbackIdentityStoreRepository to get attributes from multiple LDAP stores:
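The original listing did not survive; this self-contained sketch follows the steps described above. The Store interface and the String-based attribute maps are simplified stand-ins for the PicketLink types:

```java
import java.util.Collections;
import java.util.Map;

// Illustrative sketch of the extended attribute lookup. The real change sits
// in FallbackIdentityStoreRepository.getAttributes(...); the types here are
// simplified stand-ins.
public class MultiStoreAttributes {

    public interface Store {
        Map<String, String> getAttributes(String user); // empty map = not found
        boolean containsUser(String user);
    }

    public static Map<String, String> getAttributes(String user, Store defaultStore,
                                                    Map<String, Store> identityStoreMappings) {
        // 1. Let the default resolution run first; this may also hit a Hibernate store.
        Map<String, String> attrs = defaultStore.getAttributes(user);
        if (!attrs.isEmpty()) {
            return attrs; // user found, proceed as normal
        }
        // 2. Fall back to every other configured store, skipping the one just queried.
        for (Store store : identityStoreMappings.values()) {
            if (store == defaultStore) {
                continue;
            }
            if (store.containsUser(user)) {
                return store.getAttributes(user);
            }
        }
        return Collections.emptyMap();
    }
}
```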

With those changes in place, we are now able to query our users from an arbitrary number of identity stores and successfully fetch their attributes. Our test cases are green and the application behaves as expected with those modifications in place.

Bootstrap

There is only one minor annoyance left. During the bootstrap of GateIn, we import a number of users into a primary Hibernate identity store. The bootstrap first checks whether those users already exist and inserts them if they can’t be found. During that bootstrap, PicketLink appears to be in an incompletely configured state: at that time, getIdentityStoreMappings() returns an empty map. Since we iterate over the keys of that map, our modifications don’t query anything, and users that already exist won’t be found. This results in an exception when the bootstrap tries to create them.

Those exceptions do not interfere with the application but they litter the server log. To avoid this strange edge-case of an incompletely configured PicketLink, we perform a check to see if the configured store-mappings are empty. When they are empty, we add the user-object-type to the keyset before iterating over it. Therefore we always query at least the store that is configured as the default store for users.
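The guard can be sketched like this; typesToQuery and the defaultUserType parameter are illustrative names, not PicketLink API:

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the bootstrap guard: when the store mappings are still empty (the
// incompletely configured state seen during GateIn bootstrap), fall back to
// the default user object type so at least the default store is queried.
public class BootstrapGuard {

    public static Set<String> typesToQuery(Map<String, ?> identityStoreMappings,
                                           String defaultUserType) {
        Set<String> keys = new LinkedHashSet<>(identityStoreMappings.keySet());
        if (keys.isEmpty()) {
            keys.add(defaultUserType); // e.g. the configured 'USER' type
        }
        return keys;
    }
}
```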

findUser result caching

One minor remark: during our debugging, we discovered a caching problem in PicketLink. GateIn first queries PicketLink for all users matching certain criteria, and later, while rendering the GUI, performs an individual query for each user again. It therefore makes a lot of sense to cache those users. However, the caching strategy of PicketLink caches complete searches only. A second query for a list of users with the same criteria results in a cache hit, but individual queries for those users are always cache misses. We decided to change that behaviour: the findUsers method now puts not only the result collection in the cache but also each individual user. All subsequent findUser requests for an individual user can thus be answered directly from the cache. This dramatically reduces the number of LDAP queries performed by PicketLink – and since each findUser query now potentially has to go to every configured LDAP store, the savings can be quite significant.
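A simplified stand-in for this caching change, with a plain HashMap replacing PicketLink's cache API and invented key names:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch of the cache change: findUsers now caches each individual
// user next to the complete search result, so later findUser calls become
// cache hits. A plain map replaces PicketLink's cache API here.
public class UserCacheDemo {

    private final Map<String, Object> cache = new HashMap<>();

    public void cacheSearchResult(String searchKey, List<String> users) {
        cache.put(searchKey, users);         // original behaviour: whole result collection
        for (String user : users) {
            cache.put("user:" + user, user); // patched: each user individually
        }
    }

    /** findUser can now be answered from the cache without an LDAP round trip. */
    public Object findUserCached(String user) {
        return cache.get("user:" + user);
    }
}
```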

Tuning performance – maximum LDAP results

According to the LDAP spec, each LDAP request returns a maximum page size of 1000 entries. After extending PicketLink to support multiple LDAPs, we potentially had to handle 1000 * n entries (where n is the number of configured LDAP stores). As PicketLink doesn’t provide the capability to limit the result set out of the box, we added this behaviour as well. (This has been implemented in PicketLink since version 1.3.0.Final.)

The PicketLink configuration file accepts an arbitrary number of name-value options. We decided to use the same option that later versions of PicketLink use. The following listing shows the configuration required to set the result set size of an LDAP store to 10 (you will probably want a higher number in real scenarios).

Configured for each LDAP store in picketlink-idm-config.xml:

    <option>
        <name>maxSearchResults</name>
        <value>10</value>
    </option>

The next step is to make PicketLink aware of the new option. We tried to keep our changes as local and as minimal as possible to minimize the risk of unforeseen side-effects. Instead of adding a new pair of getters/setters to the ‘LDAPIdentityStoreConfiguration’, we directly access the options. The options from the configuration file are parsed into a hash map that is kept around and accessible at runtime through the LDAPIdentityStoreConfiguration and its MetaData. We decided to create a default value and lazily initialize a new instance variable in LDAPIdentityObjectImpl to control the result set size. Based on that number, we create a ‘javax.naming.ldap.PagedResultsControl’ object and add it to the existing LDAP request controls, effectively limiting the result size. The following listing shows those changes:
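The original listing is missing; the following sketch shows the gist. It uses the real javax.naming.ldap.PagedResultsControl, but the helper method, the string-typed option parameter, and the DEFAULT_MAX fallback are assumptions of ours, not PicketLink code:

```java
import java.io.IOException;
import javax.naming.ldap.Control;
import javax.naming.ldap.PagedResultsControl;

// Sketch of the result-size limit: read the maxSearchResults option (passed
// in here as a string, as parsed from the store configuration) and append a
// PagedResultsControl to the existing request controls.
public class PagedSearch {

    static final int DEFAULT_MAX = 1000; // assumed fallback when the option is absent

    public static Control[] withPaging(Control[] existing, String maxSearchResults)
            throws IOException {
        int max = (maxSearchResults != null) ? Integer.parseInt(maxSearchResults) : DEFAULT_MAX;
        Control paging = new PagedResultsControl(max, Control.NONCRITICAL);
        Control[] controls = new Control[existing.length + 1];
        System.arraycopy(existing, 0, controls, 0, existing.length);
        controls[existing.length] = paging; // server now returns at most one page of 'max' entries
        return controls;
    }
}
```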

Deployment

With all those issues sorted out, we had a working PicketLink implementation with only a few minor changes to the base version we started from. All our requirements are fulfilled and the required patches are clear and minimal.

To maintain our changes and to contribute our modifications back to the community, we decided to use GitHub. We started a new GitHub project and checked in the unmodified PicketLink sources in version 1.1.5.CR01. We then formulated our own requirements in the form of issues against that GitHub repository, and we tracked and documented our progress by committing to GitHub and updating the relevant issues.

We decided against a true fork of PicketLink in the form of a new, separate Maven artefact. Since we do not build GateIn ourselves and PicketLink is integrated within GateIn, the benefit of such a fork would be minimal. Instead, we manually build our version of PicketLink and replace the jars inside GateIn. We already had scripts in place to create and configure GateIn for our different environments; those scripts now also replace PicketLink in GateIn.

Conclusion

In this post we described the steps required to extend PicketLink to support multiple LDAP stores at the same time, and explained how and why we changed PicketLink.

An overview of our changes:

- Support multiple user-type mappings that each map to a different LDAP identity store.

- Iterate over all userTypeMappings that have identity stores instead of using only the single configured user-type mapping.

- Extend the attribute lookup in FallbackIdentityStoreRepository to fall back to all configured stores.

- Cache individual users returned by findUsers so that subsequent findUser calls are answered from the cache.

- Limit the LDAP result set size via a configurable maxSearchResults option.

One thought on “Patching PicketLink to support multiple LDAP stores”

The LDAP specification does not refer to limiting users to 1000 entries per search (or simple paged search). You might be referring to the 1000-entry size limit imposed by Active Directory.

It is bad form, and a poor practice, for a client to assume knowledge of the server vendor or server configuration (of which the size limit is one configuration parameter).

Many LDAP SDKs impose a size limit of 0, which means: return all entries matching the search parameters to the client. Since this is a setting in the API, it is called a client-requested limit. The client-requested limit cannot override the server-imposed limit.