Most end-user services that use LDAP for authentication/authorization purposes share the following characteristics:

The LDAP queries performed target a single entry. The most common query filter is uid=<username> which can be easily indexed (equality index) and normally should only have one candidate entry (which means that the LDAP server can just use the index to find the entry in the database).

Only a small percentage (as small as 10-20%) of the total user population is active at any given moment. That means that your cache memory need not be as large as your database but only as large as your active working set. The user entries in the working set are the ones that will get accessed a lot and increase your cache hit ratio, while the rest of the entries might never get accessed, or only very infrequently, so maintaining a cache large enough to keep them is just a waste of memory. For example, in the Greek School Network we’ve got around 200,000 LDAP entries but in a single day only 40,000-50,000 entries will be accessed. That’s 20-25%, which means that you can get great performance by tuning the entry/directory cache to hold 40-50% of the database. In our case we get a 98%(!) entry cache hit ratio with an entry cache size of 60,000 entries.
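The working-set argument above can be tried out with a small simulation. This is only a sketch under my own assumptions (an LRU cache, a fixed set of active users, every active user equally likely to be looked up); it does not model how any particular LDAP server manages its entry cache, and the function name and parameters are made up for illustration.

```python
from collections import OrderedDict
import random


def simulate_hit_ratio(total_entries, active_entries, cache_size, lookups, seed=0):
    """Replay random single-entry lookups against an LRU cache; return the hit ratio."""
    rng = random.Random(seed)
    # Assumption: only the active users are ever looked up, all equally often.
    active_ids = rng.sample(range(total_entries), active_entries)
    cache = OrderedDict()
    hits = 0
    for _ in range(lookups):
        uid = rng.choice(active_ids)
        if uid in cache:
            hits += 1
            cache.move_to_end(uid)         # mark as most recently used
        else:
            cache[uid] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / lookups


if __name__ == "__main__":
    # 200,000 entries and 50,000 active users, as in the example above.
    for size in (10_000, 30_000, 60_000):
        ratio = simulate_hit_ratio(200_000, 50_000, size, 200_000)
        print(f"cache of {size} entries: hit ratio {ratio:.2%}")
```

Once the cache is at least as large as the active set, the only misses left are the first lookup of each active user, which is why growing the cache beyond the working set buys nothing.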

Although the above is the usual case, there are a few services that involve much heavier LDAP queries. One common example is mailing lists. We’ve got 70,000 teachers, so an LDAP query to create a mailing list holding all their emails will hit the All IDs threshold (and thus will not use an index) and will need to look through the whole entry database. Imagine having to run the same type of query for various other user categories (students, users belonging to specific organizational units and so on). Both the load on the LDAP server and the total execution time of all the queries start to grow.

In our 200K user database the id2entry file is 1.3GB, which translates to roughly 6KB/entry, and consequently scanning the whole database from disk can take quite some time. If you also take into account the total size of the indexes (500MB) and a minimum entry cache of around 60,000 entries (~900MB), it’s obvious that you need at least 4GB to keep the database in memory and still be able to handle other LDAP queries and return processed entries quickly.
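The arithmetic behind those figures, as a quick sketch. It assumes decimal units (1GB = 1000MB), which is my assumption and not something the numbers above specify.

```python
GB = 1000 ** 3
MB = 1000 ** 2

entries = 200_000
id2entry_bytes = 1.3 * GB      # id2entry file size
index_bytes = 500 * MB         # total size of the index files
entry_cache_bytes = 900 * MB   # entry cache holding around 60,000 entries

# Average on-disk size of one entry, in KB.
per_entry_kb = id2entry_bytes / entries / 1000

# Memory needed to hold the database, the indexes and the entry cache together.
total_gb = (id2entry_bytes + index_bytes + entry_cache_bytes) / GB

print(f"average on-disk entry size: {per_entry_kb:.1f} KB")
print(f"database + indexes + entry cache: {total_gb:.1f} GB")
```

That comes to about 6.5KB per entry and 2.7GB in total, so a 4GB machine still has headroom for the operating system and for query processing.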

Since in this case you’ll hit the All IDs threshold no matter how you run the LDAP query, the only alternative is to find a way to lower the database size. The way to do that is Fractional Replication. Fractional Replication allows you to set up a replica that will only request a specific set of attributes for all entries. You obviously have to be a bit careful to always request all attributes required by the objectclasses on your LDAP entries, but other than that you can keep the attribute set to the bare minimum needed for your heavy LDAP queries to work. For the mailing list example you only need to keep the attributes used for entry selection (edupersonaffiliation, edupersonorgunitdn) and the mail attributes (mail, mailalternateaddress), and can drop the rest. That way you can lower your database size to a fraction of the original and keep it in memory for all required queries, lowering your query processing time dramatically.
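To make the idea concrete, here is a toy sketch of what a fractional replica ends up holding for one entry. It is not how any server implements replication. The sample entry is made up, and keeping cn and sn is my assumption for entries based on the person objectclass, which requires both.

```python
# Attributes needed by the mailing-list queries (selection plus mail attributes).
QUERY_ATTRIBUTES = {
    "edupersonaffiliation",
    "edupersonorgunitdn",
    "mail",
    "mailalternateaddress",
}

# Assumption: entries derive from "person", so cn and sn must be kept as well.
REQUIRED_BY_SCHEMA = {"objectclass", "cn", "sn"}

KEPT_ATTRIBUTES = QUERY_ATTRIBUTES | REQUIRED_BY_SCHEMA


def fractional_view(entry, kept=KEPT_ATTRIBUTES):
    """Return a copy of the entry holding only the kept attributes (case-insensitive)."""
    return {name: values for name, values in entry.items() if name.lower() in kept}


# Hypothetical entry, for illustration only.
full_entry = {
    "objectClass": ["inetOrgPerson", "eduPerson"],
    "uid": ["jdoe"],
    "cn": ["John Doe"],
    "sn": ["Doe"],
    "mail": ["jdoe@example.org"],
    "eduPersonAffiliation": ["faculty"],
    "jpegPhoto": [b"\xff\xd8" + b"\x00" * 4096],
    "userPassword": ["{SSHA}not-a-real-hash"],
}

replica_entry = fractional_view(full_entry)
print(sorted(replica_entry))
```

Large or sensitive attributes such as the photo and the password never reach the replica, which is where the size reduction comes from.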

Fractional Replication is a feature provided by all modern LDAP servers like OpenLDAP and Sun Java DS. Sun DS also provides Class of Service which can be used to dynamically create attribute values and lower your original database size.

We’ll be moving to OpenLDAP 2.4 shortly at NTUA as our primary LDAP server. I wanted to post a few details on my testing, so here goes:
In general the server is quite stable, fast and reliable. The only thing an administrator should really keep an eye on is making sure that the database files have the correct ownership/permissions after running database administration tools (slapadd, slapindex etc).

Features

OpenLDAP has a really nice set of features which can prove quite useful like:

Per user limits. The administrator can set time and size limits on a per user or user group basis, apart from setting them globally. One nice application is to limit anonymous access to only a few entries per query while allowing much broader limits for authenticated users. In my case I found that you have to set both soft and hard limits for things to work correctly.

MemberOf Overlay. Maintains a memberOf attribute on user entries that are members of static groups.

Dynamic Group Overlay. Dynamically creates the member attribute for dynamic groups based on the memberURL attribute value. Since the memberURL is evaluated on every group entry access care should be taken so that only the proper users access the group entries.

Constraint Overlay. A very handy overlay which allows the administrator to set various constraints on attribute values. Examples are count constraint (for instance on the userpassword attribute), size constraint (jpegPhoto attribute), regular expression constraint (mail attribute) and even value list constraint (through an LDAP URI) which can be very handy for attributes with a specific value set (like edupersonaffiliation). Make sure you use 2.4.13 though since value list constraint will crash the server in earlier versions.

Monitor Backend, which lets you monitor server status, parameters and operations directly through LDAP.

Dynamic Configuration, which provides an LDAP view of the whole server configuration, thus allowing the administrator to dynamically change most of it. I would not recommend using only dynamic config (although it is possible) since it’s a bit cryptic and hard to administer. What I did was to enable it on top of the regular text slapd.conf in order to be able to dynamically change a few parameters, especially the database read-only feature (which can be used to perform online maintenance tasks like indexing or backing up data).
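To illustrate the four constraint types mentioned for the Constraint Overlay above, here is a toy sketch of the checks in Python. It only shows the idea and is not the overlay's configuration syntax or matching semantics. The function names, the mail pattern and the size limit are my own, and the affiliation list holds the values I know from the eduPerson schema.

```python
import re


def check_count(values, maximum):
    """Count constraint: at most `maximum` values for the attribute."""
    return len(values) <= maximum


def check_size(values, max_bytes):
    """Size constraint: every value is at most `max_bytes` long."""
    return all(len(value) <= max_bytes for value in values)


def check_regex(values, pattern):
    """Regular expression constraint: every value matches the whole pattern."""
    return all(re.fullmatch(pattern, value) for value in values)


def check_value_set(values, allowed):
    """Value list constraint: every value comes from a fixed set."""
    return all(value in allowed for value in values)


# Illustrative pattern, deliberately loose.
MAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

AFFILIATIONS = {
    "faculty", "student", "staff", "alum",
    "member", "affiliate", "employee", "library-walk-in",
}

print(check_count(["secret"], 1))                        # one userpassword value
print(check_size([b"\x00" * 2048], 1024))                # photo larger than the limit
print(check_regex(["jdoe@example.org"], MAIL_PATTERN))
print(check_value_set(["wizard"], AFFILIATIONS))
```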

We’ve been using nuSOAP as a PHP web services framework for quite some time. It’s just a couple of PHP files, meaning that you only need to include it in your PHP code, and it’s elegant and easy to code against. You just register functions and the framework takes care of creating the WSDL (through a ?wsdl binding in your PHP web service pages) and all the SOAP communication with minimal effort. The problem is that it’s not maintained anymore and thus there’s no real support for the WS-* specification stack. In our case we’re particularly interested in WS-Security. WSO2 offers a PHP framework that provides all that (based on the Apache Axis2/C code). The API is quite easy to understand and supports REST style calls and consuming WSDL. The WSDL mode is the easiest to use for writing both the client and the server but requires having the WSDL file ready.

I’ve never paid much attention to LDAP Proxy Authorization until recently, when a colleague brought it up. It’s actually a very neat way to perform operations on an LDAP server as a normal user without requiring knowledge of the user’s credentials. As a result you don’t need to set up an all-powerful account; instead you can perform actions using the target user’s actual identity, simplifying access control policy on the LDAP server.

The above can come in very handy in Single Sign On cases. Imagine a web site which uses Shibboleth to authenticate users (and thus has no knowledge of their password) but also has to perform actions on the user accounts stored on an LDAP server, either directly or through a web service interface. Since the web site does not have the user’s password, it cannot perform an LDAP Bind with the user’s credentials. What it can do though is bind as a special user which has proxy authorization privileges, SASL authorize to the user’s DN and perform the corresponding actions.

More information on Proxy Authorization and how to set it up in OpenLDAP can be found in the Administrator’s Guide ‘Using SASL’ chapter. One nice feature in OpenLDAP is that you can limit the accounts to which a user can authorize to a specific user set, defined either by an LDAP URL or a DN regular expression.
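Two small pieces of the flow described above can be sketched without touching a server. The first is the "dn:" form of the authorization identity the proxy user asks for. The second is the kind of DN regular expression check that restricts whom it may authorize to. The DNs and the pattern are hypothetical. A real server also normalizes DNs before comparing them, which this sketch does not attempt.

```python
import re


def authz_id_for_dn(dn):
    """Build the 'dn:' form of an authorization identity for a target user DN."""
    return "dn:" + dn


def may_authorize_to(target_dn, allowed_dn_pattern):
    """Return True if the target DN matches the allowed DN regular expression."""
    return re.fullmatch(allowed_dn_pattern, target_dn, flags=re.IGNORECASE) is not None


# Hypothetical rule: the proxy account may act for people entries only.
ALLOWED = r"uid=[^,]+,ou=people,dc=example,dc=org"

user_dn = "uid=jdoe,ou=people,dc=example,dc=org"
admin_dn = "cn=admin,dc=example,dc=org"

print(authz_id_for_dn(user_dn))
print(may_authorize_to(user_dn, ALLOWED))
print(may_authorize_to(admin_dn, ALLOWED))
```

Restricting the rule to the people branch means that even a compromised web site account cannot act as an administrative entry.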

After many months of struggling with hardware stability issues in our new directory service infrastructure, we managed to solve them and tried to actually start using it. What we found out was that our master servers (almost identical to the read-only replicas except for Fibre Channel capabilities and an on-card instead of on-board disk controller) had terrible performance compared to the read-only replicas. Talk about taking a day(!!) to create some new sub-string indexes on a 200,000-entry database.

We needed to find a way to consistently measure I/O performance on top of the (Solaris) OS (and not just a disk read benchmark) so that we could open a case with our server manufacturer. After some searching we came across PostMark, an I/O benchmarking utility originally from NetApp. It seems that the software is not maintained anymore but you can find the source code on Debian. It’s only one tiny .c file so you just need to run ‘gcc -o postmark postmark.c’ to get things going.

This is the configuration I used in order to simulate a directory server instance. In general: create 20 files ranging from 5 to 30MB each (a common index file size), run only reads and appends on them (with a 4/1 ratio) and no creates/deletes:

set location <your location directory>

set size 5000000 30000000

set number 20

set bias read 2

set bias create -1

The results were a bit… terrifying: our read-only servers were actually three times faster (while having the same disks and almost the same controller). It would be a nice idea to always run PostMark on your servers to see what’s happening and how different servers handle the same I/O load.

We have had a 6.X setup running for quite a while (but not servicing requests yet) which will replace the current 5.X setup. In order to keep data current we have the primary 6.X master acting as a consumer of the 5.X master. We’ve run into the same problem many times: it seems that 5.X and 6.X act differently when adding an entry with an empty RDN (something like uid=, ou=people, dc=<domain>). 5.X will allow the entry to be added while 6.X will reject it. The end result is that every time an entry like that is added (mostly by mistake) in the 5.X infrastructure, replication to the 6.X servers is halted and we need to initialize all of them from the start.

It seems strange that the servers exhibit such different behaviour. If there’s a way to make them behave… properly I’d be glad to learn about it.
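One possible stopgap, sketched below, is to catch such DNs in whatever tool creates the entries, before they ever reach the 5.X master. This is a deliberately naive check of my own and not a full DN parser. It splits on unescaped commas only and ignores multi-valued RDNs and quoted values.

```python
def has_empty_rdn_value(dn):
    """Return True if any RDN has an empty value (or no '=' at all), e.g. 'uid=,ou=people'."""
    rdns = []
    current = []
    escaped = False
    for ch in dn:
        if escaped:
            current.append(ch)       # character after a backslash is taken literally
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            rdns.append("".join(current))
            current = []
        else:
            current.append(ch)
    rdns.append("".join(current))

    for rdn in rdns:
        _attr, separator, value = rdn.partition("=")
        if not separator or not value.strip():
            return True
    return False


print(has_empty_rdn_value("uid=, ou=people, dc=example"))
print(has_empty_rdn_value("uid=jdoe, ou=people, dc=example"))
```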

FreeRADIUS 2.0 has been released after a long and productive development cycle. It’s much more scalable, faster and simpler while providing even more powerful features like a policy language, virtual hosting and IPv6 support. More information is available on the FreeRADIUS website.

We’ve set up a DS6.X farm for the Greek School Network. It includes two datacenters (one for north and one for south Greece), each with one write master and two read-only replicas. The write masters are in a multi-master topology while the read-only replicas all get updated from both masters. The idea is to be able to lose one datacenter without complete LDAP service failure. We’ve been playing with the servers for about a month now and my impressions so far are:

The web console is far superior to the previous Java console. It is much faster, easily accessible with just a web browser and nicely set up.

The installation is much harder now and requires far too many commands to get things going. Instead of just a zip with an installer you now have to install the directory server, play with cacao and dscc (directory server control center), add the console to the application server and learn a lot of <whatever>adm commands. We also found the installation guide to be quite lacking and a bit unclear on a few issues.

Security has to be tighter with the new server, since you now have to worry about a dozen ports instead of just the ldap(s) and console ports. You have a few ports for cacao, two ports (http/https) for the console and the usual directory server ports.

The documentation quality is a bit degraded compared to previous products. Seems like documentation writing is starting to be outsourced to India as well 🙂 I got the impression that the 5.X documentation was written by the engineers themselves while the 6.X one was written by an outside partner.

Up to 6.1 we faced quite a lot of stability issues (the servers run Solaris on 64-bit x86). The LDAP server would crash for no reason and sometimes replication would fail and replicas would have to be reinitialized. We installed 6.2 today and so far so good, but I have to say I’m rather disappointed with the software quality compared to what I had gotten used to with the 5.X versions.

We haven’t tried to do any stress testing yet to see how the servers behave from a performance point of view. Online replica initialization (over a LAN), on the other hand, can reach speeds of at least 500 entries/sec (with a 256MB import cache), which is quite impressive. I’ll try to update this post as things move on.

It was the first conference where I almost forgot about using the Internet. Really excellent presentations and top participants. Also, having your room just a few floors above the conference rooms was a nice change compared to the usual TERENA conference setting (conference held in a university with the hotel usually being over an hour away).

Day one

The conference started with a nice overview of the standards status by Kurt Zeilenga, followed by a talk from Ludovic Poitou on the merits of the upcoming OpenDS. Seems like it’s getting to a pretty mature state and the figures are impressive: 10 times faster than Sun One, and they still have not worked on optimizations. From the looks of it, 2008 will be the year of OpenDS, since 1.0 will probably be released before the end of 2007.

Java based LDAP servers got a lot of attention at this conference. Alex Karasulu described Apache’s view of the LDAP roadmap as well as Apache Directory. They envision Directory Services resembling an RDBMS, offering Triggers, Stored Procedures and Views (though the latter could be implemented with a strong proxying interface like the one available in OpenLDAP). Ersin Er gave a more detailed presentation on the actual implementation of Stored Procedures in Apache DS. The idea is to offer an API and the ability to write Java code to implement operations, while triggers will have a stored procedure as the scheduled action. Personally, I feel a little nervous about having code executed inside the server context, although triggers might be a nice (and less heavy) alternative to things like persistent searches.

Howard Chu, chief architect of OpenLDAP and employee at Symas, gave a nice presentation on the status of version 2.4. Benchmark numbers are very, very impressive (150 million entries, 4800 writes/second, 32,000 queries/second, only 6 hours load time), while cn=config and the dynamic configuration/loading of everything make remote configuration a reality and restarts a thing of the past. If only there was a strong configuration GUI for things like configuring Syncrepl. N-way multimaster replication is also now available, making OpenLDAP an equal competitor to commercial offerings.

Giovanni Baruzzi gave an enlightening presentation on how to properly design an LDAP Directory Information Tree (DIT). The main idea is: ‘Keep the tree as flat as possible, as deep as needed’. Group implementations were also discussed in detail: static groups can work great as long as they are under 80,000 members (at which point an update can take more than 5 minutes), while memberOf is very flexible but poses security risks (write access to the memberOf attribute means that an administrator can add a user to any of the available groups). I wasn’t able to attend the presentation on Apache Directory Studio but I downloaded it later and played with it. It is a very impressive and long awaited set of tools. It includes a powerful directory browser, an entry/schema editor as well as some Apache Directory specific tools for ACI and configuration editing.

Hilla Reynolds from far away Australia presented an in-house X.500/LDAP infrastructure with advanced features including geographical distribution of data access, concurrent replication (though I believe this is difficult to achieve without increasing the directory update time) and chained queries in order to return combined results from multiple sources.

Day two

The second day started with Steven Legg presenting the LDAP and XML integration process. Nice work, though a bit technical and hard to follow. I’d like to see where things will lead, although at this point the only actual implementation is by him. Next were two presentations from Sun employees on LDAP Proxies/Virtual Directories and Scaling Directories.

Following was … my presentation 🙂 An excellent and lively presentation from Felix Gaehtgens on how to write efficient LDAP applications followed. The key points were to keep connections open, parallelize operations through either multiple threaded connections or asynchronous reads (though the latter involves more effort from the application writer), not use ‘Directory Manager’ to perform all operations and make use of the ProxyAuth mechanism if possible. I am happy that FreeRADIUS already follows most of the above guidelines in its LDAP module. Lastly, the conference closed with a presentation from Volker Lendecke on lessons learned from Samba’s LDAP backends. The general conclusion was that various libc functions were broken and Samba had to reimplement them correctly.
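The first two points from the efficient applications talk (keep connections open, spread operations over several of them) can be sketched as a small pool. FakeConnection is a hypothetical stand-in for an already bound LDAP connection so that the example runs without a server. The class names and the DN layout are mine, not anything shown at the conference.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for an open, already bound LDAP connection."""

    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1

    def search(self, uid):
        return f"uid={uid},ou=people,dc=example,dc=org"


class ConnectionPool:
    """Opens a fixed number of connections once and reuses them for every search."""

    def __init__(self, size, factory=FakeConnection):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    def search(self, uid):
        conn = self._idle.get()       # blocks until a connection is free
        try:
            return conn.search(uid)
        finally:
            self._idle.put(conn)      # hand the connection back, still open


pool = ConnectionPool(size=4)
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(pool.search, [f"user{i}" for i in range(100)]))

print(len(results), "searches over", FakeConnection.opened, "connections")
```

A hundred lookups share four connections, so the cost of connecting and binding is paid four times instead of a hundred.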

Got my paper included in the 1st LDAP Conference, which will take place in Cologne, Germany on 6-7 September. Seems that all of the right people will be there, including Kurt Zeilenga (OpenLDAP founder), Howard Chu (SYMAS – OpenLDAP), Alex Karasulu (ApacheDS) and Ludovic Poitou (Sun, OpenDS). Usually I can only find a few presentations in a conference that I feel I must attend. In this case I cannot find a single presentation not worth its time. Seems it will be two very busy days.