Choosing Attribute Syntaxes

Most enterprise application developers and architects have a solid understanding of column data types in SQL repositories, but LDAP attribute syntaxes can be a bit mysterious at first. Choosing the appropriate syntax for each attribute is one of the most important aspects of LDAP schema design. The pointers that follow should help us select a syntax based on the type of data we wish to store.

Throughout this section, we refer to attribute syntaxes by their OIDs and OM syntaxes, as this is how we define an attribute's syntax programmatically. Please refer back to Table 6.1 in Chapter 6 for a full listing of the available syntaxes and their LDAP OID and OM syntax values.

String Data

String data is probably the most common type of data we will want to store in LDAP. As Table 7.2 shows, there are several syntaxes to choose from.

Table 7.2. Various String Attribute Syntaxes

Syntax Name         LDAP Syntax   OM Syntax   Comment
------------------  ------------  ----------  ---------------------------------------------------------------
String(Teletex)     2.5.5.4       20          Case insensitive for searching; Teletex characters only
String(Printable)   2.5.5.5       19          Case sensitive for searching; printable characters only
String(IA5)         2.5.5.5       22          Case sensitive for searching; IA5 string
String(Numeric)     2.5.5.6       18          Contains only digits; rarely used in Active Directory
String(Unicode)     2.5.5.12      64          Case insensitive for searching; contains any Unicode character

Our recommendation is to use 2.5.5.12, the Unicode string syntax, for general-purpose string usage. A Unicode string converts cleanly to and from the standard System.String data type in .NET, which is also Unicode. The data is also case insensitive for searching, which is what we want in almost all cases. A case-sensitive search is very rarely useful, and more often than not it will simply bewilder those who are querying the data.

Date/Time Values

The obvious thing to do here is to use 2.5.5.11 (UTC Time or Generalized Time), with either OM syntax 23 or 24 for UTC or Generalized time values, respectively. The major advantage of this approach is that ADSI and System.DirectoryServices (SDS) will marshal the data values in and out as .NET System.DateTime. This is extremely convenient from a programming perspective. The LDAP time formats are also searchable with >= and <= filter types, so there is no disadvantage there. We generally recommend using UTC time values rather than values that contain specific time zones.
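As a rough illustration of why these values are both readable and searchable, the sketch below formats and parses a Generalized Time string and builds a >= filter from it. It is written in Python purely to show the string format, not the ADSI or SDS API. The function names and the myExpiryDate attribute are hypothetical, and the fixed ".0Z" suffix assumes the UTC form that Active Directory typically returns.

```python
from datetime import datetime, timezone

# Assumed layout: YYYYMMDDHHMMSS.0Z (UTC), the form usually seen in
# Active Directory output. Other valid Generalized Time variants
# (for example, explicit offsets) are not handled by this sketch.
GENERALIZED_TIME_FORMAT = "%Y%m%d%H%M%S.0Z"


def to_generalized_time(value: datetime) -> str:
    """Render a timezone-aware datetime as a UTC Generalized Time string."""
    return value.astimezone(timezone.utc).strftime(GENERALIZED_TIME_FORMAT)


def from_generalized_time(text: str) -> datetime:
    """Parse a UTC Generalized Time string into a timezone-aware datetime."""
    parsed = datetime.strptime(text, GENERALIZED_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def on_or_after_filter(attribute: str, value: datetime) -> str:
    """Build a >= LDAP filter for a time-valued attribute."""
    return f"({attribute}>={to_generalized_time(value)})"


if __name__ == "__main__":
    cutoff = datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)
    # "myExpiryDate" is a hypothetical attribute name.
    print(on_or_after_filter("myExpiryDate", cutoff))
```

Because the value is stored as UTC, the string we write into the filter can be read directly by anyone inspecting the directory.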

The other approach is the one that Microsoft uses in Windows for many date/time values in Active Directory, such as the accountExpires attribute. Because Windows uses 8-byte FILETIME structures internally for so many time values, it was natural to use this format in Active Directory as well. This approach has two disadvantages. First, data marshaling for these values is messy in ADSI and SDS (see Chapter 6). Second, we need code to interpret time values stored in this format: we cannot read them with a normal LDAP search utility and know what they mean simply by looking at them. This is not the case with the standard UTC and Generalized time formats, which are human readable in their native form. We therefore do not recommend following Microsoft's lead here unless a really compelling scenario supports it.
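To make the "we need code to interpret these values" point concrete, here is a minimal Python sketch of the conversion. A FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC). The function names are illustrative, and sentinel values that some attributes use to mean "never" are deliberately not handled.

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: midnight, January 1, 1601 (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_SECOND = 10_000_000  # one tick is 100 nanoseconds


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert a FILETIME tick count to a timezone-aware UTC datetime.

    Precision below one microsecond is discarded.
    """
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_filetime(value: datetime) -> int:
    """Convert a timezone-aware datetime to a FILETIME tick count."""
    delta = value - FILETIME_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


if __name__ == "__main__":
    raw = 116444736000000000  # unreadable as stored in the directory
    print(raw, "->", filetime_to_datetime(raw).isoformat())
```

The raw integer gives a human reader nothing to work with, whereas the equivalent Generalized Time string can be read at a glance.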

Numeric Data

For integer data, we almost certainly want to use 2.5.5.9 (Integer or Enumeration) for 4-byte Int32 values and 2.5.5.16 (LargeInteger or Interval) for 8-byte Int64 values. For floating-point numbers, however, there is no obvious way to proceed. We might consider storing them either as strings in a string syntax attribute or as binary data. The difficulty is that the directory cannot verify that a string or byte array actually contains a valid number, so the application must perform that validation itself.
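The following Python sketch shows one way the application-side validation might look for both storage options. The helper names, the choice to reject NaN and infinity, and the big-endian 8-byte layout are all assumptions made for illustration; nothing in the directory mandates them.

```python
import math
import struct


def encode_float_as_string(value: float) -> str:
    """Produce a string that parses back to the identical float."""
    if not math.isfinite(value):
        raise ValueError("only finite numbers are stored")
    return repr(value)


def decode_float_from_string(text: str) -> float:
    """Validate and parse a stored string; the directory will not do this."""
    value = float(text)  # raises ValueError for malformed input
    if not math.isfinite(value):
        raise ValueError("only finite numbers are accepted")
    return value


def encode_float_as_bytes(value: float) -> bytes:
    """Pack as an 8-byte big-endian IEEE 754 double (an assumed layout)."""
    return struct.pack(">d", value)


def decode_float_from_bytes(data: bytes) -> float:
    """Unpack an 8-byte big-endian double, rejecting any other length."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return struct.unpack(">d", data)[0]
```

The string form stays readable in LDAP tools, while the binary form is compact but opaque. Either way, every reader and writer of the attribute has to agree on the encoding.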

Binary Data

Octet string, 2.5.5.10, is the obvious choice here. SDS and ADSI marshal this data as raw byte arrays, which provides the easiest programming model. Alternatively, we might consider using a string attribute to store Base64-encoded data, but there is no compelling reason to choose this option.
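One cost of the Base64 alternative is easy to demonstrate: the encoded text is about a third larger than the raw bytes, and every consumer must decode it. The short Python sketch below simply measures that overhead; the function name is illustrative.

```python
import base64
from typing import Tuple


def base64_overhead(payload: bytes) -> Tuple[int, int]:
    """Return (raw size, Base64-encoded size) in bytes for a payload."""
    encoded = base64.b64encode(payload)
    return len(payload), len(encoded)


if __name__ == "__main__":
    raw_size, encoded_size = base64_overhead(bytes(300))
    print(f"raw: {raw_size} bytes, Base64: {encoded_size} bytes")
```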

Boolean Data

Boolean, 2.5.5.8, is the obvious choice here as well. The data is marshaled as System.Boolean, so that is by far the easiest programming model. The primary thing to watch out for is that the attribute will likely be nullable, so we must use caution in our conversions. LDAP Booleans are effectively "trinary" rather than binary: true, false, or not set. The Nullable<T> type in .NET 2.0 makes this easier to deal with, but it is still something to keep in mind.
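The three-state behavior can be sketched as follows, using Python's Optional[bool] as a stand-in for a nullable Boolean. The sketch assumes the attribute's values arrive as a list of the LDAP string literals TRUE and FALSE, with an empty or missing list meaning the attribute is not set. The function names are hypothetical.

```python
from typing import Optional, Sequence


def parse_ldap_boolean(values: Optional[Sequence[str]]) -> Optional[bool]:
    """Map an attribute's values to True, False, or None (not set)."""
    if not values:
        return None  # attribute absent: the third state
    text = values[0]
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    raise ValueError(f"not an LDAP Boolean: {text!r}")


def with_default(values: Optional[Sequence[str]], default: bool) -> bool:
    """Collapse the three states to two by choosing an explicit default."""
    parsed = parse_ldap_boolean(values)
    return default if parsed is None else parsed
```

Forcing callers to pass a default makes the "not set" decision visible in code instead of silently treating a missing value as false.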

Object Identifiers

This one is pretty obvious as well. If we need to store an OID, we should use the 2.5.5.2 syntax. This rarely comes up in typical enterprise data modeling exercises, but it may apply when doing work in cryptography or something similar that makes extensive use of OIDs.

Foreign Keys

Active Directory and ADAM support three different syntaxes for expressing the concept of a foreign key in LDAP: 2.5.5.1 or Object(DS-DN), 2.5.5.7 or Object(DN-Binary), and 2.5.5.14 or Object(DN-String). Attributes that use these syntaxes are also known as DN-syntax attributes, and they receive special treatment in the directory. LDAP allows us to express relationships between our objects using both the object hierarchy and foreign keys. We should not hesitate to use these features when our data model suggests them.

In choosing among the three options, we essentially have to decide whether we want to express a foreign key relationship or whether we need to express a compound key that associates a foreign object with another piece of data.

In most cases, we will want simple foreign keys and should use the basic Object(DS-DN) 2.5.5.1 syntax. The vast majority of these types of attributes in Active Directory and ADAM are built this way. They have the additional benefit of having very simple marshaling to System.String. As we saw in Chapter 6, the story with DN-With-Binary and DN-With-String is not as good.
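To give a sense of the extra work a compound key involves, the Python sketch below parses the string form in which DN-With-Binary values are typically presented, assumed here to be B:<char count>:<hex digits>:<DN>, where the count is the number of hex characters. The type and function names are illustrative, and the sample value in the usage section is made up.

```python
from typing import NamedTuple


class DNWithBinary(NamedTuple):
    data: bytes  # the binary half of the compound key
    dn: str      # the distinguished name of the referenced object


def parse_dn_with_binary(text: str) -> DNWithBinary:
    """Parse the assumed form 'B:<char count>:<hex digits>:<DN>'."""
    prefix, count_text, rest = text.split(":", 2)
    if prefix != "B":
        raise ValueError("expected a value starting with 'B:'")
    count = int(count_text)
    # Slice by the declared count rather than splitting on ':' again,
    # because the DN itself may legitimately contain colons.
    hex_part = rest[:count]
    separator = rest[count:count + 1]
    dn = rest[count + 1:]
    if len(hex_part) != count or separator != ":" or not dn:
        raise ValueError("malformed DN-With-Binary value")
    return DNWithBinary(bytes.fromhex(hex_part), dn)


if __name__ == "__main__":
    # Made-up sample value for illustration only.
    sample = "B:8:0A0B0C0D:CN=Users,DC=example,DC=com"
    print(parse_dn_with_binary(sample))
```

A plain Object(DS-DN) value needs none of this: it is just the DN string.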

Several additional best practices are associated specifically with link value attribute design. We cover these in the upcoming section, Modeling One-to-Many and Many-to-Many Relationships.

Other Data Types

When we have a data type that does not fit neatly into one of these buckets, our best bet is usually to choose either a string or a binary syntax and find a reasonable way to encode our data in it.