Have you tried searching for anything about SQL Server User Defined Type (UDT) in your favorite developer portal? More often than not, you’ve ended finding something that ultimately culminates to debates about the author’s choice of model or whether UDT does have a place in software development at all. Argumentations like these are healthy for experienced developers, but for a beginner, they can be frustrating. I’ve been in those shoes before, and being an open-minded person, I did not care about what other people said, because they would just distract me from my focus on learning something that has the greatest potential of changing the way we develop database applications.

This article provides exactly those things I craved for when I was still writing my first UDT. It discusses UDT from the technical perspective, by taking you step by step in creating one. Along the way, you will discover quirks, inconsistencies, obscure settings, and workarounds. Different ways of implementing your UDT in terms of settings and .NET types are also discussed and pitted against each other, just to find out if common conceptions about them indeed translate to real-world behaviors.

A less conventional approach is employed for this article, by focusing primarily on the concepts that influence the structure and behaviors of UDT rather than the codes that implement them. This makes the text more verbose than your typical technical article, so people with short attention deficit, you’ve been warned. As for those who are happy to know the causes of things, read on!

A User Defined Type (UDT) is a SQL Server data type created using your favorite .NET language. It is one of the artifacts you can create with SQL Server CLR Integration (SQL-CLR for short). Let’s just say it’s the closest thing SQL Server can get to object-based programming.

In a blatant gesture that would probably earn ire to the purists, we’re going to model an entity which is, of course, not a scalar – the vector. A vector is a quantity composed of a magnitude and a direction. We confine our discussion to the two dimensional vector which can be represented in the Cartesian coordinate system using two points, with one being understood as the origin (0,0). Our vector is then isomorphic to the polar coordinate and thus, can also be used to represent the said entity. The unit for direction is radian, for easy manipulation in .NET and T-SQL. Examples of vectors are velocity, acceleration, force, and magnetic fields. I won’t elaborate much about vector anymore, because the Internet has lots of good materials about it. For those who need some refresher on the model and other concepts applied here, these are some good resources:

Our vector is a simplified model of its real world counterpart. It is small, and the implementations of the members are naïve. It’s not industrial strength; having only enough members to illustrate the different capabilities of a UDT. We opted for a simplified model so we can concentrate on the main objective of the article, which is just to show you how to create a UDT, and not how to choose the right entity to model, let alone perfect it.

A UDT can be implemented using a struct (structure in VB.NET) or a class. The struct is recommended because it is stored in the stack, and is generally faster. Being a set-based language, SQL can rip this benefit in transactions involving very high numbers of rows. It also requires less coding than the class counterpart. We’ll know later with our informal tests if there is really truth about all these theoretical claims.

The codes in this article uses .NET 2.0, and runs against SQL Server 2005 and 2008. I refrained from using features of the later versions of the .NET Framework as much as possible to maintain backwards compatibility. The changes made for UDT in SQL Server 2008 is so subtle and there's a dedicated small section on that at the latter part of the article.

A UDT, like any other artifact in SQL-CLR, is integrated to SQL Server via a .NET assembly. To do this, you should create a class library project. The only relevant assemblies for the project are System, System.Data, and System.Xml, so you can delete the rest that Visual Studio imports by default. The initial implementation of our model is shown below:

To turn our struct into a UDT, we need to adorn it with the SqlUserDefinedTypeAttribute found in the Microsoft.SqlServer.Server namespace. This tells the compiler that our type can be integrated to SQL Server as a UDT. It has one required parameter which specifies the kind of serialization format the UDT is using. Specify Format.Native for now. We’ll elaborate on this later.

C#:

We got our first taste of attributes in UDT programming, and just a heads-up; brace yourself for the coming attributes galore! UDT involves ample dosage of declarative programming to define some of its behaviors in SQL Server.

Accompanying the first ever adornment are four members that our struct should implement in order to qualify as a full-fledged UDT. We are going to add them one by one, as we explore the concepts of database and SQL that necessitate them.

Before you can use your UDT, you must initialize it by assigning values to its fields. This is not as straightforward as a native SQL type, because SQL does not have the concepts of constructor and property initializer which let you separate the values for each field. The only way to achieve value assignment in UDT while sticking to the standard SQL syntax is by putting all the values in one big chunk of string, like this:

Fig. 3 – UDT SQL initialization

What’s happening behind the scenes is SQL Server calling a factory function Parse which accepts a string argument representing the input in SQL. This is responsible for splitting the string input, converting them to the desired types, and ultimately, assigning them to the fields. Of course, you’re the one who should provide the logic for that. You must also provide a delimiting character to facilitate easy splitting. The bulk of the code deals with format validations, and the more fields you have, the more tedious your work would be. If you want to showcase your RegEx prowess, this is the right place!

If you’ve read so many things about validation being one of the most important benefits of UDT, I’m sorry this article disappoints you. For me, it’s enough that you know where to put these validations. The logic is entirely up to you. So, for our UDT, let’s just assume that every input is correct.

The SqlString type of the parameter is mandatory. It allows the use of the SQL keyword NULL, which is actually not a string, but a state that your UDT can assume.

Fig. 5 - Assigning null

SqlString is just one of the many wrappers for .NET types located in the System.Data.SqlTypes namespace. Its main purpose is to provide the NULL state for the .NET data types; making them isomorphic to their SQL Server counterparts. For example, SqlDouble is isomorphic with the SQL Server double data type. These wrapper types can also be used in UDT, but we opted for the native types for our fields because the model requires it.

The null state we’re talking here is relevant only to SQL, and is different from the null in .NET. We’ll talk more about this when we tackle the other required members.

The UDT should be able to display itself as a distinct entity, in a certain format, using all the values of its fields. You can specify this format in the ToString function. (Yes, this is the ubiquitous member of System.Object that every type should override.) Its implementation is a no brainer. Just make sure that you adhere to good UDT design by ensuring that its return value is readily consumable by Parse, just like this:

Fig. 6 - Using ToString for initialization

This is what our vector achieves with its naïve implementation of ToString:

C#:

In the initialization section, I mentioned that our UDT should be capable of assuming an unknown state called null. This is different from our common notion of null in conventional programming, which is a reference to an empty memory. The null state and its logic are implemented by the properties Null and IsNull, respectively. If not properly implemented, it’s possible that SQL Server interprets your UDT as null when, in fact, it’s not. This is because null is actually an instance of your UDT which is understood by SQL Server as unknown. SQL Server uses the static read-only property Null to get this instance. (From this point on, I will refer to the UDT null as SQL-null, while the invalid-memory reference in conventional programming as simply null.)

If both Null and Parse return a valid instance of our UDT, how would SQL Server know which is SQL-null? It simply checks the read-only property IsNull. This member is an implicit implementation of the System.Data.SqlTypes.INullable interface. This interface implementation gives our UDT the SQL-null state, just like the SqlString parameter of Parse.

There are two approaches to implementing the SQL-nullability logic of a UDT. The first approach evaluates the values of your fields, while the second uses a simple flag field. If SQL-null is desired for the first approach, the values of the fields should be something that are not acceptable in the context of the model. Let’s say you have a UDT that has a field mass. It’s universally accepted that mass is never negative. In the UDT, a negative value for this field can convey a SQL-null:

C#:

double _mass;
staticpublic SimpleUdt Null
{
get
{
SimpleUdt u = new SimpleUdt();
u._mass = -1.0// This is the flag
return v;
}
}
publicbool IsNull
{
get
{
// If this instance is from Null property,
// the next line must be true
return (_mass < 0.0;)
}
}

Fig. 8 - SQL-nullability using member values

Personally, I prefer the second approach because it’s succinct and promotes self-documenting codes. These could also be the reasons why it is the default implementation in Visual Studio. In our UDT, that is a simple flag before we return the value in the Null property. Below are the SQL-nullability and null implementations of our UDT.

The UDT object is different from your usual .NET object because it is saved and retrieved directly in the disk. You don’t have any other layers sitting between your UDT and the disk which do the saving and retrieval for you. These processes of saving and retrieving objects are called binary serialization and binary deserialization, respectively. The rest of the text may refer to both as simply serialization.

Implementation of serialization in UDT is implicit if the types of your fields are readily serializable. Such types are said to be blittable in programming parlance. They don’t require special handling in .NET because their representation is the same as that in unmanaged code. No additional code is required, and all we have to do is tell SQL Server that it can handle the serialization. We already did this in Figure 2. Our UDT satisfies all the conditions for native serialization, and hence it’s now usable. It should have the following members by now:

Fig. 10– Vector UDT basic members

If there is at least one field that is not blittable, then we must specify Format.UserDefined as the serialization format and implement the serialization manually. We do the latter by providing the implementation of the Microsoft.SqlServer.Server.IBinarySerialize interface. This interface has two members, aptly named Read and Write. You use the BinaryWriter argument to persist the values of your fields, while the BinaryReader to retrieve and assign them back. The order of your writing should be the same as your reading. The implementation ranges from the most trivial read/write to the uber-complex byte manipulations. The latter is beyond the scope of this article.

Suppose we decided to have the highest accuracy for our magnitude. This is the right job for System.Decimal, a type that SQL Server doesn’t know how to serialize. The manual binary serialization is still straightforward:

' These are the fields
' we need to serialize/deserialize
Private _magnitude AsDecimalPrivate _direction AsSinglePrivate _isNull AsBooleanPublicSub Write(ByVal w As System.IO.BinaryWriter) _
Implements IBinarySerialize.Write
w.Write(IsNull)
' No point of serializing
' the other fields if the UDT is SQL-NULL
IfNot IsNull Then
w.Write(Magnitude)
w.Write(Direction)
EndIfEndSubPublicSub Read(ByVal r As System.IO.BinaryReader) _
Implements IBinarySerialize.Read
' Note the order
_isNull = r.ReadBoolean()
' We do the same check as that
' in the Write
IfNot IsNull Then
Magnitude = r.ReadDecimal()
Direction = r.ReadSingle()
EndIfEndSub

C#:

// These are the fields
// we need to serialize
decimal _magnitude;
float _direction;
bool _isNull;
void IBinarySerialize.Write(System.IO.BinaryWriter w)
{
w.Write(IsNull);
// No point of serializing
// the other fields if the UDT is SQL-NULL
if (!IsNull)
{
w.Write(Magnitude);
w.Write(Direction);
}
}
void IBinarySerialize.Read(System.IO.BinaryReader r)
{
// Note the order
_isNull = r.ReadBoolean();
// We do the same check as that
// in the Write
if (!IsNull)
{
Magnitude = r.ReadDecimal();
Direction = r.ReadSingle();
}
}

Our vector is now officially a UDT. We are going to copy the assembly to SQL Server and extract the UDT from it. This process of creating objects in SQL Server is known as cataloguing, because incidentally, you are making your objects visible in the catalogue views of SQL Server. You should build the class library, and copy the assembly to a desired location. If you haven’t enabled SQL-CLR Integration, you may execute these statements:

Fig. 12 - Enabling SQL-CLR

The first step in cataloguing your UDT is to copy the assembly to the database using the CREATE ASSEMBLY command. You specify the path of your assembly and the name that should appear in the database. Your database and the location of your assembly may differ from mine, so change them accordingly. In the commands below, we just use the assembly name as its logical name inside the database:

Fig. 13 - Cataloguing the assembly

Cataloguing our UDT is achieved by the CREATE TYPE command. The fully qualified name of your type, this time, has the format <name of assembly in the database>.[<namespace in the original assembly>.<type name in the original assembly>]. We enclose the namespace and the type with “[]” because currently, SQL-CLR does not support namespaces.

Fig. 14 - Cataloguing the UDT

You may verify our assembly and UDT by querying their corresponding catalogue views:

Fig. 15 - Verifying assembly and UDT

The ALTER ASSEMBLY command is handy when you make modifications in the functions and some parameters in the attributes of the UDT. This saves you one step because you’re not required to re-catalogue your UDT anymore. The syntax is very similar to its CREATE counterpart.

Fig. 16 - The ALTER ASSEMBLY command

You may not use the ALTER ASSEMBLY command if you renamed, added, removed members, or made changes that affect the serialization of the UDT. You must re-catalogue everything. This means dropping all the objects in the assembly dependency tree and then repeating the previous steps. You will be using the commands below for most of the subsequent modifications to our UDT:

Fig. 17 - Re-cataloguing commands

The commands I just presented are the most compact variants that handle our needs. Consult the Microsoft documentation for comprehensive discussions of them.

It’s time to see our UDT in action. We will examine what’s going on inside it as we execute some of the most common SQL statements. We might discover new things and learn more about creating UDT as we go along. Bear in mind though that this is a happy test; we must provide the correct input at all times, otherwise exceptions would be thrown.

Let’s kick-start by declaring a variable of our UDT and then displaying it. As expected, the following statements prints “NULL” because we haven’t initialized it.

Fig. 18 - Simple UDT usage

If we initialize it before selecting it, we get something gibberish:

Fig. 19 - Hexadecimal display of initialized UDT

This weird series of characters is actually the hexadecimal representation of a vector. Somewhere in that hexadecimal is your magnitude and direction. In order to get a meaningful display, we need to call the ToString function:

Fig. 20 - ToString display of initialized UDT

In my opinion, the integration to T-SQL would have been much more seamless if SQL Server calls ToString underneath if we omit ToString. This behavior would have been consistent with the implementation of the initialization with Parse.

Before we proceed, be aware that the members of your UDT are case sensitive, and the following statements illustrate that:

Fig. 21 - UDT member case-sensitivity

Displaying properties does not require a call to ToString unless the type is a UDT. Non-UDT properties can be mapped directly to some SQL native types so SQL Server has no problem displaying them.

Fig. 22 - Direct call to scalar property

To prove that our UDT adheres to good design, Parse should be able to consume ToString without a problem:

Fig. 23 - Parsing with ToString

And we can now create a table with a Vector column, and insert or update rows:

The ubiquitous equals "=" operator in SQL is not compatible with UDT, by default, and using it results in an error.

Fig. 25 - Equality operator call error

Being a .NET developer, the first thing that would come to your mind is operator overloading. Unfortunately, operator overloading is not yet supported in SQL-CLR. The good news is turning on this equality operator support in UDT is very simple. We’ll just tell SQL Server to use the persisted binaries of the fields when comparing. We do this by setting the IsByteOrdered property of the SqlUserDefinedTypeAttribute to true. Be aware that if your UDT is a class, there will be additional work involved. We’ll discuss this later.

The IsByteOrdered setting means SQL Server comparing the magnitude followed by the direction and then nullability. This suits our needs since it already conforms to the semantics of vector equality.

The new adornment is a significant change in the UDT, so a simple alteration is not allowed. You must re-catalogue everything. After the re-catalogue, executing the last command should work just fine.

Fig. 27 - IsByteOrdered at work

The negated equal operator "!=" is also implemented as a logical consequence:

Fig. 28 - Negation operator

And even the rest of the equality operators, even though they don’t make any sense in the context of Vector:

Fig. 29 - Other equality operators

If SQL Server can compare our vectors, definitely it can sort them too. Once again, sorting the Vector does not make sense, but here it is:

Fig. 30 - Sorting vectors

Everything is set for the other roles the UDT can assume at this point, like primary key, foreign key, and indexed key. You can indeed create these constraints with UDT, but it doesn’t mean you should. Besides being a blatant violation of fundamental relational principles, UDT was not built for this purpose, and the performance penalty for such an implementation can be very severe. Here’s a table with a UDT as a primary key:

Do you notice something odd about our implementation of Parse in Figure 4? The code always returns non-SQL-null, but if we assign null to our UDT, it behaves accordingly.

Fig. 32 - Using the NULL keyword bypasses Parse

We can infer that maybe SQL Server is bypassing Parse when NULL is assigned as the value. It could be because calling Parse explicitly is another story (calling static functions of UDT in T-SQL uses the :: scope qualifier).

Fig. 33 - Explicit call to Parse

The only logical conclusion we can arrive at is that SQL Server somehow has its own way of tagging the memory occupied by our UDT as SQL-null whenever we use the keyword NULL instead of calling Parse. It’s talking direct binaries this time instead of string. The error message tells us that we cannot use the value of the argument because it is SQL-null. The exception is thrown the moment we attempt to split the input.

Explicit call to Parse is not the recommended way of assigning values to a UDT. I just showed you that in order to make a point. If you still want to play safe, you can avoid this exception by making sure that you are not splitting a SQL-null argument, as what the next block of code shows:

This unusual overloading is required to boost performance. Bypassing Parse means SQL Server can implement the native way of handling SQL-nulls just like how it does on non-UDT types. This also means that we can actually assign non-string literals to initialize a UDT:

Fig. 36 - Direct binary assignment

This can be scary. We now have something similar to the dreaded “SQL-injection” attack. In this case, it’s a direct binary deserialization attack. Garbage can be entered as long as its binary is converted to the UDT type. You’re lucky if an error results, or if it results to SQL-null; otherwise, it can wreck havoc to your program. The next set of statements show that inserting 12-bytes (8 bytes for the magnitude, 4 bytes for the direction) binary avoids SQL-null and NAN (Not A Number) while still ensures garbage results. Note that the length of the binary is the total length of our fields: double=8 + float=4 + Boolean=1 = 13.

Fig. 37 - Assignment using arbitrary string literal

SQL Server provides a safeguard from this circumvention through a function which is called whenever an external binary deserialization is performed. You specify this in the ValidationMethodName property of the SqlUserDefinedTypeAttribute. This function should be parameter-less, and return a Boolean indicating if the input is valid or not. So, for the sake of illustration, we simplify the model by accepting only a positive magnitude and a direction between 0 and 2PI. Our validation function becomes:

The real vector does not have these restrictions. A negative magnitude means that the vector is actually pointing to the opposite direction. Our model forces the user to reverse the direction instead of supplying a negative magnitude. A direction less than 0 or greater than 2PI is also allowed, but can be converted to the equivalent value between 0 and 2PI. We’ll provide the appropriate helper function for this later. With the validation function in place, those unreasonable values derived from arbitrary string literals are caught this time:

Fig. 39 – Validation function firing

The validation can also be applied in Parse, but we need to transfer the logic to another static function. The validation function for binary deserialization merely delegates the task to the new static function as shown:

In production, malicious binary deserialization from arbitrary literals is virtually impossible unless your system is already vulnerable to other conventional means of attack, like SQL-injection. Preventing attacks like this is the primary responsibility of the front-end applications and the security infrastructure. But, in deference to Murphy's Law, just make sure you have the validation function in your UDT.

We have seen our UDT in an Update statement when we discussed about UDT initialization and value assignment. It was more of “replacing” since we simply create another instance of our UDT and assign it to the column. It needlessly updates all the fields, and this is not what we always want. Sometimes, we want to update just one property. The type of the properties Magnitude and Direction map to SQL types double and float respectively. This makes them readily updateable in SQL. (See the Microsoft documentation for a full list of CLR to SQL type mappings.)

Fig. 42 - Direct update on a property

And for some unknown purposes, string literals of valid inputs is also possible. Look!

Fig. 43 - String literals as property input

Although, it produces errors with invalid literals like ‘Hello there’, I still find this very unnecessary. In the previous version of this article, I express my hope that this feature be removed from SQL Server 2008. Unfortunately, it's still there.

SQL Server does not allow updating of two columns in one UPDATE statement. Attempting so returns an error:

Fig. 44 - No update for two columns

This must be due to the internal mechanics of the database engine, or some other relational principles, but I could care less. If there is a need to change two or more fields at the same time, it could be an indicator for the presence of a function. This function leverages on encapsulation by hiding the intricacies of updating the fields while making our UDT self-documenting. We can apply this to represent some common operations of a vector. Let’s augment our vector with these functions, starting with scalar multiplication. Scalar multiplication is nothing more than increasing and decreasing the magnitude by a certain factor. We can represent this with a less academic name Scale. If the factor is negative, that means the direction is shifted by 180 degrees or 2PI.

What happened was that SQL Server could not recognize the member as a mutator. In programming parlance, a mutator is a function that alters the state of your object. Alter in the sense that the value of at least one of your fields is changed. In a language that does not have a construct for property, mutator is synonymous to a setter function. Therefore, it makes perfect sense that SQL Server considers a property as mutator, by default. You can change this setting through the IsMutator property of SqlMethodAttribute. SQL-CLR goes even beyond the default because IsMutator seems to have no effect on the property. You should try the code below and see for yourself.

C#:

In a good OOP design, a mutator is usually a public instance function that returns void. Even your property set function is converted to such a function. SQL Server makes sure you adhere to this by restricting mutator calls to UPDATE and SET statements only. If you don’t tag a void function as mutator, you are authorizing SQL Server to call that function in SELECT or PRINT statements. Error results because SELECT and PRINT expect argument values which should have been returned by the UDT function.

In the other hand, a non-mutator or accessor returns a value to the caller. SQL Server restricts accessor calls to SELECT and PRINT statements. It’s perfectly legal to mark an accessor as mutator, but in so doing, you render it useless. SQL Server cannot use the return value because the attribute precludes SELECT and PRINT. You can use UPDATE and SET, but they’re worthless because they do nothing to the return value of the function. The following table summarizes the SQL statement's dependency to the IsMutator property:

Fig. 49 - SQL statement dependency to IsMutator

We add an accessor to return the vector which points to the opposite direction. You may omit the adornment because the default is already false.

C#:

After flagging our Scale function as a mutator, the last SQL statements using it run just fine:

Fig. 51 - Scale function at work

And so does the new accessor:

Fig. 52 - GetReversed function at work

We add the other vector operations of our model as static functions. This gives us the ability to use them in both displaying and updating our UDT. If we opted for an instance implementation instead, it would require us to create an equivalent static function if we only want to display the result. And because SQL-CLR does not support function overloading, we would end up with a function with a different name - not elegant at all! We added a helper class that converts all directions less than 0 or greater than 2PI, because it’s possible that intermediate operators can yield values beyond that range.

We test these operators on two vectors of equal magnitudes: one is 45 degrees while the other is 135. The scope qualifier for the static function is :: which is first shown in Figure 34. Note again that our vector UDT accepts only angles expressed in radians, and that the precision of our answers is dependent on the precision of our inputs.

Fig. 54 - Common vector operations at work

Function overloading and operator overloading are still not part of SQL Server 2008 but they would be a nice addition someday. Operator overloading is a must for mathematical entities like vectors where the overridable CLR operators already have fixed and well-defined semantics. With the current shortcomings of UDT, it’s still impossible to write a 100% SQL-agnostic .NET struct or class. More often than not, UDTs are not mere ports from an existing library. It’s primarily created with SQL Server in mind.

Before we end this section, let me just also point out that SQL Server does not support out and ref parameters. Your codes compile but calling the function in SQL returns this vague message:

Our assignment statements for the last test are not efficient. We initialize the direction with 0, only to change it with another value later on. We did this because we wanted to use the PI() variable, but since the default UDT initialization accepts only string literals, we are forced to assign a dummy value for the direction. We can rectify this by another factory function, this time with the properly typed parameters. It’s just natural to initialize the fields using their own types rather than strings, which is very prone to error. From now on, we’ll be using this function a lot more often:

At this point, we’ve seen all the members of the UDT in action except IsNull. We already know that SQL Server uses this internally to determine if the instance is a SQL-null after deserialization. If this is the case, then using it should be straightforward in SQL, right? Wrong. Take a look at these statements:

Fig. 58 - Unexpected default behavior of IsNull

Examining the properties of our UDT explains why the delete did not happen as expected.

Fig. 59 - Calling properties of SQL-null UDT

This quirk is due to the SQL Server default behavior of bypassing calls to members of a SQL-null UDT. It is a performance-enhancing technique. SQL Server can save processing power since it does not have to deserialize the fields of the UDT anymore. It peeks into the memory just long enough to find out whether or not the UDT is tagged as SQL-null.

In a SQL-null UDT, it’s perfectly logical to get null from the other members except for IsNull. IsNull returning a SQL-null is misleading. An uninitiated SQL developer could use this property as a predicate in a WHERE clause, resulting in a subtle bug. To avoid this, you can tell SQL Server to call a member of a SQL-null UDT by setting the property InvokeIfReceiverIsNull of SqlMethodAttribute to true. This tells SQL Server that if the UDT is SQL-null, it still has to deserialize it so that a call to the member is possible.

Be aware that this incurs unnecessary processing. Microsoft recommends you stick to the existing SQL way of testing nullability using the IS NULL predicate operator. This also gives you the correct result minus the deserialization of UDT fields, thus offering a significant performance boost.

I conducted an informal test on the performance between IsNull and the IS NULL construct on a 10M-row table with 33% SQL-nulls. The script and a sample output are shown below:

Fig. 62 - IsNull vs. IS NULL script and output

There were so many processes in my machine that could affect the execution of the script, so I made 10 iterations in order to arrive to a much conclusive result. Of course, the more iterations, the more conclusive the result would be, but I didn’t have the time and resources to set a clean testing environment. Now you know why I called them “informal” tests.

The ten iterations yielded consistent values of IS NULL being 6 times faster than its UDT counterpart, as shown in the following graphs:

Fig. 63 - IS NULL vs. IsNull results

The adverse effect of IsNull is probably negligible on a small number of rows, but it’s always safe to shy away from it on set-based operations because you never know when and how fast your table grows. For non-set-based operations like a condition in your IF statement, you’re pretty much safe with it.

While SQL-null is still fresh in our minds, let’s tackle another behavior that has something to do with it. This time it’s not the UDT that is SQL-null, but the argument passed to its function. Like the case of IsNull, SQL Server treatment of SQL-null arguments takes both performance and semantics into consideration. This behavior is controlled by the OnNullCall property of the SqlMethodAttribute. Of course, the purpose of this property is fully realized only if the parameters are SQL-nullable.

Leaving OnCallNull to its default value of true tells SQL Server to call the function even if at least one of the arguments is null. This may seem absurd because it implies that performance is not the primary motivation for this property. We’ll find out the rationale behind this before we end this section.

Let’s start our exploration of this new property by adding a mutator function AddVector with OnNullCall set to false:

C#:

If we pass a null argument to this function, we get a vague error message:

Fig. 65 - Error from a mutator supplied with a SQL-null argument

We know that, generally, SQL operations involving a SQL-null return SQL-null. SQL Server assumes that by tagging your function as a mutator, you’re going to use the SQL-null argument to an operation involving your fields. In theory, this in turn can render your UDT SQL-null or your fields in an inconsistent state. This is what the previous message is all about.

SQL Server prevents the unintentional rendering of your UDT to SQL-null by restricting OnNullCall only to true in the case of mutators. Because the values in your fields are vital, SQL Server wants you to take the appropriate actions instead of just rendering your UDT SQL-null right away. You have the options of exiting the function or throwing an exception. If ever you want to render your UDT SQL-null, you may do so by setting the nullability flag field to true. SQL Server imposes a rule that setting a data to null should be done through explicit null assignment, as shown in Figure 5, and not as a mere effect of an operation to which the UDT is an operand. We fix our codes by setting OnNullCall to true, and exit the function if the argument is SQL-null:

Fig. 67 - Accessor returning an instance when a SQL-null argument is supplied

The values of the fields might be valid for a particular operation even if SQL-nullability is ignored. This is the case of our vector UDT where the SQL-null values of zero magnitude is a special kind of vector – the zero vector. The zero vector is also an additive identity which means that adding that to a non-zero vector simply yields the same non-zero vector. The addition did happen, but we are given the impression otherwise, because the result is the same as the non-zero vector operand. You can confirm this by running the same statement, but this time on a function that does not return if the argument is SQL-null. We can go even further by changing the values of the fields in the Null property’s get function.

The previous statements this time will yield a different result, which clearly shows that the addition does happen:

Fig. 69- Default values affecting addition

During development, you must double check all your mutators to see if all the null arguments are explicitly handled. Ignoring them does not return an error, and this might lead you to believing that SQL Server does the job of returning the current instance for you. The operation is actually performed using the SQL-null arguments, and it succeeded because .NET has no concept of SQL-null; the SQL-null object is still a valid .NET object!

Accessor is a different story. SQL Server allows setting of OnNullCall to false on it because your UDT is in no way affected by the operation. And since almost all operations involving SQL-null yield SQL-null, SQL Server can safely return the SQL-null right away. If ever there are special cases, you are still allowed to set OnNullCall to true and handle the SQL-null argument accordingly.

I conducted another 10-iteration test to find out how much performance benefit you can get from setting OnNullCall to false. For this test, I made two variants of our vector UDT, each having an accessor GetScaled with a SQL-nullable parameter. The implementation of GetScaled for the variant that allows calling of this method is shown below:

C#:

Fig. 70 - An accessor version of Scaled which is called on SQL-null arguments

The test simulated insertion to two dummy tables with the SQL-null return values of GetScaled from the two variants. The following shows the script and a sample result:

Fig. 71 - Script for testing OnNullCall

There might be some processes in my machine that caused an unusual spike in one of the iterations, but it is still clear that there is indeed a performance benefit with bypassing the function call. My machine yielded a 21% performance advantage as shown here:

Fig. 72 - OnNullCall test results

The OnNullCall setting precedes other validation checks that SQL Server does to the argument of a function. If the parameter of your accessor is a non-SQL-nullable, SQL Server does not return an error even if you supply a null value to it, because the check is already suppressed by the OnNullCall setting. To illustrate this, let’s create another accessor variant of our Scale, but unlike Figure 70, we retain the non-SQL-nullable parameter which is central for this discussion.

C#:

Fig. 73 - OnNullCall set to true for function with non-SQL-nullable parameter

As I’ve said earlier, setting OnNullCall to true on this function does not make sense because you would never be allowed to supply a SQL-null. SQL-Server does not know how to convert the argument of the function to SQL-null, and you’ll get an error similar to this:

Fig. 74 - Error from calling an accessor with a non-SQL-nullable parameter

Don’t let the message fool you. Converting our parameter to out or ref doesn’t work either as shown in Figure 55. To avoid this error, you set OnNullCall to false. SQL Server never bothers to check if the parameter is SQL-nullable. There’s no point since it will not call the function anymore.

It’s clear at this point that OnNullCall is very much dependent on IsMutator. We’ll end this discussion by another table that summarizes this dependency. The table shows why SQL Server differs when it comes to the default OnNullCall setting, having performance as an important consideration notwithstanding. We can surmise that it’s just a natural consequence of SQL Server adherence to fundamental database concepts and its regard to the safety of your data.

Figure 26 shows how easy it is to make our UDT index-ready through a simple flag in the UDT attribute. It’s not even our intention then, but just a mere side-effect of telling SQL Server how to compare two UDTs. In production however, you will find yourself indexing functions (property-gets are also functions) much more often than UDT, because generally, they are scalar. But unlike scalar outside a UDT, a scalar function is not index-ready by default, as shown here:

Fig. 76 - Scalar function is not index-ready

The message says that SQL Server requires the function should be deterministic. A function is deterministic if there is only one possible output for a particular input. In other words, no two inputs produce the same output. For example, the equation x+1 is deterministic because it produces a unique value for every value of x. On the other hand, x^2 (x squared) is not deterministic because it can produce the same output from two numbers; e.g. +2 and -2. In mathematics, a deterministic equation is actually called a function. Do not confuse this with how we use the term in the context of programming.

SQL Server cannot determine whether or not a function is deterministic. It’s your job to tell it by using the property IsDeterministic of the SqlMethodAttribute. It’s obvious that our candidate property qualifies since it does nothing but expose our field.

To see how our index works, we perform two queries, with the first one leveraging on the clustered index. The execution plan shows that we get a significant performance boost in the first query because SQL Server doesn’t have to perform the sort anymore.

Fig. 79 - Indexed property at work

Before we proceed, let’s take a brief look at the statements for creating an index. The statement says that creating an index from a UDT property entails saving the value for that property on a derived column. This is achieved by the keyword PERSISTED, which also implies that the column is intended to be indexed.

You can still enhance the performance of a property- or function- based index by telling SQL Server that there is no floating point operation involved in yielding the return value. You should assure SQL Server that the result is precise. Unfortunately, none of our properties guarantee exact precision. If magnitude were precise, we could have set the IsPrecise property of the SqlMethodAttribute attribute like this:

C#:

When dealing with floating point values, one would later find out that the ubiquitous equals “=” operator may be rendered unreliable. This is because the chance of two values to be exactly the same is very slim. The number of significant figures can vary from one input to another, and a certain degree of imprecision is introduced in the intermediate computations. Our UDT is very susceptible to this anomaly because all of the fields used in comparison are floating point types. The next set of statements simulates an equality test anomaly between two vectors conceived to be equal but proven otherwise by the difference in input presentation. The second vector accepts an approximate value for the direction, which leads to a very minute yet significant difference.

Fig. 81 - Different input presentation causes equality anomaly

If we have reasonable number of significant figures, we can tolerate a certain difference between the two operands. We still consider them equal if the difference falls within this range. It is usually very small and insignificant to the model. We can call this range value our margin of tolerance. We augment our UDT with another member to cater for this:

You can achieve similar results by rounding off the two operands to a certain decimal place. In this case, the parameter is the number of decimal places to which the values will be rounded to. I like the first approach because it centers on the culprit value. We can easily zero in to that value by displaying the difference of the direction between the two vectors.

Fig. 83 - Didfference due to precision lost

It’s a very minute quantity indeed, and yet very capable of wrecking havoc to your query. This is rectified by the new operator we just added:

Fig. 84 - ApproxEquals at work

It is recommended that you have an approximate equality operator for a byte-ordered UDT. This gives you a flexible workaround for the putative equality anomalies when dealing with floating point values. Some developers choose to implement this in the Equals override to make the UDT a little compact. This approach has two disadvantages. First, it dilutes the true meaning of “equals”. It could be easily interpreted as the absolute equality operator because of its name. Second, it does not give you the flexibility of adjusting the margin of tolerance. It would require code modification just for changing the value of the margin.

At this point, our UDT is complete, and you should be able to take it for the ride. The illustration below shows a rundown of its members. You may verify its functionalities using some of the sites you can get from Googling "vector calculator". Remember that you won't get exactly the same answers because approximation might be involved in those calculators.

Class implementation is a neglected aspect of UDT discussion. Most of the time, developers become dogmatic about the inherent benefits that value types, like struct, have when it comes to performance. We’ll find out if this is a reason enough to shy away from class implementation.

In this section, we also explore the implementation consideration entailed by a reference type such as a class. We’ll tackle them one by one as we create a class version of our struct UDT. To start, you need to copy and paste all the code from the struct UDT into a class named CVector.

One quirk worth mentioning before we go further is that in SQL-CLR, assigning an instance to a variable always creates a new instance regardless of if the UDT is implemented as a class or a struct. This is a departure from our usual understanding of the class implementation where assignment simply points the variable to the instance. It is still recommended that any implementation that is specific to the struct should be changed accordingly to maintain the intended behavior of the UDT outside SQL Server. This means changing our implementation of GetReversed such that it creates a new instance rather than point the new variable to the current instance. Our code does not compile yet because we haven’t defined our constructors.

The next set of modifications has something to do with the way we access our fields. If you look at Figure 9, we directly access our field even though it’s private. A class is strict with access modifiers, and this is no longer permitted. We’ll write constructors to cater for this need.

The private overload of these constructors is used just for setting the SQL-nullability field. It’s private in order to prevent setting of the SQL-nullability outside the SQL where it doesn’t have any meaning. The default overload is added to validate the lines that were originally calls to the constructor of the struct. The public and private operators are used by the Parse and Null functions respectively.

Our class UDT compiles without a glitch, but if you catalogue it, you would receive the following error:

Fig. 88 - Class UDT native serialization error

It’s obvious that this one is another case of a missing attribute. Don’t bother using the good ol’ Intellisense to rummage for the property in the existing attributes, because this one is arcane. If you’ve written considerable interop codes, then you must have encountered the StructLayoutAttribute in the System.Runtime.InteropServices namespace, which is what we need. It is relevant only to a class or struct type that is passed to unmanaged code. This is exactly what happens when SQL Server takes over the binary serialization of our class during persistence. The attribute has a Value property that is of type LayoutKind. This tells the unmanaged code how the members of the class are laid out in the memory. The LayoutKind.Sequential means the fields of our class UDT should be arranged in the memory in the same order we declare them. A struct type has an implicit LayoutKind.Sequential adornment, which is a good reason to prefer it over a class. Heeding the SQL Server request, our code becomes:

Cataloguing should succeed by now, and your class UDT is ready for action. There’s no other better way of doing this than a head to head match up against its struct counterpart, so I conducted another test. This time, I also included the user-defined-formatted class UDT. The test script simulates insertion and deletion of three tables, one for each serialization format of the UDT. The two transactions are contiguous, but I decided to show the results separately for clarity. Shown below is a fragment of the script intended for the table that uses the user-defined-formatted class UDT.

Fig. 90 - UDT insertion sample test script

The results for the insertion test were pretty much what we’ve expected. The difference between the struct and the struct-wannabe class was very insignificant, with the struct getting a meager 1.1% advantage. On the other hand, the manual binary serialization of the user-defined-formatted class led to its 32% lag from the struct-wannabe.

Fig. 91 - Insertion test result

If the insertion was pretty much expected, the deletion was a total surprise. The graph below doesn’t show any trend at all. I was fully aware that some other processes in my machine somehow affected the results, but what I couldn’t figure is how come the adjoining insertions seemed to be impervious. Another big surprise was the user-defined-formatted besting the native struct and class with advantage margins of 4.5% and 8.9%, respectively; not big, but still a point to ponder for an inquisitive mind. I’m not a SQL Server or CLR internal guy, so I can’t offer you a clear explanation for this.

Fig. 92 - Deletion test results

Based on the results, it looks like serialization is the performance clincher, and not the implementation type as probably most of us have assumed. We just saw how a class UDT achieved almost the same performance as that of a native struct by setting its serialization to native. Likewise, a user-defined-formatted struct performance should be no different from a user-defined-formatted class. But, if there’s one thing that should deter you from using a class, it has to be the fact that it involves more code than its struct counterpart.

If your UDT is user-defined, SQL Server has no idea how much space is needed to store it. You must specify the maximum possible byte size in the MaxByteSize property of the SqlUserDefinedTypeAttribute. The maximum amount for SQL Server 2005 is 8000, but this restriction is removed in SQL Server 2008. I'll elaborate on this momentarily.

We illustrate the usage of MaxByteSize by creating a user-defined-formatted version of our struct UDT. We add a property Unit of type string which accepts values “rad” for radian and “deg” for degree. You should also add the implementation of the IBinarySerialize similar to Figure 11, and the necessary modifications to the other members. The code fragments below show the adornment for the new UDT.

The maximum byte size of 17 was arrived by adding all the bytes needed for our fields. We already have 13 bytes, and the additional 4 is the maximum number of characters of the input: 3 bytes for the string plus the 1-byte overhead of string serialization. The size should be increased if you have plans for supporting non-US ASCII characters.

You may add validation routines to the property if you want this to become part of your UDT. Our current implementation simply accepts any literal of not more than 3 characters. Beyond this number, SQL Server would run out of memory for the serialization, as shown below:

Another SqlUserDefinedAttribute property pertinent to user-defined-formatted UDT is IsFixedLength. This flag is a misnomer in my opinion, because it specifies whether or not you want SQL Server to allocate the maximum byte size for all the instances of your UDT. This is optional and defaults to true. If set otherwise, SQL server allocates only the exact bytes needed for the values of the fields for the instance. The decision involved here is usually influenced by the developer’s regard to performance and disk space. A UDT with fixed length is usually faster in queries, while one with variable length is space-efficient. Similar pros and cons apply to UDT as that for scalars, e.g., char vs. varchar, binary vs. varbinary, and others.

SQL-CLR allows you to control the characteristics of return values of your functions and properties thorough the SqlFacetAttribute. This attribute has properties corresponding to the different characteristic inherent to non-blittable types and the types found in the System.Data.SqlTypes namespace. You can find the list of supported data types and the properties they support in the Microsoft documentation. Our UDT does not contain any of the supported data types, but we can still illustrate this by changing the magnitude type to decimal. Be warned that this has an extensive effect in your existing codes. The snippet below shows an adornment of the attribute in a property.

In the code above, Precision is the total number of digits, while Scale is the number of digits to the right of the decimal point. We use the values in Figure 54 to illustrate the effects of the adornment. Note in the statements below, we call the property instead of ToString because SqlFacetAttribute has no effect inside the UDT. If you compare the results below with Figure 54, you can see that SQL Server indeed rounds off the magnitude as specified.

Fig. 96 - Precision and Scale at work

The placement of SqlFacetAttribute is misleading as far as a property is concerned. It suggests that the attribute encompasses both input and output. This can lead you to thinking that it has validation functionalities, when in fact, it only deals with presentation specification. We can confirm this by entering values that do not conform to the specifications of our adornment and see that it will still succeed:

Fig. 97 - SqlFacetAttribute has no validation specification

For me, the SqlFacetAttribute has limited role in UDT. I even avoid using the properties we just illustrated, because they can introduce precision loss if those adorned UDT properties are used in intermediate computations.

The remaining properties of SqlFacetAttribute are information that SQL Server can use when defining a database object based on the member being adorned. This is very rare in UDT, but common among other CLR artifacts like managed procedures, functions, and user defined aggregates. We will not explore them anymore.

With the rare use of UDT and the shipping of Microsoft's own UDT's in the form of spatial data types, it's no surprise that UDT received only one enhancement in SQL Server 2008. As I mentioned a while ago, UDT is no longer limited to 8000 bytes. This might just be an afterthought during the development of the spatial data types which are nothing more than UDT's themselves but it's still better than nothing. When I first apply this feature though, I was left scratching my head. The documentation is misleading. If you want a UDT bigger than the 8000 bytes, you are only allowed the value -1 in SqlUserDefinedAttribute.MaxSize property. Specifying anything greater than 8000 results to cataloguing error.

You’ve probably noticed that we haven’t delved deep into exceptions. Worse, we never even had any exception handling at all! I pointed out that Parse would be teeming with validation, and of course, exception handlers, but I completely left them out for brevity. Besides, a UDT is already in the core of an application, and the best thing you can do in an exception handler is re-throw the exception, accompanied by a friendly message. There might be situations specific to your domain where you should handle an exception in a different manner. As for the other unhandled exceptions, don’t fret; someone, somewhere has to catch them for you. The bottom line is that there are no other considerations when it comes to UDT exception handling. The best practices in .NET programming also apply here.

During the early days of my UDT tinkering, I thought that since enumeration is nothing more than a named numeric type, I would have no problem using it in a UDT. The tests I made then proved nothing could be farther from the truth. In a native format UDT, SQL Server insists that you should adorn the enumeration with LayoutKind.Sequential just like our native class UDT. This is not possible because StructLayoutAttribute cannot be applied to an enumeration. While in the user-defined format, your code will compile, but SQL Server cannot recognize the type. You will get a message similar to this:

Fig. 98 - Vague message from using an enumeration

The idea of having enumeration support for UDT was once attractive to me. Enumeration helps minimize errors through its type-safety feature while enhancing the readability of your SQL. I realized later that this is too much to ask for, considering that it’s actually tantamount to implementing an entirely new SQL-CLR object. If implemented, it could have rendered the lookup (name-value) table obsolete, and caused another barrage of debates; and we’ve had enough of that.

The User Defined Type (UDT) is probably the most significant among the SQL-CLR artifacts. No other artifact has effects on SQL Server as extensive and profound as UDT does. UDT became instrumental to the evolution of SQL Server beyond its relational foundation through the introduction of concepts like “object” and properties. Along with these are new T-SQL features required to interact with a UDT.

Unfortunately, UDT is also the most complex among the SQL-CLR objects. Its implementation is on one language, but its structure, characteristics, and behaviors are defined by another. This dual nature necessitates a slight departure from our conventional understanding of some common concepts in programming; something that can be attributed to the UDT's obscure nature. Adding to the complexity is the ample dosage of declarative programming involved to achieve the intended behaviors of the UDT.

UDT can be implemented using a class or a structure .NET construct. The common conception that value types like structures have performance advantage over reference types when it comes to high volume transactions is inapplicable in UDT. The performance determinant is not the type, but the serialization format used in the UDT. Native serialization is faster than user defined. If there is a good reason to shy away from a class, it's the fact that it's more verbose than a structure implementation.

Perhaps, the greatest benefit of UDT has yet to be realized. By allowing us to store data of virtually any structure, it can pave the way for an entirely new class of applications. Microsoft seems to be heading that way, and it’s just a matter of time before UDT would finally gain prominence.

Share

About the Author

Gio has been writing programs for almost 10 years. Sometime on 2005, he decided to cut his hair and remove his nail polish so he could enter a commercial building in downtown Manila and submit his resume. That bold action eventually led him to the city by the bay in the sunshine state. He really like it there because he's still able to experience hurricanes which makes him very nostalgic albeit disappointed because it doesn't automatically translate to work suspension. If he's not outside watching squirrels play, he explores esoteric topics using .NET and SQL Server. And like squirrels, Gio is a very docile yet hardworking creature. If ever you see faults in his work, be gentle with the criticisms. He greatly appreciates this and rest assured your advice would be embodied in his future works.

Comments and Discussions

Probably the best single article I've found so far on CLR UDTs. Very nice organization and presentation in a way that builds on the prior concepts introduced.

I look forward to the update. In the meantime, can you explain further what you mean when talking about uniqueness/PK constraints on UDT columns? From the article specifically, "Besides being a blatant violation of fundamental relational principles, UDT was not built for this purpose...". What fundamental relational principles are you talking about? At the logical level, a UDT is just a data type like any other, and carries no restrictions on how it may participate in constraint definitions (apart from those implied by the domain and column constraints, such as comparability, etc.)

I can definitely agree with exercising care when using a UDT in a constraint as the current implementation has physical performance penalties (illustrated by your indexing example). But this didn't seem to be the concern your quote addressed.

I'm glad you like the article. And thank you for the encouraging words.

My statement about UK/PK UDT being a blatant violation to fundamental principles is based on the fact the UDT in this article is not a scalar. I already alluded on this in the earlier part when I said I'm going to earn the ire of the purists by choosing such a model. Not being a scalar means the UDT is not atomic anymore - at least on the theoretical sense - and this is already a violation of the 1NF. Using a my UDT as a PK/UK is like having the combination {EmployeeID, HireDate} as PK/UK for Employment table except that they are all cramped in one column. Using a non-scalar UDT for PK/UK is perfectly legal but I believe this is not what SQL UDT is created for in the first place.

Thanks for pointing this out and giving me the opportunity to make this clear. It's my fault not to make the clear distinction between the article's UDT and UDT in general.

Regarding the update, as you can see, it's already stale. The update refers to the second bullet in the history section. I'm currently working on another article about a real application of UDT in temporal database but I'm not sure if it's gonna be Code Project this time. I'm opting for a portal with a more database-centric audience and there are already some that express interest.

I suspected that the data type atomicity was the impetus behind your statement, but I wanted to be sure before I commented further.

Datetime data values can be decomposed into year, month, day, etc. Strings can be decomposed into their individual characters. Integers can be decomposed into their prime factors. But we don't have any qualms about utilizing columns defined over these types in our keys.

To quote Date from 2006; "...the concept of atomicity has no absolute meaning, and 'scalarness' is just the concept of atomicity by another name. Thus, the relational model nowhere formally relies on the scalar versus nonscalar distinction." [original emphasis, p.35]

As long as we have a type (domain) with the operators for manipulation within the algebra, any table with columns defined to hold a single value per row of such types will be, by definition, in 1NF.

Furthermore, Date explains that relations themselves are valid domains over which to define attributes within a relation. These are called relation-valued attributes (RVAs), and I have seen examples where such an attribute is a very good choice for a candidate key on a relation, design-wise. Unfortunately, SQL doesn't support them in a way that we can make use of their power.

So, if you have defined a data type that is distinguishable in the values that variables of that type may take, it is perfectly reasonable to consider it for candidate key use in a relation. Whether or not this is a good design choice, however, is another matter. In the case of the Vector UDT, maybe not. A coordinate UDT (latitude/longitude, e.g.), on the other hand, might make a great candidate key on a table.

Thanks so much for the info TroyK. I really admire the brilliant minds behind the relational model and post-relational model but honestly, I haven't delved deeper into their writings. C.J. Date is simply amazing in taking us beyond the relational model. I read some parts of his book way back in college. Very informative but they can be very dry and sometimes dizzying. Furthermore, it's very discouraging because industry is so slow (or probably un-interested) in implementing the new concepts.

One interesting book that I just read, courtesy of a friend of mine, is "A Developer's Guide to SQL Server 2005" by Beauchemin and Sullivan. I flipped the pages just to validate some of the points I raised here. There is a section in the book which serves as a guide on how to determine if an object (in OOP sense) is a scalar or not. That already presupposes that there are indeed non-scalar UDT's. It points out that SQL DateTime is scalar and so is the example given in the book, the LDim which a length dimension UDT. Its basis is the fact that those objects can be ordered/compared/sorted - Jan 11,2008 comes before Jan 12,2008 and 14 inches is greater than 1 foot. Right now, I don't have any guides on the choice of UDT but I still find that section quite interesting.