Auxiliary Tables, Bit by Bit

Editor's Note: Send your experts-only T-SQL tips to Itzik Ben-Gan at blackbelt@sqlmag.com. If we use your tip in the magazine, you'll recieve $100 and an exclusive T-SQL Black Belt shirt.

Auxiliary tables are extra, "helper" tables you create in your database specifically to help you provide T-SQL solutions. With auxiliary tables, you can write shorter, simpler, and more elegant code. I can best explain the concept of using auxiliary tables through an example. In the public technical newsgroups, I found two interesting examples of scenarios that can benefit from auxiliary tables. I discuss one of the examples (which I found on news://msnews.microsoft.com/microsoft.public.sqlserver.programming) in this article and I will discuss the other example in the next issue. To start, let's review some bitwise arithmetic.

Bit Arithmetic

Usually, when you store a value in a column or variable, that value represents one attribute (e.g., a sales quantity, an order ID). To store several attributes in a table, you need as many columns as you have attributes. However, your table might need to store many attributes, each of which can have only a small number of possible values. Each attribute might be a flag that can have the value on or off.

For example, suppose you want to store a set of permissions for each of an application's users. Each permission is either granted or not granted to each user. If you have 30 possible permissions per user, you could store them in 30 different bit columns. Manipulating and maintaining 30 columns can be a hassle; instead, you can store the 30 permissions in a 4-byte integer or binary column. But you need a way to construct a 4-byte value that represents a series of attributes and a way to extract a single attribute from the value. Bitwise operators come in handy here.

Let's look at two such operators: bitwise OR (|) and bitwise AND (&). Both of these operators accept two arguments and perform bit-by-bit calculations. One argument must be an integer data type—tinyint, smallint, int, bigint (only in SQL Server 2000)—and the other argument must be either an integer or a binary data type. Bitwise OR examines each bit in both arguments. If either argument has the bit in a certain position turned on, bitwise OR turns on the bit in the same position in the result; if neither argument has that bit turned on, bitwise OR turns that bit off in the result.

In the 30-permissions example, suppose you need to generate a value that grants 3 of the 30 possible permissions—SELECT, UPDATE, and DELETE—to a certain user. The first, third, and fourth bits, respectively, represent those permissions. You can achieve your goal by inserting a bitwise OR between the binary values associated with these byte positions: 1 | 4 | 8, where 1 has only the first bit turned on, 4 has only the third bit turned on, and 8 has only the fourth bit turned on. The result—13—has the first, third, and fourth bits turned on.

Bitwise OR also lets you merge several bit strings into one cumulative value. For example, suppose a user belongs to several groups and each group has a certain set of permissions. If you want the cumulative permissions for that user, you can execute a bitwise OR between the user's permissions and the groups' permissions. For example, suppose group1 has SELECT and DELETE permissions (9 = first and fourth bits turned on), group2 has SELECT and UPDATE permissions (5 = first and third bits turned on), and the user has UPDATE and DELETE permissions (12 = third and fourth bits turned on). After the bitwise OR operation, the cumulative permissions are 5 | 9 | 12 = 13, where the first, third, and fourth bits are turned on, representing the SELECT, UPDATE, and DELETE permissions.

Bitwise AND examines each bit in the two arguments. If both bits in the same position in the two arguments are turned on, the & turns on the bit in the same position in the result; if either or both of the arguments' bits are turned off, that bit is turned off in the result. Therefore, you can determine whether a certain bit in a source value is on or off by executing a bitwise AND between that value and another value (let's call it the mask value), in which only the bit you want to check is turned on. If the result is 0, the bit you're checking is turned off; if the result is the mask value, the bit you're checking in the source value is also turned on. In the permissions example, a value of 13 represents a certain user's permissions. To determine whether that user has INSERT permission, perform a bitwise AND between 13 and a second argument that has only the second bit turned on, representing INSERT. You can represent this second argument either as the integer value 2 or as a binary value, which expressed in hexadecimal base in T-SQL is 0x00000002. The result of 13 & 0x00000002 is 0, so you know that the user doesn't have INSERT permission.

Calculating Aggregate Bitwise OR

Now let's move on to an interesting problem presented by Edmund Davis of OnLine Technology. The beauty of Davis's request is in its simplicity. In Davis's own words: "I am looking for a way to do a bitwise OR on a column of integers in a similar way to the aggregate SUM or COUNT functions. I can obviously use '|' as an operator on two arguments, but what about an arbitrary dataset?"

What if you want to execute a bitwise OR between several values that you have stored in a table column? T-SQL provides aggregate functions such as SUM, MIN, and MAX, but it doesn't have an aggregate bitwise OR. To solve this problem, run the script in Listing 1 to create and populate an example Bitvalues table. Note that although the script loads the data into the data_bits column as integer values, each data_bits value represents a set of bits, such as a set of permissions or any other set of flags.

For each group of rows that has the same group_col value, you need to calculate the aggregate bitwise OR of all the values in the data_bits column. To lay the groundwork for the solution, let's first manually calculate the bitwise OR for the data_bits values that belong to group1 (1, 9, and 4). Using the techniques I discussed earlier, the natural way to calculate a bitwise OR is to translate the integer values to their binary representations (i.e., bit strings). The comments in Listing 1 contain the binary representation of the data_bits values. Let's perform the bitwise OR calculation on a bit-by-bit basis. In the example, the result for a bitwise OR operation between the values 1, 9, and 4 is 13 (1101 in binary).

You can now start developing the algorithm to solve the aggregate bitwise OR problem. Translate each on bit to the integer value that the bit represents. You can think of a bitwise OR as a simple sum of all the integers that the on bits represent. In the example, 13 decimal = 1101 binary = 8 + 4 + 1. Note that if a certain bit is on in more than one base value, you consider that bit only once. For example, the first bit is turned on in 1 and 9, but you consider that bit only once in the result. With that in mind, you're close to a solution to the aggregate bitwise OR problem. First, take each value in the data_bits column, and break it into the values that its bits represent. Second, for each group, use the SUM(DISTINCT (expression)) aggregate function to sum all the distinct values participating in that group.

The query in Listing 2 provides a solution to the problem that doesn't use an auxiliary table; Table 1 shows the query results. This query performs a bitwise AND operation between the data_bits column and a series of binary values—each of which has a different bit turned on—to parse each data_bits value into the values that its bits represent. A bitwise AND between two arguments returns a value in which only the bits that were turned on in both arguments are turned on in the result. So, a bitwise AND between data_bits and arg2, where arg2 has one bit turned on, can result in either 0 (if the bit turned on in arg2 is turned off in data_bits), or arg2 (if the bit turned on in arg2 is also on in data_bits).

Let's run the calculations that the query in Listing 2 performs for group1 (data_bits values 1, 9, and 4). First, let's perform a bitwise AND operation between the data_bits values and the possible binary values where each has a different bit turned on. Figure 1 shows the progression.

Note that both 1 and 9 have the first bit turned on. You use DISTINCT because you consider that first bit only once; if you consider it more than once, you get the wrong result when you sum the values. Next, the query performs the following calculation:

For this calculation, 1 + 0 + 4 + 8 = 13, the result has the first, third, and fourth bits turned on and all the other bits turned off. In other words, the result is an aggregate bitwise OR.

To get better performing code, you can use the MAX(expression) aggregate function instead of using SUM(DISTINCT). However, I presented the SUM(DISTINCT) solution here because I'm going to base the auxiliary-table solution on it. Although the query in Listing 2 gives the desired result, it requires a lot of code because it evaluates each bit in the data_bits column separately. If you need to perform an aggregate bitwise OR in several places in your application, you'll end up with large amounts of code. You could encapsulate such a query in a view, but what if you had to perform an aggregate bitwise OR against several different tables? By using an auxiliary table, you can considerably reduce the amount of code you need.

Using Auxiliary Tables

To set up an auxiliary table solution, first break a value into its respective bits. The code in Listing 3 creates an auxiliary table called Bits and populates it with 32 binary values, each of which has a different bit turned on. Then, join the Bitvalues table to the Bits auxiliary table, as in the following query:

SELECT * FROM Bitvalues JOIN Bits ON data_bits & bin_val = bin_val

Note the special JOIN condition. The query performs a bitwise AND between each value in the Bitvalues table's data_bits column and each value in the Bits auxiliary table's bin_val column. Each bit that is turned on in a data_bits value gets a matching bin_val value in which the same bit is turned on. So, a value of 7 in the data_bits column gets three matching rows in the Bits auxiliary table with the bin_val values 0x00000001, 0x00000002, and 0x00000004. Table 2 shows the results.

The second and final step in solving the problem is summing the distinct values for each group. This part is fairly easy after you break up the values in the data_bits column. Just perform a GROUP BY query in which, for each group, you use the SUM(DISTINCT) function to calculate the distinct sum of all the values that were broken up, as the following query shows:

SELECT group_col, SUM(DISTINCT(CAST(bin_val AS int))) AS or_data_col FROM Bitvalues JOIN Bits ON data_bits & bin_val = bin_val GROUP BY group_col

Note that the Bits auxiliary table stores values in binary. To sum those values, you must convert them to integers.

Which Way Is Better?

You can calculate an aggregate bitwise OR with or without an auxiliary table, but which approach is better? I compared the performance of the long SUM(DISTINCT) query, the long MAX query, and the short SUM(DISTINCT) query that uses an auxiliary table. The long SUM(DISTINCT) query performed considerably worse than the other two. The short SUM(DISTINCT) query performed slightly worse than the long MAX query, but it requires much less code.

Experimenting with this solution with the MAX instead of SUM, I found out that the MAX option won't work if your Bitmap array uses all 32 positions of an INT. The INT is Signed and the negative values produced from the 32th bit being 'on' would cause the MAX function to return 0 for the Maximum value instead of -2147483648. (0x80000000)

Doesn't it make more sense to do something like the following (provided one has SQL Server 2000)? The code seems more intuitive, and I suspect it would run faster than the auxillary table example. I'd be interested to find out how it compared to the other suggested solutions in time cost comparisons.
CREATE FUNCTION dbo.bitwiseOR(
@group_col int
) RETURNS int
AS
BEGIN
DECLARE
@iAccum INT
, @iWork INT
SET @iAccum = 0
DECLARE dataBitsCursor CURSOR FOR
SELECT data_bits
FROM Bitvalues
WHERE group_col = @group_col
OPEN dataBitsCursor
FETCH dataBitsCursor INTO @iWork
WHILE 0 = @@fetch_status
BEGIN
SET @iAccum = @iAccum | @iWork
FETCH zFooBit INTO @iWork
END
CLOSE dataBitsCursor
DEALLOCATE dataBitsCursor
RETURN @iAccum
END
GO
SELECT DISTINCT group_col, dbo.bitwiseOR(group_col) FROM Bitvalues
GO
I should mention that I borrowed the variable names iAccum, and iWork from Pat Phelan.

From the Blogs

Many organizations today cannot use public cloud solutions because of security concerns, administrative challenges and functional limitations. However, they still need a centralized platform where end users can conduct self-service analytics in an IT-enabled environment....More

It is crucial to move away from data and analytics stored on individual desktop computers. Today’s solutions must promote holistic, collective intelligence. The strong, continued alliance between Microsoft and Pyramid Analytics helps make all this possible....More

To become a truly data-driven enterprise, many business leaders recognize that they must extend the capabilities of self-service business intelligence (BI) and analytics to more of their business users. Many BI tools tackle part of this need, but they don’t offer a complete enterprise solution....More