Build a Generic Histogram Generator for SQL Server : Page 4

Histograms help people analyze large amounts of data, whether you display them as tables or as charts. This article shows you how to do both.

by Michael Sorens

Jul 3, 2008

Page 4 of 7

The Summary vs. Detail Histogram The single-row result set typically provides a summary histogram—an analysis of an entire table or entire criterion. A detail histogram, on the other hand, relates two variables—one enumerated in columns, the other enumerated in rows. Each row provides a separate mini-histogram for a single enumerated value of the second variable. This type of histogram is also called a cross tabulation (or crosstab for short).

You may generate a detail histogram for any of the types of histograms specifiable with @Mode. To extend the previous example you need add only one more parameter—@IdExpr—to convert the summary histogram into a detail histogram.

This query addresses the question: Between March and August in the 4-year span 2001 to 2004, how many orders per month did each employee enter? Table 6 shows the result set.

Table 6. Detail Histogram: A detail histogram introduces a row for each value of the EmployeeID field.

EmployeeID

March

April

May

June

July

August

164

33

31

36

37

39

42

198

16

11

16

19

18

17

223

36

33

41

41

43

46

231

30

31

39

35

40

41

233

30

30

38

37

39

41

238

29

30

39

36

39

41

241

33

30

38

35

40

41

244

32

30

39

36

40

40

261

31

31

36

36

40

40

264

33

29

37

36

41

39

266

36

33

42

40

44

45

274

14

14

16

17

15

19

If you compare Table 6 to Table 4, you can see that each column in Table 6 adds up to the single value in the same column in Table 4. So Table 6 shows a breakdown by employee of the histogram in Table 4. In other words, each row of Table 6 is a summary histogram focused on a single employee.

Filtering in Two Dimensions

Figure 2. Filtering Variations: Both column (@BucketFilter) and row (@IdFilter) filters may be applied to a detail histogram. The unfiltered query (1) sheds columns (2) when you filter on a particular status value; sheds rows (3) when filtered by Employee IDs, and sheds both rows and columns (4) when both filters are applied.

Figure 2 provides a visualization of how filtering works. All the result sets shown in the figure come from this base query:

In the figure, @BucketFilter and @IdFilter parameters are added to the base query individually or in combination to change the output from the unfiltered result in frame one to the filtered results in frames two through four. An @IdFilter operates on the domain specified by the @IdExpr. Since the purpose of the @IdExpr is to enumerate values into rows, applying the @IdFilter typically reduces the number of rows (see frame three in Figure 2).

Applying a @BucketFilter, on the other hand, may or may not reduce the number of columns. Theoretically speaking, it serves to constrict the "flow" of values to any or all buckets. Taking constriction to the extreme it may actually eliminate an entire column, as happens in frame two. There are no values in the table for 2001 or 2002 with a status of 2, so those columns do not appear.

You might argue that, yes, filtering by rows should reduce the number of rows, but the column count should be inviolate, independent of column filtering. Why? When counting things, zero is still valid! So why are columns eliminated in Figure 2 and what could you do about it? Run the procedure for Figure 2, frame 2, with verbose set to 3, then review the diagnostic and query output (see Figure 3).

Figure 3. Column Constriction vs. Column Elimination: The @BucketFilter clause applied to the boundary determinants causes columns to be eliminated. Applied to the selection query, on the other hand, @BucketFilter causes columns to be constricted.

Author's Note: For clarity, Figure 3 omits irrelevant lines from the output, shows the remaining output reformatted , and highlights the application of the @BucketFilter parameter.

As Figure 3 clearly shows, the @BucketFilter is applied in two areas, first to the boundary determinants—which may eliminate columns—and then to the selection query, which only constricts columns. Now that you see how columns can get eliminated, you may also have guessed what you could do about it—avoid letting the histogram procedure determine the boundaries. If the procedure is not calculating boundary determinants, it will not be able to eliminate columns. In this case, all you need do is supply @LowerBoundDate and @UpperBoundDate parameters.