February 16, 2013

It produces human-readable output. This is immediately useful when creating a Web API-based (SOA) service—you don’t need to have a consuming application in place before you can start testing functionality.

The ability to Self-Host. The Web API can be hosted in a process outside of IIS with very little setup or configuration. This is a fantastic option for services that don’t really lend themselves to running as a web application, for example services with no UI or processes that need to remain resident at all times (timers, messaging handlers, etc.). It’s also an easy way to expose functionality from a legacy application that runs in a Windows Service or console application.

On a recent project, my team was struggling to troubleshoot a problem we were having with a rather complex Windows Service. Worse, the problem we were having was showing up in the test environment, but appeared to be working flawlessly in the development and demo environments. Adding to our misery was the fact that the problem seemed to come up intermittently with no apparent rhyme or reason to when or why it happened. As any developer knows, the worst type of bug is one that you can’t reproduce easily.

The obvious approach would be to add logging around the key areas of the system where the problem would present itself. This was discussed, but because of the complexity of the system, putting logging around the problem would result in a massive number of writes to the log, and a very labor-intensive task to parse all that data.

What we needed was a mechanism to see the real-time, current state of certain parts of the application, and to see it on demand.

Simple Self-Hosted Web API Example

This is a technique that should (probably) be used sparingly, as I wouldn’t recommend exposing the state of all of your Windows Services via the Web API (or any other tool). With the ability to expose state externally, and in a human-readable form, there may come a temptation to allow this to bleed into the design of your application. This is similar in principle to making a private method (or property) public in the name of testability—the structure of your code/objects should reflect the domain you are trying to represent, and shouldn’t be released to production exposing state just for debugging/testing purposes.

With all those disclaimers out of the way, let’s take a look at our console app. I won’t show all the steps for setting up a new solution in Visual Studio, but will focus on the required bits to get self-hosting up and running. The actual problem would be a bit much to explain, and coming up with a valid analogy is difficult, so I’m just going to use a simple example. I’m also going to use a console application instead of a Windows Service, as it works exactly the same way.

For simplicity’s sake, I chose to configure this directly in the console application’s entry point, and keep the server in a static variable. Instead of using MapRoute to set up our routes (as we would in an MVC application), we use MapHttpRoute. We also add the path “api/” before the controller in our route—this helps the consumer differentiate between a call to an MVC controller (that renders HTML) and an API/service call (that returns data). This is, of course, only a convention, and we could make the path anything we choose.
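A minimal sketch of that setup, assuming a hard-coded base address of http://localhost:8080 and the standard Web API self-host assemblies, might look like this:

```csharp
using System;
using System.Web.Http;
using System.Web.Http.SelfHost;

class Program
{
    // Keep the server in a static variable so it lives for the life of the process.
    private static HttpSelfHostServer _server;

    static void Main(string[] args)
    {
        // The base address is hard-coded here; a real service would read it from config.
        var config = new HttpSelfHostConfiguration("http://localhost:8080");

        // MapHttpRoute (not MapRoute) wires up Web API routing; the "api/" prefix
        // distinguishes service calls from MVC controller calls by convention.
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "api/{controller}/{id}",
            defaults: new { id = RouteParameter.Optional });

        _server = new HttpSelfHostServer(config);
        _server.OpenAsync().Wait();

        Console.WriteLine("Listening on http://localhost:8080 — press Enter to exit.");
        Console.ReadLine();

        _server.CloseAsync().Wait();
    }
}
```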

As we create an instance of HttpSelfHostConfiguration, we need to specify the URL where the service will be listening for HTTP requests. The path is hard-coded in this example; in a real service it would likely be loaded via a configuration file or some other means. Reserving a URL for listening to HTTP requests requires specific permissions, so if you are not running Visual Studio as Administrator, you may get the following error when you try to debug:

The sample application is a simple (!) queue manager, where items get queued up for processing via one timer, and then get processed via a separate timer. The issue is that some items seem to get processed much later than expected. Maybe this isn’t an issue in itself, but users (or developers) need insight into when and why this happens so a compensating action can be taken.

This was put together very quickly to demonstrate this technique, and should not be held up as a standard of good code (e.g. I’m pretty sure it’s not thread-safe):
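A condensed sketch of such a queue manager (class, member, and interval choices are assumed here) might look like the following; true to the post’s caveat, it makes no attempt at thread safety:

```csharp
using System;
using System.Collections.Generic;
using System.Timers;

// One timer enqueues items, a second timer dequeues and "processes" them;
// occasionally an item is returned to the queue, delaying its processing.
public class OrderProcessor
{
    private readonly Queue<string> _items = new Queue<string>();
    private readonly Timer _queueTimer = new Timer(1000);    // same interval on both
    private readonly Timer _processTimer = new Timer(1000);  // timers, per the post
    private readonly Random _random = new Random();
    private int _itemNumber;

    public void StartProcessingItems()
    {
        _queueTimer.Elapsed += (sender, e) =>
            _items.Enqueue("Item #" + (++_itemNumber));

        _processTimer.Elapsed += (sender, e) =>
        {
            if (_items.Count == 0) return;
            string item = _items.Dequeue();
            if (_random.Next(10) == 0)
                _items.Enqueue(item); // simulate an item being returned to the queue
            else
                Console.WriteLine("Processed " + item);
        };

        _queueTimer.Start();
        _processTimer.Start();
    }
}
```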

And code to tell it to start processing. This can really be anywhere in Main() before the Console.ReadLine() call:

_orderProcessor.StartProcessingItems();

If we run this as-is, the processing will not be able to keep up with the number of items that are being put in the queue, as the two timers run on the same interval and some items get returned to the queue. You’ll have to use your imagination to envision a more complex case, as clearly a bit of logging code would suffice to help understand what is going on in this simple example.

To expose the state of the queue at the class level, I added the following code to the OrderProcessor:

public List<string> GetQueuedItems()
{
    return _items.ToList();
}

I would again caution that if this breaks the natural encapsulation of your design, you probably just want to use this as a temporary measure while debugging, and remove it before it goes to final QA testing and production.

Finally, to expose this functionality to the outside world via http, I add a very simple controller that calls the above method:
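A sketch of such a controller, assuming the console app exposes its OrderProcessor instance through a static property (the `Program.OrderProcessor` name below is hypothetical), might look like this:

```csharp
using System.Collections.Generic;
using System.Web.Http;

// A minimal read-only controller: GET api/queue returns the current
// contents of the queue as human-readable output.
public class QueueController : ApiController
{
    public List<string> Get()
    {
        return Program.OrderProcessor.GetQueuedItems();
    }
}
```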

If I wait a little longer, I can start seeing the items that are getting processed out of order, which show up randomly:

Because the output of the service is human-readable, I can easily see that Item #39 is currently queued to be processed out of order, and I can see the exact reason why.

Conclusion

With this new functionality, we were easily able to root out the cause of the problem—a race condition that was the result of slow/overburdened hardware in the test environment. Although the system we were troubleshooting was much more complex than the sample shown above, the technique was identical, and involved exposing current state in a few key places. This technique should not take the place of other means of rooting out problems, like logging, code reviews, and normal debugging, but in cases where simply knowing the exact current state of the system in real time can shed light on a problem, I find this approach invaluable.

June 12, 2010

(Update: someone asked me if I could post this in Excel format. It’s linked at the bottom. It took me about 5 minutes compared to ~45 with WPF. Right tool for the job? 🙂 )

(Disclaimer: I am not a tax professional, and you should absolutely do your own due diligence on all the information provided below.)

Kind of an odd thing to be posting on a blog that is supposed to be about .Net and SQL, but since (1) I’ve been obsessed with figuring out the best way of doing it and (2) I actually wrote an app to do a specific calculation to help me figure it out (yes, I’m a nerd), I thought I would share.

There are about a million articles out there explaining the new law, and the special provisions for 2010. There are about another thousand that include calculators showing the value of converting vs. not converting, and the general advantages of the Roth vs. the Traditional IRA. I don’t have a lot to add to those topics, but the basics are:

The Roth IRA is generally a better deal than the Traditional IRA except in very specific cases.

Starting in 2010, conversions of Traditional IRAs to Roth IRAs will be allowed, regardless of income. This will be the case until the law changes again, so 2010 is *not* the only/last chance to take advantage of this.

Any money converted will be considered income, so you will need to pay the taxes out of non-retirement funds in the year of the conversion.

However, a special provision allows you to convert the money in 2010 and have it considered as income in BOTH 2011 and 2012 (50-50 split) to ease the tax burden in any one year.

As I said, I am not an expert on taxes or politics, but my non-professional opinion is that income taxes are very likely to go up in the near future and are not likely to be this low again for a long time. So converting as much as possible as soon as possible seems like a smart financial move. At the same time, the tax burden on converting 100% of your Traditional IRAs could be substantial, and it probably isn’t worth the risk of completely draining your savings to take advantage of current tax rates.

Trick #1: a less known (but not secret) fact: you can re-characterize ("de-convert") a converted Roth IRA if you change your mind for any reason, as long as you do it before you file your taxes for the year of the conversion. If the market tanks after you do the conversion, for example, and your IRA loses a lot of value, you are probably better off re-characterizing and not paying taxes on that money. You can always re-convert the re-characterized IRA at a later time (after 30 days or in the next calendar year, whichever is later).

Trick #2: building on that, I learned very recently that converting and re-characterizing is allowable at the account level. That means that you can choose to convert each and every fund that you have in your Traditional IRAs into a separate Roth IRA. That would give you the flexibility to re-characterize only those funds that lose value, and keep the conversion on those that don’t. One account per fund is probably overkill, but you get the idea–you could split your investments into separate accounts by investment type (foreign, domestic, bond, etc.). After filing your taxes, you can combine your Roth IRAs back into a single account for easier management.

Trick #3: the best trick of all, and the subject of the calculator I put together, is a trick that allows you to convert in 2010 and split your tax burden over THREE years. Unfortunately, you can’t take advantage of this if you’re single. Because each individual may independently elect to pay income taxes on the conversion in 2010 or split the taxes across 2011 and 2012, you and your spouse may strategically elect to each choose a different option, effectively giving you the maximum flexibility to take advantage of the (relatively) low current tax rates.

The application was written using Windows Presentation Foundation (WPF) and Visual Studio 2010. It will only run on Windows platforms and requires .Net 4 (download) to be installed. I threw it together pretty quickly, so I don’t expect that it is completely bug-free. If you see any problems, post them in the comments and I will do what I can to fix them.

Excel (97-2003 format) (will show preview when clicked—you can also download the file to manipulate the values)

Essentially you can change the total conversion amounts for both you and your spouse, and adjust the marginal tax rates you expect to pay in the years 2010, 2011, and 2012 (remember that state taxes should also be considered in this rate). The application displays a grid of your tax burden for each year, and the total, for each of four strategies:

Paying all taxes in 2010

Paying all taxes split across 2011 and 2012

Paying taxes on your conversion in 2010, and splitting the taxes on your spouse’s conversion between 2011 and 2012

Paying taxes on your spouse’s conversion in 2010, and splitting the taxes on your conversion between 2011 and 2012
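The arithmetic behind all four strategies is the same calculation with the amounts assigned differently. A sketch of that calculation (type and member names are my own, not from the original app) might look like this:

```csharp
using System;

// Tax burden per year for one conversion strategy. Remember to fold state
// taxes into each marginal rate.
public class YearlyTaxes
{
    public double Y2010;
    public double Y2011;
    public double Y2012;
    public double Total { get { return Y2010 + Y2011 + Y2012; } }
}

public static class RothConversionCalculator
{
    // paidIn2010: conversion amount taxed entirely as 2010 income.
    // splitAcross2011And2012: conversion amount taxed 50-50 in 2011 and 2012.
    public static YearlyTaxes Strategy(
        double paidIn2010, double splitAcross2011And2012,
        double rate2010, double rate2011, double rate2012)
    {
        return new YearlyTaxes
        {
            Y2010 = paidIn2010 * rate2010,
            Y2011 = splitAcross2011And2012 / 2 * rate2011,
            Y2012 = splitAcross2011And2012 / 2 * rate2012
        };
    }
}
```

The four strategies are then four calls: all-2010 is `Strategy(you + spouse, 0, …)`, the full split is `Strategy(0, you + spouse, …)`, and the two mixed strategies swap which spouse’s conversion goes into each slot.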

December 10, 2009

I was asked recently to write up a basic set of database standards and methodologies we would be using for an upcoming project. I tend to use the same standards on every project that I manage, and I am usually the final authority on determining our approach, so there usually isn’t a lot of discussion around this topic. However, in this case the client wanted some insight into our best practices, so we needed to hash it out and provide it in writing.

As a general matter, I feel strongly that having a set of standards, any standards, is preferable to everyone just doing things in their own way. I would rather have a terrible standard that everyone follows than no standard at all—this goes for all aspects of the code base: the database, application code, JavaScript, CSS, etc. Obviously, having a solid standard that is rooted in pragmatism is preferable to one that is thrown together with little or no thought, or is based on an outdated principle (e.g. scarcity of storage space or Hungarian notation). I believe that conforming to a solid standard is one of the keys to developing maintainable software.

Anyway, here are my thoughts on data modeling standards and conventions. It’s basically just a brain dump at this point, and I am posting it (almost) verbatim from the initial, informal draft, with minor commentary in italics.

Modeling and Design

The logical and physical database modeling will be created up front, and will be built to reflect our best understanding of the data storage/retrieval requirements of the system.

Although the deployed database may be generated using [our ORM’s] schema export tool, the database will be designed and modeled up front, and independently of the application domain model. In other words, we will not design the database to conform with the domain structure, nor design the domain structure to conform with the database—we will develop each independently and use [our ORM tool] to bridge the gap.

This is something I firmly believe: your data model is at least as important as your domain model, and should reflect the absolute best practices for the storing and querying of the system’s data. The data model is not an afterthought or an implementation detail—it is one of the core components of your system.

I may do a full post on this specific point, as I have pretty strong opinions on it. 🙂

Naming Conventions

Every effort will be made to make each identifier in the database as descriptive and expressive as possible.

Unless noted elsewhere, identifiers will not use a type-specific prefix (e.g. tables will not begin with “tbl”, views will not begin with “vw”).

All table, column, and view names will be expressed in lower case letters.

All table, column, and view names will be singular (“customer” instead of “customers”).

Words that make up table, column, and view names will be separated by an underscore (e.g. “customer_status_id”).

Words that make up table, column, and view names will not be abbreviated, concatenated, or acronym-based unless necessary.

Many-to-many table names will include both related table names, when possible.

Where appropriate, column names will be prefixed with their table name (“customer_id” instead of “id”).

I have to admit I don’t have an air-tight reason for this last one. I like PK/FK names to be the same on both sides, but this is not always possible anyway in cases of a self-referencing table, or multiple FKs to another table.

All identifiers will be limited to 30 characters. This rule supersedes the rules regarding expressiveness and abbreviations. [I only include this requirement if there is any chance the system will need to be database platform agnostic.]

Indexes will be prefixed with “IX_”.

Primary Keys will be prefixed with “PK_”.

Foreign Keys will be prefixed with “FK_”.

Unique Constraints will be prefixed with “UC_”.

General Design Principles

The physical representation of the data model will use physical PK/FK relationships, not logical/metadata relationships.

Seems like a ridiculous thing to have to say? As a consultant, you will definitely see legacy systems with no foreign key constraints—there are still developers out there who believe that formally enforcing referential integrity is overkill.

Each logical entity in the database will be represented by a separate table. There will be no concept of a “general entity” table.

I am not a fan of systems that track relationships between logical entities as metadata. To me this is a design smell that typifies developing a domain model (based on OOAD/OOP principles) and trying to retrofit it into a data model. A data model and a class hierarchy are fundamentally different conceptually, and the concerns that each addresses should be considered completely independently of the other. There are ways of doing inheritance in a data model, and they certainly can work, and there are circumstances where they may be necessary, but a “separate table per logical entity” is always my opening position when starting a new project.

Common sense will dictate the up-front creation of basic indexes (e.g. indexes on foreign keys). Performance testing and tuning will determine where other indexes need to be created.

Index creation and tuning may be implementation specific; each client implementation may need a different level of tuning based on their data.

Database normalization (to 3NF) will be adhered to in all aspects of the transactional data model. Normalization principles will only be violated in situations where a performance problem can’t be solved by any other means.

(EDIT: I can’t believe I did this, as it’s such a key topic for me, but I substituted “4NF” both in this post and in the comments when I really meant “3NF”. I fixed it above.)

I originally wrote about three paragraphs about normalization, as I have very strong opinions on this topic. It was mostly ranting about misconceptions about normalization, what it does and doesn’t mean, and (again) how OOAD/OOP principles tend to bleed into database conversations (just because you have a separate address class, doesn’t necessarily mean you need a separate address table!).

Of course, ranting belongs here, not in a client deliverable, so I pared it down to the text shown above.

Where possible, a single-column, database-generated surrogate key will be created for each table in lieu of using a natural or composite key.

Fine-grained security will be handled in the application layer, and where possible a single database login will be used for application access to the database.

—

So that’s my list/rant of standards, or at least what I came up with off the top of my head. If you vehemently disagree with any of the above, feel free to say so in the comments.

December 7, 2009

Had a question today from a developer from another software project within my company:

How do I begin and rollback database transactions from a unit test?

My short answer . . .

You don’t.

. . . led to a pointed follow-up question . . .

So how do I write unit tests for my data access code?

. . . which led to a somewhat lengthy discussion. It’s an interesting topic, and one that I have pretty strong opinions on, so I thought I would write a bit about it here.

Before I start ranting, I want to be clear that what follows is my opinion, and is intended as a discussion of general principles. Every problem is different, and providing the right solution absolutely requires understanding the full context of that problem. I don’t, and never would support the idea that there is only one correct approach. Although I do strongly advocate the use of certain methodologies, technologies and frameworks, I’m all about choosing the right tool for the job. For example, even though I vastly prefer using an ORM tool over inline SQL or stored procedures, there are times when the latter is actually a better choice.

Where Should Database Transaction Code Live?

So the question behind the original question goes something like:

Should I expose the control of database transactions (start/rollback/commit) to logical layers outside of the Data Access Layer (DAL)?

In a word, no. I don’t say this because it’s a difficult thing to do (it’s not), or because it will result in a big change to the Data Access API (it won’t). I say it because exposing fine grained database transaction control outside the DAL dramatically changes the conceptual meaning of the DAL as a logical layer. The DAL is a service/set of services whose sole purpose is to encapsulate all the functionality required to save, delete, and query data to and from the database. Exposing transaction control outside of this layer violates the encapsulation of your DAL in the worst possible way.

Why? Because database transactions are one of the foundations on which any database application is built. Transactions are fundamental in ensuring the integrity and consistency of your data. I don’t think I’m going out on a limb by saying that the data in your application is by far the most important element of the system; it’s likely the whole reason your application was created in the first place.

So although implementing database start/commit/rollback is generally not complex in terms of lines of code, I would highly recommend as a best practice:

Encapsulate database transaction logic into a well-understood class or set of classes that is accessible only from within the Data Access Layer, and not left open to the wide world and any team member (current or future) who decides to tinker with it.
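A sketch of that encapsulation, using a unit-of-work-style wrapper whose name and shape are my own invention, might look like this. The `internal` visibility is the key point: nothing outside the DAL assembly can begin, commit, or roll back a transaction.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Transaction control lives entirely inside the DAL assembly.
// If Commit() is never called, disposal rolls the transaction back.
internal class UnitOfWork : IDisposable
{
    private readonly IDbConnection _connection;
    private readonly IDbTransaction _transaction;
    private bool _committed;

    internal UnitOfWork(string connectionString)
    {
        _connection = new SqlConnection(connectionString);
        _connection.Open();
        _transaction = _connection.BeginTransaction();
    }

    // Commands created here are automatically enlisted in the transaction.
    internal IDbCommand CreateCommand()
    {
        var command = _connection.CreateCommand();
        command.Transaction = _transaction;
        return command;
    }

    internal void Commit()
    {
        _transaction.Commit();
        _committed = true;
    }

    public void Dispose()
    {
        if (!_committed) _transaction.Rollback();
        _connection.Dispose();
    }
}
```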

Then How Do I Create Automated Tests for my Data Access Code?

The question itself contains something of a false premise from my perspective. Before even attempting to answer, I would want to be sure that the following questions have been answered to my satisfaction:

Do you really need to test your DAL discretely/directly? If so, why?

It sounds like a basic question, but you might be surprised by the puzzled looks you get when you ask it. There is a reason why you want to test your DAL, so what is it? What, exactly are you trying to protect against?

I can think of a few answers that would satisfy me that some form of automated, discrete Data Access testing would need to be done:

The DAL is going to be refactored and needs some basic tests to ensure it was done correctly, and a full regression test on the system is not possible for whatever reason.

The system has some incredibly complex edge cases for data that need to be tested, and would be extremely difficult to set up for manual/integration/QA testing. The testing would be limited to these cases.

The software is very data-centric, so our discrete DAL testing essentially is our integration/system testing.

Uncertainty in the parity between the ORM representation of your database and your actual database schema (whether it be LLBLGen-generated code or NHibernate configs).

So assuming it turns out that, yes, you need to create discrete tests against your DAL, the question becomes:

How do I create automated tests for my Data Access code without exposing database transaction control outside of the DAL?

If you’re going to be testing the DAL discretely, it’s a given that you’ll be doing it against known data. If setting up predictable, known data in advance (e.g. via a restore of a reference database) is not an option, that means you will be creating the data from within your testing code, either as part of each discrete test or as a setup method that gets hit at the start of each test run. This data can clutter your dev/test database over time, and become huge and unwieldy. Dealing with this data really becomes the core of the question. I think the options are as follows, in order of preference:

Option 1: let the data build up, and don’t worry about it. It’s a test database, so filling it up with meaningless data should not have any real impact. If the data created by these tests really is problematic (e.g. getting in the way of manual unit testing, demoing, or QA testing), create a separate database specifically for your DAL tests. In the unlikely event that even this separate database gets out of hand over time, you can simply re-create it from scratch periodically.

Option 2: track the data you create and delete it in your teardown method. If you can create the data in your setup method, what’s stopping you from deleting it in your teardown method? This will only work if your Data Access API supports deleting the type of data you are working with; I would not recommend adding delete functionality to your API for the sole purpose of testing.

Option 3: mark the created data in such a way that it can be easily deleted. I have taken this approach in the past—basically when I created data that was to be used in discrete automated tests, I would prefix and/or suffix key areas with a known string of characters, like “***TESTDATA***“. This made it easy to find and purge the data later by deleting it directly via a specially-created stored proc or ad-hoc query. This cleanup code could be run manually when needed, via a scheduled process, or as part of the build.
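A sketch of Option 3 might be a small helper like the following (names and the table/column in the example are illustrative):

```csharp
public static class TestDataHelper
{
    public const string Marker = "***TESTDATA***";

    // Prefix/suffix key string fields so test rows are easy to find later.
    public static string Mark(string value)
    {
        return Marker + value + Marker;
    }

    // Ad-hoc cleanup statement, run manually, on a schedule, or as part
    // of the build. (For a proc, the same predicate goes in the proc body.)
    public static string CleanupSql(string table, string column)
    {
        return string.Format(
            "DELETE FROM {0} WHERE {1} LIKE '%{2}%'", table, column, Marker);
    }
}
```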

Option 4: use your unit testing framework’s Rollback support to wrap your test in a COM+ transaction. This can be tricky to set up, depending on the database you are using and the security environment of your servers. But if you can get past the initial setup and get it working both locally and on your build server, this may actually be the best and easiest option. MbUnit supports this functionality out of the box (MbUnit RollbackAttribute), and I believe NUnit supports it via an add-on (Google "NUnit Rollback"). I’m sure other frameworks support it in one form or another as well.
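With MbUnit’s RollbackAttribute, Option 4 might look something like the sketch below; the `CustomerDal` type and its methods are hypothetical stand-ins for your own Data Access API.

```csharp
using MbUnit.Framework;

[TestFixture]
public class CustomerDalTests
{
    // [Rollback] wraps the test in a COM+ transaction that is rolled back
    // when the test completes, so nothing persists in the test database.
    [Test, Rollback]
    public void SaveCustomer_PersistsCompanyName()
    {
        var dal = new CustomerDal();
        int id = dal.SaveCustomer("Acme Corp");

        Assert.AreEqual("Acme Corp", dal.GetCustomer(id).CompanyName);
    }
}
```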

Conclusion

Use of automated testing should be part of an overall testing/QA strategy. It is a great tool for your tool belt, and one that I recommend and use extensively on my own projects. But it is a means to an end, not an end in itself. You can have 100% coverage of your application code and still have software that doesn’t work, and you can have 0% coverage and have software that is completely bug-free.

Also note that when I say “coverage” here, I mean automated coverage (as measured by a tool such as NCover). Overall test coverage really includes a number of factors, including:

Automated unit testing

Manual unit testing

Integration testing (automated or manual)

QA/System/Regression testing (automated or manual)

Consider the possibility that your DAL is being tested implicitly by some other aspect of your overall testing strategy, and that discrete, automated testing is not necessary.

December 1, 2009

If you’ve read this series up to this point, you pretty much have everything you need to implement dynamic queries in your own application. Just a few notes to wrap up the series and I will get on to other topics.

Creating a Search Service Facade

Depending on your requirements, it may be perfectly reasonable to create a tight coupling between DTO and search implementation. In cases where the return object of the search is strongly typed (i.e. not a dataset/datatable), this actually may be a necessity. However, there are a number of reasons why creating a simpler “catch-all” service interface might be desirable. You may want to actually develop the parameter/DTO implementation and query implementation separately, so your UI developer can write their code before the query has been written. You may simply want to hide the complexity of the Data Access Layer from other layers of the application. Or your search service may actually be part of a “real” service layer (SOA).

In any case, you may want to utilize a simple service facade for search execution that will take in any of your DTO types and determine the correct type of query to perform. The basic idea is that the consumer using the search service only needs knowledge of the parameter/DTO object and the service interface itself. Any coupling between the DTO and the query implementation exists only behind the service facade. So the developer calling the service needs only to set the desired values on the DTO and call SearchService.ExecuteSearch(myDTO).

This looks a little ugly because I didn’t use an interface for my service (or DTO for that matter). A more permanent implementation might use a factory to select the query, and could look something like this (consider this pseudo-code!):
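A sketch of that factory-based facade, with every type name hypothetical and the return type assumed to be a DataTable:

```csharp
using System.Data;

// The consumer knows only the DTO and ISearchService; the mapping from
// DTO type to query implementation hides behind the facade.
public interface ISearchDTO { }

public interface ISearchQuery
{
    DataTable Execute(ISearchDTO searchDTO);
}

public interface IQueryFactory
{
    ISearchQuery CreateQuery(ISearchDTO searchDTO);
}

public interface ISearchService
{
    DataTable ExecuteSearch(ISearchDTO searchDTO);
}

public class SearchService : ISearchService
{
    private readonly IQueryFactory _queryFactory;

    public SearchService(IQueryFactory queryFactory)
    {
        _queryFactory = queryFactory;
    }

    public DataTable ExecuteSearch(ISearchDTO searchDTO)
    {
        // The factory inspects the concrete DTO type and returns the
        // matching query implementation.
        return _queryFactory.CreateQuery(searchDTO).Execute(searchDTO);
    }
}
```

The consumer’s code then reduces to setting values on the DTO and calling `searchService.ExecuteSearch(myDTO)`.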

Conclusion

Since I started using this technique in ~2005, I have implemented it on at least five systems with consistent success. The key to that success? Basing the design of the framework on the specific needs of each project.

In one of the implementations I worked on, virtually every query against the database needed to be filtered by client Id, so it made sense to create an abstract base query class, with an abstract method FilterByClientId which had to be overridden in each derived class.

In another project the requirement to filter on multiple values in the same field (with different comparison operators) came late in the development cycle, and we had to scramble to update the framework to support this.

In a third project, the query specification needed to be persisted to the database, so the DTO was actually a representation of a set of parameter values that could be used repeatedly to formulate the same query (this was for a reporting system).

Because this technique employs OOP concepts like encapsulation and loose coupling, responding to changing requirements is made much easier, and some measure of flexibility is built right in.

I’m not sure of the EXACT topic of my next post yet, but I have done a lot of work with an Inversion of Control (IOC) framework in the last year, and I have come to believe that the benefits of using these frameworks are huge. So my next post(s) will share some code and/or thoughts around utilizing an IOC container.

November 20, 2009

In my last post, we added paging back to our original requirements, leaving only sorting left to implement. So since we are covering that topic in today’s post, we can re-add it, which gets us back to all the functionality we had initially envisioned (with some minor clarifications to the original version):

Show all orders, filtering by an optional list of customers [using Company Name] and an optional list of employees who took the order and within an optional date range. By default, the orders should be sorted by date [Ascending], then customer name [Ascending], but can be sorted by any single or pair [or any number] of columns in ascending or descending order. The results should show a page of x records at a time, where x can be selected by the client. The client should also be able to request and receive a specific page of data by entering the desired page number.

Like paging, sorting is built right into the LLBLGen Dynamic List functionality. So writing the lines of code that actually generate the ORDER BY clause is pretty straightforward. However, figuring out how to fit a dynamically selected sort order into our framework presents a few design problems that we need to address.

Sorting is a PITA

As noted in Part 1, Dynamic SQL at least allows us to avoid one sorting challenge—using (or generating) a SQL CASE statement and casting the result to a common type. But whether you are using static SQL (a stored procedure) or Dynamic SQL (as we are in this article), a common problem remains: providing the consumer the ability to effectively express the intended sort order without exposing the inner workings of the query.

Our query (and any query) returns a fixed set of fields in the result set. The client needs the ability to communicate to our service (via the DTO) the fields that should participate in the sort order, the rank of each field (first sort by fieldX, then by fieldY), and the operator to use for each (ASC or DESC). So how do we express this in the DTO? Remember that while the query service does (and must) have knowledge of the DTO, the DTO has no knowledge of the query service, so there is no guarantee that the DTO’s fields match the fields in the service.

There are a number of ways we can handle this, but none of them are perfect. Most of the time we end up simply managing sort fields by ordinal number. So in our query, if a client wants to order the data by company name ascending, then order date descending, the code that sets values on the DTO will need to indicate that we want “field 3 ascending” then “field 1 descending” (3 and 1 are the ordinal numbers that represent the positions of company name and order date, respectively). This convention needs to be well understood by the developer who is writing the code to add these values to the DTO, as it can get confusing when, for example, a grid in the UI puts fields in a different order than the way they come back in the result set.
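A sketch of how the DTO might carry that ordinal-based convention (type names here are assumptions, not from the series’ earlier listings):

```csharp
using System.Collections.Generic;

public enum SortOperator { Ascending, Descending }

// One entry per field participating in the sort; list position = rank.
public class SortField
{
    public int Ordinal { get; set; }          // position of the field in the result set
    public SortOperator Operator { get; set; }
}

public class OrderSearchDTO
{
    public OrderSearchDTO()
    {
        SortFields = new List<SortField>();
    }

    public List<SortField> SortFields { get; set; }
}
```

So "company name ascending, then order date descending" is expressed by adding `{ Ordinal = 3, Ascending }` followed by `{ Ordinal = 1, Descending }`, and the query service translates each ordinal back to its field when building the ORDER BY clause.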

Wait, Can’t We Just Sort on the Client Side?

Deferring sorting to the client side (e.g. jQuery, DataSet/DataView sorting) may look tempting. And it actually is a viable option for paging, sorting and filtering, if (and only if):

the set of data is small, and

there is little concern about performance and scalability

In my experience, the systems that meet this kind of criteria are few and far between. Note also that you can’t split sorting and paging between the application and the database—if you ask the database for an unsorted 100 rows of data, then sort those rows on the application side, you’re sorting a single arbitrary page rather than the full result set, and you’re likely not going to get the data you were expecting. This may seem obvious, but I’ve seen the mistake made enough times that I wanted to at least mention it.

It is much more efficient to sort and filter in the database for a number of reasons. For one thing, database engines are highly optimized for doing exactly the kind of work that sorting and filtering entail; this is what their underlying code was designed to do. But even barring that—even assuming you could write code that could match the kind of sorting, filtering and paging performance of a mature database engine—it’s still preferable to do this work in the database, for the simple reason that it’s more efficient to limit the amount of data that is transferred from the database to the application server.

So for example, if you have 10,000 rows before filtering, and your query pares that number down to 75, filtering on the client results in the data from all 10,000 rows being passed over the wire (and into your app server’s memory), where filtering on the database side would result in only the filtered 75 rows being moved between database and application. This can make a huge impact on performance and scalability.

Sorting is a PITA 2: Revenge of the RDBMS

One last note about using sorting and paging in conjunction. A query that implements paging can have odd results if the ORDER BY clause does not include a field that represents an empirical sequence in the data; sort order is not guaranteed beyond what is explicitly specified in the ORDER BY clause in most (maybe all) database engines. An example: if you have 100 orders that all occurred on the exact same date, and you ask for the first page of this data sorted by this date, then ask for the second page of data sorted the same way, it is entirely possible that you will get some of the data duplicated across both pages. So depending on the query and the distribution of data that is “sortable,” it can be a good practice to always include a unique field (like a primary key) as the final field in a sort clause if you are implementing paging.
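In SQL terms, the fix is a deterministic tie-breaker. Here’s a sketch against the Northwind schema (paging mechanics omitted; the point is the ORDER BY):

```sql
-- Without OrderID as the final sort field, the 100 same-date rows have no
-- guaranteed relative order, so rows can repeat (or vanish) across pages.
SELECT OrderID, CustomerID, OrderDate
FROM Orders
ORDER BY OrderDate ASC, OrderID ASC  -- the PK makes the ordering total
```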

Have I mentioned that sorting is a PITA?

Our Approach

Now that I’ve explained my views on Life, the Universe, and Sorting in 5000 words or less, let’s get to the ~20 lines of code that need to be added to implement sorting in our framework.

Let’s create an enum to represent sort direction (ascending or descending). LLBLGen Pro actually provides a similar type out of the box, but for the purpose of this article, I don’t want knowledge of the LLBLGen framework exposed above the service layer (to be clear, this is not a restriction that I normally enforce):

public enum SortDirection { Ascending, Descending }

Next we need a class to represent the combination of (ordinal) field and the sort direction:
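The class listing didn’t survive here; a minimal version consistent with how it’s used below might be:

```csharp
public class SortField
{
    public SortField(int fieldOrdinal, SortDirection direction)
    {
        FieldOrdinal = fieldOrdinal;
        Direction = direction;
    }

    // Position of the field in the query's result set
    public int FieldOrdinal { get; private set; }

    public SortDirection Direction { get; private set; }
}
```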

Since we will provide the ability to sort by multiple fields, we need a list of our SortField class on our DTO. Since sorting will likely be supported on every implementation, we’ll add it to the base class. I’m only showing code that changed in the base class here—grab the source code if you want to see it in context:

private readonly List<SortField> _sortFields = new List<SortField>();

public void AddSortField(SortField sort) { _sortFields.Add(sort); }

public List<SortField> SortFields { get { return _sortFields; } }

Our Dynamic List wrapper needs to change in a few ways. We need to create a private method to transform our list of sort fields into a sort expression:
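The method body is omitted above. Assuming the wrapper keeps its result fields in an internal, ordinal-ordered list, the transformation might look roughly like the following. SortExpression, SortClause, and SortOperator are LLBLGen Pro types, and the exact SortClause constructor varies by template group, so treat this as a sketch:

```csharp
private SortExpression BuildSortExpression(IList<SortField> sortFields)
{
    var sortExpression = new SortExpression();
    foreach (SortField sort in sortFields)
    {
        SortOperator op = sort.Direction == SortDirection.Ascending
            ? SortOperator.Ascending
            : SortOperator.Descending;

        // _fields is the wrapper's ordinal-ordered list of result set fields
        sortExpression.Add(new SortClause(_fields[sort.FieldOrdinal], null, op));
    }
    return sortExpression;
}
```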

Finally, we need to implement the default sort order from our requirements if no order was specified, and call the new version of the ExecuteQuery method. Note that we also add the PK to the sort order to ensure that our sequence is empirically maintained between pages of data (see above):
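In outline, the defaulting logic is plain C# against the DTO’s sort list, run just before execution. The specific ordinals and the extended ExecuteQuery overload shown here are assumptions on my part:

```csharp
// Requirements: default to order date, then company name, when the client
// specified nothing; always append the PK so paging stays deterministic.
if (param.SortFields.Count == 0)
{
    param.AddSortField(new SortField(1, SortDirection.Ascending)); // OrderDate
    param.AddSortField(new SortField(3, SortDirection.Ascending)); // CompanyName
}
param.AddSortField(new SortField(0, SortDirection.Ascending));    // OrderId (PK)

return listManager.ExecuteQuery(
    BuildSortExpression(param.SortFields), param.PageNumber, param.PageSize);
```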

That should do it for sorting, and that should be all you need to get up and running with a single implementation of a Dynamic Search Object. In my next post (which I promise will be brief!), I’ll talk a little bit about creating a clean service interface for situations when you have many implementations.

November 17, 2009

Show all orders, filtering by an optional list of customers and an optional list of employees who took the order and within an optional date range.

While this took us a step closer to our original requirements (see Part 1), we are still missing a few features–paging and sorting. We’ll cover paging in this installment, and save sorting for next time. So for now our requirements can be amended to add:

The results should show a page of x records at a time, where x can be selected by the client. The client should also be able to request and receive a specific page of data by entering the desired page number.

Our framework is based on LLBLGen Pro’s powerful dynamic list functionality, which already supports paging, so adding this flexibility is very straightforward. We already have a class that encapsulates our search criteria (the parameter or DTO class), so all we really need to do is modify it with a few additional fields, then use those fields at the appropriate time in our query service.

Violating YAGNI

I’m going to violate YAGNI slightly by creating a base class for our DTO which will store and expose paging criteria. Why? Because I know for a fact that I’m going to be writing more than one of these dynamic searches, and I’d rather not re-create the same functionality in each implementation.

The only change to our DTO/parameter class (not shown) is that it now inherits from a base class:
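Neither listing is reproduced here; a minimal base class, and the one-line change to the DTO, might look like this (names are my own):

```csharp
public abstract class SearchParametersBase
{
    protected SearchParametersBase()
    {
        // Reasonable defaults; a PageSize of 0 could signal "no paging"
        // if the query service chooses to treat it that way.
        PageNumber = 1;
        PageSize = 25;
    }

    public int PageNumber { get; set; }

    public int PageSize { get; set; }
}

// The DTO's only change: inherit the paging members
public class OrderSearchParameters : SearchParametersBase
{
    // existing criteria fields unchanged
}
```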

Our search Service class didn’t need to change much–the call to the execute method now passes the paging information from the DTO object:

return listManager.ExecuteQuery(param.PageNumber, param.PageSize);

I also fixed a bug in the service class that I came across when I implemented paging (glad I decided to make writing tests part of my process!). I found an explanation of why specifying the type is necessary here. The code in question before:

I’m also going to modify the SearchQueryResult class to include the paging information. This is useful in scenarios where the consumer of the result (e.g. the view in MVC) needs this information but has no knowledge of the DTO or its values.

November 16, 2009

In my last post, we created a Dynamic List Wrapper to help facilitate generating dynamic SQL. Our wrapper exposed the PredicateExpression property of LLBLGen’s RelationPredicateBucket class, which is essentially the mechanism by which a developer can add to the WHERE clause of the final query. In this installment, we will create a helper class that will ease transforming our filter criteria into an LLBL Predicate Expression.

How Far Should We Take This?

I have implemented this pattern in a number of systems to date, and the widest range of difference between the implementations is, without question, the level of flexibility/granularity in the filtering system. There are a lot of possible options that can really amp up the power and utility of your framework:

Support for all comparison operators: =, !=, >, <, <=, >=

Support for all string comparisons: Equal, StartsWith, EndsWith, Contains

Support for aggregate filters (HAVING clause)

Support for combining filters with AND or OR (“OrderDate > x AND OrderDate <= y” )

Support for subquery filters (“WHERE CustomerId IN (SELECT CustomerId FROM . . .)”)

At the same time, implementing all these features increases complexity and adds time to unit and system testing. For example, if you implement the ability to filter on multiple values on a string field AND choose the string comparison used for each value, your parameter object will need to express both the value and the operator to use, otherwise the query implementation will have no way to interpret it (you might use a dictionary or more complex structure to accomplish this). This will result in more code paths, and many more cases to test against known data to ensure that your code is working properly.
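For example, a string filter that carries its own operator might be modeled like this instead of as a bare list of strings (all names here are hypothetical):

```csharp
public enum StringComparisonType { Equal, StartsWith, EndsWith, Contains }

public class StringFilterValue
{
    public StringFilterValue(string value, StringComparisonType comparison)
    {
        Value = value;
        Comparison = comparison;
    }

    public string Value { get; private set; }

    public StringComparisonType Comparison { get; private set; }
}

// The DTO would then expose e.g. List<StringFilterValue> CompanyNameFilters,
// and the query implementation would switch on Comparison for each value.
```

Every one of those Comparison branches is another code path to test against known data, which is the complexity cost described above.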

Once again, the overall requirements of your system should dictate how much of this flexibility really needs to be built.

Keeping it Simple (at Least for Now)

We don’t need to get that granular for this article, so we’re going to define a few conventions:

Date comparisons supported will be “greater than or equal” and “less than or equal.”

All other comparisons will use the Equals operator.

Before we get to the code, let’s add back some of the requirements that we pared down in Part 1. We limited our requirements to:

Show all orders, optionally filtered by any combination of: a customer, occurring after start date, occurring before an end date.

Let’s modify that to:

Show all orders, filtering by an optional list of customers and an optional list of employees who took the order and within an optional date range.

Previous code examples have shown the customer filter to use the CustomerId field. I’m going to change that slightly here to demonstrate the string comparison functionality, which really wouldn’t make a lot of sense with an ID field (which would likely use an exact match, as criteria would be picked from a list). I’m going to switch this to filter against the CompanyName field, which a user might more realistically want to search using only part of the name.

The Code

Here is the code for the predicate helper. Since our requirements are pretty simple, there is no reason to break this into smaller classes at this point:
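The helper listing is omitted here. Following the conventions above (dates use >=/<=, strings use LIKE), and using LLBLGen’s overloaded field operators the way the earlier snippets do, a sketch of the shape might be as follows — the % operator is LLBLGen’s LIKE shorthand, and the method names are mine:

```csharp
public static class PredicateHelper
{
    // Adds "field >= start" and/or "field <= end" for whichever bounds were provided
    public static void AddDateRange(IPredicateExpression filter,
        EntityField2 field, DateTime? start, DateTime? end)
    {
        if (start.HasValue) { filter.Add(field >= start.Value); }
        if (end.HasValue) { filter.Add(field <= end.Value); }
    }

    // Adds "(field LIKE '%v1%' OR field LIKE '%v2%' ...)" for a list of values
    public static void AddContainsList(IPredicateExpression filter,
        EntityField2 field, IList<string> values)
    {
        if (values == null || values.Count == 0) { return; }

        var predicate = new PredicateExpression();
        foreach (string value in values)
        {
            predicate.AddWithOr(field % ("%" + value + "%")); // % = LLBLGen LIKE
        }
        filter.Add(predicate);
    }
}
```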

November 12, 2009

In my last post, I demonstrated a concrete implementation of a dynamic query, not very far removed from the basic functionality that ships with LLBLGen Pro. In this installment, we’ll create a convenience wrapper that will generally make creating dynamic lists easier, and further facilitate rapidly moving from a set of requirements for a search to a working implementation.

The wrapper is essentially a stateful class that combines the structures that make up a Dynamic List, and provides a convenient way to put those structures together and then execute the query.

This class will be responsible for:

Allowing fields to be created

Managing the GroupBy collection via field creation

Allowing relations to be added

Allowing filters to be added

Allowing sort expressions to be added (future post)

Add paging (future post)

Managing the state of all of the above structures before query execution

Executing the query and returning a SearchQueryResult

The exact details of the wrapper should be tailored to the overall requirements of your system. For example, if none of your search results require any kind of aggregation, the GroupBy clause functionality isn’t necessary.

Note that defining fields, adding relations, creating aggregates, etc. is still the responsibility of the query implementation. The core of the pattern is to have one query implementation per parameter implementation; there is no getting around building classes that have specific knowledge of the fields and entities involved in the query. Our goal is to make each implementation as simple as possible.

//setup relations. This is roughly the equivalent of the FROM and JOIN statements
listManager.AddRelation(OrdersEntity.Relations.CustomersEntityUsingCustomerId);
listManager.AddRelation(OrdersEntity.Relations.EmployeesEntityUsingEmployeeId);
listManager.AddRelation(OrdersEntity.Relations.OrderDetailsEntityUsingOrderId);

//filter by StartDate if it was provided
if (param.StartDate.HasValue) { listManager.Filter.Add(OrdersFields.OrderDate >= param.StartDate); }

//filter by EndDate if it was provided
if (param.EndDate.HasValue) { listManager.Filter.Add(OrdersFields.OrderDate <= param.EndDate); }

return listManager.ExecuteQuery();
}
}

This is a pretty good start, and is an improvement over what we put together in Part 1. However, we’re still dealing with a pared-down set of requirements which really don’t necessitate a dynamic query. In my next post, we’ll add back some of these requirements, modify our parameter object to support more flexible filtering, and create a helper class to simplify creating those filters.

November 10, 2009

A problem that is common to almost any database application is the need to search the database for specific information, based on a set of criteria provided either by a user/UI or another process. When the requirements are simple, for example:

Show all orders for a specific customer that occurred during a specific period of time.

the problem is easily solvable using a static query/stored procedure:

SELECT * FROM Orders WHERE CustomerID = @customer_id AND OrderDate BETWEEN @start_date AND @end_date

However, the problem becomes much more complex when the requirements begin to include things like:

Multiple criteria for a specific field

Optional criteria

Optional sorting, especially with reversible field sorting (ascending and descending) and the ability to sort by multiple fields

Show all orders, filtering by an optional list of customers and an optional list of employees who took the order and within an optional date range. By default, the orders should be sorted by date, then customer name, but can be sorted by any single or pair of columns in ascending or descending order. The results should show a page of x records at a time, where x can be selected by the client. The client should also be able to request and receive a specific page of data by entering the desired page number.

There ARE ways of accomplishing all of this with static SQL, but the end result is a massively complex query that is not easily maintainable. The sorting alone is a mess, requiring a SQL CASE statement that needs the result of each case to be cast to the same type (see Dynamic ORDER BY Statements in Stored Procedures). Using multiple criteria per field requires passing an array of values as a single parameter, which then needs to be split back into separate values and integrated into the query (see Arrays and Lists in SQL Server 2005 for what this looks like in SQL Server). Optional criteria requires testing the parameter against NULL as part of the WHERE clause, e.g. "WHERE CustomerID = @customer_id OR @customer_id IS NULL."
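To make the sorting mess concrete, here is a sketch of a two-field reversible sort in static SQL. Each CASE branch must resolve to a common type, so the date is converted to NVARCHAR in ISO 8601 form (style 126) so that string order matches date order — exactly the awkwardness the linked article works around:

```sql
SELECT o.OrderID, c.CompanyName, o.OrderDate
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID
ORDER BY
    CASE WHEN @sort_col = 'CompanyName' AND @sort_dir = 'ASC'
         THEN c.CompanyName END ASC,
    CASE WHEN @sort_col = 'OrderDate' AND @sort_dir = 'ASC'
         THEN CONVERT(NVARCHAR(23), o.OrderDate, 126) END ASC,
    CASE WHEN @sort_col = 'CompanyName' AND @sort_dir = 'DESC'
         THEN c.CompanyName END DESC,
    CASE WHEN @sort_col = 'OrderDate' AND @sort_dir = 'DESC'
         THEN CONVERT(NVARCHAR(23), o.OrderDate, 126) END DESC
```

And that is with only two sortable fields and no multi-field ranking; it only gets worse from here.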

Thankfully, with the growing acceptance of Object Relational Mapping (ORM) and the Dynamic Query Engines (DQE) that they provide, creating powerful, flexible queries that support highly complex criteria has become much easier. In this series of articles, I will demonstrate harnessing the power of a DQE to meet the above requirements (and beyond?) in a type-safe and highly maintainable way.

Implementation Details

I will be using LLBLGen Pro (v2.6) as the ORM/DQE. More specifically, I will be using the Dynamic List feature of LLBL, which results in vanilla DataTables/DataSets as a query result. Although DataSets (vanilla or typed) are not a good substitute for real business objects, I find them extremely useful for representing lists of data, particularly when the data includes columns from multiple database entities. A flat list of data is frequently convenient for iteration when data binding (in Web/WinForms) or generating HTML (in MVC).

While this article will focus on creating a set of tools to be used in conjunction with LLBLGen and DataSets, the overall intent is to demonstrate the concept of harnessing any DQE. This same pattern is equally applicable to using full-featured business/domain objects, and should be possible using any mature ORM framework, including NHibernate, Entity Framework, or Linq To SQL. I am choosing LLBLGen as a DQE because it is the one I am most familiar with, and I find its predicate syntax to be very intuitive and expressive.

I will be using SQL Server 2005 as the database engine, and the Northwind database as a target schema. You won’t need a full version of LLBLGen to tinker with the sample code, as I will provide the LLBL-generated code, which is freely distributable.

Solution Overview

The basic concept is really quite simple. Create a class (essentially a DTO) that represents all the possible criteria the client requires. Then create a corresponding class in the Data Access Layer (DAL) that can translate the values from the DTO into a dynamic query.

The client only needs to have knowledge of the DTO, the service, and the type that is returned by the service (Business Object, DataSet, List, etc.). In the case of my example, some knowledge of what will be provided in the DataSet (column names, for example) will be required as well, since DataSet columns are not typed.

That’s really all there is to it. As simple as it sounds, I have found it to be a very effective way to quickly create powerful searches that perform well and are easy to maintain.
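In skeletal form, the two halves of the pattern look something like this (names are illustrative, not from the sample code):

```csharp
// The criteria class (DTO): the client sets only what it cares about
public class OrderSearchParameters
{
    public string CustomerId { get; set; }   // optional
    public DateTime? StartDate { get; set; } // optional
    public DateTime? EndDate { get; set; }   // optional
}

// The DAL-side counterpart translates the DTO into a dynamic query
public interface IOrderSearchService
{
    DataTable Search(OrderSearchParameters criteria);
}
```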

A Super-Basic Example

Before getting too deep into creating abstractions, wrappers, and helper classes for our framework, let’s start with a concrete example of how this is going to work. Let’s temporarily pare the requirements down to:

Show all orders, optionally filtered by any combination of: a customer, occurring after start date, occurring before an end date.

This is intended as an illustration only—if the requirements were really this simple, a more straightforward approach, like a single method that takes in three parameters, would probably be a more appropriate solution.

//setup relations. This is roughly the equivalent of the FROM and JOIN statements
var bucket = new RelationPredicateBucket();
bucket.Relations.Add(OrdersEntity.Relations.CustomersEntityUsingCustomerId);
bucket.Relations.Add(OrdersEntity.Relations.EmployeesEntityUsingEmployeeId);
bucket.Relations.Add(OrdersEntity.Relations.OrderDetailsEntityUsingOrderId);

I know what you’re thinking: that looks like a lot of code to do something relatively simple. But remember that:

Our “real” requirements are going to add much more complexity, and

we’re going to create a more refined toolset to make much of this code reusable.

In my next post, I will create the first draft of our “Dynamic List Wrapper”, which will make creating fields, relations, and aggregations easier, and will centralize the code that actually makes the calls to the database.