Trick Question – Part I

Trick question: How do you connect an application to a database? What's best practice?

It seems simple enough, yet there's so little consensus about how to do this basic thing that we grapple with every day. In my work, I have to poke and prod a wide variety of third-party applications, and I can say that there's fairly little consistency -- and, frankly, a pretty low level of quality -- around this issue, if you go shopping to buy software.

Everyone is basically doing the same "thing": creating an object-oriented application (be it web, web service, windows client, you name it) and using a database engine like SQL Server to store the data. Yet everyone is doing that in a different way, and the various methods are very frequently pain points. The result is that things break, it seems, much more often than necessary.

I think that the root cause is that this problem seems simple, but is in fact deceptively complex. The result of its perceived simplicity is that software engineers "skip over" this part of the design as a self-evident bit of architecture (we'll "just use a database") and only later do the gaps in the design become real problems. They don't have time, or don't make time, to consider things like the object-relational mismatch, or how the business consumers of the software will really use and abuse their creation when it's loose in the world, or the full implications of their selection of an ORM tool over some other data access method.

So, perhaps foolishly, I am wondering if it's actually possible to lay out how to do this well, in today's environment, using less than a book's-worth of text. It might not be feasible, but I want to take a shot at it anyway. Where are we in the industry, really, if it's not possible to outline the best way to connect applications to data in a couple of pages? I am not an expert, and I'm sure I won't get this exactly right, but perhaps this can at least generate some good discussion. It might be that I have no idea what I am talking about. That happens a lot.

First, some prerequisites: I'm going to start with the assumption that we know how to write both SQL and OO code well. The audience, lets say, is skilled but looking for good design ideas, so we won't be in the weeds with issues about how to program. (Advice like "Don't use cursors," for example, is a given.) I also want to take as a given that we know about the advantages and disadvantages of things like stored procs, and not belabor that. Finally, it seems reasonable to assume that we want to pursue some basic goals with our imaginary system that are everyone shares:

We want the combination of database and application to be robust and performant.

The system should be safe from common attacks like SQL Injection, and should handle failures gracefully.

We should be able to make revisions to the system to add features or enhance performance without undue pain and suffering. It should be truly modular.

We are in the real world, making actual apps that some group of people like a business might buy and/or use. Reality counts.

This is just a basic list of goals that it seems like any application architect would want, yet many real-world applications break down around just these fundamental issues.

From a database perspective, most applications look sort of like this:

We’re interested in the best way to structure the yellow and orange bits in the center. We want to negotiate the divide between OO structures and relational structures, while retaining most of the goodness of abstraction, modularity, security, extensibility and so on.

Worthy of note: every real organization I have worked with uses SQL Server or other relational databases as an integration platform, and not just an application “back end.” There are problems and challenges with that, but it’s a basic fact. The idea that the database has some invisible shield around it allowing only one privileged app to connect is a myth, and a destructive one:

Fantasy?

In fact, IT groups in many organizations actively select applications where they are allowed access to the data directly in a standard format and interface like SQL. Almost everyone wants or has a data warehouse and a BI solution, or at least custom reporting. The real landscape, for better or worse, looks like this:

This is imperfect, as many developers have pointed out, because “raw” data in SQL Server tables has very little context around it, and only very basic type information. Often it would be ideal to have more sophisticated integration, such as web services, that could be more in tune with the details of how an application models a business process rather than how a database models it – and I would agree, but many businesses simply don’t have that layer or the means to create it, and they do integrate directly from data.

The problem is aggravated by those applications that don’t make any real attempt to accurately model business process in the database, putting all their energy into the object model in the application, alone, on the flawed assumption that the app is the only access that users or systems will have to the data. It’s one of many symptoms of the object-relational conflict we constantly grapple with.

So, at a minimum, we have to look at the goals we’ve set in the context of a real-world scenario where multiple client applications will also access the database, and the data will probably outlive the application.

In my next post, I’m going to start trying to break down what the key architectural decisions have to be made to arrive at a reasonable plan for a data access layer, and see if I can document two or a handful of real, simple design patterns in a reasonable space. I'm a halfway smart guy, and if I can't do that I think we're all in a heap of trouble.