25

by Joel Spolsky

Thursday, March 25, 2004

Thanks to everyone who came to the open house last night. If you have pictures, send me a link!

We had an interesting conversation about how the impedance mismatch between contemporary high-level programming languages (Java, C#, Python, VB) and relational databases. Since a huge percentage of code requires access to databases, the glue (a.k.a. the connecticazoint) between the RDBMS layer and the application code is very important, yet virtually every modern programming language assumes that RDBMS access is something that can be left to libraries. In other words, language designers never bother to put database integration features into their languages. As a tiny example of this, the syntax for "where" clauses is never identical to the syntax for "if" statements. And don't get me started about data type mismatches: just the fact that columns of any type might be "null" leads to an incompatibility between almost every native data type and the database data types.

The trouble with this is that the libraries (think ADO, DAO, ODBC, JDBC, embedded SQL, and a thousand others) need to be general purpose to be reusable, and yet what you really want is a mapping between a native data structure and a table row or query result row. Inevitably, you have to hand roll this mapping and wire it up manually, which is error prone and frustrating.

I think this is a fatal flaw in language design, akin to the bad decision by the designers of C++ that it was not necessary to support a native string type. "Let a thousand CString/TString/String/string<char> types flourish," they said, and then spent more than a decade adding new features to the language until it was marginally, but not completely, possible to implement a non-awful string class. And now we have a thousand string types (most large C++ bodies of code I've seen use three or four) and a bunch of really good books by Scott Meyers about why your personal hand-rolled string class is inadequate. It's about time that a language designer admitted that RDBMS access is intrinsic to modern application implementation and supported it in a first-class way syntactically.

Now for all the disclaimers to prevent "but what about" emails. (1) in functional languages like lisp the syntax layer is so light that you could probably implement very good RDBMS shims in ways that feel almost native. Especially if you have lazy evaluation of function parameters, it's easy to see how you could build a "where" clause generator that used the same syntax as your "if" predicates. (2) Access Basic, later Access VBA, had a couple of features to make database access slicker, specifically the [exp] syntax and the rs!field syntax, but it's really only 10%. There are probably other niche-languages or languages by RDBMS vendors that do a nice job. (3) Attempts to solve this problem in the past have fallen in two broad groups: the people who want to make the embedded SQL programming languages better (PL/SQL, TSQL, et al), and the people who want to persist objects magically using RDBMS backends (OODBMSes and object persistence libraries). Neither one fully bridges the gap: I don't know of anyone who builds user interfaces in SQL or its derivatives, and the object persistence implementations I've seen never have a particularly good implementation of SELECT.

Have you been wondering about Distributed Version Control? It has been a huge productivity
boon for us, so I wrote Hg Init, a Mercurial tutorial—check it out!