Tuesday, March 13, 2007

Is SQL Manual Memory Management?

One of the huge advantages to programming in languages like Ruby, Perl, Lisp, Python, Smalltalk, Java, and JavaScript, among many others, is that you never need to do manual memory management. You don't reference and dereference pointers; you really don't even need any understanding of how memory works internally at all.

This is a stark contrast to the experience of programmers from the C/C++ generation, as well as (obviously) anybody who works with those languages today. I've never learned a thing about pointers in my life, except during a brief exploration of C and C++ out of curiousity, inspired by friends who worked in video games. It's never been necessary knowledge. Garbage collection is so pervasive today that many, many programmers have never written a line of code without it.

One of the most awesome things about Rails is ActiveRecord, and the near-seamless experience of writing all your migrations in Ruby, so that you can easily code tons of different complete, fully-fledged applications without ever even seeing a single line of SQL. This prompts the question, is SQL essentially the same thing as manual memory management?

Anybody who's ever wrestled with SQL -- especially anybody who had to do it by hand in Perl in the mid-90s, before Perl even had its DBI library -- is going to find the idea that auto-generating SQL is the same thing as garbage collection very appealling and intuitive because the obvious implication is that SQL is, in some sense, garbage. The counterargument is that there are still plenty of cases, even in Rails apps, where you need all sorts of direct access to the database, because there are so many advanced database features that Rails doesn't even make the slightest effort to address. But this counterargument just gives credibility to the main idea -- I'm sure that when C reigned supreme, there were a lot of limitations to using garbage-collected languages, and that these languages had disadvantages which forced even their biggest fans to use C for certain things, until the garbage collection got good enough that it was no longer necessary.

This is kind of programming language sci-fi, but maybe one day people will look back on Rails not as the best Web framework ever, but as the first SQL-collected language. Maybe one day the idea that database management should be part of the language will seem as obvious and basic as the idea that programmers shouldn't have to do memory management by hand. Maybe a language which doesn't handle SQL for you is like an operating system with no sockets model. Maybe an object which can't write itself to the database is a primitive thing. (For that matter, maybe an object which can't generate a URL to itself is a backward thing too.)

I'm sure that when C reigned supreme, there were a lot of limitations to using garbage-collected languages, and that these languages had disadvantages which forced even their biggest fans to use C for certain things, until the garbage collection got good enough that it was no longer necessary.

This is still true. For example, you'd be hard pressed to find a bigger fan of Ruby than _why the lucky stiff. However his excellent HTML parsing library for Ruby, Hpricot, drops into C for speed in some places.

Likewise, there are many places where the SQL that ActiveRecord generates is perfectly acceptable, even on par with hand crafted queries, but some tweaking and manual SQL work is probably going to be necessary in performance-sensitive parts of applications for the immediate future.

I think the problem with the RoR model, what I've seen of it, is it presumes that an object represents a set of field values from one table, and that most of the field values should be accessible to the application. This model does not seem to address the idea of many-to-many tables, where the whole purpose of the table is to create a relationship from two other tables. It also doesn't take joins into account, where the idea is to denormalize a set of tables and their field values, so their data can be presented in a single UI table, for example. Joins also present the problem that they tend to be read-only. This makes C/R/U/D impossible. You can read the join in, but to create, update, or delete, each table must be addressed on its own terms. One could argue that you can read in the necessary tables and do the join with objects, but now you've wasted resources, because you probably don't need all of the field values of all of the tables that get read in through ORM, and you've broken ORM encapsulation, because you have to construct a class that represents the join, iterate through the objects that were created through ORM, create the join representation yourself in code, and somehow map that to UI elements. And you still have to somehow update the tables individually, if you want C/R/U/D. Personally I think doing the join at the database level is easier, but with ORM it's a sticky situation.

As with all problems in computing, it seems there needs to be another layer of abstraction added so there is no more literal mapping of objects to tables. What it really means is integrating more database metadata into the object model, so that when it sees more advanced structures coming forth from the database it will have some rules to follow so the data model is not broken.

@Jeff Moore

LINQ takes a different approach than Rails. Rails is a data-driven approach. You define the data model in the database, provide some hints to the ORM, and ActiveRecord takes care of the rest. You don't even write a query. The data is just there in objects at the right time. With LINQ you still write queries, but Microsoft has abstracted the process so it's more efficient. You are no longer creating connection and command objects with connection and DML strings, and dealing with datareaders. With LINQ you write the query in code, and .Net does all that stuff under the covers. What you get back is a collection of objects you can iterate through. What's cool about LINQ is if you want to create your own data model of objects in memory, you can query them as well with it, writing the queries in the code.

I think the problem with the RoR model, what I've seen of it, is it presumes that an object represents a set of field values from one table, and that most of the field values should be accessible to the application....

Active Record, the ORM used in Rails, does address these issues, some better than others.

You really need to get in there and code something tricky with Rails to find out how it handles those many-to-many things. It actually has pretty sophisticated systems for doing it; it's just that they are very different from what you would expect, if you're used to handling it with manual SQL.

uThe thing about the whole 'do the work in SQL/do the work in Ruby' thing is that the computation ends up happening somewhere. If you build an all singing, all dancing system that will do as much of the computation as possible on the SQL side, you're using a bunch of computrons building the query as well as all the computrons that get used inside the the RDBMS.

ActiveRecord's approach is to apply the KISS principle to the mapper and avoid building complex SQL in favour of expressing the computation in Ruby (which is, after all, what Rails programmers probably know). This probably uses more computrons overall, the trade off is that you get to write everything in the same language. (And if you really need to get close to the database metal you can always roll your own SQL and use that.