LINQ to SQL: Objects all the way down

There are a lot of different opinions on just what LINQ to SQL is or is not. Some of them are actually based on investigation by people who have kicked the tires and taken a test drive. Being its architect, I realize that I am the foremost expert on what’s under the hood, yet that alone does not give me the omniscience to know how the product measures up to everyone’s expectations. What it does give me is the ability to share with you the impetus behind the design and hope that gives you enough insight to make up your own mind.

It’s probably no secret to anyone that’s been following along that the high-order bit with LINQ to SQL has always been LINQ; though it might not be so obvious just how deep that truth goes. LINQ to SQL is not just another LINQ provider. It has been from the beginning and still is the quintessential LINQ provider, shaping and being shaped by the same process that designed LINQ. So to understand the impetus behind LINQ to SQL you need to understand the impetus behind LINQ.

LINQ is more than just new query syntax added to existing programming languages. LINQ is the recognition that objects (in our case CLR objects) are the core of everything you as a programmer do. In the past, you had multiple disciplines to master, including databases, transforms and other scripts. Yet at the center of all your work was the mainstream programming language and runtime that you did everything else with. Instead of using objects to naturally represent this diverse work, you used coarse-grained APIs; instead of built-in language constructs supported by your development tools, you used domain-specific languages wedged into the system as unrecognizable text. Using these APIs was adequate but not ideal. You got your work done, but it was like poking at your data through keyholes, and when some of your data did get sucked back through it was basically dead on arrival, since none of your tools for manipulating it carried forward across the divide. Believe me, I know. I built many of these APIs.

So for LINQ the high-order bit was not only making it possible to represent your domain information as objects but making the ability to manipulate those objects in domain-specific ways first class. LINQ to SQL was the poster child. The important area of manipulation was the query, and since query was pervasive throughout most other heavily used domains, it was obvious that query would need to become first class in the language and runtime. LINQ to SQL’s primary job became the representation of relational database data as objects and the translation of language-integrated queries into remotely executing database queries.

Of course, since this coincides with the territory of ORM systems, it should come as no surprise that LINQ to SQL has taken on that role as well, enabling a degree of mapping between the objects you use in the language and the shape of data in the relational database. We took from the experience of customers the most valued features of such systems and laid out a roadmap for delivering those features, yet as with any shipping product, reality eventually crept in, so priorities were set and, unfortunately, a lot that we would have loved to do did not make the cut for the first version. But this is no apology. LINQ to SQL has amassed a set of features that will be compelling for a large part of the market, and over time it will only get better.

The truly interesting thing to understand about LINQ to SQL is just how deep the rabbit hole goes.

One of our primary tenets from the get-go was to enable plain-old-CLR-object (POCO) development. We received enough feedback from earlier prototypes of ObjectSpaces to know that customers really cared about this, and what specifically about it mattered most to them. And yet while we found reason to offer specialized classes such as EntityRef and EntitySet, we never strayed from the objective, since use of these classes has always been optional. You certainly can build an object model with plain old object references and collections such as lists or arrays; you just don’t get deferred loading or bi-directional pointer fix-up. And although some would have preferred that we invent a cheap interception solution allowing these behaviors without needing specialized classes, no such solution was on the horizon, and the use of these types could easily be disguised behind properties.

It’s also worth pointing out that these specialized classes don’t actually cause you to mingle data access logic with your business objects. EntityRef and EntitySet don’t actually tie back to the database or LINQ to SQL infrastructure at all. They were designed to be completely free of any such entanglements, having no references back to contexts, connections or other abstractions. Deferred loading is based on assigning each a simple IEnumerable source, such as a LINQ query. That’s it. You are free to reuse them for whatever diabolical purpose you can imagine.
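To make the shape of that concrete, here is a hypothetical cut-down analogue (DeferredSet&lt;T&gt; is an invented name; the real EntitySet&lt;T&gt; lives in System.Data.Linq and has a richer API) showing how "deferred loading from an IEnumerable source" can work with no database entanglement at all:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Invented stand-in for EntitySet<T>: it knows nothing about databases or
// contexts, only about the IEnumerable<T> it was handed, which it
// materializes the first time anyone touches the contents.
public class DeferredSet<T> : IEnumerable<T>
{
    private IEnumerable<T> source;  // e.g. a LINQ query, not yet executed
    private List<T> entities;       // populated on first enumeration

    public void SetSource(IEnumerable<T> src) => source = src;

    public bool IsLoaded => entities != null;

    private void Load()
    {
        if (entities == null)
            entities = source != null ? source.ToList() : new List<T>();
    }

    public void Add(T item) { Load(); entities.Add(item); }

    public IEnumerator<T> GetEnumerator() { Load(); return entities.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Assigning a query as the source is all the "deferred loading" there is; nothing executes until the set is first enumerated.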

But it does not stop there. The use of objects, your objects specifically, and the CLR is pervasive throughout the design of LINQ to SQL. You see it in the way that behavior overrides for Insert, Update and Delete are enabled. Instead of offering a mapping-based solution to connect these submit actions to database stored procedures, the solution is designed to take advantage of objects and the runtime: the ability to add code to a system by defining a method. You define an InsertCustomer method as part of your DataContext and the LINQ to SQL runtime calls it, letting you override how an insert is performed. You can do anything you want in this method: logging, tracing, executing any SQL you prefer, or no SQL at all. Of course, all this is wired up for you when you use the designer to connect a submit action to a stored procedure. But the beauty of it lies in the simplicity of using the runtime and basic extensibility mechanism of the language to enable any custom solution you require.
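The dispatch being described can be sketched with invented stand-ins (ToyContext and its reflection lookup are not the shipping API; the method names InsertCustomer and DynamicInsert mirror the ones mentioned in this post):

```csharp
using System;
using System.Reflection;

// Toy illustration of "define a method and the runtime calls it":
// if the derived context declares Insert<TypeName>, that override wins;
// otherwise the context falls back to its default dynamic-SQL path.
public class ToyContext
{
    public string LastAction; // records what happened, for illustration only

    public void Submit(object entity)
    {
        var m = GetType().GetMethod("Insert" + entity.GetType().Name,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        if (m != null)
            m.Invoke(this, new[] { entity });  // user-defined override
        else
            DynamicInsert(entity);             // default behavior
    }

    protected void DynamicInsert(object entity) =>
        LastAction = "dynamic SQL insert of " + entity.GetType().Name;
}

public class Customer { public string Name; }

public class NorthwindToyContext : ToyContext
{
    // Anything goes here: logging, your own SQL, or calling DynamicInsert.
    void InsertCustomer(Customer c) => LastAction = "custom insert of " + c.Name;
}
```

In the shipping bits the designer wires stored-procedure calls through this same override point, so "no override", "mapped stored procedure" and "fully custom code" are all the one mechanism.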

You see it in the way that mapping can be defined using CLR custom attributes. Of course, an external mapping file variation is also available, but the attribute model was paramount. Some will argue that using attributes in the code breaks from pure POCO design. That might be true. However, it’s precisely the ability to declare mapping inline with the definition of your objects that makes LINQ to SQL simple to use and easy to get started because you always stay focused on your objects and your code.
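In spirit the attribute model looks like the following sketch; the attributes below are self-contained stand-ins for the real TableAttribute and ColumnAttribute in System.Data.Linq.Mapping, paired with a toy reader to show that the mapping is nothing more than runtime metadata on your own types:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-ins for the System.Data.Linq.Mapping attributes (simplified).
[AttributeUsage(AttributeTargets.Class)]
public class TableAttribute : Attribute { public string Name; }

[AttributeUsage(AttributeTargets.Property)]
public class ColumnAttribute : Attribute
{
    public string Name;
    public bool IsPrimaryKey;
}

// The mapping lives inline with the class definition.
[Table(Name = "dbo.Customers")]
public class Customer
{
    [Column(IsPrimaryKey = true)] public string CustomerID { get; set; }
    [Column(Name = "CompanyName")] public string Company { get; set; }
}

public static class Mapping
{
    // Reads the mapping straight off the CLR type -- no external file needed.
    public static string TableFor(Type t) =>
        t.GetCustomAttribute<TableAttribute>()?.Name ?? t.Name;

    public static string[] ColumnsFor(Type t) =>
        t.GetProperties()
         .Where(p => p.GetCustomAttribute<ColumnAttribute>() != null)
         .Select(p => p.GetCustomAttribute<ColumnAttribute>().Name ?? p.Name)
         .ToArray();
}
```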

You also see it in how LINQ to SQL operates internally or communicates with its provider. Queries are LINQ expression trees handed all the way down to the query translation pipeline. It’s your object types and runtime metadata that are reasoned about directly, constructed, compared and navigated. Even stored procedure calls are understood as references to the actual signature of the method defined on the DataContext, and the results of ad hoc queries (projections) are never some general object with awkward accessors like DataReader; they are always your objects, or objects defined implicitly through anonymous type constructors, and are interacted with through strongly typed fields and properties.
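The projection point is easy to demonstrate with LINQ to Objects alone; this sketch (sample data invented) shows that the result of an ad hoc projection is a strongly typed anonymous type, never a reader you index by name with a cast:

```csharp
using System;
using System.Linq;

public static class ProjectionDemo
{
    public static string Run()
    {
        // Invented in-memory sample data standing in for a Customers table.
        var customers = new[]
        {
            new { CustomerID = "ALFKI", City = "Berlin", Orders = 6 },
            new { CustomerID = "ANATR", City = "México", Orders = 4 },
        };

        // The projection is an anonymous type: its members are strongly
        // typed, so 'c.Orders * 2' type-checks at compile time -- there is
        // no reader["Orders"]-style accessor anywhere.
        var q = from c in customers
                where c.Orders > 5
                select new { c.CustomerID, Doubled = c.Orders * 2 };

        var first = q.Single();
        return first.CustomerID + ":" + first.Doubled;
    }
}
```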

Looking back, a lot of this just seems obvious now, but believe me, none of this was readily apparent at the time we started the project. For example, we designed ObjectSpaces with none of this in mind. Before LINQ not much of it was even possible to consider. Yet when it came time to build LINQ to SQL, tradeoffs in design were resolved by keeping true to your objects and the simplicity gained by using the built-in mechanisms of the runtime and language to manipulate them.

LINQ allowed us to finish the puzzle that was started when database access was first mashed together with object oriented languages. LINQ to SQL became the embodiment of this object-to-database solution, focusing the design on query, domain objects and the integration of both into the language and runtime.

Of course, there were many other design goals as well; simplicity, ease of use and performance led to many interesting consequences that are equally deserving of their own post. I suppose I ought to write about them too. 🙂

A great essay on LINQ to SQL that’s unencumbered with marketing hype and PR BS. There are other features that many of us would like to see implemented in v1 (a more flexible inheritance model, RDBMS agnosticism, etc.).

"Yet at the center of all your work was your mainstream programming language and runtime that you did everything else with, and instead of using objects to naturally represent this diverse work you used coarse grained API’s, instead of built-in language constructs supported by your development tools you used domain specific languages wedged into the system as unrecognizable text. "

Well, perhaps in NHibernate’s HQL, but a lot of o/r mappers out there don’t use string-based queries, for example, and our o/r mapper even uses objects as meta-data elements you can use to build your projections/queries/filters (what have you) in code.

So it’s not that LINQ brings something new to the table; it brings the same in a different format. The sad thing, however, is that it is baked into the C# and VB.NET syntax. This is precisely what I find really odd. Why didn’t you implement DSL awareness in C# and VB.NET, so that the context switching between the languages (C# vs. DSL and back) is done for you and the two can refer to each other’s elements?

The advantage of that system would have been that you could have added LINQ to C# in the form you have now, but at the same time any other language could be added to C# as well, leveraging the same system. Now the elements added are mainly there to provide LINQ support, and in such a form that LINQ is possible but not much else. The main example of that is the presence of extension methods but the absence of extension properties (for example).

There’s a golden rule in DSL land: you will have to refactor your DSL, no matter what. The thing is: if you embed the DSL inside a language like C#, refactoring it will be painful, as you can only add things; otherwise migrating code will be hell.

Also, I find your collective focus on POCO a little odd. The thing is: you didn’t design it into the framework at all. If you had, you would have implemented some form of dynamic proxy or post-compile IL manipulation. If you don’t, the system will be pretty dumbed down: every feature you actually want will have to be written by hand into the domain classes (and why should a user do that? s/he uses a tool to get RID of that dumb, repetitive code!), and pk/fk syncing, collection syncing etc. is neither trivial code nor a small blurb. OR it has to be done by the central session/context object (change tracking), but as there is no helper code inside the domain object, this will be terribly slow compared to solutions which do offer the helper code (either through dynamic proxies or post-compile IL manipulation).

As a developer of an o/r mapper which doesn’t use POCO, I know where the POCO people come from and I do understand why they want POCO classes in the beginning. The thing is though: why would they pick LINQ to SQL over the other POCO offerings, if LINQ to SQL has no advantage for them in the POCO field, simply because it will bring them only MORE work and less performance?

I then can only conclude that it’s a little odd POCO is suddenly a major design goal and a USP. ;).

Btw, I’m pretty convinced that most developers who want to use o/r mapping aren’t interested in POCO at all. POCO has its place, but one also has to understand that the reason it exists is mainly EJB CMP (hence POJO), and not something else. Having POCO classes which automagically become persistable has a price, and as soon as a developer understands what the price is, added to the realization that in .NET land things like INotifyPropertyChanged on the entity, IBindingList/ITypedList/IListSource implementing collections, pk-fk syncing at runtime etc. etc. are must have elements of your domain/collection classes, POCO isn’t that important anymore, because why type all the code yourself if you get it for free from a base class? Isn’t that what OO is all about?

Thanks for yet another interesting posting on the LINQ to SQL journey. It’s refreshing to read about the thoughts, tradeoffs and reasons why LINQ to SQL became what it is today (as a fellow architect, I share your pain – great API design is an art that takes a long time to master).

I am very curious about the performance of LINQ to SQL in beta 2 – hoping that the initialization of DataContexts as well as entity materialization will improve.

As mentioned at the end of your essay, a dedicated essay on performance within LINQ to SQL would be great. Again, thoughts shared on your design process may help others make smarter choices.

With the anticipation of the "mini connectionless datacontext" (an ObjectTracker?) as well as the announced improvements arriving in beta 2, it’s clear that the elegance of LINQ to SQL is about to improve.

"But the beauty of it lies in the simplicity of using the runtime and basic extensibility mechanism of the language to enable any custom solution you require."

– synonymous to –

"We really don’t do much as far as updates go, so we just provide you hooks where things get complex."

But isn’t that complex mapping issue the real problem here anyway?

I don’t think LINQ to SQL is useless; it is certainly awesome. But I do feel that querying/selecting is the easier problem, and persisting and translating during persistence is the tougher problem. LINQ to SQL handles the simpler scenarios very well, but it basically throws its hands up in the air when complex mapping issues arise.

So, that tougher problem is not handled very well in LINQ to SQL, and that is where 90% of the pain lies anyway.

Nice essay nonetheless, but I really, really want to see more hardcore evidence, rather than the sickening waft of "LINQ is a paradigm shift", "brings querying into C#/VB", etc. etc. Heard that a million times already, and it is time to hear stuff beyond marketing and get into actual applications.

How about – write a real application using LINQ to SQL? Like a starter kit of some sort? And let the community kick the tires of that solution?

Sahil, I’ll answer your first question. Updates are actually quite sophisticated. The part I was mentioning, however, was the ability to override the update function in order to inject your own special-case processing. If you don’t override anything, LINQ to SQL will go ahead and build and execute dynamic SQL. You can even override the behavior, do your own thing, AND use dynamic updates by calling one of the DynamicInsert, DynamicUpdate or DynamicDelete methods. Then I mentioned that the ‘using stored procedures for your updates’ feature is implemented using this same overriding mechanism. So you can override nothing and get default behavior, map to stored procedures in the designer or DBML file, or override the behavior entirely and do your own thing.

The point I was trying to bring out was that when it came to designing a feature that let you control how updates were executed, we decided that ‘your objects, your code’ should prevail, so instead of first inventing a complex XML file language for describing update mapping, we first devised a system by which you use actual CLR code.

I understand that you do not seem to value Persistence Ignorance (POCO is a bad term), but those working in a Test-Driven or Domain-Centric approach do rate it very highly.

"in .NET land things like INotifyPropertyChanged on the entity, IBindingList/ITypedList/IListSource implementing collections, pk-fk syncing at runtime etc. etc. are must have elements of your domain/collection classes"

Really, I’ve never needed them and I have delivered a lot of systems without them.

Ian, I sort of agree with Frans on the bit about bi-directional pointers. I’m guessing that a pure P.I. approach that does not model bi-directionality will tend to have subtle problems keeping the object model in sync with the database. However, I’m not in favor of a ‘magic’ solution that keeps your pointers correct without needing any user code or special classes either. Having your objects explicitly implement this behavior is actually ideal, since it is your objects that enable this behavior, not some side effect of the persistence system. I think there is some expectation by people that POCO means "I write nothing but a class that declares simple properties plus a few business methods and everything else just magically happens so I don’t have to think about it." Yet what it really means is that your class is not littered with actual data access logic. Behaviors like bi-directionality are not data access logic. They are, however, not built-in behaviors of the runtime either, so you’re going to have to opt into them manually by doing ‘something.’

So now we’re only dickering over whether you have to write a few lines of code or have the behavior packaged into a custom class. We chose the ‘few lines of code’ approach to represent the bi-directionality behavior, to keep the size of EntityRef down so we could maintain it as a value type and not require the use of reflection to make it work.
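Those "few lines of code" look roughly like the following hand-written sketch (an illustration, not the generated LINQ to SQL code, which routes the same fix-up through EntitySet&lt;T&gt;'s attach/detach delegates):

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public readonly List<Order> Orders = new List<Order>();
}

public class Order
{
    private Customer customer;

    // The bi-directionality behavior lives in your own setter: when the
    // reference changes, patch the collections on both sides so the
    // object graph stays consistent.
    public Customer Customer
    {
        get => customer;
        set
        {
            if (customer == value) return;
            customer?.Orders.Remove(this);   // detach from the old parent
            customer = value;
            if (value != null && !value.Orders.Contains(this))
                value.Orders.Add(this);      // attach to the new parent
        }
    }
}
```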

I have said before that I’m not that concerned about needing EntityRef and EntitySet for collection types – there is always a practicality trade off and I agree with the notion of explicit purchase. And agreed the issue is less about persistence and more about bi-directionality.

In fact I think this argument drifts back to the old one of services vs. frameworks, in that a service is something I can purchase – I call it – whereas a framework is something that I inhabit – it calls me. The former tend to be much more flexible than the latter. So I’m all for services that are opt-in, but less for regulation.

My objection is more to the argument that devs don’t really want PI because they have to implement a lot of infrastructure code in the .NET world anyway. That just seems defeatist and in fact I know a lot of devs who avoid framework features like data-binding in many cases.

Ian, if you don’t use dynamic-proxy-based o/r mappers, and you do want to have your classes behave in databinding scenarios, how would you do that without INotifyPropertyChanged and, for example, ITypedList? Or do you define your collections as BindingList&lt;T&gt;?

Sure, if you say "I don’t use databinding", then you don’t need databinding-enabled code, but if you DO, you won’t get away with a simple class, as there’s no code there which signals to bound controls that something has changed, etc.
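For concreteness, the change-notification plumbing being debated here is the standard INotifyPropertyChanged pattern from System.ComponentModel, sketched for a single property (class and property names invented):

```csharp
using System;
using System.ComponentModel;

// The minimal plumbing a bound control relies on to learn that a value
// changed: without this (or a base class / proxy providing it), a plain
// class gives the binding framework nothing to listen to.
public class BindableCustomer : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string companyName;
    public string CompanyName
    {
        get => companyName;
        set
        {
            if (companyName == value) return;
            companyName = value;
            PropertyChanged?.Invoke(this,
                new PropertyChangedEventArgs(nameof(CompanyName)));
        }
    }
}
```

Multiply this by every property on every entity and the "who writes the repetitive code" argument in this thread becomes clear.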

I simply don’t buy this statement:

"My objection is more to the argument that devs don’t really want PI because they have to implement a lot of infrastructure code in the .NET world anyway. That just seems defeatist and in fact I know a lot of devs who avoid framework features like data-binding in many cases."

If you don’t use databinding, you have to do some other kind of MVC approach to make data show up in a form and have the data be transported back from the controls to the objects.

All great, but you have to write that code manually as well. That’s infrastructure plumbing code, Ian. 🙂 Sure, you don’t need the databinding interfaces etc., but that doesn’t mean you don’t have to do some work somewhere else to get your app working (and writing the controllers isn’t a few lines of code either).

Make no mistake, I’m all for dropping the databinding framework TODAY, so I can get rid of the interfaces in my base classes and perhaps even get away with just code generation and no base classes at all. (See this code generation as the typing of the infrastructure code you need to get POCO classes persisted and MANAGED in memory.) That by itself asks for a different solution to the same problem; MVC could be that solution. This in turn asks for another way to avoid having to write a lot of code to be able to use data in a UI without having to write code which passes data from/to controls.

Persisting a POCO class isn’t free. Magic is involved somewhere, OR you have to do it the slow way via reflection.

The thing is though: using an o/r mapper isn’t solely about persistence. Persistence is just a small part of using an o/r mapper. The biggest gains are won in the areas where validation, auditing, authorization, flexible concurrency control etc. are taken care of by the framework which manages the entities anyway, namely the o/r mapper. If you have to write that code by hand as well (and in the case of LINQ to SQL you have to), all over the place in your POCO classes, have fun. 🙂

Sure, you’ll now say that by using additional frameworks like Spring.NET and Castle it will be much easier. I fully agree, but not addressing that part when claiming POCO or PI is great and better is IMHO simply telling half the story.

I also still find it odd that some TDD people are still blind to the fact that entities don’t fall out of the sky. Even the purest die-hard agile TDD developer won’t have visions behind the keyboard where a bright white light suddenly whispers in his ear: "Thou shalt create a customer class! With these 3 properties!" The developer will think about what to create, which entities are there in the system: use DDD to discover which entities there are, define them first, then type in the code for the class definitions. This effectively means that the class is just another representation of the abstract entity definition that’s the theoretical basis of the entity.

Or are you suggesting that there’s no theoretical basis for a random entity class in your system? 🙂

Using that theoretical basis you can thus opt for a different route with entity classes which contain plumbing code and which allow you to STILL do TDD, simply because the TDD you’ll be doing is about the system functionality, not about the plumbing code you need to get the functionality code up and running, as that’s already been taken care of.

Oh, and I do value PI; it’s just that I react to the remarks made by some people in the blogosphere who tend to claim that PI is free and won’t give you anything to worry about. It’s not free, nor will it free you from a lot of work; on the contrary, it WILL force you to write other infrastructure code, whether you like it or not.

If .NET and VS.NET were built around MVC, my framework would have been more or less PI today, but the opposite is true. Add to that that if you want really performant o/r mapping, you can’t do it without either post-compile IL manipulation or dynamic proxies, IF you want to go the PI route. Both have side effects, downsides and limitations for your own code. So does using a base class / code gen; I won’t deny that. It’s just that saying that POCO/PI has no side effects or limitations is simply a lie.

Btw, I’d like to add that I’m more or less suggesting that code inside the entity classes (be it via a base class, code genned or hand-typed for the people who like to type everything in despite the fact that it’s repetitive and boring) should be targeted towards:

1) entity management. This is infrastructure at its best. You either have to type this in by hand or get it from the framework in some form (dyn. proxy/post-compile IL man./codegen/baseclass)

2) make o/r mapping more efficient. This is also infrastructure code.

One might close his eyes and deny the db is there, but that’s just stupid. The db is there; deal with it. That’s not to say that the entity should have methods like Save() etc.; that’s not what I’m suggesting, as I personally also don’t like that (which is why we support both types).

One other reason I often hear from people who want POCO is that it gives them the choice to switch to another framework.

That’s a big myth. Your code will be soaked with the o/r mapper’s details, query language etc. Switching to another framework isn’t going to be a walk in the park even if you have POCO classes; on the contrary.

In NHibernate this isn’t true: after

myCustomer.Orders.Add(myOrder);

it’s not true that myOrder.Customer points to myCustomer. Some other frameworks DO make sure this is true.

Switching a framework will break that kind of code, even if you use POCO classes, and even worse: you won’t notice it until you run the app and it hits the situation where this is important. 🙂

Ian, in a perfect world I would definitely agree with you, and in theory I still do. The reality, however, is that POCO/PI has disadvantages too, and as .NET isn’t really tailored to support them (in Java all methods etc. are virtual, so dynamic proxying is easy, and byte-code manipulation is easy too, while in .NET it’s much harder, as there’s no real runtime IL manipulation possible that performs really well), you have to develop your software in a different way, where you can’t utilize the tools available to you in the .NET framework.

I think what we’re really saying here is that OR mapped domain objects are not just ‘POCO’ objects, since they embody behavior beyond what is typical to the runtime. This behavior is not persistence related, it is a by-product of the meta entity that is being modelled and represented in the runtime environment. Users and builders of these objects should realize that these objects are neither limited to the capabilities of ‘normal’ objects nor can they be described using all the qualities possible in the runtime. The data model of the meta entity overlaps with the CLR, neither encompasses the other. In order to model what is missing in the runtime, user code or conventions must be added to the mix. It makes no sense to be opposed to this.