The ADO.NET Entity Framework Overview

Visual Studio 2005

Microsoft Corporation

June 2006

Applies to:
ADO.NET
.NET Framework
Language-Integrated Query

Summary: Learn how new features in the upcoming version of ADO.NET raise the level of abstraction for data programming, helping to eliminate the impedance mismatch between data models and between languages that application developers would otherwise have to deal with. (33 printed pages)

Introduction

When one takes a look at the amount of code that the average application developer must write to address the impedance mismatch across various data representations (for example, objects and relational stores), it is clear that there is an opportunity for improvement. Indeed, there are many scenarios where the right framework can empower an application developer to focus on the needs of the application as opposed to the complexities of bridging disparate data representations.

A primary goal of the upcoming version of ADO.NET is to raise the level of abstraction for data programming, thus helping to eliminate the impedance mismatch between data models and between languages that application developers would otherwise have to deal with. Two innovations that make this move possible are Language-Integrated Query and the ADO.NET Entity Framework. The Entity Framework exists as a new part of the ADO.NET family of technologies. ADO.NET will LINQ-enable many data access components: LINQ to SQL, LINQ to DataSet and LINQ to Entities.

This document describes the ADO.NET Entity Framework, what problem spaces it is targeting and how its various components address those problems.

Where We Want To Go

An ideal environment for the creation of business applications should allow developers to describe the business logic and state of the problem domain which they are modeling with minimum or no "noise" coming from the underlying representation and the infrastructure that supports it. Applications should be able to interact with the stores that maintain the persistent state of the system in terms of the problem domain; specifically, in terms of a conceptual domain model, completely separated from the logical schema of the underlying store.

One would expect developers to be able to write something like the piece of code below:

// we'll use the order-tracking store
using(OrderTracking orderTracking = new OrderTracking()) {

    // find all the pending orders for sales people
    // in Washington
    var orders = from order in orderTracking.SalesOrders
                 where order.Status == "Pending Stock Verification" &&
                       order.SalesPerson.State == "WA"
                 select order;

    foreach(SalesOrder order in orders) {

        // obtain a list of StockAppProduct objects
        // to be used for validation
        List<StockAppProduct> products = new List<StockAppProduct>(
            from orderLine in order.Lines
            select new StockAppProduct {
                ProductID = orderLine.Product.ID,
                LocatorCode = ComputeLocatorCode(orderLine.Product)
            }
        );

        // make sure all products for this order
        // are in stock through the stock management
        // system
        if(StockApp.CheckAvailability(products)) {
            // mark the order as "shippable"
            order.Status = "Shippable";
        }
    }

    // if we marked one or more orders as shippable, persist
    // the changes in the store
    orderTracking.SaveChanges();
}

There are two elements that are important to highlight in the code above:

No artificial constructs. It's common to see applications that need to adapt to peculiarities of the underlying store schema. For example, applications built on top of relational databases often have to make extensive use of joins in order to navigate through relationships. In the code above, in contrast, the "shape" of the data follows the abstractions of the problem being modeled; there are "orders", which have "order lines" and that are related to a "sales person".

No plumbing. The code is very database intensive, yet there are no database connection objects, no external language such as SQL for query formulation, no parameter binding, no configuration embedded in code. In this sense, you could say this code is "pure business logic".

This is the level of expressiveness and abstraction that ADO.NET, and in particular LINQ and the Entity Framework working together, brings to application development.

The rest of this paper describes in detail the various elements that work together in order to make the sample above work.

The ADO.NET Entity Framework: Modeling at the Right Level of Abstraction

Every business application has, explicitly or implicitly, a conceptual data model that describes the various elements of the problem domain, as well as each element's structure, the relationships between each element, their constraints, and so on.

Since currently most applications are written on top of relational databases, sooner or later they'll have to deal with the data represented in a relational form. Even if there was a higher-level conceptual model used during the design, that model is typically not directly "executable", so it needs to be translated into a relational form and applied to a logical database schema and to the application code.

While the relational model has been extremely effective in the last few decades, it's a model that targets a level of abstraction that is often not appropriate for modeling most business applications created using modern development environments.

Let's use an example to illustrate this point. Here is a fragment of a variation of the AdventureWorks sample database that's included in Microsoft SQL Server 2005:

Figure 1

If we were building a human-resources application on top of this database and at some point wanted to find all of the full-time employees that were hired during 2006 and list their names and titles, we'd have to write the following SQL query:
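The query below is a sketch of what that would look like; the exact table and column names are illustrative, based on the AdventureWorks variant shown in Figure 1:

```sql
-- employee names live in Contact, not Employee, so a join is required;
-- full-time employees are identified by the SalariedFlag column
SELECT c.FirstName, c.LastName, e.Title
FROM Employee e
     INNER JOIN Contact c ON e.ContactID = c.ContactID
WHERE e.SalariedFlag = 1
  AND e.HireDate >= '2006-01-01' AND e.HireDate < '2007-01-01'
```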

This query is more complicated than it needs to be for a number of reasons:

While this particular application only deals with "employees", it still has to deal with the fact that the logical database schema is normalized so the contact information of employees—e.g. their names—is in a separate table. While this does not concern the application, developers would still need to include this knowledge in all queries in the application that deal with employees. In general, applications can't choose the logical database schema (for example, departmental applications that expose data from the company's core system database), and the knowledge of how to map the logical schema to the "appropriate" view of the data that the application requires is implicitly expressed through queries throughout the code.

This example application only deals with full-time employees, so ideally one should not see any other kind of employees. However, since this is a shared database, all employees are in the Employee table, and they are classified using a "SalariedFlag" column; this, again, means that every query issued by this application will embed the knowledge of how to tell apart one type of employee from the other. Ideally, if the application deals with a subset of the data, the system should only present that subset of the data, and the developers should be able to declaratively indicate which is the appropriate subset.

The problems highlighted above are related to the fact that the logical database schema is not always the right view of the data for a given application. Note that in this particular case a more appropriate view could be created by using the same concepts used by the existing schema (that is, tables and columns as exist in the relational model). There are other issues that show up when building data-centric applications that are not easily modeled using the constructs provided by the relational model alone.

Let's say that another application, this time the sales system, is also built on top of the same database. Using the same logical schema we used in the previous example, we would have to use the following query to obtain all of the sales persons that have sales orders for more than $200,000:
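One possible formulation against the logical schema is sketched below; table and column names are again illustrative:

```sql
-- sales person data is spread across three tables, and the
-- association to orders must be expressed through key columns
SELECT sp.SalesPersonID, c.FirstName, c.LastName
FROM SalesPerson sp
     INNER JOIN Employee e ON sp.SalesPersonID = e.EmployeeID
     INNER JOIN Contact c ON e.ContactID = c.ContactID
WHERE EXISTS(SELECT *
             FROM SalesOrderHeader o
             WHERE o.SalesPersonID = sp.SalesPersonID
               AND o.TotalDue > 200000)
```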

Again, the query is quite complicated compared to the relatively simple question that we're asking at the conceptual level. The reasons for this complexity include:

Again, the logical database schema is too fragmented, and it introduces complexity that the application doesn't need. In this example, the application is probably only interested in "sales persons" and "sales orders"; the fact that the sales persons' information is spread across three tables is uninteresting, yet it is knowledge that the application code has to have.

Conceptually, we know that a sales person is associated to zero or more sales orders; however, queries need to be formulated in a way that can't leverage that knowledge; instead, this query has to do an explicit join to walk through this association.

In addition to the issues pointed out above, both queries present another interesting problem: they return information about employees and sales persons respectively. However, you cannot ask the system for an "employee" or a "sales person". The system does not have knowledge of what that means. All the values returned from queries are simply projections that copy some of the values in the table rows to the result-set, losing any relationship to the source of the data. This means that there is no common understanding throughout the application code about core application concepts such as employee, nor can the system adequately enforce the constraints associated with those concepts. Furthermore, since the results are simply projections, the source information that describes where the data came from is lost, requiring developers to explicitly tell the system how inserts, updates and deletes should be done by using specific SQL statements.

The issues we just discussed fall into two main classes:

Those related to the fact that the logical (relational) model and related infrastructure cannot leverage the conceptual domain knowledge of the application data model, hence it is not able to understand business entities, their relationships among each other, or their constraints.

Those related to the practical problem that databases have logical schemas that typically do not match the application needs; those schemas often cannot be adapted because they are shared across many applications or due to non-functional requirements such as operations, data ownership, performance or security.

The issues described above are very common across most data-centric enterprise applications. In order to address these issues ADO.NET introduces the Entity Framework, which consists of a data model and a set of design-time and run-time services that allow developers to describe the application data and interact with it at a "conceptual" level of abstraction that is appropriate for business applications, and that helps isolate the application from the underlying logical database schemas.

Modeling Data at the Conceptual Level of Abstraction: The Entity Data Model

In order to address the first issue identified in the previous section what we need is a way of describing the data structure (the schema) that uses higher-level constructs.

The Entity Data Model—or EDM for short—is an Entity-Relationship data model. The key concepts introduced by the EDM are:

Entity: entities are instances of Entity Types (e.g. Employee, SalesOrder), which are richly structured records with a key. Entities are grouped in Entity-Sets.

Relationship: relationships associate two or more entities (e.g. a SalesOrder is related to the SalesPerson responsible for it). Relationships are grouped in Relationship-Sets.

The introduction of an explicit concept of Entity and Relationship allows developers to be much more explicit when describing schemas. In addition to these core concepts, the EDM supports various constructs that further extend its expressivity. For example:

Inheritance: entity types can be defined so they inherit from other types (e.g. Employee could inherit from Contact). This kind of inheritance is strictly structural, meaning that there is no "behavior" inherited as happens in object-oriented programming languages. What is inherited is the structure of the base entity type; in addition to inheriting its structure, instances of the derived entity type satisfy the "is a" relationship when tested against the base entity type.

Complex types: in addition to the usual scalar types supported by most databases, the EDM supports the definition of complex types and their use as members of entity types. For example, you could define an Address complex type that has StreetAddress, City and State properties and then add a property of type Address to the Contact entity type.

With all of these new tools, we can re-define the logical schema that we used in the previous section using a conceptual model:

Figure 2

In plain English this schema has the following elements:

Three entity types: SalesPerson, SalesOrder and StoreSalesOrder. Note that StoreSalesOrder "is a" SalesOrder (it inherits from SalesOrder), with the special characteristic of having tax information.

A relationship between the SalesOrder and the SalesPerson entity types.

Two entity-sets: SalesOrders and SalesPeople; note that the SalesOrders entity-set can have instances of both SalesOrder and StoreSalesOrder entity types.

This new model is much closer to the view that a sales application would use for its store. A few important things to note include the fact that sales persons are no longer spread across multiple tables, they are in a single entity-set instead; also, there are no primary/foreign keys in the schema; instead, a relationship is explicitly declared to exist in the model.

To provide a concrete example, in the previous section we needed a 3-way join to access the names of the sales people in a query, something like:
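Both versions are sketched below; the table, column, and entity-set names are illustrative:

```sql
-- the relational version: a 3-way join just to get at the names
SELECT c.FirstName, c.LastName
FROM SalesPerson sp
     INNER JOIN Employee e ON sp.SalesPersonID = e.EmployeeID
     INNER JOIN Contact c ON e.ContactID = c.ContactID
```

Against the conceptual schema, the same question becomes a simple query over the SalesPeople entity-set:

```sql
SELECT sp.FirstName, sp.LastName
FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp
```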

This is significantly simpler and has the exact same semantics, with the added bonus that the information about how to build the appropriate view of the data for this application is now expressed declaratively in an external artifact (the EDM schema and mapping that we'll discuss later).

ADO.NET includes visual tools that are used to design these schemas. The output of the tool is an XML file that describes the conceptual schema using the Schema Description Language or SDL. See the section 6.1 "Sample EDM schema represented as XML" in the appendix for the XML version of the EDM schema above.

Now, if this new conceptual schema is different from the logical schema in the actual database, how does the system know how to go back and forth between schemas? The answer is "mapping".

For more information about the Entity Data Model see the [EDM] reference at the end of this document.

Bringing Data into an EDM Model: Mapping

The EDM is a conceptual data model that can be used to model the data of a given domain. However, at some point the data needs to be stored in an actual database, typically a relational database.

In order to provide a mechanism for storing data modeled using the EDM in relational databases, the ADO.NET Entity Framework houses a powerful client-views infrastructure designed to manage the transformations between the logical database schema that's present in the relational store and the conceptual EDM schema used by the application.

In addition to the EDM schema, the system takes as input a mapping specification; this mapping specification is produced by the mapping tools and is also an XML file.

To continue with the example, if we wanted to map the logical database schema that we used at the beginning of this section to the conceptual EDM schema of the previous section, we'd do something like this:

Figure 3

When the mapping tool is used to create a conceptual to logical mapping, it produces an XML file that can be consumed by the run-time components of the ADO.NET Entity Framework. The appendix includes the XML representation of the mapping shown above in the section 6.2 "Sample mapping represented as XML". Fortunately, tools will make it unnecessary for the vast majority of users to have to understand or deal with these XML files.

In addition to providing support for surfacing schemas as EDM schemas, the client-views infrastructure in ADO.NET has other benefits. At the beginning of this section we discussed how databases with schemas not owned by the application developer can introduce complexity into the application code. By using client views, the complexity brought in by the external logical schemas can be stopped before it reaches the application code; instead, views can be created to perform any re-shaping required for the data the application consumes. That way the application has a view of the data that makes sense for the problem space it addresses. This is useful independently of whether new EDM constructs are used in the resulting model.

An obvious question at this point would be why not just use traditional database views for this. While database views can abstract many of the mappings, often that solution won't work for several process and functional reasons: (a) many of the views are simply too complex to be generated and maintained by developers in a cost-effective way, even for some simple conceptual to logical mappings, (b) the classes of views that have the property of being automatically updatable at the store are limited, and (c) databases for core-systems in medium and large companies are used by many central and departmental applications, and having each individual application create several views in the database would pollute the database schema and create significant maintenance workload for the database administrators. In addition, database views are limited to the expressivity of the relational model, and typically lack some of the more real-world concepts of the Entity Data Model, such as inheritance and complex types.

ADO.NET client-views work entirely on the client, so each application developer can create views that adapt the data to a shape that makes sense for each particular application without affecting the actual database or other applications. The class of updatable views supported in the Entity Framework is much broader than those supported by any relational store.

Surfacing the EDM and Mapping to the ADO.NET API: The Mapping Provider

The EDM and mapping concepts seem to be fairly abstract at first, so at this point one could wonder how they are concretely surfaced in the ADO.NET API.

We chose to introduce a new data-access provider for ADO.NET called the "mapping provider". Just like a regular provider connects to a store and provides the application with a view of the store data in its logical schema, the mapping provider connects to a conceptual EDM model and provides the application with a conceptual view of the data.

The mapping provider is given the EDM schema and the mapping information, so it can internally use the mapping infrastructure to translate between the logical and conceptual schemas.

So, for example, to execute a query against the EDM model that finds the names and hire-dates of the sales people after a given hire date, the ADO.NET code would be:
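A sketch of that code is shown below. The MapConnection and MapCommand class names follow the standard ADO.NET provider pattern; they, the connection-string source, and the query names are illustrative:

```csharp
DateTime hireDate = new DateTime(2006, 1, 1);

using(MapConnection con = new MapConnection(Settings.Default.AdventureWorks)) {
    con.Open();

    // the query is formulated against the conceptual EDM schema,
    // not against the underlying database tables
    MapCommand cmd = con.CreateCommand();
    cmd.CommandText = "SELECT sp.FirstName, sp.HireDate " +
                      "FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp " +
                      "WHERE sp.HireDate > @date";
    cmd.Parameters.AddWithValue("date", hireDate);

    using(DbDataReader reader = cmd.ExecuteReader()) {
        while(reader.Read()) {
            Console.WriteLine("{0}\t{1}", reader["FirstName"], reader["HireDate"]);
        }
    }
}
```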

Note that the pattern should be very familiar to ADO.NET developers; it looks just like ADO.NET 2.0 code, the only difference being that it uses a different provider.

Under the covers, the mapping provider will use the EDM schema and the mapping/view information to translate to and from the conceptual model. Then it'll use a regular ADO.NET provider to talk to the underlying database (e.g. it would use System.Data.SqlClient to talk to a SQL Server database).

Querying Against an EDM Model: Entity SQL

When an application uses an EDM model and the mapping provider to access it, it no longer connects directly to a database or sees any database-specific construct; the entire application operates in terms of the higher-level EDM model.

This means that you can no longer use the native database query language; not only will the database not understand the EDM model, but also current database query languages do not have the constructs required to deal with the elements introduced by the EDM such as inheritance, relationships, complex-types, etc.

In order to enable queries against EDM models, the ADO.NET Entity Framework introduces a query language that's designed to work with the EDM and can leverage the full expressivity of the entity data model. The language is called Entity SQL, and it should look familiar to all developers who have used some SQL dialect before. Entity SQL provides the Entity Framework with a dynamic query capability, where queries can be statically formulated at design time or constructed at runtime in the context of late-bound applications.

For example, this is a valid Entity SQL query from the previous example:
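One possible formulation is sketched below; the entity-set and namespace names follow the conceptual schema used throughout this paper:

```sql
SELECT VALUE sp
FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp
WHERE sp.HireDate > @date
```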

The overall structure of an Entity SQL query is the usual SELECT-FROM-WHERE sequence that's present in traditional SQL. In addition to these basic elements, Entity SQL introduces various concepts that allow developers to leverage the expressivity of conceptual EDM models; following is a description of the key additional concepts brought to the picture by Entity SQL:

Dealing with entities. Conceptual EDM schemas are designed around entities. Business concepts are reflected directly in EDM entity types whose instances are stored in entity-sets. In the same way that queries in the relational world are formulated against tables, queries in the EDM world are formulated against entity-sets. So the starting point for a query is a set of entities coming from one or more entity-sets.

Depending on your needs in each particular scenario, you can choose to project out individual values or preserve the whole entity. Preserving the whole entity is interesting when you want the system to assist you with services that are built around entities. For example, thanks to the metadata provided in the EDM schema and mapping specification, the ADO.NET Entity Framework knows how to reflect updates to entities back in the store without the user having to provide INSERT, UPDATE and DELETE commands as the ADO.NET DataAdapter has traditionally required.

Queries with projection look pretty much like regular SQL queries, with the caveat that the table aliases are mandatory:
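For example (names illustrative), a projection query next to an entity-preserving query that uses the VALUE modifier:

```sql
-- projection: individual members are copied into the result-set
SELECT sp.FirstName, sp.LastName
FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp

-- whole-entity: VALUE preserves each entity instance
SELECT VALUE sp
FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp
```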

The "VALUE" modifier indicates to the system that it should produce a set of values in the result-set representing either entity instances or scalars; the results will preserve all of the metadata required to fully describe the values, including how to extract primary keys for entities, where each entity came from, and so on.

From the result-set perspective, when an entity is projected out the resulting DataReader object has a column for each top-level member of the entity type (so it effectively looks like all of the columns were projected out in the query, but with the difference that additional metadata about the entity is available).

Relationship navigation. In addition to entities, the other key element of the EDM is an explicit concept of association through relationships; since the system knows about the relationships between entities, the query language can be used to explicitly navigate those relationships without having to use constructs such as joins.

For example, before we used this query to find sales persons with orders of more than a certain value:
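That query (with illustrative table and column names) looked like this:

```sql
SELECT sp.SalesPersonID, c.FirstName, c.LastName
FROM SalesPerson sp
     INNER JOIN Employee e ON sp.SalesPersonID = e.EmployeeID
     INNER JOIN Contact c ON e.ContactID = c.ContactID
WHERE EXISTS(SELECT *
             FROM SalesOrderHeader o
             WHERE o.SalesPersonID = sp.SalesPersonID
               AND o.TotalDue > 200000)
```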

The complexity of this query comes from 1) the fact that sales person information is spread across tables and 2) navigation from sales person to sales order needs to be done indirectly, through a join.

Using the new conceptual EDM schema we defined previously, we can formulate the query in this way:
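A sketch of the Entity SQL version follows; the relationship name SalesPersonOrder is an assumption, standing in for whatever name the relationship was given in the schema:

```sql
SELECT VALUE sp
FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp
WHERE EXISTS(SELECT o
             FROM NAVIGATE(sp, AdventureWorks.SalesPersonOrder) AS o
             WHERE o.TotalDue > 200000)
```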

The "NAVIGATE" operator allows a query to explicitly traverse a relationship; the application does not need to know how the relationship is maintained or use indirect methods such as joins to use it.

Inheritance support. Since the EDM supports inheritance for entity types, the query language has to enable users to formulate queries that look at the inheritance hierarchy of types.

For example, our conceptual EDM schema has an entity type SalesOrder, and a subtype StoreSalesOrder that has some specific characteristics. Since every StoreSalesOrder "is a" SalesOrder, a query against the SalesOrders entity-set would return a polymorphic result-set containing instances of both SalesOrder and StoreSalesOrder types:

SELECT VALUE o
FROM AdventureWorks.AdventureWorksDB.SalesOrders AS o

If we wanted only sales orders from stores, we'd ask the system that explicitly by leveraging the type hierarchy:

SELECT VALUE o
FROM AdventureWorks.AdventureWorksDB.SalesOrders AS o
WHERE o IS OF (AdventureWorks.StoreSalesOrder)

Here, the IS OF operator checks whether an expression ("o" in this case) is an instance of the type specified in parentheses.

Lessons learned. In addition to the various extensions in Entity SQL designed to provide a first-class query experience against EDM schemas, Entity SQL incorporates various enhancements that come from experience with more traditional SQL dialects.

For instance, in Entity SQL expressions can yield scalars or collections, and collections are first-class constructs that can appear in most expression contexts, making them fully composable. For example, a collection can go in the FROM clause and act as the source for a query, or it can go in the SELECT list, which will result in one of the columns of the result-set being of a collection type instead of a scalar.

For an in-depth description of Entity SQL see [eSQL] in the references at the end of this document.

ADO.NET Entity Framework: Object Services

The vast majority of new code for business applications is written in general purpose, object oriented programming languages such as Visual Basic and C#. These programming languages and their surrounding development environments model business entities as classes and their behaviors as code. To contrast with this, ADO.NET so far has exposed data from databases as "values", that is, as rows and columns. In order to interact with databases, applications have to deal with the impedance mismatch that exists between the data and the application code; this includes both the way queries are formulated and the way results are exposed.

The ADO.NET Entity Framework includes an object services layer that reduces, and often eliminates, this impedance mismatch.

Same Data, But as Objects

Applications, particularly large applications or large systems made of several applications, can rarely use a single representation of the data across their entire code base; various aspects such as static versus dynamic knowledge of queries and result-set structure, the user-application interaction model, and so on affect the way applications need to interact with the data in databases.

Rather than introducing an entirely new, stand-alone infrastructure for exposing database data as objects, the ADO.NET Entity Framework includes an "object services" layer that integrates with the rest of the stack and exposes entity values as .NET objects as a presentation choice.

Regardless of whether you choose to consume your data as "values" (rows and columns) or as objects, you still use the same infrastructure; that is, the same conceptual EDM schema can be used, along with the same mapping and the same query language—Entity SQL.

For example, the following code snippet obtains a subset of the sales people in the system and manipulates the results using a regular DataReader (that is, using "values"):
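The snippet below is a sketch of the values-based approach; the MapConnection and MapCommand class names and the connection-string source are illustrative:

```csharp
DateTime hireDate = new DateTime(2006, 1, 1);

using(MapConnection con = new MapConnection(Settings.Default.AdventureWorks)) {
    con.Open();

    MapCommand cmd = con.CreateCommand();
    cmd.CommandText = "SELECT VALUE sp " +
                      "FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp " +
                      "WHERE sp.HireDate > @date";
    cmd.Parameters.AddWithValue("date", hireDate);

    using(DbDataReader reader = cmd.ExecuteReader()) {
        while(reader.Read()) {
            // rows and columns; no objects involved
            Console.WriteLine("{0}\t{1}", reader["FirstName"], reader["HireDate"]);
        }
    }
}
```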

While this is nice for late-bound scenarios such as reporting and business intelligence, or for directly serializing results (for example, from a web service), in cases where heavy business logic needs to be written it's usually much better to have objects representing the business entities. The object-based version of the code above, using the next version of ADO.NET, looks like this:
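The sketch below assumes an ObjectContext class with a GetQuery method and a Query&lt;T&gt; type; these names are illustrative of the object services API:

```csharp
DateTime hireDate = new DateTime(2006, 1, 1);

using(ObjectContext ctx = new ObjectContext(Settings.Default.AdventureWorks)) {
    // same map connection, same Entity SQL query; only the
    // representation of the results changes
    Query<SalesPerson> salesPeople = ctx.GetQuery<SalesPerson>(
        "SELECT VALUE sp " +
        "FROM AdventureWorks.AdventureWorksDB.SalesPeople AS sp " +
        "WHERE sp.HireDate > @date",
        new QueryParameter("date", hireDate));

    foreach(SalesPerson sp in salesPeople) {
        // sp is a full .NET object, not a row
        Console.WriteLine("{0}\t{1}", sp.FirstName, sp.HireDate);
    }
}
```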

Instead of using a command object to represent the query, we use an "object context" that acts as the entry point to the object services and a Query object that represents a query in the object space. Note that we're still using the same map connection (pointing to the same EDM schema and mappings) as well as the same query language. The only difference is that now the results are being returned as objects.

One question that comes immediately from this example is: where is the type "SalesPerson" coming from? The ADO.NET Entity Framework includes a tool that, given an EDM schema, will generate the .NET classes that represent the EDM entities inside the .NET environment. The generated classes are partial classes, so they can be extended with custom business logic in separate files without interfering with the code generator.

If the tool has access to the entire EDM schema, it not only has access to the entity types definition, but also to the entity-sets, relationships and so on. Based on that information the tool will not only generate classes for each entity type, but also a top-level class that represents the store as viewed from the EDM perspective, with its entity-sets, relationships, etc. This further simplifies writing data-access code that uses objects. The same example we showed above, when using the strongly-typed version of the object context that is produced by the tool, becomes:
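A sketch of the strongly-typed version follows; AdventureWorksDB is the generated context class, and the Where query-builder method with an Entity SQL predicate is an illustrative assumption:

```csharp
DateTime hireDate = new DateTime(2006, 1, 1);

using(AdventureWorksDB aw = new AdventureWorksDB()) {
    // SalesPeople is a generated property representing the entity-set
    foreach(SalesPerson sp in aw.SalesPeople.Where(
                "it.HireDate > @date", new QueryParameter("date", hireDate))) {
        Console.WriteLine("{0}\t{1}", sp.FirstName, sp.HireDate);
    }
}
```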

You can see that most of the plumbing code has been eliminated, and only the code that represents the application intent remains. At this point most of the impedance mismatch problem mentioned earlier has been eliminated. This is a key goal of the ADO.NET Entity Framework.

EDM Concepts in the Object Space

In section 2.1 we discussed various EDM concepts such as entities, relationships and inheritance. In general, all of the EDM concepts are mapped by the object services to the .NET environment.

Entities are simply mapped to classes that follow a particular contract. To make things simple, those classes are automatically generated out of the EDM schema by a tool.

Relationships are exposed in the object space as properties that can be navigated directly. For example, our EDM model of AdventureWorks has a relationship between the SalesPerson and the SalesOrder entity types. Based on that information the code-generation tool will create the appropriate members so the following code will work as expected:
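For instance, something along these lines; the Orders property name is an assumption about what the code-generation tool would produce for this relationship:

```csharp
// salesPerson was obtained from a previous query; the Orders
// property navigates the SalesPerson-SalesOrder relationship
foreach(SalesOrder order in salesPerson.Orders) {
    Console.WriteLine("Order {0}: {1}", order.OrderID, order.Status);
}
```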

Note that relationship navigation simply happens by referencing a property in one of the entities that's one of the end-points of the relationship.

Another interesting way of illustrating how EDM elements are surfaced to the object services layer is inheritance. The EDM schema we've been using has the SalesOrders entity-set that contains instances of both the SalesOrder and the StoreSalesOrder classes; this is directly represented in the .NET environment as inheritance in the class hierarchy. The code-gen tool will generate a StoreSalesOrder class that inherits from a SalesOrder class, and instances will be materialized appropriately.

Manipulating Data and Persisting Changes

Usually when an application wants to make changes to data in a database it has to issue INSERT, UPDATE and DELETE statements against it. In practice, applications tend to have various "entities" (even if informally or implicitly defined), and developers have to write (or use tools to generate) SQL DML statements for each of the "entities" in the system. This is a tedious, error prone task that generates a large code base that will require maintenance in the future.

The ADO.NET Entity Framework has enough metadata about the entities in the system that it can reflect changes in entities back to the database without requiring the user to provide SQL statements that do it. For example:
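A sketch of such an update follows; the AdventureWorksDB context, the Where query-builder method, and the member names are illustrative:

```csharp
using(AdventureWorksDB aw = new AdventureWorksDB()) {
    // fetch the orders that are ready to ship
    foreach(SalesOrder order in aw.SalesOrders.Where(
                "it.Status == 'Pending Stock Verification'")) {
        order.Status = "Shippable";
    }

    // no INSERT/UPDATE/DELETE statements are written by hand; the
    // framework generates them from the EDM and mapping metadata
    aw.SaveChanges();
}
```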

Updates are simple because the Entity Framework is doing several things under the covers to simplify the task for the developer. In particular, the system is:

Keeping track of every object that's handed out through a query. The object context, either a regular ObjectContext instance or a typed one like AdventureWorksDB here, acts as the logical scope within which instances are tracked.

Keeping the original versions of the values used to materialize each object. This allows the system to perform optimistic concurrency checks against the database during updates.

Keeping track of which objects changed, so when you call SaveChanges ADO.NET can know which entities need to be updated in the store.

Keeping metadata from the result-sets that describes which entity sets these entities came from, what their entity types are, and so on. All this information allows the system to generate the required update statements without the user providing any additional information.

Transforming object-level changes to conceptual and to logical (relational) updates.

Updates not only work on the immediate results of a query; they also work through the collections that represent the relationships between entities. For example, adding a new sales order and associating it with a given sales person is a form of change, and it is achieved by simply adding a new SalesOrder object to the Orders collection of the sales person and then calling SaveChanges.

LINQ to Entities: Language-Integrated Query

Despite the great advancements in integration of databases and development environments, there is still an impedance mismatch between the two that's not easily solved by just enhancing the libraries and APIs used for data programming. While the Entity Framework goes a long way toward eliminating the impedance mismatch between logical rows and objects, integrating it with extensions to existing programming languages, so that queries can be expressed naturally within the language itself, helps to eliminate that mismatch completely.

More specifically, most business application developers today have to deal with at least two programming languages: the language used to model the business logic and the presentation layer (typically a high-level object-oriented language such as C# or Visual Basic), and the language used to interact with the database (typically some SQL dialect).

Not only does this mean that developers have to master several languages to be effective at application development, but this also introduces seams throughout the application code whenever there are jumps between the two environments. For example, in most cases applications execute queries against databases by using a data-access API such as ADO.NET and specifying the query in quotes inside the program; since the query is just a string literal to the compiler, it's not checked for appropriate syntax or validated to make sure that it references existing elements such as tables and column names.

Addressing this issue is one of the key themes of the next round of the Microsoft C# and Visual Basic programming languages.

Language-Integrated Query

The next generation of the C# and Visual Basic programming languages contain a number of innovations around making it easier to manipulate data in application code. The LINQ project consists of a set of extensions to these languages and supporting libraries that allow users to formulate queries within the programming language itself, without having to resort to use another language that's embedded as string literals in the user program and cannot be understood or verified during compilation.

Queries formulated using LINQ can run against various data sources such as in-memory data structures, XML documents and through ADO.NET against databases, entity models and DataSets. While some of these use different implementations under the covers, all of them expose the same syntax and language constructs.

The actual syntax details for queries are specific to each programming language, but within a given language they remain the same across all LINQ data sources. For example, here is a Visual Basic query that works against a regular in-memory array:
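The sketch below is a minimal, self-contained example of the pattern:

```vb
Dim numbers() As Integer = New Integer() {5, 7, 1, 4, 9, 8, 6, 2, 0}

' Query an in-memory array using the same LINQ syntax used
' against databases, entity models and DataSets.
Dim lowNumbers = From n In numbers _
                 Where n < 5 _
                 Order By n _
                 Select n

For Each n As Integer In lowNumbers
    Console.WriteLine(n)   ' prints 0, 1, 2, 4
Next
```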

Queries against data sources such as entity models and DataSets look the same syntactically, as can be seen in the sections below.

For more background and further details on the LINQ project see [LINQ] in the references section.

LINQ and the ADO.NET Entity Framework

As we discussed in the section on ADO.NET Entity Framework Object Services, the upcoming version of ADO.NET includes a layer that can expose database data as regular .NET objects. Furthermore, ADO.NET tools will generate .NET classes that represent the EDM schema in the .NET environment. This makes the object layer an ideal target for LINQ support, allowing developers to formulate queries against a database right from the programming language used to build the business logic. This capability is known as LINQ to Entities.

For example, earlier in the document we discussed this code fragment that would query for objects in a database:

Using aw As New AdventureWorksDB(Settings.Default.AdventureWorks)
    Dim newSalesPeople = From p In aw.SalesPeople _
                         Where p.HireDate > hireDate _
                         Select p

    For Each p As SalesPerson In newSalesPeople
        Console.WriteLine("{0} {1}", p.FirstName, p.LastName)
    Next
End Using

This query, written using LINQ, will be processed by the compiler, which means that you get the same compile-time validation as the rest of the application code. Syntax errors, as well as errors in member names and data types, will be caught by the compiler and reported at compile time, instead of surfacing as the run-time errors that are commonplace when developing with SQL embedded in a host programming language.

The results of these queries are still objects that represent ADO.NET entities, so you can manipulate and update them using the same means that are available when using Entity SQL for query formulation.

While this example is just showing a very simple query, LINQ queries can be very expressive and can include sorting, grouping, joins, projection, etc. Queries can produce "flat" results or manufacture complex shapes for the result by using regular C#/Visual Basic expressions to produce each row.

For example, here is a variation of the code above that orders the results by hire-date and then by name, and outputs only a few members instead of full entities:
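Such a variation might look like the following sketch, which projects an anonymous type (using the Visual Basic 9 syntax) instead of full entities:

```vb
Using aw As New AdventureWorksDB(Settings.Default.AdventureWorks)
    Dim newSalesPeople = From p In aw.SalesPeople _
                         Where p.HireDate > hireDate _
                         Order By p.HireDate, p.LastName _
                         Select New With {p.HireDate, p.FirstName, p.LastName}

    ' Each result row carries only the three projected members.
    For Each p In newSalesPeople
        Console.WriteLine("{0}: {1} {2}", p.HireDate, p.FirstName, p.LastName)
    Next
End Using
```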

Furthermore, relationships are not only surfaced as navigation properties, as mentioned in the previous section; they can also be used within a query to perform sophisticated data extraction and shaping operations in a very declarative way. For example, to obtain all the orders from employees that are new hires (that is, hired this year):
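A sketch of such a query, assuming the generated Orders navigation property described earlier:

```vb
Using aw As New AdventureWorksDB(Settings.Default.AdventureWorks)
    ' Navigate the SalesPerson-SalesOrder relationship inside the query
    ' to pull the orders of every sales person hired this year.
    Dim newHireOrders = From p In aw.SalesPeople, o In p.Orders _
                        Where p.HireDate.Year = Today.Year _
                        Select o

    For Each o As SalesOrder In newHireOrders
        Console.WriteLine(o.OrderDate)
    Next
End Using
```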

DataSet Finally Gets Full Query Capabilities: LINQ to DataSet

One of the key elements of the ADO.NET programming model is the ability to explicitly cache data in a disconnected and backend agnostic manner using the DataSet. A DataSet represents a set of tables and relationships, along with the appropriate metadata to describe the structure and constraints of the data contained in it. ADO.NET includes various classes that make it easy to load data from a database into a DataSet, and push changes made within a DataSet back into the database.

One interesting aspect of DataSet is that it allows applications to bring a subset of the information contained in a database into the application space and then manipulate it in-memory while retaining its relational shape. This enables many scenarios that require flexibility in how data is represented and handled; generic reporting, analysis, and business-intelligence applications in particular depend on this style of manipulation.

In order to query the data within a DataSet, the DataSet API includes methods, such as DataTable.Select(), for searching for data in certain, somewhat predefined ways. However, there hasn't been a general mechanism available for rich query over DataSet objects that provided the typical expressiveness required in many data-centric applications.

LINQ provides a unique opportunity to introduce rich query capabilities on top of DataSet and to do it in a way that integrates with the environment.

The ADO.NET Tech Preview includes full support for LINQ queries over both regular and typed DataSets. To illustrate the functionality, let's say we have a DataSet with two tables, "SalesOrderHeader" and "SalesOrderDetail"; here is how you'd get the order identifier and order date from all of the orders posted online:
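A sketch of that query over an untyped DataSet is shown below (the column names OnlineOrderFlag, SalesOrderID, and OrderDate are assumed from the AdventureWorks schema; in later, released versions of LINQ to DataSet an AsEnumerable() call on the DataTable is required to expose its rows to LINQ):

```vb
' Untyped access: column names are late-bound strings, and values
' come back as Object, so casts are needed.
Dim onlineOrders = From o In ds.Tables("SalesOrderHeader") _
                   Where CBool(o("OnlineOrderFlag")) _
                   Select New With { _
                       .Id = o("SalesOrderID"), _
                       .OrderDate = o("OrderDate") _
                   }

For Each order In onlineOrders
    Console.WriteLine("{0}: {1}", order.Id, order.OrderDate)
Next
```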

An interesting observation is that Visual Basic's support for late binding makes queries against untyped DataSets easier to read than their C# counterparts. Below we'll discuss how typed DataSets address this issue, making the experience significantly better in both languages.

One common request for DataSet is support for joins across DataTables; this is now possible with LINQ. Here is an example that joins the SalesOrderHeader and SalesOrderDetail tables:
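Sketched below, joining the two tables on their shared key (column names are assumed from the AdventureWorks schema):

```vb
' Join two DataTables on SalesOrderID, still using late-bound column access.
Dim orderLines = From h In ds.Tables("SalesOrderHeader") _
                 Join d In ds.Tables("SalesOrderDetail") _
                     On h("SalesOrderID") Equals d("SalesOrderID") _
                 Select New With { _
                     .Id = h("SalesOrderID"), _
                     .OrderDate = h("OrderDate"), _
                     .Product = d("ProductID") _
                 }
```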

While LINQ over DataSet as shown above is a very powerful tool, the code is a bit cluttered, for a couple of reasons:

Column access is still done in a late-bound way, so column names appear in quotes (not only does this clutter the code, it also prevents the compiler from checking column names at compile time).

DataSet field access is untyped by default (that is, the result of row["column"] is of type object); one way of dealing with that would be to use a cast, but that wouldn't protect against null values (which are represented as DBNull.Value in a DataSet). In order to make field access uniform and handle nulls appropriately, a new operator, "Field", is introduced. So, to access a column called "Date" of type DateTime, you can use row.Field<DateTime>("Date").

If the schema of the DataSet is known at application design time, then using typed DataSets provides a much better experience when using LINQ. Tables and row types in typed DataSets have typed members for each of the columns, which makes access much easier; additionally, the DataSet itself has properties for easy access to the various tables contained in it.

If we create a typed DataSet with the same tables used in the previous example, we could write the first query as:
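For instance, the typed version might look like this sketch (the generated member names mirror the table and column names):

```vb
' Typed access: no quoted strings and no casts; the compiler checks
' table and column names.
Dim onlineOrders = From o In ds.SalesOrderHeader _
                   Where o.OnlineOrderFlag _
                   Select New With {o.SalesOrderID, o.OrderDate}
```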

In addition to supporting the appropriate contracts to integrate with LINQ, the DataSet LINQ implementation is smart enough to evaluate certain queries and decide whether or not the default execution (done through the standard query operators implementation) will be appropriate for a given DataSet. If there are indexes created in the DataSet that could be used to speed up query execution, then the execution strategy will be adapted at run-time in order to attempt to process queries more rapidly.

LINQ to SQL

For developers who do not require mapping to a conceptual model, LINQ to SQL (formerly known as DLinq) provides the LINQ programming model directly over an existing database schema.

Like the LINQ support over ADO.NET Entities described in Section 4.2, LINQ to SQL allows developers to generate .NET classes that represent data. Rather than mapping to a conceptual data model, these generated classes map directly to database tables, views, stored procedures, and user-defined functions. Using LINQ to SQL, developers can write code directly against the storage schema using the same LINQ programming pattern as previously described for in-memory collections, Entities, or the DataSet, as well as other data sources such as XML.

Using LINQ to SQL generated classes, we can write the following now familiar-looking code against the SQL Server Northwind database on a local instance of SQL Server Express.
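A sketch of such code follows; Northwind here stands for a LINQ to SQL generated class whose members map directly to the database tables, and the connection string is illustrative:

```vb
' Northwind is the generated class; Customers maps to the Customers table.
Dim db As New Northwind("C:\Data\northwnd.mdf")

Dim londonCustomers = From c In db.Customers _
                      Where c.City = "London" _
                      Select c

For Each c As Customer In londonCustomers
    Console.WriteLine(c.CompanyName)
Next
```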

As you can see, this looks very similar to the code we wrote against a conceptual view over the AdventureWorks database using Entities in Section 4.2. The power of LINQ is that it enables this same simple programming pattern against any type of data.