Kentico 8 Technology - Introduction to DataQuery

Kentico 8 comes with a brand new API that simplifies the manner in which you can query data from a database. Read this article to learn more!

Hi There,

Kentico 8 has officially been released, and it is time for us to teach you all of the goodies that it brings on the technical side. Our plan is to release two deep-dive technical topics per week for the next several months, so that you can leverage all of the new technical improvements. In case you are eager to learn something sooner rather than later, do not hesitate to let us know by posting a comment and we will prioritize each question (or set of questions) accordingly.

I would like to begin this series with something that is extremely important, as it is something that you probably use in your daily job if you leverage our API in your customization. So today's topic is "Querying the data".

Why did this even need improvements?

If you are already familiar with Kentico 7, then you know that the best way to get more than just one record of data of a particular type used to be by calling a method such as this:

var users = UserInfoProvider.GetUsers(where, orderBy, topN, columns);

Thereafter, either work with that data as a DataSet (all versions), or as a strongly typed Enumerable<UserInfo> (latest versions).

While this doesn't seem very complicated, it is rather old-fashioned, and suffers from a couple of problems:

You passed all nested parts of the query as strings, so you needed to take care of handling SQL injection. For most providers there was no way to pass parameters to the query, which also affected query caching, and therefore, the performance of the SQL server

That SQL code you passed was specific to particular DB implementation (MS SQL Server), which prevents us from potentially supporting other databases in future

Even though the result was enumerable and you could use LINQ expressions on it, the results were already loaded from the database. Thus, even basic additional filtering was ineffective and you weren’t able to simply pass the query further to someone else so that they could modify it more.

It was hard for us to provide new features for the queries, such as distinct, groupby etc., because that would mean producing really heavy method headers, and synchronizing them across all providers.

I am not going to sort this list of problems by priority, and I am sure that there are others which worth mentioning. Let me know if there are other problems solved by the concept I am going to discuss.

Making things simpler

So how do you make sure that the method headers do not contain too many parameters while you are adding functionality and keeping it as simple as possible for user? ... By removing them completely!

In a good design, it’s not the responsibility of the provider to mimic all features that the data engine provides; its responsibility is to identify where the data is and provide a way to access it and modify it. So the result that you can now find in almost all providers that you get is the following:

var usersQuery = UserInfoProvider.GetUsers();

No parameters, no complication! You get all users this way, but how do you narrow them down to what you need, and at the same time, apply other parameters?

The answer is already in the title of this article ... the concept of DataQuery. I have intentionally named the resulting variable usersQuery precisely because by calling that code you don't just get the results instead, you get an open query that can be further modified to your liking.

The exact result type that you now get out of this call is ObjectQuery<UserInfo>, but I will get to that later.

DataQuery concept

Without further ado, let me tell you what DataQuery is, and how it solves the problems mentioned above.

If you already know the data engine of Kentico 7, you know that it kind of wraps the final call to IDataConnection.ExecuteQuery / Reader with all of the different features that it comes with.

If you are new to it (it is not necessary for Kentico developers to know such things unless you need to make some low-level modifications), just know that DataQuery is an object-oriented abstraction of the database query for reading data, that eliminates all of your concerns about properly using Sql commands with their parameters, Data adapters, Connections, and other stuff related to communicating with the database while reading data. It is very similar to what the whole LINQ concept or entity framework does, but in some aspects it is even better, as I will show you.

Note, that anytime I mention DataQuery, the same aspects are provided within ObjectQuery, DocumentQuery, and other potential XYZQuery that we provide, as these are strongly typed or otherwise enhanced extensions of the base DataQuery. I will explain how that "inheritance" is provided later as well.

The key aspects of the DataQuery are the following:

Support for LINQ and strongly-typed enumeration

Fluent syntax within its API for most readable code

Abstraction of the database syntax

Parameter-driven queries

Lazy loading of results

Centralized source point of data

Before I talk about each one’s particular aspects, let me give you two examples so that I can comment on them later:

Or cast to DataSet if you want to work with it in a traditional, and more performance-effective way.

LINQ support

Similar to the InfoDataSet<UserInfo> that the provider returned earlier, the ObjectQuery<UserInfo> that it returns now is strongly typed and enumerable, so you can query it with LINQ. It works in a similar way to InfoObjectCollection<UserInfo>, which was the only way to access data with LINQ in previous versions, so that you can write any query while the system attempts to parse it and execute it in database. If it’s not successful with the parsing, it uses fallback to LINQ to objects to execute the operation.

The parsing process can handle some basic operations that are similar to what LINQ to SQL can handle, we provide a set of methods that are guaranteed to execute in DB when used. You can find them in class LinqSqlMethods. Regarding the native string methods, we cover StartsWith, EndsWith and Contains, but it can be expanded to other methods as well. I will show you how on another occasion.

As long as the parsing process is able to do all operations in database, it stays on the same type of query (even though you just get an Enumerable result), within fallback to LINQ to objects it converts to a plain enumerable, so that you can technically test if the result had to use the fallback or not by checking the type of result it yields.

Fluent syntax

LINQ is a powerful tool, but the query is sometimes not readable enough, and the parsing of it is complicated in some more advanced scenarios. Also, some developers are not really fans of the LINQ, either for legacy reasons, or simply that they like to know exactly what is happening in the background and how much overhead there is. Therefore, it is good to have some more traditional ways to address these needs.

DataQuery provides Fluent interface, which means that you can call several subsequent methods on a particular object to modify its behavior.

This way, you can easily write your code as actual readable "sentences" to make it clear what the query represents. There are numerous methods available for particular types of queries related to the context of the queries. Basic DataQuery contains only standard methods related to the query itself, such as Where, OrderBy, TopN, Columns, Distinct, GroupBy (yes, we now support also distinct and group by!), and other queries that represent strongly typed objects, such as ObjectQuery or DocumentQuery, which provide more context-related methods like WithCodeName, OnSite, OnlyPublished, and others.

I am not going to give you a full list of these as there are several options available. Instead, see the IntelliSense for particular queries to get an idea. I will cover more examples of advanced scenarios in another article.

Each call of such a method returns the same object, only that it’s modified by the method itself. While LINQ by default provides immutable instances, DataQuery is mutable by default in orderto provide the best performance instead of cloning the result with each iteration. There are many ways to make the particular query immutable, branch them, or simply clone them, but I will cover that sometime in the future as well.

Hierarchy of the queries

I have mentioned a couple of query types already, so let me give you a basic idea of how it works internally, and how it is possible that queries inherit from their bases, but still allow the returning of the identity of the same type they belong to within the fluent syntax.

The hierarchy of the queries is shown in the following diagram:

Essentially, there are two types of query classes:

Base class - This is abstract and generic, typed by the final type (the identity) and other types necessary, and serves for the inheritance of the queries. These are marked as the Inheritance chain in the diagram.

Final class - This is non-generic or generic just by the type of data it serves, which materializes a particular query and is instantiated. These are marked as Materialized queries.

In the end, this means that the queries are capable of providing correct fluent syntax and IntelliSense, and we are able to simply support and distribute functionality from the base class to others by inheriting base classes.

Each of these base classes also have both generic and non-generic interfaces, so that they can be used in a more general context where there is no way to use the strong typed data, or for the purpose of down-casting a particular query.

The inheritance also has one extra branch to support queries with multiple sources. These are called Multi-queries and are leveraged, for example, by a document engine, but we will cover these later.

The data flow

I have called this part the data flow. I will explain to you where and how in a particular system the queries are created, and how the whole concept is centralized. Centralization is important for customization, integration, and faking of data in unit tests, which my colleagues and I will cover later.

As I said, the query results are lazy-loaded, so anything that is shown in the given diagram is an open query that can be further parameterized. The database isn't called until someone actually starts to work with the result data either by enumeration, or by casting it to a different type, e.g DataSet. The diagram shows what the system does for the particular ways of "delivering query":

Create a new standalone ObjectQuery instance, either one which is general, by object type, or strongly typed with a Generic constructor.

Call the GetDataQuery method from a particular info object (through its generalized interface)

What is important in this diagram for your reference in the future, is that all read operations go through the provider method GetObjectQueryInternal, and in the end are executed as plain untyped DataQuery that calls IDataConnection.ExecuteQuery as I mentioned earlier. So it doesn't really change the way how the system operates with database, instead it provides an object oriented abstraction over it. In the end, the communication with database is still done by using the good old Data engine.

This also means that there is now a single point of control over the source data that allows redirection to other potential sources of data. This is leveraged by our automated testing engine, which is able to fake the provider data in this manner. You can learn more about that in another article related to Kentico 8 support for automated testing. Last but not least, it will give all of us more flexible ways to provide the ability to integrate with external systems.

Make sure that all methods that return some data are based on existing ones that return ObjectQuery, or on GetObjectQuery method of the provider, so that you don't lose this advantage.

How to convert older code?

Since part of introducing the DataQuery is stabilizing the API, we had to get rid of some of those existing cumbersome methods that had the where, orderby, etc. parameters mentioned above. If I managed to convey the whole concept to you, you probably have an idea how to easily convert potential existing code. Just to make sure, here is a specific example of the original code and its converted version:

You can still pass full parts of SQL statement such as the where condition, however, in this case it is up to you to handle potential SQL injection in the conditions, column names, etc. as we don't have control over that part of SQL query in that case. As we progress with the transition to DataQuery in next version, we will eventually remove this option, so I recommend you to prefer setup using .Where (columnName, operator, value) or its specific variants.

I am using var for the result, because it is more open to potential future changes. If you are using vars instead of specific complex types in your code, the upgrade process will be easier for you with that code. DataQuery has a very similar interface to previous return types of such operations, and implicit conversion operators, so using var gives you a smoother transition to the new code.

You can easily skip any setup that is not necessary, e.g. if you only need to set up TopN and Columns, you don't need the .Where and .OrderBy code at all and you don't need to pass in any nulls or default values.

Wrap up

I have already unveiled many internals, so let me finish this article and summarize:

Instead of cumbersome methods with numerous parameters to get a list of data, we now provide simple methods with no parameters that return an open query, which can be further modified

The query supports both the LINQ (is strongly typed enumerable) and fluent interface, the choice is up to you.

There is a chain of inheritance in the queries that allows to further extend them or make them your own. Everything supported by DataQuery is supported by ObjectQuery, DocumentQuery, etc.

The data retrieval is still based on the same data engine, just wrapped with the DataQuery abstraction.

The data retrieval is centralized for providers that use DataQuery for all operations.

I will follow up with some specific examples in another article.

Any feedback regarding whether or not you like the way Kentico API is going is appreciated.