Interview: Richard Hipp on UnQL, a New Query Language for Document Databases

UnQL (Unstructured Query Language) is meant to be an open query language for document databases. It is open because its initiators, Richard Hipp, creator of SQLite, and Damien Katz, creator of CouchDB, want to avoid vendor lock-in and want UnQL development to be driven by the community. Unlike relational databases, which consist of fixed tables, an UnQL document database contains collections of documents expressed in JSON. UnQL is nevertheless built to be a functional superset of SQL, treating tables as collections of documents with a flat structure and fixed fields. Thus, UnQL can be used to query an RDBMS, but it cannot be used to create or delete tables or otherwise change the schema.

The following is an interview on UnQL with Richard Hipp, who wanted to make sure we know that UnQL is a team effort and that he is not the sole decision maker. One of the interesting details to come out of the interview is that he intends to create UnQLite, a “small embeddable document-oriented database in the same spirit as SQLite.”

InfoQ: Why UnQL? What problem are you trying to solve?

RH: UnQL is a database query language, akin to SQL, but designed for modern document-oriented databases.

SQL assumes a rigidly defined data schema. Each table has a fixed number of columns and each column has a defined datatype.

UnQL assumes a more flexible approach to storing data. Instead of "tables", UnQL uses "collections". (The concept is the same, but the name is changed since "tables" are square whereas "collections" can be of varying shape.) Each collection consists of zero or more documents represented as JSON strings. A document in UnQL corresponds to a row in SQL. The big difference is that with UnQL, the documents (rows) do not have a fixed number of columns, the columns (now called "fields") do not have a fixed datatype, and the fields can be nested - the value of a field can be another document for example.
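To make the data model described above concrete, here is a minimal Python sketch of a collection. It only illustrates the shape of the data and does not use UnQL syntax or any UnQL implementation; the sample documents and the `get_path` helper are hypothetical names chosen for this illustration.

```python
import json

# A hypothetical "collection": zero or more documents stored as JSON strings.
# The documents deliberately differ in their fields, field types, and nesting.
collection = [
    '{"name": "Alice", "age": 30}',
    '{"name": "Bob", "age": "unknown", "address": {"city": "Charlotte", "zip": "28202"}}',
    '{"title": "note", "tags": ["draft", "todo"]}',
]

documents = [json.loads(text) for text in collection]


def get_path(doc, path):
    """Follow a dotted path such as "address.city"; return None if a step is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# No fixed set of fields: each document carries its own field names.
field_sets = [sorted(doc) for doc in documents]

# No fixed datatype: "age" is an integer in one document and a string in another.
age_types = [type(doc["age"]).__name__ for doc in documents if "age" in doc]

# Nested fields: the value of "address" is itself a document.
cities = [get_path(doc, "address.city") for doc in documents]
```

Because a field may be absent from any given document, the helper returns `None` for a missing path instead of raising an error. That is one possible convention for this sketch, not something the interview specifies.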

There are several document-oriented databases available today and they are growing in popularity. But all existing document-oriented databases have their own proprietary and incompatible query methods, meaning that it is hard to move an application from one database engine to another. And the query methods that are available tend to be very low-level, meaning that a lot of the query logic that used to be handled automatically by the database engine must now be manually coded into the application by the programmer.

UnQL aims to remedy this situation by providing a common database query language that can be used to access document-oriented databases from multiple vendors. This helps developers write portable applications and avoid database-vendor lock-in. UnQL also strives to provide a very powerful and rich query language that transfers much of the complex algorithm-picking logic back to the database engine, saving lots of code in the application, and lots of developer time and frustration.

The data manipulation and query language of UnQL is a functional superset of SQL, so UnQL can, in theory, also be used to access legacy SQL database engines. From the point of view of UnQL, a legacy SQL database looks like a document-oriented database where every document has the exact same flat structure using exactly the same datatypes for each field. (Note that the Data Definition Language (DDL) of SQL is not replicated in UnQL and so UnQL is unable to do actions like CREATE TABLE or DROP INDEX on a legacy SQL database. UnQL can insert, delete, update, or query data in a legacy database, subject to the SQL formatting constraints, but it cannot change the schema of a legacy SQL database.)
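The idea that a SQL table looks like a collection of identically shaped, flat documents can be sketched with Python's standard `sqlite3` and `json` modules. This is an illustration of the mapping only, not an UnQL interface; the `people` table and its columns are made up for the example.

```python
import json
import sqlite3

# An in-memory SQLite database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts by column name
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", 30), ("Bob", 41)])

rows = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()

# Each row becomes one flat JSON document; every document has the same fields.
documents = [json.dumps(dict(row)) for row in rows]
field_sets = {tuple(sorted(json.loads(doc))) for doc in documents}

conn.close()
```

Every document produced this way has exactly the fields `name` and `age`, with no nesting, which is the "same flat structure" the answer describes. Note that the table itself was created with ordinary SQL DDL, consistent with the point that schema changes fall outside UnQL.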

InfoQ: Do you want to create a language fit for all NoSQL data stores, similar to what SQL is to RDBMSes?

RH: First off, I prefer the term "post-modern database" over "NoSQL". Post-modern databases are designed to work around the CAP theorem. Traditional relational databases cling to consistency (the C in CAP) and as a result have to sacrifice either Availability or tolerance of Partition. Post-modern databases are willing to give up consistency in order to have both Availability and Partition at the same time. But by giving up consistency, that means that there is an absence of objective truth with a post-modern database, and the absence of objective truth is the defining feature of post-modernism - hence the name. To think of it another way, when you ask a question of a post-modern database, you don't get back a fact, you get back an opinion.

The previous paragraph is not a slur against post-modern databases. Post-modern databases are very powerful and definitely have their place. Many people believe (myself included) that post-modern databases will eventually come to dominate the database ecosystem. But at the same time, it is important to understand their limitations. Post-modern databases transfer the burden of maintaining a consistent view of the world from the database engine into the client application. This makes it much easier for the database engine to scale out, but it also makes more work for the application. Developers need to approach the use of post-modern databases with full and sober knowledge of the tradeoffs.

To answer your question: Yes, UnQL is intended to be for post-modern document-oriented databases what SQL is to relational databases. UnQL intends to be a pragmatic universal access language that all document-oriented databases speak.

InfoQ: Have you used UnQL with CouchDB? If not, do you have plans to do so? (I've seen D. Katz involved in the project.)

RH: Damien Katz intends to provide an UnQL interface to CouchDB in the near future, yes. Other plans are in the works to provide UnQL interfaces to other database engines. I hope to provide an UnQL interface to legacy SQLite databases, for example. I also hope to make available "UnQLite" - a small embeddable document-oriented database in the same spirit as SQLite, but with a new file format.

InfoQ: I understand that UnQL is in early phases. What should we expect from it in the future? Where do you want to take it?

RH: Right now, we have only a rough prototype. We are continuing to refine the language based on input from prospective users and taking into account the lessons we are learning while implementing the prototype. We want to have one or more actual, usable database engines available and ready for development use during this calendar year, with deployment-ready implementations available next year.

At the end of our interview, Hipp expressed his desire to see the community involved in this project.

RH: We really, really want community input on this effort. Volunteers to help with the language design, suggest improvements, and help with the prototype implementation(s) will be greatly appreciated. Everything is open-source.

UnQL is not a power play by a few companies trying to force their views on the world. UnQL is intended to be a pragmatic community-driven effort to provide cross-platform database functionality and make life easier for application developers. Our goal is to end the vendor lock-in problem that we believe is holding back development and acceptance of post-modern document-oriented databases.

Won't you please join us in helping to make UnQL a success?

UnQL is currently a prototype. Its website contains more information on the syntax and a link to a repository containing the source code and some examples.

Tell us what you think

According to the railroad diagrams on the website, they have chosen to put SELECT before FROM. I hope this will be reconsidered. IMO it's more natural to start with the context (collections) and then specify the details (fields), as in LINQ. This would also make code completion easier, as the IDE would know which fields to offer (not sure if this is relevant in a schemaless database).