UnQL: A Standardized Query Language for NoSQL Databases

Traditional SQL developers looking warily at the rapid growth of NoSQL databases need not worry about transferring their skills to a new programming paradigm, as UnQL, the query language specification for NoSQL, features many of the same constructs as SQL itself. In addition to document databases and non-relational stores, UnQL is also suitable for querying data stored in JavaScript Object Notation (JSON) format used for data object interchange between systems.

This growing language specification was primarily developed by the people behind the SQLite and CouchDB databases. In general, UnQL can be thought of as a superset of SQL focused on collections and documents as opposed to tables and rows. Additionally, UnQL provides no functionality typical of data description languages (DDL) such as the CREATE or DROP TABLE and CREATE or DROP INDEX statements.

Richard Hipp, inventor of SQLite and one of the main developers on the UnQL project stated:

“Relational database technologies – and the SQL query language – have served us very well for over forty years, but modern applications require far more flexibility at the data layer than those of generations past. UnQL builds upon our experience with SQL, supplementing that language with syntax and concepts appropriate for the unstructured, self-describing data formats of post-modern applications.”

Many leaders in the NoSQL space believe a standardized query language remains vital in driving adoption as the nascent post-modern database industry continues to mature. Hipp’s main compatriot in the development of UnQL, Couchbase CTO Damien Katz, provides insight into UnQL’s importance in growing the world of NoSQL:

“The work we’ve done on UnQL has been very gratifying. UnQL stems from our belief that a common query language is necessary to drive NoSQL adoption in the same way SQL drove adoption in the relational database market. I look forward to continuing my work alongside SQLite to push this new language forward.”

While the full UnQL syntax is still a work in progress, enough of the language constructs are in place to get a feel for writing queries. Once a SQL developer gets past the fact that tables and rows are essentially replaced by collections and documents, she will feel at home considering the similarity between both query languages.

Taking a Look at UnQL’s Syntax

Creating collections in a NoSQL database using UnQL is accomplished by the CREATE statement, i.e. CREATE cool_nosql_collection. A future enhancement to the language is expected to allow collections to be directly created by adding documents using the INSERT INTO statement, but it is not part of the current specification, even though some UnQL compliant databases currently behave in this manner.

Inserting documents into a collection using UnQL provides more power to the database developer compared to many earlier flavors of SQL. In addition to using the VALUE clause of the INSERT statement to insert strings or comma-delimited values, the developer can insert complete objects by following JSON syntax.

For example, the following INSERT example stores an object into a collection:

INSERT INTO cool_nosql_collection VALUE {

type:"nested",

content: {

content: "document object",

docNumber:1,

articleContent: {str:"Dataversity", str2:"Article"},

newArticle:true

}

};

The braces along with the “type” and “content” keywords define the structure and content of the object to be inserted. In addition to only objects, any content marked up as a valid JSON string can be inserted into a collection using UnQL. Placing double quotes around the field names (docNumber, articleContent, newArticle) is not a requirement of valid UnQL, but the quotes are returned in the output when that object is queried.

For example, entering the following UnQL statement, SELECT FROM cool_nosql_collection, after inserting the previous object returns the following output:

WHERE clauses also benefit from dot notation. SELECT FROM cool_nosql_collection WHERE cool_nosql_collection.docNumber = 1 returns the same document record as the first SELECT statement in this section.

DELETE statements work in the same manner as in earlier flavors of SQL, both with and without an associated WHERE clause.

The UPSERT statement is not part of the official SQL language standard, but it is supported in some newer database engines, like MongoDB. Essentially, the statement is similar to a SQL MERGE, where a record is inserted if it does not exist, or updated if the record already exists. UnQL uses a nested combination of the UPDATE and INSERT statements to accomplish this functionality.

For example, this nested statement pair increments a page counter if that page already exists, or inserts it into the collection otherwise:

At a glance, it is obvious that the UnQL syntax offers familiarity to developers experienced with both the syntax of SQL as well as the dot notation used in most object-oriented languages (even sometimes within SQL itself).

Further Considerations on UnQL Syntax and Usage

While an UnQL database is generally considered to contain zero to more collections, indexes can be used to improve performance. An index can be created explicitly through the CREATE INDEX statement, and it is also possible to create them automatically, depending on the specific database vendor implementation.

Documents stored in an UnQL supported database need not be in the format of a JSON object, but its record in the database can be represented by a JSON object.

If a CREATE COLLECTION statement is executed on an existing collection, no operation occurs. Conversely, a new collection is created when an INSERT or UPSERT operation is made on a non-existent collection.

DROP COLLECTION works in a similar manner. In both cases, the IF EXISTS clause used in SQL is not needed when creating or deleting collections in UnQL. Dropping a collection also deletes that collection’s documents and indexes. Some database systems automatically drop a collection after its last document is deleted.

Finally, UnQL supports the time-honored SQL clauses like GROUP BY, HAVING, ORDER BY, and LIMIT with their UnQL functionality being essentially similar to their implementation in SQL. A full slate of operator support is also included, with their behavior being mostly compatible with JavaScript.

UnQL in the Near Term

Considering the massive growth of unstructured data, much of it residing in documents, it stands to reason that this will continue to drive growth in NoSQL database implementations as well as raise the stakes for the fledgling UnQL query language.

A recent interview with Richard Hipp revealed one of his main points in beginning development on UnQL was to prevent vendor lock-in, considering the growing number of players in the still-emerging NoSQL database sector. Hipp opined:

“There are several document-oriented databases available today and they are growing in popularity. But all existing document-oriented databases have their own proprietary and incompatible query methods, meaning that it is hard to move an application from one database engine to another. And the query methods that are available tend to be very low-level, meaning that a lot of the query logic that used to be handled automatically by the database engine must now be manually coded into the application by the programmer…[and] UnQL aims to remedy this situation by providing a common database query language that can be used to access document-oriented databases from multiple vendors. This helps developers write portable applications and avoid database-vendor lock-in. UnQL also strives to provide a very powerful and rich query language that transfers much of the complex algorithm-picking logic back to the database engine, saving lots of code in the application, and lots of developer time and frustration.”

With the language specification essentially defined, most of the current work with UnQL involves building interfaces to many of the current NoSQL databases. Damien Katz is currently working on an UnQL interface for CouchDB, while Hipp is developing UnQLite as an embeddable document database suitable for mobile devices, similar to what his SQLite provides for relational databases.

There remains little doubt that any DBA or database developer currently exploring “post-modern databases,” to borrow the term preferred by Hipp over NoSQL, needs to take a closer look at UnQL. The similarity with traditional SQL means most database professionals should have little difficulty picking up UnQL’s syntax as well as the subtle differences when compared to SQL. Hopefully, the NoSQL industry as a whole sees the benefits of a standardized query language, and UnQL interfaces will become available for most popular NoSQL databases in the near term. Richard Hipp gets the last word:

“Right now, we have only a rough prototype. We are continuing to refine the language based on input from perspective users and taking into account the lessons we are learning while implementing the prototype. We want to have one or more actual, usable database engines available and ready for development use during this calendar year (2011), with deployment-ready implementations available next year (2012).”