My latest book Next Generation Databases is now available to purchase! You can buy it from Amazon here, or directly from Apress here. The e-book versions are not quite ready, but if you prefer the print version you’re good to go.

I wrote this book as an attempt to share what I’ve learned about non-relational databases in the last decade and position these in the context of the relational database landscape that I’ve worked in all my professional life.

The book is divided into two sections: the first section explains the market and technology drivers that led to the end of complete “one size fits all” relational dominance and describes each of the major new database technologies. These first 7 chapters are:

Three Database Revolutions

Google, Big Data, and Hadoop

Sharding, Amazon, and the Birth of NoSQL

Document Databases

Tables are Not Your Friends: Graph Databases

Column Databases

The End of Disk? SSD and In-Memory Databases

The second half of the book covers the “gory details” of the internals of the major new database technologies. We look at how databases like MongoDB, Cassandra, HBase, Riak and others implement clustering and replication, locking and consistency management, logical and physical storage models and the languages and APIs provided. These chapters are:

Distributed Database Patterns

Consistency Models

Data Models and Storage

Languages and Programming Interfaces

The final chapter speculates on how databases might develop in the future. Spoiler alert: I think the explosion of new database technologies over the last few years is going to be followed by a consolidation phase, but there are some potentially disruptive technologies on the horizon such as universal memory, blockchain and even quantum computing.

The relational database is a triumph of software engineering and has been the basis for most of my career. But the times they are a-changin’ and, speaking personally, I’ve really enjoyed learning about these new technologies. I learned a lot more about the internals of the newer database architectures while writing the book and I’m feeling pretty happy with the end result. As always I’m anxious to engage with readers and find out what you guys think!

You would have to have been living under a rock for the past few years not to have heard of Bitcoin. Bitcoin is an electronic cryptocurrency which can be used like cash in many web transactions. At the time of writing there are about 15 million Bitcoins in circulation, trading at approximately $USD 360 each for a total value of about $USD 5.3 billion.

Bitcoin combines peer-to-peer technology and public key cryptography. The owner of a bitcoin can use a private key to assert ownership and authorize transactions – others can use the public key to validate the transaction. As in other peer-to-peer systems such as BitTorrent, there is no central server which maintains Bitcoin transactions – rather there is a distributed public ledger called the Blockchain. I wrote a short article on blockchain here.

The implications of cryptocurrencies are way beyond our scope here, but there are definite implications for database technologies in the Blockchain concept. Blockchain replaces the trusted third party that must normally mediate any transfer of funds. Rather than a centralized database that records transactions and authenticates each party, Blockchain allows transactions and identity to be validated by consensus within the Blockchain network – each transaction is confirmed by public-key based authentication from multiple nodes before being concluded. Blockchain could be as disruptive to our traditional notions of banking and non-monetary transactions as peer-to-peer systems like Napster were to the music business.
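To make the chained-ledger idea concrete, here is a minimal sketch in plain Python (purely illustrative – invented field names, no signatures, networking or consensus). Each block’s hash covers the previous block’s hash, so tampering with any earlier transaction invalidates every later link in the chain:

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents, including the previous block's hash
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    # Link each new block to the hash of the block before it
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"prev_hash": prev_hash, "transactions": transactions}
    block["hash"] = block_hash({"prev_hash": prev_hash, "transactions": transactions})
    chain.append(block)
    return chain

def verify(chain):
    # Recompute every hash; any tampering breaks the chain downstream
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash({"prev_hash": block["prev_hash"],
                                 "transactions": block["transactions"]})
        if block["hash"] != recomputed:
            return False
    return True

chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
assert verify(chain)

chain[0]["transactions"][0]["amount"] = 500   # tamper with history...
assert not verify(chain)                      # ...and verification fails
```

In a real Blockchain the ledger is also replicated across many nodes and each transaction is signed, so a forger would have to rewrite the chain on a majority of nodes simultaneously – which is what makes the history effectively immutable.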

The Blockchain underlying Bitcoin is public, but there can be private (or permissioned) Blockchains which are “invitation only”. Whether private or public, Blockchains arguably represent a new sort of shared distributed database. Like systems based on the Dynamo model, the data in the Blockchain is distributed redundantly across a large number of hosts. However, the Blockchain represents a complete paradigm shift in how permissions are managed within the database. In an existing database system, the database owner has absolute control over the data held in the database. In a Blockchain system, however, ownership is maintained by the creator of the data.

Consider a database that maintains a social network like Facebook: although the application is programmed to allow only you to modify your own posts or personal details, the reality is that the Facebook company has total control over your online data. They can – if they wish – remove, censor or even modify your posts. In a Blockchain based database you would retain total ownership of your posts and it would be impossible for any other entity to modify them.

Applications based on Blockchain have the potential to disrupt a very wide range of social and economic activities. Transfers of money, property, management of global identity (passports, birth certificates), voting, permits, wills, health data, and a multitude of other transactional data could be regulated in the future by Blockchains. The databases that currently maintain records of these types of transactions may become obsolete.

Most database owners will probably want to maintain control of the data in the database, and therefore it’s unlikely that Blockchain will completely transform database technology in the short term. However it does seem likely that database systems will implement Blockchain based authentication and authorization protocols for specific application scenarios. Furthermore, it seems likely that formal database systems built upon a Blockchain foundation will soon emerge.

One of the concepts I found difficult initially when looking at non-relational systems is the vector clock. Some databases – like Cassandra – use timestamps to work out which is the “latest” transaction. If there are two conflicting modifications to a column value, the one with the highest timestamp will be considered the most recent and the most correct.
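The timestamp approach can be sketched in a couple of lines of Python (illustrative only – this is not Cassandra’s actual code, and the values are invented). Each replica’s copy of a value carries the timestamp of the write that produced it, and reconciliation simply keeps the copy with the highest timestamp:

```python
def last_write_wins(replicas):
    # replicas: list of (value, write_timestamp) pairs from different nodes.
    # The copy with the highest timestamp is taken as the "latest" and correct.
    return max(replicas, key=lambda vt: vt[1])[0]

# Two conflicting writes to the same column, seen on different replicas
conflicting = [("blue", 1712000000123), ("green", 1712000000456)]
print(last_write_wins(conflicting))  # the later write, "green", wins
```

The simplicity is the appeal – but note that it silently discards the losing write, and it only works as intended if the clocks on all nodes are reasonably synchronized.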

Other Dynamo systems use a more complex mechanism known as a vector clock. The vector clock has the advantage of not requiring clock synchronization across all nodes, and helps us identify transactions that might be in conflict.

Despite its name, the vector clock does not include any timestamps. Rather it is composed of a set of counters. These counters are incremented when operations complete, in a similar way to the traditional System Change Number pattern that we are familiar with from relational systems like Oracle. The set contains one counter for each node in the cluster. Whenever an operation occurs on a node, that node will increment its own counter within its vector clock. Whenever a node transmits an operation to another node it will include its vector clock within the request. The transmitted vector clock will include the highest counter for the transmitting node as well as the highest counters from other nodes that the transmitting node has ever seen.

When a node receives possibly conflicting updates from other nodes, it can compare the vector clocks to determine the relative sequencing of the requests. There is a defined set of vector clock operations that can tell if:

The two vector clocks come from nodes that are completely in sync

One node is “out of date” with respect to the other node

The clocks are “concurrent” in that each node has some information that is more up to date than the other node. In this case we can’t choose which update is truly the more correct.
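The increment, merge and comparison rules above can be sketched in a few lines of Python (a simplified illustration, like the diagram – not any database’s actual implementation, and the node names are invented):

```python
# Minimal vector clock sketch: one counter per node, kept in a dict.

def increment(clock, node):
    # A node increments its own counter whenever an operation occurs on it
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(a, b):
    # On receiving a clock, keep the highest counter seen for each node
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def compare(a, b):
    # Determine the relative ordering of two clocks
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"      # conflicting: each side knows something unique
    if a_ahead:
        return "a is newer"
    if b_ahead:
        return "b is newer"
    return "in sync"

# Hypothetical clocks like F and G in the example below: F is ahead on
# node 1, G is ahead on node 3, so neither strictly precedes the other.
f = {"node1": 2, "node2": 1, "node3": 1}
g = {"node1": 1, "node2": 1, "node3": 2}
print(compare(f, g))  # "concurrent" - the updates are in conflict
```

After merging, the receiving node holds the element-wise maximum of both clocks – but merging the clocks does not merge the conflicting data itself, which is the problem discussed below.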

Vector clocks are notoriously difficult to understand, though the underlying algorithm is really quite simple. The diagram below shows an example of three vector clocks incrementing across three nodes. The algorithm is somewhat simplified to improve clarity.

In the example the vector clocks start out set to 0 for all nodes (1). Updates to nodes from external clients cause the nodes to increment their own element of the vector clock (2). When these changes are propagated to other nodes, the receiving node updates its vector clock and merges the vector clocks from the other nodes (3). Event (H) occurs when node 2 receives the vector clock (F) from node 1 and (G) from node 3 (4). Each of these vector clocks contains elements higher than the other - vector clock F has the higher value for node 1, while vector clock G has the higher value for node 3. There is no way for node 2 to be sure which of the two vector clocks represents the most up-to-date data - each of the sending nodes “knows” something that the other node does not, and consequently it’s not clear which of the two nodes “knows” best.

For those of us from the world of strictly consistent databases like Oracle, think of the vector clock as a set of System Change Numbers from each system. We examine the SCNs from each node to see if there are nodes that might not have seen all the changes that have been recorded on another node.

The vector clocks above tell us that Version G and Version F are conflicting – each contains information from unique updates that could both contain important information. What, then, is the system to do? Here are some of the options:

Revert to last write wins: two updates are unlikely to have occurred at the exact same nanosecond, so one will have a higher timestamp value. We could decide that the highest timestamp “wins”.

Keep both copies, and require that the application or the user resolve the conflict.

Somehow merge the data. This is the approach taken by the original Dynamo which managed Amazon’s shopping cart. If there are two conflicting shopping carts they are merged and the worst that can happen (from Amazon’s point of view) is that you buy some things twice. Another merge can occur with things like counters: rather than having one counter increment overwrite another, we can deduce that both operations wanted to increment the counter and increment it twice. A special class of data types – Conflict-Free Replicated Data Types (CRDTs) – exists that allows these sorts of merges to be predefined.
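The counter merge described above can be sketched as the simplest CRDT, a grow-only counter (an illustrative toy, not Riak’s actual data type implementation). Each replica increments only its own slot, and merging takes the per-replica maximum, so concurrent increments are never lost regardless of the order in which replicas exchange state:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by per-slot max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        # A replica only ever increments its own slot
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Per-slot max is commutative, associative and idempotent,
        # so replicas converge no matter how updates are exchanged
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()   # two increments on replica a
b.increment()                  # one concurrent increment on replica b
a.merge(b)
b.merge(a)
print(a.value(), b.value())    # both replicas converge to 3
```

Both increments survive the merge – neither overwrites the other – which is exactly the behavior we wanted from the conflicting counter updates.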

There are advocates for the vector clock – such as the architects of Riak – and advocates for the timestamp system used in Cassandra. Neither party disagrees about the concrete implications of the two approaches: they differ on the desirability of the consequences. Last Write Wins represents a simpler model for the application developer and administrator; vector clocks allow conflicts to be identified, but those conflicts must then be resolved. In a later post I’ll give an example of how you can programmatically resolve conflicts in Riak.

Most of the examples use the sample film documents shown below (this is the same sample data we created for MongoDB in this post):

N1QL allows us to perform basic queries to retrieve selected documents or attributes of selected documents:

N1QL allows us to access nested documents within the JSON structure using array notation. So for instance in the example below Actors[0] refers to the first nested document within the actors array:

We can query for subdocuments which match search criteria using the WHERE ANY syntax:

The UNNEST command allows embedded documents to be “joined” back to the parent document. So here we get one result for each Actor who starred in film 200, with the film title included in the results:

The UNNEST command allows us to perform the equivalent of joins between parent and child documents where the child documents are nested within the parent. N1QL also allows us to join across disparate documents, provided that one of the document collections contains a reference to the primary key in the other.

So for instance if we had a bucket of documents that contains the primary keys of “overdue” films in our imaginary (and by now definitely struggling) DVD store, then we can join that to the films collection to return just those films:

N1QL also contains DML statements allowing us to manipulate the contents of documents, and DDL statements allowing creation and modification of indexes.

N1QL is an ambitious attempt to bring SQL into the world of document databases. It’s interesting to consider that at the same time as companies like Couchbase are introducing SQL support into their databases, companies like Oracle are introducing strong JSON support into their SQL-based databases. It would seem that the two worlds are coming together.

As I write this I’m at 35,000 ft (or so) on my way to yet another Collaborate. This year I’ll be presenting two sessions and participating in one panel:

963: Writing to Lead

Monday, April 13 | Banyan E, South Convention Center, Level 2

10:30 a.m. – 11:30 a.m.

Jonathan Gennick will be moderating a panel including myself, Bobby Curtis, Charles Kim, Darl Kuhn and Michael Rosenblum. We’ll be talking about how to write – books, blogs, articles, tweets, etc. – to best effect. The panelists are all authors and hopefully between us we’ll have a lot of useful advice to offer aspiring or existing book authors.

995: Top 5 Trends in Database Technology

Tue, Apr 14, 2015 (04:30 PM - 05:30 PM) : Banyan E

It’s been an amazing 5 years in database technology – we went from RDBMS for everything to Big Data, NoSQL, NewSQL and a lot of innovation within RDBMS as well. This talk covers 5 big technologies that are influencing Oracle and other database technologies. It includes some of my best recycled jokes as well as, hopefully, some useful technical content. There’s an article based on this presentation here if you want a preview.

442: Top 10 Oracle Database Tuning Tips and Techniques

Thu, Apr 16, 2015 (08:30 AM - 09:30 AM) : Palm B

Seasoned presenters know that for some reason “top 10” presentations get accepted more readily than more abstract titles. Something to do with the impression of a concrete amount of information, I suppose. But once you start giving a top 10 talk you quickly realize you can only budget 5 minutes at most for each topic. Still, I feel pretty happy about this talk as a high-level summary of the core concepts that I’ve picked up in 25 years of Oracle performance tuning. The main thing I’ve tried to accomplish in this talk is to organize the core concepts of performance tuning around a sensible tuning philosophy. I’m not assuming a great deal of Oracle tuning experience.