Category: SQL

A while back I posted a blog post how to setup a High Available SQL cluster on Azure using SIOS Datakeeper. As I’m an avid believer of storage spaces, I was looking for a moment to test drive “storage spaces direct” on Azure. The blog post of today will cover that journey…

UPDATE (01/02/2017) ; At this point, there is no official support for this solution. So do not implement it for production at this point. As soon as this changes, I’ll update this post accordingly!

UPDATE (08/02/2017) ; New official documentation has been released. Though I cannot find official support statements.

To measure is to know. If you can not measure it, you cannot improve it!

Today’s post will go more in-depth on what performance to expect from different SQL implementations in Azure. We’ll be focussing on two kind of benchmarks ; the storage subsystem and an industry benchmark for SQL. This so that we can compare the different scenario’s to each other in the most neutral way possible.

It is important to know that you will only get an SLA (99,95%) with Azure when you have two machines deployed (within one availability set) that do the same thing. If this is not the case, then Microsoft will not guarantee anything. Why is that? Because during service windows, a machine can go down. Those service windows are quite broad in terms of time where you will not be able to negotiate or know the exact downtime.

That being said… Setting up your own high available SQL database is not that easy. There are several options, though it basically bears down to the following ;

an AlwaysOn Availability Groups setup

a Failover Cluster backed by SIOS datakeeper

Where I really like AlwaysOn, there are two downsides to that approach ;

So a lot of organisations were stranded in terms of SQL and moving to Azure. Though, thank god, a third party tool introduced itself ; SIOS Datakeeper ! Now we can build our traditional Failover Cluster on Azure.

When taking a look towards the landscape of databases, one can only accept that there has been a lot of commotion about “SQL vs NoSQL” in the last years. But what is it really about?

SQL, which stands for “Structured Query Language”, has been around since the seventies and is commonly used in relational databases. It consists of a data definition language to define the structure and a data manipulation language to alter the data within the structure. Therefore a RDBMS will have a defined structure and has been a common choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s.

NoSQL, which stands for “Not only SQL”, departs from the standard relational model since it saw its first introduction in the nineties. The primary focus of these database was performance, or a given niche, and focus less consitency/transactions. These databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases. The particular suitability of a given NoSQL database depends on the problem it must solve.

So it depends on your need…

Do you want NoSQL, NoSQL, NoSQL or NoSQL?

NoSQL comes in various flavors. The most common types of NoSQL databases (as portrayed by Wikipedia) ;

There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Because of the variety of approaches and overlaps it is difficult to get and maintain an overview of non-relational databases. Nevertheless, a basic classification is based on data model. A few examples in each category are:

A document-oriented database is designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. The central concept of a document-oriented database is that Documents, in largely the usual English sense, contain vast amounts of data which can usefully be made available. Document-oriented database implementations differ widely in detail and functionality. Most accept documents in a variety of forms, and encapsulate them in a standardized internal format, while extracting at least some specific data items that are then associated with the document.

A key-value (an associative array, map, symbol table,or dictionary) is an abstract data type composed of a collection of key/value pairs, such that each possible key appears just once in the collection.

A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.

Example

MultiModel

Most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. In contrast, a multi-model database is designed to support multiple data models against a single, integrated backend. Document, graph, relational, and key-value models are examples of data models that may be supported by a multi-model database.

And what flavor do I want?

Each type and implementation has its own advantages… The following chart from Shankar Sahai provides a good overview ;

Any other considerations I should take into account?

Be wary that most implementations were not designed around consistency integrity and more towards performance. Transactions are referential integrity are not supported by most implementations. High availability designs (including on geographic level) are possible with some implementations, though this often implies a performance impact (as one would expect).

5. Conclusion
As you can see, there is no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks.
For example, a database can demonstrate excellent performance, but once the amount of records exceeds a certain limit, the speed falls dramatically. It means that this particular solution can be good for moderate data loads and extremely fast computations, but it would not be suitable for jobs that require a lot of reads and writes. In addition, database performance also depends on the capacity of your hardware.

They did a very decent job in performance testing various implementations!

Up in the Clouds

Views are my own

The content of this blog will, at all times, portray my own views. At no time will this reflect the views of the organization I am linked to. Neither can the information provided be used as support statement.