What is NoSQL?

What is a NoSQL (Not Only SQL) Database?

A NoSQL database environment is, simply put, a non-relational and largely distributed database system that enables rapid, ad-hoc organization and analysis of extremely high-volume, disparate data types. NoSQL databases are sometimes referred to as cloud databases, non-relational databases, Big Data databases and a myriad of other terms and were developed in response to the sheer volume of data being generated, stored and analyzed by modern users (user-generated data) and their applications (machine-generated data).

In general, NoSQL databases have become the first alternative to relational databases, with scalability, availability, and fault tolerance being key deciding factors. They go well beyond the more widely understood legacy, relational databases (such as Oracle, SQL Server and DB2 databases) in satisfying the needs of today’s modern business applications. A very flexible and schema-less data model, horizontal scalability, distributed architectures, and the use of languages and interfaces that are “not only” SQL typically characterize this technology.

From a business standpoint, adopting a NoSQL or ‘Big Data’ environment has been shown to provide a clear competitive advantage in numerous industries. In the ‘age of data’, this is compelling, and a popular saying sums up the importance of data: “if your data isn’t growing then neither is your business”.

Types of NoSQL Databases

There are four general types of NoSQL databases, each with its own specific attributes:

Graph database – Based on graph theory, these databases are designed for data whose relations are well represented as a graph: elements that are interconnected, with an undetermined number of relations between them. Examples include: Neo4j and Titan.

Key-Value store – These are some of the least complex NoSQL options. These databases are designed for storing data in a schema-less way. In a key-value store, all of the data consists of an indexed key and a value, hence the name. Examples include: Cassandra, DynamoDB, Azure Table Storage (ATS), Riak and BerkeleyDB.

Column store – (also known as wide-column stores) These databases are designed for storing data tables as sections of columns of data, rather than as rows of data. While this simple description sounds like the inverse of a standard database, wide-column stores offer very high performance and a highly scalable architecture. Examples include: HBase, BigTable and HyperTable.

Document database – Expands on the basic idea of key-value stores: the values are “documents” that contain more complex data, and each document is assigned a unique key, which is used to retrieve the document. These databases are designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Examples include: MongoDB and CouchDB.
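As a rough illustration of how the four data models above differ, the sketch below uses plain Python structures to stand in for each one. It is not tied to any of the products named above, and every key, field name and value in it is made up for the example.

```python
# Illustrative only: plain Python structures standing in for each data model.
# All keys, names, and values here are hypothetical.

# Key-value store: an opaque value looked up by a unique key.
key_value = {"session:42": b"opaque-bytes"}

# Document database: each key maps to a structured, self-describing document.
# Documents in the same collection do not need to share the same fields.
documents = {
    "user:1": {"name": "Ada", "emails": ["ada@example.com"]},
    "user:2": {"name": "Lin", "address": {"city": "Oslo"}},
}

# Wide-column store: each row key maps to its own set of named columns.
wide_column = {
    "sensor-7": {"2024-01-01T00:00": 21.5, "2024-01-01T00:01": 21.7},
    "sensor-9": {"2024-01-01T00:00": 19.0},
}

# Graph database: nodes plus explicit, labelled edges between them.
edges = [("alice", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "carol")]


def neighbours(node, label):
    """Return the nodes reachable from `node` over edges carrying `label`."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


print(neighbours("alice", "FOLLOWS"))  # ['bob']
```

The point of the sketch is only the shape of the data: what is looked up by key, and how much structure the database itself can see inside each value.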

The following table lays out some of the key attributes that should be considered when evaluating NoSQL databases.

The Growth of Big Data

Big Data is one of the key forces driving the growth and popularity of NoSQL for business. The almost limitless array of data collection technologies ranging from simple online actions to point of sale systems to GPS tools to smartphones and tablets to sophisticated sensors – and many more – act as force multipliers for data growth.

In fact, one of the first reasons to use NoSQL is because you have a Big Data project to tackle. A Big Data project is normally typified by:

High data velocity – lots of data coming in very quickly, possibly from different locations.

Data variety – storage of data that is structured, semi-structured and unstructured.

Data volume – data that involves many terabytes or petabytes in size.

Data complexity – data that is stored and managed in different locations or data centers.

Continuous Data Availability

In today’s marketplace, where the competition is just a click away, downtime can be deadly to a company’s bottom line and reputation. Hardware failures can and will occur. Fortunately, NoSQL database environments are built with a distributed architecture, so there is no single point of failure and there is built-in redundancy of both function and data. If one or more database servers, or ‘nodes’, go down, the other nodes in the system are able to continue operating without data loss, thereby showing true fault tolerance. In this way, NoSQL database environments are able to provide continuous availability whether in a single location, across data centers, or in the cloud. When deployed appropriately, NoSQL databases can supply high performance at massive scale without going down. This is immensely beneficial, as system updates or modifications can be made without having to take the database offline. This fact alone draws the attention of businesses that are serving customers who expect availability of applications and where downtime equates to real dollars lost.
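The redundancy described above can be sketched in a few lines. The following is a toy model only, assuming a hypothetical `ReplicatedStore` that copies every write to all live nodes; real systems use more sophisticated replication and repair strategies than this.

```python
class Node:
    """A single database server holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Toy store (hypothetical) that copies every write to all live nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        live = [n for n in self.nodes if n.up]
        if not live:
            raise RuntimeError("no live nodes")
        for node in live:
            node.data[key] = value

    def get(self, key):
        # Any live node that holds the key can serve the read.
        for node in self.nodes:
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("a"), Node("b"), Node("c")]
store = ReplicatedStore(nodes)
store.put("order:1", "paid")

# Two of the three nodes fail; the remaining replica still answers.
nodes[0].up = False
nodes[1].up = False
result = store.get("order:1")
print(result)  # paid
```

Because no single node is the only holder of a value, losing nodes reduces capacity but not availability, as long as at least one replica survives.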

Real Location Independence

The term “location independence” means the ability to read and write to a database regardless of where that I/O operation physically occurs and to have any write functionality propagated out from that location, so that it’s available to users and machines at other sites. Such functionality is very difficult to architect for relational databases. Some techniques, such as master/slave architectures and database sharding, can sometimes meet the need for location-independent read operations, but writing data everywhere is a different matter, especially when data volumes are high. Location independence is also an advantage in many other scenarios, such as servicing customers in many different geographies and needing to keep data local at those sites for fast access.
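To make "write anywhere, propagate everywhere" concrete, here is a minimal sketch with two hypothetical sites that each accept local writes and then exchange them. It assumes conflicts are resolved by last-write-wins on a caller-supplied timestamp, which is one common approach and not something the text above prescribes.

```python
class Site:
    """One location that accepts reads and writes locally (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Assumed conflict rule: keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        return self.data[key][1]


def replicate(source, targets):
    """Propagate everything `source` knows to each site in `targets`."""
    for key, (timestamp, value) in source.data.items():
        for target in targets:
            target.write(key, value, timestamp)


london, tokyo = Site("london"), Site("tokyo")
london.write("cart:9", ("book",), timestamp=1)
tokyo.write("cart:9", ("book", "pen"), timestamp=2)

# Each site pushes its writes to the other; both converge on the newest value.
replicate(london, [tokyo])
replicate(tokyo, [london])
print(london.read("cart:9"), tokyo.read("cart:9"))
```

After the exchange, both sites return the same value for the key, even though each write was accepted at a different location.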

Modern Transactional Capabilities

The concept of transactions appears to be changing in the Internet age, and it’s been demonstrated that ACID transactions are no longer a requirement in every database-driven system. At first blush, this assertion sounds extreme, as transactional integrity is a characteristic of almost every data system, especially those with information requirements that demand accuracy and safety. However, what this refers to is not the jeopardizing of data, but rather the new way modern applications ensure transactional consistency across widely distributed systems. The “C” in ACID refers to data Consistency in relational database management systems, which is enforced via foreign keys/referential integrity constraints. This type of consistency is not utilized in progressive data management systems such as NoSQL databases because there are no JOIN operations, as this would require more rigid enforcement of consistency. Instead, the “Consistency” that concerns NoSQL databases is found in the CAP theorem, which signifies the immediate or eventual consistency of data across all nodes that participate in a distributed database. The data is still safe and meets the AID portion of the RDBMS ACID definition, but its consistency is maintained differently given the nature and architecture of the system.
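One widely used way for a distributed store to offer either immediate or eventual consistency is quorum-based replication, which the text above does not describe and is introduced here purely as an illustration. With N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to share at least one replica whenever R + W > N. The hypothetical helper below checks that property by brute force.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# With 3 replicas, W=2 and R=2 satisfy R + W > N, so every read sees the latest write.
print(quorums_always_overlap(3, 2, 2))  # True

# W=1 and R=1 do not: a read may hit a replica the write has not reached yet,
# so the data is only eventually consistent.
print(quorums_always_overlap(3, 1, 1))  # False
```

Smaller quorums trade immediate consistency for lower latency and higher availability, which is the kind of choice the CAP theorem makes explicit.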

Flexible Data Models

One of the major reasons businesses move to a NoSQL database system from a relational database management system (RDBMS) is the more flexible data model that’s found in most NoSQL databases. The relational data model is based on defined relationships between tables, which themselves are defined by a determined column structure, all of which are explicitly organized in a database schema – all very strict and uniform. Problems begin to arise with the relational model around scalability and performance when trying to manage the large data volumes that are becoming a fact of life in a modern IT and business environment. A NoSQL data model – often referred to as schema-less – can support many of these use cases and others that don’t fit well into an RDBMS. A NoSQL database is able to accept all types of data – structured, semi-structured, and unstructured – much more easily than a relational database, which relies on a predefined schema. This characteristic of a relational database can be a hindrance to flexibility because a predefined schema rigidly determines how the database and its data are organized. Many of today’s business applications have the ability to enforce rules on data usage themselves, making a schema-less database platform a viable option.

Finally, performance factors come into play with an RDBMS’ data model, especially where “wide rows” are involved and update actions are many, which can have real implications on performance. However, a NoSQL data model easily handles such situations and delivers very fast performance for both read and write operations.
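The idea that a schema-less store accepts differently shaped records while the application enforces its own rules can be sketched as follows. The in-memory `collection`, the `validate_order` rule and the field names are all hypothetical and stand in for whatever database and business rules an application actually uses.

```python
collection = {}  # document id -> document (a dict with arbitrary fields)


def validate_order(doc):
    """Application-level rule: every order needs a customer and a positive total."""
    if "customer" not in doc:
        raise ValueError("order is missing 'customer'")
    if doc.get("total", 0) <= 0:
        raise ValueError("order total must be positive")


def insert(doc_id, doc):
    """Store a document after the application has checked its own rules."""
    validate_order(doc)
    collection[doc_id] = doc


# Two documents with different shapes are both accepted: no schema change is
# needed to add the optional 'gift_note' field.
insert("o1", {"customer": "ada", "total": 30})
insert("o2", {"customer": "lin", "total": 12, "gift_note": "Happy birthday"})

# A document that breaks the application's rule is rejected by the application,
# not by the store.
try:
    insert("o3", {"total": 5})
except ValueError as err:
    print(err)  # order is missing 'customer'
```

The store itself imposes no column structure; the rules live in application code and can evolve without a database migration.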

Better Architecture

Another reason to use a NoSQL database is that you need a more suitable architecture for a particular application. It’s critical that organizations adopt a NoSQL platform that allows them to keep their very high volume data in the context of their applications. Some, but not all, NoSQL solutions provide modern architectures that can tackle the type of applications that require high degrees of scale, data distribution, and continuous availability. Data center support and, as is more common, multiple data center support should be a use case with which a NoSQL environment complies. Decisions should be based not just on what your big data needs look like today but also on what they will look like over longer time horizons.

Analytics and Business Intelligence

A key strategic driver of implementing a NoSQL database environment is the ability to mine the data that is being collected so as to derive insights that put your business at a competitive advantage. Extracting meaningful business intelligence from very high volumes of data is a very difficult task to achieve with traditional relational database systems. Modern NoSQL database systems not only provide storage and management of business application data but also offer integrated data analytics that deliver instant understanding of complex data sets and facilitate flexible decision-making.

Which NoSQL database should you use?

Choosing the Right NoSQL Database

To this point we have detailed many reasons for a business to adopt a NoSQL, or non-relational, database platform, and also covered different types of NoSQL database options. Determining which NoSQL database to adopt is an exercise that requires a business to take a long, hard look at itself and its applications – which is always good. Understand clearly what your business goals are, what your applications’ needs – and challenges – are today, and what those needs may be on the horizon. Also, make sure that both the business and the technology goals are aligned so that the platform decision made delivers for all key stakeholders. Then work back to the NoSQL platform that will satisfy these needs and provide the most solid foundation for you to build on.

Key considerations when choosing your NoSQL platform include:

Workload diversity – Big Data comes in all shapes, colors and sizes. Rigid schemas have no place here; instead you need a more flexible design. You want your technology to fit your data, not the other way around. And you want to be able to do more with all of that data – perform transactions in real-time, run analytics just as fast and find anything you want in an instant from oceans of data, no matter what form that data may take.

Scalability – With big data you want to be able to scale very rapidly and elastically, whenever and wherever you want. This applies to all situations, whether scaling across multiple data centers or even to the cloud if needed.

Performance – As has already been discussed, in an online world where nanosecond delays can cost you sales, Big Data must move at extremely high velocities no matter how much you scale or what workloads your database must perform. Performance of your environment, namely your applications, should be high on the list of requirements for deploying a NoSQL platform.

Continuous Availability – Building off of the performance consideration, when you rely on big data to feed your essential, revenue-generating 24/7 business applications, even high availability is not high enough. Your data can never go down, therefore there should be no single point of failure in your NoSQL environment, thus ensuring applications are always available.

Manageability – Operational complexity of a NoSQL platform should be kept at a minimum. Make sure that the administration and development required to both maintain and maximize the benefits of moving to a NoSQL environment are achievable.

Cost – This is certainly a glaring reason for making the move to a NoSQL platform, as meeting even one of the considerations presented here with relational database technology can become prohibitively expensive. Deploying NoSQL properly allows for all of the benefits above while also lowering operational costs.

Strong Community – This is perhaps one of the more important factors to keep in mind as you move to a NoSQL platform. Make sure there is a solid and capable community around the technology, as this will provide an invaluable resource for the individuals and teams that will be managing the environment. Involvement on the part of the vendor should not only include strong support and technical resource availability, but also consistent outreach to the user base. Good local user groups and meetups will provide many opportunities for communicating with other individuals and teams that will provide great insight into how to work best with the platform of choice.
