Introduction

Let’s say you’ve decided to set up a website
or an application. You’ll obviously need something to manage the
data. Yes, that’s right, a database. So, what is it going to be? MySQL, MS-SQL,
Oracle or PostgreSQL? After all, nothing can be as amazing as a good old RDBMS
that employs SQL to manage the data.

Well, allow me to introduce to you an entirely
unique and unconventional database model: NoSQL. Just like every other fine
article out there, we too shall begin with... eh... disclaimers!

NoSQL stands for not-only-SQL. The idea here
is not to oppose SQL, but instead to provide an alternative way of storing
data. Yet, for the obvious reason that most users are well versed in SQL,
many NoSQL databases strive to provide an SQL-like query interface.

Why NoSQL?

That’s a valid question, indeed. Well, here
are the reasons:

Managing Large Chunks of Data: NoSQL databases can easily handle numerous
read/write cycles, many concurrent users and data volumes running into petabytes.

Schema? Nah, not needed: Most NoSQL databases are devoid of a fixed schema and
are therefore very flexible. They give you great freedom in how you structure
your data and foster easy mapping of objects into them. Terms such as
normalization and complex joins are, well, not needed!

Programmer-friendly: NoSQL databases provide simple APIs in most major programming
languages and therefore there is no need for complex ORM frameworks. And just in
case an API is not available for a particular programming language, in many
databases the data can still be accessed over HTTP via a simple RESTful API,
using XML and/or JSON.

Availability: Most distributed NoSQL databases provide easy replication of data
and failure of one node does not affect the availability of data in a major
way.

Scalability: NoSQL databases do not require a dedicated high performance
server. Actually, they can easily be run on a cluster of commodity hardware and
scaling out is just as simple as adding a new node.

Low Latency: Unless you are running a cluster of a trillion data servers (or something
like that, give or take a few million of them), NoSQL can help you achieve extremely
low latency. Of course, latency in itself depends on the amount of data that can be
successfully loaded into memory.

SQL ideology

Basically, NoSQL drops the traditional SQL
ideology in favor of the CAP Theorem, or Brewer’s Theorem, formulated by Eric Brewer
in 2000. The theorem talks about three basic properties, Consistency,
Availability and Partition Tolerance (abbreviated as CAP), adding that a
distributed database can satisfy at most two of these at once. Many NoSQL databases
live with this limit by employing Eventual Consistency, a more relaxed form of
consistency in which all copies of the data agree only after a sufficient period of time.
This in turn improves availability and scalability to a great extent. This paradigm
is often termed BASE, implying Basically Available, Soft state, Eventual Consistency.
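
To make Eventual Consistency a little more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any particular NoSQL database does it: the Replica class, the sync function and the integer version numbers are all made up for illustration. A write lands on one replica, a read from the other replica is stale for a while, and a later sync pass brings both copies into agreement.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding its own copy of the data (hypothetical name)."""
    name: str
    # key -> (version, value); a higher version means a newer write
    data: dict = field(default_factory=dict)

    def write(self, key, value, version):
        """Accept a write only if it is newer than what this replica holds."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        """Return the locally known value, which may be stale or missing."""
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries so both replicas hold the newest version of each key."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                version, value = src.data[key]
                dst.write(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("status", "online", version=1)  # the write reaches one replica only
stale = r2.read("status")                # r2 has not seen the write yet
sync(r1, r2)                             # replication catches up later
fresh = r2.read("status")
print(stale, fresh)
```

Both replicas stay available for reads and writes the whole time; the price is that a reader may briefly see old data.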

NoSQL Data Models

The major and most prominent
categories of NoSQL databases are as follows:

1. Document Stores
2. Hierarchical
3. Network
4. Column-oriented
5. Object-oriented
6. Key-value Stores
7. Triple Stores

Document stores

Gone are the days when data organization used
to be as minimal as simple rows and columns. Today, data is more often than not
represented in the form of XML or JSON (we’re talking about the Web,
basically). The reason for favoring XML or JSON is that both of them are
extremely portable, compact and standardized. Bluntly put, it makes little sense
to map XML or JSON documents into a relational model. Instead, a wiser decision would be
to utilize the document stores already available. Why? Again, simply because
NoSQL databases are schema-less: there exists no predefined schema for an XML or
JSON document and, as a result, each document is independent of the others.
Document stores can be employed for CRM, web-related data, real-time data, etc.
Some of the most well-known implementations are MongoDB, CouchDB and RavenDB.
In fact, MongoDB has been used by websites such as bit.ly and Sourceforge.
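
The schema-less idea can be sketched in a few lines of Python. This is a toy in-memory store written for illustration; the DocumentStore class and its insert and find methods are assumptions of this sketch and not the API of MongoDB, CouchDB or RavenDB, and the sample documents are made up. Notice that the two documents share almost no fields, yet both go into the same store.

```python
import json


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        """Store a copy of the document under a generated id and return the id."""
        doc_id = self._next_id
        self._next_id += 1
        # Round-trip through JSON to copy the document and reject non-JSON values.
        self._docs[doc_id] = json.loads(json.dumps(document))
        return doc_id

    def find(self, **criteria):
        """Return documents whose top-level fields match every criterion."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = DocumentStore()
# Two documents with different shapes; no schema had to be declared first.
store.insert({"name": "Jonny Nitro", "reads": ["Data Center Magazine"]})
store.insert({"name": "Example Ltd", "type": "customer", "contacts": {"city": "Delhi"}})

customers = store.find(type="customer")
print(customers)
```

A real document store adds indexing, persistence and a richer query language on top, but the basic contract is the same: whole documents in, whole documents out.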

Hierarchical Databases

These databases store data in the form of
a hierarchy, that is, a tree or parent-child relationship. In terms of
relational models, this can be termed a 1:N relationship. Basically,
geospatial databases can be used in a hierarchical form to store location
information, which is essentially hierarchical, though algorithms may vary.
Geotagging and geolocation are in vogue of late. It is in such uses that a
geospatial database becomes very relevant, and it can be used in a Geographical
Information System. Major examples of the same include PostGIS, Oracle
Spatial, etc. Also, some of the most well-known implementations of hierarchical
databases are the Windows Registry by Microsoft and the IMS Database by IBM.
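
As a rough illustration of the parent-child (1:N) idea, here is a small Python sketch that stores locations as a tree. The Node class and the place names are assumptions chosen for this example; it does not reflect how IMS, the Windows Registry or any geospatial database is actually implemented.

```python
class Node:
    """One record in the hierarchy; each node has exactly one parent (1:N)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk up to the root and return the full path to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


world = Node("World")
asia = Node("Asia", world)
india = Node("India", asia)
delhi = Node("Delhi", india)
europe = Node("Europe", world)

print(delhi.path())
print([child.name for child in world.children])
```

Every record is reached by walking down from its parent, which is fast for lookups along the tree and awkward for anything that cuts across it.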

Graph Network Databases

Graph databases are the most popular form of
network database and are used to store data that can be represented in the
form of a graph. Basically, data stored by graph databases can grow
exponentially and thus, graph databases are ideal for storing data that changes
frequently. Cutting the theoretical part, graph databases have perhaps their
most awesome example in the likes of FlockDB, developed by Twitter to
implement a graph of who follows whom. FlockDB uses the Gizzard Framework to
query a database up to 10,000 times per second. A general technique to query a
graph is to begin from an arbitrary or specified start node and then
traverse the graph in a depth-first or breadth-first fashion, following the
relationships that satisfy the given criterion. Most graph databases allow the
developer to use simple APIs for accomplishing the task. For instance, you can
make queries such as: “Does Jonny Nitro read Data Center Magazine?” Some of the
most popular graph databases include, apart from FlockDB, HyperGraphDB and
Neo4j.
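
The breadth-first traversal described above can be sketched in plain Python. This is an illustration only: the follows dictionary, the user names and both helper functions are made up for this example and have nothing to do with the actual FlockDB or Neo4j APIs.

```python
from collections import deque

# Directed "follows" graph as an adjacency list; names are made up.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable(graph, start, max_hops):
    """Breadth-first traversal: map each user reachable from start to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if hops[user] == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(user, []):
            if neighbour not in hops:
                hops[neighbour] = hops[user] + 1
                queue.append(neighbour)
    del hops[start]
    return hops


def follows_directly(graph, who, whom):
    """Answer a yes/no question about a single edge."""
    return whom in graph.get(who, [])


print(reachable(follows, "alice", 2))
print(follows_directly(follows, "alice", "dave"))
```

Here alice does not follow dave directly, but the traversal finds dave two hops away through bob or carol, which is exactly the kind of question that needs repeated self-joins in SQL.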

Column-oriented Databases

Column-oriented databases came into existence after Google’s research paper on
its BigTable distributed storage system, which is used
internally along with the Google File System. Some of the
popular implementations are Hadoop HBase, Apache Cassandra, Hypertable, etc.

Such databases are implemented more like
three-dimensional arrays, the first dimension being the row identifier, the
second being a combination of column family plus column identifier and the
third being the timestamp. Column-oriented databases are employed by Facebook,
Reddit, Digg, etc.
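
The three-dimensional picture (row identifier, column family plus column identifier, timestamp) can be sketched with nested dictionaries in Python. The ColumnTable class, its put and get methods and the sample row are assumptions made for this illustration; real systems such as HBase or Cassandra have their own APIs and storage layouts.

```python
from collections import defaultdict


class ColumnTable:
    """Toy table addressed by (row key, 'family:column', timestamp)."""

    def __init__(self):
        # row key -> column name -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        """Store one version of a cell; older versions are kept."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (latest if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = ColumnTable()
table.put("user42", "profile:city", "Delhi", timestamp=100)
table.put("user42", "profile:city", "Mumbai", timestamp=200)

latest = table.get("user42", "profile:city")
earlier = table.get("user42", "profile:city", timestamp=150)
print(latest, earlier)
```

Because the timestamp is part of the address, an update does not overwrite the old value; it simply adds a newer version next to it.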

Object-oriented Databases

Whether or not object-oriented databases are
purely NoSQL databases is debatable, yet they are more often than not
considered to be so because such databases too depart from traditional
RDBMS-based data models. Such databases allow the storage of data in the form of
objects, so the application works with the same objects it stores. Some of the
most popular ones include db4o, NEO, Versant, etc. Object-oriented databases are
generally used for research purposes or web-scale production.

Key-value stores

Key-value stores are (arguably) based on
Amazon’s Dynamo research paper and Distributed Hash Tables. Such data models are
extremely simplified and generally contain only one set of global key-value
pairs, with each value having a unique key associated with it. The database,
therefore, is highly scalable and does not store data relationally. Some popular
implementations include Project Voldemort (open-sourced by LinkedIn), Redis,
Tokyo Cabinet, etc.
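
Here is a minimal Python sketch of a key-value store that spreads its keys over several nodes by hashing. Everything in it is an assumption made for illustration: the KeyValueCluster class is hypothetical, the nodes are plain in-memory dictionaries, and the simple modulo placement is a stand-in for the consistent hashing that Dynamo-style systems use.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store that spreads keys over in-memory 'nodes'."""

    def __init__(self, node_count):
        self._nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key and use the result to pick a node deterministically.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._nodes[int(digest, 16) % len(self._nodes)]

    def put(self, key, value):
        """Store the value on whichever node the key hashes to."""
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        """Look the key up on the one node that can hold it."""
        return self._node_for(key).get(key, default)

    def node_sizes(self):
        """Number of keys held by each node."""
        return [len(node) for node in self._nodes]


cluster = KeyValueCluster(node_count=3)
for user_id in range(10):
    cluster.put(f"user:{user_id}", {"visits": user_id})

print(cluster.get("user:7"))
print(cluster.node_sizes())
```

Since the key alone decides where a value lives, no node needs to know about the others to answer a lookup, which is a large part of why such stores scale out so easily.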

Triple stores

Triple stores save data in the form of
subject-predicate-object, with the predicate being
the linking factor between subject and object. As such, Triple Stores too are
variants of network databases. For instance, let’s say “Jonny Nitro reads Data
Center Magazine.” In this case, Jonny Nitro is the subject, while Data Center
Magazine is the object, and the term ‘reads’ acts as the predicate linking the
subject with the object. Quite obviously, mapping such semantic queries into SQL
will prove difficult, and therefore NoSQL offers a viable alternative. Some of
the major implementations of Triple Stores are Sesame, Jena, Virtuoso,
AllegroGraph, etc.
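
The subject-predicate-object idea can be sketched in Python as a set of 3-tuples with wildcard matching. The TripleStore class and its add and match methods are assumptions of this sketch, not the API of Sesame, Jena, Virtuoso or AllegroGraph, and the second triple is made up to give the query something to filter out.

```python
class TripleStore:
    """Toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self._triples
            if all(p is None or p == value for p, value in zip(pattern, triple))
        )


store = TripleStore()
store.add("Jonny Nitro", "reads", "Data Center Magazine")
store.add("Jonny Nitro", "writes", "Blog Posts")  # made-up triple

# "What does Jonny Nitro read?"
reading = store.match(subject="Jonny Nitro", predicate="reads")
print(reading)
```

Leaving a different position open turns the same call into a different question, for example match(predicate="reads", obj="Data Center Magazine") asks who reads the magazine.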

Summary

So, what now? Well, you’ve just been
introduced to NoSQL. However, does this mean that you should make the switch to
it from SQL? Perhaps. Or perhaps not. The answer varies from situation to
situation. If you find SQL queries way too much to cope with, chances are
you’ll find NoSQL equally difficult. However, if you’re looking for a more
flexible alternative and do not mind getting your hands dirty, you should
definitely give NoSQL a spin! The choice, obviously, is yours! Happy data
managing to you!

About the author

Sufyan bin Uzayr is a 20-year-old freelance
writer, graphic artist and photographer based in India. Sufyan has been
extensively involved in the field of graphic design and web development, and he
has also developed apps for the mobile platform.
Currently writing for two print magazines and six blogs, Sufyan is also the
Editor-in-Chief of Brave New World, a contemporary electronic journal. Visit
Sufyan’s website at www.sufyan.co.nr or his e-journal
at www.bravenewworld.in. You may also mail him at sufyan@live.in

About the Author

Software Developer's Journal (formerly Software 2.0) is a magazine for professional programmers and developers, publishing news from the software world and practical articles presenting interesting, ready-made programming solutions.