Pages

Sunday, March 29, 2015

Tabula Rosa Systems Blog Of The Day - We Think We Know What A Data Base Is - But Perhaps We Do Not!

By
now, most of us understand that the Internet of Things and BigData will
change our cyber world dramatically and exponentially in the next few
years. Many of the foundations of The Internet and its components will
also undergo dramatic changes.

The article below present some interesting thoughts about our data bases now the the next generation of databases.

============================================

March 19, 2015

by James
Kobielus

Big Data
Evangelist, IBM

We all think we know what a database
is—but do we really?

Most of us were schooled in database
fundamentals that were grounded in the basic relational principles laid down by
IBM-er Edgar F. Codd more
than 40 years ago. Relational databases layer features such as
transactionality, replication, indexing, caching and materialized views which
we’ve come to associate with enterprise database management systems.
Nevertheless, at the core of the discipline remains the 12 rules that Codd
specified many long years ago.

At heart, a database remains an
organized collection of data that represents a state of affairs in the subject
domain as implemented through a consistent schematic model. Another way of
expressing this is the notion that a database is a “global, shared, mutable
state,” as discussed in this thought-provoking recent post by Martin Kleppmann. This
definition combines four concepts, discussed here in reverse order of their
sequence in that phrase:

·State: A database is a stateful resource. This pivots on the
concept of “state,” which refers to an identity and a value associated with particular
logical entity at a particular point in time, and for which the accuracy,
consistency and integrity are ensured through transactional controls such as ACID.

·Mutable: A database is a mutable resource. This means that the
values of the entities described in the underlying schematic model may change
over time, but the identities of those entities remain constant over time.

·Shared: A database is a shared resource. This means that it the
data is addressable and accessible via primary and secondary keys.

·Global: A database is a global resource. This means that all
information in the database implements a consistent schematic model which may
change over time.

Most of these fundamentals haven’t
changed in the intervening decades, though NoSQL databases with their emphasis
on “eventually consistency” have pushed the transactionality bar into looser, less ACID-ic territory.
With that in mind, I took great interest in Kleppmann’s discussion of “turning
the database inside-out,” specifically with regard to his vision of the
evolving database as an “always-growing collection of immutable facts.”

That sounds suspiciously like the data
warehousing notion of a “single version of the truth,” but it hints at an
architecture that might be applicable across a wider range of data management
use cases. As described by Kleppmann, his next-generation database architecture
involves taking “streams of facts as they come in, and functionally process
them in real-time.” Considering that this hints at a grand synthesis of at-rest
databases and in-motion data-stream processing, I paid even closer attention.

More concretely, Kleppmann describes
this vision as a “distributed, durable commit log” which spans two or more
streams and which constitutes the entire virtual data collection accessible to
applications. In other words, it’s a distributed event log in which the
“immutable facts” are tagged by time and which, like any commit log, can be
rolled back to reveal the entire state of affairs being represented in the
distributed data collection at any point in the past.

The more I thought about this concept,
the more I realized that it’s essentially what underlies the established notion
of “temporal databases.” Essentially,
temporal databases are audit logs for time-series discovery, which persists two
(in Kleppmann’s words) “immutable facts” for any given state: the actual value
back then, as it was known back then (transaction time), and the actual value
back then, as it is known now (valid time). None of these concepts are new,
and, in fact, were
incorporated by the database industry into the core SQL standard
(which, of course, derives from Codd’s groundbreaking work) in the early 1990s.

With all of that in mind, I wondered if
Kleppmann was saying anything radically new or simply rephrasing temporal
database concepts and applying them in a new application context. In his case,
that new context is the NoSQL distributed stream-processing framework called Apache Samza.

As I thought about it, I realized that
what Kleppmann was proposing is a fundamental new framework that, though it obliquely
draws on temporal database computing principles, goes well beyond databases as
we know them, into the new world where the practical distinctions between data
at rest and data in motion are dissolving. It’s all about temporal stream
computing. It’s geared to a world, such as the Internet of Things (IoT), in
which the data collections are mutably dynamic collections of time-stamped
event-log messages whose immutability (data integrity) is enforced at the
message level, so that any views of the distributed logs can always be rolled
back to a single “transaction time” version of truth as known at any point in
the past. Check out my post on how distributed event-logging infrastructure is the principal database
abstraction in the IoT.

Kleppmann pretty much says all of this,
though not in so many words. By the notion of “turning a database inside-out,”
he speaks of the notion of a “replication stream” not being peripheral to the
core concept of a database (per Codd’s framework), but of being central: as a
transaction log and event stream. “Now,” under his conception, “each write is
just an immutable event that you can append to the end of the transaction log.
The transaction log is a really simple, append-only data structure.”

A key concept of his that I hadn’t
previously considered is the central importance of materialized views in this
new streaming big data world of distributed event logs. In a world where data
at rest may become a quaint, outmoded concept, materialized views become the
core access abstraction. As Kleppmann notes, “A materialized view is just
a cached subset of the log, and you could rebuild it from the log at any
time. There could be many different materialized views onto the same data: a key-value
store, a full-text search index, a graph index, an analytics system and so on.”

That’s a powerful concept. As it gains
traction in real-world IoT and other stream-centric data environments, it could
pave the way for a world where databases in the traditional sense occupy a less
pivotal role in the end-to-end architecture. You won’t need to reconsider your
commitment to relational, columnar or any other established database technology
any time soon, as all of it plus Hadoop, NoSQL, in-memory and other newer
architectures will play together very nicely within hybrid virtualized
environments.

But the IoT’s advance will almost
certainly put distributed stream-computing platforms, such as IBM InfoSphere Streams, in the driver’s seat of database
evolution in the decades to come.

In addition to this blog, Netiquette IQ has a website with great
assets which are being added to on a regular basis. I have authored the
premiere book on Netiquette, “Netiquette IQ - A Comprehensive Guide to Improve,
Enhance and Add Power to Your Email". My new book, “You’re Hired! Super Charge
Your Email Skills in 60 Minutes. . . And Get That Job!” will be published soon
follow by a trilogy of books on Netiquette for young people. You can view my
profile, reviews of the book and content excerpts at:

If you would like to listen
to experts in all aspects of Netiquette and communication, try my radio show on
BlogtalkRadio
Additionally, I provide content for an online newsletter via paper.li. I
have also established Netiquette discussion groups with Linkedin and Yahoo. I am
also a member of the International Business Etiquette and Protocol Group and
Minding Manners among others. Further, I regularly consult for the Gerson
Lehrman Group, a worldwide network of subject matter experts and have been a
contributor to numerous blogs and publications.

Lastly, I
am the founder and president of Tabula
Rosa Systems, a company that provides “best of breed” products for network,
security and system management and services. Tabula Rosa has a new blog and Twitter site which offers great IT
product information for virtually anyone.

No comments:

Post a Comment

About Us

Tabula Rosa Systems (TRS) is dedicated to providing Best of Breed Technology and Best of Class Professional Services to our Clients. We have a portfolio of products which we have selected for their capabilities, viability and value.