A First-hand Account of the Big SQL Technology Preview

One of the recurring themes at yesterday’s “Big Data at the Speed of Business” launch was consumability, which is just a fancy word for ease of use. Let’s face it, Hadoop can be hard; big data can be complicated, and most big data solutions come with a learning curve. That’s why I am particularly excited about one of yesterday’s announcements: Big SQL, a new feature in the upcoming release of InfoSphere BigInsights 2.1.

While BigInsights 2.1 isn’t yet available, Big SQL is. IBM is running a Big SQL technology preview where you can get your hands on Big SQL and start playing with it. I recently joined the preview myself so I could experience it firsthand, and it was very easy to get started. A short welcome video featuring Leon Katsnelson walked me through what to expect and how to begin, and pointed out the many resources available to help along the way.

Because this is a technology preview, the team intends to update both the code and the program frequently, and that means that those who participate get to interact directly with the Big SQL development team and provide input that will help shape the technology. You could potentially share feedback and a couple of weeks later see your suggestion already implemented.

So what is Big SQL exactly?

As the name suggests, it’s SQL access to data in Hadoop, or in this case, BigInsights, our Hadoop-based big data platform. This new technology uses ANSI SQL, over JDBC or ODBC, to reach data whether it sits in Hadoop or in a data warehouse. Why is this important? Because it opens up the world of Hadoop to developers who are already familiar with SQL. You can jump into BigInsights without having to learn new skills and languages. Also, Cognos is certified for Big SQL, which means you can now use Cognos BI tools to access data in BigInsights using Big SQL.

Legacy applications depend on SQL to access stored data, and SQL is the de facto language used to query structured data. In fact, some people choose to work with Hive even when it’s not the best storage choice, only because it has a familiar SQL interface. With Big SQL, all of your data is now SQL accessible. It gives you a structured view of your data and allows you to choose the storage format best suited for your application. You can leverage MapReduce parallelism when a large, complex data set calls for it and avoid it when it would only slow things down, using direct access for smaller, low-latency queries.

Big SQL Usage Patterns

There are several usage patterns you will encounter with Big SQL.

Point queries – These are queries that need to return very quickly, such as HBase lookups. MapReduce is not an option for these queries, because the overhead of launching a job is too high.

Big ad-hoc queries – These larger, more complex queries typically need MapReduce parallelism to work through massive data sets.

Standards-compliant via JDBC – This is how most applications access databases, and in this usage pattern you can use the same interface to access your Hadoop-based data store.

Standards-compliant via ODBC – For non-Java developers, this usage pattern allows you to access your data from products or tools that support only ODBC.
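
To make the JDBC pattern a little more concrete, here is a minimal Java sketch of a point query issued through the standard java.sql API. Treat it as an illustration under assumptions rather than the preview's documented setup: the URL layout, host, port, schema, credentials, and the orders table are all hypothetical placeholders, and the sketch assumes the driver you install accepts parameter markers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BigSqlJdbcSketch {

    // A point query: a single row selected by key.
    // The table and column names are hypothetical.
    public static final String POINT_QUERY =
            "SELECT name, total FROM orders WHERE order_id = ?";

    // Builds a JDBC URL. The "jdbc:bigsql://host:port/schema" layout is an
    // assumption for illustration; check your driver's documentation.
    public static String jdbcUrl(String host, int port, String schema) {
        return "jdbc:bigsql://" + host + ":" + port + "/" + schema;
    }

    // Runs the point query on an open connection using only standard JDBC calls.
    public static List<String> fetchOrder(Connection conn, int orderId)
            throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(POINT_QUERY)) {
            stmt.setInt(1, orderId);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString("name") + "\t" + rs.getBigDecimal("total"));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder host, port, schema, and credentials.
        String url = jdbcUrl("bigsql.example.com", 7052, "default");
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            fetchOrder(conn, 42).forEach(System.out::println);
        } catch (SQLException e) {
            // Without a matching driver on the classpath, or without a
            // reachable server, the program ends up here.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

The point of the sketch is that nothing in fetchOrder is specific to Hadoop: it is the same code you would write against any relational database, which is exactly the familiarity this usage pattern is meant to offer.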

It’s a small big data world after all

While I myself am new to the world of big data, having taken on the Product Marketing role for BigInsights and Streams last summer, one of the things that has fascinated me most is watching the growth of what I refer to as the big data club. Everybody wants to be a member and yet very few are equipped to actually work in it. I don’t mean this as a slam on anybody. It’s more a reflection of the skills shortage outcry that I keep hearing.

Yet as the perception of big data broadens, people beyond the data scientists are going to get more involved with big data. That’s why technologies like Big SQL are exciting to me: they allow new players like DB admins and data management professionals to come into the fray and quickly become productive with BigInsights. Big SQL not only builds a bridge between data management and big data, but also shows that they are in fact part of the same big data world.