I've lived around databases all my life, and the 21st century is challenging for them: big data, throughput, complexity, virtualization, global distribution - it all comes down to scalability.
I'm the founder and CTO of ScaleBase; solving this problem is a workaholic's heaven, so I'm having a great time!
My agenda is to stay technical - no marketing and sales BS - just my summarized set of views and opinions on urgent topics, events and the latest news in database scalability.

Tuesday, July 10, 2012

So now Hadoop's days are numbered?

I know GigaOM likes to provoke scandals sometimes - we all remember some other unforgettable piece - but there is something behind it...

Hadoop today (after SOA not so long ago) is one of the worst cases of an abused buzzword ever known to man. It's everything, everywhere, can cure illnesses and do "big data" at the same time! Wow! Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.
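To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job - map, shuffle, reduce - in plain Python. This is an illustration of the programming model only, not the Hadoop API; in a real Hadoop job these phases run distributed across a cluster over HDFS splits.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's list of values into a final result.
    return {word: sum(values) for word, values in groups.items()}

splits = ["Big data is big", "data is everything"]
pairs = [pair for doc in splits for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

The point of the model is that `map_phase` and `reduce_phase` are pure, per-record functions, so the framework is free to run them on thousands of machines in parallel - which is exactly the "Assembly of Big Data" the rest of this post talks about.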

My take from the article is this: Hadoop is a foundation, a low-level platform. I used the word "platform" only for lack of a better word. Wait, there is a great word that captures it all!

This word is Assembler.

When computers began, 70 years ago or so, Assembly was the mother of all programming languages, and the Assembler made it work on real-world computers, silicon and copper. In the world of Big Data, map-reduce - massive distribution and parallelism - is the mother of all living things (the Assembly). And Hadoop enables it to actually run in the real world (the Assembler)...

Like Assembler, the Hadoop core is far from being really usable. Doing something real, good, working and repeatable with it requires skills that only a few people can truly master (like good Assembler programmers, back in the 1960s).

While I consider myself lucky to have had the chance to actually punch cards with brilliant(?) Assembler code, many of today's brightest minds in the Silicon Valleys around the world never wrote a single opcode. They're all using PHP, Ruby, Java and Node.js, which are great "wrappers" around good old Assembly that bring programming, innovation and disruptiveness to the masses and make the whole world a better place. And that's how it should be.

Hadoop will die only if data and big data die. Nonsense. Data is by far the most important asset organizations have. Facebook, as well as Bank of America, would be worth a fraction of their value within minutes if they lost the same fraction of their data. Neither will be able to compete if it can't be intelligent about, and analyze, data that multiplies every (low number of) days/weeks/months. Data makes a business intelligent, and Hadoop helps exactly there.

Hadoop is the Assembler of all analytical big-data processing, ETL and queries. The potential around it and its ecosystem is virtually unlimited; tons of innovation and disruptiveness are being poured in by startups and communities all over - Splunk, HBase, Cloudera, Hive, Hadapt, and many, many more. And we're still only in the "FORTRAN" phase...
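To illustrate the Assembler-vs-FORTRAN gap: the same word count that takes an explicit map/shuffle/reduce pipeline when written by hand collapses into one declarative line with a higher-level abstraction. That, roughly, is what a layer like Hive's SQL buys over hand-written MapReduce; the snippet below is just a Python stand-in for that idea, not Hive itself.

```python
from collections import Counter

# One declarative expression replaces the hand-written
# map, shuffle and reduce phases for the same job.
splits = ["Big data is big", "data is everything"]
counts = Counter(word.lower() for doc in splits for word in doc.split())
print(counts["big"])  # 2
```

The programmer states *what* to compute; the wrapper decides *how* - exactly the step up that high-level languages made over Assembler.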

Love the analogy, Doron. When I started in this business, the moldy old guys were bragging about how they used to string wires between boards to make things work. To them, assembler was a namby-pamby tool for wimps who couldn't handle real computing!

Since I was an economics major and feeling a bit inferior about my technical background, I decided to read the S/370 Principles of Operation in detail. Best thing I ever did. A low-level understanding of how the machine itself actually works is indispensable in this business. It certainly served me well at Intel, where the machine's internal architecture isn't nearly as straightforward as that of the S/370!

But it isn't about bragging rights and how much techno-detail one can master, is it? The fundamental barrier we all face, in life and technology, is TIME. It's incompressible, and it's inexorable.

If we can find ways to productively use up vast amounts of excess raw computational power to save time, it's worth it. That's what PHP and all the rest are all about.

An observation: all of the computational power in the world amounts to a hill of beans when it comes to bandwidth and latency (at the limit of compressibility). Since we now know that neutrinos actually do follow the laws of general relativity, we're back to the limitations imposed by the speed of light on that score.

To PHP on a MacBook is much simpler and more graspable than to FORTRAN on punch cards (or on wires)! It requires less training, so many more people can actually be part of "computing" and its innovation - from JCL and batch calculations all the way to the PC, the iPad, Amazon, Facebook, Angry Birds!! If we take the above to the extreme, we can bring in Einstein once again with his famous quote: "Two things are infinite: the universe and human stupidity"... :)

About Me

Technology leader, data and database expert, hands-on system architect, senior consultant and project manager, with extensive experience in understanding various aspects of organizations, distributed applications, and the integration of various technologies, hardware and software solutions.

I'm the founder and CTO of ScaleBase, a venture-backed startup company building a next-generation distributed database engine based on standard MySQL databases, bringing true cloud elasticity and scale-out capabilities to standard relational databases.

An Oracle DBA since 1997, I have administered database versions from Oracle7 through Oracle11g. My experience includes serving as a senior and lead DBA in large organizations in the government and hi-tech industries, administering complex databases serving critical applications and data warehouses with large volumes and 24x7 availability, and integrating almost every feature and product in the Oracle database offering.

An enterprise application software architect since 2001, I specialize in the Java/JEE technology stack, with a specific focus on Oracle's middleware offering - from Oracle9iAS and OC4J to, nowadays, the WebLogic platform.