PIG

Last Saturday we (GITPRO – Global Indian Tech Professionals Association) arranged a Tech Talk on NoSQL (non-relational, actually) databases and scaling Hadoop. It was very well attended. In the general introduction session, many attendees mentioned their interest in Hadoop and NoSQL databases as they introduced themselves. It was nice to see a good-sized crowd sacrificing their Saturday evening to attend this informative session. It was even more surprising to see that many of them were actual users of these technologies.

We (at myBantu / GloMantra) are using MongoDB, which is a document-oriented database. We store our XML documents in it (once stored, MongoDB actually holds the data as BSON), and queries can use a scripting language (JavaScript) for their conditions. The other alternative in this class is CouchDB, which is more Web-like and gives REST-based access.

The other famous non-relational (popularly called NoSQL) systems are of course Hadoop and Cassandra. Both are Apache projects with a few very good showcase implementations. However, when Digg recently had problems while it was using Cassandra, Cassandra got a bad name, which is not entirely fair. Anyway, Hadoop and its database, HBase, are creating more buzz. It was interesting news when Facebook also moved its messaging system from Cassandra to HBase. It is interesting especially because Cassandra originally came from engineers at Facebook, who used it for their Inbox search.

There is some interesting work on Hadoop happening at Facebook. They are the original contributors of Hive, a data manipulation add-on targeted at implementing data warehousing on top of Hadoop. While MapReduce-based systems created a lot of the buzz around NoSQL, it is interesting that Hive offers a SQL-like query language (HiveQL). So when folks say NoSQL, they actually mean non-relational databases. Another warehousing-related add-on to Hadoop is Pig (Apache Pig), which originally came out of Yahoo.

Anyway, development in this space is interestingly rapid, and the major driver is the huge volume of user-generated data being handled by social networking giants like Facebook, Zynga, and LinkedIn. The original credit for these concepts, though, goes to Google, which introduced both Bigtable and MapReduce. The space is now getting its own ecosystem developed!