
We have over 5TB of compressed web server logs in HDFS, and we often analyse them using Hadoop.

It is painful to run MapReduce on 5TB of data and, most importantly, not many of our developers are familiar with it.

I am wondering whether we should store the data in a columnar database such as Greenplum or a MySQL-based column store. These are designed to store analytical data efficiently while still supporting rapid queries, which has become quite important for us lately.

Which database would you recommend? Is there anything I should consider before the move? (I will run my own tests anyway.)

5TB is really small for Hadoop, why is it painful running map reduce on it? Are you familiar with Flume or Hive? What are your pain points?
– Ali Razeghi, Jan 14 '13 at 23:28

How many nodes are you using in your Hadoop cluster? I mean, this sort of question needs more information before we can help you here. If you've got like one or two nodes then yeah, 5TB will take a long time.
– jcolebrand♦, Jan 14 '13 at 23:32

1 Answer

You can get Vertica's free Community Edition, which allows up to 1TB of data. If you normalize your web logs when you load them in, chances are they'd compress down and may fit under 1TB, as Vertica has a pretty powerful data compression engine of its own.

If not, I'd still recommend trying out the platform, but the license fee isn't the cheapest thing in the world.
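
To give a rough idea of what normalizing the logs on load could look like, here is a minimal Python sketch. It assumes the logs are in Apache "combined" format, which the question does not actually state, and the `Dimension` class and `normalize` function are hypothetical names made up for illustration. Repeated strings (paths, referrers, user agents) are replaced with small integer keys, so the wide fact rows carry numbers instead of long text.

```python
import re

# Assumed Apache "combined" log format; adjust the pattern to the real logs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Dimension:
    """Hypothetical lookup table: maps each distinct string to an integer key."""

    def __init__(self):
        self.ids = {}

    def key_for(self, value):
        # A new value gets the next id; a known value keeps its existing id.
        return self.ids.setdefault(value, len(self.ids) + 1)


def normalize(lines):
    """Return (fact_rows, dimensions) for an iterable of raw log lines."""
    dims = {"path": Dimension(), "referrer": Dimension(), "agent": Dimension()}
    facts = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed format
        size = 0 if m["size"] == "-" else int(m["size"])
        facts.append((
            m["ip"], m["ts"], m["method"],
            dims["path"].key_for(m["path"]),
            int(m["status"]), size,
            dims["referrer"].key_for(m["referrer"]),
            dims["agent"].key_for(m["agent"]),
        ))
    return facts, dims
```

The fact rows and the three lookup tables would then be loaded as separate tables. How much this actually saves depends on how repetitive your logs are, so it is worth measuring on a sample first.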

What is the price range of Vertica? We might give it a try if it is not expensive.
– Yoga, Jan 17 '13 at 9:30

I suggest getting a quote from them... but it's definitely not what I consider "cheap" - probably not as expensive as a large Oracle license, but it's up there.
– Eugene Abovsky, Jan 17 '13 at 19:07

Since you mentioned you're currently using Hadoop: Vertica also has a Hadoop/HDFS connector to help with things like ingesting data from Hadoop, which can then provide developers with a standard SQL interface to your data.
– awgy, Feb 13 '13 at 0:25

Either invest in a Vertica database or, for the same budget, buy as many as 10 IBM x3750 nodes plus disks.
– SCO, Jan 25 '14 at 21:50