A casino in Las Vegas or Macau would consider HD surveillance video files to be big data.

A hospital might consider data generated by patient sensors to be big data.

A company interested in gauging customer sentiment might view petabytes of Twitter data as big data.

Regardless of the industry, the big data in question has monetary value to the company. It either represents a valuable asset of the company or a way to prevent the loss of other valuable assets. "Big data" doesn't have to equate to "big files". Big data could comprise small files generated so rapidly that they would saturate traditional data stores like relational databases. Hence a new way is needed to collect, analyze and act upon this big data.

What is Hadoop and where does Hadoop fit with “big data”?

Hadoop is an open-source framework that allows parallel processing of large amounts of data stored on clusters of commodity servers with locally attached disks. In this sense Hadoop is heresy to the big storage vendors who built billion-dollar businesses selling the idea that storage has to be centralized and consolidated, has to be NAS- or SAN-attached so that it may be backed up and replicated. Hadoop runs counter to that argument by relying on the direct-attached storage available in commodity x86 servers.
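To make "parallel processing of large amounts of data" concrete, here is a minimal sketch of the MapReduce model that Hadoop parallelizes. In a real cluster, Hadoop Streaming would run a mapper and a reducer like these as separate processes on every node, each working on its local block of data; here they are plain Python functions so the data flow is easy to follow. The function names are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs -- each node runs this over its local block of data."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Group values by key -- the framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word -- reducers run in parallel over key groups."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Toy input standing in for log lines spread across many nodes.
logs = ["error disk full", "error network down", "warning disk slow"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["error"])  # 2
print(counts["disk"])   # 2
```

The point of the model is that `map_phase` and `reduce_phase` are embarrassingly parallel: adding servers adds capacity, which is exactly the scale-out argument the storage vendors resist.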

As long as the data is in the gigabytes range and is structured, RDBMSs perform exceptionally well. However, if your organization needs to conform to the Federal Rules of Civil Procedure (FRCP), you may want to archive email and email attachments going back several years. Now you enter the realm of unstructured or semi-structured data, for which you may find an RDBMS unsuitable. This is where Hadoop comes in. Being open source, it has a lower acquisition cost (I won't say no acquisition cost, as you still need servers, direct-attached storage, and in-house Hadoop expertise), and it gives you the ability to scale out rather than just scale up, which is what RDBMSs do.
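As a sketch of what that email archive might look like in practice, the commands below load mail exports into HDFS using the standard `hdfs dfs` shell. The `/archive` layout and local paths are assumptions for illustration, and the commands require a running Hadoop cluster; only the subcommands themselves (`-mkdir`, `-put`, `-ls`, `-du`) are standard Hadoop file-system commands.

```shell
# Create a year bucket in HDFS for the FRCP email archive (hypothetical layout).
hdfs dfs -mkdir -p /archive/mail/2013

# Copy local mailbox exports into the cluster; HDFS replicates blocks
# across nodes (3x by default), so no separate backup step is needed here.
hdfs dfs -put /var/mail/exports/*.mbox /archive/mail/2013/

# Verify the files landed, and see how much cluster storage the archive uses.
hdfs dfs -ls /archive/mail/2013
hdfs dfs -du -h /archive/mail
```

When the cluster fills up, you scale out by adding another commodity server with local disks rather than buying a bigger array.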

My IT forensics guy tells me that the existing Security Information & Event Management (SIEM) system is enough and we don’t need Hadoop!

The short answer is that you need both. If your goal is to comply with regulations like PCI DSS and the Sarbanes-Oxley Act (SOX), and your requirement is to collect, search (in a structured manner), and analyze logs from the servers, firewalls, and routers in your network, then a SIEM is the right tool to use. But since it uses a relational database back end, you don't want to use your SIEM to store petabytes of data.

However, if your goal is to store a year or more of email and email attachments, badge access logs, or surveillance video, you can't realistically use a SIEM for this. This is where Hadoop shines. Though Hadoop works well with all types of files, structured and unstructured, it is really designed to handle large unstructured files.
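As a sketch of the kind of ad hoc analysis a SIEM can't run over that archive, here is a Hadoop Streaming-style mapper over semi-structured badge access logs. Hadoop Streaming pipes raw text lines to the mapper and expects tab-separated key/value records back; the log format, field layout, and the `DENIED` filter are all assumptions for illustration.

```python
import io

def parse_badge_line(line):
    """Split a line like '2013-05-01T08:02 B1234 LOBBY GRANTED' into fields."""
    fields = line.split()
    if len(fields) != 4:
        return None  # skip malformed lines instead of crashing the whole job
    timestamp, badge_id, door, result = fields
    return badge_id, door, result

def mapper(stream, out):
    """Emit one 'badge_id<TAB>door' record per denied swipe (assumed log format)."""
    for line in stream:
        parsed = parse_badge_line(line.strip())
        if parsed and parsed[2] == "DENIED":
            out.write(f"{parsed[0]}\t{parsed[1]}\n")

# Local demo; under Hadoop Streaming this would read sys.stdin on each node.
sample = io.StringIO(
    "2013-05-01T08:02 B1234 LOBBY GRANTED\n"
    "2013-05-01T08:03 B9999 VAULT DENIED\n"
)
out = io.StringIO()
mapper(sample, out)
print(out.getvalue())  # B9999	VAULT
```

The same pattern, a tolerant parser plus a filter, is how Hadoop gets turned into the "pseudo SIEM" discussed below: the raw files stay in HDFS and each new question becomes a new mapper.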

Yes, big data is a relative term. But beyond its value, I'd say it is not the regular data the organization is used to working with, but the additional data it will have to cope with.
It may also be time to find another term than Hadoop for big data, or it will never go mainstream; Hadoop projects are far too complex, costly, and lengthy for the vast majority of organizations.
As far as RDBMSs and SIEMs are concerned, since the latter are built on the former, they suffer from the same inherent inability to deal with large volumes of unstructured and semi-structured data.
As for turning Hadoop into a pseudo SIEM, why not, but again only for the Fortune 500 crowd. There are also commercial, ready-to-use flat-file tools out there like Secnology. And because we are talking big data, don't forget to hire a data expert, as there is no "magic software" yet.