How to use Hadoop to mine business value from new types of data

Ari Zilka, CTO, Hortonworks
Sept. 27, 2013

The most commonly seen emerging data architecture introduces Apache Hadoop to handle these new types of data efficiently and cost-effectively.

* Server Log Data. Server log data, which reports EKG-like information on the operations of enterprise networks, often holds the answers to most security breaches. Server logs are the first place the IT team looks when there's a problem with the network. However, the sheer volume of this data makes it difficult and expensive to store and even more difficult to analyze.

When security fails, Hadoop helps enterprises understand and repair the vulnerability quickly and facilitates root cause analysis to create lasting protection. Often, companies don't know of system vulnerabilities until they've already been exploited. So rapid detection, diagnosis and repair of the intrusion are critical.

Hadoop can make forensic analysis faster. If IT administrators know that server logs are always flowing into Hadoop to join other types of data, they can establish standard, recurring processes to flag any abnormalities. They can also prepare and test data exploration queries and transformations in advance, so those are ready to run when an intrusion is suspected.
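A recurring check of this kind can be sketched in a few lines. The example below is a minimal illustration in plain Python, not a Hadoop job: the log format ("timestamp host event"), the LOGIN_FAILED event name and the 5x-median rule are all assumptions made for the example. On a cluster, comparable logic would typically run as a scheduled batch process over the stored logs.

```python
from collections import Counter
from statistics import median


def make_sample_logs():
    """Build hypothetical log lines: '<ISO timestamp> <host> <event>'."""
    lines = []
    for hour in range(10):
        failures = 50 if hour == 9 else 2  # hour 09 simulates an attack
        for i in range(failures):
            lines.append(f"2013-09-27T{hour:02d}:{i:02d}:00 web01 LOGIN_FAILED")
        lines.append(f"2013-09-27T{hour:02d}:59:30 web01 LOGIN_OK")
    return lines


def hourly_counts(lines, event="LOGIN_FAILED"):
    """Count occurrences of one event type per hour."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        timestamp, event_name = parts[0], parts[2]
        if event_name == event:
            counts[timestamp[:13]] += 1  # key is 'YYYY-MM-DDTHH'
    return counts


def flag_abnormal_hours(counts, multiplier=5):
    """Return hours whose count exceeds multiplier x the median hourly count."""
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(h for h, c in counts.items() if c > multiplier * baseline)


if __name__ == "__main__":
    print(flag_abnormal_hours(hourly_counts(make_sample_logs())))
```

The median is used as the baseline because a single extreme hour barely moves it, so one burst of failed logins does not hide itself by inflating the average.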

* Sensor and Location Data. Hadoop solves two big challenges that currently limit the use of sensor data--its volume and its structure. Sensors measure and transmit small bits of data efficiently, but they are always on. As the number of sensors increases and time passes, the combined data from those sensors can add up to petabytes. Hadoop stores this data more efficiently and economically, turning big sensor data into an asset.
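The way small readings accumulate is easy to check with back-of-the-envelope arithmetic. The sketch below uses purely hypothetical figures (one million sensors, 100 bytes per reading, one reading per second) chosen only to illustrate the scale; real deployments will differ.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
BYTES_PER_PETABYTE = 10 ** 15  # decimal petabyte


def yearly_bytes(sensor_count, bytes_per_reading, readings_per_second):
    """Raw bytes produced in one year by always-on sensors."""
    return sensor_count * bytes_per_reading * readings_per_second * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical fleet: 1,000,000 sensors, 100-byte readings, 1 reading/sec.
    total = yearly_bytes(1_000_000, 100, 1)
    print(f"{total / BYTES_PER_PETABYTE:.2f} PB per year")
```

Under these assumptions each sensor produces only about 3 GB a year, but the fleet as a whole produces a little over 3 PB.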

Using specific algorithms that identify previously invisible patterns, Hadoop can also be used for predictive analytics and proactive maintenance. The ability to predict equipment failure is valuable because it's far less expensive to do preventative maintenance than it is to pay for emergency repair or replacement equipment.

Doctors can now track more than 1 billion individual data measurements to diagnose and predict medical episodes with greater precision. Hadoop makes it much easier to refine and explore this data to find the meaningful patterns. Tools can be used to join various data sets together, combine them with data on health outcomes, and then refine it all into a master dataset that includes the important patterns and excludes the trivial ones.

Location data is a sub-variant of sensor data since the device senses its location and transmits data on its latitude and longitude at pre-defined intervals. This is truly a new form of data, since it did not exist (outside of highly specialized military and aerospace applications) until 10 years ago.

Today, smartphones can capture and transmit precise longitude and latitude at regular time intervals--the sensor is connected to the communication network in the same device. Consumer-driven businesses want to use this data to understand where potential customers congregate during certain times of the day. In addition, delivery vehicles use location data to optimize driver routes, improving delivery times, lowering fuel costs and reducing the risk of accidents.
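To make the "where do customers congregate" question concrete, one simple approach is to snap each location ping to a grid cell and count distinct devices per cell for each hour of the day. The sketch below is illustrative only: the ping layout (device id, ISO timestamp, latitude, longitude), the 0.01-degree cell size and the sample coordinates are assumptions, not a description of any particular product.

```python
import math
from collections import defaultdict
from datetime import datetime


def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to an integer grid cell (about 1 km at 0.01 degrees)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def busiest_cells(pings, cell_deg=0.01):
    """For each hour of day, return (cell, distinct_device_count) for the busiest cell."""
    devices = defaultdict(set)
    for device_id, timestamp, lat, lon in pings:
        hour = datetime.fromisoformat(timestamp).hour
        devices[(hour, grid_cell(lat, lon, cell_deg))].add(device_id)

    best = {}
    for (hour, cell), ids in devices.items():
        if hour not in best or len(ids) > best[hour][1]:
            best[hour] = (cell, len(ids))
    return best


# Hypothetical pings: (device_id, ISO timestamp, latitude, longitude)
SAMPLE_PINGS = [
    ("a", "2013-09-27T12:05:00", 40.7580, -73.9855),
    ("b", "2013-09-27T12:10:00", 40.7585, -73.9851),
    ("c", "2013-09-27T12:40:00", 40.7589, -73.9859),
    ("d", "2013-09-27T12:15:00", 40.7001, -74.0102),
    ("a", "2013-09-27T18:20:00", 40.7001, -74.0102),
    ("d", "2013-09-27T18:45:00", 40.7001, -74.0102),
]

if __name__ == "__main__":
    for hour, (cell, count) in sorted(busiest_cells(SAMPLE_PINGS).items()):
        print(f"{hour:02d}:00  cell={cell}  devices={count}")
```

Counting distinct devices rather than raw pings keeps one phone that reports frequently from making a location look busier than it is.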