More than 50% of the currently generated digital information is totally unstructured, this means it not found in databases, log files, Excel, XML or JSON files. Office and PDF documents, conversations, images, audio, video, email, zip files and many other types of information may contain relevant information that common structured or semi-structured analysis technologies cannot treat.

The Apache Hadoop Platform represents one of the biggest changes in the paradigm of mass data storage and processing. Solutions like Data Warehouses DWH or Network Attached Storages SAN, have fallen short to solve the problem of massive growth and diversification of information.