Data management in the Hadoop ecosystem is still in its early stages. Cheaper, more effective ways to collect, store, process, and distribute structured and unstructured data from both internal and external sources remain elusive, held back by platform complexity, a shortage of qualified professionals, and the sheer difficulty of managing the data itself.

Moving and managing data in Hadoop is challenging: it spans data motion, process orchestration, lifecycle management, and data discovery. The trick to simplifying data management in Hadoop is to push that complexity down into the platform itself, freeing data engineers to focus on processing and business logic.

Apache Falcon lets users onboard data sets with a complete picture of how, when, and where their data is managed across its lifecycle. Under the hood it relies on Apache Oozie to coordinate workflows, and data management tasks are expressed as workflow templates. Falcon also provides open APIs so those workflows can be orchestrated more broadly, enabling integration with data warehouse systems (for example, coordinating data lifecycle workflows that span Hadoop and a Teradata system).
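To make this concrete, here is a minimal sketch of onboarding a data set by submitting a feed entity to Falcon's REST API. The host and port, the `etl` user, the `rawClicks` feed name, the `primaryCluster` cluster, and the paths and retention settings are all illustrative assumptions, not values from this article; the entity schema follows the Apache Falcon feed specification, but consult your deployment's documentation for the details.

```python
import requests

# Assumed Falcon server location (15000 is Falcon's default port).
FALCON_URL = "http://falcon-host:15000"

# A minimal Falcon feed entity. The feed name, cluster name, validity
# window, retention policy, and HDFS path are hypothetical examples.
FEED_XML = """<?xml version="1.0" encoding="UTF-8"?>
<feed name="rawClicks" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <cluster name="primaryCluster" type="source">
      <validity start="2014-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(30)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data"
              path="/data/clicks/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>
  <ACL owner="etl" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
"""

def submit_feed():
    # Falcon accepts entity XML via POST to
    # /api/entities/submit/<entity-type>, with the acting user passed
    # as the user.name query parameter (non-secure mode).
    resp = requests.post(
        f"{FALCON_URL}/api/entities/submit/feed",
        params={"user.name": "etl"},
        headers={"Content-Type": "text/xml"},
        data=FEED_XML,
    )
    resp.raise_for_status()
    print(resp.text)

if __name__ == "__main__":
    submit_feed()
```

Once submitted, an entity can be scheduled through the same API (for example, a POST to `/api/entities/schedule/feed/rawClicks`), at which point Falcon generates the underlying Oozie coordinators that enforce the retention and replication policies declared in the entity.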

In this video, Intel Fellow Eric Dishman shares his personal story of how his 25-year struggle with kidney cancer was finally resolved through Big Data and personalized medicine. Dishman's doctors were able to treat him successfully after sequencing his complete genome.

To put personalized medicine in perspective, only 50,000 people on Earth have had their entire genome sequenced.

For data-driven marketing to transform customer experiences, support predictive modeling, and respond to events in real time, analytics must run wide and deep throughout an organization.