Big data is one of the most significant industry disruptors in IT today. Although still in its early stages of maturity, big data has already shown significant ROI and has found uses across every sector of the industry. Open source data-crunching technologies like Hadoop and NoSQL databases have enabled us to analyze larger volumes of disparate data. Although these technologies are hotter than an Internet IPO, you simply can’t ignore your current investments (in SQL, ETL platforms, OLAP/OLTP systems and so on) that drive everything from your data warehouse to your ERP, CRM, SCM, HCM and custom applications.

Tackle the big data integration challenge.

The challenges of data integration differ from one data platform to the next. At its core, data integration solves the problems of bulk data movement, replication, synchronization, profiling, transformation, data quality and data services. These capabilities are a key technology component for moving data between data warehousing, business analytics, master data management, enterprise applications and custom applications. Now, with big data and the need to combine these sources with unstructured data sets, there is more that has to be moved. And it’s not just the scale; it’s also the complexity of the new technologies. Data integration tools must advance to support these varied data sources and interoperate with technologies like Hadoop, MapReduce and NoSQL databases. Bottom line: today’s data integration tools need unified tooling across the enterprise, the ability to integrate big data and the capability to perform real-time analytics.

Integrate more with Informatica BDE.

Enter Informatica with the latest version of its big data offering, Informatica PowerCenter Big Data Edition (BDE). Informatica BDE provides a safe and efficient way to integrate all types of data on Hadoop without having to learn Hadoop or MapReduce. It offers universal data access through its built-in data adapters and connectors, so you can reach not only transactional and operational data, but also log files, social data, machine data and sensor data. It also provides an extensive library of prebuilt data integration transformations that run natively on Hadoop. As a result, staff with Informatica skill sets can rapidly develop data pipelines in a visual development environment, which increases productivity over hand coding.

One of the key features of Informatica BDE is complex data parsing on Hadoop. With its parser transformations, you can access and parse complex, multi-structured and unstructured data such as Web logs, JSON, XML or machine device data. There are also prebuilt parsers for market data and for industry standards such as SWIFT, ACORD and HIPAA.
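To make "parsing multi-structured data" concrete, here is a minimal sketch in plain Python of turning a Web log line, a JSON document and an XML document into flat records. This is not Informatica's API or what BDE generates; the function names, field names and sample inputs are all hypothetical and only illustrate the kind of work a parser transformation takes off your hands.

```python
import json
import re
import xml.etree.ElementTree as ET

# Apache Common Log Format. The field names are illustrative choices.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_web_log(line):
    """Turn one Common Log Format line into a flat dict."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def parse_json_event(text):
    """Flatten a nested JSON sensor event (hypothetical layout)."""
    doc = json.loads(text)
    return {"device": doc["device"]["id"], "reading": doc["reading"]}


def parse_xml_event(text):
    """Flatten the same hypothetical event when it arrives as XML."""
    root = ET.fromstring(text)
    return {"device": root.get("id"), "reading": float(root.findtext("reading"))}
```

Once every source is reduced to the same flat record shape, the downstream steps no longer need to know which format the data arrived in.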

See the proof in the big pharma PoC.

I recently had the opportunity to work on a big data proof of concept (PoC) for the pharmaceutical industry at the Bio-IT World Conference in Boston. In the PoC, we built a data lake on an EMC Isilon and Hadoop platform, combining clinical study data from the National Institutes of Health (NIH), adverse event data from the Food and Drug Administration (FDA) and a wealth of news articles from various websites. We used Informatica BDE to acquire the data and ingest it into Hadoop to construct the data lake. From it we delivered actionable intelligence, such as the primary drug suspected of causing an adverse reaction, along with new research recommendations.

Informatica BDE gave us a seamless integration experience across these data sources. We ingested data from transactional systems, flat files, Twitter, Web logs and more. We used Informatica BDE to create data pipelines across these sources and write the business logic. At the same time, we tapped Hadoop’s power by running the mappings natively on the Hadoop cluster as MapReduce programs, then stored and analyzed the data through HDFS and Hive.
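For readers who have not seen the MapReduce model that those mappings run as, the following is a minimal single-process sketch in Python. It is not the code Informatica BDE generates and does not use Hadoop. The record layout and the `primary_suspect_drug` field name are hypothetical, chosen only to echo the adverse-event use case above.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (drug, 1) pair for every report that names a suspect drug."""
    for record in records:
        drug = record.get("primary_suspect_drug")  # hypothetical field name
        if drug:
            yield drug.strip().lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get a report count per drug."""
    return {key: sum(values) for key, values in grouped.items()}


def count_reports_by_drug(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))
```

On a real cluster the map and reduce phases run in parallel across many nodes and the shuffle moves data between them. The point of a tool like BDE is that you design the mapping visually instead of writing these phases by hand.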

Informatica has always been at the top of the data integration tools list. But now, with its BDE, I think it is finally a complete data integration tool.