ETL for Analytics

Extraction, Transformation and Loading (ETL) processes are critical components for feeding a data warehouse, a business intelligence system, or a big data platform.

While mostly invisible to users of a business intelligence platform, an ETL process retrieves data from operational systems and pre-processes it for further analysis by reporting and analytics tools.

The accuracy and timeliness of the entire business intelligence platform rely on ETL processes, specifically:

Extraction: extraction of the data from production applications and databases (ERP, CRM, RDBMS, files, etc.)

Transformation: transformation of this data to reconcile it across source systems, perform calculations or string parsing, enrich it with external lookup information, and match the format required by the target system (third normal form, star schema, slowly changing dimensions, etc.)
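As a rough illustration of how these stages fit together, the sketch below extracts a few rows, cleans and enriches them, and loads them into a target table. It is a minimal example, not a depiction of any particular product: the source rows, the COUNTRY_NAMES lookup, the sales table, and all function names are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from a CRM export.
SOURCE_ROWS = [
    {"customer": " Alice Smith ", "country_code": "fr", "amount": "120.50"},
    {"customer": "Bob Jones", "country_code": "US", "amount": "80"},
]

# Hypothetical external lookup used for enrichment.
COUNTRY_NAMES = {"FR": "France", "US": "United States"}


def extract():
    """Return raw rows from the (simulated) operational source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Clean strings, convert types, and enrich with lookup data."""
    out = []
    for row in rows:
        code = row["country_code"].strip().upper()
        out.append(
            (
                row["customer"].strip(),
                code,
                COUNTRY_NAMES.get(code, "Unknown"),
                float(row["amount"]),
            )
        )
    return out


def load(conn, records):
    """Write transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, country_code TEXT, country_name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Chain the three stages in order."""
    load(conn, transform(extract()))
```

Calling run_etl(sqlite3.connect(":memory:")) populates the sales table. In a real pipeline each stage would typically be replaced by connectors to the actual source and target systems.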

Obstacles

Managing Diverse and Fast-Changing Data

There are numerous challenges to implementing efficient and reliable ETL processes.

Data volumes are growing exponentially. With the rise of big data, ETL processes have to handle large amounts of structured and unstructured data, such as call detail records, banking transactions, weblog files, and social media content. Some business intelligence (BI) systems are only updated incrementally, whereas others require a complete reload at each iteration.
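The difference between a complete reload and an incremental update can be sketched as follows. This is a simplified illustration under stated assumptions: each source row is assumed to carry an id and an updated_at value that increases with every change, the target is a plain dictionary, and the function names are hypothetical.

```python
def full_reload(source_rows):
    """Rebuild the target from scratch on every run."""
    return {row["id"]: row for row in source_rows}


def incremental_update(target, source_rows, last_watermark):
    """Apply only rows modified after the previous run's watermark.

    Returns the new watermark to store for the next run.
    """
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] > last_watermark:
            target[row["id"]] = row  # insert or overwrite (upsert)
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

The incremental variant touches far fewer rows per run, but this sketch does not detect rows deleted at the source. Handling deletions usually requires an extra mechanism such as change data capture or periodic full comparisons.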

Data delivery is shifting from batch processing toward real time. Information needs to be distributed to all connected systems to enable real-time business insight and avoid multiple versions of the truth. As business intelligence analysis tends toward real time, data warehouses and data marts need to be refreshed more often, and load windows have shrunk.

Business intelligence structures and applications include big data platforms, data warehouses, data marts, and OLAP applications for analysis, reporting, dashboarding, scorecarding, etc. All these target structures have different data transformation requirements and different tolerances in terms of latency.

Transformations involved in ETL processes can be highly complex. Data needs to be aggregated, parsed, computed, statistically processed, etc. BI-specific transformations are also required, such as slowly changing dimensions. Primary keys are some of the most important attributes in relational databases as they tie everything together. Quite often data integration projects deal with multiple data sources and therefore need to address the issue of having multiple keys in order to make any meaningful sense of the combined data.
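To make the slowly changing dimension case concrete, here is a minimal sketch of a Type 2 update, in which a changed record is not overwritten but closed and replaced by a new version. The row layout (sk for the warehouse surrogate key, nk for the source system's natural key, attrs, valid_from, valid_to) and the function name apply_scd2 are assumptions made for this example.

```python
from datetime import date


def apply_scd2(dimension, natural_key, attributes, effective):
    """Apply a Type 2 change: close the current row and append a new version.

    `dimension` is a list of dicts. Each row carries a surrogate key (`sk`),
    the source system's natural key (`nk`), the tracked attributes, and a
    validity range (`valid_from`, `valid_to`; None means still current).
    Returns the surrogate key of the current version.
    """
    current = next(
        (r for r in dimension if r["nk"] == natural_key and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == attributes:
            return current["sk"]  # nothing changed, keep the current version
        current["valid_to"] = effective  # expire the old version
    new_sk = max((r["sk"] for r in dimension), default=0) + 1
    dimension.append(
        {
            "sk": new_sk,
            "nk": natural_key,
            "attrs": dict(attributes),
            "valid_from": effective,
            "valid_to": None,
        }
    )
    return new_sk
```

For example, loading customer "C42" with city "Paris" and later with city "Lyon" leaves two rows in the dimension: the Paris row with an end date and the Lyon row marked as current. Because fact tables reference the surrogate key rather than the source key, the same pattern also helps when several source systems use different keys for the same entity.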

Solution

Talend ETL for Analytics

Talend's Big Data and Data Management solutions are optimized for enterprise-grade ETL, for big data and small. The following features are especially critical to the design, development, execution and maintenance of data integration and ETL processes:

A highly scalable, fast-executing open source platform: it leverages a grid of commodity hardware and is the only solution to support the dual ETL and ELT architecture.

Broad data integration connectivity: support for all systems so you can access all production data.
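For readers unfamiliar with the ETL and ELT distinction mentioned above: in ELT, raw data is loaded into the target first and then transformed by the target's own engine. The sketch below is a generic illustration of that idea and does not represent Talend's implementation. The table names, the run_elt function, and the use of an in-memory SQLite database as the target are all assumptions.

```python
import sqlite3


def run_elt(conn, raw_rows):
    """Load raw rows first, then transform inside the target database."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)
    # The transformation is pushed down to the database engine as SQL.
    conn.execute(
        "CREATE TABLE sales_by_customer AS "
        "SELECT TRIM(customer) AS customer, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_sales GROUP BY TRIM(customer)"
    )
    conn.commit()
```

In an ETL design the trimming, casting, and aggregation would run in the integration layer before loading. ELT moves that work into the target, which can pay off when the target engine is well suited to processing large data sets.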

Talend Data Integration

Talend provides an extensible and highly scalable set of data integration tools to access, transform, and migrate data from any business system. With support for over 800 types of data sources, Talend simplifies your ETL needs.