Hadoop Streaming

Building Hadoop Streaming Into Your Data Integration Architecture

Big data is commonly characterized by its volume, variety, and velocity. In Hadoop streaming, volume and velocity converge: data moves continuously into a Hadoop cluster to support real-time analytics. Talend, a provider of open source big data integration solutions, makes it easy to incorporate Hadoop streaming processes into your enterprise data management architecture.

Hadoop Streaming without Hadoop Coding

Talend Open Studio for Big Data is an open source data integration solution that natively supports Apache Hadoop. The studio's Eclipse-based graphical development environment lets you design and implement Hadoop-based big data transfer and transformation processes without having to learn or write Hadoop code. Hadoop technologies such as HDFS, HBase, Hive, Pig, and Sqoop are abstracted as graphical components that you drag onto a central workspace and configure, while the Talend development environment automatically generates the underlying Hadoop code and commands.

In this development console you can build flows from source data systems into the Hadoop Distributed File System (HDFS), or extractions from HDFS to destination systems, in either batch mode or Hadoop streaming mode. Talend also enables you to perform Pig-based transformations on your data while it is in HDFS, without having to learn or write Pig Latin code.
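To make the difference between batch mode and streaming mode concrete, here is a minimal Python sketch. It is not code generated by Talend; the function names `micro_batches` and `stream_to_sink` and the `sink` callable are hypothetical, chosen only for illustration. Instead of loading the whole source in one bulk step, the flow forwards small groups of records to the destination as they arrive.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def micro_batches(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of up to `size` records as they arrive from the source."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # source exhausted
        yield batch


def stream_to_sink(
    records: Iterable[str],
    sink: Callable[[List[str]], None],
    batch_size: int = 2,
) -> int:
    """Forward records to `sink` in small batches; return how many were sent.

    `sink` is a hypothetical stand-in for whatever writes to the cluster.
    """
    sent = 0
    for batch in micro_batches(records, batch_size):
        sink(batch)
        sent += len(batch)
    return sent
```

With `batch_size` set to the size of the whole source, the same function degenerates into a single bulk load, which is roughly the batch-mode case.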

Along with a broad range of big data and Hadoop components like HDFS, Pig, and Sqoop, Talend Open Studio for Big Data supports integration and conversion of all kinds of data, ranging from file formats like XML, Excel, and EDIFACT, to major proprietary and open source relational database systems, to ERP systems like SAP, to SaaS applications like Salesforce. With Talend, you can manage your Hadoop streaming operations in the context of your enterprise-wide data flows, all from the same graphical console.

Hadoop Streaming for All Distributions

Talend Open Studio for Big Data lets you create Hadoop streaming operations for any major Hadoop distribution, including Apache, Cloudera, Hortonworks, and MapR. With a MapR Hadoop distribution, Talend also supports MapR's Direct Access NFS, which enables high-throughput Hadoop streaming to HDFS.
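The appeal of NFS access is that a mounted cluster looks like an ordinary directory, so standard file operations are enough to stream records in. The sketch below illustrates that idea in Python under stated assumptions: the helper `append_records` is hypothetical, and the mount path shown in the usage comment is only an example of where a cluster might be mounted, not a path taken from Talend or MapR documentation.

```python
import os
from pathlib import Path
from typing import Iterable


def append_records(mount_dir: str, filename: str, records: Iterable[str]) -> int:
    """Append records, one per line, to a file under an NFS-mounted directory.

    `mount_dir` is assumed to be a directory where the cluster file system
    is already mounted; any writable directory behaves the same way.
    """
    target = Path(mount_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(target, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(record.rstrip("\n") + "\n")
            count += 1
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push buffered data to the mount
    return count


# Usage (hypothetical mount point and file name):
# append_records("/mapr/my.cluster.com/data", "events/clicks.log", ["id=1", "id=2"])
```

Because the function only relies on standard file I/O, the same code can be exercised against a local directory before being pointed at a real mount.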

Support for MapR NFS is available in the free, open source Talend Open Studio for Big Data. A particularly promising approach is to pair NFS-based Hadoop streaming with the change data capture functionality found in the subscription-based Talend Platform for Big Data. Organizations whose big data strategies evolve toward enterprise-wide big data management and quality control can migrate seamlessly from Talend Open Studio for Big Data to Talend Platform for Big Data.