What is Flume?

Flume is an open source service, originally developed by Cloudera, for aggregating and moving large amounts of data around a Hadoop cluster
as the data is produced or shortly thereafter. Its primary use case is gathering log files from all the machines in a cluster and persisting them in a centralized store such as HDFS.
In Flume, you create data flows by building up chains of logical nodes and
connecting them to sources and sinks. For example, say you wish to move data from
an Apache access log into HDFS. You create a source by tailing the access log and use a logical node to route it to an HDFS sink.
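As a rough sketch, in current Flume (1.x, "Flume NG") this access-log-to-HDFS flow would be declared as a source, a channel, and a sink in an agent configuration file. The agent name (`agent`), log path, and NameNode host below are hypothetical placeholders, not values from the original text:

```properties
# Name the components of this (hypothetical) agent
agent.sources = access
agent.channels = mem
agent.sinks = hdfs-sink

# Source: tail the Apache access log (path is an assumption)
agent.sources.access.type = exec
agent.sources.access.command = tail -F /var/log/httpd/access_log
agent.sources.access.channels = mem

# Channel: buffer events in memory between source and sink
agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

# Sink: write events into HDFS (namenode host/port are assumptions)
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/access-logs
agent.sinks.hdfs-sink.channel = mem
```

Note that "logical nodes" are terminology from the older Flume OG design; in Flume NG the same role is played by the channel wiring shown above.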

Comments or Responses

Flume is an open source project from Apache. It is most frequently used to move data from log-generating sources such as social media feeds and web servers. Flume represents each log record as an event; these events are delivered asynchronously from the sources to HDFS by Flume agents. Flume has built-in reliability features to avoid a single point of failure. It is recommended mainly for unstructured and semi-structured data; for structured data in an RDBMS, use Sqoop instead. Resources: http://www.bigdataanalyst.in/flume-interview-questions/