TailSource is no longer part of Flume. It let you tail any file on the system and created one Flume event for each line of that file.

With channels and sinks, events are added to and removed from the channel as part of a transaction. Tailing a file, however, cannot be made part of a transaction.

If something goes wrong, for instance the channel fails, there is no way to roll back the tailed read and put the data back.
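The difference can be illustrated with a small toy model. This is only a sketch, not Flume's API: the `Channel` class, its `put_batch` method and the `tail_once` function are hypothetical names invented for this example. It assumes a channel that accepts a batch all-or-nothing, and a tailer whose read position has already moved on by the time the put fails.

```python
class Channel:
    """Hypothetical in-memory channel with all-or-nothing batch puts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []

    def put_batch(self, batch):
        # Either the whole batch is stored or nothing is (a rollback).
        if len(self.events) + len(batch) > self.capacity:
            raise RuntimeError("channel full, transaction rolled back")
        self.events.extend(batch)


def tail_once(lines, offset, channel):
    """Read every line after `offset`, try to store it, return the new offset."""
    batch = lines[offset:]
    new_offset = len(lines)  # the read position has already advanced
    try:
        channel.put_batch(batch)
    except RuntimeError:
        # The channel rolled back its side, but the tailer cannot rewind:
        # the lines in `batch` are simply dropped.
        pass
    return new_offset


log_lines = ["a", "b", "c"]
channel = Channel(capacity=2)  # too small for the batch, so the put fails
offset = tail_once(log_lines, 0, channel)
```

After this run the channel is empty and the offset points past all three lines, so a later call to `tail_once` would never see them again.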

As an example, suppose you are tailing the following file:

/user/hadoopexam/access.log

Suppose also that log4j is configured to rotate the file once it reaches 1 MB in size, renaming it as follows:

/user/hadoopexam/access.log1

Assume Flume was reading access.log when it was renamed to access.log1. Because Flume still holds an open file handle, it can keep reading the renamed file. Now assume that, in the meantime, the new access.log also fills up and is renamed as follows:

/user/hadoopexam/access.log2

Once Flume is done with access.log1, it starts reading the current access.log. It is unaware that access.log2 was created in between, so the data in that file is never read by Apache Flume.
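The sequence above can be replayed with a small in-memory simulation. It is a sketch under stated assumptions, not TailSource's actual code: the "file system" is a plain dictionary, a file handle is modelled as a reference to a list of lines (so it survives a rename, as an open handle does), and `TailReader`, `rotate` and `simulate` are hypothetical names.

```python
class TailReader:
    """Toy tailer: follows one handle and only ever reopens `path`."""

    def __init__(self, fs, path):
        self.fs = fs
        self.path = path
        self.handle = fs[path]  # like an open file handle: survives renames
        self.offset = 0
        self.events = []

    def read_available(self):
        # Turn every unread line of the current handle into an event.
        new_lines = self.handle[self.offset:]
        self.events.extend(new_lines)
        self.offset += len(new_lines)

    def reopen(self):
        # Go back to the original path; rotated files are never looked at.
        self.handle = self.fs[self.path]
        self.offset = 0


def rotate(fs, path, suffix):
    """Rename `path` to `path + suffix` and start a fresh, empty file."""
    fs[path + suffix] = fs.pop(path)
    fs[path] = []


def simulate():
    fs = {"access.log": ["line-1", "line-2"]}
    reader = TailReader(fs, "access.log")
    reader.read_available()                         # reads line-1, line-2

    fs["access.log"].append("line-3")
    rotate(fs, "access.log", "1")                   # reader still holds access.log1
    fs["access.log"].extend(["line-4", "line-5"])
    rotate(fs, "access.log", "2")                   # second rotation, unseen by reader
    fs["access.log"].append("line-6")

    reader.read_available()                         # finishes access.log1: line-3
    reader.reopen()                                 # back to the current access.log
    reader.read_available()                         # reads line-6
    return reader.events, fs


events, fs = simulate()
```

In this run `events` contains line-1, line-2, line-3 and line-6, while line-4 and line-5 sit in access.log2 and are never turned into events.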

As you can see, there is a real chance of losing data with TailSource. This is the second reason why TailSource was discontinued after the Flume 0.9 release.
