This PR adds a few tables to the User Guide that describe the metrics
published by sources, sinks and channels.
I used simple unix tools to gather the data, then wrote a small utility to
convert it to CSV.
Then I used an online converter (https://www.tablesgenerator.com/) to generate
the rst tables, followed by a little manual editing.
I discovered some rst formatting problems in FlumeUserGuide.rst and
corrected them, too.
It was a rather painful process to gather the data and find a decent
representation.
So far this PR only contains the end result. I would be happy to share the
utilities, I just don't know what would be the best way.

The default hdfs.callTimeout used by the HDFS sink was too low (only 10
seconds), which can cause problems on a busy system.
The new default is 30 seconds.
I think this parameter should be deprecated and a new, more error-tolerant
solution should be used. To enable this future change I indicated it in the
code and in the User Guide.
Tested only with the unit tests.
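For illustration, the timeout can still be overridden per sink in the agent configuration. A minimal sketch (the agent/sink names a1/k1 and the HDFS path are placeholders; the value assumes the property is given in milliseconds):

```properties
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Raise the call timeout above the new 30 s default on a busy cluster:
a1.sinks.k1.hdfs.callTimeout = 60000
```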

TaildirSource.process() now implements the correct polling logic. It returns
Status.READY / Status.BACKOFF, which controls the common backoff sleeping
mechanism implemented in PollableSourceRunner.PollingRunner (instead of
always returning Status.READY and sleeping inside the method, which was
incorrect behaviour).
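The corrected contract can be sketched like this (a toy model, not the actual Flume code; the Status enum, the queue standing in for the tailed files, and the batch size are simplified stand-ins):

```java
import java.util.ArrayDeque;
import java.util.Queue;

class PollingSketch {
    enum Status { READY, BACKOFF }

    static final int BATCH_SIZE = 10;                 // illustrative batchSize
    static Queue<String> input = new ArrayDeque<>();  // stand-in for tailed files

    // Correct behaviour: return BACKOFF when nothing was read and let the
    // runner (PollableSourceRunner.PollingRunner in Flume) do the sleeping,
    // instead of sleeping inside this method.
    static Status process() {
        int taken = 0;
        while (taken < BATCH_SIZE && !input.isEmpty()) {
            input.poll();           // deliver the event to the channel here
            taken++;
        }
        return taken > 0 ? Status.READY : Status.BACKOFF;
    }
}
```

The runner sleeps with an increasing backoff only when BACKOFF is returned, which is exactly the mechanism an in-method sleep bypasses.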

If there are multiple files in the path(s) that need to be tailed and one
file is written to at high frequency, then Taildir can read batchSize events
from that file every time. This can lead to an endless loop in which Taildir
only reads data from the busy file, while the other files are never
processed.
Another problem is that in this case TaildirSource will also be unresponsive
to stop requests.

This commit handles this situation by introducing a new config property called
maxBatchCount. It controls the number of batches read consecutively from the
same file. After reading maxBatchCount batches from a file, Taildir will
switch to another file or take a break in processing.
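A minimal sketch of the new property in an agent config (agent/source names, the file group and the value are placeholders):

```properties
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = fg1
a1.sources.r1.filegroups.fg1 = /var/log/app/.*\.log
# Move on to the next file after at most 100 consecutive batches:
a1.sources.r1.maxBatchCount = 100
```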

Hadoop 1/2 profiles were obsolete and had not been used for a long time,
so they have been deleted.
HBase profile was always active, so its content has been moved to
top level.
Additional clean-ups: some version declarations moved to the parent pom,
redundant version declarations and exclusions deleted.

This has been tested with unit tests. The main difference that caused the most
problems is the consumer.poll(Duration) change: unlike the previous
poll(long timeout), which blocked indefinitely while fetching metadata, the
new call does not block even when it fetches metadata.
This resulted in many test timing issues. I tried to make minimal changes to
the tests, just enough to make them pass.
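The behavioural difference can be modelled without a broker (a toy model, not Kafka client code; only the timeout semantics are mimicked):

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class PollSemantics {
    private final Queue<String> buffer = new ArrayDeque<>();

    void deliver(String record) { buffer.add(record); }

    // New-style poll(Duration): waits at most `timeout`, then returns whatever
    // is available - possibly an empty batch. The old poll(long), in contrast,
    // could block indefinitely while metadata was being fetched, which is what
    // broke the timing assumptions in the tests.
    List<String> poll(Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (buffer.isEmpty() && System.nanoTime() < deadline) {
            Thread.onSpinWait(); // real Kafka does network I/O here, not spinning
        }
        List<String> out = new ArrayList<>();
        while (!buffer.isEmpty()) {
            out.add(buffer.poll());
        }
        return out; // empty if the deadline passed with nothing delivered
    }
}
```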

Kafka 2.0 requires a higher slf4j version, so I had to update it to 1.7.25.

Option migrateZookeeperOffsets is deprecated in this PR.
This will allow us to get rid of Kafka server libraries in Flume.

Compatibility testing.
Modified the TestUtil to be able to use external servers. This way I could test
against a variety of Kafka Server versions using the normal unit tests.
Channel tests using 2.0.1 client:
Kafka_2.11-0.9.0.0 - not compatible
Kafka_2.11-0.10.0.0 - not compatible
Kafka_2.11-0.10.1.0 - passed with TestPartition timeouts
(rerunning the single test passes, so it is a test isolation issue)
Kafka_2.11-0.10.2.0 - passed with TestPartition timeouts
(rerunning the single test passes, so it is a test isolation issue)
Kafka_2.11-0.11.0.3 - timeouts in TestPartitions when creating topics
Kafka_2.11-1.0.2 - passed
Kafka_2.11-1.1.1 - passed
Kafka_2.11-2.0.1 - passed

This is based on the contributions for FLUME-2653 regarding a new feature
for the hdfs sink.
Added a new parameter, hdfs.emptyInUseSuffix, to allow the output file name
to remain unchanged. See the user guide changes for details.
This is a desired feature from the community.
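A minimal sketch of how the parameter might be set (agent/sink names and the path are placeholders; see the user guide for the exact semantics of the flag):

```properties
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Keep the output file name unchanged while the file is in use:
a1.sinks.k1.hdfs.emptyInUseSuffix = true
```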

I added a new JUnit test case for testing.
I also temporarily modified the old test cases in my IDE to use the new flag,
and they passed. I did this just as an extra test, to be on the safe side;
those modifications are not in this PR.

In the newer version of the Syslog message format (RFC 5424) the hostname
is no longer a mandatory header, so the Syslog client might not send it.
On the Flume side this would be useful information that could be used
in interceptors or for event routing.
To keep this information, two new properties have been added to the Syslog
sources: clientIPHeader and clientHostnameHeader.
Flume users can define custom event header names through these parameters
for storing the IP address / hostname of the Syslog client in the Flume
event as headers.
The IP address / hostname are retrieved from the underlying network sockets,
not from the Syslog message.
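A minimal sketch of the new properties (agent/source names, the port and the header names client_ip/client_host are placeholders chosen by the user):

```properties
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
# Store the peer address/hostname from the network socket in these headers:
a1.sources.r1.clientIPHeader = client_ip
a1.sources.r1.clientHostnameHeader = client_host
```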

This change is based on the patch submitted by Jinjiang Ling; it has been
rebased onto the current trunk and the review comments have been implemented.

The Kafka client does not handle -D keystore parameters directly, so Flume has
to pass them explicitly as Kafka properties (like ssl.keystore.location, etc.).
The same method is also used for the truststore, so the keystore and
truststore are handled in the same way.
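For illustration, the pass-through properties might look like this on a Kafka sink (agent/sink names, paths and passwords are made up; the kafka.producer. prefix is assumed to be Flume's pass-through convention for producer properties):

```properties
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.producer.security.protocol = SSL
a1.sinks.k1.kafka.producer.ssl.keystore.location = /path/to/keystore.jks
a1.sinks.k1.kafka.producer.ssl.keystore.password = keystore-secret
a1.sinks.k1.kafka.producer.ssl.truststore.location = /path/to/truststore.jks
a1.sinks.k1.kafka.producer.ssl.truststore.password = truststore-secret
```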

This is based on @mcsanady's original pull request #202.
I took the test changes from it but reworked the new feature implementation
since it failed some unit tests.
Previously, when a close failed, we immediately did a lease recovery.
This PR introduces a background retry mechanism. It uses the already
existing "hdfs.closeTries" parameter. Unfortunately that parameter defaults
to infinite retries, which seems a bit too long to me.
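Until that default changes, the retry count can be bounded explicitly; a sketch (agent/sink names are placeholders, the value is illustrative):

```properties
a1.sinks.k1.type = hdfs
# Give up on closing a file after 5 background attempts instead of retrying forever:
a1.sinks.k1.hdfs.closeTries = 5
```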

I also did some minimal code clean-up. The most important change is that the
HDFSWriter writer field in BucketWriter became final. This is essential for
later use in inner classes; only some testing shortcuts kept it non-final,
and I reworked those to use the constructor.

This makes it possible to specify global/common SSL keystore parameters (path,
password and type) at the Flume agent (process) level for all sources/sinks.
This way it is not necessary to define (i.e. copy) the SSL config for each
component in the agent config.

The global SSL parameters can be specified through the standard -D JSSE
system properties or in environment variables.
Component level configuration is still possible.
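For example, using the standard JSSE system property names (the flume-ng options, path and password are illustrative):

```
flume-ng agent --name a1 --conf conf --conf-file example.conf \
  -Djavax.net.ssl.keyStore=/path/to/keystore.jks \
  -Djavax.net.ssl.keyStorePassword=changeit \
  -Djavax.net.ssl.keyStoreType=JKS
```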

This PR is based on Yan Jian's fix and his test improvements.
It also contains the deadlock reproduction contributed by @adenes.
I have made minimal changes to those contributions.
Denes's test was used for checking the fix.
Yan's fix contains an optimization: it first calls the callback function
that removes the BucketWriter from the cache.
This is useful and should help avoid some errors.

The loadSources() method seemed like an appropriate place to check this.
Added two new interfaces for getting the transaction capacity and the batch
size fields. The check is only done for channels that implement the
TransactionCapacitySupported interface and for sources and sinks that
implement the BatchSizeSupported interface.
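The two interfaces can be sketched as follows (the names come from this change; the exact signatures and return types are assumptions):

```java
// Implemented by channels that can report their transaction capacity.
interface TransactionCapacitySupported {
    long getTransactionCapacity();
}

// Implemented by sources and sinks that can report their batch size.
interface BatchSizeSupported {
    long getBatchSize();
}
```

During configuration loading, the validator can then check batchSize <= transactionCapacity, but only for components that opt in by implementing these interfaces.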

By introducing error counters it will be easier to monitor problems.
Errors are also categorized, which will hopefully help in setting up better
monitoring solutions.

Concept: an error occurs when an Exception is thrown or an ERROR level log is
written during event processing. In case of an error at least 1 error counter
is incremented at least once (preferably 1 counter, once).
Only errors during event processing are counted;
initialization errors are not handled here.
3 types of errors are differentiated:
- Channel read/write errors: counted when the channel throws a
ChannelException.
- Event read/write errors: e.g. a source cannot read an event.
- Generic errors: e.g. TaildirSource cannot write its position file.

HBase2Sink is the equivalent of HBaseSink for HBase version 2.
HBaseSink used some API calls which were deprecated in HBase 1.x
and they are not available in HBase 2.x any more.

HBase2Sink has been implemented by copying the existing
flume-ng-hbase-sink module to the new flume-ng-hbase2-sink module,
then adjusting the incompatible API calls to HBase 2.
The package and class names have also been modified to have
the hbase2/HBase2 tag. "Hbase" typos have been fixed too.

The functionality provided by HBase2Sink and the configuration parameters
are the same as in the case of HBaseSink (except the hbase2 tag in the sink
type and the package/class names).

HBaseSink has not been modified, so it works with HBase 1.x as before.

FLUME-3222 Fix for NoSuchFileException thrown when files are being deleted
from the TAILDIR source

We fetch the file names from a directory and later we fetch the inodes.
If a file is deleted between these two operations, this problem occurs.
I reproduced it in a unit test and added exception handling for this case.
It is enough to ignore the NoSuchFileException and continue.
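The handling can be sketched like this (illustrative code, not the actual patch; the "unix:ino" attribute lookup stands in for Taildir's inode resolution and works only on POSIX file systems):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class InodeReader {
    // Returns the file's inode, or -1 if the file was deleted between the
    // directory listing and this attribute read (the race described above).
    static long getInodeOrSkip(Path file) throws IOException {
        try {
            return (long) Files.getAttribute(file, "unix:ino");
        } catch (NoSuchFileException e) {
            return -1; // file vanished: safe to ignore and continue with the next one
        }
    }
}
```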

Adding an interface with 3 implementations to provide functionality at the
configuration level to replace variables/keys from external sources. This
component is capable of hiding sensitive information or injecting generated
data into the configuration.

The implementation affects only the configuration layer so existing components
(sinks/sources/channels/etc) do not have to change and new components can
already have it through the configuration.

New custom implementations can be easily added even in plugin form.

Each implementation has unit tests in their module and an integration test in
the flume-ng-tests module.

Many unit tests use hardcoded port numbers which leads to flakiness and causes
problems when running builds in parallel.
This patch fixes this issue by searching for available ports instead of the
hardcoded ones.
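The replacement pattern is essentially this (a common sketch, not necessarily the exact helper used in the patch):

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortFinder {
    // Bind to port 0 so the OS picks a currently free ephemeral port,
    // then release it and hand the number to the test.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```

There is still a small race window between releasing and re-binding the port, but this removes the systematic collisions caused by hardcoded port numbers.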

flume-checkstyle breaks the assembly because its parent is not flume-parent.
Removing the moduleSets definition from the src assembly solved the issue.
Files are added based on fileSets; the resulting tarball's content is equal
to the result of dev-support/generate-source-release.sh in a clean
working directory.

This patch fixes the infinite loop between Kafka source and Kafka sink
by introducing the following configuration parameters in those components:
- topicHeader in Kafka source to specify the name of the header in which the
source stores the name of the topic the event came from.
- setTopicHeader in Kafka source to control whether the topic name is stored
in the given header.
- topicHeader in Kafka sink to configure the name of the header which
is used to specify in which topic to send the event.
- allowTopicOverride in Kafka sink to control whether the target topic's name
can be overridden by the specified header.
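A sketch of a loop-safe setup built from these parameters (agent/component names are placeholders):

```properties
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# Record the origin topic of each event in the "topic" header:
a1.sources.r1.setTopicHeader = true
a1.sources.r1.topicHeader = topic

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Ignore the header so events are not sent back to their origin topic:
a1.sinks.k1.allowTopicOverride = false
```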

FLUME-3154. Add HBase client version check to AsyncHBaseSink and HBaseSink

The current implementation of HBaseSink and AsyncHbaseSink is not
compatible with the 2.0 version of HBase, which will be released soon.
This change adds a check and makes these sinks fail gracefully if
incompatible HBase jars are found on the classpath.

Upgrading the versions in dependency management and removing unused ones.
Both the 1.x and 2.x Jackson versions are needed, so the jackson.version
property has been renamed to codehaus.jackson.version and
fasterxml.jackson.version has been added for Jackson 2.x.

Flume has a snappy-java dependency with version 1.1.0. When building Flume on
the ppc64le architecture, errors such as "[FAILED_TO_LOAD_NATIVE_LIBRARY] no
native library is found for os.name=Linux and os.arch=ppc64le" are seen.
Native libraries for ppc64le were added in snappy-java version 1.1.1,
hence Flume needs a higher version of snappy-java.

Log4jAppender treats Collection messages as a special case, making it possible
to log a Collection of events in one Log4j log call. The appender sends these
events to the receiving Flume instance as one batch with the
rpcClient.appendBatch() method.

The Flume user guide does not specify whether a value in an event header can
be null or not.
Given an external system generating events whose header values can be null,
a user who configures Flume with the Memory Channel will have no trouble.
When the user later changes the Memory Channel to the File Channel, however,
Flume will fail with an NPE.
This is because the File Channel serializes events with protocol buffers, and
header values are defined as required in the proto file.
In this patch I have changed the value field to optional. However, protocol
buffers have no notation for null, and setting a field to null raises an NPE
again, so I added a null check before serialization to prevent this.
There is one caveat: when an optional field is not set, at deserialization it
will be set to a default value, in this case the empty string.
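The effect of the null check can be sketched like this (illustrative; the real fix guards the protobuf setter rather than copying the header map):

```java
import java.util.HashMap;
import java.util.Map;

class HeaderSanitizer {
    // Protocol buffers cannot represent null, and an unset optional field
    // deserializes to the empty string anyway, so map null values to "".
    static Map<String, String> sanitize(Map<String, String> headers) {
        Map<String, String> out = new HashMap<>();
        headers.forEach((k, v) -> out.put(k, v == null ? "" : v));
        return out;
    }
}
```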

- Make sure this.stop() releases the resources and leaves the component in
the LifecycleAware.STOPPED state
- Added junit test to cover the invalid host scenario
- Added junit test to cover the used port scenario

This patch adds the following new metrics to the FileChannel's counters:
- eventPutErrorCount: incremented if an IOException occurs during put operation.
- eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
during take operation.
- checkpointWriteErrorCount: incremented if an exception occurs during checkpoint write.
- unhealthy: this flag represents whether the channel has started successfully
(i.e. the replay ran without any problem), so the channel is capable of normal
operation
- closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.