As mentioned above, the default State Provider for cluster-wide state is the
ZooKeeperStateProvider. At the time of this writing, this is the only
State Provider that exists for handling cluster-wide state. This means that NiFi
depends on ZooKeeper in order to operate as a cluster. However, there are many
environments in which NiFi is deployed where there is no existing ZooKeeper ensemble being
maintained. In order to avoid the burden of forcing administrators to also maintain a
separate ZooKeeper instance, NiFi provides the option of starting an embedded ZooKeeper
server.

| Property | Description |
|---|---|
| nifi.state.management.embedded.zookeeper.start | Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server |
| nifi.state.management.embedded.zookeeper.properties | Properties file that provides the ZooKeeper properties to use if nifi.state.management.embedded.zookeeper.start is set to true |

To start the embedded ZooKeeper server, set the
nifi.state.management.embedded.zookeeper.start property in
nifi.properties to true on those nodes that
should run it. Generally, it is advisable to run ZooKeeper on
either 3 or 5 nodes. Running on fewer than 3 nodes provides less durability in the face of
failure. Running on more than 5 nodes generally produces more network traffic than is
necessary. Additionally, running ZooKeeper on 4 nodes provides no more benefit than
running on 3 nodes, because ZooKeeper requires a majority of nodes to be active in order to
function: a 3-node ensemble needs 2 active nodes and a 4-node ensemble needs 3, so either
one tolerates the loss of only a single node. However, it is up to the administrator to
determine the number of nodes most appropriate to the particular deployment of NiFi.
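
As an illustration, the relevant entries in nifi.properties on a node that hosts an embedded ZooKeeper server might look like the following fragment. This is a minimal sketch, not a complete file; the path shown assumes the properties file lives at ./conf/zookeeper.properties, so adjust it if your installation keeps the file elsewhere.

```properties
# nifi.properties (fragment): set on each node that should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=true

# Assumed location of the ZooKeeper properties file; only consulted when the property above is true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
```

Nodes that should not host ZooKeeper would leave nifi.state.management.embedded.zookeeper.start set to false.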

If the nifi.state.management.embedded.zookeeper.start property is
set to true, the
nifi.state.management.embedded.zookeeper.properties property in
nifi.properties also becomes relevant. This specifies the ZooKeeper
properties file to use. At a minimum, this properties file needs to be populated with the
list of ZooKeeper servers. The servers are specified as properties in the form of
server.1, server.2, to
server.n. Each of these servers is configured as
<hostname>:<quorum port>[:<leader election port>]. For example,
myhost:2888:3888. This list of nodes should be the same nodes in the
NiFi cluster that have the
nifi.state.management.embedded.zookeeper.start property set to
true. Also note that because ZooKeeper will be listening on these
ports, the firewall may need to be configured to open these ports for incoming traffic, at
least between nodes in the cluster. Additionally, the port to listen on for client
connections must be opened in the firewall. The default value for this is
2181 but can be configured via the clientPort
property in the zookeeper.properties file.
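
To make the server list concrete, a zookeeper.properties fragment for a three-node ensemble might look like the sketch below. The hostnames are hypothetical placeholders, and the ports are simply the example and default values mentioned above. Only the properties discussed here are shown, and the exact syntax accepted may differ between NiFi and ZooKeeper versions, so check the file shipped with your installation.

```properties
# zookeeper.properties (fragment): hypothetical three-node ensemble

# Port on which ZooKeeper listens for client connections (2181 is the default)
clientPort=2181

# One entry per ZooKeeper server: <hostname>:<quorum port>[:<leader election port>]
# The hostnames below are placeholders, not real defaults.
server.1=nifi-node1.example.com:2888:3888
server.2=nifi-node2.example.com:2888:3888
server.3=nifi-node3.example.com:2888:3888
```

With this configuration, ports 2888 and 3888 would need to be reachable between the three nodes, and port 2181 would need to accept incoming client connections.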

When using an embedded ZooKeeper, the ./conf/zookeeper.properties
file has a property named dataDir. By default, this value is set to
./state/zookeeper. If more than one NiFi node is running an embedded
ZooKeeper, it is important to tell the server which one it is. This is accomplished by
creating a file named myid and placing it in ZooKeeper's data
directory. The contents of this file should be the index of the server as specified by the
server.<number> property. For example, on one of the ZooKeeper servers, this can be
accomplished by running the following commands: