In practice, a single peer functions as both a source and a target node, often simultaneously.

Important: In a typical indexer cluster deployment, all the peer nodes are source nodes; that is, each node has its own set of external inputs. This is not a requirement, but it is generally the best practice. There is no reason to reserve some peers for use just as target nodes. The processing cost of storing replicated data is minimal, and, in any case, you cannot currently specify which nodes will receive replicated data. The master determines that on a bucket-by-bucket basis, and the behavior is not configurable. You must assume that all the peer nodes will serve as targets.

Note: In addition to replicating external data, each peer replicates its internal indexes to other peers in the same way. To keep things simple, this discussion focuses on external data only.

How the target peers are chosen

Whenever the source peer starts a hot bucket, the master node gives it a list of target peers to stream its replicated data to. The list is bucket-specific. If a source peer is writing to several hot buckets, it could be streaming the contents of each bucket to a different set of target peers.

The master chooses the list of target peers randomly. In the case of multisite clustering, it respects site boundaries, as dictated by the replication factor, but chooses the target peers randomly within those constraints.

When a peer node starts

These events occur when a peer node starts up:

1. The peer node registers with the master and receives the latest configuration bundle from the master.

2. The master rebalances the primary bucket copies across the cluster and starts a new generation.

3. The peer starts ingesting external data, in the same way as any indexer. It processes the data into events and then appends the data to a rawdata file. It also creates associated index files. It stores these files (both the rawdata and the index files) locally in a hot bucket. This is the primary copy of the bucket.

4. The master gives the peer a list of target peers for its replicated data. For example, if the replication factor is 3, the master gives the peer a list of two target peers.

5. If the search factor is greater than 1, the master also tells the peer which of its target peers should make its copy of the data searchable. For example, if the search factor is 2, the master picks one specific target peer that should make its copy searchable and communicates that information to the source peer.

6. The peer begins streaming the processed rawdata to the target peers specified by the master. It does not wait until its rawdata file is complete to start streaming its contents; rather, it streams the rawdata in blocks, as it processes the incoming data. It also tells any target peers if they need to make their copies searchable, as communicated to it by the master in step 5.

7. The target peers receive the rawdata from the source peer and store it in local copies of the bucket.

9. The peer continues to stream data to the targets until it rolls its hot bucket.

Note: The source and target peers rarely communicate with each other through their management ports. Usually, they just send and receive data to each other over their replication ports. The master node manages the overall process.

This is just the breakdown for data flowing from a single peer. In a cluster, multiple peers will be both originating and receiving data at any time.

When a peer node rolls a hot bucket

When a source peer rolls a hot bucket to warm (for example, because the bucket has reached its maximum size), the following sequence of events occurs:

1. The source peer tells the master and its target peers that it has rolled a bucket.

2. The target peers roll their copies of the bucket.

3. The source peer continues ingesting external data as this process is occurring. It indexes the data locally into a new hot bucket and streams the rawdata to a new set of target peers that it gets from the master.

4. The new set of target peers receive the rawdata for the new hot bucket from the source peer and store it in local copies of the bucket. The targets with designated searchable copies also start creating the necessary index files.

5. The source peer continues to stream data to the targets until it rolls its next hot bucket. And so on.

How a peer node interacts with a forwarder

When a peer node gets its data from a forwarder, it processes it in the same way as any indexer getting data from a forwarder. However, in a clustering environment, you should ordinarily enable indexer acknowledgment for each forwarder sending data to a peer. This protects against loss of data between forwarder and peer and is the only way to ensure end-to-end data fidelity. If the forwarder does not get an acknowledgment for a block of data it has sent to a peer, it resends the block.

Enter your email address, and someone from the documentation team will respond to you:

Send me a copy of this feedback

Please provide your comments here. Ask a question or make a suggestion.

Feedback submitted, thanks!

You must be logged into splunk.com in order to post comments.
Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic.
If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk,
consider posting a question to Splunkbase Answers.

0
out of 1000 Characters

Your Comment Has Been Posted Above

We use our own and third-party cookies to provide you with a great online experience. We also use these cookies to improve our products and services, support our marketing campaigns, and advertise to you on our website and other websites. Some cookies may continue to collect information after you have left our website.
Learn more (including how to update your settings) here »