Effectively-once Java topologies

This document pertains to the older, Storm-based Heron Topology API.

Heron now offers two separate APIs for building topologies: the original, Storm-based Topology API, and the newer Streamlet API. Topologies created using the Topology API can still run on Heron and there are currently no plans to deprecate this API. We would, however, recommend that you use the Streamlet API for future work.

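You can create Heron topologies that have effectively-once semantics by doing two things:

Set the reliability mode of the topology to EFFECTIVELY_ONCE in the topology configuration.

Make every component (each spout and bolt) implement the IStatefulComponent interface. That interface requires two methods: preSave, which is called with the checkpoint ID just before the component's state is saved, and initState, which initializes the state of the component to that of a previous checkpoint.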

Remember that stateful components automatically handle all state storage in the background using a State Manager (the currently available State Managers are ZooKeeper and the local filesystem). You don’t need to, for example, save state to an external database.
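
The checkpoint-and-restore cycle described above can be pictured with a small in-memory simulation. This is a sketch only, not Heron's implementation: the CheckpointSketch class, its save and restore methods, and the use of a HashMap as the backing store are all hypothetical stand-ins for what a State Manager does behind the scenes.

```java
import java.util.HashMap;
import java.util.Map;

final class CheckpointSketch {
    // Hypothetical stand-in for the storage a State Manager would provide
    private final Map<String, Map<String, Integer>> checkpoints = new HashMap<>();

    // Stores a copy of the component's state under the checkpoint ID
    void save(String checkpointId, Map<String, Integer> state) {
        checkpoints.put(checkpointId, new HashMap<>(state));
    }

    // Returns a fresh copy of the state recorded at the checkpoint,
    // or an empty map if no such checkpoint exists
    Map<String, Integer> restore(String checkpointId) {
        return new HashMap<>(checkpoints.getOrDefault(checkpointId, new HashMap<>()));
    }

    public static void main(String[] args) {
        CheckpointSketch manager = new CheckpointSketch();
        Map<String, Integer> state = new HashMap<>();
        state.put("count", 42);
        manager.save("checkpoint-1", state);

        // Changes made after the checkpoint are not part of it
        state.put("count", 99);

        // A restarted component would begin from the last checkpoint
        Map<String, Integer> recovered = manager.restore("checkpoint-1");
        System.out.println("Recovered count: " + recovered.get("count"));
    }
}
```

Running main prints the count as it stood at checkpoint-1 (42) rather than the later value, which is the behavior initState relies on when a component starts from a previous checkpoint.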

The State class

Heron topologies with effectively-once semantics need to be stateful topologies (you can also create stateful topologies with at-least-once or at-most-once semantics). All state in stateful topologies is handled through a State class, which has the same semantics as a standard Java Map and so includes methods like get, put, putIfAbsent, keySet, compute, forEach, merge, and so on.
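
Because State follows Map semantics, these operations can be tried out against any java.util.Map. The sketch below uses a plain HashMap as a stand-in for a State<String, Integer>, an assumption made so that the snippet runs without Heron on the classpath. The StateMapSketch class and its exercise method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class StateMapSketch {
    // Applies a few Map-style operations to the given state and returns it
    static Map<String, Integer> exercise(Map<String, Integer> state) {
        // Seed the key only if nothing has been stored for it yet
        state.putIfAbsent("count", 0);

        // Read with a fallback value, then overwrite
        int current = state.getOrDefault("count", 0);
        state.put("count", current + 5);

        // Combine a new value with whatever is already stored
        state.merge("count", 10, Integer::sum);
        return state;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = exercise(new HashMap<>());
        state.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Starting from an empty map, the count key ends at 15. Starting from a map that already holds a count, putIfAbsent leaves the stored value alone, which is the behavior you would want when state has been restored from a checkpoint.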

Each stateful spout or bolt must be associated with a single State object that handles the state, and that object must also be typed as State<K, V>, for example State<String, Integer>, State<Long, MyPojo>, etc. An example usage of the state object can be found in the example topology below.

Example effectively-once topology

In the sections below, we’ll build a stateful topology with effectively-once semantics from scratch. The topology will work like this:

A RandomIntSpout will continuously emit random integers between 1 and 100.

An AdditionBolt will receive those random numbers and add each number to a running sum. When the sum exceeds 1,000,000, it will go back to zero. The bolt won’t emit any data but will simply log the current sum.
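
The running-sum rule can be checked in isolation before it is wired into a bolt. The following is a minimal sketch in plain Java, using the same strictly-greater-than comparison as the bolt code later in this document. The RunningSumSketch class, the LIMIT constant, and the nextSum method are hypothetical names introduced for illustration.

```java
import java.util.Random;

final class RunningSumSketch {
    static final int LIMIT = 1_000_000;

    // Adds the incoming value to the current sum, resetting to zero
    // once the new sum exceeds the limit
    static int nextSum(int currentSum, int incoming) {
        int newSum = currentSum + incoming;
        return newSum > LIMIT ? 0 : newSum;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int sum = 0;
        for (int i = 0; i < 5; i++) {
            int value = random.nextInt(100) + 1; // 1 to 100 inclusive
            sum = nextSum(sum, value);
            System.out.println("The current sum is: " + sum);
        }
    }
}
```

Note that a sum of exactly 1,000,000 is kept; only a sum above that value triggers the reset.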

You can see the code for another stateful Heron topology with effectively-once semantics in this word count example.

Example stateful spout

The RandomIntSpout shown below emits an unending series of random integers between 1 and 100 in the random-int field.

It’s important to note that all components in stateful topologies must be stateful (i.e. implement the IStatefulComponent interface) for the topology to provide effectively-once semantics. That includes spouts, even simple ones like the spout in this example.

import com.twitter.heron.api.spout.BaseRichSpout;
import com.twitter.heron.api.spout.SpoutOutputCollector;
import com.twitter.heron.api.state.State;
import com.twitter.heron.api.topology.IStatefulComponent;
import com.twitter.heron.api.topology.OutputFieldsDeclarer;
import com.twitter.heron.api.topology.TopologyContext;
import com.twitter.heron.api.tuple.Fields;
import com.twitter.heron.api.tuple.Values;

import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class RandomIntSpout extends BaseRichSpout implements IStatefulComponent<String, Integer> {
    private SpoutOutputCollector spoutOutputCollector;
    private State<String, Integer> count;

    public RandomIntSpout() {
    }

    // Generates a random integer between 1 and 100
    private int randomInt() {
        return ThreadLocalRandom.current().nextInt(1, 101);
    }

    // These two methods are required to implement the IStatefulComponent interface
    @Override
    public void preSave(String checkpointId) {
        System.out.println(String.format("Saving spout state at checkpoint %s", checkpointId));
    }

    @Override
    public void initState(State<String, Integer> state) {
        count = state;
    }

    // These three methods are required to extend the BaseRichSpout abstract class
    @Override
    public void open(Map<String, Object> map, TopologyContext ctx, SpoutOutputCollector collector) {
        spoutOutputCollector = collector;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("random-int"));
    }

    @Override
    public void nextTuple() {
        int randomInt = randomInt();
        // Emit through the collector that was stored in open()
        spoutOutputCollector.emit(new Values(randomInt));
    }
}

A few things to note in this spout:

The spout's state is held in the count variable, which is of type State<String, Integer>. This simple spout never reads from or writes to that object; it exists because every component in an effectively-once topology must be stateful.

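This is a very simple topology, so the preSave method simply logs the current checkpoint ID. In a more complex component, this hook could be used to bring the State object up to date (for example, by writing buffered values into it) before the checkpoint is taken.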

The initState method simply accepts the current state as-is. This method can be used for a wide variety of purposes, for example deserializing the State object to a user-defined type.

Only one field will be declared: the random-int field.
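
A spout that does make use of its state might, for instance, track how many tuples it has emitted. The sketch below illustrates that idea in plain Java, with a Map standing in for State<String, Integer> so that it runs without Heron on the classpath. The CountingSpoutSketch class and its emitted method are hypothetical, and its initState and nextTuple methods only mirror the shape of the real ones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

final class CountingSpoutSketch {
    private Map<String, Integer> count = new HashMap<>();

    // Mirrors initState: adopt the restored state and seed a default
    void initState(Map<String, Integer> state) {
        count = state;
        count.putIfAbsent("count", 0);
    }

    // Mirrors nextTuple: produce a value and record that it was emitted
    int nextTuple() {
        int value = ThreadLocalRandom.current().nextInt(1, 101);
        count.merge("count", 1, Integer::sum);
        return value;
    }

    // Number of tuples emitted, including those counted before a restore
    int emitted() {
        return count.getOrDefault("count", 0);
    }
}
```

If the state passed to initState already holds a count from an earlier checkpoint, the tally continues from that value instead of restarting at zero.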

Example stateful bolt

The AdditionBolt takes incoming tuples from the RandomIntSpout and adds each integer to produce a running sum. If the sum ever exceeds 1 million, then it resets to zero.

import com.twitter.heron.api.bolt.BaseRichBolt;
import com.twitter.heron.api.bolt.OutputCollector;
import com.twitter.heron.api.state.State;
import com.twitter.heron.api.topology.IStatefulComponent;
import com.twitter.heron.api.topology.OutputFieldsDeclarer;
import com.twitter.heron.api.topology.TopologyContext;
import com.twitter.heron.api.tuple.Tuple;

import java.util.Map;

public class AdditionBolt extends BaseRichBolt implements IStatefulComponent<String, Integer> {
    private OutputCollector outputCollector;
    private State<String, Integer> count;

    public AdditionBolt() {
    }

    // These two methods are required to implement the IStatefulComponent interface
    @Override
    public void preSave(String checkpointId) {
        System.out.println(String.format("Saving bolt state at checkpoint %s", checkpointId));
    }

    @Override
    public void initState(State<String, Integer> state) {
        count = state;
    }

    // These three methods are required to extend the BaseRichBolt abstract class
    @Override
    public void prepare(Map<String, Object> map, TopologyContext ctx, OutputCollector collector) {
        outputCollector = collector;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt has no output fields, so none will be declared
    }

    @Override
    public void execute(Tuple tuple) {
        // Extract the incoming random integer from the arriving tuple
        int incomingRandomInt = tuple.getIntegerByField("random-int");

        // Get the current sum from the count object, defaulting to zero in case
        // this is the first processing operation.
        int currentSum = count.getOrDefault("count", 0);

        int newSum = incomingRandomInt + currentSum;

        // Reset the sum to zero if it exceeds 1,000,000
        if (newSum > 1000000) {
            newSum = 0;
        }

        // Update the count state
        count.put("count", newSum);

        System.out.println(String.format("The current saved sum is: %d", newSum));
    }
}

A few things to notice in this bolt:

As in the RandomIntSpout, all state is handled by the count variable, which is of type State<String, Integer>. In that state object, the key is always count, while the value is the current sum.

As in the RandomIntSpout, the preSave method simply logs the current checkpoint ID.

The bolt has no output (it simply logs the current stored sum), so no output fields need to be declared.

Putting the topology together

Now that we have a stateful spout and bolt in place, we can build and configure the topology:

import com.twitter.heron.api.Config;
import com.twitter.heron.api.HeronSubmitter;
import com.twitter.heron.api.exception.AlreadyAliveException;
import com.twitter.heron.api.exception.InvalidTopologyException;
import com.twitter.heron.api.topology.TopologyBuilder;
import com.twitter.heron.api.tuple.Fields;

public class EffectivelyOnceTopology {
    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
        Config topologyConfig = new Config();

        // Apply effectively-once semantics and set the checkpoint interval to 10 seconds
        topologyConfig.setTopologyReliabilityMode(Config.TopologyReliabilityMode.EFFECTIVELY_ONCE);
        topologyConfig.setTopologyStatefulCheckpointIntervalSecs(10);

        // Build the topology out of the example spout and bolt
        TopologyBuilder topologyBuilder = new TopologyBuilder();
        topologyBuilder.setSpout("random-int-spout", new RandomIntSpout());
        topologyBuilder.setBolt("addition-bolt", new AdditionBolt())
                .fieldsGrouping("random-int-spout", new Fields("random-int"));

        // The topology name is taken from the first command-line argument
        HeronSubmitter.submitTopology(args[0], topologyConfig, topologyBuilder.createTopology());
    }
}

By default, Heron uses the local filesystem as a State Manager. If you’re running Heron locally using the instructions in the Quick Start Guide, you won’t need to change any settings to run this example stateful topology with effectively-once semantics.