Setup: Download and Start Flink

Flink runs on Linux, Mac OS X, and Windows. The only requirement for running Flink is a working Java 8.x installation. Windows users should consult the Flink on Windows guide, which describes how to run Flink locally on Windows.

You can check that Java is installed correctly by issuing the following command:

$ java -version
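The Java version can also be inspected from inside the JVM through the standard java.version system property. The following is a minimal sketch, not part of Flink; the class name JavaVersionCheck and the method majorVersion are hypothetical. It assumes the usual version string formats, where Java 8 reports a string beginning with "1.8" and later releases report the major version first.

```java
// Plain-Java illustration only; the class and method names are hypothetical.
class JavaVersionCheck {

    // Extracts the major version from a java.version string:
    // "1.8.0_292" is treated as Java 8, "11.0.2" as Java 11.
    static int majorVersion(String version) {
        String[] parts = version.split("[^0-9]+");
        int first = Integer.parseInt(parts[0]);
        // Releases up to Java 8 use the legacy "1.x" scheme.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version
                + " (major " + majorVersion(version) + ")");
    }
}
```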

Read the Code

You can find the complete source code for this SocketWindowWordCount example in Scala and Java on GitHub.

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object SocketWindowWordCount {

  def main(args: Array[String]): Unit = {

    // the port to connect to
    val port: Int = try {
      ParameterTool.fromArgs(args).getInt("port")
    } catch {
      case e: Exception => {
        System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'")
        return
      }
    }

    // get the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // get input data by connecting to the socket
    val text = env.socketTextStream("localhost", port, '\n')

    // parse the data, group it, window it, and aggregate the counts
    val windowCounts = text
      .flatMap { w => w.split("\\s") }
      .map { w => WordWithCount(w, 1) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1))
      .sum("count")

    // print the results with a single thread, rather than in parallel
    windowCounts.print().setParallelism(1)

    env.execute("Socket Window WordCount")
  }

  // Data type for words with count
  case class WordWithCount(word: String, count: Long)
}

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {

        // the port to connect to
        final int port;
        try {
            final ParameterTool params = ParameterTool.fromArgs(args);
            port = params.getInt("port");
        } catch (Exception e) {
            System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'");
            return;
        }

        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) {
                    for (String word : value.split("\\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
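To make the transformation chain concrete, the following is a minimal plain-Java sketch of what the flatMap and reduce steps in the Java listing above compute for the lines that fall into a single window. It does not use Flink and ignores keying across parallel tasks; the class name WordCountSketch and the method countWords are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; not part of Flink or of the example's source.
class WordCountSketch {

    // Splits each line on single whitespace characters (like the flatMap step)
    // and adds 1 per occurrence of a word (like the reduce step).
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Lines assumed to arrive within one window.
        List<String> window = Arrays.asList("lorem ipsum", "ipsum ipsum ipsum", "bye");
        countWords(window).forEach((word, count) -> System.out.println(word + " : " + count));
    }
}
```

For the three input lines used in main, the sketch counts lorem once, ipsum four times, and bye once.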

Run the Example

Now we are going to run this Flink application. It reads text from
a socket and, once every 5 seconds, prints the number of occurrences of
each distinct word seen during the previous 5 seconds (a tumbling
window of processing time), for as long as words keep flowing in.
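In Flink's DataStream API, timeWindow with a single size argument defines a tumbling window, while timeWindow with a size and a slide, as in the listing's timeWindow(Time.seconds(5), Time.seconds(1)), defines a sliding window. The following plain-Java sketch illustrates the difference. It is not Flink code: the class and method names are hypothetical, and it assumes non-negative millisecond timestamps and windows aligned to time 0.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; Flink's own window assigners are not used here.
class WindowSketch {

    // Start of the single tumbling window of length sizeMs that contains timestampMs.
    static long tumblingWindowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Starts of all sliding windows (length sizeMs, advancing every slideMs)
    // that contain timestampMs, latest window first.
    static List<Long> slidingWindowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long ts = 7_000L; // a word arriving 7 seconds after time 0
        System.out.println(tumblingWindowStart(ts, 5_000L));
        System.out.println(slidingWindowStarts(ts, 5_000L, 1_000L));
    }
}
```

With a 5-second tumbling window, a word arriving at 7 seconds belongs only to the window starting at 5 seconds. With an additional 1-second slide it belongs to five overlapping windows, so it would be reported in five consecutive results.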

First, use netcat to start a local server:

$ nc -l 9000
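If netcat is not available on your system, a small program can play the same role. The following is a minimal sketch of such a stand-in, not something shipped with Flink: the class name LineServer and the method serve are hypothetical. It accepts a single client on the given port (9000 by default) and forwards every line typed on standard input, terminated by a newline character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for `nc -l <port>`; not part of the Flink distribution.
class LineServer {

    // Waits for one client, then sends each line followed by '\n',
    // flushing after every line so the client sees it immediately.
    static void serve(ServerSocket server, Iterable<String> lines) throws IOException {
        try (Socket client = server.accept();
             Writer out = new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
                out.flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9000;
        try (ServerSocket server = new ServerSocket(port);
             BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in))) {
            // Forward lines typed on standard input until end of input.
            serve(server, stdin.lines()::iterator);
        }
    }
}
```

Like nc -l, the sketch serves one connection and exits when standard input ends.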

Submit the Flink program:

$ ./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000
Starting execution of program

The program connects to the socket and waits for input. You can check the web interface to verify that the job is running as expected.

Words are counted in time windows of 5 seconds (processing time, tumbling
windows) and are printed to stdout. Monitor the TaskManager’s output file
and write some text in nc (input is sent to Flink line by line, each time
you press Enter):

$ nc -l 9000
lorem ipsum
ipsum ipsum ipsum
bye

The .out file will show the counts at the end of each time window for as long
as words keep flowing in, e.g.:

$ tail -f log/flink-*-taskexecutor-*.out
lorem : 1
bye : 1
ipsum : 4

To stop Flink when you’re done, type:

$ ./bin/stop-cluster.sh

Next Steps

Check out some more examples to get a better feel for Flink’s programming APIs. When you are done with that, go ahead and read the streaming guide.