I will start with an example of the problem. We have four different data sources that produce three discrete signals, input_1, input_2 and input_3, and one analog signal, input_4.

I'm looking for the best technology to provide a flexible and easy (for the end user) way to transform those signals into new ones. In the picture, the new signals are out_1 and out_2. Pseudo code for them:

The number of signals might be quite big. Data sources and sinks might be distributed. You can think of this as a distributed IoT system, where we need to program logic like "if the TV and the lamp are on, and the temperature is above 20, turn the air conditioning on". Each signal might arrive every second/minute/millisecond.
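To make the kind of rule I mean concrete, here is a minimal sketch of the "if TV and lamp are on, and temperature > 20, turn AC on" example. The signal names, thresholds and return values are all made up for illustration; the real system would of course evaluate this over distributed streams, not a local dict:

```python
# Sketch of one rule: keep the latest value per signal and re-evaluate
# the rule whenever any signal changes. All names here are hypothetical.
latest = {}

def evaluate_rule():
    """'If the TV and the lamp are on, and temperature > 20, turn the AC on.'"""
    if latest.get("tv") and latest.get("lamp") and latest.get("temperature", 0) > 20:
        return "ac_on"
    return "ac_off"

def on_signal(name, value):
    """Handle one incoming signal update and return the rule's decision."""
    latest[name] = value
    return evaluate_rule()

on_signal("tv", True)
on_signal("lamp", True)
print(on_signal("temperature", 23.5))  # -> ac_on
```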

The technologies I'm looking at right now:

Kafka as a broker

RabbitMQ as a broker

Spark

Spark Streaming

Flink

Storm

Kafka Streams

Akka

Akka Streams

Reactive Streams

maybe some more

I'm quite new to this field, and it's difficult to immediately understand this whole zoo and choose the right tool.

I already have a self-made solution for this problem, built on an actor model (like Akka, but not Akka) over RabbitMQ. But it is not as scalable as I want. I could probably build a solution on any of these technologies, but some of them would just be duct tape.

There might be additional requirements for the system, such as: recalculation when a new signal arrives or an existing one is updated, late-arriving data, data loss for some signals over some periods, etc. But I hope these problems can be solved at the application level.

Questions:

Is time-series data a good fit for a Kafka broker? What if a producer sends its events not immediately, but after a delay, e.g. its internet connection drops, so it buffers events internally and sends them later?

Are there any performance/syntax/reliability advantages among Spark Streaming, Flink and Akka Streams?

Is it possible to recalculate over old data with a streaming framework (Flink, Spark Streaming)? With actor models?

If I want to find how many peaks a signal had over a certain time period, is that easy to implement with Spark or Flink?

What might my pseudo code for out_1 and out_2 look like in Spark/Flink/Storm? These frameworks provide examples where only a single stream is used as input, but in my situation there might be an unbounded number of inputs and probably only one output.
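The pattern I imagine (and would like confirmed for these frameworks) is: union all inputs into one stream of tagged records (timestamp, signal_name, value), then compute outputs from per-signal state. A minimal Python sketch of that pattern, assuming each source's stream is already sorted by timestamp; out_1's formula here is invented purely for illustration:

```python
import heapq

def merged(*streams):
    """Merge per-source streams (each sorted by timestamp) into one ordered stream.

    This stands in for a union/connect of many input streams in a real framework.
    """
    yield from heapq.merge(*streams)

def compute_outputs(events):
    """Maintain the latest value of each signal and emit out_1 on every update."""
    latest = {}
    for ts, name, value in events:
        latest[name] = value
        # Hypothetical rule: out_1 is true while input_1 and input_2 are both set.
        yield ts, "out_1", bool(latest.get("input_1")) and bool(latest.get("input_2"))

s1 = [(1, "input_1", True), (4, "input_1", False)]
s2 = [(2, "input_2", True)]
outputs = list(compute_outputs(merged(s1, s2)))
print(outputs[-1])  # -> (4, 'out_1', False)
```

The point is that "infinitely many inputs" collapses into one stream of tagged records, so the single-input examples in the framework docs still apply.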