
Complex use case - Interleaved splitter with countdown timer

The title of this post probably doesn't make much sense on its own, so I'll explain here. It was very difficult to give this use case a name. Anyway, the description follows.

Simple scenario

Imagine the flow is triggered by an event of some sort. A single message arrives. That message then needs to be split into 60 separate smaller messages in an ordered fashion. Then, once per second, one of those 60 messages must be released to the next endpoint, which is the outbound channel adapter where the flow terminates.

Simple scenario solution

The above can be done by initially using a gateway to publish event messages to the first channel. Then a splitter can be used to split the large message into a series of small messages, which are then sent to a bounded queue channel. Finally, a poller element on the next endpoint retrieves one message every second. If I'm correct, that should cover the simple scenario.
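As a rough illustration of the splitter-plus-bounded-queue idea, here is a hypothetical sketch in plain Java (the class and method names are made up; a real flow would use Spring Integration's splitter, queue channel, and poller elements):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the simple scenario: a "splitter" fills a
// bounded queue, and a "poller" drains one message per tick
// (one second in the real flow).
public class SimpleScenario {

    // "Splitter": break one large message into ordered small messages.
    static List<String> split(String largeMessage, int parts) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= parts; i++) {
            result.add(largeMessage + ":SM" + i);
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded "queue channel" holding at most 60 messages.
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(60);
        for (String small : split("LM1", 60)) {
            channel.put(small);
        }
        // "Poller": in the real flow this would fire once per second;
        // here we just take the head to show FIFO ordering is preserved.
        System.out.println(channel.take()); // LM1:SM1
    }
}
```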

Complex scenario

Unfortunately the simple scenario is a little too simple, and real life is not like that. We have several event messages arriving continuously in real time. And once each of these larger messages has been split into smaller messages, the ordering of the smaller messages from all the larger messages must happen in an interleaved yet FIFO fashion. Here is an example.

A large message M1 gets split into A3, A2, A1
A large message M2 gets split into B3, B2, B1
A large message M3 gets split into C3, C2, C1

The outgoing ordering of the smaller messages to the next endpoint every second must be:

Second 1 - A1, B1, C1
Second 2 - A2, B2, C2
Second 3 - A3, B3, C3

That is what I mean by interleaving smaller messages originating from multiple different larger messages.

Complex scenario solution

For the complex scenario, extending the simple scenario solution to multiple messages simply doesn't work. The bounded queue channel may need to be arbitrarily large due to the volume of inbound messages into the flow, while an unbounded queue channel carries the theoretical risk of an OOM error. Also, a single queue wouldn't allow us to interleave as shown above.

The only solution I've been able to come up with is for the first endpoint (E1) to split the large message and populate a FIFO queue, and then to relay that queue to the next endpoint (E2), which maintains a set of queues. Then every second E2 pulls the head of each registered FIFO queue in the set and sends them to the next endpoint (E3), which is the outbound channel adapter where the flow terminates. This last step could be E2 publishing to a gateway that E3 is listening on, or E3 could potentially poll E2.
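For illustration, the queue-set behaviour described for E2 could be sketched in plain Java (this is a hypothetical sketch, not Spring Integration API): each large message contributes one FIFO queue, and every tick releases the head of every registered queue, which yields exactly the A1, B1, C1 / A2, B2, C2 interleaving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of E2: maintains a set of FIFO queues, one per
// large message, and on each tick releases the head of every queue.
public class Interleaver {
    private final List<Queue<String>> queues = new ArrayList<>();

    // E1 registers the FIFO queue produced by splitting a large message.
    public void register(Queue<String> queue) {
        queues.add(queue);
    }

    // One "tick" (one second in the real flow): release the head of each
    // registered queue, dropping queues that have been fully drained.
    public List<String> tick() {
        List<String> released = new ArrayList<>();
        queues.removeIf(q -> {
            String head = q.poll();
            if (head != null) {
                released.add(head);
            }
            return q.isEmpty();
        });
        return released;
    }
}
```

Each `tick()` result would then be sent on to E3; the sketch keeps no state beyond the queues themselves.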

Now at this point one could ask the question that I asked myself: why is E2, being an endpoint, doing the job of a channel? I have thought about writing a custom channel to do the job of E2 in as generic a way as possible - essentially to maintain a series of queues, take the head off each of the queues, and send them to the next endpoint. However, before I did this I wanted to ask the experts here. I've also considered a traditional resequencer and splitter, but I can't see how they would help in this case.

If I understand the scenario, you don't really need a set of queues but just a delay of one second for every message sent. So, if you have 10 producer/consumer pairs, then you will have 10 message-per-second throughput, correct?

If so, then I think what you really need is to just add a DelayQueue in the mix. There are a couple of ways to do this, and some other related forum/JIRA discussions that you may want to browse: http://jira.springframework.org/browse/INT-636

I'd be interested if that matches your use case.

For now, I would suggest just using a ChannelInterceptor that adds a DelayQueue on the sending side (essentially adding a one second delay before each message actually arrives on the channel).
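As a sketch of the mechanism underneath that suggestion (plain `java.util.concurrent`, not the interceptor itself), a `DelayQueue` holds each element until its delay expires and releases elements in expiry order, regardless of insertion order. The `DelayedMessage` wrapper below is made up for illustration; the delays are in milliseconds purely to keep the example fast.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper: a message only becomes visible on the
// "channel" (the DelayQueue) once its delay has expired.
public class DelayedMessage implements Delayed {
    final String payload;
    final long releaseAt; // absolute release time in nanoseconds

    DelayedMessage(String payload, long delayMillis) {
        this.payload = payload;
        this.releaseAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(releaseAt - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                            other.getDelay(TimeUnit.NANOSECONDS));
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<DelayedMessage> channel = new DelayQueue<>();
        channel.put(new DelayedMessage("A2", 100)); // later expiry, inserted first
        channel.put(new DelayedMessage("A1", 10));  // earlier expiry, inserted second
        // take() blocks until the earliest expiry, so A1 comes out first.
        System.out.println(channel.take().payload); // A1
        System.out.println(channel.take().payload); // A2
    }
}
```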

As you can see, we are planning to implement this for 2.0, so your feedback would be highly appreciated (and it should be relatively easy for you to plug in your own solution in the meantime... hopefully, based on your feedback, that would be aligned with our plans for 2.0).

FWIW, I agree with your statement that an endpoint should not be doing a channel's job, and to me, this feels like a responsibility of a channel.

Regards,
Mark

Comment

Thank you very much for your help. Yes, soon after I posted here I began to find the JIRA and the associated blog posts via Google. And yes, your understanding of the use case in the previous post is correct, I think, although the ordering requirement of those N messages is specific and at specific intervals.

I've done a very small example Eclipse application test harness with a main method that runs using the header-based delay interceptor. However, it does not behave as expected, as is evident from the log output.

For a single large message, when split, I expect to see one small message received further on every second, and when more than one large message arrives I would like to see the interleaving that I described earlier. So if LM1 (which stands for large message 1) and LM2 arrive, I'd like to see the following, where LM1:SM1 stands for small message 1 originating from large message 1.

This sample splits sentences into words and groups by word. You can customize the poller on the groups channel if you need to separate groups in time. As the messages are in the right order and you can treat them as groups or individually, I think all itches are scratched.

Let me know if I'm missing something.

Of course this implementation doesn't change the fact that we could have better support for these advanced cases, so you don't have to spell them out quite so much.

Comment

Iwein, thanks for your feedback and example, which I had a look at. I had considered an aggregator some time back when thinking about possible flow designs. However, I didn't adopt it because there is no finite bound that the aggregate can check against. The aggregation is never complete. I'll try to explain better below.

In my previous example I used three large messages arriving and resulting in three small messages each. However, in reality N large messages may arrive. Although each of them horizontally only produces 90 small messages, vertically, if you take the head of what each of the large messages splits into, I have to time-release N small messages per time unit in sequence to the next endpoint. This is a real-time flow that may receive any amount of data and proportionally has to time-release that amount of data with soft guarantees about delivery timing (taking into account threads and GC and so on).

LM1 - A90, A89 ... A1
LM2 - B90, B89 ... B1
LMn - C90, C89 ... C1

So here I receive three large messages (LM1, LM2, LMn) and release, in the first second, three small messages (A1, B1, C1), except in reality it would be N instead of three.

So far I have achieved this in a custom fashion using a splitter, a delay channel, and a channel interceptor (as per the JIRA). That works nicely because the delay queue is able to sort A1, B1, and C1 to be earlier than the rest no matter how many messages are in the queue, and thanks to this crucial benefit I don't have to maintain any state, which I'm very glad about. However, it is tricky to design this flow in a way that will scale to arbitrary numbers of incoming messages, especially if you consider the thread-to-message allocation.

I still have a few questions about my flow. Threads from the thread pool get used up far too quickly and I'm wondering why, so I've had to reduce the interval trigger frequency and also increase the number of threads. The getDelay() method is called numerous times per message, which may well be legitimate DelayQueue behaviour, but I'm still looking into it. The receive timeout isn't set for one of my pollers, so it assumes the default of 1000.

Lastly, I'm also wondering if channelling all incoming load through one delay queue is going to cause contention, but I really can't think of how else to do it, because the delay queue needs to know about all the messages in it in order to sort by expiry. I'm beginning to wonder if I'll be needing an Azure machine with 800-odd processors and JDK 7 fork/join and parallel arrays. Ultimately our settings will only be guesstimates, and we'll need feedback from production and thread pool monitoring to tell us what's really going on.

I intend to blog my final solution which is very similar to the zip file that I attached earlier. I'll try and do this by the weekend. It would be great to get your feedback on it. Thanks.

Comment

However I didn't adopt it because there is no finite bound that the aggregate can check against. The aggregation is never complete.

You lost me there. The completion strategy can make a decision based on any parameters, so you don't need a finite bound. There is, however, a bug in the aggregator that doesn't allow you to create multiple aggregates for the same correlation id. That could obviously cause you trouble.

The main reason I'm scratching my head here is that, in order to interleave the messages, you will have to make a decision about when to go from 1's to 2's. If you can't be sure you've completed all the 1's, the interleaving won't work properly. So if you put exactly that decision (which is apparently good enough) in a completion strategy, you should be grand... right?

Comment

I've done a sample flow based on yours and I still can't see what the body of isComplete() should be in my case. Large messages are always coming in, and each of them will have 1's. The 1's must be prioritised continuously in real time for whatever new large messages have arrived at any point. I still need to consider this more. Is there an isComplete() implementation that comes to your mind? I can't see how to utilise the one in your example, as there is no constant figure I can use. I will continue to think about this.

Comment

I did a sample application with a priority queue, and although the priority queue certainly behaves as expected, it is not suitable for my use case, as I need a timed release of messages from the queue.

The reason for needing a timed release is simple: I need a countdown timer. Another way of thinking about this is that I'm implementing a server-driven heartbeat mechanism. The heartbeat originates from the server and is pushed out to a client via some sort of asynchronous push technology, and the client can then react to each of the heartbeats.

So essentially a Spring Integration flow event results in a 90-second countdown with a heartbeat sent every second. A delay queue works nicely, as it is able to release by expiry. So, having patched up the zip file I originally posted, I have what I want. I will blog this soon.
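A minimal sketch of such a countdown heartbeat in plain Java, assuming a `ScheduledExecutorService` stands in for the poller (the class name, tick count, and period here are illustrative; the real flow uses 90 ticks at one second each):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical countdown heartbeat: an event starts an N-tick countdown,
// emitting one heartbeat per interval and signalling when it completes.
public class CountdownHeartbeat {

    public static void start(int ticks, long periodMillis,
                             IntConsumer onBeat, Runnable onDone) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        final int[] remaining = {ticks};
        scheduler.scheduleAtFixedRate(() -> {
            onBeat.accept(remaining[0]); // heartbeat carrying the remaining count
            if (--remaining[0] == 0) {
                onDone.run();            // countdown finished
                scheduler.shutdown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        // 3 ticks at 50ms for the example; the real flow is 90 ticks at 1000ms.
        start(3, 50, tick -> System.out.println("heartbeat " + tick),
              done::countDown);
        done.await();
    }
}
```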

Please let me know if you wish to know anything further or if you have any other possible implementations.

Also, an interesting question to ponder is whether Spring Integration should natively support a heartbeat mechanism, or whether it is sufficient to just natively support a delay channel and let the user implement whatever heartbeat mechanism they need. I think it is very difficult to generalise on heartbeat use cases, but I would be interested to hear your opinion on this.

I might be missing something entirely, but I thought I would throw out one more idea:

event-source -> queue-channel <- poller -> output-channel

Of course, since you apparently have multiple event sources (A, B, C in the examples), this could involve multiple queue-channel/poller pairs and a router between the event-source and those queue-channels.
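A hypothetical sketch of that routing idea in plain Java (the class name and key scheme are made up; a real flow would use a Spring Integration router element in front of the queue channels):

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical router: each event source (A, B, C, ...) gets its own
// bounded queue channel, created lazily on first use; a poller per
// queue would then drain it toward the output channel.
public class SourceRouter {
    private final Map<String, BlockingQueue<String>> channels =
            new ConcurrentHashMap<>();

    // Route a message to the queue channel for its source key.
    public void route(String sourceKey, String message) throws InterruptedException {
        channels.computeIfAbsent(sourceKey, k -> new ArrayBlockingQueue<>(100))
                .put(message);
    }

    // A poller for a given source would receive from this queue.
    public BlockingQueue<String> channelFor(String sourceKey) {
        return channels.get(sourceKey);
    }
}
```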

Does this make any sense at all for the use case?

-Mark

Comment

Also, an interesting question to ponder is whether Spring Integration should natively support a heartbeat mechanism, or whether it is sufficient to just natively support a delay channel and let the user implement whatever heartbeat mechanism they need. I think it is very difficult to generalise on heartbeat use cases, but I would be interested to hear your opinion on this.

Support for a DelayQueue is a good first step. I think we'll have to wait a bit and see different things like yours come up in the wild before choosing an approach in the framework (if any). Of course, it doesn't hurt to create an issue for it once you have your version done; it might turn out to be a very good option.

I want to pass on all the A(1) (highest priority) messages to the next queue after processing. But before taking C(2)-type messages from the priority channel, I want to have some delay, because there could be a case where some client process puts in more A(1)-type messages. The same applies while taking D(3)-type messages.