I am not able to reproduce with Junit test case having BrokerService started with in the test case. I guess I am not hitting the right stress conditions this way. But when I run the test case against an externally running ActiveMQ instance backed with oracle database persistence, it is reproducible most of the times. This is not a every time failure situation, it takes more time once than the other.

I was able to hit this situation of stuck messages on queue using following scenario most of the times:

1) Start 2 concurrent consumers for the queue using Spring's DefaultMessageListenerContainer using cacheLevelName as CACHE_CONSUMER
2) Send messages using JMETER 2.3.2 to the queue on ActiveMQ stand alone broker instance with 50 threads looping 20 times.
3) After a while, you will notice that Spring logs that no messages are being received but the messages are shown jconsole of ActiveMQ and the database backing it for persistence.

But in 5.2 RC3, the problem is that it dispatches duplicate messages and does not remove them from broker's database after acknowledge properly.

Attached test case might help to reproduce when run against externally running stand alone ActiveMQ broker. Another way to see the problem is that try to load test using JMETER by sending messages to a queue with a camel route that moves messages from this queue to another and you will notice that it stops moving after while or copied duplicates in case of 5.2 RC3.

Sorry about such a huge description but it is a real problem! A different team at our company are having this issue in production with 5.1. They are using it as an embedded broker with derby for persistence.

Gary Tully
added a comment - 04/Dec/08 11:44 on 5.2 giving duplicate messages. The recent fix for AMQ-2020 may be relevant. If you get a chance, can you try out last nights 5.3-SNAPSHOT build. http://people.apache.org/repo/m2-snapshot-repository/org/apache/activemq/apache-activemq/5.3-SNAPSHOT/

For what it's worth: the 5.3-SNAPSHOT does seem to fix the duplicate message issue (no more exceptions are reported), but it does not fix this bug. Our test runs hang our broker at more or less the same point (1.4 million messages) for 5.1, 5.2 and 5.3-SNAPSHOT.

Maarten Dirkse
added a comment - 09/Dec/08 11:36 For what it's worth: the 5.3-SNAPSHOT does seem to fix the duplicate message issue (no more exceptions are reported), but it does not fix this bug. Our test runs hang our broker at more or less the same point (1.4 million messages) for 5.1, 5.2 and 5.3-SNAPSHOT.

We have been able to reproduce this issue in our test runs every time.

The only problem, and the reason I haven't got a unit test for you, is that it takes a very long time to rear its ugly head.

The minimum setup which exposes the issue is have a producer (P1) send a lot (6 million in our case) messages to a topic (t1) and have one consumer (C1) consume the messages from the topic and publish them to a queue (q1).

P1 --> [t1] --> C1 --> [q1]

Both P1 and C1 produce and consume / produce messages as fast as they can. Because P1 produces faster than C1 can consume, we get a fast producer / slow consumer for that topic.

At the end of the run, there are 6 million messages on the queue, but if you add a consumer to the queue, it won't consume any messages. If you browse the queue using the webconsole, it appears empty. Only restarting ActiveMQ gets q1 working again.

In our previous tests, we discovered that this "freezing" of dispatching (it affects other queues than q1 as well) starts somewhere during the publishing of the 6 million messages to t1.

We are using a standalone broker, Kaha for persistence and Spring 2.5.4 for JMS support.

We have tested with 5.1.0, 5.2.0 and 5.3-SNAPSHOT. Using 5.1.0 and the snapshot we ran into the issue, with 5.2.0 we ran into AMQ-2020 and so didn't test any further.

We saw some peculiar values for the inflight counter on t1, this leveled out at 32766 for hours on end, sometimes going down to 32765 for a couple of seconds. Not sure if that is significant, but I am including a screenshot of JConsole just in case.

Maurits Lucas
added a comment - 15/Dec/08 10:29 We have been able to reproduce this issue in our test runs every time.
The only problem, and the reason I haven't got a unit test for you, is that it takes a very long time to rear its ugly head.
The minimum setup which exposes the issue is have a producer (P1) send a lot (6 million in our case) messages to a topic (t1) and have one consumer (C1) consume the messages from the topic and publish them to a queue (q1).
P1 --> [t1] --> C1 --> [q1]
Both P1 and C1 produce and consume / produce messages as fast as they can. Because P1 produces faster than C1 can consume, we get a fast producer / slow consumer for that topic.
At the end of the run, there are 6 million messages on the queue, but if you add a consumer to the queue, it won't consume any messages. If you browse the queue using the webconsole, it appears empty. Only restarting ActiveMQ gets q1 working again.
In our previous tests, we discovered that this "freezing" of dispatching (it affects other queues than q1 as well) starts somewhere during the publishing of the 6 million messages to t1.
We are using a standalone broker, Kaha for persistence and Spring 2.5.4 for JMS support.
OS: Linux (2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux)
Java:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
We have tested with 5.1.0, 5.2.0 and 5.3-SNAPSHOT. Using 5.1.0 and the snapshot we ran into the issue, with 5.2.0 we ran into AMQ-2020 and so didn't test any further.
We saw some peculiar values for the inflight counter on t1, this leveled out at 32766 for hours on end, sometimes going down to 32765 for a couple of seconds. Not sure if that is significant, but I am including a screenshot of JConsole just in case.

Brecht Yperman
added a comment - 15/Dec/08 17:14 This is some testcode (very ugly, sorry) that keeps a consumer opened, and then tries to retrieve a message using a second consumer, but that consumer never receives the message

Brecht Yperman
added a comment - 15/Dec/08 17:18 If I introduce a step between 10 and 11: receive message using consumer2, the message sent in step 6 is received.
http://activemq.apache.org/point-to-point-with-multiple-consumers.html
This says the behaviour is undefined, so I shouldn't nag, I guess?

Maurits Lucas
added a comment - 30/Dec/08 10:45 Ho hum, turns out we had producer flow control disabled in our setup. Switching it on solved the issue, now our tests process all 5.9 million messages correctly and ActiveMQ no longer hangs.
And I think the max value for the inflight counter of 32766 observed in earlier tests is explained by the default prefetch size for topics: 32766.

We ran into the same problem using topics (with 5.2.0) - we use only TCP transport and persistence is disabled.

After some time (will make this more specific when I get more data from support) our appication stops receiving and publishing messages on a topic.
When I check the broker using JMX console ~13000 messages in flight are reported.

Lukasz Zielinski
added a comment - 11/Mar/09 09:02 We ran into the same problem using topics (with 5.2.0) - we use only TCP transport and persistence is disabled.
After some time (will make this more specific when I get more data from support) our appication stops receiving and publishing messages on a topic.
When I check the broker using JMX console ~13000 messages in flight are reported.

Don't want to say anything about the bug itself but I took consumertest.zip and turned it into a Maven JUnit test (see attached testcase.zip) in order to investigate into this problem. This new testcase also reproduces the same behavior however I do not believe this testcase shows a bug. Let me explain why.

Both consumers listen on the same queue.
The first consumer only closes its session after the second consumer tried to receive the message that the second producer sent to the queue. So the first consumer is still active when the second consumer calls f_consumer.receive(5000);

So what happens at runtime is that even though both consumers use a pull mode (by calling consumer.receive(5000); ) there is a default prefetch size (as this is not set explicitly) that is used for each consumer.
The first consumer acked the first message so it is available to receive more messages (even though it does not actively call f_consumer.receive()). So when the second message appears on the queue, the broker sends it right to the first consumer where it stays in the prefetch queue until it either gets received by the consumer calling f_consumer.receive() or the session gets closed. If in the testcase you call

consumerTest.receiveMessage(dinges);

rather than

consumerTestNew.receiveMessage(dinges);

the message is received fine.

So there are two ways to work around this:

1. have the first consumer close its session before the second consumer receives the message:

Torsten Mielke
added a comment - 18/Mar/09 18:07 - edited Don't want to say anything about the bug itself but I took consumertest.zip and turned it into a Maven JUnit test (see attached testcase.zip) in order to investigate into this problem. This new testcase also reproduces the same behavior however I do not believe this testcase shows a bug. Let me explain why.
Both consumers listen on the same queue.
The first consumer only closes its session after the second consumer tried to receive the message that the second producer sent to the queue. So the first consumer is still active when the second consumer calls f_consumer.receive(5000);
So what happens at runtime is that even though both consumers use a pull mode (by calling consumer.receive(5000); ) there is a default prefetch size (as this is not set explicitly) that is used for each consumer.
The first consumer acked the first message so it is available to receive more messages (even though it does not actively call f_consumer.receive()). So when the second message appears on the queue, the broker sends it right to the first consumer where it stays in the prefetch queue until it either gets received by the consumer calling f_consumer.receive() or the session gets closed. If in the testcase you call
consumerTest.receiveMessage(dinges);
rather than
consumerTestNew.receiveMessage(dinges);
the message is received fine.
So there are two ways to work around this:
1. have the first consumer close its session before the second consumer receives the message:
ConsumerProblemTest.java
...
consumerTest.close();
//create second consumer and read msg
DurableConsumer consumerTestNew = new DurableConsumer();
consumerTestNew.init();
consumerTestNew.receiveMessage(dinges);
consumerTestNew.close();
2. use a prefetch limit of 0 so messages to not get prefetched to consumers:
DurableConsumer.java
env.put(InitialContext.PROVIDER_URL, "tcp: //localhost:61616?jms.prefetchPolicy.queuePrefetch=0" );
With any of the two changes the testcase succeeds. Give it a go. Simply run mvn test, it should fail out of the box. Then make these changes, the testcase will succeed.

I also took the second test case DispatchMultipleConsumersTest.java and wrapped it in a maven JUnit project (see AMQ-2009Testcase2.zip) to run it more quickly. I first tested against version 5.3 and did not reproduce any errors. The JUnit test succeeded many times (even under a higher load than initially implemented). When testing against version 5.1 I did reproduce the duplicate message problem AMQ-2020 but not this bug!

So that exhausts the two testcases we have so far. I have not been able to reproduce this issue.
Has anyone else who commented on this bug tried version 5.3 yet? What are the results?

Torsten Mielke
added a comment - 20/Mar/09 14:27 I also took the second test case DispatchMultipleConsumersTest.java and wrapped it in a maven JUnit project (see AMQ-2009 Testcase2.zip) to run it more quickly. I first tested against version 5.3 and did not reproduce any errors. The JUnit test succeeded many times (even under a higher load than initially implemented). When testing against version 5.1 I did reproduce the duplicate message problem AMQ-2020 but not this bug!
So that exhausts the two testcases we have so far. I have not been able to reproduce this issue.
Has anyone else who commented on this bug tried version 5.3 yet? What are the results?

I see this in Trunk 771718. Our network of 4 brokers are running for almost 30 days, then we see a broker is not dispatching msgs. It has about 400mb data in its data directory.

We restart the application to talk to another broker, restart the problematic broker, it starts fine, bridged ok, sees the consumer on the other broker, but all the msgs stuck on this broker is not dispatched. They are stuck.

I will try to gather more information when I get more idea. A simple unit test of the case is ok with less msgs. I suspect this happens when a lot of msgs are stuck on the queue, then it fails to dispatch even after restart. BTW, we have enough memory on the box for the broker to run and jconsole show it is using only a portion. There is no error in the log with debug turned on.

DemandForwardingBridgeSupport's serviceLocalCommand is supposed to be called but not. Any possible threading issue? Regarding the demandforwarding bridge, do you know the name of the thread I shall look in jconsole. due to the complexity of our system, there are about 170 live threads in the jconsole for this broker. Maybe a thread is blocked.

Any suggestion is welcome. I am looking into this issue until i find a fix.

ying
added a comment - 10/Jun/09 22:45 I see this in Trunk 771718. Our network of 4 brokers are running for almost 30 days, then we see a broker is not dispatching msgs. It has about 400mb data in its data directory.
We restart the application to talk to another broker, restart the problematic broker, it starts fine, bridged ok, sees the consumer on the other broker, but all the msgs stuck on this broker is not dispatched. They are stuck.
I will try to gather more information when I get more idea. A simple unit test of the case is ok with less msgs. I suspect this happens when a lot of msgs are stuck on the queue, then it fails to dispatch even after restart. BTW, we have enough memory on the box for the broker to run and jconsole show it is using only a portion. There is no error in the log with debug turned on.
DemandForwardingBridgeSupport's serviceLocalCommand is supposed to be called but not. Any possible threading issue? Regarding the demandforwarding bridge, do you know the name of the thread I shall look in jconsole. due to the complexity of our system, there are about 170 live threads in the jconsole for this broker. Maybe a thread is blocked.
Any suggestion is welcome. I am looking into this issue until i find a fix.

1. setup a network of broker1, broker2, bridged by multicast discovery
2. make a producer send 5 msg to queueA to broker2
3. make a consumer to consume from broker1 queueA ( make it slow, so it only consumer 1 msg) but make sure all 5 msg from broker2 are forwared to broker1
4. stop the consumer to broke1, restart it to consume from broker2 queueA
5. the 4 msgs originally published to broker2 and forwarded to broker1 and has not yet been consumed will stuck on broker1 and will not forwarded to broker2 for the consumer to consume.

You can only clear out the 4 left over stuck msg by making consumer consume from broker1 in this case.

This is a very critical issue because you might restart your application many times and run into this.

I will look into how the demandfowardbridge handles this case. if you know anything about this, please help. this is very urgent. thanks

ying
added a comment - 11/Jun/09 23:26 I have a simple cause which can cause dispatch problem:
1. setup a network of broker1, broker2, bridged by multicast discovery
2. make a producer send 5 msg to queueA to broker2
3. make a consumer to consume from broker1 queueA ( make it slow, so it only consumer 1 msg) but make sure all 5 msg from broker2 are forwared to broker1
4. stop the consumer to broke1, restart it to consume from broker2 queueA
5. the 4 msgs originally published to broker2 and forwarded to broker1 and has not yet been consumed will stuck on broker1 and will not forwarded to broker2 for the consumer to consume.
You can only clear out the 4 left over stuck msg by making consumer consume from broker1 in this case.
This is a very critical issue because you might restart your application many times and run into this.
I will look into how the demandfowardbridge handles this case. if you know anything about this, please help. this is very urgent. thanks

SuoNayi
added a comment - 27/Jan/10 01:51 - edited I encountered the same question three times.
A standalone AMQ(5.2.0) server and three durable subscribers(because the bug with message selector) without using selector.
It works fine usually(sending or receiving messages).
A few days later, I use one of the subscribers to send a message successfully but the subscribers cannot receive any messages at all.
Restart the subscriber, it does not work yet.
Restart the AmqBroker, the subscribers receive the message.
But I donot know what's wrong?
I'm really crazy!
Any help ?

Is anyone using activemq in a very high volume situation and NOT running into this after a couple days? It seems like there should be a great amount of 'outrage' over this one, as it is quite disruptive. Wouldn't this be affecting just about everyone?

We are using activemq 5.2 in production. 3 sites have lower volume and we don't really have any problems. One site has high volume, and after 2 or 3 days at most, it will just fall asleep. No errors, no out of memory or anything, logs remain totally healthy. The messages continue to make it into the activemq DB without problems, it grows and grows, but yet messages just stop going from activemq to the subscriber.

because this is happening so regularly, the root problem isn't even understood, a fix is a long way away, and we can't "just restart" the broker in production without going through a bunch of hoops, we've deemed it not fit for production and now have to figure out a different queuing solution. =(

John Newman
added a comment - 31/Mar/10 17:03 Is anyone using activemq in a very high volume situation and NOT running into this after a couple days? It seems like there should be a great amount of 'outrage' over this one, as it is quite disruptive. Wouldn't this be affecting just about everyone?
We are using activemq 5.2 in production. 3 sites have lower volume and we don't really have any problems. One site has high volume, and after 2 or 3 days at most, it will just fall asleep. No errors, no out of memory or anything, logs remain totally healthy. The messages continue to make it into the activemq DB without problems, it grows and grows, but yet messages just stop going from activemq to the subscriber.
because this is happening so regularly, the root problem isn't even understood, a fix is a long way away, and we can't "just restart" the broker in production without going through a bunch of hoops, we've deemed it not fit for production and now have to figure out a different queuing solution. =(

have you tried to set producerFlowControl to false ( the default is true and the broker just keep silent usually happening for fast producers and slow consumers and message will be piling up and not doing dispatch at all)

ying
added a comment - 31/Mar/10 18:41 have you tried to set producerFlowControl to false ( the default is true and the broker just keep silent usually happening for fast producers and slow consumers and message will be piling up and not doing dispatch at all)
<policyEntry queue=">" producerFlowControl="false">
<dispatchPolicy>
<roundRobinDispatchPolicy/>
</dispatchPolicy>
</policyEntry>

Yes, I know about this and use the default value.But the producer is not fast as far as I know.
This case can reproduce when the borker's keepDurableSubsActive is set to be false.
It seems that the dispatching is blocked by FilePendingMessageCursor.

SuoNayi
added a comment - 01/Apr/10 01:47 Yes, I know about this and use the default value.But the producer is not fast as far as I know.
This case can reproduce when the borker's keepDurableSubsActive is set to be false.
It seems that the dispatching is blocked by FilePendingMessageCursor.

With selectors, if the a matching message does not fit in the maxPageSize for a queue, it will not be consumed till the messages ahead of it are consumed. With lots of sparse selectors this could lead to a hang.
Making the maxPageSize larger than the default 200 will help with that. To make it 1000 for all queue destinations use:

Gary Tully
added a comment - 01/Apr/10 09:32 - edited With selectors, if the a matching message does not fit in the maxPageSize for a queue, it will not be consumed till the messages ahead of it are consumed. With lots of sparse selectors this could lead to a hang.
Making the maxPageSize larger than the default 200 will help with that. To make it 1000 for all queue destinations use:
<destinationPolicy>
<policyMap>
<policyEntries>
<policyEntry queue= ">" maxPageSize= "1000" />
</policyEntries>
</policyMap>
</destinationPolicy>
A more detailed explanation can be found http://trenaman.blogspot.com/2009/01/message-selectors-and-activemq.html

Can somebody please comment whether upgrade to 5.3.1 helped in their case? As we are facing the same issue in general and prior deep investigation we plan to do the upgrade from 5.2 to 5.3.1. So a feedback whether (or in how many cases) the upgrade helps will be more than welcome.

Pavel Moravec
added a comment - 12/May/10 11:13 Can somebody please comment whether upgrade to 5.3.1 helped in their case? As we are facing the same issue in general and prior deep investigation we plan to do the upgrade from 5.2 to 5.3.1. So a feedback whether (or in how many cases) the upgrade helps will be more than welcome.

We did indeed have horrible problems where hundreds of thousands of messages would be pumping along all week... and maybe once or twice a week.. a particular queue/topic would lock up. The rest of the queues and topics would be okay... but the one would lock up. We went through about three months trying to work it from our side knowing that 5.1.0 had many suspicious issues. We however are in production and upgrading was forbidden. When we finally did upgrade... the problem which got us for months disappear and hasn't shown back up in say 6 months now. So in our case... 5.1.0 -> 5.3 was a life saver.

Rob
added a comment - 14/May/10 19:37 We did indeed have horrible problems where hundreds of thousands of messages would be pumping along all week... and maybe once or twice a week.. a particular queue/topic would lock up. The rest of the queues and topics would be okay... but the one would lock up. We went through about three months trying to work it from our side knowing that 5.1.0 had many suspicious issues. We however are in production and upgrading was forbidden. When we finally did upgrade... the problem which got us for months disappear and hasn't shown back up in say 6 months now. So in our case... 5.1.0 -> 5.3 was a life saver.

We have similar Problems.
We connect with about 50 consumers to one queue. Few messages get stuck in the queue after a while. I can not see any issue in amq.log.
This messages stay in the queue until restart. Another workaround seems to be moving this messages to another queue and moving back - then this messages will be consumed immediatly.
But the workaround of moving out and in again to this queue lead to negativ "Number Of Pending Messages" - which seems to be another bug.
And we have this problems since AMQ 5.1.0 until now with AMQ 5.5.1

Raphael Olszewski
added a comment - 07/Nov/11 09:47 - edited We have similar Problems.
We connect with about 50 consumers to one queue. Few messages get stuck in the queue after a while. I can not see any issue in amq.log.
This messages stay in the queue until restart. Another workaround seems to be moving this messages to another queue and moving back - then this messages will be consumed immediatly.
But the workaround of moving out and in again to this queue lead to negativ "Number Of Pending Messages" - which seems to be another bug.
And we have this problems since AMQ 5.1.0 until now with AMQ 5.5.1

and also check the client logs for any warn level messages.
Then if you can identify the message id of the stuck message so we can correlate with the additional logging we may be able to determine what is going on.

Gary Tully
added a comment - 16/Nov/11 12:46 can you try and reproduce with additional logging, TRACE level logging for:
org.apache.activemq.broker.TransportConnection
org.apache.activemq.broker.region.cursors
org.apache.activemq.broker.region.Queue
org.apache.activemq.broker.region.PrefetchSubscription
and also check the client logs for any warn level messages.
Then if you can identify the message id of the stuck message so we can correlate with the additional logging we may be able to determine what is going on.

Brecht Yperman
added a comment - 16/Nov/11 16:25 I am out of the office and will have limited access to my email.
I will be back on Nov/21/11.
Please contact Vincent Van der Linden (vincent.vanderlinden@invenso.com) for urgent matters

I'll try, but it will be hard, because it is usually only occuring while working on many mesages. And TRACE-Level is generating really a huge amount of data.
But i'll try to catch one with this TRACE-Level-options you suggestet.

Raphael Olszewski
added a comment - 17/Nov/11 06:42 - edited I'll try, but it will be hard, because it is usually only occuring while working on many mesages. And TRACE-Level is generating really a huge amount of data.
But i'll try to catch one with this TRACE-Level-options you suggestet.

Me too with 5.9.0. I always see several messages(usually less than 5) stuck in the queue while there are active consumers. My consumers are implemented by MessageListener. This is not a problem only caused by multi-consumers/providers. I have seen this on a queue with only one provider and one consumer. For example, my latest case: the queue is enqueued with only 3 messages total and 2 are consumed but one seems left in the queue forever. Sometimes the message can be consumed by restarting the consumer but sometime it doesn't help.

Rural Hunter
added a comment - 02/Jul/14 07:43 Me too with 5.9.0. I always see several messages(usually less than 5) stuck in the queue while there are active consumers. My consumers are implemented by MessageListener. This is not a problem only caused by multi-consumers/providers. I have seen this on a queue with only one provider and one consumer. For example, my latest case: the queue is enqueued with only 3 messages total and 2 are consumed but one seems left in the queue forever. Sometimes the message can be consumed by restarting the consumer but sometime it doesn't help.

Rural Hunter
added a comment - 03/Jul/14 10:48 But if there are new messages enqueued, the old stuck message can be consumed. It seems there is a threshold that only it's reached activemq will dispatch the message.

I suspect that this problem is related to storage limit. If you set storeUsage limit to 5GB, you can easily reproduce this problem producing about 50000 items without consumer.
When the consumer start, it don't receive any message.

Eduardo Teodoro
added a comment - 04/Jul/14 00:31 I suspect that this problem is related to storage limit. If you set storeUsage limit to 5GB, you can easily reproduce this problem producing about 50000 items without consumer.
When the consumer start, it don't receive any message.
<storeUsage>
<storeUsage limit="5 gb"/>
</storeUsage>
<tempUsage>
<tempUsage limit="1 gb"/>
</tempUsage>