I could notice that there are a lot of synchronized code calling synchronized code, calling in turn... guess what ?... synchronized code. Since this happens while running concurrent threads, it can easily create deadlocks. After commenting out all synchronized modifiers I could find in the path, it seems that the thing is working.

But it's clear that some synchronization must be put back to secure the situation. My guess is that it should only be put at the lowest level, i.e. only when accessing ibuf in EV3MotorRegulatorKernelModule class, and not in any of the calling methods. I'll try this strategy tomorrow to see if it works as expected.

Hi,I don't think things are as bad as you think. Have you tried simply making the change I described (removing synchronized on the getTachoCount method)? Most of the synchronized tags are required (but this one is not) and have been present for a long time, I added a few new ones as part of the new motor code and this is what is causing the problems. The real issue here is the use of listeners and the fact that they hold locks. I have plans to resolve this in the longer term but for now I'd just like to get things working again.

Have you tried simply making the change I described (removing synchronized on the getTachoCount method)?

Yes I did it as the first attempt, and it didn't fix the problem. I then tried to "unroll" the wire, removing others one by one until it worked.

The real issue here is the use of listeners and the fact that they hold locks.

About the listeners, I noticed too that there are some synchronization "stacks" such as EV3MotorPort.checkComplete being synchronized and calling the listeners with EV3MotorPort.getTachoCount() (which is synchronized too on the same instance) as one of their args.

gloomyandy wrote:Please remove the synchronized from getTachoCount and post a stack trace of the deadlocked code. It's not a good idea to just remove the tags.

I agree with you. I'll do it tonight.

However I have noticed the the version of the sources I use is not the same as the one you pointed me in a previous message. I'm working with 0.6.0 tag, as suggested in the public documentation. As far as I remember (I'm not using my private machine at this time), the version of the sources I'm working with does not use the same type of synchronization (synchronized method versus synchronized statements) as the one you have pointed to. It could be wiser maybe to try modifying first the synchronization type before removing it. What's your opinion ?

gloomyandy wrote:I'm not aware of any significant difference between the master version and 0.6.0-alpha in this respect, both versions use a mixture of synchronization mechanisms.

Sorry, I've been misled by getTachoCount existing in two classes in the same source file.

I've applied the suggested fix and it seems to behave normally now. No more lock for the moment.

Since the robot is navigating now, I found something weird elsewhere, causing the computed heading to be off 90 degrees after a couple of moves. I have to discuss with Porgy and investigate a bit more to clarify what happens.

Some details about the context, so that messages are a bit more understandable : the robot is a coloured object chaser, which detect objects in its path, and depending on their colour bring them to distinct locations. The first part of the action is to travel until either it reaches the limit (300 mm) or finds something in its path.

My understanding of the situation is that there is some ping-pong game between the pilot and the motors when stopping the move, which leads to listeners being called a first time in response to the motors telling the pilot that they are stopped by invoking the its rotationStopped callback from checkComplete, which triggers in turn the listeners because calling movementStop, and after that the pilot calling again movementStop near the end of its stop method.

Thanks for the report. Not really my area of expertise to be honest. But another reason why I really don't like listeners very much. They are very easy to get wrong, and fixes tend to be very ugly because you have to supply all of the context via sort of global state. Anyway enough of my moaning. I'll try and get one of the other devs that has a better understanding of this stuff to take a look.

It happens that I've worked with OSGi for professional activities for several years, and I 100% agree about what the authors says about preferring other patterns such as the white board for instance. The involved project is an embedded open platform for ambient instrumentation of buildings, for research work at the beginning, but which is also used nowadays for operational systems (photo-voltaic fields monitoring for instance). We recently moved from OSGi to another approach for relaxing the "all Java" constraint (we need our platform be implementation language agnostic and allow multi-language implementation of its components, just like in a Linux working environment you can find parts written in C/C++, others in bash, others in Python,...), but we kept the white board principle, and replaced the OSGi integrated communication bus by D-Bus. I'm not a ROS specialist, but I have understood that it works in a similar way.

WRT to the Navigator module, there are BTW a couple of things which surprised me a bit at implementation level, and some parts of the code look a bit convoluted too (from my humble point of view). I rewrote them trying to clarify the logic while working on understanding how it works. I can send you my version if ever you are interested in. Let's be clear : I'm not telling that it was not good work, but just that I would have done things differently. I can be wrong of course.

I agree with Andy; I do not think the Event Generator - Listener pattern is a good one to use. I think we could eliminate all the listeners and perhaps it would be thread safe. I have looked at the existing code a bit, and I think the major change would be in the PoseProvider. Ir could have a method void UpdatePose(Pose aPose, Move aMove) As small modification of the Navigator would be necessary also. I have not tried anything out, and would to look at any alternative Navigator designs.But before spending much time, there should be some agreement about this change to the Navigation classes. This is a good time to do it, and the change would not be such a big departure from NXT as the SeensorFramework.Roger

As said above, I agree with both of you wrt listeners. They are ok for simple collaboration schemes but as soon as things get more complicated, they are a source of brain damages, especially when cascades or loops of listeners are created, most of the time by side effect.

Roger, which kind of architecture would you favor so that the notification connections are not hardwired ? I' mentioned stuff like the white board pattern since IMHO, although it uses a form of listener inside, it cures the listeners spaghetti plate syndrome by being more or less based on a hub communication topology.

Hi Eric,I recommend a very simple structure for navigation object collaboration: There should be only one object that controls the drive motors ( a Pilot) and only one object that controls the Pilot, a Navigator for example. The Navigator is responsible for updating its pose, and uses a PoseProvider to do the calculations. The PoseProvider can be greatly simplified; it needs only the method: void UpdatePose (Pose p, Move m). Pilot already has a method Move getMovement(); Whenever the Navigator needs an updated pose, it calls poseProvider. update (myPose, myPilot.getMove). But it must update its pose whenever the Pilot completes a movement so it contains an inner class that tests pilot.isMoving() in its main loop and updates the pose every time the a movement is completed. With this restricted object collaboration, there is no need for listeners. Of course, it is not the most general structure imaginable, but it seems to me that requiring a single object to control the pilot and that is the single object to control the motors - constitutes about the simplest scheme to do the job.Roger