M4-M0 core synchronisation / lock mechanism using mutex or semaphore

Content originally posted in LPCWare by wlamers on Fri Jul 18 01:06:35 MST 2014

I have an application in which both the M4 and M0 cores (of an LPC4357) are accessing a (kind of) ring buffer. The buffer is used to store and retrieve data using a pointer mechanism. This requires a sort of lock mechanism preventing simultaneous access by both cores to the pointers/variables. As a side note: there are also some interrupts (mainly on the M4 core) that access the same buffer pointers/variables.

The latter situation (interrupts) can be dealt with by disabling the interrupts while entering a critical section (a section that accesses the buffer pointers/variables). But the M4-M0 synchronisation is more difficult. I do not want to use the SEV instruction to signal an interrupt to the other core to handle the lock mechanism, mainly for performance reasons.

At the moment I have implemented a simple lock mechanism that reads a lock variable in shared memory, accessible by both cores. The problem is that the 'check and set' of the lock variable isn't an atomic operation (it requires an 'if' statement and an assignment). This could possibly break the lock mechanism, causing unpredictable behaviour.

Therefore I need to implement a classic mutex lock mechanism where the test and set of the lock variable (mutex) is atomic. For the M4 this should be possible using the LDREX and STREX (load-exclusive, store-exclusive) instructions. ARM also recommends using DMB to set a memory barrier. But oddly enough the M0 does NOT have these instructions, rendering it impossible to use a mutex or other lock mechanism between the cores. Obviously NXP knew this during the design of the 43xx family, and I assume they have come up with a solution, although I cannot seem to find one. The IPC section of the manual and the application note are of no help, nor do I find much information on the internet.
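For reference, an M4-only spin lock built on the exclusive-access instructions might look like the sketch below, using the CMSIS intrinsics `__LDREXW`, `__STREXW` and `__DMB` (the function names `m4_lock`/`m4_unlock` are made up for illustration). As described above, this cannot coordinate with the M0, which lacks LDREX/STREX, so it only protects against other contenders on the M4 side:

```c
#include "core_cm4.h"   /* CMSIS: __LDREXW, __STREXW, __DMB */

/* Hypothetical M4-side spin lock on a word in shared SRAM. */
static inline void m4_lock(volatile uint32_t *mutex)
{
    do {
        while (__LDREXW(mutex) != 0)
            ;                           /* spin until the lock reads free */
    } while (__STREXW(1, mutex) != 0);  /* retry if the exclusive store failed */
    __DMB();                            /* barrier before touching shared data */
}

static inline void m4_unlock(volatile uint32_t *mutex)
{
    __DMB();                            /* complete shared-data writes first */
    *mutex = 0;                         /* plain word store releases the lock */
}
```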

So to generalize my question: who knows a good way to design an 'atomic' lock mechanism between the two cores (which can also be used in the interrupt service routines)?

Content originally posted in LPCWare by wlamers on Fri Jul 18 02:35:29 MST 2014

Yes I have, but the suggested method relies on interrupts and/or 'messaging' the other core, both of which I want to avoid for performance reasons. Or are you implying something else here (that I may have overlooked)?

Quote (wlamers): The IPC section of the manual and the application note is of no help, nor do I find much information on the internet.

Content originally posted in LPCWare by wlamers on Fri Jul 18 06:49:51 MST 2014

Quote (JohnR): Could you explain why you felt this was so?

I am using SEV and interrupts on a M4/M0/M0 system with the LPC4370. The data to be transferred between cores are placed in shared memory. So far the system seems to work without problems and seems a lot easier than the IPC queues suggested in the UM10503 manual.

Well, this is exactly what I am doing to send commands and messages between the cores. I defined two sections in each project (one @ 0x20008000, 0x200 long, and the second @ 0x20008200, also 0x200 long). This indeed works really well for that purpose.

But in my case I also let the M4 and M0 simultaneously 'work' on a ring buffer. This requires that both the M0 and M4 check and set a lock variable so that they cannot mess things up. I could use the same memory regions as above and send an interrupt to handle the locking, but this takes a lot of overhead (set the message, signal the other core, enter the IRQ, read/set the lock, exit the IRQ, continue). Since this is a very time-critical application I would be better off just letting one core wait until the resource (buffer) lock is released, after which that core can immediately start processing. This saves at least 20-50 instructions.

Quote: Since the ARM Cortex-M4 and ARM Cortex-M0 cannot at the same time write to the same location, there is no need for a synchronization object (e.g. a semaphore) in this IPC.

I am aware of this, but unfortunately it does not solve the classic 'atomic mutex' problem. For example:

- M4 tests the lock (if (locked)) and it appears to be unlocked, so it can continue
- M0 also tests the lock at the same instant in time, or one or two clocks later, and it also appears to be unlocked (the M4 has not set the lock yet)
- M4 sets the lock (locked = true) and continues
- M0 also sets the lock (which was already set by the M4) and continues
- CONFLICT, since both change the buffer pointers etc.!

You see, I need an atomic 'test and set' instruction for both the M0 and M4.

This is nothing new, since all multi-core/multi-threaded applications and OSes have this problem and require an atomic test-and-set instruction.

Well, the M4 has this, but the M0 does not. But maybe there is a clever way around this?

Hmmm, well, writing this up I was thinking the following. What if I just use two additional lock variables in shared memory, one assigned to the M0 and one to the M4, meaning that only the M0 has write access to the first and only the M4 has write access to the second? In that way I can create a mechanism preventing one core from testing and setting the value while the other core is busy doing the same. In code it would look like this:

// M0
while (M0_testLock == true) {}  // wait until the M4 releases the test-and-set lock, which cannot take more than a few clocks
M4_testLock = true;             // set the test-and-set lock for the M4, since we are going to do a test and set
if (locked == false)            // test for lock
    locked = true;              // set lock
else
    dummy;
M4_testLock = false;            // release the test-and-set lock

// M4 (vice versa)
while (M4_testLock == true) {}  // wait until the M0 releases the test-and-set lock, which cannot take more than a few clocks
M0_testLock = true;             // set the test-and-set lock for the M0, since we are going to do a test and set
if (locked == false)            // test for lock
    locked = true;              // set lock
else
    dummy;
M0_testLock = false;            // release the test-and-set lock

Could this work or am I overlooking something? Or who knows a better way?

Content originally posted in LPCWare by wmues on Sun Jul 20 13:01:41 MST 2014

A ring buffer with a read index and a write index can be used by two processes without any other synchronisation, if the accesses to the index registers are non-interruptible.

Content originally posted in LPCWare by rocketdawg on Tue Jul 22 10:46:31 MST 2014

Quote (wmues): A ring buffer with a read index and a write index can be used by two processes without any other synchronisation, if the accesses to the index registers are non-interruptible.

regards,
Wolfgang

I was thinking the same thing: google 'lockless algorithms'. I would think that one core is the Consumer and the other the Provider. But lockless algorithms, whether complex or simple, are often as hard as ever to debug.

The Peterson algorithm does have some execution overhead, but it may very well be shorter than an ISR context save/restore. I always worry about the M0 core, since it is a von Neumann core: reads and writes to RAM are slower than on the Harvard-architecture M4, and the CPU burst size is 1. Bus masters share memory in a round-robin scheme. So what does that mean? And I certainly do not like the 'spin in place' if one were to use this method from within an ISR. You could get an endless loop.

Content originally posted in LPCWare by wlamers on Wed Jul 23 03:08:31 MST 2014

Thanks for that info.

Unfortunately my buffer is a bit more complex than a simple ring buffer. In fact it is a buffer that is able to keep blocks in contiguous memory. Therefore it needs some helper functions to determine where to put data and where to get data using buffer pointers (maintained in a struct). The other core may never access these pointers (e.g. updating/writing them) at the same time as the first core does; this is where the locking is necessary. I implemented Peterson's algorithm (including some memory barriers) and it seems to work fast enough. I have not run into trouble yet, but to make sure this will never happen in practice I need a way to check it.

You mention using unit testing, but I do not have experience with multi-threaded unit testing. How could I write test code that does this? Do I need some sort of random generator that 'fires' tests at unpredictable times? Could you give me an example?

Quote: I do not want to use the SEV instruction to signal an interrupt to the other core to handle the lock mechanism, mainly due to performance reasons.

Could you explain why you felt this was so?

I am using SEV and interrupts on a M4/M0/M0 system with the LPC4370. The data to be transferred between cores are placed in shared memory. So far the system seems to work without problems and seems a lot easier than the IPC queues suggested in the UM10503 manual.

From the manual, Quote: A CPU core raises an interrupt to the other CPU core or cores using the TXEV instruction.

Quote: Since the ARM Cortex-M4 and ARM Cortex-M0 cannot at the same time write to the same location, there is no need for a synchronization object (e.g. a semaphore) in this IPC.

One awkwardness is that only one TXEV is issued by M4 and wakes up both M0Sub and M0App. I use a global flag to differentiate between the two cases but still both interrupts respond and then have to either execute some code or simply return if not flagged.

It would have been better, I think, if instead of M0Sub and M0App having the same interrupt numbers (INT #1), separate numbers had been assigned.