The idea is to divide a big problem (of all the functions a robot might have) up in autonomous small problem. So suppose we have a walking robot where each leg has its own MCU managing just that one leg. All these slave controllers are connected onto a bus to receive high-level commands and/or send back events (leg stuck, too much force etc.). The central brain (MCU) - also connected to the bus- only has to deal with these high level commands and events. It orchestrates /coordinates the operations of all these intelligent slave MCU's.

The idea is to use very simple/small MCU for each slave controller and use a multi-tasking MCU as the brain. I haven't decided yet on which exact brands/types/models to use. I assume I will be programming in C++ (or perhaps C#/.NET micro). So some overhead will require a minor increase in available (programming) memory.

My question is, what algorithms or (code) design patterns would be appropriate for this type of design.

This topology has been used a lot in robotics. ROS has significant support for this topology, for example.

In general, where you put your smarts has a lot to do with economics. Building reliable busses to communicate status and commands with low latency adds cost, and support functions for a MCU (power regulation, filtering, etc) also adds cost.

Most systems that go for this approach end up putting one controller per separately-built or separately-sold item. For example, if you build a vision sensor, you end up building image processing into that sensor and selling it all as a single device.

The example of one MCU per leg is a little stretched, because the processing power needed for IK and servo control is trivial even for the cheapest microcontrollers today. Running four legs from a single MCU ends up being cheaper, and often more robust, for current systems. If you want a system that has some degree of autonomy even when the executive functions are disabled ("reflexes") then you'd want an MCU per extremity, though. So, it depends on what requirements you have and what you want to pay for.

I understand the economics and efficiency part of this design. The 'one MCU per leg' was only and example to clarify what I meant (clearly you got it ). At this moment I am only researching the possibilities and options. The decision on how many MCU per 'part' will be made later.

One of the reasons why I am researching this design is to make things modular. Now I know that there are other ways to do that (and it is not the only reason) but those other ways always seem to involve a limited amount of 'connectors' on the main board - only allowing you to connect so many legs/wheels/sensors etc.

The way around that is to use a bus that can address each attached 'device'. Each 'device' board would have the connectors to connect to the next 'device' - so at least 2 connectors.

Another reason is just for the fun of it. I want to explore the implications of using a distributed design (rule 1 of distribution: don't do it ;-) and in this particular post I am asking for the software implications.

What are the commonly used practices for communicating with other MCU's and how do you let multiple MCU's accomplish one task? What can you do when one or more 'node' are offline? That sort of stuff...

This topology has been used a lot in robotics. ROS has significant support for this topology, for example. [..]

I just looked up ROS (Robotic Operating System) and it runs on Linux! That is not quite the level I was aiming for. I was aiming on a more small scale, lower level solution. So not quite as professional probably :-P

You could implement your own non pre-emptive scheduler straight onto the tin. It's just a loop with a clock variable and a set of priorities. When clock % priority = 0, then run the associated task. Keep the tasks small and quick, and it's a very efficient paradigm.

Yep cooperative multi tasking was what I had in mind for the main MCU. You can keep it (overhead for tasking) very lean. Unfortunately I haven't bumped into a good C++ library yet... so I guess I will have to roll my own.

Another option I have considered is using a multi-core processor. I read some on the Propeller Parallax and that looks very promising. That way you can have really separate task that run truly parallel.

Have true multi-threading also introduces some synchronization and locking issues but I was hoping to adopt a message based paradigm and avoid shared state (that should take the pain out of most issues).

I usually let my 'imagination' run wild in the design phase to end up with the 'perfect picture'.But when building time comes I simplify it and only build what I need - but following the design of my perfect picture as much as possible/sensible.

I know that once you start building you will get new ideas - or discard old ideas - things will change. That is evolution and that is part of the fun.

My questions here are more geared toward 'what experience of others who've been there can I use'.

In general, for safety, you want a "dead man's" system. Two nodes communicating will exchange data at a certain rate, and when one end has missed its time slot by more than X time, assume that that node is dead until you find out otherwise.Thus, for example, a node driving a motor controller should stop the motors if it goes for 100 milliseconds without getting any new commands from the higher-order function that decides on desired motor speeds.

If you use a network like CAN or I2C, they already have built-in functions for addressing particular nodes. You just assign node IDs ("front left leg == 5") and go.

Another option I have considered is using a multi-core processor. I read some on the Propeller Parallax and that looks very promising. That way you can have really separate task that run truly parallel.

I was going to suggest the Propeller. The multiple processors are a great fit for lots of robotic projects.

For example, I used the Propeller in my Mecanum wheeled Rover 5. One processor (cog) is dedicated to reading the four quadrature encoders (without causing insane numbers of interrupts(or any for that matter)). Another cog produces the four PWM signals (a single cog can produce many more than just the four). Another cog used used as a UART (a single cog can easily be used to provide four UARTs).

Breaking the tasks down into independent cogs keeps the main loop nice and clean. You don't have to worry about interrupts messing up the timing of other parts of the code.

Post #85 of this thread has a fun (IMO) video of the robot moving around.

I started reading more on the propeller and I (still) have some questions about it. Like how big a program can you load into each cog. From what I understand it runs from (cog) RAM!? I was also wondering if you could leave out the Spin interpreter from ROM and stuff some of your own code down there...? What is a good way to extend the memory? I saw it has I2C pins and saw some 'talk' of an external (E)EPROM.

I started reading more on the propeller and I (still) have some questions about it. Like how big a program can you load into each cog. From what I understand it runs from (cog) RAM!? I was also wondering if you could leave out the Spin interpreter from ROM and stuff some of your own code down there...? What is a good way to extend the memory? I saw it has I2C pins and saw some 'talk' of an external (E)EPROM.

Now I will check out your links. Thanx.

Cog RAM can be limiting issue. Normally only drivers that need to work at high speed are need to completely fit in a cog. For example the SC card driver uses almost all the 2K of cog RAM. These high speed drivers are written in assembly. I haven't had this issue myself but I haven't written any very intense drivers.

I have had problems of running out the 32K of hub RAM but these have always been when using some sort of video driver since video uses RAM as a screen buffer. In these cases I've either broken the program into smaller pieces that can be loaded from a SD card quickly or used a second Propeller as a video slave.

Parts to the program written in Spin can be much longer. Apparently the Spin byte code is pretty compact and you're not limited to 2K of Spin code per code since a cog using Spin uses cog RAM to hold the Spin interpreter, the interpreter then executes the Spin byte code from the larger hub RAM.

There are lots of ways to add memory such as EEPROM, SD cards, flash RAM etc. One of my experiments was to use eight 23K256 SRAM chips as a group and used an 8-bit data bus to communicate with all the chips at once (each chip would store one bit of a byte).

Any external memory added wont be treated the same as hub RAM though, the external memory will need to be accessed through a driver for the particular memory being used. The various cogs wont be able to directly read and write to it like they can hub RAM.

Parallax is working on a C compiler for the Propeller which makes it easier to use large programs. It's in Beta testing now.

I have decided that I will make a cooperative multitasking prototype first to get some real-life experience with the problem-domain. I just ordered a simple robot car platform (DFRobot 4WD+E) - including a motor driver board- I will use for my prototype that will start out as an object avoidance robot.

I will simulate each separate MCU as a software Task that runs in conjunction with other Tasks in the system: motor-control, sensor feedback and brains will probably be the 3 Tasks that will be running.

Just remember that you have lots of program space and very little variable space to play with, so don't waste too many bytes on the scheduler. If you can use a flag in a register byte instead of a whole variable just to store true or false, do so and take the coding hit.

Agreed. My Arduino library is made with that in mind. I try to keep things 'static' and constant as much as possible - so it is stored in prog mem - without sacrificing the OO concept too much. After seeing a lot of dynamic code for things that were not dynamic (like buttons!) in other libs, I decided to write my own.

The Task(.h) macros are recently added and need to be tested and tweaked. It is a clever (to my eyes) trick with a switch statement and one int variable per task. Seems a reasonable price to pay. The 'scheduler' will be the way (flow) you call the tasks in the main loop. I have not tried it yet but it sounds very cool and lightweight.

One thing I may want to add is some sense of time/timing. I want to avoid that all tasks call methods to retrieve the time separately. Not sure yet how to fix that.

I might be wrong, but I don't think static and const store things in progmem. If they do, once you read them they're copied into RAM even if they did. You need to check out the PROGMEM directive and use the associated function to read the values/strings.

A static const int that you do not pass by reference or pointer may be optimized out, if I recall correctly. An easier way of making sure that happens is to use an enum.Note that a static const int may be passed by reference, and you can get a pointer to it; if there's any chance of that happening, then the compiler/linker will have to reserve memory for it.Also, not all types are eliminatable -- floats are not, strings are not, etc. On some architectures ints above a certain size also are not.

Template parameters and enum values do not generate memory storage. However, template parameters may generate a large amount of code, so be careful about that. If you only ever instantiate one of something (say, "the servo driver" or "the display driver" or whatever) then template argument values are totally fine, as you'll only need one copy of the code anyway.

Templates and vtables are totally separate. Any class instance that has "virtual" something somewhere will have a vtable, template or not. Meanwhile, a template is mainly a way of writing code with better type checking than a macro. Templates will have vtables if and only if any component class of the final instantiated class has a virtual anything (virtual base type, or virtual member function.)

The warning I made was not against vtable space, but instead against template argument code bloat.

The compiler now has to emit two copies of the member functions, specialized for Size == 10 and for Size == 20. Once you use a lot of templates, this can generate a lot of code bloat in a hurry, and because the code is actually different between the function versions, Common-data folding of function sections would not help. (Otherwise a popular optimization among linkers to get a handle on bloat of trivial code.)

will all code of a template be duplicated for each unique template parameter? Or just the methods that use the template parameter?

All of them will generally be duplicated, because the mangled name will be different for the two instances.If you have comdat folding capability, then the linker can see that the functions that don't use the template argument are the same, and merge them at linking.To make this happen in GCC, you have to compile with "-ffunction-sections" (which really ought to always be the default!) and then link with --icf using gold, not gld. And gold doesn't support non-ELF outputs, so I don't know if that works for AVR code ...