What is a 'multi-tasking' application? We could answer that by saying 'an application that does many things at once'. This is a common requirement for embedded systems. For a small embedded application like we'd write for the AVR, we would probably like to do some or all of the following tasks:

1. Display via LEDs or an LCD.
2. Get inputs from sensors.
3. Perform some logic or control.
4. Output to relays or some other control device.
5. Communicate with a PC, maybe via a multi-drop network.

There are a few more we could think of.

So how can we get the processor to do all these things at once? The answer is: we can't. The processor only executes one instruction at a time, but the saving grace is that it does so quite quickly. So fast, in fact, that it can appear to be doing many things at once; in reality we're executing a number of tasks, but splitting them up into little chunks and running them one at a time. Luckily for us, most things in the real world are quite slow compared to the speed of the processor. Some examples are:

1. Relays - being mechanical devices, these can take many tens of milliseconds to release or operate.
2. Character LCDs - depending on temperature, these can take around 100 milliseconds to change. The human watching it is none too fast either, so updating 10 times a second is usually fast enough for a human to actually read and follow.
3. Mechanical switches - like relays these can be slow, and they have a side effect called 'contact bounce': the switch opens and closes rapidly over a number of milliseconds before settling to a known state. Also, if operated by a human, the speed is measured in tens of milliseconds.
4. A machine operating at 12000rpm - each revolution takes 5 milliseconds.

For an AVR operating at a conservative speed of 8MHz, each instruction takes around 125ns, which is about 8 million instructions per second. With the above examples, where we measure the speed in milliseconds, you can see the AVR can execute around 8000 instructions every millisecond - that allows us to do a fair bit of work.

How do we go about creating a multitasking application?
First up, we need to think about how to split all the operations we want our application to do into 'tasks'. Most likely these tasks will need to talk to each other, so we need to think about how to keep the lines of communication between them as simple as possible. The computer science term for this is 'coupling'. 'Tightly coupled' means each task relies heavily on the others to operate. Conversely, 'loosely coupled' means each task has little reliance on other tasks. To make life easier for ourselves we want to lean towards the loose end of the scale. As an example, I'll detail an industrial controller - the sort of stuff I do every day. The tasks might be something like this:

1. Analog input task. Reads the analog inputs and performs some math to scale the inputs to usable units. Maybe a temperature sensor read in Celsius.
2. Logic task. Maybe we want some alarms if the temperature gets too high, so we'll compare the temperature value against some alarm settings and flash a light and honk a horn if it gets too high.
3. Bus communications. A PC might request information from us as well as set alarm levels etc. We might be one of a number of devices.
4. User interface. We might want to display the temperature and alarm status on an LCD. We might also have pushbuttons that allow us to interact with our device.
5. Setup menu. This is where the user can set various parameters for our application.

This gives us five tasks - not too many to deal with. Now we have to think about how we want these tasks to run, and how often. We can also think about the priorities of these tasks - which task is more important or time critical? Some tasks might only run on demand - maybe in response to an external event; some might run regularly - for instance the user interface; and some might run at the request of another task. Since I want to cover a straightforward system, I won't delve into tasks with variable priority and complicate this tutorial. I'll leave that for the reader to investigate further.

Let's assign some numbers to our tasks:

1. Analog input task. Since we're measuring temperature, which moves quite slowly, we'll sample this regularly at 10 times per second.
2. Logic task. Since we'll be flashing lights and operating relays, 10 times a second would be a nice number.
3. Bus communications. Since we'll be a slave device to a PC master, we're at its beck and call. We only need to speak when spoken to, and to keep things fast, we want to respond quickly.
4. User interface. 10 times a second would be a good number here also. Because the human operator won't always be looking, this can have a fairly low priority - the user won't notice a slippage in the display update here or there.
5. Setup menu. Since we have only one display, either the user interface is running or the setup menu is running. So we activate it on demand and run it 10 times a second.

So we end up with three tasks that run regularly and one that runs on demand. If we were to choose a unit of time to dole out to the tasks that is neither too fast nor too slow, we'd probably come up with a time chunk of 10ms - this gives us 100 time chunks per second, thus up to 100 task executions per second. To do a quick check of performance: we have four tasks to run every 100ms, so we have 6 time chunks free in every 100ms - room to add more tasks or to cope with faster communications. The absolute worst case would be having communications every 10ms - there would be no time for the other tasks to run, so this is something we need to avoid. Next comes priority:

Pre-emptive is exactly what it says: at any point in your task, you can be suspended and another task run. This involves saving and restoring the 'processor state' for each of the tasks, as well as a task that manages all this. The downside of this technique is that it consumes a lot of memory - each task has its own stack. For a memory constrained processor like the AVR (especially the smaller ones) you want to be as frugal as possible with memory. The other downside is that you need to provide mechanisms to share variables between tasks, and that adds complexity and code size. However, this technique is very common in operating systems like Windows and Linux.

Co-operative - again, the name says it all. Tasks must 'co-operate' with each other in order for the system to run. Each task is allotted an amount of time to do its work and return. The upside is that this technique is great for small systems and avoids many potential issues with sharing variables between tasks. The downside is that your code must be written so each job finishes within its given time. Co-operative scheduling also doesn't scale too well, but for our app of just five tasks it fits well. I'll be discussing a framework for a simple co-operative task switcher that I've used on many projects and that can be used to solve a number of common embedded problems.

Co-operative Task Switcher.

First up we need to keep track of time - in our case 10ms chunks. To do this we will use the AVR timer in CTC mode to interrupt every 10ms and set a timer tick flag to say that 10ms has elapsed. Virtually all our timing will be based on this.
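As a rough sketch, the tick setup might look like the following. This is an assumption-laden illustration, not the tutorial's actual code: the register names are for a classic mega device's Timer0, and the OCR0A value assumes an 8MHz clock with a /1024 prescaler.

```c
#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint8_t tick_flag;          /* set by the ISR, cleared by the dispatcher */

ISR(TIMER0_COMPA_vect)
{
    tick_flag = 1;                   /* just note that ~10ms has elapsed */
}

void tick_init(void)
{
    TCCR0A = (1 << WGM01);                 /* CTC mode */
    OCR0A  = 77;                           /* 8MHz/1024/(77+1) = 100.2Hz, ~10ms */
    TIMSK0 = (1 << OCIE0A);                /* enable compare match interrupt */
    TCCR0B = (1 << CS02) | (1 << CS00);    /* start timer with /1024 prescaler */
    sei();
}
```

The main loop then spins until `tick_flag` is set, clears it, and hands the elapsed 10ms to the scheduler.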

Next we need to figure out a way to dole this time out to the tasks and decide which task to run next. For this we have a flag for each task: if the flag is a '1', the task wants to run. To sort out who runs first when more than one flag is set, we have priority. We'll use fixed priority to make things simple: we search the flags in order from highest priority to lowest to find the first flag set. If no flags are set, we just wait for another time tick. We'll also arrange the task flags in one byte - this gives us 8 tasks.
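In C, that priority search might look something like this. A sketch only - the names `task_bits` and `next_task`, and the convention that bit 0 is the highest priority, are my assumptions rather than the tutorial's actual code:

```c
#include <stdint.h>

#define NUM_TASKS 8

volatile uint8_t task_bits;     /* one 'wants to run' flag per task */

/* Return the highest-priority pending task (here bit 0 = highest),
   clearing its flag, or NUM_TASKS if nothing wants to run. */
uint8_t next_task(void)
{
    for (uint8_t task = 0; task < NUM_TASKS; task++) {
        if (task_bits & (1u << task)) {
            task_bits &= (uint8_t)~(1u << task);
            return task;
        }
    }
    return NUM_TASKS;           /* idle - wait for the next tick */
}
```

The dispatcher then switches on the returned number to call the matching task function.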

That's our basic framework. Simple, eh? But it doesn't do too much for us yet. We need to add some timers so we can run tasks at regular intervals. We give each task a timer; when it expires, it sets its task flag so that the task is flagged for execution. This gives us two features: running tasks at regular intervals, and delaying the running of a task. So task1 might want task2 to run in 1 second's time. To do this we add some extra code to take care of the timers right between items 1 & 2 of the pseudo code above - just after we've got a timer_tick:

1. for timers=0 to 7 do
2. if timer[timers] >0, decrement timer[timers]. if timer[timers] = 0 then set_task timers.
3. next timers

We implemented downcounters as it's easier to test for zero or non-zero than to test for a given number. Thus if we want a task to run in 1 second, we load its timer with the value 100 (times 10ms = 1 second).
In summary, each task has a task bit and a timer associated with it. To set a task bit, we have a function called set_task(task); to clear a task bit, clear_task(task); and to load a timer we access the timer array directly: task_timers[task] = time.
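Putting the pseudo code and that little API together in C might look like this - a sketch under my own naming assumptions. Note that with 8-bit downcounters a `uint8_t` timer limits a delay to 255 ticks, i.e. 2.55 seconds at a 10ms tick:

```c
#include <stdint.h>

#define NUM_TASKS 8

volatile uint8_t task_bits;                 /* one run-request flag per task */
volatile uint8_t task_timers[NUM_TASKS];    /* 10ms downcounters, 0 = idle */

void set_task(uint8_t task)   { task_bits |=  (uint8_t)(1u << task); }
void clear_task(uint8_t task) { task_bits &= (uint8_t)~(1u << task); }

/* Run once per 10ms tick, right after the tick flag is seen:
   count every non-zero timer down and flag its task on reaching zero. */
void tick_timers(void)
{
    for (uint8_t t = 0; t < NUM_TASKS; t++) {
        if (task_timers[t] != 0 && --task_timers[t] == 0)
            set_task(t);
    }
}
```

So `task_timers[2] = 100;` asks for task 2 to run in one second's time.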

Cliff - I'm fully aware of this; MISRA tells me I shouldn't use function pointers! In this instance the table would be in flash, so the potential danger isn't there. The code started out in assembler, so the C is a fairly literal adaptation of it.

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here.

No guarantees, but if we don't report problems they won't get much of a chance to be fixed! Details/discussions at link given just above.

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

Your simple system of multitasking would be ideal for controlling such a robot: to read information from IR distance sensors, to execute control commands, etc.
So far there is an ATtiny2313 microcontroller (controlled through a radio module over USART), but I'm going to use an ATmega128 because I need two USARTs (one for controlling the robot and the other to receive data from a special sensor and send this data through the other USART module to the PC). It doesn't matter what the sensor is, but it gives 15 bytes at a frequency of 20Hz.

So the idea is to receive these 15 bytes every 50ms (1/20Hz) with one USART and then send them through the other to the PC (this other USART is also used to control the robot).

And here I have a couple of questions. Both USARTs are 9600 baud, 8 data bits, no parity and one stop bit.

1) If there are 15 bytes, that's 120 bits of data. Adding all the start and stop bits gives 120 + 15*2 = 150 bits. So the time to send this over the USART would be 150/9600 = 0.015625s ≈ 15.6ms. So far is everything correct?

2) When I start receiving and sending stuff with interrupts, I can't do anything else. Would it be a good idea to use interrupts? In one USART's RX interrupt, send each byte immediately to the other USART? Or put it in some kind of buffer and send the data through the other USART in some other part of the embedded system?

3) Is it possible to do this without interrupts? When receiving data I have to read UDR at exactly the right time - what if I don't? Is the data in UDR lost when the USART receives the next byte?

4) Your proposition for doing this :P
Besides transferring this data the robot would have to:
- read the ADC signal from a sensor (still don't know how often)
- read the ADC signal from the powering system (every couple of seconds - not time critical)
- read controlling commands and execute the specified action

Paul, I would probably buffer the data from both USARTs in both send and receive. This is the more general approach. Sometimes, if you're pushed for RAM, you need to do things like sending data out one USART as it's received from another, as you suggest. As for not being able to do other things whilst receiving data, this should not be the case. When the receive interrupt fires, it gets the data from UDR, pops it into a circular buffer and exits; or, if you have a particular protocol, you might have a little state machine that checks for the start token, end token etc. All this takes very little time, so the impact on your other processing is probably only 1-5% at most. If using a circular buffer, you can have a task that regularly checks for incoming data and processes it; or, if you decode the protocol in the isr, it sets a flag when it has a received packet and a task can check for this.
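A minimal circular (ring) buffer of the kind described might look like this - the names and the power-of-two size are my choices for illustration, not Kartman's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 32u         /* power of two makes the wrap a cheap AND */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head, rx_tail;

/* Called from the USART RX ISR with the byte just read from UDR. */
void rx_put(uint8_t c)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next != rx_tail) {      /* if full, the byte is simply dropped */
        rx_buf[rx_head] = c;
        rx_head = next;
    }
}

/* Called from a foreground task; false means the buffer was empty. */
bool rx_get(uint8_t *c)
{
    if (rx_head == rx_tail)
        return false;
    *c = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return true;
}
```

With one index written only by the ISR and the other only by the task, no locking is needed for a single producer and single consumer.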

Hi Kartman, fantastic tutorial!
A few questions: I am doing some RX related stuff (on an ATmega128, XTAL freq: 14.7456 MHz) which receives frames (either only commands, or data along with commands) at 115200 bps. For some reason I don't have my hardware ready yet. The UDR is read in the RX ISR; the received data is then parsed in a function which runs from a non-preemptive multitasking scheduler with a 1ms tick and 8 tasks.
In the receive ISR I have used a ring buffer into which the data is pushed at every interrupt. This data is then popped into another byte array where, as part of the scheduled function, it looks for start/stop, cmd, message size bytes etc. On getting a particular cmd, some functionality needs to be executed. However, intuition says all this activity may not be completed before the next timer tick (1ms).
Can you suggest an optimised method of doing this?
Shall I use a state machine for frame parsing directly in the ISR, or is that not recommended?
How do I efficiently use ring buffers along with RX interrupts when message parsing needs to be carried out?

If you are receiving packetised data, then it makes sense to decode the packet in the isr (assuming there is no time consuming function involved, such as computing a CRC). For example: NMEA data as output by GPS receivers. The isr can have a state machine to search for the start token, accumulate a packet and calc the checksum; when it gets the end token, it checks the checksum. You would then copy the packet to another buffer and set a flag; a task is activated when the flag is found set. Depending on the priority of that task it may or may not get executed next.

If using a circular buffer and NMEA packets, again have a state machine to decode the packets, but put a limit on the number of chars you extract in one go in case the buffer runs empty. You would then defer the task for a time period and attempt to get the rest of the packet later.

In part two, I use the example of Modbus and the packet is assembled by the receive isr and sets a flag like described above. Modbus RTU uses a CRC for packet checking, so this is done by the processing function, not in the rx isr where it might consume too much time. These decisions need to be made in view of the whole system - interrupt latency being a factor.

Quote:

do explain some loop timeout mechanisms in blocking functions (in part 2)

You don't have blocking functions! If a function blocks, then the co-operative scheme fails. The whole basis is that you design your system not to block - finite state machines is one technique to achieve this.

By 'blocking function' I mean, for example, a scheduled task such as scanning an ADC input every 10 ms - there we have to wait for the conversion to complete:

while (ADCSRA & (1 << ADSC));

How do I tackle such conditions in the scheduler, as these statements might hang the system for an indefinite time and other tasks may not get executed?
In such cases can we use a fixed delay for the conversion to happen, e.g. a 2 us delay before reading the ADC result, or some kind of timer timeout ANDed with the condition?

I ask because, in spite of having interrupts available for every activity, they can't all be used, as unnecessary nesting of interrupts is not recommended (read somewhere).

If for some reason the ADC failed and caused an infinite loop, then that is a failure. In the dispatch loop we kick the watchdog, so normal operation keeps the 'dog happy. In the instance you describe, the task would stall and the watchdog would kick in and reset the microcontroller.

In normal operation the ADC has a known conversion time, and if this is less than the allotted task time then you are in the clear. If you need better performance, where you might need to sample the ADC faster, then you would need to use interrupts. It is not necessary to nest interrupts unless you have tight real time constraints. Having multiple interrupt sources is not necessarily bad, but you need to ensure that the worst case latency doesn't exceed your requirements and that you don't have too much of the processor's time tied up in servicing interrupts.
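One way to avoid the busy-wait in the snippet quoted above is to split the read into a little state machine inside the task, so each 10ms pass does one short step and never spins. This is only a sketch with classic mega register names assumed, not the author's actual code:

```c
#include <avr/io.h>
#include <stdint.h>

static uint8_t  adc_state;      /* 0 = start conversion, 1 = collect result */
static uint16_t adc_result;

void adc_task(void)             /* scheduled every 10ms by the dispatcher */
{
    switch (adc_state) {
    case 0:
        ADCSRA |= (1 << ADSC);  /* kick off a conversion, then return */
        adc_state = 1;
        break;
    case 1:
        if (!(ADCSRA & (1 << ADSC))) {   /* long since finished after 10ms */
            adc_result = ADC;
            adc_state = 0;
        }
        break;
    }
}
```

Since a conversion takes tens of microseconds, by the next 10ms pass the result is guaranteed ready and the task never blocks.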

In my next installment, I will show an example where the ADC is run in a task. This is a low performance application where the ADC inputs are read, made available via Modbus and can be displayed on a LCD.

Every application is different, so you need to evaluate requirements on a case by case basis. One size does not fit all!

I didn't do any spell-checking on it... I just used copy-and-paste to put your text into Open Office Writer. :)

I'm more of a hardware guy than a software/firmware type, so I have a LOT of figuring-out to do. I doubt that I'll ever want to write an RTOS on my own, but it's a good introduction to how such a system works.

When using <= it will do one final pass of the loop where the index becomes 8, because of 'less than or equal to'. That results in the index being out of bounds, so you will corrupt the memory outside the array. If nobody is using that memory, no problem, but... it should be avoided because it can cause bugs.

Since I am mainly a PC programmer, this would lead to an application crash at that instant, or at some later time which would be hard to track - both when and why.

Now....

I woke up this morning to search this forum on how to do similar things and I am so glad to have found your tutorial!

I need to control 16 DI and 16 DO lines, CAN send/receive and Ethernet send/receive on one AVR device (thinking either mega128 or going with xmega). Something like a controller: CAN for motor drives and Ethernet/TCP/IP for communicating with a PC, where the PC side will send a status update request every 300ms and/or some command to do something, which would cause a change in DIO or a CAN message being sent out. Also, for status updates, I would regularly have to get data from CAN and put it in some kind of array for sending back to the PC (health of CAN devices, data from them, etc).

I was thinking of using an AT90CAN128 device with a Wiznet W5100 chip, which implements the TCP/IP stack and all the other goodies, or going with a mega128/xmega128 with an MCP2515 CAN controller plus the same Wiznet chip for Ethernet. In all cases my memory requirements are not that high, and a mega can do it.

I didn't want to go for an RTOS for just those 4 simple tasks.

The only thing I have to think more about is having a variable DIO task sleep time, which I can implement as a "state" machine, setting a bit at time N depending on the state I am in. Because to do one full DIO cycle I would need to set DO 1 high and keep it high until DI 2 goes high or 2000 ms expires, in which case I flag an error.

I cannot begin to tell you how good I thought this article was. It's clear and incisive and explains this subject far better than anything I've ever read before. Just cannot wait for the next articles you write here.

In total awe.

SC

EDIT: It's great to see how people with rich experience like you share it, so that beginners do not have to reinvent the wheel.

Why memorize anything which you can easily pen down on a piece of paper or get from a book in less than two minutes? - Albert Einstein on being asked why cant he remember his phone number!!!

I'm a seat-of-the-pants hack, but I did know a bit about Windows and its non-multitasking environment back 20 years ago.

I just wrote a simple program to control a machine, but I need to dump some of the while-loop and delay stuff. Should I be looking at your code, or just use timers to turn things off while another part is in a while loop?

Should I be looking at your code, or just use timers to turn things off while another part is in a while loop?

Often an MCU app needs nothing more than a central non-interrupt loop doing the "slow" stuff and a handful of ISRs (possibly just timer ISRs) doing the small amount of things that need to happen more regularly/deterministically. It kind of depends how many "concurrent" tasks you have going on as to whether you finally reach a point where more formal "multi-tasking" becomes warranted.

To get rid of the while and delay stuff, it is common to build sequential systems with a 'finite state machine'. This has been discussed in some posts recently: http://www.avrfreaks.net/index.p...

If you're controlling things like relays, a time tick of 100ms is reasonable. In my framework, you would have a task executing every 100ms and running the state machine. You could have multiple tasks and multiple state machines running concurrently seemingly all running at the same time, but being executed in sequence.

So does it mean we can't get interrupts in a co-operative task switcher, since you've mentioned that each task must run and finish within a definite amount of time?

Just think of a co-op tasker as one big while(1) main() loop and in that sense it's no different to any other kind of program and there are no limits about also running ISRs alongside.

The usual rules about trying to keep ISRs as short as possible apply of course. If an ISR were to take 100ms for example then that would stop all the co-operating foreground tasks for all that time. So, as always, make it just a few time sensitive instructions and put any "long work" in another foreground task that usually does nothing but is triggered into action by some flag set in the ISR. The usual rules for foreground task timing apply to it too - so if it will take more than the usual task time limit then break its work up into a multi-stage state machine too.
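The flag-from-ISR pattern described above can be sketched like this (the names are illustrative only - on a real AVR, `on_event_isr` would be the body of an `ISR()` handler):

```c
#include <stdbool.h>

volatile bool work_pending;     /* the only thing the ISR touches */

/* Inside the real ISR you would do only this one assignment
   (plus grabbing any time-critical data). */
void on_event_isr(void)
{
    work_pending = true;
}

/* Foreground task: scheduled every tick, but does the long work
   only when the ISR has flagged it. */
void worker_task(void)
{
    if (work_pending) {
        work_pending = false;
        /* ...long processing here, split into a state machine if it
           would overrun the task's time slot... */
    }
}
```

The ISR stays a few cycles long; all the slow work happens under the co-operative scheduler's normal time rules.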

Since task_timers is unsigned it can never be < 0, so the test is good as Kartman gave it.


For unsigned, it does just the same! Remember that as long as something is != 0 it is true.

So, technically it will do just as fine as testing for > 0.

Whether it would be clearer for the reader of the code to use > 0 is probably a matter of opinion.


has 8 elements starting from 0, which makes the last element the 7th?

Thus while (task <= NUM_TASKS) allows index 8 as well, no? Which might lead to access of memory outside the array boundary when the compiler gets to this if statement, as the last element is the 7th.

Kartman said that, for example, task 1 might need task 2 to run in 1 sec!
So I understood that creating task_timers[NUM_TASKS] = {0,0,0,0,0,0,0,0}; gives us the opportunity to specify the time at which we want each task to run, plus to keep track of whether its flag is ON.
I was concerned about initialising a task to a certain amount of time - like some task running in 250 ms. Won't this affect/delay other tasks, since we had 10 ms for each task to run?
If I understood right, we might want some tasks to take longer than others, as long as they finish and others can also follow?
(Which is in fact what Kartman said: the worst case would be a task needing to run every single 10 ms.)
But then, if a task takes longer than others, as long as it finishes and allows the others to run too, that would be fine?

(Just like Cliff suggested with the while(1) loop: the most important thing is that every single task gets done, although some may take longer than others.) Right?

Does anyone know what adding an RTOS to this cooperative multitasking would gain?

Your question makes little sense - are you saying you would run a secondary co-operative tasking system within a single thread of a pre-emptive, time-slicing tasking system? To what gain? If you have a pre-emptive scheduler then why not make all tasks separate - that's kind of the point of pre-emption isn't it?

Thank you for your comment Cliff; forgive me if you think I am asking a silly question, but is any time-slicing operation also an RTOS?
I think yes, since I would define an RTOS as an operating system which has consistent timing and a high degree of task priority.

So it seems my original question was whether an RTOS is software or hardware, or whether it can be both.

It's an "OS" but whether it is "T" is another matter. For example Linux is a time slicing OS but even when you apply the real-time patches to the kernel it's still very difficult to guarantee true real time operation. The latency on interrupt handling can be quite a problem!

Quote:

so it seems my original question was whether an RTOS is software or hardware, or whether it can be both.

There is no question - it's simply a piece of software. However, it would be difficult to envisage a time slicing OS that does not use some form of timer. In Linux and Windows this means a 10ms or 1ms timer that switches between anything up to about 10,000 threads of execution at any one time.

This is supposed to be a question, but I am not really sure how to ask it, but here goes:

In the code example, task6 and task7 will end up running in round-robin style after the first execution of each task, because of the higher priority of task6, right? But sometimes you would want two tasks running synchronously, not one time slot apart.
So what would the solution be then? I guess there is only one: make a task with two sub-tasks in one time slot?

Say you wanted task 6 & 7 to run every second but half a second apart. Task6 would load the task timer for task7 with half a second. Similarly, task7 would load the timer for task6 with half a second. These two tasks would happily ping pong.
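In code, the ping-pong is just each task reloading the other's timer (this assumes the `task_timers` array from the tutorial and a 10ms tick; the task bodies here are stubs):

```c
#include <stdint.h>

#define TASK6 6
#define TASK7 7
#define HALF_SECOND 50u         /* 50 ticks x 10ms */

volatile uint8_t task_timers[8];

void task6(void)
{
    task_timers[TASK7] = HALF_SECOND;   /* task7 fires 0.5s from now */
    /* ...task6's real work... */
}

void task7(void)
{
    task_timers[TASK6] = HALF_SECOND;   /* and task6 0.5s after that */
    /* ...task7's real work... */
}
```

Each task therefore runs once per second, offset half a second from the other.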

Hehe :) I actually meant the other way: two tasks being executed in the SAME time slot.

What it all comes down to is probably that I don't see how the prioritization actually is thought to work.

I see the tasks running in round-robin style after all tasks have been executed once and everything has settled. With this the highest priority task will simply be the first one in the "round-robin schedule".

Ok, last thought/question, why would one want to use prioritization? I obviously fail, big time, to see the reason :-(

Windows (these days) does it for example. Things like the mouse and keyboard service have a higher priority than when, for example, Excel is recalculating a spreadsheet. In days gone by you spent a lot of time looking at an "hour glass" mouse cursor (often that would not move) while Excel calculated.

Even when you aren't using an OS you do the same in the design of your main/interrupt mixed programs where you do things like updating LCDs in the "spare time" while things like reading the spindle rotation or operating the control valves is done as a matter of urgency.

First off, I want to thank the author for writing this great tutorial!
I had been searching a long time for an easy-to-use and understandable "RTOS", or something that could handle tasks. After I never got DuinOS running, and FreeRTOS and others were just too complicated to get running with the Arduino IDE, I thought I would never find something - until I found this tutorial.
So I just wanted to say that it works perfectly with the Arduino IDE: just copy-paste, rename the "main" into "loop", and you can have fun with the Arduino libraries and create tasks.

He is not the only one, mate.
I myself have really found this tutorial extremely useful. I don't know if it's because I am actually new to RTOS & multitasking, but I very much appreciated all the effort you put into this tutorial, man - except that I forgot to say thank you :wink:
I really want to thank you Kartman, Cliff, Johan, and the many other moderators here
at avrfreaks who really put a lot of effort into helping us.

Obviously, this post is old, so my thanks on your wonderful tutorial are a bit behind the times, but I figured I'd put them here anyway. I've not yet had time to read (and digest) Dean's timer tutorial fully, so the values you use to set up the timer interrupt are kind of a mystery. Perhaps you might consider commenting those, or lines like them, in future tutorials for the slowpokes in the pack like me :).

Are the comments not enough then? That's almost as simple a timer setup as you can get and the key comment is "// CTC mode". A quick look at any AVR datasheet will show you that is the mode where the timer counts up to the value in a compare register (OCR0A in this case) and then resets to 0 (possibly generating an interrupt as it does so). So this timer is going to count 0x00, 0x01, 0x02... 0x93, 0x94, 0x95 (interrupt), 0x00, 0x01, 0x02..

As the count of 0x00 is included in that, it's really counting 150 (0x95+1) timer "ticks" between compare matches.

Also the "// start timer" comment shows 0b101 being loaded into the B register. That selects /1024, so the timer will be running at 1/1024th of the CPU speed. It will therefore count 150*1024 = 153,600 CPU cycles between compare match interrupts. The comment at the top of the code says 16MHz, so it executes 16,000,000 cycles in 1 second; the interrupts are therefore 153600/16000000 of a second apart, that is 9.6ms - so close to the 10ms tick.
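Redoing that arithmetic in code, with the constants taken from the post above (/1024 prescaler, OCR0A = 0x95, 16MHz clock):

```c
#include <stdint.h>

#define F_CPU_HZ  16000000ULL
#define PRESCALER 1024ULL
#define OCR0A_VAL 0x95ULL       /* counter runs 0..0x95 = 150 counts */

/* Microseconds between compare-match interrupts:
   prescaler * counts * 1e6 / F_CPU, done in 64-bit to avoid overflow. */
uint32_t tick_period_us(void)
{
    return (uint32_t)((PRESCALER * (OCR0A_VAL + 1) * 1000000ULL) / F_CPU_HZ);
}
```

That works out to 9600 microseconds, i.e. 9.6ms per tick.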

After finally getting through the Timers document in this forum they all make perfect sense. I think the names of the registers and such were more confusing than anything, but I see now that the comments are appropriate if you reference any AVR datasheet you have handy.

2) When you're writing "set_task(6)", it's only writing 64 to task_bits, right? The "task6()" function will actually start (I mean, the timer for task 6 and its LED blinking) in the "task_dispatch()" function, right? The reason I'm asking this is this portion of code:

set_task(7); //task7 runs
set_task(6); //task6 runs

What I've understood is happening here is "task_bits" is assigned 128 first, then without actually starting "task7()" (I mean, the timer for task 7 and its LED blinking), "task_bits" is assigned 64. And then the main loop starts and "task_dispatch()" is called which starts "task6()". And I didn't find "task_bits" being assigned 128 afterwards which would start "task7()" in ""task_dispatch()".

It's probably my mistake in understanding. But I'll appreciate your or anyone else's help very much in helping me with this.

If you read/search through the whole thread you will find that there has been two such remarks prior to yours. (And unless I am missing something, you are correct.)

Quote:

What I've understood is happening here is "task_bits" is assigned 128 first, then without actually starting "task7()" (I mean, the timer for task 7 and its LED blinking), "task_bits" is assigned 64.

No. The set_task function uses the |= operator, not the = operator. Thus, first call ( set_task(7) ) sets bit 7 (the task_bits gets the value 128), Next call ( set_task(6) ) sets bit 6 without setting or clearing any other bits, so now the task_bits has the value 192.

If you are unfamiliar with bit manipulating operators then there is an excellent tutorial here at AVRfreaks. Mandatory read.


That portion of code is not ideal, but the later switch() relies on the value 8 being 'no task'. Therefore even though I go over the end of the bit_mask table, nothing comes of it. In retrospect I should've coded it more clearly. Pascal would've given a runtime error! The challenge is to rewrite that section of code so it doesn't overflow the table and doesn't rely on a side effect. The change is not simply using < as opposed to <=.

Cheers Larry. That code had remained unchanged for 15 years or so. My new attempt just felt 'clunky' as there was no looping, but it does seem to be much simpler and more obvious. I was modifying some code at the office today and thought I'd try it. Imagecraft reported around 20 bytes less code for the new method and one less byte of RAM. So win-win. I wasn't motivated enough to measure the execution time, but I'd expect it is probably a few cycles less worst case, especially since the compiler should be able to cache the task_bits variable in a register. I have had part two written for the past two years but never got around to proof-reading it to make it suitable for publication. Not enough time, since children and home construction projects eat up my time.

If I understand your code correctly, when two task timers run to zero at the same time, only the task with the lowest number will be executed in the current time slice. The other task will be executed 10 ms later. This would also imply that the task timer of this second task will be re-initialized 10 ms later. It will never make up for the lost time will it? So basically only task0 would be really deterministic...

RedDuck, you understand correctly. In many cases hard real time is not necessary. If you're just controlling relays and an LCD, such anomalies aren't apparent and don't cause a problem. Obviously, if you really needed hard timing, then you would have that done via an interrupt or hardware (timer etc).

There's little doubt that (1<<task) is probably more obvious. As for efficiency, I agree with Cliff. Nevertheless, the net result is not going to be much of an efficiency gain methinks.
The source of that code goes back to the early days of an IAR 8051 compiler where I would look carefully at the generated code to coax the compiler into doing its best. If I was doing a code review and came across this, the questions I would ask:
1. does it work? No potential side effects?
2. is the intention reasonably obvious?
3. what would be gained by changing it?
4. Is the use consistent?
5. Is it MISRA compliant?

Granted it's a lot more verbose, but again, pgm_read_byte() is an avr-gcc-ism. Given another compiler, it would probably read a lot simpler. Does less source == less object? (That was a rhetorical question.) Less chance for typing errors though!

I have a problem with the task_timers array: it does not get set in the task functions. On line 129 of the log file, the task_timers array stops getting updated in the task functions. What could be the problem?
Here is the code.

You've added more tasks but task_bits is only a char, which has 8 bits - make task_bits an unsigned int. 10 does not go into 8. I think you have too many tasks; I would suggest you combine some of them into one task. For example, most of my stuff is industrial control. I have one task for Modbus communications, one task for control logic, one task for an LCD. The comms task runs on demand, the control logic every 100ms (because relays usually don't work much faster) and the display task every second. Having a lot of tasks doing a little amount of work is probably not the most efficient approach.

Setting the task_timer[x] to 1 for each task means the tasks will forever be competing to execute - this may just be part of your test.

Thanks so much for this tutorial. I definitely learned from it. I just posted my first project, that uses the framework on an ATtiny2313. It was probably overkill for what I was doing, but it was worthy learning experience to use it.

The LED of task1 flashes properly but task0's doesn't :(
Here I wasn't trying to add any delays because that would cause some conflicts, but I don't see why the LED in task0 doesn't flash after I decrement i until it reaches 0?
Here is task1, which works fine:

A few years ago someone would come into my bedroom and ask why it isn't tidy; I would reply "it's because my bedroom and I obey the 2nd Law of Thermodynamics" ;) -> Entropy :lol:

Anyway, going back to your question,
I would recommend declaring the variable "i" as static first of all. This will prevent it from being destroyed every time task0() finishes running :)
Consider this code.

Can you see the advantage of the keyword "static", especially when you have an increment/decrement in a function outside main()?
Apart from this, another good programming habit is to re-initialize your variable after it expires. Since you are intending to decrement from 10 to zero, ensure that "i" will be 10 again so it can be decremented again, and so forth.

A question: it looks like only one task will run each tick, the highest priority one. If lower priority tasks are ready, they have to wait for the next tick and hope a higher priority task is not ready. Is that the intent?
What changes would be needed to let all ready tasks run, in priority order, with each tick?

The intent is that you would have regularly running tasks as lower priority and infrequent but important tasks as higher priority. Generally I would have a serial receive comms task as high priority. When a serial packet is assembled by the rx isr, it enables a task to process it. Lower priority tasks might be updating an LCD. The human won't notice if it is delayed by a few 10's of milliseconds.

Fixed priority is simple. It can have drawbacks in that lower priority tasks can get starved. Another method is round-robin. Effectively all tasks are the same priority. Each tick you would look for the next task ready to run. You could have a counter that holds the next task number to run. On the tick you would increment the counter, roll back to 0 if > 7, then test to see if that task bit is set - if it is, call the task, else increment the counter. Only try 8 times per tick, otherwise you'll keep spinning until another task becomes active.

There's other techniques based on priority, round-robin, least recently run and combinations of these. There has been extensive research into this topic - so Google if you're brave or bored. Fixed priority can starve other tasks if you're not careful, whereas round-robin gives everyone a fair go. What you choose depends on the nature of the problem you want to solve.

Part 2 is still vapourware. The draft was done some time ago but I haven't been suitably motivated to check the code and publish. Too many other distractions.

This is an interesting and informative thread; thanks to all who contributed to the discussion,
especially Kartman and clawson. Looking forward to tutorial part 2.
I've boiled down what I learned here and use the following (example) framework in my projects:

1. does it work? No potential side effects?
It works, no side effects discovered so far. Comments welcome.
2. is the intention reasonably obvious?
I wanted to test if I had understood the different suggestions ;)
3. what would be gained by changing it?
Number of tasks not limited to eight, compact code...
4. Is the use consistent?
I think so.
5. Is it MISRA compliant?
Probably not. But this is an AVR forum, isn't it? ;)

MISRA used to frown on the use of function pointers. The latest standard doesn't seem to be worried about them anymore. Since your function pointers are in flash, it is highly unlikely they would get inadvertently modified, which was the original issue that MISRA was addressing. MISRA has gained a large following in embedded systems other than automotive. Compliance makes good sense - but I would stop short of strict compliance for amateur projects. As for function pointers vs switch vs if - I think the intention is pretty clear regardless of the implementation. Execution efficiency and code size? I don't think there are great gains to be made - maybe if you were targeting a tiny. Considering we sit in a loop waiting for a timer tick - then we're really not too concerned with extracting every last cycle.

Your example is rather synthetic - I gather you want the tasks not to 'bump' into each other and effectively have equal priority. In my instance, I wanted and needed priority. I had a comms task that was the highest priority - comms response was important. If the LCD update task got 'bumped' - no one would notice. Similarly for tasks like reading the ADC, doing some control logic and activating relays or lamps - the non-determinism isn't really an issue. Of course, it's not a solution for every problem. If you wanted equal sharing for all tasks, a round-robin strategy would be the choice. In most of the stuff I've done in the last 20 or so years, the strict priority has worked well.

More tasks? Really, you want fewer tasks! I think if you need more than 8 tasks, then either the architecture of the code is suspect or you really want to go to a pre-emptive RTOS.

Part 2 was intended to introduce a real world application with Modbus comms, some control logic and data acquisition, and a simple 2x16 LCD interface with three buttons and a setup menu. The sort of thing an AVR class processor would be used for as a small controller. The code is 99% done, along with the words. I just haven't garnered the motivation to go through the checking to get it ready for an audience. Nothing worse than a tutorial that is full of bugs!
It's only been 7 years!
If your intention was to have a task for, say, each flashing LED, then I'd say that is an inefficient implementation. If you can give me an idea of what you're wanting to achieve, then I can offer suggestions.

As everyone else has said - thanks for the great tutorial! I've learned quite a bit following along with your code but have one question regarding the task_timers.

You have a "system tick" set up every 10ms using the AVR's timer, which everything else builds off. Every 10ms, the tick_flag is set, which triggers the task_dispatcher.

Inside the task dispatcher, the first step is to loop through all the task timers and decrement them to zero. As the task_timer is decrementing to zero, how do you ensure one "decrement cycle" takes exactly one "system tick"?

For example, in your tutorial, task7() sets task_timer[7] = 50, which translates into task7() running every 500ms. So every 10ms, the tick_flag is set, then we enter into the task dispatcher. As the task dispatcher is decrementing task_timer[7] how do you ensure that the task_timer is only decremented every 10ms to give a delay of 500ms?

One follow up question - Our system is now humming along happily, ticking away in 10ms intervals and waiting for a task_timer to decrement to zero. Once the task_timer hits zero, a task_bit is set and that task is ready to execute. How do we ensure the task to be executed completes within the remaining time before the next system tick?

In your example, this is straightforward since there are only a handful of lines of code per task to turn the LED on/off, but what about more complex systems where in a given task IOs can be polled, ADC conversions need to take place, an LCD needs to be driven, etc.? Or the case where one system tick triggers multiple tasks to be run.

In other words, how can you determine what portion of your 10ms system tick is used by one task? I'm thinking this is related to the number of assembly instructions per statement of code?

The simple answer is: you measure it. You can do this in the simulator, or you can physically measure the time using an oscilloscope or logic analyser on a port pin that you set at the beginning of the task and clear when you exit. You can also use a timer to measure the time from when dispatch calls it to when it returns. 10ms equates to 160,000 cycles at 16MHz - that allows you to get a bit of work done. Devices like LCDs are notoriously slow, so they can chew up a few cycles. With some experience you get a feel for how intensive a bit of code might be. The number of instructions per statement of code is extremely variable - it could range from 1 to infinity - so you can't use the number of source lines of code as a time metric. In assembler you can count the cycles for each instruction, but that gets laborious. In many cases the execution time varies as the execution path varies, so measurement is really the only way.

If the task exceeds its time budget, then other tasks just get pushed out - thus the term 'co-operative'.

The sample code flashes two LEDs as an example. Give yourself a challenge to solve, then work towards it. You'll soon learn what works and what doesn't. Also, running the code in the simulator can give you an insight into what happens at a step by step level.

I'm happy that you've gained something from it. Using the bit_mask table is a bit of a throwback to when I first wrote the code on an 8051 processor. With the AVR you could flip a coin vs using shift operators. With an ARM processor, using a shift operator would be more efficient.

I've used the framework on a number of commercial projects spanning nearly 20 years. There's many little boxes ticking away happily around the world.

What makes the threads context switch is inserting a call to "yield", somewhere in the code of the spawned methods, or why not from an interrupt handler (ISR) written in tinythreads.

If you want to know the implementation of the yield method - which triggers the context switching - then you'll have to do some coding yourself (HINT: It's four lines of code, including turning on and off interrupts).

Yield shall enqueue the current thread and dispatch the one in the waiting queue. (PM me if you're not up for my bullshit, I'll show you)

Oh, yeah sure. Right now I've hooked my Arduino up to an LED (I'm strapped for parts, can't make anything more advanced at the moment), that one blinks on/off every 300ms, and the on-board LED blinks on/off every 200ms, both using very simple endless loops with _delay(). The interrupt comes from the WDT set to timeout every 32ms, works flawlessly to the naked eye.

As a side note (nothing to do with multitasking, but interrupt handlers), depending on your system and what the requirements are, you can get away with just running your code in the interrupt handlers (: That way you can sleep in the main loop, saving energy!

This tutorial was not meant to be the last word in multitasking - as i said up front, it is an example of a simple, cooperative tasker. There are a number of ways to achieve the same or similar results.

It's not an impossibility, but I'm not the original author of this program. Not that it's copyrighted or anything, but since I didn't write it myself (and I'm really no C programmer) I might not be able to answer those really tricky questions. We'll see; until then all information is in those slides and in the submitted code. The only thing missing is the implementation of yield, which I can help you with. The mutexes I haven't looked into since the threads I've run haven't shared common resources, but I have the code for those as well.

I'd suggest that if you find yourself needing to do it then you have partitioned your tasks wrongly.

My thoughts too. In multi-tasking it is often the case that you have one task that "does a job for" another task, but rather than CALL it you might typically have the "worker" task block on a mutex or semaphore; then, when the parent task wants the job done, it releases the shared object, which triggers the other into life. Another way is for the parent to post a request into a message queue of the child task; when the child receives an "OK, go for it" message it springs into life. This is quite different to synchronously calling functions in one task from another. You have to start thinking about the problem in a "different way" when you try to share work across threads and tasks in a multi-tasking system.

At the end of the day, think about it just like you get to write three or five (or however many) different main()s in a single program. Each "does its own thing". But a bit like when in Windows your word processor sends a document to the print queue along with another document already waiting to be printed from a spreadsheet, sometimes the main()s "talk" to each other. That's at the highest level of course, but really the issue is pretty similar.

So in what context do you see the need for one task to "call" another?

Of course there are things that both tasks might call - I bet they might both be tempted to call printf(), but that does not "belong" to either task really - it's a shared resource they can both use. Of course that raises another issue. If one does printf("hello") and one does printf("goodbye") at the same time, how do you prevent the user seeing "hegloodlyeo"? (Or maybe that's what you want?) This is really where a mutex (or semaphore if more than 2 tasks) might come into use. A mutex (mutual exclusion) ensures that only one task can "own" a resource at a time. So both programs call get_mutex(printf); printf("hello"/"goodbye"); release_mutex(printf). The one that "gets there first" gets control and can complete his printf(), then he releases the lock. Meanwhile the other task is "stuck" inside get_mutex() waiting for it to become available. When the first task calls release_mutex(), that allows the get_mutex() to complete and he can then do his printing. So the user sees "hellogoodbye" or "goodbyehello" depending on which got there first.

I re-coded it so that there is no need to call one task from another. But I am getting a strange problem after running it a while. It is getting stuck somewhere and not coming out. How do I figure out where it is getting stuck?

It kind of depends what "stuck" really means - you meant the task switching preemption actually stops or simply that one of the tasks goes into an infinite loop?

Anyway an OCD debugger (JTAG/debugWire/PDI/whatever) is usually the best for this kind of thing or, failing that (if not too much external stimulation is involved), the simulator in Studio. Run the code until it is "stuck" then simply break execution and find out where it is. Maybe follow it into the task preemption (assuming that's still going) and watch it as it returns into each task. Is one (or all even?) "stuck" in a loop waiting for something to happen that never will?

It seems one of the tasks is going into an infinite loop. This problem happens randomly - sometimes after 10 min, sometimes after 15 min, and so on. What do you mean by "Maybe follow it into the task preemption (assuming that's still going) and watch it as it returns into each task"? How do I do that?

As I say just run the code in a debugger or simulator to the point where it "locks" then break execution to find out where it is stuck (which should then tell you why). If some tasks are still going OK then just keep stepping execution until it gets into the task that is not running.

(BTW this is one of the downsides of multi-tasking - it can often be trickier to find out what's wrong when some part of the code stops working).

Cliff, there is no preemption happening in the bit of code I wrote. Everything runs to completion.
Sktnandy - see above. Your task can't sit in a loop - when it is called it must do what is necessary and exit within its time slot. If it has to wait for something, use a state machine: set the state and come back later.

Further to what Brian has suggested, output the task number via the UART. For many years I didn't use (or have) in-circuit debug, so techniques like flashing LEDs and UART output are powerful means of identifying what is happening. Even recently I output recovered clock and data signals on I/O bits, then used a Saleae logic analyser to see what was happening.

It could be a number of problems. Do you have the correct pullup resistors? What have you done to investigate the problem? What circumstances lead up to the problem? Use the sound card as an oscilloscope and capture the i2c bus. You'll be able to see where it hangs and hopefully solve the problem yourself.

Thanks for the reply. I searched the net but could not find anything to resolve the problem. The pull-up resistors are 2k2. As per the code, the i2c routine gets called every 250ms. After a while it gets stuck at that line, though the code is still entering the interrupt every 10ms - I checked this with an LED.

You have no default case in the code below, so if i2c_transmit(type) is called with a type other than I2C_START, I2C_DATA or I2C_STOP, you won't make any change to TWCR but you will still enter the while loop. Is that what you intended?

If not, then set a flag (LoopFlag) in the cases below, clear LoopFlag in a new default case, and add a LoopFlag test as a condition for the while.

If I understand correctly, there is a heavy dependency on the task_timers. If a few task_timers are 10 msec and their priority is also high, other tasks may not get a chance to run. So we have to be careful in selecting task_timers, right?

There's a fixed priority with the tasks, thus a task can 'steal' all the cycles. The general idea is that you don't allow that to happen. Each task does what it needs to do in its 10ms slot and must yield. You can normally do quite a bit of work in that time. Thus the term 'co-operative'.

Ah, forgot about the buffer. Would it be sketchy to wait and unload the received words once the buffer and shift register are full? On an interrupt I guess it would not be.

Cheers for making me aware of that. Just went through the datasheet and found what you mentioned, YAY learning!

So that is 2112 cycles between each unloading of the 3 bytes? I was trying to work this out to see when having such a high baud rate becomes problematic, though every design is situation dependent. And I was asking to see if my maths was right, which it seems to be.

I'm not doing anything fancy yet, as I did have to google DMX/RDM. It was more to learn about this multitasking idea and find the limitations that I need to be aware of.

And now I just read the flowchart pic in the footer of your post. I feel it is fairly apt, considering I'm fairly new to MCUs and often dig myself a hole by trying to optimize and write the initial code at the same time.

3 bytes is the outside limit. The moment the next start bit arrives, a data overrun condition is flagged with the DORn bit, and the latest incoming word is dropped.

There is no way to determine programmatically how many words are waiting in the RX buffer, or how many bits are in the shift register. The only determination you can make is that the receive buffer contains either:

0 words (UDRE = 1)

1 or more words (UDRE = 0)

Some models of AVR have a start frame detection bit, so you could get a warning that there is an incoming word in progress, but that is typically used to wake the device from sleep mode.

In practice it is sufficient to service the receive interrupt with every word, so usually the two-deep-buffer-plus-shift-register only ever gets partly full. The extra depth is useful when your app gets excessively busy on occasion, allowing it to catch up without losing any incoming data, but this behaviour should be profiled to ensure overrun isn't possible, or that it is properly handled if it does occur.

While you can design your app 'to the edge', this must be done with care.