First, I'd like to say I don't need multitasking for my current project. But, after looking at a couple threads on the topic in the tutorials section, I appreciate the elegance and am feeling compelled to use that method in my implementation.

Second, I've been an amateur programmer off and on for 33 years. I have little formal training, but I do try to do things "right" within reason. Generally, that means trying to mimic the pros and avoiding bad programming habits as I become aware of them. I fail miserably in the end, but the emphasis is on "try". That is my motivation for this question: to write the code for my current project right... whatever that means.

So, by "multitasking" I'm referring to the method where a timer is set to CTC mode and a tick_flag is set at each overflow. I'm sure there are different approaches, but I assume the more commonly used ones all basically operate via that method? I'm torn between using that approach and what I will call a clock counter method, which was my originally intended approach. I have a sense of "I don't like it." It's inelegant. But I cannot pinpoint exactly why I don't like it. Are there poor practices in using the clock counter approach? If not, I'm inclined to use it, but I'd like your input. I'll explain the project briefly, then describe the clock counter approach, as I haven't written the code yet.

The Project:

Device: ATtiny85

Output: 3 LEDs

Inputs: 1 tactile button and 1 PIR sensor

The circuit will be a battery powered remote recording device for pedestrian traffic. By remote, I mean portable: it can be moved and placed anywhere. The circuit may be left in the field for up to a week. Assume sufficient power to run the device for a week. When the circuit is first energized, it starts the clock... like millis() in Arduino (only it's a variable) that runs continuously, incremented via the CTC timer. This clock is the backbone of the code. When the sensor is triggered, the AVR will record the last 20 "hits", converted to minutes, in a long[20] array. The button will act as a multifunction button by distinguishing between a long and a short key press. Its purpose will be to activate the output LEDs to communicate the time in milliseconds from bootup of each hit. I plan to use multiple "timers" (or tasks, if the multitasking method is used) for various things; most should be obvious, but a few are listed below. 3 LEDs = 8 combinations. I was planning to simply display, sequentially, each digit of each value recorded in the long[20] array, with a couple of flashes between values for delimiting. Since 3 LEDs won't handle 10 combinations, I was planning to do a flashing binary 6 and a flashing binary 7 to represent decimal 8 and 9 (or a similar method).

"timers" :

debouncing button

timing button press

timing the flashing of the LEDs (on and off)

filtering artifact PIR sensor output (must be active for 100ms to count as a hit)

etc.

Each of the timers would require its own long variable, which would be set relative to the current clock time at some point. For example, when the button is just pressed: timer_button = clock_ms + 100. In the main loop, compare the timer_button value to clock_ms (if timer_button < clock_ms, then do something). Remember, clock_ms is the number of milliseconds the circuit has been powered up. The main loop would have an if() for each timer to direct program flow. That's pretty much it. I hope it made sense. It seems convoluted to me, but so did the multitasking method the first couple of times I saw it.

What I'd like to know is, is there anything inherently wrong with the clock counter method? Does it scream amateur? I hope to learn more about the right way to program by learning why this method would be a poor choice. Is it significantly worse than the multitasking method? Are there better non-multitasking methods that I should consider?

Before suggesting I step up to an ATmega: this project is an exercise in designing small. It isn't intended for production. I have some 328's on hand. I also have some shift registers that would make the output better, but I am intrigued by minimalist practices, and thus I ask you to respond with respect to the parameters of the hardware I've laid out.

I gather you've read my tutorial on multitasking. The method you describe is a common technique. The downside on an AVR is that you're doing 32-bit operations that are 'heavy' and chew up 4 bytes. If you were using an ARM Cortex, 32-bit operations would be native and very efficient, but you'd still be chewing up four bytes.
Ultimately, for co-operative multitasking, all you are doing is divvying up the processor time. The classic technique was the 'super loop', where you'd have flags that determine which functions you call, and a timer or other functions would set those flags. The technique I proposed in my tutorial is basically the same thing, but arranged to make it a bit more general. I got the technique from a GE analog cell phone. Of course, there are many ways to skin a cat, so for a learning experience, try different ones and see which one is smaller, uses less RAM and works!

I'd put the debounce code in the ISR, but flashing LEDs I'd do as a timed task.

As Kartman wrote, working with absolute timing (run_at(time)) is computationally heavier than working with relative timing (run_in(ms) or run_every(ms)). Each timing operation has to deal with 4 bytes (a 32-bit long), which is not a good idea on an 8-bit core.

Besides, most tasks in an embedded system are defined in relative terms: "check if button is pressed every 10ms", "turn off the LED in 1s"… Why make it "check if button is pressed at NOW+10ms", "turn off the LED at NOW+1s"?

Overall it makes more sense to use relative timing as the basis for a scheduler design. You can always maintain an absolute counter if a task needs absolute timing or very long intervals.

Like many before and after me, I have designed my own cooperative time-triggered scheduler. It has a lot in common with the one Kartman presents in his tutorial. I would encourage you to do the same or reuse similar existing code.

Let me tell you a bit about mine, and specifically how it differs from Kartman's example: it might give you a few ideas.

It is small and optimised in order to run well (and consume less) on small devices: compiled for ATtiny85 with GCC, Blink is 252 bytes and uses 6 bytes of RAM.

Each task eats 2 bytes of RAM: a byte counter and a byte period (after it runs, a task has its counter reset to its period). This way you can run a task in counter ticks and/or every period ticks.

Like Kartman's example, the main clock is usually set to tick every 10ms. Thus the longest delay/interval is 2.55s (255 ticks). If a task needs a longer delay, which is rare, it can maintain its own long-term counter.

One difference, however, is that the MCU is put to sleep between ticks and wakes up at the next clock tick interrupt. This consumes less power than an active wait on a tick_flag and makes for very little jitter in task timing. In fact, in this application it is idle 99.98% of the time.

When it's actually doing some work (toggling the LED) it runs for 3µs, compared to 1.3µs when there's no task to run.

Another difference is that all tasks programmed to run on a given tick will run (not just one task). There's a lot you can do in 10ms, especially when tasks are things like: check the state of a button, toggle a LED, read some analog pin, etc. It is your responsibility to make sure all tasks set to run will fit within a clock cycle.

There is no explicit task priority: tasks will simply run in the order they have been defined. For example:

Hi Kartman, yes, your tutorial was the one that got me wondering if I should swing from the clock counter method to the time sharing method.

You said:

The downside on an AVR is that you're doing 32 bit operations that are 'heavy' and chew up 4 bytes.

You're referring to my original intent to compare all the long variables with the clock? That was part of my concern about using the method. I understand the 32-bit timer variables chew up memory, but there's plenty available, so I'm not concerned about that point. But how does it affect performance? Is it a drastic reduction in speed, or something else I should be aware of? For this particular project, halting execution while running a task subroutine isn't an issue.

I'm not concerned about optimizing the code to "perfection". Like you say, there's more than one way to skin a cat. But, if the general consensus of you experienced programmers is "wtf" then I must be on the wrong track and would rather correct my thinking now, earlier in my training, than later. Would my proposed method be generally considered inappropriate?

If I go with your time sharing method, would it be inappropriate to have the tick_flag variable AND a clock_counter variable that increments inside the ISR routine? Of course, the clock counter is still necessary for recording when the PIR events occur. I considered putting the clock counter outside the ISR in the main loop, but there's a risk that the tick_flag could get reset before the clock counter is incremented. Although I suppose this could work?

ATOMIC_BLOCK(ATOMIC_FORCEON)
{
    tick_flag = 0;
    clock_cnt++;
}

Assuming the previous code snippet is acceptable, which would be preferred: the atomic block, or just tucking the clock_cnt++ into the ISR function?

Either method will work. What I like about Kartman's task scheduler is that it lets you "think" of each task as a separate problem to be solved, independent of the other tasks that need to be done. I have done it both ways, using whichever method seemed right at the time, with no real preference either way. YMMV

In your example the atomic block is not required, as your clock_cnt is not shared with the ISR. However, if you increment clock_cnt in the ISR, you'll need to use an atomic block each time you read it. Also worth noting is that Arduino code normally uses your clock method.
If what you're doing is in the realm of tens of milliseconds, then the performance hit of 32-bit operations is irrelevant. Many of us work on things where microseconds count, so we tend to be frugal. I'm working on web stuff at the moment and it is the opposite - how much can we keep in RAM and how many servers will we use.

As Kartman wrote, working with absolute timing (run_at(time)) is computationally heavier than working with relative timing (run_in(ms) or run_every(ms)). Each timing operations has to deal with 4 bytes (32bits long), which is not a good idea on an 8bit core.

Firstly, your response was posted while I was writing (and posting) my previous response to Kartman. Had I seen it first, I wouldn't have worded mine the same, as YOUR post addresses some of my questions.

Secondly, with respect to the above quote, "deal with 4 bytes..." I get that the storage for a 32-bit variable takes up 32 bits of space. It's a one-to-one trade-off. THAT I can accept. I recognize that comparing two 32-bit values comes at "some cost". Can you quantify it? Surely the operation would occur in a single clock cycle? Do I run the risk of running out of stack or heap or something? I'm not arguing your point. In fact, it's well taken. I'd like to be able to recognize and assess this cost for future applications.

Quote:

It is your responsibility to make sure all tasks set to run will fit within a clock cycle.

Understood. But how is this done? I am guessing it has something to do with counting the assembler operations. Frankly, I find the assembly stuff intimidating. I'm only just beginning to get a little comfortable with C.

Quote:

There are many other things, but it would go further than what you've asked. Anyway, hope it inspires you!

Wow. <slow clap> That was a really awesome write up. Cutting out the priority really simplified things and for this particular application it appears to be exactly what I need. I'm going to pursue this method. So, I guess you can say you have most certainly inspired me!

I didn't see this in the tutorials section. Maybe you could build on your post here and post it in the tutorial forum. Really good stuff. Cheers

This is an 8-bit micro, so on the whole it does things 8 bits at a time. As you can see, a 32-bit compare is therefore four stacked compare (CP/CPC) operations. True, each is just 1 cycle, but that makes 4 cycles to do the compare, and that does not take into account accessing the 32 bits in the first place. As you can see in my somewhat contrived example, that involves 8 LDS operations, each being 2 cycles, so 16 cycles in all. Though that is to compare var1 to var2. If it was comparing to a constant:

then 4 LDS operations (8 cycles) are replaced with 4 LDI (total 4 cycles). Oh, and then there is the BRNE at the end of each, which makes the conditional branch based on the result. That costs 2 cycles when it jumps, or just 1 if it doesn't.

You won't run out of stack/heap doing this, but count those opcode bytes - that's a fair few bytes of your flash just to do work with 32-bit vars.

Basic rule: always use as few bits as you can get away with when programming small micros. The more you employ, the more cycles it will take and the more flash it will occupy.

A 32-bit compare would need four reads from memory and four compares, so that's 8 instructions and 16 bytes of flash, and I'm guessing at around 12 clocks. How does one know how long it takes for a code segment to execute? Atmel Studio - the simulator has a cycle counter, so you can breakpoint on a piece of code and take note of the cycle count at the start and at the end of a given section of code. Calc the difference and that's how many clocks it takes.

In your example the atomic block is not required, as your clock_cnt is not shared with the ISR. However, if you increment clock_cnt in the ISR, you'll need to use an atomic block each time you read it. Also worth noting is that Arduino code normally uses your clock method. If what you're doing is in the realm of tens of milliseconds, then the performance hit of 32-bit operations is irrelevant. Many of us work on things where microseconds count, so we tend to be frugal. I'm working on web stuff at the moment and it is the opposite - how much can we keep in RAM and how many servers will we use.

The reason I put in the Atomic block was to ensure that when the tick_flag was reset, the clock increment would be certain. What happens if the timer interrupts immediately after the tick_flag is reset? You throw off the clock, right?

For this project frugality isn't needed, nor is multithreading/timeshare etc. BUT you're pointing out the kind of stuff I was looking for. The clock_cnt method I had in mind just wasn't sitting well with me. It seemed cumbersome... but I couldn't put my finger on it. It's the effect of the 32-bit operations I was overlooking, because I don't know how to quantify that. While I do know how to quantify the cost of the memory space of the variables themselves, the operations on them are a different beast.

I think my error in even entertaining the clock counter method was that maintaining tick_flag and clock_cnt simultaneously seemed redundant, as one could easily be determined from the change in the other with a single operation per tick. Seemed like six of one, half a dozen of the other... but it isn't. At every tick, there are multiple comparisons (operations) going on. My method results in those operations all being 32-bit, whereas the tick_flag method needs only one... clock_cnt++. Wow. I've spent days trying to compare the two methods. Feeling pretty embarrassed at the moment.

I recognize that comparing two 32-bit values comes at "some cost". Can you quantify it? Surely the operation would occur in a single clock cycle?

You really want to start reading some LSS!

LOL. I've tried! It makes my head hurt.

Quote:

This is an 8-bit micro, so on the whole it does things 8 bits at a time. As you can see, a 32-bit compare is therefore four stacked compare (CP/CPC) operations. True, each is just 1 cycle, but that makes 4 cycles to do the compare, and that does not take into account accessing the 32 bits in the first place. As you can see in my somewhat contrived example, that involves 8 LDS operations, each being 2 cycles, so 16 cycles in all. Though that is to compare var1 to var2. If it was comparing to a constant:

But as you've just seen it's really fairly simple. The AVR (despite what Atmel would tell you about 130..140 instructions) really just has a little over 75 distinct operations it can perform and they are all documented (with some synonyms) here:

A quick way to learn to read (not necessarily write) AVR assembler is to express what you want to achieve in a line or two of C that you understand, then build the code and study the generated asm, looking up any opcodes you haven't already come across in that manual. Pretty soon you will know them all.

Let's face it, 8-bit micros (well, even bigger ones too) are never really doing anything particularly complex in a single instruction (especially true of RISC, not CISC, processors). They take some 0's and 1's in at one end, switch a few transistors, then output some new 0's and 1's at the other end. Well worth following the current thread/blog by Neil Barnes, where he is effectively building a very simple CPU from the ground up using 74 series TTL logic gates. It's only a small step from there to AVR8.

I was planning to do a flashing binary 6 and flashing binary 7 to represent 8 and 9 decimal (or similar method).

Or just display in octal.

Since you're intrigued with minimalist approaches, you might also consider Charlieplexing. It is a technique which allows N pins to control N²−N LEDs. Three pins could control 6 LEDs. Enough for 64 different patterns if used for binary display. Or use 4 for a 4-bit display (good for decimal or hexadecimal), and the remaining two for other purposes.

Charlieplex matrices can also be used for reading buttons. Whereas a 3x2 matrix of buttons would require 2+3=5 I/O lines to read using a conventional matrix approach (or 6 I/O lines for direct connection of each button), a Charlieplex matrix can read 6 buttons using only 3 I/O lines. There are some limitations. Multiple button presses cannot be unambiguously distinguished. Also, each switch must be in series with a diode, so this adds to the component count a bit.

You can even use the same matrix to both read buttons and light LEDs. For example, 2 buttons and 4 LEDs.

There are numerous threads concerning Charlieplexing that have cropped up over the years, at least one of them fairly recently.

It's worth noting, since you mentioned you like to avoid bad habits, that some would say the first bad habit to break yourself from would be that of selecting the project hardware first, then specifying the project requirements. That is, if the project requirements suggest you need a device with 8 I/O lines, select one. Don't select a device with only 5 I/O lines. Otherwise the project then gets bogged down in a quagmire of solving problems unrelated to the requirements, but rather to those imposed by the premature selection of hardware.

That said, I too appreciate the challenge of turning a 'sow's ear into a silk purse' ... and I'm not generally one to say 'don't do it that way'. Plus, you've asked us to respect the parameters of your problem. Fair enough!

As a bonus, Charlieplexing, like most multiplexing techniques, is a time-sliced technique, which could tie in nicely with your other project goals. Each 'on' LED is actually on only part of the time, usually with a duty cycle of 1/N. It works because the human eye perceives an LED as solidly 'on' even if it is flashing, as long as it is flashing fast enough. For LEDs, that's usually about 70 Hz or faster. So, the Charlieplexing 'engine' would run at a basic frequency of N * REFRESH_RATE. For N = 3 (number of I/O lines) and a 70 Hz refresh, that's 210 Hz, or roughly every 5 ms. Depending on your other timing requirements, you could select a time base which fits nicely. A 5 ms time base would give a 66.7 Hz refresh; a 4 ms time base would get you 83.3 Hz.

But I digress ;-)

"Experience is what enables you to recognise a mistake the second time you make it."

Atmel Studio - the simulator has a cycle counter so you can breakpoint on a piece of code and take note of the cycle count at the start and at the end of a given section of code. Calc the difference and that's how many clocks it takes.

Had some trouble finding it at first. Great tip! I'm continually more impressed with Studio.

I'd suggest when you're faced with a decision between various techniques, choose the simplest one and get that working. Then you can try another technique and compare. Otherwise, if you keep on pondering, you don't have a working solution. In my job I have these conundrums almost daily - is it faster to have one complex database table with multiple indexes, or a number of simpler tables with one primary index and a join? There is no simple answer, as it depends on many things, so what do you choose? I'm not a database guru and I have little experience to fall back on, so how do I come to a solution? We chose the simplest - a single table with multiple indexes. We've not tried the other yet. Red-black trees vs b-trees? And so on. Sometimes there's a clear-cut winner, other times not so. Again, at some point you have to produce something that works - it may not be perfect, but it does what is intended. Otherwise, it is all academic. Frequently when writing code, I'll figure out a better way, so I'll refactor. Or you could be debugging and figure out that your code is crap - and then you know WHY it is crap. So you make changes and try again.

I'd suggest when you're faced with a decision between various techniques, choose the simplest one and get that working. Then you can try another technique and compare. Otherwise, if you keep on pondering, you don't have a working solution.

You've identified one of my biggest problems. The older I get, the bigger the problem becomes. I tend to see the flaws (let's face it, we might as well give up if we've already achieved perfection) and spend more time pondering a better way than I do actually accomplishing it. I have a very cluttered mind.

Quote:

In my job I have these conundrums almost daily - is it faster to have one complex database table with multiple indexes, or a number of simpler tables with one primary index and a join? There is no simple answer, as it depends on many things, so what do you choose? I'm not a database guru and I have little experience to fall back on, so how do I come to a solution? We chose the simplest - a single table with multiple indexes. We've not tried the other yet. Red-black trees vs b-trees? And so on. Sometimes there's a clear-cut winner, other times not so. Again, at some point you have to produce something that works - it may not be perfect, but it does what is intended. Otherwise, it is all academic. Frequently when writing code, I'll figure out a better way, so I'll refactor. Or you could be debugging and figure out that your code is crap - and then you know WHY it is crap. So you make changes and try again.

It's all good advice. When I was designing the hardware for this device, I found myself concerned with how to design it so I could add functions and features later. In a way, I was basically doing what you suggest... just choosing something simple and getting it working. But then at some point it occurs to me that if I put in 3 LEDs instead of 2 I could do this or that. So I add the 3rd LED; it's simple enough. But it's a slippery slope. It's silly when my hardware design fits on a 3x4 board and the ATtiny doesn't facilitate much expansion. Alas, my burden is my imperfection! :)

Your questions about operating on 32 bits have been answered. From what I gather of your application, your technique would most likely work. But a scheduler is typically a piece of code that you'll reuse over and over again. It's also the piece of code that will run the most often (every tick) and upon which all other tasks depend. So I believe it's worth addressing the problem in a generic manner (the time spent developing it can be amortized over several projects). Especially if one of your main goals is to learn, as opposed to having someone rushing you because he's paying for your time.

chipz wrote:

netizen wrote:

It is your responsibility to make sure all tasks set to run will fit within a clock cycle.

Understood. But how is this done? I am guessing it has something to do with counting the assembler operations.

I believe this has been addressed as well in the above answers.

One thing though: most of your tasks are going to be very short. Toggling the LED is about 1.5µs; reading the switch pin and debouncing it would be a bit more. You can fit thousands of these in a 10ms cycle: there is no need to count cycles precisely. The big tasks are kind of obvious: either they do a lot of processing (loops, complex math…), or they take a long time because they're communicating over a slow protocol.

If you cannot fit two of them in the same time slot, make sure they're never running together and you're good. For example, by intertwining them (long1 runs on even clock cycles, long2 runs on odd clock cycles):

They each run every 2 ticks, but long2 starts 1 tick later (counter=1).

If one task is too big in itself, slice it into smaller parts: do half the job, reschedule, then do the second half. You can even monitor the main clock from your task and stop processing when you're getting close to the next tick. Or you could pick a longer clock cycle (20ms?). It really depends on the specifics of your project.

On that topic, one of my tasks (async) is itself a scheduler for a second class of tasks: asynchronous tasks. They are typically long jobs that have been offloaded by standard (synchronous) tasks, so let's call them jobs. async is the last task to run, and it tries to fill the idle time with as many jobs as possible.

The maximum execution time of each job is monitored and recorded. The queue of jobs is consumed one at a time, jobs being run until there's not enough time left to run the next one.

It's not a perfect system, but it still does a pretty good job at filling idle time. Here is a tiny85 fully refreshing a 128x64 OLED screen over I²C:

It takes about 3.5 cycles (35ms), CPU usage jumps from 0.4% to over 90% during this time. If you look closer, you can actually see the successive jobs:

Perhaps that's going a bit too far, in which case I apologize. The main idea here is that synchronous tasks should be kept small; if they need to do a very long job, they should just offload it to async and it will be done asap.

chipz wrote:

Wow. That was a really awesome write up. Cutting out the priority really simplified things and for this particular application it appears to be exactly what I need. I'm going to pursue this method. So, I guess you can say you have most certainly inspired me!

But a scheduler is typically a piece of code that you'll reuse over and over again. It's also the piece of code that will run the most often (every tick) and upon which all other tasks depend. So I believe it's worth addressing the problem in a generic manner (the time spent developing it can be amortized over several projects).

Would this be best implemented as its own module, or as a library to include in future projects?

I'm sorry, I don't see how this can be used in the code. I don't mean your point about alternating ticks to improve efficiency - I get that - but the define directive itself. Would you please expound on this?

Quote:

Perhaps that's going a bit too far, in which case I apologize. The main idea here is that synchronous tasks should be kept small; if they need to do a very long job, they should just offload it to async and it will be done asap.

Perhaps you went a "tiny" bit too far. But if you hadn't, it would have been my loss. I've gained much more from this thread than I had hoped. That's always good.

I'm sorry, I don't see how this can be used in the code. Not your point about alternating ticks to improve efficiency, I get that, but the define directive itself.

Alright. That's actually quite a tricky bit of preprocessor voodoo that relies on the Boost Preprocessor Library. Each task definition [format: (,,,)] is called a tuple, and TASKS is a sequence of them [format: ()()()].

I wanted a simple and elegant way of defining tasks; I wanted to be able to change task order simply by moving a line up or down, etc. I ended up using this format, which I kind of like, but there's a layer of not-so-obvious preprocessor macros to transform this definition into a C task list. I even had to mess with the Boost source code to get what I wanted. Note that this complexity is independent from the scheduler C code itself, however. You don't have to do it that way.

@netizen, thanks for the Boost Preprocessor intro. Yes, it's very cool. I've avoided preprocessor directives as much as possible until recently, when I started using them. I didn't recognize the code in your initial post and thought it was just pseudocode at first. The second time you posted it, it registered as something more. I searched online but couldn't find anything that looked like it. I agree, it looks very elegant on the surface. I'll wait until I master the standard directives before I look into Boost further. Thanks for sharing. You've been most helpful!

I think you're right: I found this Boost library rather confusing, even though I like to think I have a fairly good grasp of the preprocessor. It took me surprisingly long to feel at ease with it. But perhaps it's just me…

Would someone please comment on this, a quote from my post #10? It's secondary to the OP but I'm curious if my assumption is wrong.

The reason I put in the Atomic block was to ensure that when the tick_flag was reset, the clock increment would be certain. What happens if the timer interrupts immediately after the tick_flag is reset? You throw off the clock, right?

I realize it isn't how Atomic is typically seen in code, but if you read it in someone's code, would you find it palatable or not? Or would it just not work at all as intended?

This thread is almost impossible to follow on a mobile phone. Can someone remind me again what dire threat the use of atomic protection on the flag clear and increment is supposed to be protecting us from?

Actually, come to that, why does tick_flag and the ISR even exist anyway? Why not simply poll then clear the IF flag for COMPA?

Cliff, in the context of the example I gave in my tutorial, I used the ISR as I would normally put some extra stuff in the ISR, like debounce for inputs. As you say, yes, you could just use the COMPA flag. The OP has mentioned this.

I've just had a cursory scan of it and it looks good. There are a couple of things I'd add to it, like the danger of using pushbuttons on interrupts, and validating input. Apart from those, it seems to cover in reasonable detail the things important to designing embedded systems.