I hate RTOSes

Monday, April 12th, 2010 by Miro Samek

I have to confess that I’ve been experiencing a severe writer’s block lately. It’s not that I’m short of subjects to talk about, but I’m getting tired of circling around the issues that matter most to me and should matter most to any embedded software developer. I mean the basic software structure.

Unfortunately, I find it impossible to talk about truly important issues without stepping on somebody’s toes, which means picking a fight. So, in this installment I decided to come out of the closet and say it openly: I hate RTOSes, because they are a ticking bomb.

The main reason I say so is that a conventional RTOS implies a certain programming paradigm, which leads to particularly brittle designs. I’m talking about blocking. Blocking occurs any time you wait explicitly in-line for something to happen. All RTOSes provide an assortment of blocking mechanisms, such as various semaphores, event-flags, mailboxes, message queues, and so on. Every RTOS task, structured as an endless loop, must use at least one such blocking mechanism, or else it will take all the CPU cycles. Typically, however, tasks block in many places scattered throughout various functions called from the task routine (the endless loop). For example, a task can block and wait for a semaphore that indicates the end of an ADC conversion. In another part of the code, the same task might wait for a timeout event flag, and so on.

Blocking is insidious, because it appears to work initially, but quickly degenerates into an unmanageable mess. The problem is that while a task is blocked, the task is not doing any other work and is not responsive to other events. Such a task cannot be easily extended to handle other events, not just because the system is unresponsive, but also because the whole structure of the code past the blocking call is designed to handle only the event that it was explicitly waiting for.
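To make the unresponsiveness concrete, here is a small simulation in C. It is only a sketch under stated assumptions: `block_until()` is a hypothetical stand-in for a blocking RTOS call (semaphore pend, event-flag wait), the event names are invented, and the "script" array plays the role of events arriving over time. The point it tries to show is that a stop command arriving while the task is blocked waiting for the timeout gets no response from the task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events for illustration; not taken from any real RTOS. */
typedef enum { EV_ADC_DONE, EV_TIMEOUT, EV_STOP_CMD } Event;

typedef struct {
    const Event *script;   /* events in order of arrival */
    size_t       len;
    size_t       pos;
    unsigned     unserved; /* events that arrived while blocked on another */
} Sim;

/* Stand-in for a blocking call: returns 1 only when 'wanted' arrives.
 * Any other event arriving in the meantime gets no attention. */
static int block_until(Sim *sim, Event wanted)
{
    while (sim->pos < sim->len) {
        Event e = sim->script[sim->pos++];
        if (e == wanted) {
            return 1;
        }
        ++sim->unserved;
    }
    return 0; /* script exhausted: a real task would stay blocked */
}

/* One pass through the body of the task's endless loop. */
static int task_iteration(Sim *sim)
{
    if (!block_until(sim, EV_ADC_DONE)) {
        return 0;
    }
    /* ...process the sample here... */
    if (!block_until(sim, EV_TIMEOUT)) {
        return 0;
    }
    return 1;
}

/* Runs the task over the script; returns how many events went unserved. */
unsigned run_blocking_task(const Event *script, size_t len)
{
    assert(script != NULL);
    Sim sim = { script, len, 0, 0 };
    while (task_iteration(&sim)) {
        /* keep looping, as an RTOS task would */
    }
    return sim.unserved;
}
```

Feeding it the sequence ADC done, stop command, timeout, ADC done, timeout leaves the stop command unserved, because the code after the first wait only knows how to handle the timeout.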

You might think that the difficulty of adding new features (events and behaviors) to such designs is only important later, when the original software is maintained or reused for the next similar project. I disagree. Flexibility is vital from day one. Any application of nontrivial complexity is developed over time by gradually adding new events and behaviors. The inflexibility prevents an application from growing that way, so the design degenerates in the process known as architectural decay. This in turn often makes it impossible to even finish the original application, let alone maintain it.

The mechanisms of architectural decay of RTOS-based applications are manifold, but perhaps the worst is the unnecessary proliferation of tasks. Designers, unable to add new events to unresponsive tasks, are forced to create new tasks, regardless of coupling and cohesion. Often the new feature uses the same data as another feature in another task (we call such features cohesive). But placing the new feature in a different task requires very careful sharing of the common data, so mutexes and other such mechanisms must be applied. The designer ends up spending most of the time not on the feature at hand, but on managing subtle, hairy, unintended side-effects.

For decades embedded engineers were taught to believe that the only two alternatives for structuring embedded software are a “superloop” (main+ISRs) or an RTOS. But this is of course not true. Other alternatives exist; specifically, event-driven programming with modern state machines is a much better way. It is not a silver bullet, of course, but after having used this method extensively for over a decade I will never go back to a raw RTOS. I plan to write more about this better way, why it is better and where it is still weak. Stay tuned.
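As a rough illustration of the event-driven alternative, here is a minimal flat state machine in C. It is a sketch only, not the framework or the hierarchical state machines referred to above: the `Sampler` type, its states, and the event names are all hypothetical. Each call to `sampler_dispatch()` handles exactly one event and returns, so a stop command is handled in whichever state it arrives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events and states, chosen for illustration only. */
typedef enum { EV_ADC_DONE, EV_TIMEOUT, EV_STOP_CMD } Event;
typedef enum { ST_SAMPLING, ST_PAUSING, ST_STOPPED } State;

typedef struct {
    State    state;
    unsigned samples; /* number of ADC results processed so far */
} Sampler;

void sampler_init(Sampler *s)
{
    assert(s != NULL);
    s->state = ST_SAMPLING;
    s->samples = 0;
}

/* Handles one event to completion and returns; it never waits in-line. */
void sampler_dispatch(Sampler *s, Event e)
{
    assert(s != NULL);
    switch (s->state) {
    case ST_SAMPLING:
        if (e == EV_ADC_DONE) {
            ++s->samples;
            s->state = ST_PAUSING;
        } else if (e == EV_STOP_CMD) {
            s->state = ST_STOPPED;
        }
        break;
    case ST_PAUSING:
        if (e == EV_TIMEOUT) {
            s->state = ST_SAMPLING;
        } else if (e == EV_STOP_CMD) {
            s->state = ST_STOPPED;
        }
        break;
    case ST_STOPPED:
        /* all events are ignored once stopped */
        break;
    }
}
```

Adding a new event here means adding a branch to the relevant states rather than creating a new task.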

20 Responses to “I hate RTOSes”

I love RTOSes! I have to agree, though, that in the hands of the inexperienced they can be horribly abused. But then, what can’t? A C programmer’s first venture into C++, for example, can make you weep.

The points I want to tackle are:
(1) Waiting at multiple places in the task loop.
(2) Inability to wait for more than one thing at a time.
(3) Processing only one of these events per loop iteration.

(1) is very bad practice but it is encouraged by the fact that (2) is a characteristic of too many RTOSes. My own RTOS, SKC++, which will be great if I ever get round to finishing it, overcomes (2). This is not a sales pitch, as I have no product yet. However, I have blogged extensively about the project, with my teaching hat on. Have a look at the latter part of this to see how I deal with (2): http://software-integrity.com/blog/2009/10/14/skc-event-handling/

As the article explains, I didn’t invent this idea but got my inspiration from pSOS, which also allows waiting for several events concurrently. However, if there are several events pending when the task waits, pSOS insists that they all be processed in one task iteration – i.e. before waiting again. This leads to messy code and increases the probability of error, so I added the notion of priority to events, so that you only get to service the pending event of highest priority each time round. This is item (3) which I believe makes sense.

I neither love nor hate RTOSes. They are one (important) tool among many that are needed to get the job done.

I have used many RTOSes over the years and agree that many of them make it difficult for tasks to wait for multiple events simultaneously. I have not yet had the opportunity to make use of a formal event-driven state machine framework, such as QP, but I would love to try it on a project soon.

One technique I have used successfully to solve the problem of cleanly waiting on many different events within a task is POSIX-like condition variables (I say POSIX-like because few RTOSes, even with POSIX wrappers, are purely POSIX compliant). The key behavior that is needed: waiting on a condition variable atomically releases the associated mutex. With this construct it is possible to implement clean event passing mechanisms between tasks where the events are completely arbitrary and application defined. It is also easy to add additional events, without breaking the design. The basic model I have used is as follows.
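A minimal sketch of what such a condition-variable event mailbox might look like, assuming POSIX threads. This is an illustration, not the commenter's own listing: `Mailbox`, `AppEvent`, `mbox_post()` and `mbox_wait()` are hypothetical names, and the capacity is arbitrary. The receiving task waits in exactly one place, and `pthread_cond_wait()` atomically releases the mutex while it sleeps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MBOX_CAPACITY 8u

/* Application-defined event: the kind and payload are arbitrary. */
typedef struct {
    int kind;
    int payload;
} AppEvent;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    AppEvent        slots[MBOX_CAPACITY];
    unsigned        head, tail, count;
} Mailbox;

void mbox_init(Mailbox *mb)
{
    assert(mb != NULL);
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
    mb->head = mb->tail = mb->count = 0;
}

/* Called by any task (never from an ISR). Returns false if the box is full. */
bool mbox_post(Mailbox *mb, AppEvent ev)
{
    bool ok = false;
    pthread_mutex_lock(&mb->lock);
    if (mb->count < MBOX_CAPACITY) {
        mb->slots[mb->tail] = ev;
        mb->tail = (mb->tail + 1u) % MBOX_CAPACITY;
        ++mb->count;
        ok = true;
        pthread_cond_signal(&mb->not_empty);
    }
    pthread_mutex_unlock(&mb->lock);
    return ok;
}

/* The single blocking point of the receiving task. */
AppEvent mbox_wait(Mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0) {
        /* atomically releases the mutex while waiting, reacquires on wake */
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    }
    AppEvent ev = mb->slots[mb->head];
    mb->head = (mb->head + 1u) % MBOX_CAPACITY;
    --mb->count;
    pthread_mutex_unlock(&mb->lock);
    return ev;
}
```

A receiving task would loop on `mbox_wait()` and switch on `ev.kind`; adding a new event kind does not change the waiting structure.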

I have used this model successfully to implement task-to-task event processing. Of course, this method will not work for ISR-to-task event processing because an ISR must never acquire a mutex. I would argue, however, that tasks servicing ISR events should never wait on more than one event: the signal from the ISR. And in this case every RTOS I have ever seen has a mechanism for this based on either semaphores or message queues that can be safely posted from an ISR.

One might argue that the model I have described above involves a LOT of RTOS overhead to accomplish a simple event passing framework, and I don’t disagree. And, yes, it is possible for the event recipient to be preempted while holding the mutex, thus creating a priority inversion problem that may block the sender, but traditional message-queue based designs suffer from the same problems. I didn’t say it was perfect. I look forward to seeing SKC++ when it is ready.

And I am looking forward, Miro, to your next installment on this topic.

Thanks a lot for good comments. I’m so glad that we finally have a real discussion here!

After reading your links, it seems to me that you’ve already been burnt by an RTOS, so you too hate RTOSes, but you simply don’t fully realize this yet. As an advanced RTOS user, you are obviously jumping ahead and touching on issues that I plan to explain in my upcoming blogs.

For now, let me just say that a traditional RTOS is not inherently incompatible with the event-driven programming paradigm. An RTOS is quite a low-level concept and can be used in many ways. So in fact, you can build an event-driven infrastructure (an event-driven framework) with a traditional RTOS. But then, you will be using only a small fraction of the RTOS features, and even the few features that are used for event-driven programming will not use the RTOS efficiently. In other words, an RTOS is a poor match for the event-driven programming paradigm.

I will be talking about all the issues, including everything that you mentioned in your blogs, such as event queues bound to the tasks, zero-copy event passing, and many more. Please stay tuned!

1. Naive question – do you think it is feasible to create an event-driven system that can substitute for a general-purpose OS such as Windows or Linux?
2. People prefer RTOSes. If one looks at different forums, one sees many discussions – there is a big movement, and new RTOSes appear nearly monthly (OK, that’s an exaggeration). Why?
My assumptions:
– there are a lot of ready-to-use peripheral libraries that can be used with RTOSes;
– there are many communities around RTOSes;
– there are a lot of pros in the communities giving support;
– people don’t know and don’t want to know about alternatives – you are the only prophet of event-driven frameworks (I don’t know about the others);
– people prefer to follow known ways and stay close to the mainstream;
– people buy what is advertised and catches the eye;
– although your book is called Practical UML Statecharts, imho it’s more academic than practical, lacking real-world examples (this is not a criticism of the book itself – I like it very much and appreciate it highly);
– an event-driven framework is conceptually more difficult or weird;
– an event-driven framework has a steeper learning curve than an RTOS;
– it takes more time to think before coding than with an RTOS – one needs to draw a sequence diagram and a statechart diagram;
– ..
(It reminds me of the reasons why people prefer C over C++)

“I plan to write more about this better way, why it is better and where it is still weak.”

Great questions, Alcosar. I’ve been struggling with them for many years, and I plan to address many of these important questions in this blog. I don’t promise definitive answers, but I will go beyond obvious reasons, such as intellectual inertia of the embedded software community.

I will also try to help you and other readers through the paradigm shift from sequential to event-driven programming, so that you will no longer think that event-driven programming is “weird” or “backwards”, but rather it will feel like the most natural way of approaching event-driven problems.

As to your assessment of my last book, I was trying very hard to exactly make it live up to its title, that is, to be *practical* rather than academic. Perhaps I failed, but I would really encourage you to spend a little more time with the “Fly ‘n’ Shoot” game example from Chapter 1. Please look through the provided code, compare it to the state diagrams in the book, think through the sequence diagram, and above all, *run* the code and play the game (you need to score at least 3000 points). Then, please think about how you would implement the same game with an RTOS.

Being an 8- and 16-bit developer for the last 5 years, I’ve had little experience with RTOSes until recently (I’ve started playing around with MQX on the Coldfire platform). In the low-end world you always find code filled with loops waiting for events to happen. Worst of all, there’s no RTOS controlling the software flow, so the MCU really does sit there. On the other hand, these devices don’t run as many functions or as much code as a 32-bitter running an RTOS, so programmers usually find it’s OK to keep those loops as long as the final application ends up working (barely), even if they are unable to add more functions later on. This is similar to what you’ve explained happens with RTOSes. I’ve worked a lot with state-machine implementations and have a love-hate relationship with them: they are great, they get the job done, and a good implementation will allow you to add more functions later on (as long as it isn’t a “super loop” approach). The downside is that they’re quite difficult to implement correctly: there are usually too many flags and/or state variables to keep track of, and each implementation is unique, unless you are working at a really formal place where everybody is actually following some sort of architecture standard. Anyway, I’m looking forward to your coming posts. I’d really like it if you could talk about implementations of event-driven approaches on low-end MCUs.

Great comment. Thank you for pointing out that an RTOS is too heavy for many low-end systems, mostly because an RTOS requires a separate *stack* (RAM) for each task. In contrast, an event-driven approach can have a really tiny footprint, especially in RAM, which makes it ideal for high-volume, cost-sensitive applications such as motor control, lighting control, capacitive touch sensing, remote access control, RFID, thermostats, small appliances, toys, power supplies, battery chargers, or just about any custom system on chip (SOC or ASIC) that contains a small CPU inside. Also, because the event-driven paradigm naturally uses the CPU only when handling events and otherwise can very easily switch the CPU and peripherals into a low-power sleep mode, the paradigm is particularly suitable for ultra-low power applications, such as wireless sensor networks and implantable medical devices.

So, yes, tiny event-driven frameworks are a viable alternative to the venerable “superloop” in low-end systems. For example, the ultra-lightweight QP-nano framework with hierarchical state machines, event queues, timers, and cooperative or preemptive scheduler can run in as little as 128 bytes of RAM and a few KB of ROM (see the eZ430 example included in the QP-nano distribution available from Sourceforge.net).

I plan to dedicate at least one post to low-end systems and scalability of the modern event-driven programming techniques based on state machines.

Miro,
You would perhaps agree that one valid reason to use RTOSs or for that matter general purpose embedded OSs, with or without QP, would be the protocol stacks (TCP/IP, USB etc) they provide.
One suggestion for overcoming your writer’s block: the article about event-driven programming and deadlocks that you promised in replying to my comment on the Barr Code blog entry on race conditions.

I agree that the most important reason why one should still consider using a traditional blocking RTOS is compatibility with existing software, such as TCP/IP, USB, or CAN communication stacks that are designed for traditional blocking kernels.

On the other hand, there is absolutely no technical reason why communication stacks would require blocking. In fact, this kind of software is very event-driven by nature. For example, the popular, open source, lightweight TCP/IP stack lwIP uses internally a strictly event-driven API, exactly to reduce its memory footprint and increase performance. lwIP can run without a traditional blocking kernel, and a port of lwIP to the QP event-driven framework works beautifully (see state-machine.com/lwip). Porting other stacks (USB, CAN, etc.) is just a matter of time.

Thank you everyone for great comments. I’d like to address most of the issues mentioned in your comments, but obviously I cannot do it all in one post. Therefore, I’d like to lay out my plan of attack, so that you know in which order I am going to build my argument:

1. In my next post I’ll talk about avoiding blocking in RTOS-based applications. I’ll introduce run-to-completion event-processing, the concept of an event-driven framework, and inversion of control.

2. I’ll talk about challenges of programming without blocking. I’ll explain what you need to sacrifice when you write non-blocking code and why this often leads to “spaghetti” code.

3. I’ll explain how state machines can work as powerful “spaghetti reducers” and why state machines didn’t catch on in the traditional event-driven GUI programming.

4. In the subsequent post I’ll explain the important things that an event-driven framework can easily do, but a traditional RTOS can’t.

5. I’ll describe the pros and cons of the most popular state machine implementation techniques in C.

6. I’ll describe the main shortcomings of the traditional state machines and how modern hierarchical state machines avoid them.

7. I’ll explain how modern modeling tools work and what kind of code they can generate.

8. I’ll explain a very simple and absolutely portable cooperative kernel for executing event-driven state machines.

9. I’ll explain an ultra-light preemptive kernel that is ideal for executing state machines in run-to-completion fashion.

I have been using both RTOSes and state machines since the late 80s. And I’m never tired of learning new things, so I read this topic with pleasure. In fact, I read your Statecharts book some years ago, and successfully applied your ideas in some products with no RTOS.

What I missed in this discussion, and what perhaps belongs in a new thread, is a comparison with another contender: time-triggered architectures. I read with curiosity “Patterns for time-triggered embedded systems” (available here: http://www.tte-systems.com/books/pttes ) and I found it a mix of a very simple event-driven mechanism (where the event is always a time event) and a curious scheduler. I have never tried this approach. Do you consider it useful?

I have read the TTE book (an old version, in Chinese translation). The approach is simple, stable, and easy to test. TTE views the program as a time sequence: it periodically checks the event flags and handles them, and it can handle some simple random events. I think TTE is suitable for events that are certain to arrive within a known time range, but low-power sleep mode is a problem: the system can’t sleep with TTE. QP can run in IoC (inversion of control) mode and can buffer events, so there is no restriction on when events arrive, and QP can enter low-power sleep mode when no events are pending.

There are some theoretical results showing that threads (aka RTOS) are technically equivalent to event systems (what you call state machines). This paper describes that threads are easier for humans to grasp and use.

At best, every formerly-blocking point in your function splits your code into two functions, and most automatic variables in the function must be moved to a structure with a lifetime longer than the function itself. There can be syntactic sugar using the preprocessor that papers over it, but it’s not very rosy. At least in C++ you can do it while maintaining type safety: if you goof, the compiler will barf. It’s a tad harder in C, but still possible, again at the cost of needing more syntactic sugar macros to paper over language shortcomings.
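The split described here can be sketched in C as follows. This is an illustrative example with hypothetical names (`AvgCtx`, `avg_begin()`, `avg_on_sample()`): a function that would have blocked on each ADC read inside a loop becomes a "begin" function plus a continuation called once per ADC-done event, and the former locals move into a context structure that outlives both calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Blocking original, for comparison (adc_read_blocking() is hypothetical):
 *
 *     int average_blocking(int wanted) {
 *         int sum = 0;
 *         for (int i = 0; i < wanted; ++i) sum += adc_read_blocking();
 *         return sum / wanted;
 *     }
 */

/* Former automatic variables, now living longer than either function. */
typedef struct {
    int sum;
    int count;
    int wanted;
} AvgCtx;

/* First half: everything that ran before the blocking point. */
void avg_begin(AvgCtx *ctx, int wanted)
{
    assert(ctx != NULL && wanted > 0);
    ctx->sum = 0;
    ctx->count = 0;
    ctx->wanted = wanted;
    /* a real system would start the first ADC conversion here */
}

/* Second half: called once per ADC-done event.
 * Returns true and writes *average when enough samples have arrived. */
bool avg_on_sample(AvgCtx *ctx, int sample, int *average)
{
    assert(ctx != NULL && average != NULL);
    ctx->sum += sample;
    ++ctx->count;
    if (ctx->count < ctx->wanted) {
        /* a real system would start the next conversion here */
        return false;
    }
    *average = ctx->sum / ctx->wanted;
    return true;
}
```

Between calls the caller is free to handle any other event, which is the gain that pays for the extra structure.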

I’ve been developing deeply embedded systems since 1995 with and without RTOS.
I have picked out one core problem from this article, for which I think there is a simple solution: waiting for multiple events. Of course, without truly knowing the powers of your programming language and of RTOS concepts such as events, message queues, and mutexes, you can use them to create very complex software that is impossible to maintain. It is software architecture design and design patterns that help tackle that part of the complexity.

For example, you could specify in your software “X” architecture that all tasks use one and the same messaging interface with a message queue, and that all events a task must wait for MUST be posted into that task’s message queue. So, whenever a worker task waits for something to do, it waits for a message to arrive in its queue. That message then tells what the event was about. The message is typically processed by the task in a message-specific message handler (or, if you like to call them event-specific event handlers, be my guest), and after processing the task returns to the wait state. By using abstractions and standardization in your architecture you can really create high-quality software with RTOSes as well.
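A minimal sketch of that "one queue, one handler per message kind" structure in C. All names here (`Msg`, `WorkerState`, `worker_dispatch()`, `worker_run()`) are hypothetical, and the RTOS queue is replaced by a plain array so the example stands on its own; in a real task, the loop in `worker_run()` would instead block on the task's single message queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds for one worker task. */
typedef enum { MSG_ADC_DONE, MSG_TIMEOUT, MSG_BUTTON, MSG_KIND_COUNT } MsgKind;

typedef struct {
    MsgKind kind;
    int     param;
} Msg;

typedef struct {
    int      last_sample;
    unsigned timeouts;
    unsigned presses;
} WorkerState;

typedef void (*MsgHandler)(WorkerState *st, const Msg *m);

static void on_adc_done(WorkerState *st, const Msg *m) { st->last_sample = m->param; }
static void on_timeout(WorkerState *st, const Msg *m)  { (void)m; ++st->timeouts; }
static void on_button(WorkerState *st, const Msg *m)   { (void)m; ++st->presses; }

/* One message-specific handler per message kind. */
static const MsgHandler handlers[MSG_KIND_COUNT] = {
    [MSG_ADC_DONE] = on_adc_done,
    [MSG_TIMEOUT]  = on_timeout,
    [MSG_BUTTON]   = on_button,
};

/* Processes one message to completion, then returns to the caller. */
void worker_dispatch(WorkerState *st, const Msg *m)
{
    assert(st != NULL && m != NULL);
    assert((int)m->kind >= 0 && (int)m->kind < MSG_KIND_COUNT);
    handlers[m->kind](st, m);
}

/* Stand-in for the task loop: the array plays the role of the queue.
 * A real task would loop forever, blocking only on its message queue. */
void worker_run(WorkerState *st, const Msg *queue, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        worker_dispatch(st, &queue[i]);
    }
}
```

The only waiting point is the queue receive at the top of the loop; none of the handlers may block.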

Of course, the benefit of an RTOS and tasks is that long-running operations in particular are easier to implement than when you are forced to divide a natural forward-only sequence into separate, atomically executable subparts. Both systems have pros and cons.

But this is precisely my whole point: I argue that event-driven programming is far superior to sequential programming based on blocking. Yet all mechanisms provided by a conventional RTOS (such as semaphores, event-flags, time delays, message mailboxes, etc.) *are* exactly based on *blocking*.

Please note that I’m *not* saying that event-driven programming is not possible with an RTOS. It is, and I discuss it in my “RTOS without blocking?” post. But how on earth is a programmer supposed to know *not* to use any of the RTOS mechanisms (except for blocking on the empty queue at the top of the task) in their code? I mean, when you buy an RTOS, you pay exactly for all these semaphores, mutexes, time delays, monitors, etc.

So, in the end, what I see “in the trenches” is that people fall for the blocking mechanisms and use them, even if they know about the “event-driven” architecture you describe. For example, people use the (blocking) time delay quite a bit. But this kills the responsiveness of their tasks…

I hate RTOSes for one reason only… I didn’t design the freaking thing and when s***t goes wrong, you end up debugging someone else’s problems. I’m a big fan of doing s***t MYSELF, I don’t need a freaking RTOS, I’m smart enough to do better than that … and after all an RTOS has no place in small to medium embedded projects … I mean c’mon, what’s the problem here? People don’t know how to pick up the data sheet of a processor and start writing embedded C or assembly to control the peripherals of an MCU?? There’s so many ‘noob’ programmers out there it ain’t even funny

What’s wrong with a superloop? I’ve been using that to implement cooperative multitasking on a small system. The “tasks” are just entry functions called from the loop, and they must not be blocking, which is easy to implement, even for waiting delays. Either with state machines or with computed gotos in combination with some task static timeout variable.

They cross-communicate using a rudimentary mailbox system. Like in the office, each “task” has an inbox and an outbox. The superloop empties the outboxes and puts each message into the appropriate inbox, acting like the postman.

Since everything is strictly sequential (except interrupts), there are no race conditions, no need for synchronisation and the like on the application level. As blocking is not allowed as per the system design, it will not be used. As long as there is only one processor core, that works well enough. KISS, keep it simple and stupid.

Watchdog triggering? At the end of the superloop, dang simple and reliable. Will hit if any of the tasks stalls/blocks.
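A minimal sketch of such a superloop in C, under stated assumptions: the two "tasks", the single-slot `Letter` mailboxes, the tick counter, and the `watchdog_kicks` counter (standing in for a real watchdog refresh) are all hypothetical. It shows a non-blocking delay done with a static deadline variable, the loop acting as postman, and the watchdog being serviced once per pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TASK_BLINK, TASK_LOGGER, TASK_COUNT };

/* Single-slot mailbox entry; 'full' marks a pending message. */
typedef struct {
    bool    full;
    uint8_t dest;
    int     data;
} Letter;

typedef struct {
    Letter inbox;
    Letter outbox;
} Boxes;

static Boxes    boxes[TASK_COUNT];
static uint32_t now_ticks;       /* stands in for a timer-ISR tick count */
static int      logged_total;
static unsigned watchdog_kicks;  /* stands in for a watchdog refresh */

/* Every 3 ticks, sends a running count to the logger. Never blocks:
 * if the deadline has not passed, it simply returns. */
static void blink_task(void)
{
    static uint32_t deadline = 3;
    static int toggles;
    if ((int32_t)(now_ticks - deadline) < 0) {
        return; /* delay still running */
    }
    deadline += 3;
    ++toggles;
    if (!boxes[TASK_BLINK].outbox.full) {
        boxes[TASK_BLINK].outbox = (Letter){ true, TASK_LOGGER, toggles };
    }
}

/* Consumes at most one message per pass. */
static void logger_task(void)
{
    Letter *in = &boxes[TASK_LOGGER].inbox;
    if (!in->full) {
        return;
    }
    logged_total += in->data;
    in->full = false;
}

/* The postman: moves each outbox letter to its destination inbox. */
static void postman(void)
{
    for (int t = 0; t < TASK_COUNT; ++t) {
        Letter *out = &boxes[t].outbox;
        if (out->full && !boxes[out->dest].inbox.full) {
            boxes[out->dest].inbox = *out;
            out->full = false;
        }
    }
}

/* Runs the superloop for a fixed number of ticks (a real one is endless). */
void superloop_run(uint32_t ticks)
{
    for (uint32_t i = 0; i < ticks; ++i) {
        ++now_ticks;
        blink_task();
        logger_task();
        postman();
        ++watchdog_kicks; /* reached only if no task stalled */
    }
}
```

Because the tasks run strictly one after another, the mailboxes need no locking at the application level, which matches the argument above.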

The thing where this doesn’t work that well is when real time constraints come into play. In simple cases, like switching off a motor in case of overcurrent, that can be done in an interrupt, leaving the application processing part for the relevant “task”, but that will go only so far.

The phenomenon seems to be, however, that many people go for a REAL TIME OS although they don’t need the RT parts of it?! No wonder why this ends in a mess.