After looking through the processors listed in the Embedded Processing Directory, I think it is safe to say that processors that include MMUs are predominantly limited to the 32-bit processor space. While there are some 8- and 16-bit processors that lay claim to an MMU, the majority of processors of these sizes, listed across approximately 60 pages in the directory, do not include an MMU. These smaller processors support smaller memory sizes and are not likely targets for consolidating many functions within a single processing core.

Even in the 32-bit processor space, there is a lot of activity at the small end of the processing spectrum. Consider the ARM Cortex-M0, which hit the market only within the last year and does not include an MMU. The Cortex-M0 is the smallest 32-bit core ARM offers and it experienced the fastest adoption rate of any ARM core ever. The Cortex-M3 does not support an MMU, but it does support an optional MPU (memory protection unit). In fact, MMU support exists only in the Cortex-Ax class processors; the Cortex-Rx processors support only an optional MPU.

I do not believe there is a universal answer to using an MMU; rather, it seems that when to use an MMU depends greatly on the choice of processor and the type of software the end-device requires. Is using an operating system an essential ingredient to using an MMU? Is there a size threshold for when using an MMU makes sense? Does the use of dynamic or static memory allocation affect when it makes sense to insist on an MMU? Does an MMU make sense in systems that have deterministic hard real-time requirements?

In other words, where is the line between using an MMU and not using an MMU? The embedded space is too large to generalize an answer to this question, so I ask that you share what type of systems you work with and any specific engineering reasoning you use when deciding whether or not to use an MMU in your design.

If you have a question you would like to see in a future week, please contact me.

This entry was posted on Wednesday, July 21st, 2010 at 10:08 am and is filed under Question of the Week.

9 Responses to “To MMU or not to MMU?”

Memory protection is needed for security/stability if you have multiple processes running in your system. An MMU is useless for deeply embedded systems running only a single multi-threaded process. That’s the whole religion behind it.

An MMU slows down access to the physical memory and local bus, at least on the “large” processors like PowerPC or Intel.
If you run uC/OS with multiple threads (not processes), there is no advantage to setting up the MMU; it only slows down local bus access by 2 to 10 times, depending on how you do it.
The bottom line: if your system is complex and built upon multiple processes written by multiple teams, you are better off with an MMU. If you are building a small (a few million lines of code), high-performance system, an MMU will hinder your performance.

An MMU provides both protection, and management in the form of a logical to physical mapping layer, both of which I’d suggest have their uses even without multiple processes.

Protection is useful in the development/debug phase of all sizes of products. If nothing else for NULL dereference detection and quicker detection of errant pointers while you’re “close” to the origin of the error, rather than many cycles down the road. It’s less useful if the hardware provides exception faults for accessing memory that doesn’t exist on the bus and you can somehow get an exception for access in the first 1K.

Management can be useful in embedded projects even without Linux or the like. At a previous incarnation we enhanced a standard RTOS with MMU support to allow demand paging of a much larger image from flash storage into a physical 256KB of memory without resorting to manual overlays. It was also useful in getting different code sets into ‘fast’ memory for different modes of operation without resorting to overlay trickery.

Please note that an MMU is a Memory Management Unit (see ARM 9, for example). ONE of its features is that it can prevent incorrect memory accesses, i.e. memory protection. By contrast, an MPU is specifically a Memory Protection Unit, and is a much less complex device than an MMU (see Atmel AVR32).

There are certain CPUs that have a built-in MPU (Analog Devices’ Blackfin, ARM Cortex-M3, etc.), but they do not have an MMU. The MPU gives your application memory protection, while an MMU gives you full virtual memory (with MPU implied). An MMU will slow down a system (due to frequent accesses to the page table), but an MPU is very time efficient: usually it just means loading some special registers when there is a thread change.

If you have a system that is mission critical, the use of an MPU is wise, and having an RTOS that provides APIs to control it makes life easier. This way, each thread can have some guaranteed sections of memory. If the RTOS also guarantees a certain amount of CPU time, then you have a much safer system.
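To make the "just loading some special registers when there is a thread change" point concrete, here is a minimal host-side sketch in C. It is not any vendor's MPU interface: `mpu_regs`, `mpu_switch_to`, `mpu_access_ok`, the four-region limit, and the region layout are all hypothetical stand-ins chosen for illustration. The idea is only that each thread carries a small table of regions, and the context switch copies that table into the protection hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_REGIONS 4  /* hypothetical number of region slots */

/* One protection region: [base, base + size) plus a write-permission flag. */
typedef struct {
    uint32_t base;
    uint32_t size;      /* 0 means the slot is unused */
    int      writable;
} mpu_region_t;

/* Stand-in for the MPU region registers; real hardware exposes these
 * at fixed addresses with a device-specific layout. */
static mpu_region_t mpu_regs[MPU_REGIONS];

/* Per-thread region set, as an RTOS might keep it in the thread control block. */
typedef struct {
    mpu_region_t regions[MPU_REGIONS];
} thread_t;

/* Context-switch hook: copy the incoming thread's regions into the MPU.
 * The cost is a handful of register writes, with no table walk involved. */
void mpu_switch_to(const thread_t *next)
{
    assert(next != NULL);
    for (size_t i = 0; i < MPU_REGIONS; i++) {
        mpu_regs[i] = next->regions[i];
    }
}

/* Model of the hardware check: 1 if the access is allowed, 0 if it would fault. */
int mpu_access_ok(uint32_t addr, int is_write)
{
    for (size_t i = 0; i < MPU_REGIONS; i++) {
        const mpu_region_t *r = &mpu_regs[i];
        if (r->size != 0 && addr >= r->base && addr - r->base < r->size) {
            return !is_write || r->writable;
        }
    }
    return 0; /* no region covers this address */
}
```

In this model, a thread that owns only its own RAM block can no longer touch another thread's block after the switch, which is the "guaranteed sections of memory" behaviour described above.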

An MMU is not just a performance hit; it can be an especially non-deterministic one, for example, when page-based address translation is involved. That common type of translation minimizes its performance penalty by caching translations between virtual and physical addresses. However, when the translation misses in its cache, or context switching causes many things to be reloaded, each “page table walk” adds several additional memory accesses.
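A rough way to see where the non-determinism comes from is to count memory accesses in a toy model. The sketch below is an illustration under stated assumptions, not a model of any particular MMU: the 4 KB page size, the 8-entry direct-mapped translation cache, the two-level walk, and the names `tlb_flush` and `access_cost` are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u  /* assumed 4 KB pages */
#define TLB_ENTRIES 8u   /* assumed tiny direct-mapped translation cache */
#define WALK_LEVELS 2u   /* assumed two-level page table */

typedef struct {
    uint32_t vpn;    /* virtual page number cached in this slot */
    int      valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Invalidate every cached translation, as a context switch might. */
void tlb_flush(void)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        tlb[i].valid = 0;
    }
}

/* Number of memory accesses needed to complete one load at vaddr:
 * 1 on a hit, 1 + WALK_LEVELS on a miss (one extra read per table level). */
unsigned access_cost(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];

    assert(TLB_ENTRIES > 0u);
    if (e->valid && e->vpn == vpn) {
        return 1u;               /* translation was cached */
    }
    e->vpn = vpn;                /* refill the slot after the walk */
    e->valid = 1;
    return 1u + WALK_LEVELS;     /* the walk plus the actual access */
}
```

The same load can therefore cost one access or three in this model, depending only on what ran before it, which is why worst-case timing analysis has to assume the miss.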

All that said, MMUs are a tremendous boon to reliability by making it easier to catch programming errors that would otherwise go undetected before causing much more costly problems. MMUs also make it easier to design the software architecture in a way that increases reliability, lowers costs, and yes, sometimes even increases performance: Not every performance problem can be placed at the feet of hardware speed. I’d venture to say that unless you’re running up against hard physical limits, plenty of us can make having an MMU a non-issue from a performance perspective.

There’s probably enough knowledge in this group even to take a given problem and figure out how to make MMUs a non-issue from a real-time (determinism) perspective, given sufficient flexibility in the system architecture. It is done all the time; simply look around yourself right now and you’ll see examples.

Though I think his cutoff of a million LOC is way too high, Alex has a good point about where an MMU doesn’t make sense: Where the software is “simple enough”.

An 8- or 16-bit processor with scarcely a 64K address space is much more often going to be “simple enough” compared with a processor having code that resides in megabytes or more of storage.

Where the amount of code is too large for any meaningful consensus that it is “simple”, but an MMU is an intractable show-stopper regardless, those would be the interesting examples to hear about from the “no-MMU” crowd.

i think a simple form of MPU, that simply detects invalid memory accesses is useful very, very often, for debugging and catching run-time errors

anything more depends upon the situation.
MPU that protects memory per process is only useful if you have indeed multiple processes, or an os/application boundary

MMU with address translation: flash image that does not fit in RAM and is not directly accessible, HD as RAM extension
more in general: if you want to treat storage that is not directly accessible to the CPU as if it were, and/or cache things

so for most single-process apps: a very simple MPU is sufficient

I doubt performance when using an MMU is an issue for high-end controllers, but the added complexity is (extra address translation step), so it is better to avoid one if it offers nothing.

For 16/8 bit systems with rather big memories (the normal case nowadays), even a basic MPU is less useful because almost any address is a valid one.

We used a basic MPU on a 16 bit system, no OS, one thread, with great benefits (reduced debug time),

and we even used memory translation windows (=MMU) on small systems to our benefit
- simpler address computations, big gain on many 8-bit systems
- more stable memory map, if HW descriptor size changes
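For readers who have not met a memory translation window, a minimal sketch of the idea in C follows. The window address, the window size, and the names `bank_reg`, `select_bank`, and `window_to_physical` are assumptions made for illustration; real parts differ in where the window sits and how the bank register is written.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0x8000u  /* assumed CPU address where the window starts */
#define WINDOW_SIZE 0x4000u  /* assumed 16 KB window */

/* Stand-in for the hardware bank-select register. */
static uint8_t bank_reg;

/* Point the window at a different slice of the larger banked memory. */
void select_bank(uint8_t bank)
{
    bank_reg = bank;
}

/* Translate a 16-bit CPU address inside the window into an offset
 * within the banked memory. Addresses outside the window are rejected. */
uint32_t window_to_physical(uint16_t cpu_addr)
{
    assert(cpu_addr >= WINDOW_BASE && cpu_addr - WINDOW_BASE < WINDOW_SIZE);
    return (uint32_t)bank_reg * WINDOW_SIZE + (cpu_addr - WINDOW_BASE);
}
```

Code that works inside the window only ever does 16-bit address arithmetic, and moving a structure to another bank changes one register value rather than every pointer that refers to it.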

How about using the MMU plus the D-cache?
As you know, when the MMU is disabled, the D-cache cannot be used either. So if we use the MMU and enable the D-cache, will the performance be better than with no MMU and no D-cache?

All MMU entries can be locked. Also, most MMU tables do not have to be several levels deep. It is entirely possible to fit an entire system description within the TLB (translation look-aside buffer). It is also possible to write embedded software in C++. It is just more complex. I think that the arguments are the same, with perhaps more benefits to using an MMU. Perhaps when people say MMU, they believe that it must have ‘page faults’, etc. The only time a ‘page fault’ should occur in a statically configured embedded system is when some code is addressing memory that it does not own. It is also quite easy to lock the MMU entries in the interrupt path. Too bad people seem so negative about this post. Thanks for trying Robert.
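Whether a whole static memory map fits in the TLB comes down to counting entries. The C sketch below does that count greedily, taking the largest page that is aligned and still fits at each step. The page sizes (4 KB, 64 KB, 1 MB, 16 MB) are an assumption for the example, as is the function name `tlb_entries_needed`; check the sizes and the number of lockable entries your core actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed set of supported page sizes, largest first. */
static const uint32_t page_sizes[] = {
    16u * 1024u * 1024u,  /* 16 MB */
    1024u * 1024u,        /*  1 MB */
    64u * 1024u,          /* 64 KB */
    4u * 1024u            /*  4 KB */
};
#define NUM_PAGE_SIZES (sizeof page_sizes / sizeof page_sizes[0])
#define MIN_PAGE 4096u

/* Count the TLB entries used to map [base, base + size) when each step
 * takes the largest page that is aligned at base and fits in the rest. */
unsigned tlb_entries_needed(uint32_t base, uint32_t size)
{
    unsigned entries = 0;

    /* Both ends must sit on the smallest page boundary. */
    assert(base % MIN_PAGE == 0 && size % MIN_PAGE == 0);

    while (size > 0) {
        for (size_t i = 0; i < NUM_PAGE_SIZES; i++) {
            uint32_t ps = page_sizes[i];
            if (base % ps == 0 && size >= ps) {
                base += ps;
                size -= ps;
                entries++;
                break;  /* restart from the largest page size */
            }
        }
    }
    return entries;
}
```

Summing this count over the flash, RAM, and peripheral regions of a design and comparing it with the number of lockable entries gives a quick check on whether a walk-free configuration is plausible. Alignment matters: in this model a 1 MB region that starts on a 64 KB boundary rather than a 1 MB boundary needs sixteen entries instead of one.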