So you have taken the test and you think you are ready to get started with OS development? At this point, many OS-deving hobbyists are tempted to go looking for a simple step-by-step tutorial that would guide them through making a binary boot, doing some text I/O, and other "simple" stuff. The implicit plan is more or less as follows: any time they think of something that would be cool to implement, they'll implement it. Gradually, feature after feature, their OS would supposedly build up, slowly becoming superior to anything out there. This is, in my opinion, not the best way to get somewhere (if getting somewhere is your goal). In this article, I'll try to explain why, and what I think you should be doing at this stage instead.

Neolander, I appreciate your view, but I cannot let you get away with that type of reasoning.

All of today's (major) kernels predate the advent of efficient VMs. With some original out of the box thinking, plus the benefit of the technological progress in the field in the past 15 years, a type safe efficient kernel is not far-fetched at all.

Per usual, the main impediments are political and financial rather than technological.

Okay, let me explain my view in more detail.

First, let's talk about performance. I've been seeing claims that interpreted, VM-based languages can replace C/C++ everywhere for some time. That they are now good enough. I've seen papers, stats, and theoretical arguments for this being true. Yet when I run a Java app, that's not what I see. As of today, I've seen exactly one complex Java application which had almost no performance problems on a modern computer: the Revenge of the Titans game. Flash is another good example of a popular interpreted language which eats CPU (and now GPU) time for no good reason. It's also fairly easy to reach the limits of Python's performance; I've done it myself with some very simple programs. In short, these languages are good for light tasks, but still not for heavy work, in my experience.

So considering all of that, what I believe now is that either the implementation of current interpreters sucks terribly, or that they only offer the performance they claim to offer when developers use some specific programming practices that increase the interpreter's performance.

If it's the interpreter implementation, then we have a problem. Java has been here for more than 20 years, yet it still has not reached maturity? Maybe what this means is that although theoretically feasible, "good" VMs are too complex to actually be implemented in practice.

If it's about devs having to adopt specific coding practices in order to make code which ran perfectly well in C/C++ run reasonably well in Java/Flash/Python... then I find it quite ironic for something which is supposed to make developers' lives easier. Let's see if the "safe" language clan will one day manage to make everyone adopt these coding practices. I'll believe it when I see it.

Apart from the performance side of things, in our specific case (coding a kernel in a "safe" language that we'll now call X), there's another aspect of things to look at. I'm highly skeptical about the fact that those languages could work well at the OS level AND bring their usual benefits at the same time.

If we only code a minimal VM implementation, ditching all the complex features, what we end up having is a subset of X that is effectively perfectly equivalent to C, albeit maybe with slightly worse performance. Code only a GC implementation, and your interpreter now has to do memory management. Code threads, and it has to manage multitasking and schedule things. Code pointer checks, and all X code which needs lots of pointers sees its performance sink. In short, if you get something close to the desktop language X experience, and get all of the usual X benefits in terms of safety, your interpreter ends up becoming a (bloated) C/C++ monolithic kernel in its own right.

Then there are some hybrid solutions, of course. If you want some challenge and want to reduce the amount of C/C++ code you have to a minimal level, you can code memory management with a subset of X that does not have GC yet. You can code pointer-heavy code with a subset of X where pointer checks are disabled. And so on. But except for proving a point, I don't see a major benefit in doing this instead of assuming that said code is dirty by its very nature and just coding it in C/C++ right away.

That's exactly what I meant when I called it a legacy feature. However, conceivably the feature might not have been dropped if we had popular microkernels around using it.

Yes, but you did not answer my question. Why would they have used segmentation instead of flat seg + paging? What could segmentation have permitted that paging cannot?

You need to either trust your binaries are not malicious, or validate them for compliance somehow.
If we're running malicious kernel modules which are nevertheless "in spec", then there's not much any kernel can do. In any case, this is not a reason to dismiss a microkernel.

Again, I do not dismiss microkernels. But I do think that forcing a specific, "safe" compiler on kernel module devs is a bad idea.

"First, let's talk about performance. I've been seeing claims that interpreted, VM-based languages can replace C/C++ everywhere for some time..."

Firstly, I agree about not using interpreted languages in the kernel, so let's get that out of the picture right away.

Secondly, to my knowledge, the performance problems with Java stem from poor libraries rather than poor code generation. For instance, Java graphics were designed to be easily portable rather than highly performing, therefore they are very poorly integrated with the lower level drivers. Would you agree this is probably where it gets its reputation for bad performance?

Thirdly, many people run generic binaries which aren't tuned for the system they're using. Using JIT technology (actually, the machine code could be cached too to save compilation time), the generated code would always be for the current processor. Some JVMs go as far as to optimize code paths on the fly as the system gets used.

I do have some issues with the Java language, but I don't suppose those are relevant here.

"I'm highly skeptical about the fact that those languages could work well at the OS level AND bring their usual benefits at the same time."

Can you illustrate why a safe language would necessarily be unsuitable for use in the kernel?

"If we only code a minimal VM implementation, ditching all the complex features, what we end up having is a subset of X that is effectively perfectly equivalent to C, albeit maybe with slightly worse performance."

'C' is only a language, there is absolutely nothing about it that is inherently faster than Ada or Lisp (for instance). It's like saying Assembly is faster than C, that's not true either. We need to compare the compilers rather than the languages.

GNU C generates sub-par code compared with some other C compilers, and yet we still use it for Linux.

"Code only a GC implementation, and your interpreter now has to do memory management. Code threads, and it has to manage multitasking and schedule things."

I don't understand this criticism, doesn't the kernel need to do these things regardless? It's not like you are implementing memory management or multitasking just to support the kernel VM.

"Code pointer checks, and all X code which needs lots of pointers sees its performance sink."

This is treading very close to a full-blown optimization discussion, but the only variables which must be range checked are those whose values are truly unknown within the code path. The compiler can optimize away all range checks on variables whose values are implied by the code path. In principle, even in an unsafe language the programmer has to range check such variables explicitly (otherwise they've left themselves vulnerable to things like stack overflows). Omitting those checks should be considered a bug, so any speed gained that way is an unfair "advantage".
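To make the point about redundant range checks concrete, here is a minimal Python sketch. It only models the idea: `CheckedArray`, `get`, `get_unchecked`, `sum_naive` and `sum_hoisted` are hypothetical names, and the counter stands in for the cost of a check (Python itself still checks every list index underneath).

```python
class CheckedArray:
    """Toy array that counts how many explicit range checks it performs."""

    def __init__(self, data):
        self._data = list(data)
        self.checks = 0

    def __len__(self):
        return len(self._data)

    def get(self, i):
        # Index of unknown origin: it has to be validated before use.
        self.checks += 1
        if not 0 <= i < len(self._data):
            raise IndexError(i)
        return self._data[i]

    def get_unchecked(self, i):
        # Only legitimate when the caller (or a compiler) has already
        # proved that 0 <= i < len(self).
        return self._data[i]


def sum_naive(arr):
    """Checks every access, even though the loop bounds make it redundant."""
    total = 0
    for i in range(len(arr)):
        total += arr.get(i)
    return total


def sum_hoisted(arr):
    """What an optimizer could emit: the loop bounds already imply
    0 <= i < len(arr), so the per-access check is dropped."""
    total = 0
    for i in range(len(arr)):
        total += arr.get_unchecked(i)
    return total
```

Both functions return the same sum; the difference is only in `arr.checks`, which grows by one per element in `sum_naive` and stays untouched in `sum_hoisted`.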

"Why would they have used segmentation instead of flat seg + paging ? What could have segmentation permitted that paging cannot ?"

In principle, paging can accomplish everything selectors did. In practice, though, switching selectors is much faster than adjusting page tables. A compiler could trivially ensure that the kernel module didn't overwrite data from other modules by simply enforcing the selectors except in well defined IPC calls - thus simultaneously achieving good isolation and IPC performance. Using page tables for isolation would imply that well defined IPC calls could not communicate directly with other modules without an intermediary helper or mucking with page tables on each call.
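For readers who have not met segmentation, here is a rough Python model of the isolation described above. It is a simplification under stated assumptions: a segment is reduced to a base and a limit (real x86 descriptors also carry granularity, type and privilege bits), and `Segment`, `translate`, `module_a` and `module_b` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset inside the segment


def translate(segment, offset):
    """Return the linear address for `offset`, or fault if it is out of range."""
    if not 0 <= offset <= segment.limit:
        raise MemoryError(f"offset {offset:#x} exceeds segment limit")
    return segment.base + offset


# Two hypothetical kernel modules with disjoint data segments.
module_a = Segment(base=0x1000, limit=0x0FFF)
module_b = Segment(base=0x2000, limit=0x0FFF)
```

With these values, offset `0x1000` through `module_a` would land on the first byte of `module_b`, and `translate` refuses it. As long as a module can only use its own selector, it cannot reach the other module's data, and switching modules means loading a different selector rather than editing page tables.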

Firstly, I agree about not using interpreted languages in the kernel, so let's get that out of the picture right away.

In fact, that was what most of my post was about.

Secondly, to my knowledge, the performance problems with Java stem from poor libraries rather than poor code generation. For instance, Java graphics were designed to be easily portable rather than highly performing, therefore they are very poorly integrated with the lower level drivers. Would you agree this is probably where it gets its reputation for bad performance?

Probably. Java is often praised for its extensive standard library, so if said library is badly implemented, the impact will probably be at least as terrible as if the interpreter is faulty since all Java software is using it.

Thirdly, many people run generic binaries which aren't tuned for the system they're using. Using JIT technology (actually, the machine code could be cached too to save compilation time), the generated code would always be for the current processor. Some JVMs go as far as to optimize code paths on the fly as the system gets used.

Does it have that much of an impact? I'm genuinely curious. I haven't played much with -mtune-like optimizations, but I'd spontaneously think that the difference is about the same as between GCC's -O2 and -O3.

Can you illustrate why a safe language would necessarily be unsuitable for use in the kernel?

That was what the rest of the post was about.

'C' is only a language, there is absolutely nothing about it that is inherently faster than Ada or Lisp (for instance). It's like saying Assembly is faster than C, that's not true either. We need to compare the compilers rather than the languages.

Don't know... I'd say that languages have a performance at a given time, defined by the mean performance of the code generated by the popular compilers/interpreters of that time.

Anyway, what I was referring to is that interpreted languages are intrinsically slower than compiled languages, in the same way that an OS running in a VM is intrinsically slower than the same OS running on the bare metal: there's an additional bytecode re-compilation overhead. They don't have to be much slower though: given an infinite amount of time and RAM, compiled and interpreted code end up having equal speed in their stationary state, and interpreted code can even be slightly faster due to machine-specific tuning. The problem is the transient, and situations where only a little RAM is available.
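A back-of-the-envelope way to see the transient versus stationary argument is the toy cost model below. The function name and the numbers are hypothetical (arbitrary units, not measurements): the run-time compilation is modelled as a one-time cost spread over the number of calls.

```python
def average_cost_per_call(calls, per_call_cost, one_time_cost=0.0):
    """Average cost of one call when a fixed start-up cost
    (for instance run-time compilation) is spread over `calls` calls."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return (one_time_cost + calls * per_call_cost) / calls


# Same per-call cost, but one variant pays 100 units up front.
first_call = average_cost_per_call(1, per_call_cost=1.0, one_time_cost=100.0)
long_run = average_cost_per_call(1000, per_call_cost=1.0, one_time_cost=100.0)
precompiled = average_cost_per_call(1000, per_call_cost=1.0)
```

In this model the up-front cost dominates a short run and fades as the number of calls grows, which is the stationary state mentioned above. It says nothing about memory pressure, which is the other half of the objection.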

"Code only a GC implementation, and your interpreter now has to do memory management. Code threads, and it has to manage multitasking and schedule things."

I don't understand this criticism, doesn't the kernel need to do these things regardless? It's not like you are implementing memory management or multitasking just to support the kernel VM.

Sure, but if the kernel's VM ends up doing most of the job of a kernel, what's the point of coding a kernel in X at all? The VM, which is generally coded in a compiled language, ends up being close to a full-featured kernel, so I don't see the benefit: in the end, most of the kernel is actually coded in whatever language the VM is written in.

"Code pointer checks, and all X code which needs lots of pointers sees its performance sink."

This is treading very close to a full-blown optimization discussion, but the only variables which must be range checked are those whose values are truly unknown within the code path. The compiler can optimize away all range checks on variables whose values are implied by the code path. In principle, even in an unsafe language the programmer has to range check such variables explicitly (otherwise they've left themselves vulnerable to things like stack overflows). Omitting those checks should be considered a bug, so any speed gained that way is an unfair "advantage".

Take a linked list. When parsing it, a process ends up looking at lots of pointers without necessarily knowing where they come from. This is the kind of code which I had in mind.
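A small Python model of that situation, as a sketch only: memory is a dictionary from addresses to `(value, next_address)` pairs, `0` plays the role of the null pointer, and `walk` is a hypothetical name. Every `next` pointer is read out of memory at run time, so nothing in the code path tells you in advance whether it is valid.

```python
def walk(heap, head):
    """Follow a linked list stored in `heap` and return (values, checks).

    heap maps an address to a (value, next_address) pair; 0 means "null".
    """
    values = []
    checks = 0
    addr = head
    while addr != 0:
        # The pointer was loaded from memory, so it is validated on every hop.
        checks += 1
        if addr not in heap:
            raise MemoryError(f"dangling pointer {addr:#x}")
        value, addr = heap[addr]
        values.append(value)
    return values, checks


# Three nodes scattered in "memory", linked 0x10 -> 0x30 -> 0x20 -> null.
nodes = {0x10: ("a", 0x30), 0x30: ("b", 0x20), 0x20: ("c", 0)}
```

Walking `nodes` from `0x10` costs one check per node. Unlike an array loop, there is no loop bound from which those checks could be proved redundant.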

In principle, paging can accomplish everything selectors did. In practice, though, switching selectors is much faster than adjusting page tables. A compiler could trivially ensure that the kernel module didn't overwrite data from other modules by simply enforcing the selectors except in well defined IPC calls - thus simultaneously achieving good isolation and IPC performance. Using page tables for isolation would imply that well defined IPC calls could not communicate directly with other modules without an intermediary helper or mucking with page tables on each call.

It's possible to have overhead only at process load and first call time with paging and tweaked binaries, but I need a better keyboard than my cellphone's one to explain it.

the generated code would always be for the current processor. Some JVMs go as far as to optimize code paths on the fly as the system gets used.

Yes, this is often stated as a PRO for JIT compiled code, but since it is JIT (just-in-time) the actual optimizations that can be performed during an acceptable timeframe are VERY POOR.

'C' is only a language, there is absolutely nothing about it that is inherently faster than Ada or Lisp (for instance). It's like saying Assembly is faster than C, that's not true either. We need to compare the compilers rather than the languages.

Well, assembly allows more control than C, so given two expert programmers, the Assembly programmer will be able to produce at least as good and often better code than the C programmer, since some of the control is 'lost in translation' when programming in C as opposed to Assembly. Obviously this gets worse as we proceed to even higher-level languages, where the flexibility offered by low level code is traded in for general solutions that work across a large set of problems but are much less optimized for each of them.

GNU C generates sub-par code compared with some other C compilers, and yet we still use it for Linux.

Which compilers would that be? Intel Compiler?

I don't understand this criticism, doesn't the kernel need to do these things regardless?

While a GC and manual memory management have the same cost for the actual allocating and freeing of memory, a GC adds the overhead of trying to decide IF/WHEN memory can be reclaimed, which is a costly process. (The allocation costs are only almost the same: a GC running in a VM has to ask the host OS for more memory should it run out of heap, which is VERY costly. To reduce memory fragmentation it also often compacts the heap, which means MOVING memory around, again VERY costly, though hopefully less costly than asking the host OS for a heap resize.)
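To show where that extra "IF/WHEN" work comes from, here is a minimal mark-and-sweep sketch in Python. Mark-and-sweep is just one collection strategy picked for illustration, and the heap model (a dictionary from object ids to the ids they reference) and the name `collect` are assumptions.

```python
def collect(heap, roots):
    """Reclaim every object in `heap` that is unreachable from `roots`.

    heap maps an object id to the list of object ids it references.
    Returns the set of reclaimed ids; `heap` is updated in place.
    """
    # Mark phase: visit everything reachable from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])

    # Sweep phase: whatever was not marked can be freed.
    garbage = set(heap) - marked
    for obj in garbage:
        del heap[obj]
    return garbage


# "c" and "d" reference each other but nothing live points at them.
objects = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
```

Note that the mark phase touches every live object just to find out what is dead. That traversal is work a manual `free` never has to do.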

I've been looking forward to seeing a managed code OS happening because I am very interested in seeing how it would perform. My experience tells me that it will be very slow, and that programs running on the OS will be even slower. Last time I perused the source code of a managed code OS, it was filled with unsafe code, part of which was there for accessing hardware registers but a lot of it also for speed, which is a luxury a program RUNNING on said OS will NOT have.

Hopefully we will someday have a managed code OS capable of more than printing 'hello world' to a terminal, which might give us a good performance comparison, but personally I'm sceptical. I think Microsoft sent Singularity to academia for a reason.

In principle, paging can accomplish everything selectors did. In practice, though, switching selectors is much faster than adjusting page tables. A compiler could trivially ensure that the kernel module didn't overwrite data from other modules by simply enforcing the selectors except in well defined IPC calls - thus simultaneously achieving good isolation and IPC performance. Using page tables for isolation would imply that well defined IPC calls could not communicate directly with other modules without an intermediary helper or mucking with page tables on each call.

As I said, it's possible to pay the IPC cost and do the page table manipulation only when a process is started, and maybe at the first function call. Here's how:
-Functions of a process which may be called through IPC must be specified in advance, so there's no harm in tweaking the binary so that all these functions and their data end up in separate pages of RAM, in order to ease sharing. I think this should also be done with segmentation anyway.
-The first time a process makes an IPC call, some or all of these pages are mapped into its address space, similar to the way shared libraries work.
-Further IPC calls are made directly in the mapped pages, without needing a kernel call or a context switch.

If we are concerned about the first call overhead (even though I'd think it should be a matter of milliseconds), we can also make sure that the mapping work is done when the second process is started. To do that, we ensure that our "shared library" mapping process happens when the process is started, and not on demand.
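The scheme can be simulated in a few lines of Python. This is only a sketch of the bookkeeping, not of real page tables: `Service`, `Client`, `call` and the `eager` flag are hypothetical names, and "mapping" is reduced to copying a reference and bumping a counter.

```python
class Service:
    """A process whose IPC-callable functions are declared in advance."""

    def __init__(self, name, exported):
        self.name = name
        self.exported = dict(exported)  # function name -> callable


class Client:
    """A process that maps a service's exported pages before calling into them."""

    def __init__(self, services, eager=False):
        self._services = {s.name: s for s in services}
        self.mapped = {}         # service name -> functions visible to us
        self.map_operations = 0  # stands in for page table updates
        if eager:
            # Variant: pay the whole mapping cost when the process starts.
            for name in self._services:
                self._map(name)

    def _map(self, name):
        self.map_operations += 1
        self.mapped[name] = self._services[name].exported

    def call(self, service, function, *args):
        if service not in self.mapped:
            # Slow path, taken at most once per service (first call).
            self._map(service)
        # Fast path: a direct call into the already mapped pages.
        return self.mapped[service][function](*args)
```

A lazy client pays one mapping operation on its first call to a service and none afterwards. With `eager=True` the same cost moves to process start, so even the first call takes the fast path.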