This is a first article, intended as an introduction to C!; more articles presenting the syntax and the inner parts of the compiler will follow.

C! is one of our projects here at LSE (the System Lab of EPITA). It is a
programming language oriented toward lower-level system programming (kernel or
driver code, for example).

We were looking for a modern programming language for kernel programming, and
after trying some (D, C++, OCaml … ) it appeared that most of them were too
oriented toward userland to be used in a kernel programming context.

So we decided to modify and extend C to fit our needs, and quickly found
ourselves aiming at a new programming language: C!

To write some kernel code (or a complete kernel) we need a native language
with direct access to low-level operations. This implies the ability to
include ASM code somehow, to manage function calls from ASM and to build
standard functions (so you can have function pointers for the various
interrupt mechanisms).
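To make the function pointer requirement concrete, here is a minimal C sketch of an interrupt handler table. The names (`irq_handler_t`, `irq_register`, `irq_dispatch`, `timer_handler`) and the 256-entry table are assumptions made for this illustration; they do not come from C! or from any particular kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler signature: a plain function receiving the vector. */
typedef void (*irq_handler_t)(uint8_t vector);

#define IRQ_COUNT 256

/* Table indexed by vector number; NULL means "no handler installed". */
static irq_handler_t irq_table[IRQ_COUNT];

/* Counter used only to make the effect of the demo handler observable. */
static unsigned timer_ticks;

static void timer_handler(uint8_t vector)
{
    (void)vector;
    timer_ticks++;
}

/* Install an ordinary function as the handler for one vector. */
static void irq_register(uint8_t vector, irq_handler_t handler)
{
    irq_table[vector] = handler;
}

/* In a real kernel this would be reached from a small ASM stub that saved
 * the registers first; here it is simply called from C.
 * Returns 1 if a handler ran, 0 if none was installed. */
static int irq_dispatch(uint8_t vector)
{
    irq_handler_t handler = irq_table[vector];
    if (handler == NULL)
        return 0;
    handler(vector);
    return 1;
}
```

The point is only that handlers must be ordinary native functions whose address can be stored and called from the ASM side, which is exactly what a language with a heavy runtime makes difficult.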

And, since you're not in user-land, you can't use user-land facilities
(standard system libs for example). For most languages this means that you must
rewrite memory allocators and tools that come shipped with them (especially for
managed memory languages using garbage collection).

Another issue is the binary format: when writing user-land programs, your
compiler builds a file suited for the kernel binary loader. On most current Unix
systems, your file will respect the ELF format. Of course, you can write an ELF
loader in your bootloader (or any part of your booting process for that
matter), but since you are managing memory and memory mapping, you can't rely
on the way a program is loaded on your system and thus the organization of your
ELF must reflect these constraints.

Of course, this issue is not language dependent: even with a pure ASM or C
kernel, you will have to control the way your linker builds the final
binary. But in C (and obviously in ASM) there are no major issues there: the
structure of your program will be simple enough that the only
important question is: where will I be in memory?

So, what's wrong with modern languages?

For the most evolved ones, such as languages with transparent memory management
and garbage collection, one of the most important problems is to provide a
replacement for all aspects of the standard libraries of the system: memory
allocator, threads and locks management, etc. And in that case some aspects
just can't be rewritten the same way they are in user-land.

The C++ situation is somewhat better and worse: in theory it has fewer runtime
needs than most modern languages. The good part is that you can bypass the
most problematic elements of C++ (such as RTTI or exceptions) so you don't have
to fight against them. Once you've deactivated the problematic features and
found what can't be used without them, you have to provide the runtime elements
needed by your code: start-up code, pure virtual fallback, dynamic stack
allocation code, dynamic memory allocation for the new and delete operators
(for objects and arrays) …

Roaming here and there, you'll find documentation on how you can write your
C++ kernel, but let's face it: is the required work really worth the pain?

What's wrong with C

So, if you're still reading, this means that you're partially convinced that
using C++ (or D, or OCaml, or … ) is not a good idea for your kernel. But why
not go on with the good old C programming language?

Since it was designed for that job, it is probably the best (or one of the
best) fit for it. But, we want more.

Here is a quick list of what we may find wrong or missing in C:

The C syntax contains a lot of ambiguous traps

While the type system of C is basically size based, a lot of types have an
ambiguous size (int for example)

Controlling size and signedness of integers is often painful

There is no clean way to provide some form of genericity or polymorphism

There are no typed macros

The type system and most static verification mechanisms are too basic
compared to what could be done now

C lacks a namespace (or module) mechanism

While you can do object oriented programming, it is tedious and error
prone

In fact, the above list can be divided into two categories:

syntax and base language issues

missing modern features
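The integer complaints in the list above are easy to reproduce in plain C. The following sketch uses only standard headers; the function names are invented for the illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Storage size of plain int in bits: the standard only promises at least 16,
 * so this value changes from one platform to another. */
static unsigned int_width_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

/* With the <stdint.h> types, size and signedness are at least explicit,
 * but every narrowing still has to be spelled out by hand. */
static uint8_t low_byte(uint32_t value)
{
    return (uint8_t)(value & 0xFFu);
}

/* Signedness trap: when an int is compared with an unsigned, C converts the
 * int to unsigned first. The cast below spells out that implicit conversion:
 * -1 becomes UINT_MAX, so the comparison is false. */
static int minus_one_is_less_than_one(void)
{
    int a = -1;
    unsigned b = 1u;
    return (unsigned)a < b;
}
```

Here `minus_one_is_less_than_one()` returns 0, which is the kind of surprise a stricter handling of size and signedness is meant to rule out.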

Genesis of C!

Once we stated what was wrong with C, I came up with the idea that we could
write a simple syntactic front-end to C or a kind of preprocessor, where we
would fix most syntax issues. Since we were playing with syntax, we could add
some syntactic sugar as well.

We then decided to take a look at object oriented C: smart usage of function
pointers and structures let you build some basic objects. You can even have
fully object oriented code. But while it is dead simple to use code with
object oriented design, the code itself is complex, tedious, error prone and
most of the time unreadable. So everything OOP gains on the outer side is
lost on the inner side.
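As an illustration of that inner side, here is a minimal sketch of one "class" with one "method" written by hand in C. All names (`shape`, `shape_ops`, `rect`, `shape_area`) are made up for the example and say nothing about the code C! actually generates.

```c
struct shape;

/* Hand-written method table: one function pointer per method. */
struct shape_ops {
    int (*area)(const struct shape *self);
};

/* "Base class": every object starts with a pointer to its method table. */
struct shape {
    const struct shape_ops *ops;
};

/* "Derived class": the base must be the first member so that a pointer
 * to a rect can also be used as a pointer to a shape. */
struct rect {
    struct shape base;
    int width;
    int height;
};

/* Method implementation: the cast back to the concrete type is unchecked. */
static int rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->width * r->height;
}

static const struct shape_ops rect_ops = { rect_area };

/* "Constructor": forgetting to set ops is a bug the compiler won't catch. */
static void rect_init(struct rect *r, int width, int height)
{
    r->base.ops = &rect_ops;
    r->width = width;
    r->height = height;
}

/* Caller side: a single, simple call that works for any shape. */
static int shape_area(const struct shape *s)
{
    return s->ops->area(s);
}
```

Using the object is one line (`shape_area(&r.base)`), while defining it takes a table, a cast and an initializer that all have to be kept in sync manually.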

But OOP means typing (OK, I wanted static typing for OOP). And thus we
needed to write our own type system and type checker.

Finally, from a simple syntax preprocessor, we ended up with a language of its
own.

Compiler-to-compiler

Designing and implementing a programming language implies a lot of work:
parsing, static analysis (mostly type checking), managing syntactic sugar, and
a lot of code transformations in order to produce machine code.

While the syntactic parts and typing are unavoidable, code production can be
shortened somehow: you write a frontend for an existing compiler or use a
generic backend such as LLVM. But you still need to produce some kind of
abstract ASM, a kind of generic machine code that will be transformed into
target specific machine code.

The fact is that a normal compiler will already have done a lot of
optimization and smart code transformation before the backend stage. In
our case, this means that we would have to redo an important part of the job
of a complete C compiler, even though we are working with code that is mainly
C (with a different concrete syntax).

The last solution (the one we chose) is to produce code for another compiler:
in that case all the magic is in the target compiler and we can concentrate
our effort on syntax, typing and extensions that can be expressed in the
target language.

Based on our previous discussion, you can deduce that we chose to produce C
code. All aspects of using C as a target language will be discussed in a
future article.

Syntactic Sugar

An interesting aspect of building a high-level language is that we can add new
shiny syntax extensions quite simply. We decided to focus on syntax extensions
that offer comfort without introducing hidden complexity.

Integers as bit-arrays

In lower-level code, you often manipulate integers one bit at a time, so we
decided to add a syntax to do that without manipulating masks and bitwise
logical operators.

Thus, any integer value (even signed, but this may change, or trigger a
warning) can be used as an array in both left and right positions (you can test
and assign your integer bit by bit!).

A small example (wait for the next article for full syntax description):
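For comparison, this is roughly the mask-based C that such bit indexing saves you from writing. The helper names `bit_get` and `bit_set` are made up for this sketch; they are neither C! syntax nor the code the compiler generates.

```c
#include <stdint.h>

/* Read bit `index` of `value` (index must be below 32). */
static unsigned bit_get(uint32_t value, unsigned index)
{
    return (unsigned)((value >> index) & 1u);
}

/* Return `value` with bit `index` forced to 1 (bit != 0) or 0 (bit == 0). */
static uint32_t bit_set(uint32_t value, unsigned index, unsigned bit)
{
    uint32_t mask = (uint32_t)1 << index;
    return bit ? (value | mask) : (value & ~mask);
}
```

With these helpers, testing bit 0 of a flags word is `bit_get(flags, 0)` and setting bit 1 is `flags = bit_set(flags, 1, 1)`; the array-like syntax expresses the same two operations as a plain indexed read and an indexed assignment.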

Assembly blocks

When writing kernel code, you need assembly code blocks. The syntax provided
by gcc is annoying: you have to correctly manage the string yourself (adding
newlines and so on).

On the other hand, I don't want to add a full assembly parser (as in the D
compiler, for example). Besides being boring and tedious to write, it
implies that the language is tied to some architectures and that we have
to rewrite the parser for each new architecture we need …

In the end, I found a way to integrate asm blocks without the noise of gcc but
keeping it close enough to be able to translate it directly. Of course, this
means that you still have to write clobber lists and stuff.
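For reference, here is what the gcc side looks like in plain C: a minimal sketch using GCC extended asm, with the instructions written for x86-64 only and a pure C fallback elsewhere. The function name `add_u64` is invented for the example; this is the gcc syntax being discussed, not the C! syntax.

```c
#include <stdint.h>

/* Add two 64-bit values with a two-instruction asm block. */
static uint64_t add_u64(uint64_t a, uint64_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t result;
    __asm__ volatile (
        /* Each instruction is a string literal; the "\n\t" separator
         * between instructions has to be written by hand. */
        "movq %1, %0\n\t"
        "addq %2, %0"
        : "=&r"(result)     /* output, early-clobber: written before %2 is read */
        : "r"(a), "r"(b)    /* inputs */
        : "cc");            /* clobber list: addq changes the flags */
    return result;
#else
    /* Fallback for other compilers or architectures (assumption of this
     * sketch, so that the example builds everywhere). */
    return a + b;
#endif
}
```

The operand and clobber lists carry real information and have to stay; the noise is the string handling around the instructions.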

Macro stuff

Currently, C! has no dedicated preprocessing tools, but we included some
syntax to produce code that will be macros rather than functions or variables.

First, you can transform any variable or function declaration into a kind of
typed macro by simply adding a sharp in front of the name (see previous
example). The generated code will be a traditional C macro with all the "dirty"
code needed to manage return, call by value and so on.
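To give an idea of what that "dirty" code has to do, here is a hand-written C macro that behaves like a function taking its arguments by value and returning a result. It is only a sketch under our own assumptions: the names `MAX_U32` and `max_u32_fn` and the output-parameter convention are invented here, and the macros really emitted by the C! compiler may look quite different.

```c
#include <stdint.h>

/* Function version, for comparison. */
static uint32_t max_u32_fn(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Macro version. Copying the arguments into typed locals evaluates each
 * of them exactly once (call by value) and forces them to uint32_t;
 * the "return value" goes through the explicit `out` parameter. */
#define MAX_U32(out, a, b)                                  \
    do {                                                    \
        uint32_t max_a_ = (a);                              \
        uint32_t max_b_ = (b);                              \
        (out) = max_a_ > max_b_ ? max_a_ : max_b_;          \
    } while (0)
```

With `uint32_t i = 1, r;`, the statement `MAX_U32(r, i++, 0u);` increments `i` once and stores 1 in `r`, whereas the naive `((a) > (b) ? (a) : (b))` macro would evaluate `i++` twice.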

The other nice syntax extension is macro classes: a macro class provides
methods (in fact macro functions) on non-object types. The idea is to define
simple and recurring operations on a value without boxing it (the next article
will provide examples).

Modules

Another missing feature of C is a proper module mechanism. We provide a simple
module infrastructure that shares a lot with C++ namespaces (but is far
simpler). Basically, every C! file is a module, and referring to symbols from
that module requires the module name, as with namespaces. Of course you can
also open the module, that is, make every symbol of the module directly
available (without the module name).

Namespaces provide a simple way to avoid name specialization: inside the
module you can refer to a symbol directly, and outside you use the module name,
so no inter-module conflict can happen.
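In plain C, the usual substitute is to prefix every symbol by hand. The sketch below shows two unrelated "modules" that both want a `size` function; the `stack` and `ring` names are invented for the illustration.

```c
#include <stddef.h>

/* "Module" stack: every symbol carries a hand-written stack_ prefix. */
struct stack {
    int items[8];
    size_t count;
};

static void stack_push(struct stack *s, int value)
{
    if (s->count < 8)
        s->items[s->count++] = value;
}

static size_t stack_size(const struct stack *s)
{
    return s->count;
}

/* "Module" ring: it also has a size operation, so without the ring_
 * prefix the two functions would clash once used in the same file. */
struct ring {
    size_t head;
    size_t tail;
};

static size_t ring_size(const struct ring *r)
{
    return r->tail - r->head;
}
```

The prefix has to be repeated everywhere, including inside the module itself, which is the repetition a module system removes.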

What's next

In the next article of this series I will present the C! syntax, the
basics of the object system and the macro features.

The compiler is still in a prototype state: all features described here are
working, but some details are still a bit fuzzy and may require you to make
some adjustments in the generated code.

As of now, you can clone C! from its LSE git repository, take a look at the
examples in the tests directory and begin writing your own code. Unfortunately,
we don't have an automated build procedure yet, so you will have to do it step
by step.