C# (or maybe C++) compiler by Anton Golov (3809277)
Frivolous changes to existing code:
* Removed unused keywords/types, including try/catch, floating point types, and
array support in the grammar.
Core features implemented (1-6):
* Boolean and character constants are handled and have the correct type.
This makes the latter effectively useless, as they can only be stored and
printed; arithmetic on chars is not implemented.
* The operators have the correct priorities. All operators except the
assignment and comparison operators are left-associative. Assignment is
right-associative, and comparison is non-associative.
* Static method calls work fine. At the moment, only static methods of the
current class can be called. The caller is responsible for cleanup.
* The print syntax works. For efficiency reasons, the arguments are
evaluated right-to-left.
* Methods can have a result. It is returned via the result register.
* Local variables can be used.
Bonus features implemented (7, 9, 11, 12, 13, 14):
* Lexer comments of both types are discarded.
* The for statement is implemented in the parser, and internally treated as
a while statement.
* The code generator evaluates both logical operators lazily.
* Static member functions and variables may be defined and using the C++ syntax
of ClassName::FunctionName(args) or ClassName::VariableName. An unqualfied
function call is assumed to be a static function call to a function of the
current class. All other kinds of member access must be qualified.
Static members are placed near the vtable, and lead to "code modified"
warnings. Please ignore.
Derived classes have their own instances of statics.
The current implementation of statics is suboptimal as all but the first
static member requires addition to access. This could be optimised out by
generating a label for every static member or allowing the assembler to
handle statements like LDC label+3.
* The following error messages are produced:
* Undeclared local.
* Undeclared static function.
* Undeclared static member.
* Undeclared non-static function.
* Undeclared non-static member
* Nonexistent type.
* Usage of rvalues as lvalues.
* Type mismatch.
The error-reporting itself is lacking; lack of position information and
crude checks means the messages aren't particularly useful. Oh, and they
only come one at a time, and stop all further compilation.
* Objects can be created using the new syntax. New requires parentheses but
does not allow for passing arguments (constructors are not implemented).
Objects are stored on a heap, the size of which can be adjusted with the
--heapsize flag. Every allocated object has an overhead of 1 byte. Objects
can be destroyed using delete, as in C++. Any kind of silly usage of the
heap is undefined behaviour, no diagnostic required. :)
Bonus bonus features implemented:
* Local variables may be initialised at the point of declaration.
* The type of such variables may be left for the compiler to deduce using
"var".
* A number of optimisations are applied.
* Classes may inherit from other classes using the standard C# syntax. The
derived class receives all members of the base class, and a reference to a
derived class object can be converted to a reference to a base class object.
Deletion via base class objects works correctly.
Virtual function access suffers the same problem as static member access.
* All non-static functions are by default polymorphic, and can be overriden.
* C++-style lambda functions are supported. The syntax is
[captures](parameters) -> return_type { body }
Where captures is a comma-separated list of locals to capture. The return
type may be omitted, together with the arrow, in which case void is
assumed.
All captures happen by-value. Captures are treated as locals within the
lambda body; they may be assigned to, but this will not affect the value
in subsequent calls.
A lambda expression allocates memory, which can be reclaimed by deleting
the resulting function reference.
* Lambdas have the type @rt(args), where rt is the return type, and the
arguments are not given names. Lambdas may be passed around as any
object.
* Conversions of function types take into account conversions of the
parameter and return types; if D is derived from B, @D(B) will convert
into a @B(D) just fine (but not the other way around!).
* Converting member functions to a lambda type will fail because the name
won't be found. :) (Ran out of time before I could implement this.)
* A lambda x can be called using the syntax %x(args).
Some comments on the architecture:
* Due to the overwhelming complexity, the algebra has been split into three
pieces:
* The first phase (CSharpSTBuild) builds the symbol table and does some
basic sanity checking at class-level.
* The second phase (CSharpTypeCheck) does type checking and extracts all
information for generating expressions from the global environment.
* The first phase generates the actual code.
* The second algebra transforms all expressions and some statements into a
slightly different form. All references to types are removed, and instead
any global level information is filled directly in. This allows for a much
simpler implementation of the code generator.
* There are a number of optimisers. These are implemented as parsers that
run over the SSM code, matching anything they can optimise and replacing
it with the updated version, while keeping anything not matched the same.
* The heap is implemented as a circular single linked list. Only free
blocks are linked; they contain their size and a pointer to the next
block. Allocation involves splitting off the front of a free block and
removing it from the linked list. Both allocation and deallocation are
linear in the size of the list in the worst case.