Anders Hejlsberg, a distinguished engineer at Microsoft, led the team that
designed the C# (pronounced C Sharp) programming language. Hejlsberg first vaulted onto the
software world stage in the early eighties by creating a Pascal compiler for MS-DOS and CP/M.
A very young company called Borland soon hired Hejlsberg and bought his compiler, which was
thereafter marketed as Turbo Pascal.
At Borland, Hejlsberg continued to develop Turbo Pascal and eventually led the team
that designed Turbo Pascal's replacement: Delphi. In 1996, after 13 years with Borland, Hejlsberg
joined Microsoft, where he initially worked as an architect of Visual J++ and the Windows Foundation
Classes (WFC). Subsequently, Hejlsberg was chief designer of C# and a key participant in the
creation of the .NET Framework. Currently, Anders Hejlsberg leads the continued development
of the C# programming language.

On July 30, 2003, Bruce Eckel, author of Thinking in C++ and Thinking in Java,
and Bill Venners, editor-in-chief of Artima.com, met with Anders Hejlsberg in his office at Microsoft
in Redmond, Washington. In this interview, which is being published in multiple installments on Artima.com
and on an audio CD-ROM to be released by Bruce Eckel,
Anders Hejlsberg discusses many design choices of the C# language and the .NET Framework.

In Part I: The C# Design Process, Hejlsberg discusses
the process used by the team that designed C#, and the relative merits of usability
studies and good taste in language design.

In Part VI: Inappropriate Abstractions, Hejlsberg and other members of the C# team discuss
the trouble with distributed systems infrastructures that attempt to make the
network transparent, and object-relational mappings that attempt to make the
database invisible.

In Part VII: Generics in C#, Java, and C++, Hejlsberg compares C#'s generics implementation to
Java generics and C++ templates, describes constraints in C# generics, and describes
typing as a dial.

Interpreters and Adaptive Optimization

Bill Venners: One difference between Java bytecodes and IL [Intermediate Language] is that Java bytecodes
have type information embedded in the instructions, and IL does not. For example, Java has
several add instructions: iadd adds two ints, ladd
adds two longs, fadd adds two floats, and
dadd adds two doubles. IL has add to add two
numbers, add.ovf to add two numbers and trap signed overflow, and
add.ovf.un to add two numbers and trap unsigned overflow. All of these
instructions pop two values off the top of the stack, add them, and push the result back. But
in the case of Java, the instruction indicates the type of the operands. A fadd
means two floats are sitting on the top of the stack. A ladd
means two longs are sitting on the top of the stack. By contrast, the
CLR's [Common Language Runtime] add instructions are polymorphic: they add the two values on the top of
the stack, whatever their type, although the trap-overflow versions differentiate between
signed and unsigned. Basically, the engine running IL code must keep track of the types of
the values on the stack, so when it encounters an add, it knows which kind of
addition to perform.

I read that Microsoft decided that IL will always be compiled, never interpreted.
How does encoding type information in instructions help interpreters run more
efficiently?

Anders Hejlsberg: If an interpreter can just blindly do what the instructions say
without needing to track what's at the top of the stack, it can go faster. When it sees an
iadd, for example, the interpreter doesn't first have to figure out which kind of
add it is; it knows it's an integer add. Assuming someone has already verified that the stack
looks correct, it's safe to cut some time there, and you care about that for an interpreter. In
our case, though, we never intended to target an interpreted scenario with the CLR. We
intended to always JIT [Just-in-time compile], and for the purposes of the JIT, we needed to track the type
information anyway. Since we already have the type information, it doesn't actually buy us
anything to put it in the instructions.
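
Hejlsberg's point about blind dispatch can be sketched with a toy stack machine (the opcodes, structs, and function names below are invented for illustration, not real JVM bytecode or IL). The typed interpreter executes its iadd-style opcode without looking at the values; the polymorphic one must consult a type tag on every add:

```cpp
#include <cstdint>
#include <variant>
#include <vector>

// Typed machine, JVM-style: the opcode names the operand type.
enum class TOp { PushI, IAdd };
struct TInsn { TOp op; std::int64_t imm = 0; };

// The typed interpreter trusts the verifier: IAdd blindly adds raw slots.
std::int64_t run_typed(const std::vector<TInsn>& prog) {
    std::vector<std::int64_t> stack;
    for (const auto& in : prog) {
        if (in.op == TOp::PushI) {
            stack.push_back(in.imm);
        } else {                       // IAdd: no type inspection needed
            std::int64_t b = stack.back(); stack.pop_back();
            stack.back() += b;
        }
    }
    return stack.back();
}

// Polymorphic machine, IL-style: one Add opcode for every numeric type,
// so the interpreter must track what is on the stack via a tagged value.
using Val = std::variant<std::int64_t, double>;
enum class POp { Push, Add };
struct PInsn { POp op; Val imm = std::int64_t{0}; };

Val run_poly(const std::vector<PInsn>& prog) {
    std::vector<Val> stack;
    for (const auto& in : prog) {
        if (in.op == POp::Push) {
            stack.push_back(in.imm);
        } else {                       // Add: must inspect the tags first
            Val b = stack.back(); stack.pop_back();
            Val a = stack.back(); stack.pop_back();
            if (std::holds_alternative<std::int64_t>(a))   // verified code:
                stack.push_back(std::get<std::int64_t>(a)  // tags match
                              + std::get<std::int64_t>(b));
            else
                stack.push_back(std::get<double>(a) + std::get<double>(b));
        }
    }
    return stack.back();
}
```

The typed loop is the interpreter Hejlsberg describes as faster; the per-add tag check in run_poly is exactly the cost the CLR avoids by always JIT-compiling, since the JIT resolves the operand types once, at compile time.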

Bill Venners: Many modern JVMs [Java virtual machines] do adaptive optimization, where they start by
interpreting bytecodes. They profile the app as it runs to find the 10% to 20% of the code
that is executed 80% to 90% of the time, then they compile that to native. They don't necessarily
just-in-time compile those bytecodes, though. A method's bytecodes can still be executed by
the interpreter as they are being compiled to native and optimized in the background. When
native code is ready, it can replace the bytecodes. By not targeting an
interpreted scenario, have you completely ruled out that approach to execution in a CLR?

Anders Hejlsberg: No, we haven't completely ruled that out. We can still interpret.
We're just not optimized for interpreting. We're not optimized for writing the
highest-performance interpreter that will only ever interpret. I don't think anyone does that
anymore. For a set-top box 10 years ago, that might have been interesting. But it's no longer
interesting. JIT technologies have gotten to the point where you can have multiple
possible JIT strategies. You can even imagine using a fast JIT that just rips quickly, and
then when we discover that we're executing a particular method all the time, using another
JIT that spends a little more time and does a better job of optimizing.
There's so much more you can do JIT-wise.
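
The multi-tier strategy he sketches can be mimicked in miniature. The class, threshold, and both function bodies below are invented for illustration, and a real JIT recompiles machine code rather than swapping std::function objects; the shape of the idea is the same, though: start with a quickly produced implementation and rebind to a better-optimized one once an invocation counter marks the method hot.

```cpp
#include <functional>

// Toy model of tiered JIT compilation: start cheap, promote when hot.
struct TieredMethod {
    std::function<long(long)> impl;   // the "compiled code" currently bound
    long calls = 0;
    bool promoted = false;

    TieredMethod() {
        // Tier 0, the fast "rips quickly" JIT: straightforward loop.
        impl = [](long n) {
            long sum = 0;
            for (long i = 0; i < n; ++i) sum += i;
            return sum;
        };
    }

    long invoke(long n) {
        // Profiling: once the method proves hot, spend more compile time
        // and bind the optimized form (closed-form sum of 0..n-1).
        if (!promoted && ++calls >= 1000) {
            impl = [](long n) { return n * (n - 1) / 2; };
            promoted = true;
        }
        return impl(n);
    }
};
```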

Bill Venners: When I asked you earlier (in Part IV) about
why non-virtual methods are the default
in C#, one of your reasons was performance. You said:

We can observe that as people write code in Java, they forget to mark their methods final.
Therefore, those methods are virtual. Because they're virtual, they don't perform as well.
There's just performance overhead associated with being a virtual method.

Another thing that happens in adaptive optimizing JVMs is that they'll inline virtual method
invocations, because a lot of times only one or two implementations are actually being used.

Anders Hejlsberg: They can never inline a virtual method invocation.

Bill Venners: My understanding is that these JVMs first check if the type of the object
on which a virtual method call is about to be made is the same as the one or two they expect,
and if so, they can just plow on ahead through the inlined code.

Anders Hejlsberg: Oh, yes. You can optimize for the case you saw last time and check
whether it is the same as the last one, and then you just jump straight there. But there's
always some overhead, though you can bring the overhead down to a bare minimum.
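
The guard the two are describing is a monomorphic inline cache. Here is a hand-written sketch (class names and counters are invented, and real VMs emit this in machine code): if the receiver is the concrete type profiling predicted, run the statically known body that the optimizer could inline; otherwise fall back to genuine virtual dispatch.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual long area() const = 0;
};
struct Square : Shape {
    long side;
    explicit Square(long s) : side(s) {}
    long area() const override { return side * side; }
};
struct Rect : Shape {
    long w, h;
    Rect(long w, long h) : w(w), h(h) {}
    long area() const override { return w * h; }
};

// Counters so the two paths are observable.
long g_guard_hits = 0;    // inlined fast path taken
long g_guard_misses = 0;  // fell back to a real virtual call

// One call site where profiling said the receiver is almost always a
// Square, so Square::area is "inlined" behind a cheap type guard.
long area_devirtualized(const Shape& s) {
    if (typeid(s) == typeid(Square)) {          // the guard
        ++g_guard_hits;
        const Square& sq = static_cast<const Square&>(s);
        return sq.side * sq.side;               // inlined body, no vtable
    }
    ++g_guard_misses;
    return s.area();                            // slow path: virtual dispatch
}
```

The guard is Hejlsberg's residual overhead: one type comparison and a branch, which branch prediction handles well when the call site really is monomorphic.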

Unsafe Code in C# and the CLR

Bill Venners: The CLR has IL instructions, and C# has syntax, for unsafe activities
such as pointer arithmetic. By contrast, Java's bytecodes and syntax has no support for
unsafe activities. When you want to do something unsafe with a JVM, Java basically forces
you to write C code and use the Java Native Interface (JNI). Why did you decide to make it
possible to express unsafe code in IL and C#?

Anders Hejlsberg: The irony is that for all the debate and writing about how C# has
unsafe code ("Oh my God, it is badness"), unsafe code is actually a lot safer than any kind of
code you would ever write with JNI. Because in C#,
unsafe code is integrated with the language and everybody understands what's going on.

First of all, let's just immediately do away with the notion that there is a security hole with
unsafe code, because unsafe code never runs in an untrusted environment, just like JNI code
never runs in an untrusted environment. The right way to think about unsafe code is that it takes the
capabilities of JNI and integrates them into the programming language. That makes it easier, and
therefore less error prone, and therefore less unsafe, to write code for interoperating with
the outside world.
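
The capability under discussion is ordinary pointer arithmetic. In C++ it needs no ceremony; in C# the equivalent loop must sit inside an unsafe block with the array pinned by fixed, which is precisely the searchable marker Hejlsberg wants. A minimal sketch:

```cpp
#include <cstddef>

// Sum a buffer by walking a raw pointer across it, with no bounds
// checks -- the kind of code C# confines to unsafe { fixed (...) { } }.
long sum_raw(const int* buf, std::size_t n) {
    long total = 0;
    for (const int* p = buf; p != buf + n; ++p)
        total += *p;
    return total;
}
```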

Bruce Eckel: Are you sorry you called it unsafe?

Anders Hejlsberg: No. I think you should call a spade a spade. It is unsafe, right?

Bill Venners: Are the marketing people sorry?

Anders Hejlsberg: Oh yeah. And we actually had those discussions. They said, "Oh, can't you call
it..."

Bill Venners: Special code.

Bruce Eckel: Put a positive spin on it.

Anders Hejlsberg: We said no. We stood our ground and said, "No, it's unsafe. Let's
call it unsafe," because we wanted it to stand out. If you can avoid writing unsafe code, you
should. Sometimes you do need to write it, and then we want it to be clear in your code
precisely where you wrote it. You can always search for the word unsafe in
your code and find all those places.

Bill Venners: Your point is that the unsafe code approach, because it is less error prone
than the JNI approach, is actually safer.

Anders Hejlsberg: Yes, and honestly I think experience bears us out too. People have
a lot of problems writing JNI code.

Value Types

Bill Venners: C# and the CLR support value types, which can exist both as values on
the stack and objects on the heap. By contrast, Java has separate primitive types and wrapper
types. In the design of C# and the CLR, to what extent were value types a performance consideration
versus a usability consideration?

Anders Hejlsberg: There is clearly a performance aspect to it. One possible solution would
be to say, "There are no value types. All types are heap allocated. Now we have representational
identity, and so we're done, right?" Except it performs like crap. We know that from
Smalltalk systems that did it that way, so something better is needed.

Over time, we've seen two schools of thought. Either you are very object-oriented, pay the
performance penalty, and get those capabilities; or you bifurcate the type system, as Java and
C++ do. In a bifurcated type system, you have primitives, which are endowed with special
capabilities, and the user-extensible realm of classes, in which you don't get to do certain
things. And there's no über-type
for everything.
The notion that you can treat any piece of data as an object seems so benign. What's the big
deal?
When you can't treat an int as an object, you can just use a wrapper type
that has an identity. That's true, but all that manual wrapping is irritating and gets in your
way.

The way we implemented it in C# and the CLR, I think we get to have our cake and eat it too.
Value types are just as efficient as Java or C++ primitives, as long as you treat them as values.
Only if you try to treat them as objects do they become heap-allocated
objects, on demand, through boxing and unboxing.
It gives you this beauty and simplicity.
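
The have-your-cake scheme can be approximated in C++ terms (C++ has no boxing, so the heap copy below is a hand-rolled stand-in, and the names are invented): a small struct is as cheap as a primitive while used as a value, and only acquires a heap identity when someone explicitly asks for one.

```cpp
#include <memory>

// A C#-style value type: no heap allocation when used as a value.
struct Point {
    int x = 0, y = 0;
};

// Value semantics: copies travel on the stack, as cheap as primitives.
Point translate(Point p, int dx, int dy) {
    return Point{p.x + dx, p.y + dy};
}

// "Boxing" on demand: only when the value must behave like a heap
// object does an allocation happen, and the box holds a copy.
std::unique_ptr<Point> box(Point p) {
    return std::make_unique<Point>(p);
}
```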

Immutables

Bill Venners: In addition to being a C# and CLR construct, the value type is
a general object-oriented programming concept. Another such concept is immutable types.

When I went to the first JavaOne back in 1996, everyone seemed to have one complaint
about what they missed from C++ that Java didn't have. Different people had different
complaints, but they all seemed to have at least one complaint. My complaint was const. I really liked
what I could do with const in C++, though somehow I have gotten along in Java just fine
without it.

Did you consider including support for the concept
of immutability directly in C# and the CLR?

Anders Hejlsberg: There are two questions in there. With respect to
immutability, it's tricky, because what you're saying when you say something is immutable is
that, from an external perspective, I cannot observe any mutation. That doesn't necessarily mean
that it doesn't have a cache inside that makes it go more efficiently. It's just on the outside it
looks immutable. That's hard for a compiler to figure out. We could certainly have a rule that
says you can only modify the fields of this object in the constructor. And we could make certain
immutability guarantees on that basis. But it actually rules out some
scenarios that are in use. So we haven't codified immutability as a hard guarantee, because it's a hard
guarantee to make. The concept of an immutable object is very useful, but it's just up to the
author to say that it's immutable.
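
Hejlsberg's internal-cache example maps directly onto C++'s mutable specifier (the class and its cached computation below are invented): every external observation of the object is constant, yet the first call quietly fills a cache.

```cpp
// Externally immutable, internally cached: callers can only ever
// observe one constant value, but the first call fills a cache.
class Fraction {
public:
    Fraction(long num, long den) : num_(num), den_(den) {}

    // Observably pure: always returns the same result for this object.
    double value() const {
        if (!cached_) {                 // invisible mutation on first call
            cache_ = static_cast<double>(num_) / den_;
            cached_ = true;
        }
        return cache_;
    }

private:
    const long num_, den_;
    mutable double cache_ = 0.0;   // mutable: writable even from const
    mutable bool cached_ = false;  // methods, unseen from outside
};
```

This is exactly why a compiler-enforced "no field writes outside the constructor" rule would reject useful, observably immutable classes.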

Bill Venners: Immutability is part of the semantics of the class.

Anders Hejlsberg: Yes. With respect to const, it's interesting, because we hear that
complaint all the time too: "Why don't you have const?" Implicit in the question is, "Why don't you have const
that is enforced by the runtime?" That's really what people are asking, although they don't come
out and say it that way.

The reason that const works in C++ is that you can cast it away.
If you couldn't cast it away, then your world would suck. If you declare a method that
takes a const Bla, you could pass it a non-const Bla. But if it's
the other way around you can't. If you declare a method that takes a non-const Bla,
you can't pass it a const Bla. So now you're stuck. So you gradually need a const version
of everything that isn't const, and you end up with a shadow world. In C++ you get away
with it, because, as with anything in C++, it is purely optional whether you want this check
or not. You can just whack the constness away if you don't like it.
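
The asymmetry Hejlsberg walks through is easy to reproduce in C++ itself, with Bla standing in for any class as in his example:

```cpp
struct Bla { int n = 0; };

// A const parameter is permissive: it accepts const and non-const alike.
int read_only(const Bla& b) { return b.n; }

// A non-const parameter rejects const arguments -- the "stuck" case that
// breeds a shadow world of const overloads.
void mutate(Bla& b) { ++b.n; }

// The escape hatch: "whack the constness away". Undefined behavior if
// the referent was actually defined const, so use it only when you know
// the object is really mutable underneath.
void mutate_anyway(const Bla& b) {
    mutate(const_cast<Bla&>(b));
}
```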

Next Week

Come back Monday, February 10 for an interview with Eric Gunnerson, C# Compiler Program Manager.
If you'd like to receive a brief weekly email
announcing new articles at Artima.com, please subscribe to
the Artima Newsletter.

Talk Back!

Have an opinion about the design principles presented in this article?
Discuss this article in the Articles Forum topic,
CLR Design Choices.