A glimpse into a new general purpose programming language under development at Microsoft

Microsoft's Joe Duffy and team have been (quietly) working on a new programming language, based on C# (for productivity, safety), but leveraging C++ features (for performance). I think it's fair to say - and agree with Joe - that a nirvana for a modern general purpose language would be one that satisfies high productivity (ease of use, intuitive, high level) AND guaranteed (type)safety AND high execution performance. As Joe outlines in his blog post (not video!):

At a high level, I classify the language features into six primary categories:

1) Lifetime understanding. C++ has RAII, deterministic destruction, and efficient allocation of objects. C# and Java both coax developers into relying too heavily on the GC heap, and offers only "loose" support for deterministic destruction via IDisposable. Part of what my team does is regularly convert C# programs to this new language, and it's not uncommon for us to encounter 30-50% time spent in GC. For servers, this kills throughput; for clients, it degrades the experience, by injecting latency into the interaction. We've stolen a page from C++ - in areas like rvalue references, move semantics, destruction, references / borrowing - and yet retained the necessary elements of safety, and merged them with ideas from functional languages. This allows us to aggressively stack allocate objects, deterministically destruct, and more.

2) Side-effects understanding. This is the evolution of what we published in OOPSLA 2012, giving you elements of C++ const (but again with safety), along with first class immutability and isolation.

3) Async programming at scale. The community has been 'round and 'round on this one, namely whether to use continuation-passing or lightweight blocking coroutines. This includes C# but also pretty much every other language on the planet. The key innovation here is a composable type-system that is agnostic to the execution model, and can map efficiently to either one. It would be arrogant to claim we've got the one right way to expose this stuff, but having experience with many other approaches, I love where we landed.

4) Type-safe systems programming. It's commonly claimed that with type-safety comes an inherent loss of performance. It is true that bounds checking is non-negotiable, and that we prefer overflow checking by default. It's surprising what a good optimizing compiler can do here, versus JIT compiling. (And one only needs to casually audit some recent security bulletins to see why these features have merit.) Other areas include allowing you to do more without allocating. Like having lambda-based APIs that can be called with zero allocations (rather than the usual two: one for the delegate, one for the display). And being able to easily carve out sub-arrays and sub-strings without allocating.

5) Modern error model. This is another one that the community disagrees about. We have picked what I believe to be the sweet spot: contracts everywhere (preconditions, postconditions, invariants, assertions, etc), fail-fast as the default policy, exceptions for the rare dynamic failure (parsing, I/O, etc), and typed exceptions only when you absolutely need rich exceptions. All integrated into the type system in a 1st class way, so that you get all the proper subtyping behavior necessary to make it safe and sound.

6) Modern frameworks. This is a catch-all bucket that covers things like async LINQ, improved enumerator support that competes with C++ iterators in performance and doesn't demand double-interface dispatch to extract elements, etc. To be entirely honest, this is the area we have the biggest list of "designed but not yet implemented features", spanning things like void-as-a-1st-class-type, non-null types, traits, 1st class effect typing, and more. I expect us to have a few of these in our mid-2014 checkpoint, but not all of them.
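
Since no code in the new language has been published, here is a rough sketch in Rust (not the language Joe describes) of what items 1 and 4 mean in practice: stack-allocated values destroyed deterministically, ownership transferred by a move, and sub-arrays and sub-strings carved out as borrowed views without allocating. All names (`Tracked`, `consume`, `destruction_order`, `middle`, `first_word`) are made up for illustration.

```rust
use std::cell::RefCell;

/// A value that records its name in a shared log when it is destroyed.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Takes ownership of its argument (a move); the value is destroyed
/// when this function returns.
fn consume(_t: Tracked<'_>) {}

/// Returns the order in which three stack-allocated values are destroyed.
fn destruction_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Tracked { name: "a", log: &log };
        let b = Tracked { name: "b", log: &log };
        let _c = Tracked { name: "c", log: &log };
        consume(b); // "b" is moved and destroyed inside `consume`
    } // the remaining locals are destroyed in reverse order: "c", then "a"
    log.into_inner()
}

/// Borrows everything but the first and last element; nothing is copied.
fn middle(xs: &[i32]) -> &[i32] {
    if xs.len() < 2 { xs } else { &xs[1..xs.len() - 1] }
}

/// Borrows the first space-separated word of `s`; nothing is copied.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}
```

Calling `destruction_order()` yields `["b", "c", "a"]`: no garbage collector decides when these values die, the scoping rules do. `middle` and `first_word` return views into the caller's data, which is the kind of "carve out without allocating" that item 4 describes.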

Wow! Reading the post, they are basing this on C#, which I think is a great idea, and also sets it significantly apart from D and Rust, though it's obviously directly in their category of next generation native language.

I'm a bit worried about the premise though: C# has a best in class garbage collector and I'm afraid when they are done with it all, they might not get the significant performance boost on real applications that they were hoping for with deterministic memory management. On the other hand, their approach is good: if their extensions can coexist in C#, then this is just pay as you go: either pay for GC with performance or pay for performance with more programming effort. Also, there is a huge opportunity for this kind of work in the mobile space.

I still think there is room for more hyper-productive languages: we can push even further beyond Python and JavaScript, creating slower languages that have ever nicer features. I'm excited to see both sides advancing: on the left, Bret Victor-style next generation programming experiences, on the right: safe high performance with Rust, D, and hopefully C#!

I think you're right that vanilla C# can get some very impressive performance (heck, Roslyn itself is proof of that), but you may be discounting some of the benefits they have in language modifications.

Specifically, the ability to go back and "fix" certain mistakes (e.g., no non-null references) that would be completely impractical for us at this point. I'd argue that that alone can be worth it if your goal is to develop an entire new ecosystem (i.e., a research operating system) on said language infrastructure.

I actually posted this here to _limit_ noise. The ideas/comments from this community are what I was after, not HN or reddit (this tech isn't ready for that level of exposure). I don't believe Joe was after traffic (I know I wasn't...). I guess it was naïve of me to assume this topic wouldn't get picked up by the pop tech properties. I could have used a less compelling title...?

At any rate, it's great to see innovation happening in the gp pl space at MS. Joe mentioned his desire for openness for the technology (as in open source AND open communication/transparency). That we can have this conversation in the open means Microsoft is turning a corner vis a vis transparency. I hope this trend continues.

It's interesting, though not unexpected, that folks are already comparing this language to D, Rust, and Go which also aim for a nirvana (safe, productive, fast, open) without really having many of the details required to make intelligent comparisons (on reddit, somebody made a post asserting this language is MS's _answer_ to D and Rust... WTF?).

The actual information Joe did provide in his post is worthy of intelligent discourse (and, to be fair, speculation...), but comparing this unnamed language to Go, Rust and D seems a bit premature, no? The larger problem of designing a general purpose language that doesn't sacrifice performance for safety and ease-of-use, is the interesting bit. Again, with a stated goal of openness, this gets even more intriguing.

Their goals seem admirable and achievable, although a few things stick out to me as smells:

1) Based on C#. I suppose this was chosen for political reasons, but as long as it requires an MS runtime/OS, it will never be more than an interesting curiosity to me.

2) The mention of "const" and first-class immutability is good, but not great. To be great, it would instead have "var" and first-class mutability (like ML). Enforcing immutability is good, but restricting it to those values specifically declared as constant massively lowers the impact this could have (since the compiler will only offer help to those who already know to put "const").

3) Their example of an unavoidable performance penalty required by a type system is bounds checking. Not only is this avoidable, but type systems allow us to eliminate bounds checking completely! The go-to example is dependent types, where we can put the length of a "List t" into the type to get a "Vector n t". However, "sized types", type-level numerals and other such things are possible even without full dependent types (Haskell can do it, for example http://stackoverflow.com/a/12495348/884682 ).

4) There is mention of subtyping, which makes me nervous ;)

5) "Non-null types" sounds like another place where an unsafe-by-default policy has been chosen. Option/Maybe types are the safe way to go.

This post will probably sound negative, but keep in mind that I've only spotted 5 issues out of an impressively-long list of features!
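
To make points 3 and 5 concrete, here is a small sketch in Rust (chosen only because it has both features; this is not the language under discussion). The array length `N` is part of the type, so a length mismatch is rejected at compile time, and the iterator-based body never indexes. Absence is expressed with `Option` instead of a nullable reference. The function names `dot` and `find_even` are invented for the example.

```rust
/// Dot product of two arrays whose length `N` is part of their type.
/// Passing arrays of different lengths does not compile.
fn dot<const N: usize>(a: &[f64; N], b: &[f64; N]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Returns the first even element, or `None` if there is none.
/// The caller must handle the `None` case before using the value.
fn find_even(xs: &[i32]) -> Option<i32> {
    xs.iter().copied().find(|x| x % 2 == 0)
}
```

For instance, `dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` evaluates to `32.0`, while `dot(&[1.0, 2.0], &[4.0, 5.0, 6.0])` is a type error rather than a runtime failure. This is much weaker than full dependent types, but it shows lengths in types without them.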

This is not true of C# (see mono), so is not in itself a reason why something based on C# would suffer from platform-sickness. Microsoft's other "new language" being developed at the moment is TypeScript and that is absolutely platform independent.

Any sentence with the word "mono" in it should really end with a "for what it's worth". Yes, it's theoretically possible to deploy C# applications on a payment-free OS, but that really doesn't much happen in practice.

Indeed. Xamarin is another great example of successfully taking C# to other places beyond Windows (iOS and Android) - in practice, not theory... Built on Mono* technologies by the folks behind Mono* technologies, Xamarin has real potential for greatly broadening C#'s footprint in the mobile world.

Indeed. In my bizdev job we're using Xamarin. It's surprisingly full-featured. The whole "mono isn't practical" thing was true maybe a decade ago, but they've made good strides to improve that. The mobile need is just driving it further into practicality.

Hallelujah. I can't access the page, but other comments based on the summary:

Since mutability is pervasive, I wonder if references will default to unique/linear types. The HN site referenced this previous paper which seems directly related to this project.

Since they're basing the language on C#, I suppose they're sticking with the standard inheritance=subtyping object model, with the addition of traits. I hope they'll at least support higher kinded types.

The error model concerns me given the summary. It seems there are multiple mechanisms for reporting error conditions, which simply multiplies the number of control flow patterns one has to know to understand a program. Better to have a single composable mechanism that works in the small and scales to the large. This is the same concern as treating void as not a first-class type: every operation taking a first-class function now has to be duplicated, once for functions returning a value, and again for procedures returning void.
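
A tiny illustration of the void point, sketched in Rust, where the unit type `()` is an ordinary type (an assumption about how a first-class void might behave, not a description of the C# extensions). The helper name `twice` is invented.

```rust
/// Runs `f` twice and collects both results. Because `()` is an ordinary
/// type, the same function accepts value-returning closures and
/// "procedures" alike; no second overload is needed.
fn twice<R>(mut f: impl FnMut() -> R) -> [R; 2] {
    [f(), f()]
}
```

With a counter `n` starting at 0, `twice(|| { n += 1; n })` gives `[1, 2]`, and `twice(|| { n += 1; })` gives `[(), ()]` through the very same definition. In C# today the equivalent needs both a `Func<T>` and an `Action` overload.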

I have a fairly popular post on other C# problems. It sounds like they're solving a few of the problems (override parameterless struct constructors, non-nullable types, quadratic IEnumerable behaviour, better checked exceptions, better immutability/mutability support), but there are plenty of other nuisances that aren't mentioned in the summary: casting/deconstructing subtyping hierarchies à la exhaustive pattern matching, better overload resolution that accounts for all expressible type properties, operators for interfaces, implicit conversions to and from interfaces, operator interfaces for core numeric types, simpler constructors, relaxed type constraints, some type-safe reflection facilities to implement interfaces like INotifyPropertyChanged.

I'll probably think of more given time, but my immediate reaction is cautious optimism with the above caveats.

Most languages and language virtual machines are still designed without much thought about whether we're stuck in an outdated processor model. Few try to offer abstract, general language constructs for the concepts embedded in SIMD, barriers, GPGPU and other features of currently available hardware. There's a tendency to leave this stuff to voodoo optimization, unsafe libraries and non-embedded DSLs. I believe such constructs can be created, even with compile-time safety. Changes are happening in this respect in some well-known languages, but given the number of languages being born every day, I know of none that target this. Most of what I'm seeing is about adoption of higher-level concurrency models, which is a good thing nonetheless.

Please don't come with a language that doesn't try to deal with any of that; it'll be yet another disposable M$ technology. We need to go back and harness the hardware.

Text format is greatly appreciated over video. :-) And this is higher quality writing than typical in blogs.

Joe Duffy writes well and sounds honest after reading all his 2013 posts. Joe should write more, providing good value to his company and others. I'm guessing the most he can write is limited by policy, because the 2000's saw a resurgence in corporate secrecy as normal operating procedure (especially at Apple but spreading to its competitors). Maybe conventions in disclosure can reach a new compromise where slightly more is said, especially when developer engagement is eventually required to apply product in development. This new bit on extending C# looks like a step in the right direction.

My quotes come from Joe Duffy's C# for Systems Programming post, about adding a set of "systems programming" extensions to C# in recent years where he says this after a few paragraphs:

The result should be seen more of a set of extensions to C# — with minimal breaking changes — than a completely new language.

Nothing like the words "new general purpose programming language" appears in Joe's article. Isn't that editorializing the headline here a lot? Maybe it's just missing "extensions" at the end. But if there's an actual new PL in the works, that would be interesting.

Among Joe's numbered items at the end, there's nothing I dislike, though I usually expect type-safety to increase spec complexity. I suppose it's not required to get more complex, but plain English meta-language involved would almost certainly use the arcane language of experts each dev is not expected to grasp. If devs must also know all old C# specs as well, adding extensions could make the whole thing more complex. Is there a way to minimize how much more complex things get?

The item appealing to me most is third, async programming at scale, but it's hard for me to be objective there. I like Joe's phrase agnostic to the execution model, because that sounds in line with supporting lightweight concurrency via fibers or another green-threading mechanism other than native threads. It's easy to do several things asynchronously with normal process and thread features, but it's hard to do thousands or tens of thousands of async operations concurrently in a clean way without systematic support in language.

Please make it open source eventually. What do you need to hear in places like this to grease political wheels so they can turn in this direction? Name the openly-discussable objections so we can help tear them down.

I don't think it's feasible to control reception to things we say, or how others spin it. Sooner or later it's necessary to talk about new tech, unless it doesn't ship. (To save up a whole message until a final big-bang release is high risk, if it means foregoing early feedback one can use to tune trajectory before it's too late to incorporate relevant market data about users. That may work for shrink-wrapped consumer gadgets, but developer tools are not as closed with neatly tied-off edges.) I wouldn't worry about it too much.

You can also talk about hypothetical tech options, without committing. There's no risk of a competitor swooping in with C# extensions first and stealing your thunder. We don't have very good conventions for talking about hypotheticals, though. Folks may be inclined to react to everything as news instead of discussion of ideas. All you can do is offer clear guidance.

So do you want to talk about error models here, too? Some things are unrecoverable. Evidence of memory corruption should abort the process, for example. Other things are recoverable, and you only want to abort a single request, or a green-thread fiber at most.
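
To have something concrete to argue over, here is a sketch of that split in Rust terms (my own assumption about how the two tiers could look, not anything Joe has described). A malformed request is a recoverable error carried in the return value, so only that request fails. A broken internal invariant is treated as a bug and fails fast. The names `RequestError`, `handle`, `serve` and `checked_total` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum RequestError {
    Malformed(String),
}

/// Recoverable: a bad request yields an `Err`, and only that request fails.
fn handle(request: &str) -> Result<u32, RequestError> {
    request
        .trim()
        .parse::<u32>()
        .map_err(|_| RequestError::Malformed(request.to_string()))
}

/// Serves every request; a failure in one does not affect the others.
fn serve(requests: &[&str]) -> Vec<Result<u32, RequestError>> {
    requests.iter().map(|r| handle(r)).collect()
}

/// Unrecoverable: a violated internal invariant is a bug, so this panics
/// (fail-fast) instead of returning an error the caller might ignore.
fn checked_total(results: &[u32], expected_len: usize) -> u32 {
    assert_eq!(results.len(), expected_len, "internal invariant violated");
    results.iter().sum()
}
```

`serve(&["1", "x", " 3 "])` returns `Ok(1)`, an `Err` for `"x"`, and `Ok(3)`. Whether a panic tears down a thread or the whole process is a policy knob in Rust; evidence of memory corruption would call for the latter.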

Indeed, you are correct. My reaction to all the reaction was perhaps reactionary :) It's great to see the response to this work if for no other reason than the potential of the technology (somewhat vague at this point, but promising based on Joe's initial post) has struck a chord with developers. We all want ease of use with real power (perf). We all want ease of doing things safely with real power. Etc...

Joe didn't mention very much at all about a new error model - I suspect he'll share those details in some future post.

Going out with what you're working on, as opposed to officially announcing some new product or technology on behalf of the company you work for (which is not what Joe did...), is the necessary first step in a real open dialog and eventual partnership with a global community of programmers. I applaud Joe's openness and it's clear from the response (when sifting out noise and "news"), that there's real interest in the programming community for learning more and helping out. Good times.

Seriously, I think it's great. By that, I am alluding to the spirit of their initiative anyway. See, I am really not quite competent to even make any useful approximate guess whether they will eventually achieve what they hope for or not from this point and on (I can just assume they know a heck of a lot more than I do, there).

However, I can at least contribute the rationale for my warm encouragement by sharing my recent experience as a practitioner fighting poor runtime performance in managed code, whether in execution speed, memory consumption, warm-up time, or a mix thereof.

While not an ideal illustration of what we have to tackle after choosing a managed runtime environment over a native one, I do believe it's still a pretty fair example, combining:

1) humble essential complexity - in this case, it's a parser, using well known recursive descent and hand crafted - nothing new;

2) the aim is a) to parse a very simple language, a data interchange format more specifically - JSON - but also b) to instantiate dynamically from the input alone the target POCOs (plain old CLR objects) about which the type information for the root object is provided by the client / caller of the parser;

3) since the 2nd, major design objective was to be able to parse JSON and deserialize it in a single pass, either from a one-large-chunk input string or from a stream, it means that I/Os can be involved;

4) finally the 3rd, major design objective was to be able to parse and deserialize as fast as possible, with possibly reasonably small tradeoffs on the memory consumption, if that can help the pure parsing speed.

My findings were not so surprising to me, but may surprise some people who often like to recall that "the JIT does a great job at optimizing your code already" (and yes, I agree, it does in many cases, but read on...)

Well, after a number of, say, "naive" implementation attempts, when the conscious, clearly stated goal upfront is to do the best you can to compete with folks with the same performance objectives but who are *in the native code realm*, all I can say is:

indeed, it *is* really hard to get *anywhere close* to what they are able to achieve with no GC around, no need for JIT, and the rest.

Until two days or so ago, I didn't know, but I might just have written the fastest (or second fastest?) JSON parser + deserializer for .NET with non-existent extensibility, but otherwise comparable to its competition for the stated goals above - parse, instantiate the POCOs, and do it fast, with a fail fast also if the input is malformed.

I also had the nice surprise/confirmation recently to get some consistent comparative results gathered over the same variety of payloads when the same code and the competitors run on Android devices ( here ).

Yet, and that's my point finally: another JSON parser focusing on high performance but written in C (Peter Ohler's "Oj") *very easily* outperforms mine, by at least a factor of 2 to 3 on average (and likely even more for more complex input "shapes").

(it parses a 12MB JSON file *AND* instantiates the corresponding ~ 100,000 objects of 4 distinct classes from it, in ~ 300ms, on my cheap laptop anyway; the perf is roughly linear with the volume, until the OS swap file gets in the way, then it degrades badly of course)

I know, I know... really doesn't look formidable at all, does it? And granted, indeed, it really isn't extraordinary, in all fairness. I'm sure all the individual techniques in there, but one or two maybe, have been known for a good while already for this sort of parser.

Still, it just wasn't exactly immediate to come up with, when the final judge you have chosen for that is... the stop watch, *that* I can tell you, now.

(and yes, of course, as you can see, you better know how to emit CIL to runtime-specialize the deserialization into reference vs. value types, or to avoid every unnecessary boxing/unboxing whenever possible/that you can predict, cache the reflection metadata upfront, bake your own typed delegates from it and cache them too, avoid allocating silly temporary strings if you're in the process of matching a property name of the target, and so on).
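
For the "no silly temporary strings" trick, here is the idea reduced to a few lines, sketched in Rust rather than C# and not taken from my parser: the key is compared as a borrowed slice of the input bytes against the known property names, so nothing is allocated while matching. `Field`, `match_field` and `first_key` are made-up names, and the sketch assumes keys without escape sequences.

```rust
/// Known properties of a hypothetical target type.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Field {
    Id,
    Name,
}

/// Matches a raw JSON key (the bytes between the quotes) against the known
/// property names by comparing slices; no temporary string is built.
fn match_field(key: &[u8]) -> Option<Field> {
    match key {
        b"id" => Some(Field::Id),
        b"name" => Some(Field::Name),
        _ => None,
    }
}

/// Returns the bytes of the first quoted token in `input`, borrowed from it.
/// Assumes the token contains no escaped quotes.
fn first_key(input: &[u8]) -> Option<&[u8]> {
    let start = input.iter().position(|&b| b == b'"')? + 1;
    let len = input[start..].iter().position(|&b| b == b'"')?;
    Some(&input[start..start + len])
}
```

Given the input `{"name": "x", "id": 1}`, `first_key` returns a view of the four bytes `name` inside the original buffer, and `match_field` maps it to `Field::Name` without touching the heap.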

To be really honest though, I had another half-design / half-implementation goal (besides accepting a priori to go for all the optimization tricks I would find needed).

I also wanted to keep the code as short and straightforward as possible - which wasn't always easy, but still quite feasible when you don't have too many extensibility requirements (if any at all).

And, guess what - my feeling is this striving to stick to simplicity and lean code probably turned out also to be a great ally, besides all the cleverness one can more or less successfully put in the code's logic to save CPU cycles or bytes on the heap.

But the greatest ally of all in that effort made out of curiosity / for self teaching, here it is, as I found:

really, IMO, there is still a lot of room for performance improvements (which, I agree, will likely involve taking advantage of recent and promising PLT research) before one of today's state-of-the-art managed environments can bring its runtime performance closer to that of its native counterpart.

So, my warmest encouragements go to Joe and his team, there. I do believe there is indeed a broad, open field of things that remain to be investigated, which they've started to get busy with.

I can just look at what it took me to be able to push "only" 40K parses/sec with my little one written in C# above, when Peter's "Oj" written in C can easily perform twice as many (or more) for the same or even more complex JSON input shapes -

(... and, maybe, for less sweat to spend or aspirin to take on Peter's end, who knows)