Brian Rasmussen on Building Language Tools With Project Roslyn

Bio: Brian Rasmussen is a Senior SDET at Microsoft working on the next generation C# and Visual Basic language services in Roslyn. Before joining Microsoft, Brian was a Microsoft MVP for Visual C# for four years. Brian blogs on the C# FAQ (http://blogs.msdn.com/b/csharpfaq/) and can be found on Twitter (@kodehoved).

We are launching the GOTO Copenhagen conference for the second time in May 2012, after a successful first edition in 2011. GOTO Aarhus (formerly known as JAOO) has been an annual event in Denmark since 1996 and attracts more than 1200 participants. The target audience for GOTO conferences is software developers, IT architects and project managers.

Brian Rasmussen: Yes, at the core of this there is a set of new compilers for C# and Visual Basic, but the project is really about redoing the whole developer experience around those languages. So we are not only redoing the compilers but also opening them up and exposing the entire compiler pipeline as a set of APIs, and then we are adding more APIs on top of that so you can interact with code and solutions and so forth in various ways. We are building new tools around that, so we are obviously building the new version of Visual Studio around that, but we are also introducing new tools such as scripting and a REPL (read-evaluate-print loop) for C# and Visual Basic.

It means you can use the compiler as an API. Traditionally compilers are like black boxes where you feed in source code at one end, it does a lot of work, and then eventually assemblies come out the other end. So it’s really hard to interact with the different steps. Traditionally a compiler is very sequential in that it does something that produces an intermediate output, then it takes that output and does some other work, and so on and so forth until assemblies eventually come out the other end. With Roslyn we are opening up the compiler so that you can interact with it and get some of these intermediate results, so you can for instance get the parse tree or the semantic model that the compiler produces.
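As an illustration of the idea of asking a compiler for its intermediate parse tree, here is a minimal sketch. It is an analogy only: it uses Python's standard `ast` module rather than the Roslyn API, and the names `SOURCE`, `parse_to_tree` and `node_kinds` are made up for this example.

```python
import ast

# Hypothetical one-line program used only for this illustration.
SOURCE = "total = price * quantity + 1"


def parse_to_tree(source: str) -> ast.Module:
    """Run only the parsing stage and hand the tree back to the caller."""
    return ast.parse(source)


def node_kinds(tree: ast.AST) -> list[str]:
    """Return the kind of every node in the tree, in walk order."""
    return [type(node).__name__ for node in ast.walk(tree)]


tree = parse_to_tree(SOURCE)
kinds = node_kinds(tree)

# The tree is an ordinary data structure that can be inspected or dumped.
print(ast.dump(tree, indent=2))
```

The point of the sketch is that parsing is callable on its own and returns a structure the caller can walk, instead of being a hidden step inside a black box.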

You can certainly add things to the data structures that we build, so you can modify the trees, but if you are looking to actually extend the language, we don’t really offer support for that. So if you wanted to add new keywords to C#, we don’t have any support for that. You could still do it by hand, but we just don’t provide you with any tools to do so.

What we do is provide what we call the semantic model. Traditionally you would have the parse trees and the symbol tables and you would merge those into a bound tree. Instead we provide more of a programming model for the semantic model, so you can ask it about types and symbols and so forth, but you can also get flow information from it: if you are doing static analysis, you can ask which variables are read or written within a given section of code, if that’s what you are looking to do.
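The question "which variables are read or written within this section" can be sketched as follows. This is a rough analogy in Python using the standard `ast` module, not the Roslyn semantic model: it is purely syntactic and does no real data flow analysis. The function `reads_and_writes` and the sample `REGION` are assumptions made for this example.

```python
import ast


def reads_and_writes(source: str) -> tuple[set[str], set[str]]:
    """Collect the names that are read and the names that are written."""
    read: set[str] = set()
    written: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                # The name is used as a value.
                read.add(node.id)
            elif isinstance(node.ctx, (ast.Store, ast.Del)):
                # The name is assigned to or deleted.
                written.add(node.id)
    return read, written


# Hypothetical region of code to analyze.
REGION = """
total = 0
for item in items:
    total = total + item
"""

read, written = reads_and_writes(REGION)
```

Here `items` is only read, while `total` and `item` are both read and written. A real semantic model would also resolve what each name refers to, which this sketch does not attempt.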

Brian Rasmussen: No, not on the official plans, no. This is primarily a scoping issue. A lot of people ask us "so are you building a compiler framework?", as in "can I extend the existing languages?" or "can I come with my own favorite language and use Roslyn for that?", and currently you can’t, because we’re doing a lot of work just to get on par with the existing compilers and tooling that we have, and we want to be able to finish in reasonable time. Where we will go in the future, I don’t know, but for now we are specifically tackling C# and Visual Basic.

Brian Rasmussen: So you want to create your own frontend and then pass this into the emitter, for instance?

Werner Schuster: Yes.

Brian Rasmussen: I guess you can do that, provided that you map to our structures. The emit part itself is not that big. The bulk of the pipeline is really in the semantic part, which is the part that encapsulates all the rules and so forth that are specific to the given language.

So if we talk about this, we can say that at the bottom we have the compiler, and the compiler can be seen as a service as well, so it’s an API. At that level you can work with syntax trees and semantic models and so forth, as we discussed. On top of that we are building what we call the language services, and the language services are where we take a more holistic approach to working with the code. Say you want to build tools that work on your solutions: instead of manually enumerating projects and files and so forth, we have a type called the Workspace that will load an entire solution and do all the internal plumbing, so to speak. Once you have the solution loaded you can enumerate the projects, you can get the documents, you can get the associated trees, and all of that without looking up references yourself or making sure that everything you need to ask a specific question is actually in place. We will do all that work for you. And on top of that we build what we call the editor services, which is the only part of this stack that is actually specific to Visual Studio; all the other stuff you can move out and use in other applications. Some of my colleagues actually built a really funny application in that they reimplemented QBasic, so you had the whole DOS style look and everything.

Exactly. And it was self-hosting, so they could load the source code for their QBasic into their own QBasic and actually run it through that. And they were doing that using the Roslyn APIs and just essentially providing a different host environment than Visual Studio. So if you want to do that, some of the things that are specific to the editor layer, such as syntax highlighting and stuff like that, you have to redo yourself, but you have all the data structures underneath that will let you understand the code in the way that you need to in order to build these services.

Well, the APIs are .NET assemblies, so you can access them from any kind of .NET language, but the compilers themselves obviously understand C# and VB, so you are limited in the sense that you can only analyze code written in these languages, but you can access the assemblies from any other language. What you can do is use the assemblies and do analysis directly, or you can plug into the Visual Studio model, where we have integration points that let you analyze code and look for specific patterns. That’s what we generally refer to as a code issue: it’s some kind of pattern in the code that you want to identify. And we provide another concept that we call code actions, which is something you can do to the code as a reaction to identifying this pattern. So what that is usually used for is that you find something that you don’t like for whatever reason and you provide an action that can change it in some way. And the editor services layer makes that really easy, because we will load your code as an extension and make sure it is called for all the nodes in the tree. You can do whatever filtering you like there on the code, and then you can provide the code action, and we will provide a vehicle to expose that as an action to the user, so that the user can invoke your code action and change the code like that. Analyzing the code is something that you can do in any kind of application, but the whole interactive part, presenting "this is wrong" and "here are your options for solving it", is baked into the editor services layer.
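The code issue and code action pairing can be sketched as a detector that visits every node and a rewrite that reacts to the same pattern. This is an analogy written with Python's standard `ast` module (Python 3.9 or later for `ast.unparse`), not the Roslyn extension API. The class and function names, and the choice of `== None` as the pattern to flag, are all assumptions made for this example.

```python
import ast


def is_equals_none(node: ast.Compare) -> bool:
    """Match the single pattern this example cares about: `x == None`."""
    return (
        len(node.ops) == 1
        and isinstance(node.ops[0], ast.Eq)
        and isinstance(node.comparators[0], ast.Constant)
        and node.comparators[0].value is None
    )


class EqualsNoneIssue(ast.NodeVisitor):
    """The 'code issue': called for each node, records where the pattern occurs."""

    def __init__(self) -> None:
        self.lines: list[int] = []

    def visit_Compare(self, node: ast.Compare) -> None:
        if is_equals_none(node):
            self.lines.append(node.lineno)
        self.generic_visit(node)


class UseIsNoneAction(ast.NodeTransformer):
    """The 'code action': rewrites `x == None` into `x is None`."""

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        if is_equals_none(node):
            node.ops = [ast.Is()]
        return node


def find_issues(source: str) -> list[int]:
    visitor = EqualsNoneIssue()
    visitor.visit(ast.parse(source))
    return visitor.lines


def apply_action(source: str) -> str:
    tree = UseIsNoneAction().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


# Hypothetical input containing one occurrence of the pattern.
SOURCE = "if value == None:\n    value = 0\n"
issues = find_issues(SOURCE)
fixed = apply_action(SOURCE)
```

In this sketch the fix is applied unconditionally. In the model described above, the host would instead present the action to the user and apply it only when invoked. Note also that `ast.unparse` regenerates the text and discards comments and original formatting.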

The APIs let you both analyze code and look for specific constructs that you want to react to in some way. It doesn’t have to be that the code is wrong, it can be "I have this construct here but I want to implement that". We have the extensibility model that allows you to make code actions react to specific patterns, and we have a lot of APIs, what we call the code gen APIs, that allow you to produce code in various shapes and forms to react to whatever event triggers this.

Exactly. As another example, if you have a symbol, we have methods for that, and one of them is Find References. You could do that by hand, but it is easier to build tools once you have APIs that let you do things that make sense in that context.

The scripting part is a different model for executing C# code or Visual Basic code. On one hand you have the ability to include what we call the scripting engine in your own application, so if you want to let users interact with your application through C# and Visual Basic code, you can do that very easily. It’s literally just two classes that you need to use and then everything is taken care of for you, so using the scripting engine that way is really, really simple. The other thing is that we are building a scripting host ourselves, so you have code that you can enter into the REPL, which is a read-eval-print loop. It works a little bit differently from what you’re used to in that you can have methods and declarations and so forth at the global level, so you can just declare any variable and use it, whereas in a normal C# program you would have to set up a class and set up methods and so forth. The scripting context is similar to what you may know from other languages that have scripting features in that you do not need all the ceremony of creating classes. You can just write the code as you would in a regular scripting language and have it executed, either in the REPL or as a stand-alone file.

Werner Schuster: So it basically takes the code and compiles and fits it into the run time.

Brian Rasmussen: There are already several examples of that which you can find on the net. People build their own REPL through the scripting engine and the session class that we have for this, and very little code is required to actually build a REPL. Obviously the REPL that we are building is integrated into Visual Studio and has all the editing facilities that you would expect from a Visual Studio component, so it goes a little further than that, but basically if you just want a bare-bones REPL with the ability to read code and execute code, you can write that in very few lines of code using the Roslyn APIs.
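To give a feel for how little code a bare-bones REPL needs once something else handles compilation and execution, here is a minimal sketch. It is an analogy in Python built on the built-in `compile`, `exec` and `eval`, not the Roslyn scripting engine. The `Session` class and `repl` function here are hypothetical names chosen to echo the engine and session pairing mentioned above.

```python
import ast


class Session:
    """Holds the state shared between submissions, as a REPL session would."""

    def __init__(self) -> None:
        self.globals: dict = {}

    def execute(self, code: str):
        """Run a submission; return the value of a trailing expression, if any."""
        tree = ast.parse(code)
        last_expr = None
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            # Split off the final expression so its value can be returned.
            last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<session>", "exec"), self.globals)
        if last_expr is not None:
            return eval(compile(last_expr, "<session>", "eval"), self.globals)
        return None


def repl(session: Session, read=input, write=print) -> None:
    """Read, evaluate, print, loop. Stops at end of input."""
    while True:
        try:
            line = read("> ")
        except EOFError:
            break
        try:
            result = session.execute(line)
        except Exception as error:
            write(f"error: {error}")
        else:
            if result is not None:
                write(result)
```

Calling `repl(Session())` starts an interactive loop. Declarations made in one submission, such as a variable or a function at the top level, remain visible in later submissions because they live in the session's shared state.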

Brian Rasmussen: Yes, even better. There is a community tech preview available. If you go to MSDN/roslyn, you will see the Roslyn main page that has the CTP for download, and you will also find an overview document that tells you what Roslyn is and what you can do with it. Included with the CTP you will find numerous samples and walkthroughs that explain the different parts of the APIs, the REPL, how you can make scripting available to your application and so forth.

When you install the CTP, what you get is that we install the assemblies so that you can make, for instance, code issues and code actions and build tools that use Roslyn, but we don’t replace the editing experience with the Roslyn bits. But if you do make a Roslyn extension, then when you run it we will start a new instance of Visual Studio where we actually replace those parts with the Roslyn parts. And then you will see how Visual Studio looks when you are running Roslyn and running your code in it. We do, however, install the REPL in the regular Visual Studio, so when you install the CTP you will, for now, get the C# interactive window installed.