My adventures in extending Visual Studio and general development on the GHC Haskell Compiler

Recent Updates

A preprocessor and library which allow you to create dynamic libraries from arbitrary annotated Haskell programs in one step. It also allows you to use the generated library from C, C++ and C# just by including the generated header files.

* Auto-generates free functions for any StablePtr used
* Added support for memory-leak tracing with the --debug flag (https://mistuke.wordpress.com/2011/08/09/tracing-lexer-memory-leaks/)
* Fixed a bug in the library that caused an error in marshalling
* No longer frees strings, in order to prevent heap corruption on calls from C#
* Fixed an issue with lists and type synonyms
* Fixed an alignment issue with stdcall
* Renamed the hs2c# pragma to hs2cs
* Fixed an error in parsing pragmas
* Fixed a major marshalling bug having to do with lists of pointer types
* Started on an implementation of an automatic test generator to test exported functions
* Re-arranged the namespaces generated in C#: functions and types are now put in different namespaces

Not currently supported:

* Infix constructors
* Control over the auto-generated free functions for StablePtr
* Exporting types/functions with the same name in different modules
* Exporting polymorphic functions

Details

More detailed documentation will follow in the next few weeks.

Why is the first version 0.4.8?

This project has been in development for a very long time. Nearly two years on and off. It was developed mainly to facilitate the creation of Visual Haskell 2010. This is just the first public release of this tool.

It appears that both versions of the tool have a bug in the default conversions defined in NativeMappings.hs

It was introduced in a major restructuring that took place before 0.4.8 and slipped through my regression tests.

I’ll upload a fixed version as soon as possible.

Sorry for any inconvenience.

Update: Version 0.5.1 fixes the bug mentioned.

Some more details: originally all my conversion functions were pure. The problem is that some activities, like creating a pointer, require IO, which is why everything is now done in IO. When I converted the library I missed some calls, and those calls were now producing an error saying that the conversion function was undefined.
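To illustrate why the conversions had to move into IO, here is a minimal sketch using base's own marshalling functions (this is an illustration of the problem, not Hs2lib's code): a String cannot be turned into a CWString purely, because allocating the buffer is a side effect.

```haskell
import Foreign.C.String (CWString, newCWString, peekCWString)
import Foreign.Marshal.Alloc (free)

-- Allocating the wide-string buffer is a side effect, so the
-- conversion has to live in IO; there is no pure equivalent.
toNative :: String -> IO CWString
toNative = newCWString

fromNative :: CWString -> IO String
fromNative = peekCWString

main :: IO ()
main = do
  p <- toNative "hello"
  s <- fromNative p
  free p              -- the caller frees, per Hs2lib's convention
  putStrLn s
```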

One of the problems I’ve been struggling with for a while now is the presence of pesky memory leaks in Visual Haskell. Hs2lib has one convention: it doesn’t free any memory, so you’re responsible for freeing all memory yourself.

As far as I knew, I was freeing any and all pointers that I had. It should not have been leaking, and yet it was. So I decided to get to the root of the problem. I wrote a simple application that uses the lexer classes of Visual Haskell and emulates a user scrolling, by feeding it the lines of a Haskell file one at a time.

Using the Debug Diagnostic Tool (DebugDiag) I was able to monitor the application and make a full memory dump every few seconds, in order to follow the progression of the leak. The results were rather surprising:

WARNING – DebugDiag was not able to locate debug symbols for HsLexer.dll, so the reported function name(s) may not be accurate.

HsLexer.dll is responsible for 11.95 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:

This was detected in LexerLeakTest.exe__PID__4660__Date__07_04_2011__Time_03_20_30PM__18__Leak Dump – Private Bytes.dmp

So according to this tool, my little program was leaking quite extensively, and not surprisingly, it was all coming from inside my Haskell program. Unfortunately, GHC/GCC cannot produce the proper symbols (.pdb files) for any of the Microsoft debugging tools to understand. So while we know conclusively that the program is leaking, we don’t know where.

Hs2lib-Debug

This is where the new version of Hs2lib comes into play. The idea is to track any and all allocations and de-allocations made during the run of the program, in essence a simple profiler.

For this reason we now have Debug versions of the modules. Through the magic of cpphs, Cabal and custom preprocessors we get the usual “release” modules and “debug” modules which write the allocations to a file.

I’ll skip over the implementation of that, but the idea is to override all allocation functions with custom ones. The structure being written out to disk (the current version uses the rather slow Show/Read instances generated by GHC; I will be replacing these in the future) is:
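A record of roughly this shape is enough to capture what the replay tool needs. The names below are illustrative guesses, not Hs2lib’s actual definitions:

```haskell
import Data.Word (Word64)

-- Illustrative only: one entry per allocation event, serialized
-- with the derived Show/Read instances mentioned above.
data MemAction = Alloc | Free
  deriving (Show, Read, Eq)

data MemRecord = MemRecord
  { memAction :: MemAction  -- was this an allocation or a free?
  , memPtr    :: Word64     -- address of the pointer involved
  , memSize   :: Int        -- bytes requested (0 for frees)
  , memStack  :: [String]   -- artificial call stack, innermost first
  } deriving (Show, Read, Eq)
```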

From this you can see that not only does it record allocation, but I’ve also implemented a simple artificial stack, that shows us where the allocation is/originated. The current implementation is rather simplistic. I will look into expanding this later, but for now it suits my needs.

For those wondering, this tracker is not enabled by default. To enable it, just pass the “--debug” flag to the call to Hs2lib. Compiling with this flag instructs the library to use the Debug version of the libraries, and changes the code generators so that they also add extra information to the generated code.

Since de-allocations also have to be tracked, using this flag also exposes a free function which should be used to free pointers. If the CallingConvention used is stdcall then a function “freeS” is exported; if ccall, then “freeC” is exported. The reason for this split is that these functions are statically defined: they aren’t generated by the tool but are part of the library component of the tool.

Performing analysis

Once we have a .dump file, the next step is to analyze this information. This is where the new tool Hs2lib-debug comes in. It replays the allocations in a pseudo heap. If all goes well, the heap should be empty at the end; if it has any entries, it means we have a leak.
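The replay itself can be sketched in a few lines. This is a toy reimplementation of the idea, not Hs2lib-debug’s actual code:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- One event per line of the dump file (illustrative types).
data Event
  = EAlloc Word64 Int [String]  -- address, size, artificial stack
  | EFree  Word64               -- address being freed

-- Replay every event against a pseudo heap: allocations insert,
-- frees delete. Anything left afterwards was never freed.
replay :: [Event] -> Map.Map Word64 (Int, [String])
replay = foldl step Map.empty
  where
    step heap (EAlloc p sz stack) = Map.insert p (sz, stack) heap
    step heap (EFree p)           = Map.delete p heap

-- The leak report: outstanding allocations with their stacks.
leaks :: [Event] -> [(Word64, (Int, [String]))]
leaks = Map.toList . replay
```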

Invoking it is quite easy, just pass it as an argument the folder which contains the dump file:

Hs2lib-debug.exe -v .\MemDumps

and that’s all.

Running this on the dump file created from the lexer program returned:

This is interesting for a couple of reasons. The profiler is saying that the pointer associated with the CWString (which is defined as a Ptr CWchar) is never freed. But why not?

The answer lies in the C# marshaller and the types being used. We are currently marshalling C# strings using the String datatype. Strings in C# are immutable, so once the marshaller creates a wchar_t* from the string, it never worries about it again: they are strictly an in parameter.

There are two ways to solve this:

1. Use C#’s mutable strings, via the StringBuilder type. StringBuilder has the benefit that it is already implemented as a char pointer: the marshaller simply passes the pointer to the Haskell function, and after the function returns, once the GC determines that the StringBuilder is no longer in use, it should free the memory (at least in theory).

2. Make the library simply free any CWString it dereferences.

For now I’ve chosen the second approach, for no reason other than that it requires the least amount of change to existing code. In the future I’ll adopt the approach that Hs2lib frees any arguments passed to it. I don’t know if this is the convention usually used; if someone has something against this approach, I would love to hear it.

Update: There’s somewhat of a big gotcha that I’ve recently discovered. We have to remember that the String type in .NET is immutable. So when the marshaller sends it to our Haskell code, the CWString we get there is a copy of the original. We have to free this copy: a GC on the C# side won’t affect the CWString at all.

The problem however is that when we free it in the Haskell code we can’t use freeCWString. The pointer was not allocated with C (msvcrt.dll)’s alloc. There are three ways (that I know of) to solve this.

use char* in your C# code instead of String when calling a Haskell function. You then have the pointer to free when your call returns, or you can initialize the pointer using fixed.

use StringBuilder instead of String. I’m not entirely sure about this one, but the idea is that since StringBuilder is implemented as a native pointer, the marshaller just passes this pointer to your Haskell code (which can also update it, by the way). When GC is performed after the call returns, the StringBuilder should be freed.

I’ve once again updated the library to not free any pointers, which prevents a nasty heap corruption. To avoid memory leaks in C# you are free to choose between solutions 1 and 3 presented here.

Results

With this new code in place, I once again ran the profiled lexer to generate a dump. This time when I analyze the result, however, I get:

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 33027 record(s).

*** Analyzing [********************] 100.00%

*** Found 0 outstanding allocation(s).

Congratulations, No memory leak(s) detected.

*** Cleaning up….

And so the memory leaks are fixed. It’s worth noting that the analyzer can be quite slow: it uses a flat linked-list-like structure and plain lists to do the analysis. In future versions I will be replacing these with a tree-like structure and arrays respectively.

As most of you know, Visual Haskell makes use of Haskell programs compiled to a dynamic library (DLL). This task can be completely automated, and that is what Hs2lib provides. Hs2lib (formerly WinDll) was made to facilitate making changes and updates to Visual Haskell’s support files. It was designed purely for that purpose and thus has some limitations and design decisions that reflect this. I’ve been working mostly on getting it to a stable state these last few weeks/months. The version used in the original Visual Haskell used a hack for some things like lists; this one has a stable interface.

Hs2lib does not export all functions automatically. You have to mark them with a special annotation, “--@@ Export [=optional_name]”.

If no name is specified, the function name is used. As a note, only functions with an explicit type signature can be exported; this tool works by static analysis. To compile, a simple invocation of hs2lib is sufficient:
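A minimal annotated module might look like this. The function bodies are just an example, and the pragma spellings below follow the “--@@ Export [=optional_name]” notation described above:

```haskell
module Arith where

-- Exported under its own name.
--@@ Export
add :: Int -> Int -> Int
add x y = x + y

-- Exported under an explicit name in the generated headers.
--@@ Export = multiply
mul :: Int -> Int -> Int
mul x y = x * y
```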

PS C:\Examples> hs2lib Arith
Linking main.exe …
Done.

After this there should be two header files and one DLL file named Hs2lib.dll in the folder. This name can be changed with the -n flag. Your Haddock documentation is carried over, along with the original function type, as comments on the newly generated functions in the headers. Marshalling data is automatically generated for you for any data structures found during the dependency-analysis phase. This can only be done if the source for the data type is available.

Since Visual Haskell is written in C#, this tool can also output C# code; this is done by using the -c# flag. I’m working on a PDF that goes into much greater detail about this tool and its options, but I need to reword some parts of it, as I’ve been told it’s too “handholdy” (simplistic) in places. So I’ll hold off on publishing that.

But here are the highlights of this tool.

Currently Supported:

Generates marshalling information for arbitrary datatypes.

Types with a kind other than * have to be fully applied (e.g. Maybe String). A specialized marshalling instance is then generated for each of these instantiations.

Generates FFI exports for any function you mark that has an FFI-compatible type.

Properly supports marshalling of lists. [Int], where the list is an argument, becomes CInt -> Ptr Int, receiving an explicit size for the list; where it is a return value, it becomes Ptr CInt -> Ptr Int, where it expects a pointer to which to write the size of the list. This introduces a semantic difference between [Char] and String: the former is treated as a list of characters with no explicit terminator, whereas the latter is treated as a null-terminated wide string.
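The length-plus-pointer convention described above can be mimicked directly with base’s marshalling helpers. This is an illustration of the calling shape, not code that Hs2lib generates:

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)

-- The shape a wrapper for `sumList :: [Int] -> Int` would take:
-- the list argument arrives as an explicit length plus a pointer.
sumListNative :: CInt -> Ptr CInt -> IO CInt
sumListNative len p = sum <$> peekArray (fromIntegral len) p

-- Demonstration call: marshal a Haskell list to (length, pointer)
-- and hand both to the wrapper, as a foreign caller would.
demo :: IO CInt
demo = withArrayLen [1, 2, 3, 4 :: CInt] $ \len p ->
         sumListNative (fromIntegral len) p
```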

Supports Ptr, FunPtr and StablePtr, which introduces the possibility of having “caches”.
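A StablePtr is what makes that cache pattern possible: the foreign side holds an opaque handle to a live Haskell value between calls. A minimal sketch (illustrative names, not Hs2lib output):

```haskell
import Foreign.StablePtr
  (StablePtr, deRefStablePtr, freeStablePtr, newStablePtr)

-- The foreign caller receives an opaque handle and passes it
-- back on later calls; the value stays alive until it is freed.
newCache :: [Int] -> IO (StablePtr [Int])
newCache = newStablePtr

readCache :: StablePtr [Int] -> IO Int
readCache sp = sum <$> deRefStablePtr sp

-- Without this, the value is kept alive forever, which is why
-- Hs2lib auto-generates free functions for the StablePtrs it uses.
releaseCache :: StablePtr [Int] -> IO ()
releaseCache = freeStablePtr
```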

Does not support automatic expansion of lists in applied types other than IO (e.g. Maybe [Int]).

Does not support infix constructors (an arbitrary codegen limit; I’ve never needed it and will add it in future versions).

The code generator emits a few too many parentheses in the temporary Haskell file it generates. Will fix in future versions.

Does not support polymorphic functions (this is an FFI limit: the size of a polymorphic value cannot be known).

Cannot export functions or types with the same name, since the types are imported unqualified.

I will work on the PDF documentation in the coming two weeks. That should be a complete overview of the abilities of this tool and why I made certain decisions.

To verify the install (and that the package is complete), there is a set of tests included in the tar. Just unpack it, load Tests.TestRunner in ghci and run the function “runLocalTests”. The Cabal test interface I was using before got deprecated; I’ll have to look up how the new one works. Until then, sorry about that.

Hi all, I just wanted to let everyone know that even though I have been silent for quite a while, and rather busy, I have been working on the project. I have the modifications to Cabal done, and I’m almost done with those to cabal-install. Various small bug fixes, and I’m working on improving the speed of it all. I’ll post another video soon.

I’ve also upgraded the internal GHC from the 6.13 snapshot I had been using to the released GHC 7. This turns out to be a quite annoying upgrade: the RTS seems to segfault randomly, and code that used to compile on 6.x no longer compiles on 7, even though as far as I can tell it should.

In any case, this means all of the marshalling code has to be regenerated, which means I have to change the code generated by my tool. I’m also rewriting the way the DLLs are loaded. They’re currently loaded in the same/default address space as the IDE. The problem is that when GHC panics, for whatever reason, the RTS is forced to quit and takes everything with it. I haven’t found any way to stop this, so I’m working on some kind of process isolation. This is unfortunately a design error on my part; I always figured that a GHC panic was just an exception and did not terminate the RTS.

I’ve promised a beta a lot, so I won’t do that this time. I initially planned on releasing the source along with a first release, since I don’t think it’s wise for multiple people to work on the core interactions, but if I don’t make that by September, I’ll just publish what I have so that others can help if they want to.

Visual Studio 2010 SP1 is coming out “soon” as well. This should fix a few of the problems I’ve been having (yay). And in 2012 comes the next major release of Visual Studio (don’t worry, this project will be done before that). Barring any major API changes, it’ll be easy to support it and any following versions of Visual Studio.

But to reiterate, the project is not dead, far from it. I just have a lot of work to do, some problems to solve.

I took some time this week from working on my thesis to work on MSBuild again. I know it’s slow going, but unfortunately this project is not my highest-priority task at the moment (I thank you all for your patience). The result is the following:

The new compile and project pipeline for a .cabal file now looks as follows:

[.cabal file] <-> [IDE] <-> [Cabal2MSBuild] <-> [MSBuild]

The execution looks like

You double click on a .cabal file which launches visual studio

The IDE then invokes Cabal2MSBuild on the .cabal file to get a .hsproj file

This .hsproj file is used strictly internally. Its only purpose is to tell the IDE which files to include, references, author, etc. Any changes made to these fields are also written directly back to the .cabal file via Cabal.

When you want to build a project, an MSBuild task is run. This task, even though defined in the .hsproj file, uses no information from that file; it calls cabal-install on the original .cabal file.

Errors and warnings are captured and translated into something the IDE can understand.

This is a much simpler and more stable approach than the first one, which was to deeply and directly integrate .cabal files inside the IDE. That required an immense amount of work, and I never could get it to work 100%.

Having finished my MSBuild book (which was really the only *complete and coherent* source of information on MSBuild), I was able to create the Cabal2MSBuild tool. It creates a .hsproj file from a .cabal file which contains all the information that was present in the .cabal file, plus a full listing of which files to include in the project, references, etc. The last line of this file is an important one:

<Import Project="Haskell.Cabal.targets"/>

This line imports a set of predefined, Cabal-specific tasks for MSBuild. These tasks, as they’re called, wrap calls to the cabal-install tool. The exposed functionalities are:

Build, Run, Configure, Deploy, Clean and Update. (As a side note, this targets file can be used by any MSBuild-compatible build system, so using Team Foundation Server as a source-control server and having continuous builds going should be possible.)

These tasks are all exposed by a custom task file, “Cabal.MSBuild.Tasks.dll”.

This exposes a few problems. While it works, the feedback is inaccurate. The problem is that cabal-install has no set format for warnings and errors. They’re all freeform, so MSBuild will only recognize the first line, and only because of the “warning: ” prefix. Ideally we want the entire warning, since that’s what we’re going to report back to the user.

The same problem exists for GHC as well; however, GHC is a bit more structured. Most errors are pretty-printed and have some kind of structure to them, which you can parse by looking at the indentation of the text.
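Grouping a GHC message by its indentation can be sketched like this. It is a toy version of the idea, not the actual parser being written:

```haskell
import Data.List (groupBy)

-- A line that starts with whitespace continues the message opened
-- by the last flush-left line; blank lines stay with the message.
continues :: String -> Bool
continues (c:_) = c == ' ' || c == '\t'
continues []    = True

-- groupBy starts a new group at every flush-left line and pulls
-- every following indented line into that group.
groupMessages :: [String] -> [[String]]
groupMessages = groupBy (\_ b -> continues b)
```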

As I’m trying to modify as little as possible (it’ll be easier to maintain in the future and takes the burden of maintenance off me), I’m writing a set of parsers to augment MSBuild’s built-in parsers to support the messages generated by cabal-install and GHC.

The current .hsproj file is a valid project file, and though I haven’t integrated it into the IDE yet, when I do, it should just work, since full support for MSBuild project files is already there (again, less to maintain, and no reflection hacks this time).

Eventually the Cabal2MSBuild tool will be available on hackage and the build tasks on codeplex since hackage doesn’t allow binary dependencies.

And now, as promised the full files, note however that both of these files are work in progress.

<Synopsis>A Library and Preprocessor that makes it easier to create shared libs (note: only tested on windows) from Haskell programs.</Synopsis>

<Description>The supplied PreProcessor can be run over any existing source and would generate FFI code for every function marked to be exported by the special notation documented inside the package. It then proceeds to compile this generated code into a windows DLL.

The Library contains some helper code that’s commonly needed to convert between types, and contains the code for the typeclasses the PreProcessor uses in the generated code to keep things clean.

It will always generate the required C types for use when calling the DLL, but it will also generate the C# unsafe code if requested.

– Does not automatically resolve missing datatype declarations using Hackage. Future releases will search library code for the types you need to resolve this, but currently you’ll get a missing-instance error.

– You cannot export functions which have the same name (even if they’re in different modules), because one big .hsc file is generated at the moment and there is no conflict resolution.

– You cannot export datatypes with the same name, same restriction as above.

– Does not support automatic instance generation for infix constructors yet

This is a small update to just let people know what I’ve been up to. This will be a non-technical post.

There are 3 components which I want to get done before I put any code publicly online.

Cabal support (being able to build and run Haskell projects from the IDE)

Documentation support (class browser, quickinfo docs and F1 help integration along with jump to definition)

Intellisense support

Of the three, I want to get Cabal support working first, then release something and get feedback while I work on the other two. I’m working on all three concurrently (mostly depending on which part of Visual Studio I want to mess with that day). So how far along are they?

Cabal: I had a first version which hooked into the Cabal library and, using quite a few reflection hacks and code changes to the MPF templates, managed to load .cabal project files. This first approach came about because I wanted to talk directly to Cabal, and not go through any intermediate layers. The reflection hacks were needed because the MPF templates are hardcoded to MSBuild, which uses an XML file format; not using MPF would have meant writing all that code myself, which would have taken ages. Unfortunately this only worked sometimes; other times I would get an exception from deep within Visual Studio. I also had no idea how I would get building to work.

Ultimately I decided to scrap that approach altogether. It wasn’t worth the hassle and would have been hard to maintain. I’ve now settled on the idea of converting .cabal files to an internal MSBuild script (and back to .cabal when saving), while adding new templates for a Haskell target type which just invokes cabal-install. This is a much simpler approach which moves some of the burden of maintenance off me and into other tools, but which unfortunately requires me to learn about MSBuild. Currently I’m creating the conversion tool, Cabal2MSBuild, which is about 20% done.

Documentation support: documentation for the current module is obtained from the AST inside GHC (hopefully; I haven’t checked the data I get back yet). Documentation on external modules (e.g. package modules) comes from two places (hopefully): for quick info the intellisense cache will be used, and for the class browser (i.e. browsing of the current project and its dependencies) the .hpi files will be used. For F1 help, Haddock-generated documentation will be used.

However, in order for the Haddock documentation to be integrated into Visual Studio it has to be in the correct format. As it turns out, documentation has been greatly simplified in Visual Studio 2010. The format basically comes down to a zip file renamed to something else, containing a simple XML index file and plain XHTML content files. Great news, since Haddock already generates XHTML. However, I still need to modify the generated files to include a few meta tags, which the document installer will use to create indices of the HTML files and for F1. I’ve started the modifications to Haddock and they’re about 60% done.

Intellisense: I already have the .hpi files, which are those simplified indices of packages. I would like to provide intellisense for both project files and standalone Haskell files. There are a few ways to approach this. From what I’ve seen in the past, Visual Studio builds an intellisense cache file from the project dependencies on the fly, on first launch/use of the project. If you have a large package database this could be handy, since it limits the search space, but features such as auto-adding imports/dependencies become harder, as I would have to do two checks (the local cache first, and if nothing is found, a global cache). Standalone files would also require me to use only the bigger (slower) global cache.

However, the speed of that global cache hasn’t been measured yet (because the cache hasn’t been built yet). So for now, until I have some hard numbers, I’ve settled on always using the globally constructed index. This is also about 20-30% done. The majority of the work left is reading some documentation and papers.