Dynamic Languages and the Programmers that Love Them

May 25, 2009

The next thing we want for our dynamic DataTable is to perform calculations involving one or more columns. Imagine (for instance) that you want to add two columns and store the result in a third. In C#, the client code might look like this:
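The original snippet hasn't survived in this copy, but it would have been along these lines (the table and column names here are illustrative):

```csharp
dynamic t = new DynamicDataTable();
// ... the Price and Tax columns get populated ...

// Add two columns element-wise and store the result in a third.
t.Total = t.Price + t.Tax;
```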

We’ve already defined GetMember and SetMember on our table, so all we need to do to make this work is to define BinaryOperation – at least for the “addition” operator. Right?

Alas, it’s a little more involved. The BinaryOperation isn’t being performed against the table itself; rather, it’s being performed against the results of the GetMember operation. Our current implementation has GetMember returning a System.Array, and we have no way to define a new operation against this preexisting type – whether the operation is static or dynamic.

What about extension methods?

If we were using statically-defined types, we might be able to effectively “monkey patch” System.Array by defining extension methods for either System.Array itself or for the specialized array types that we’re interested in. Unfortunately, this won’t work for us here because extension methods are neither supported against dynamic types nor can they be used to define operators for C# or VB.

(The first of these limitations is deliberate. Extension methods work at compile time because the compiler has direct access to the “using” statements which bring the methods you want into the local scope. There’s no obvious way to get the same information at runtime, which is when dynamic call sites are bound.)

First implementation

Our initial implementation will wrap the array we were originally returning and will add a CLS-compliant implementation of the “+” operator. We won’t derive this class from DynamicObject. Because the variable holding our DynamicDataColumn will be typed as “dynamic”, resolution of operator + will happen at runtime.

How do we implement this operator? We can identify two different scenarios – adding a sequence of values (such as another column) to a column, and adding a constant to a column. In both cases, we’ll end up performing n additions, where n is the number of rows in the table. To perform the element-wise operation, we’ll simply cast the two values to “dynamic” and let the C# runtime binder do the work.
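The listing itself is missing from this copy; a sketch consistent with the surrounding description might look like the following. The IList delegation is included so that “column + column” resolves to the sequence overload (the downloadable source may differ in details):

```csharp
using System;
using System.Collections;

public class DynamicDataColumn : IList
{
    private readonly IList _data;   // the Array returned by TryGetMember

    public DynamicDataColumn(Array data)
    {
        _data = data;
    }

    // Adding a sequence of values (such as another column) to a column.
    public static DynamicDataColumn operator +(DynamicDataColumn a, IList b)
    {
        var result = new object[a.Count];
        for (int i = 0; i < result.Length; i++)
            result[i] = (dynamic)a[i] + (dynamic)b[i];
        return new DynamicDataColumn(result);
    }

    // Adding a single constant value to a column.
    public static DynamicDataColumn operator +(DynamicDataColumn a, object b)
    {
        var result = new object[a.Count];
        for (int i = 0; i < result.Length; i++)
            result[i] = (dynamic)a[i] + (dynamic)b;
        return new DynamicDataColumn(result);
    }

    // IList is implemented by delegating to the wrapped array, so a
    // DynamicDataColumn itself matches the IList overload above.
    public object this[int index]
    {
        get { return _data[index]; }
        set { _data[index] = value; }
    }
    public int Count { get { return _data.Count; } }
    public bool IsFixedSize { get { return true; } }
    public bool IsReadOnly { get { return false; } }
    public bool IsSynchronized { get { return false; } }
    public object SyncRoot { get { return _data.SyncRoot; } }
    public int Add(object value) { return _data.Add(value); }
    public void Clear() { _data.Clear(); }
    public bool Contains(object value) { return _data.Contains(value); }
    public void CopyTo(Array array, int index) { _data.CopyTo(array, index); }
    public IEnumerator GetEnumerator() { return _data.GetEnumerator(); }
    public int IndexOf(object value) { return _data.IndexOf(value); }
    public void Insert(int index, object value) { _data.Insert(index, value); }
    public void Remove(object value) { _data.Remove(value); }
    public void RemoveAt(int index) { _data.RemoveAt(index); }
}
```

At runtime, the binder prefers the IList overload when the right-hand operand implements IList, and falls back to the object overload for a single constant.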

(I’ve chosen to use IList instead of IEnumerable in order to simplify the code. In principle, we could create an overload for each. This would give us the flexibility of IEnumerable when we don’t have a more specific interface, while still letting us take advantage of IList.Count when we get an IList.)

The exact same code can be used to implement other binary operators – both arithmetic operators like “-”, “*” and “/” and comparison operators like “>” and “<” – simply by replacing the four instances of “+” in the code above with the appropriate substitute.

Now we’ll need to change DynamicDataTable.TryGetMember so that it returns “new DynamicDataColumn(a)” instead of the raw Array “a”. Then – in conjunction with what we’ve done already – we’re able to write the following:
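The sample isn't preserved in this copy; reconstructed from the description that follows (the column names and values are my own, chosen to produce the three type pairs discussed below), it would have been something like:

```csharp
dynamic t = new DynamicDataTable();

t.Average = new double[] { 40.0, 55.0, 62.5 };
t.Bonus = new double[] { 4.0, 5.5, 6.25 };
t.Count = new int[] { 1, 2, 3 };
t.Name = new string[] { "a", "b", "c" };

t.Total = t.Average + t.Bonus;   // element types: (double, double)
t.Scaled = t.Count + t.Average;  // element types: (int, double)
t.Label = t.Name + "!";          // element types: (string, string)

t.IsOkay = t.Average > 50;       // a column compared against a constant
```

The last line assumes “operator >” was implemented with the same pattern as “operator +”.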

This is pretty exciting! The statement “t.IsOkay = t.Average > 50” creates a new bool column on our table and sets its value based on a comparison between another column and a constant value – and it does so with syntax that is both clean and natural. So it looks like we’re done implementing arithmetic.

The fly in the ointment

Unfortunately, there are a few problems with this approach – some obvious, some subtle.

Our code doesn’t currently handle the reverse sequence “t.Foo = x + t.Bar” – whether x is a single value or a non-DynamicDataColumn sequence. Changing this means that we need to create another two overloads per operator. And if we want to support both IList and IEnumerable sequences, we need a further two overloads. Six overloads times sixteen binary operators makes 96 methods to implement.

Nearly all of these methods are basically boilerplate copies of the first ones that we created for “operator +”. It would be nice if we could combine the implementations because duplicated code is a frequent source of errors.

Our original implementation let us do some interesting things with columns if we knew their type, because we were able to cast an array T[] into an IEnumerable<T>. In this first implementation of arithmetic operators, we can no longer cast columns to strongly-typed collections.

The semantics we’re using for addition are those of our implementation language (C#) and not those of the language that is using the DynamicDataTable. We may not be able to do anything about this, but it would be nice to change it if possible.

One potential problem is particularly hard to see, and it results from the way the C# compiler implements dynamic sites. The compiled code for “operator +” contains exactly one dynamic call site which is shared between all users of the method. This means that the sample code above will generate three rules into the site – for type pairs (double, double), (int, double) and (string, string). Generating many rules into a site will degrade that site’s performance. Three rules isn’t bad, but this code introduces the possibility of many more being created.

But we do have working code now, so this is a good time to take a break. We’ll tackle some of these issues in the next installment. The current version of the source code can be downloaded from here.

For now, we’ll use a DataTable for the actual storage and a DynamicObject to provide an implementation of IDynamicMetaObjectProvider. What can we accomplish with this? Well, quite a lot, actually – in a very real sense, we’re only limited by our imagination.

GetMember

The first ability we want is to be able to extract a column out of the data table; given a DynamicDataTable “foo”, the expression “foo.Bar” should give us something enumerable that represents the data in the column. The DLR describes this operation as “get member”, and DLR-based languages implement a GetMemberBinder in order to bind a dynamic “get member” operation.

DynamicObject makes it very easy for us to handle the GetMemberBinder. We simply override the virtual method TryGetMember and implement the behavior that we want. The binder has two properties: Name, which indicates the name of the member that is being bound, and IgnoreCase, which indicates whether that name should be matched without regard to case. You can reasonably expect that case-sensitive languages like C#, Ruby and Python will set IgnoreCase to false, while VB will set it to true.
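The listing didn't survive extraction; a minimal sketch consistent with the surrounding description might be (the downloadable source may differ in details):

```csharp
using System;
using System.Data;
using System.Dynamic;

public class DynamicDataTable : DynamicObject
{
    private readonly DataTable _table = new DataTable();

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        DataColumn column = GetColumn(binder.Name);
        if (column == null)
        {
            // Fall back to the language binder's default behavior.
            return base.TryGetMember(binder, out result);
        }

        // Build an array whose element type matches the column's DataType.
        Array a = Array.CreateInstance(column.DataType, _table.Rows.Count);
        for (int i = 0; i < _table.Rows.Count; i++)
        {
            a.SetValue(_table.Rows[i][column], i);
        }
        result = a;
        return true;
    }

    // Factored out so that the name-matching policy is easy to change.
    private DataColumn GetColumn(string name)
    {
        return _table.Columns.Contains(name) ? _table.Columns[name] : null;
    }
}
```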

Here I’ve chosen to return an Array whose elements are typed identically to the column’s original data type. That’s because it’s very easy to create an Array of a particular type and to set its individual elements from the System.Object values that we can get from the DataRow.

By factoring out GetColumn into a separate method, I’ve made it easy to change just this logic. We might want, for instance, to allow a symbol name like “hello_world” to match the column named “hello world”.

Non-dynamic members

What if I want to directly access other properties of the DataTable like the “Rows” DataRowCollection? The design of the DLR makes this easy. If you don’t handle a binding operation yourself, it’s possible to fall back to a default behavior implemented by the language-provided binder. And for VB, C#, Python and Ruby, the fallback behavior is to treat the object like a normal .NET object and to access its features via Reflection. That’s why it’s useful to call base.TryGetMember instead of throwing an exception when the column name can’t be found.

So if we implement a trivial “Rows” property, a reference to DynamicDataTable.Rows will return DataTable.Rows even when the GetMember is performed dynamically at runtime (unless there actually is a column named “Rows”…).

public DataRowCollection Rows {
    get { return _table.Rows; }
}

SetMember

The next interesting thing we want to be able to do is to set a column on the DataTable whether or not it already exists. The DLR describes this operation as “set member”, and defines a corresponding SetMemberBinder to perform the binding operation. Like the GetMemberBinder, this class has two properties: Name and IgnoreCase.

We want to be able to set the column either to a single repeated constant value or to a list of values. But there are lots of different lists we might like to support: for instance, lists, collections or even plain IEnumerables. Let’s make some decisions about the semantics of the SetMember operation on our type:

If the object’s type implements IEnumerable and the object isn’t a System.String, then we’ll treat it like an enumeration. Otherwise, we’ll treat it like a single value.

If it’s an IEnumerable<T> we’ll use the generic type as our DataType. For a plain IEnumerable, the DataType will be System.Object.

If the object does not implement IEnumerable (or the object is a System.String) then the DataType will be the object’s actual RuntimeType.

For an enumeration, we’ll read items into a temporary array until we reach the number of rows in the table. If the enumeration ends before then, we’ll raise an error. If, at that point, there are still additional items remaining in the enumeration, we’ll also raise an error.

The specific behavior of our implementation for each of these types isn’t very important. What is important is that we’ve identified all the types that we expect we might get, and have identified the logic we’re going to implement for those types. Now, on to the code!
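The listing is missing here; a sketch of the override that follows the rules above might be (assumes the usual System, System.Collections, System.Collections.Generic, System.Data and System.Dynamic namespaces; the downloadable source may differ):

```csharp
// Inside DynamicDataTable.
public override bool TrySetMember(SetMemberBinder binder, object value)
{
    int rowCount = _table.Rows.Count;
    var items = new object[rowCount];
    Type dataType;

    var enumerable = value as IEnumerable;
    if (enumerable != null && !(value is string))
    {
        // IEnumerable<T> gives us a column type of T; plain IEnumerable gets Object.
        dataType = GetGenericTypeOfArityOne(value.GetType()) ?? typeof(object);

        IEnumerator e = enumerable.GetEnumerator();
        for (int i = 0; i < rowCount; i++)
        {
            if (!e.MoveNext())
                throw new ArgumentException("Enumeration ended before the last row");
            items[i] = e.Current;
        }
        if (e.MoveNext())
            throw new ArgumentException("Enumeration has more items than the table has rows");
    }
    else
    {
        // A single value is repeated into every row.
        dataType = value.GetType();
        for (int i = 0; i < rowCount; i++)
            items[i] = value;
    }

    if (_table.Columns.Contains(binder.Name))
        _table.Columns.Remove(binder.Name);
    DataColumn column = _table.Columns.Add(binder.Name, dataType);
    for (int i = 0; i < rowCount; i++)
        _table.Rows[i][column] = items[i];
    return true;
}

// Finds T for a type implementing IEnumerable<T>, or null if there isn't one.
private static Type GetGenericTypeOfArityOne(Type type)
{
    foreach (Type itf in type.GetInterfaces())
        if (itf.IsGenericType && itf.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            return itf.GetGenericArguments()[0];
    return null;
}
```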

(GetGenericTypeOfArityOne and ConstantEnumerator are methods whose names are pretty self-explanatory – and whose implementations can be found in the downloadable source code).

Armed with these two methods, our type now supports all of the operations we need to implement the sample program described in Part 0 of this series. A version of the complete source code can be downloaded from this location.

In Part 2, we’ll add the ability to perform numerical operations between columns. See you then!

May 21, 2009

To commemorate the first beta release of the first CLR to contain the DLR, I’m going to try to implement something more interesting than useful – a dynamic version of the DataTable class. My goal is to show how the dynamic support in C# 4 and VB 10 can be used to create a better development experience, and to show how you can work effectively with IDynamicMetaObjectProvider.

The BCL’s DataTable is already fairly dynamic; its columns can vary at runtime in type and name. You can even add and remove columns on-the-fly. As such, it’s a pretty good candidate for “dynamicizing”. Of course, we can’t actually modify the existing DataTable implementation, so I’ll start out by creating a wrapper for it using DynamicObject. Where I hope to end up is with a self-contained implementation that implements its own DynamicMetaObject rather than leaning on DynamicObject and DataTable.

April 11, 2009

The only feedback I’ve gotten so far about the IronPython Profiler is that it would be nice not to have to specify a command-line option to use it. Imagine that you’ve got a REPL open and are in the middle of some interactive development – something at which I think Python excels. You’ve spent a fair amount of time setting up data and code, but now there’s something that’s running slowly that you’d like to analyze. Restarting your REPL means having to repeat all that setup work.

Sometimes, we even listen

As a result of this feedback, we’ve added a new EnableProfiler method to the clr module. clr.EnableProfiler(True) will enable profiling, and clr.EnableProfiler(False) will disable it again.

Of course, IronPython is a compiler, and the calls to capture profiler data are compiled right into your code. You can’t just throw a switch to turn on profiling without actually forcing the code in question to be recompiled. This can be done by doing a reload() on the module that contains the code. Naturally, all the standard caveats for reload apply. In particular, any references to the old code from other modules and from instantiated objects don’t get automatically updated.

Not Nearly as Naked

While making this change, it occurred to me that we could get function-level code coverage for free simply by looking for functions that had been compiled but not called at the end of a test run. This required a small change to the code that is run when you call clr.GetProfilerData(). Previously, this function only returned records for methods which had actually been called. This is how it continues to behave when no parameters are used. But when you call clr.GetProfilerData(True), you get all compiled methods – even those that were never used.

Armed with this change, here’s how I modified the test runner for a project I’m working on:
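The actual test-runner change isn't reproduced here; the idea, sketched in plain Python with a stand-in list of (name, call-count) records where the real code would consume clr.GetProfilerData(True), looks like this:

```python
def report_coverage(records):
    """records: (method_name, call_count) pairs, shaped like the profiler
    records from clr.GetProfilerData(True) (a stand-in, not the real API)."""
    uncovered = [name for name, calls in records if calls == 0]
    total = len(records)
    percent = 100.0 * (total - len(uncovered)) / total if total else 100.0
    return percent, uncovered

# After a test run, report any function that was compiled but never called.
percent, uncovered = report_coverage([
    ("module foo", 3),
    ("module bar", 0),
    ("module baz", 1),
])
print("%.1f%% of compiled functions were exercised" % percent)
for name in uncovered:
    print("never called: " + name)
```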

This code could be trivially modified so that an exception is thrown if coverage drops below a certain percentage. If the tests were being run as part of a checkin gate, it would then prevent code from being added to source control unless it was at least superficially under test.

I call this “poor man’s code coverage” because it was cheaply obtained and is not very sophisticated. It won’t tell us anything, for instance, if a module has not been imported at all. Nonetheless, it’s a potentially useful addition to the box of tools that’s available for IronPython.

These changes were made after IronPython 2.6 Alpha 1 was released, so if you want to play with them you’ll need to grab a recent source drop – one not earlier than Change Set 49035.

As always, your feedback could influence any further development in this direction.

March 29, 2009

I was recently working on improving the performance of a Python-based application and thought it would be useful to profile my code. My intuition was telling me that the problem was in a certain area of the code, but I wanted hard numbers to back up this intuition before I went to the trouble of doing a rewrite. Unfortunately, neither the CLR Profiler nor the profiling support in Visual Studio is very effective when used with IronPython, as they weren’t really created for environments where code is dynamically compiled at runtime.

But Tomáš had written a profiler for IronRuby nearly half a year ago, so I thought I’d port this to IronPython, where it’s now experimentally available in IronPython 2.6 Alpha 1.

The way Tomáš’ profiler works is pretty straightforward. At the start of every method we compile, a snapshot is taken of the current time. Another is taken at the end of the method and the difference is stored for later retrieval. The only tricky thing about it is updating the statistics in a manner that is thread-safe while having minimal effect on performance.

I’ve modified it a bit from the original, though, and added some fun features. In particular, it now tracks the number of calls made to each method and also keeps separate counters for the amount of time spent exclusively inside the method versus the duration spent inclusively in both this method and in any methods it calls.

The profiler isn’t enabled by default, as it does have a small impact on performance. You need to opt into it by running IronPython with the flag “-X:EnableProfiler”. If you use this flag with the above code, you’ll get the following output:

These records are returned in the order that they were compiled, not the order that they were first called – except that methods which were never called are omitted entirely.

A record with the name “module site” means the top-level import of the site module. Other records that start with the name “module” refer to Python code. Records that start with “type” refer to .NET code that is called directly from IronPython. Times are recorded in “Ticks”, which are increments of 100ns. That means that the 10077140 ticks spent in the Thread.Join method are actually 1.007714 seconds.

Here’s what this profiler doesn’t do a good job of recording: the time required to parse Python, compile it into expression trees, generate the IL and JIT the IL. Some of this is accounted for in the time it takes to import an external module, but not in a way that provides visibility to the constituent parts.

We’d love to get your feedback on this experimental addition to IronPython. Happy profiling!

November 15, 2008

I crashed Mads' C# Tech Chat at Tech Ed EMEA in Barcelona on the grounds that the dynamic world has monkey-patched C#. It was fun, and I had the opportunity to answer a few dynamic/DLR-related questions that Mads was probably more capable of handling than I was.

One question that I choked on was whether or not this new language feature could be used to enable multiple dispatch from within C#. My gut feeling was that the answer was “yes”, but I couldn’t quite justify the answer so I hedged and hemmed and hawed and didn’t provide anything remotely like a satisfactory answer for the guy asking the question.

But now that I’ve had the benefit of a good night’s rest, the answer is blindingly obvious: yes, and here’s the evidence in some C# 4 sample code:
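The sample from the post isn't preserved in this copy, but the evidence would have looked something like the following (the shapes-and-collisions domain is my reconstruction):

```csharp
using System;

class Shape { }
class Circle : Shape { }
class Square : Shape { }

static class Program
{
    static string Collide(Circle a, Circle b) { return "circle/circle"; }
    static string Collide(Circle a, Square b) { return "circle/square"; }
    static string Collide(Square a, Square b) { return "square/square"; }
    static string Collide(Shape a, Shape b) { return "shape/shape"; }

    static void Main()
    {
        Shape x = new Circle();
        Shape y = new Square();

        // Static binding uses only the declared types: Shape, Shape.
        Console.WriteLine(Collide(x, y));                    // shape/shape

        // Dynamic binding dispatches on the runtime types: Circle, Square.
        Console.WriteLine(Collide((dynamic)x, (dynamic)y));  // circle/square
    }
}
```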

When you use dynamic, you’re telling the C# binder to ignore anything that it knows about the type at compile time and to instead determine dispatch based on the actual type at runtime. A statically-bound method call, by contrast, performs its dispatch based solely on the declared types.

So there you have it – one more use for dynamic: painless implementation of multiple dispatch.

(Thanks to Lucian Wischik for pointing out a flaw with the original version of this post.)

November 07, 2008

At PDC this year, both Anders and Jim used a helper “DynamicObject” class in their demos. This is actually a standard class in the DLR, which is why it’s in the “System” namespace.

Unfortunately, DynamicObject didn’t make it into the CTP, and the version that’s in the Open Source DLR is newer than – and not compatible with – the version of the DLR that’s in the CTP.

I’ve seen a few people ask where they can get a copy of DynamicObject, but haven’t seen the actual source posted. So to rectify this situation, I’ve placed the source code here. Standard disclaimers apply.

I’ll be speaking at a few sessions in Barcelona next week. Of particular note is that the date, time and location of the “Microsoft Visual Studio Languages Chat” has been updated to fix a scheduling conflict. See below for the details.

Come and learn about Microsoft’s two dynamic languages, IronPython and IronRuby, as well as the underlying Dynamic Language Runtime (DLR) that enables these languages. This session will give you insights on scenarios where you might want to consider dynamic languages including hands on demos using the IronLanguages in Visual Studio.

In this session, we will discuss Microsoft’s core languages (Visual Basic, Visual C#, and Visual C++), functional language (F#) and dynamic languages (IronRuby and IronPython). Each language has its own style and advantages. Come chat with representatives from the Microsoft Visual Studio product teams, bring your questions, and get ready for a fun discussion!

TLA01-LNC IronRuby in Action. November 12, 12:20 - 13:10, Room 112

In this session you will learn about the new dynamic language IronRuby. You will get a quick overview of the language with emphasis on what makes Ruby unique from more static languages such as C#, F#, and VB. You will see Ruby code being written and gain a feel for the language and how you might use it in the context of .NET.

TLA02-LNC IronPython in Action. November 13, 12:20 - 13:10, Room 117

IronPython is the older of the two IronLanguages with IronPython 2.0 coming out late in 2008. This session will give you an overview of the Python language and show you multiple Python code demonstrations in Visual Studio with emphasis on how Python, a dynamic language, is different from the more static languages such as C#, F#, and VB.

It’s not often that a developer speaks at these events; I hope I’m able to bring something a little different to my presentations.

August 12, 2008

Let's say you're working on a project such as IronPython or IronRuby that makes use of Reflection.Emit
to generate code at runtime. You're probably used to seeing a stack trace in
Visual Studio that looks something like this:

Visual Studio will do its best to prevent you from viewing any part of that
[Lightweight Function]. It won't let you trace into those methods, even while
viewing from assembly language. If you're feeling clever, you can use the
registers view and the memory view to identify the return address on the
stack, but there's no way of knowing for sure whether the caller is
actual code or just a thunk of some kind.

But it turns out that there's another way of doing this that's fairly
straightforward.

Enter SandWindbag

The managed debugger operates at a fairly high level of abstraction (as far
as debuggers go). So when the going gets tough, the tough resort to windbg.exe
(or its command-line cousin cdb.exe). These are part of the Debugging Tools for
Windows, which can be downloaded from http://www.microsoft.com/whdc/DevTools/Debugging/default.mspx.

Here's what you need to do.

1. If you're not viewing it in Visual Studio already, bring up the Debug
Location toolbar, which looks like this:

This will tell us the name and id of the process and thread that we need to
connect to from windbg.

2. If you haven't already, start windbg. From the File menu, select Attach
to a Process -- or hit F6.

Here's the key part. When you select the process for connecting, you want to
specify that it's a noninvasive attach:

In Windows, only a single process can be attached to any given process as
its debugger. Visual Studio is already the registered debugger for this
instance of IronPython, so windbg cannot establish the same relationship with
it. What "Noninvasive" debugging does is to use SuspendThread to suspend all
the threads in the target process and then use ReadProcessMemory to access its
internals. At that point, it's not unlike debugging a core dump; you have the
entire memory image to look at, but you can't actually set breakpoints or
execute any code inside that process.

For our needs -- looking at the MSIL and the native machine code for methods
generated through Reflection.Emit -- this turns out to be good enough.

3. Now we'll want to load the SOS Debugging
Extension. This actually ships with the .NET runtime now, so there should
be nothing for you to download. Unfortunately, the ".loadby" windbg command doesn't
seem to work when we do a noninvasive connect, so you'll have to type a full
path in the command to load the extension. With a default installation of
Windows, this will probably be

4. Let's make sure that we're looking at the right thread. You can get a
list of threads by using the "~" command, or you can choose "Processes and
Threads" from the View menu. The thread identifiers here are expressed in
hexadecimal, so you'll need to do a quick conversion from the "1440" in Visual
Studio to 5a0. Here's the output from the "~" command.

In this list, thread 10 matches the one where we hit the breakpoint in Visual
Studio -- so we can switch to this thread from thread 0 by executing "~10 s".
Alternatively, if you were viewing the "Processes and Threads" window, you could
just double-click on the thread in question.

5. Now we're ready to look at the stack. Execute the command "!clrstack".
The output should closely resemble the stack trace that you see in Visual Studio
-- except now you'll see names for all of those "Lightweight Function" frames.
You'll also get the stack pointer and instruction pointer for each frame. The
result should look something like this:

6. With the method descriptor, the !dumpil command will show us the actual
MSIL for this method.

0:010> !dumpil 047ee580
This is dynamic IL. Exception info is not reported at this time.
If a token is unresolved, run "!do <addr>" on the addr given
in parenthesis. You can also look at the token table yourself, by
running "!DumpArray 01eb82ec".

If you want to see what the associated x86 or x64 machine code looks like,
the "CodeAddr" value from the !ip2md command will give you the starting address
of the JITted code. You can use this with the windbg "u" command (for
"unassemble").

7. Once you're done, be sure to detach windbg from the program you're looking
at; it's quite likely that Visual Studio will be hung until you do. That's
because it will be waiting to get data back from the debug thread that was
injected into the target process -- but windbg has suspended that thread on your
behalf. "Detach Debuggee" is a menu choice on the "Debug" menu.

Left as an Exercise to the Reader

One of the interesting things that's possible with cdb.exe -- the
command-line version of windbg.exe -- is to control it from a separate program.
All of the functionality we used above is accessible through cdb. You can
therefore write a program that starts a separate cdb process, piping commands to
its standard input and reading -- and parsing -- the results from its standard
output. You could even write a GUI that shows the stack of the target process,
and when the user clicks on a frame it puts up the MSIL for that frame into one
pane and the assembly code into another.

July 31, 2008

That's a good question. In fact, I had the exact same reaction when I first started looking into this issue. So I changed IronPython's MSIL generation to use filters, and my updated version was able to pass all of our tests.

Later, during my weekly face-to-face meeting with Shri, I described the work I had done and Shri was immediately able to identify a corner case that was now broken.


Before looking at some sample Python code that causes the problem, it's worth digging a little into filters to learn something about their operation. To do so, I'll need to leave my comfort zone and enter the world of VB.NET -- as C# doesn't have any functionality that maps onto MSIL exception filters.
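The VB.NET listing is missing from this copy; a reconstruction that matches the description below (hypothetical names, same shape) is:

```vb
Module FilterDemo
    Function Notify(ByVal message As String) As Boolean
        ' Side effect: prove that the filter has run.
        Console.WriteLine(message)
        ' Only the filter in Main() reports a match.
        Return message.Contains("Main")
    End Function

    Sub Test(ByVal depth As Integer)
        Try
            If depth = 0 Then
                Throw New Exception("boom")
            Else
                Test(depth - 1)
            End If
        Catch ex As Exception When Notify("filter in Test")  ' always False
            Console.WriteLine("caught in Test")
        Finally
            Console.WriteLine("finally in Test")
        End Try
    End Sub

    Sub Main()
        Try
            Test(2)
        Catch ex As Exception When Notify("filter in Main")  ' always True
            Console.WriteLine("caught in Main")
        End Try
    End Sub
End Module
```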

This code is designed so that the exception filter in the Test() method will
always return false while the one in Main() will return true. As a side effect
of calculating this value, the filter will give us a visual notification that it
has run.

The exception handlers form a stack that will be traversed in reverse calling
sequence, running the Finally block of each Test() invocation before continuing
back up the chain. Our natural expectation is that the filter will be executed
just before the Finally, giving us output that looks like this:

So what happens instead is that the CLR actually walks the list of exception
handlers twice. The first time, it's looking for a catch block that will handle
the exception. In the course of doing so, it needs to run any filter methods it
encounters in order to identify whether or not that particular catch block is
"the one". Only once it finds an exception handler does it go back and run all
of the finally and fault blocks between the thrower and the catcher.

You fascinate me; tell me more!

This subtlety matters to us because of the dynamic nature of the Python
language. You can't reliably evaluate any expression in the dynamic world until
you reach that particular point in the code -- because the meaning of just about
anything is subject to change.

In order for this to work correctly, the exception criteria "except x" at the
top level must run after x is reassigned in the finally block of the test()
method. But if we use an exception filter to match the thrown exception against
x, the filter will run before the value of x is set to SyntaxError and it will
incorrectly report that it is the right handler for the job.

And even though this is an edge case, there's nothing that actually
distinguishes this code from the more usual example of "except RuntimeError"
where we supply the type of the exception. That's because the meaning of the
symbol can't be determined until runtime no matter how familiar it may look to
human eyes. This is in sharp contrast to a statically typed language, where
"catch (ArgumentException)" has to be resolvable to a definite type at build
time or it's an error.

This is part of what it means to be dynamic. Take the power and use it
wisely.