Saturday, December 21, 2013

Just a quick announcement that Sasa v0.12.0 has been released. You can obtain the individual assemblies via NuGet, or the whole set from Sourceforge. The docs are available as a CHM file on Sourceforge, or online here.

Overview

This release includes a few fixes, the most prominent of which are in the HTML parser and the parsing of MIME linked resources.

Note that since v0.11.0, some extension methods on MailMessage have been deprecated due to Microsoft's recommended usage guidelines. The headers those extension methods accessed are overwritten whenever a MailMessage is sent via SmtpClient, so it's better not to rely on them.

A new feature is the sasametal utility, which is basically a wrapper around sqlmetal that normalizes some of the bizarre property names from sqlmetal into more CLR friendly names. A forthcoming blog post will cover the use of this tool.

Other new features include first-class references, and first-class slots. These are currently in the core Sasa assembly, but may be moved to a satellite assembly in the future if they don't see enough use.

Other changes cover backend work that simply expands the power of pre-existing features, like a more efficient and reusable PIC dispatch, now supporting up to 16 type arguments. There's also a new assembly, Sasa.Partial, which provides partial application overloads for System.Func and System.Action, up to 10 arguments.

Changelog

* added Strings.RemoveLast extension which is a convenient way of removing
characters from the end of a StringBuilder.
* made MailMessage parsing stricter to conform to Microsoft's usage
recommendations for MailMessage
* deprecated certain header parsers, like ContentType() and
ContentTransferEncoding(), since .NET strips these when sending mail anyway
* HTML views are no longer unpacked into MailMessage.Body since docs say this
property is reserved for plain/text
* QuotedPrintable decoding is now more permissive to encoding errors
* merged sasametal tool that normalizes the output of sqlmetal
* ilrewrite now only outputs pdb file if an input pdb was available
* adapted MailMessage parsing code to .NET 4.0 (mainly ReplyToList)
* added text encoding extension methods to ContentType
* added extension method for filtering a sequence of attachments by media type
* added extension method for extracting attachment data as a string using the
encapsulated encoding
* added extension method to overwrite an attachment's data using string
content using the encapsulated encoding
* Strings.SliceEquals now has an overload accepting a StringComparison
parameter to customize the comparison type performed
* Tokenizer now takes an optional parameter for the type of string
comparisons
* HTML parser now performs case-insensitive comparisons
* now correctly parsing alternate view linked resources
* added implementations for first-class references to all core CLR types
* added implementations for first-class slots to all core CLR types
* PIC now based on a concurrency-safe map that looks up a tuple of System.Type
* PIC and CLR expression compiler can now efficiently dispatch up to 16 params
* Sasa.Dynamics simplified by switch to new PIC
* added .NET 3.5-specific overloads of System.Func and System.Action for up to
16 params
* added Sasa.Partial assembly, which provides partial application overloads
for System.Func and System.Action for up to 10 params
* fixed a couple of bugs in Sasa.Dynamics codegen

Monday, November 18, 2013

Yesterday, I posted about the new extension of IRef<T> to arbitrary reference semantics for the CLR, including referencing inner fields, properties, and array slots. First-class references make it simple to operate on specific mutable data without caring about the underlying type of that data.

I just pushed another abstraction that handles a related, but different case: first-class slots.

ISlot<T>

An object "slot" is a value that designates a mutable location of a specific class of values, not a mutable location of a specific instance like first-class references. Where first-class references hide the underlying object type, slots expose the object type and allow you to mutate the slots of multiple objects at once, as long as they are subtypes of the slot's object type. Here's the declaration of ISlot<T>:
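The declaration itself isn't reproduced above, so here's a minimal sketch consistent with the description that follows (the type parameter names TObj and T are assumptions, not necessarily Sasa's exact declaration):

```csharp
public interface ISlot<TObj, T>
{
    T Get(TObj obj);              // read this slot of the given instance
    void Set(TObj obj, T value);  // write this slot of the given instance
}

// e.g. a slot designating the X member of any Point instance:
public class Point { public int X; }

public class PointX : ISlot<Point, int>
{
    public int Get(Point p) { return p.X; }
    public void Set(Point p, int value) { p.X = value; }
}
```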

As you can see, the object we're manipulating is passed in while operating on it, so an ISlot is actually a direct descriptor that can access members of any instance of a subtype of TObj. If you're wondering when you would ever need this, rest assured that there are applications for algorithms of this type. For instance, consider an object-relational mapper (ORM), which uses reflection to extract the members that need to be read or written when objects are flushed to or loaded from a database. Essentially, the ORM is reflecting over all slots of an object being flushed or loaded, but it does so in a manner that isn't very reusable, and the object hydration code ends up tightly coupled to the database access code as a result.

Reifying slots as a distinct, first-class abstraction makes them independently testable, and the reflection code and database access code are now very loosely coupled. An ORM is but one example of an application that makes use of generic object slots.

Similar to first-class references, the Slot class exposes some static constructors for creating slots:
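The constructors aren't shown above; a sketch of what such an API might look like, with names and signatures here being illustrative rather than Sasa's exact ones:

```csharp
using System;

public interface ISlot<TObj, T> { T Get(TObj obj); void Set(TObj obj, T value); }

public static class Slot
{
    // A slot over an object member, built from get/set delegates; for a
    // property these are precisely its accessors.
    public static ISlot<TObj, T> Create<TObj, T>(Func<TObj, T> get, Action<TObj, T> set)
        where TObj : class
    {
        return new DelegateSlot<TObj, T>(get, set);
    }

    // A slot designating a fixed index into any array of T.
    public static ISlot<T[], T> Create<T>(int index)
    {
        return new DelegateSlot<T[], T>(a => a[index], (a, v) => a[index] = v);
    }

    sealed class DelegateSlot<TObj, T> : ISlot<TObj, T> where TObj : class
    {
        readonly Func<TObj, T> get;
        readonly Action<TObj, T> set;
        public DelegateSlot(Func<TObj, T> get, Action<TObj, T> set)
        {
            this.get = get;
            this.set = set;
        }
        public T Get(TObj obj) { return get(obj); }
        public void Set(TObj obj, T value) { set(obj, value); }
    }
}
```

Note the class constraint on TObj, matching the limitation discussed below.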

These operations are exactly what you saw in the last article, where you can create slots from object members and array indices.

It might seem at first that slots are strict generalizations of first-class references, but this is deceptive. It's true that any algorithm you could write using references could be rewritten to use slots, but the number of type parameters and value parameters could increase non-linearly, to the point where it's unwieldy and could easily obfuscate the underlying algorithm.

Limitations

There is one limitation at the moment. Specifically, value types need to be passed by reference during update in order for assignment to be visible to callers of ISlot.Set, but this isn't currently possible given the interface. As a result, there's currently a type constraint on TObj to restrict it to reference types.

A simple solution would be to return TObj from ISlot.Set, so the calling context can overwrite its own local value with the one modified by the slot. Another possibility is to make the TObj parameter to ISlot.Set a by-ref parameter. I'm considering these and a few other options, and Sasa's v0.12.0 release will probably contain the final solution.

This makes it easy to verify their safety statically, and makes them very efficient, but it somewhat limits their expressiveness.

First-class citizens can be passed around as values, and otherwise used in any way that you'd use any other object. The above capture would work, for instance. To make references first-class citizens means constructing an object that exposes get and set operations, and that can reference the internals of any .NET type.

IRef<T>

Sasa has had the IRef<T> type for quite some time, but its full potential as a first-class reference hasn't been realized. There was only a single implementation, and that was a simple mutable slot as found in ML. I've just pushed a series of changes that effectively expands the usefulness of IRef.

You can now create a reference to an object property, to a field, or into an array:
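The overloads aren't shown above; here's an illustrative sketch of the shapes described below (simplified, not Sasa's exact signatures):

```csharp
using System;

public interface IRef<T> { T Value { get; set; } }

public static class Ref
{
    // ML-style mutable cells: the pre-existing constructors
    public static IRef<T> Create<T>() { return new Cell<T>(default(T)); }
    public static IRef<T> Create<T>(T value) { return new Cell<T>(value); }

    // a reference that indexes into the given array
    public static IRef<T> Create<T>(T[] array, int index)
    {
        return new DelegateRef<T>(() => array[index], v => array[index] = v);
    }

    // a reference built from get/set delegates; for a property these are
    // precisely its accessors, so there's no extra overhead
    public static IRef<T> Create<T>(Func<T> get, Action<T> set)
    {
        return new DelegateRef<T>(get, set);
    }

    sealed class Cell<T> : IRef<T>
    {
        public T Value { get; set; }
        public Cell(T value) { Value = value; }
    }

    sealed class DelegateRef<T> : IRef<T>
    {
        readonly Func<T> get;
        readonly Action<T> set;
        public DelegateRef(Func<T> get, Action<T> set) { this.get = get; this.set = set; }
        public T Value { get { return get(); } set { set(value); } }
    }
}
```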

The first two overloads were the pre-existing constructors for slots as found in MLs, like OCaml. The third overload creates a reference that indexes into the given array. The fourth overload taking two delegates creates a ref to object members, like properties and fields. For properties these delegates are precisely the get/set methods of the property, so there's no additional overhead to using them.

The overloads taking the reflection members should be obvious as creating references to those members. The last overload is simply a convenient means of specifying the member you wish to access in a type-safe manner using LINQ expressions. For instance, here's an example from the test suite:
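The test-suite sample isn't reproduced here; it presumably looks something along these lines, with Foo and Bar as referenced in the text, and the expression-accepting overload of Ref.Create assumed:

```csharp
var foo = new Foo { Bar = 3 };

// a type-safe reference to foo.Bar, built from a LINQ expression rather
// than runtime reflection
var fref = Ref.Create(foo, x => x.Bar);

fref.Value = 99;          // writes through to foo.Bar
var current = fref.Value; // reads foo.Bar back
```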

The reference variable fref is constructed via a simple LINQ expression at compile-time, instead of via error-prone runtime reflection. If Foo.Bar were a field instead of a property, the above code would be unchanged. The above code is for instance members. A static member looks a little different because the reference to provide is null:
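Again a hypothetical reconstruction: for a static member there's no instance to pass, so the object argument is null (here via default). FieldObj is the test-suite type referenced below; the static member name is assumed for illustration:

```csharp
// no instance needed; default(FieldObj) is null, but drives type inference
var sref = Ref.Create(default(FieldObj), x => FieldObj.StaticField);
sref.Value = 7; // writes the static member
```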

I use default(FieldObj) so type inference resolves the type parameters to Ref.Create, but you can use null and specify the type parameter manually if you like.

First-class references enable you to decouple programs using simple get/set semantics from the underlying representation being modified. The same algorithm exploiting IRef can transparently manipulate arrays, instance fields and properties, or static fields and properties.

These new ref types will be in the upcoming Sasa v0.12.0 release, or you can grab the source from Sourceforge and play with them now!

Sunday, November 10, 2013

Scala and Haskell both have very useful type-directed concepts for resolving implicit function parameters, called "implicits" and "type classes" respectively. To illustrate the impact of this, consider a generic sorting function like Enumerable.OrderBy. It must define two overloads, one that takes a custom comparer, and one that does not and so should use a default comparison for type TKey:
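For reference, the shapes of the two overloads on System.Linq.Enumerable:

```csharp
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector);

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer);
```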

While it's simple to define the second overload as calling the first with Comparer<TKey>.Default for the custom comparer, in general, for N overridable parameters, you would need to define something like 2^N overloads to cover every combination of supplied and defaulted parameters. This can quickly become unwieldy. Ideally, we could define only a single function with optional parameters and sane default values.

C# 4.0 introduced a much more limited form of implicits to address part of this problem: optional parameters with default values. These defaults are limited to compile-time constants, like Int32 literals and String, so they aren't useful for general resolution of implicit values, but they were a good start.

Fortunately, C# has a few other features that, when combined with default values, allow us to add something like the more flexible implicit values as found in Scala: reified generics and value types. Basically, we define the implicit parameter to be some sort of value type, whose default value is always defined, and this value type resolves to either the default instance, or to a custom instance if one is provided. For instance, we can define a single OrderBy overload like so:
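The post's definitions aren't shown above; here's a sketch of the idea. CompareImplicit<T> is a struct, so default(CompareImplicit<T>) is always constructible, and Resolve() falls back to Comparer<T>.Default when no override was given (names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct CompareImplicit<T>
{
    readonly IComparer<T> comparer;
    public CompareImplicit(IComparer<T> comparer) { this.comparer = comparer; }

    // default(CompareImplicit<T>) has a null field, so Resolve() falls
    // back to the default comparison for T
    public IComparer<T> Resolve() { return comparer ?? Comparer<T>.Default; }
}

public static class EnumerableEx
{
    // a single OrderBy, with the comparer as an optional implicit parameter
    public static IOrderedEnumerable<T> OrderBy<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        CompareImplicit<TKey> comparer = default(CompareImplicit<TKey>))
    {
        return Enumerable.OrderBy(source, keySelector, comparer.Resolve());
    }
}
```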

You can then write any sort of instance resolution for the IComparer<T> value you like, and still allow callers to specify optional overrides. For instance, here's an example of using the built-in comparison and overriding it with a "reverse" comparer:
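Something along these lines, using the CompareImplicit wrapper the text refers to; the "reverse" comparer is just the default comparison with its arguments flipped:

```csharp
var xs = new[] { 1, 3, 2 };

// default comparison: resolves to Comparer<int>.Default
var ascending = xs.OrderBy(x => x);

// override the implicit with a reverse comparer
var reverse = Comparer<int>.Create((a, b) => b.CompareTo(a));
var descending = xs.OrderBy(x => x, new CompareImplicit<int>(reverse));
```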

I reused the Enumerable.OrderBy overload for clarity, but in principle, the OrderBy function would normally just call Resolve() to access the IComparer instance to use locally. We can define similar implicit arguments to use for resolving instances of IEqualityComparer, and so on, but there's perhaps a more uniform, reusable approach using only a single struct for all such implicits:
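A sketch of such a struct: one wrapper for all implicit parameters, where the receiving function supplies the default instance when no override was given (again, names are illustrative, not necessarily the post's exact definition):

```csharp
public struct Implicit<T> where T : class
{
    readonly T instance;
    public Implicit(T instance) { this.instance = instance; }

    // resolve to the override if one was provided, else the caller's default
    public T Resolve(T defaultInstance) { return instance ?? defaultInstance; }

    public static implicit operator Implicit<T>(T instance)
    {
        return new Implicit<T>(instance);
    }
}
```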

With a little more work for the library writer, this allows us to use a single struct for all implicit parameters, and the receiving function will simply need to provide the default instance if no override is specified. For instance, here's OrderBy rewritten using this type:
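Using the Implicit wrapper just described, the sketch becomes:

```csharp
public static class EnumerableEx
{
    public static IOrderedEnumerable<T> OrderBy<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        Implicit<IComparer<TKey>> comparer = default(Implicit<IComparer<TKey>>))
    {
        // the receiving function supplies the default instance
        return Enumerable.OrderBy(source, keySelector, comparer.Resolve(Comparer<TKey>.Default));
    }
}
```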

In order to avoid hard-coding the source of the default implicit in every function, it's better to use dependency injection. This will decouple defining the default value for implicits from their uses, thus allowing more flexibility:

Technically, you don't even need the Implicit type in the parameter list, as all reference types can be null by default. However, the presence of Implicit is good documentation implying that some sort of implicit instance resolution is taking place, and that this process can be overridden.

Finally, there are still some unfortunate C# warts in this approach. For instance, C# does not allow implicit conversions to or from interface types, so you will always have to explicitly construct the CompareImplicit or Implicit wrapper around a custom instance, even though Implicit defines an implicit conversion. This is annoying, but hardly the only time C# has restricted expressiveness for flimsy aesthetic reasons. I would also define an extension method to simplify lifting a custom value into an Implicit wrapper:
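Such an extension method might look like this (the name AsImplicit is hypothetical):

```csharp
public static class Implicits
{
    // lifts any instance into the wrapper, so call sites read:
    // xs.OrderBy(x => x, myComparer.AsImplicit())
    public static Implicit<T> AsImplicit<T>(this T instance) where T : class
    {
        return new Implicit<T>(instance);
    }
}
```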

Wednesday, October 30, 2013

A new release of Sasa is now available on Sourceforge and NuGet. This release contains a few bugfixes and some minor new convenience features.

Changelog

* MailMessage extension method Reply() now considers the media type of the
original body; if it's not text/plain, the original body is attached to the
reply message as an inline attachment
* if the original e-mail doesn't specify sufficient content-type information,
generate it and reset the original e-mail's header
* new IEnumerable extensions, SingleSelect and FirstSelect, which select a
component of the only or first item in a sequence, respectively, that matches
an optional predicate, even if the item returned is null
* refactored ToRaw extension to simplify and generalize MIME output

Friday, October 4, 2013

I just uploaded v0.10 of Sasa, my open source class libraries. This release features a few small bugfixes and enhancements to MIME parsing, and some useful new concurrency features. .NET 4.0 binaries are now also provided.

Vance helpfully provided the code he used for his benchmarks, so I decided to see if perhaps I was mistaken. Lo and behold, the numbers I received from running his code matched my past results, ie. RuntimeTypeHandle provided no advantage. This seemed extremely odd, and after digging a little deeper, it turns out that Vance and I are both right. I've been doing most of my development on 64-bit x64 machines, and I suspect Vance was running x86 at the time. It turns out that the x64 runtime for the CLR is woefully underperforming when compared to x86, at least for this type of code. I knew there was a performance difference between the two, but the x86 backend really shines in dynamic type tests, demonstrating the CLR team's heavy investment into x86 optimizations.

The Original Tests

I'll first present the results of the original benchmarks, this time compiled for x64, x86, and Any CPU:

These results differ significantly from Vance's original data. The type handle test was very nearly the slowest way to perform dynamic type tests, not the fastest as Vance concluded. The fastest were direct comparisons of System.Type, and subtype tests via the 'is' operator.

Here we see results much closer to Vance's original data. Type handles are blazingly fast, and the JIT seems to recognize certain comparison patterns and optimizes them heavily.

The Generic Tests

Vance's benchmarks only compared the costs of hard-coded type tests, where the JIT has better visibility on the static types involved and can perhaps optimize more readily. This doesn't necessarily translate to code that performs dynamic type tests on generic variables, so I duplicated Vance's test suite to operate on an abstract type T. The numbers were once again quite surprising. The fastest operations when the static types are hard-coded became the slowest when operating on an abstract type parameter. The slowest tests became just a little slower when operating on type parameters, but not dramatically so.

As I mentioned, the fastest x64 type tests became the slowest tests when operating on generic type parameters. The other type tests slowed down slightly as well, though nowhere near as dramatically. Still, my past results stand: comparing System.Type instances seems to be the most efficient and stable way to compare dynamic types, at least on x64.

Here we partially confirm Vance's data again, whereby typeof(T).TypeHandle is incredibly fast on x86, but nearly 5 times slower than the non-generic test, which seems odd considering it should just be one or two memory fetches; a 5x slowdown seems excessive.

However, the second and third fastest ways to perform dynamic type tests on x86 when static types are known, became the slowest when operating on generic parameters. This performance degradation matched the trend seen on x64, so at least that's consistent. The standard operations on System.Type are roughly the same, so their performance was much more stable.

Conclusions

I'm not sure we can draw any reliable conclusions from this data as to the fastest way to perform dynamic type tests, at least across platforms. Using RuntimeTypeHandles should be the fastest, but their poor showing on x64 makes me question whether it's worth it. Hopefully the CLR team will put some more effort into optimizing the x64 code generator to improve the performance of RuntimeTypeHandle. As it currently stands though, operating on System.Type seems like the most stable across all the platforms, and more importantly, it doesn't degrade in the presence of generics.

You can download the code for the modified benchmark suite here. All credit to Vance for the code, I just copied and pasted the tests and modified them slightly to operate on type parameters. The x86 and x64 builds are selected via different build configurations. Release mode builds for Any CPU.

Edit: The results above target .NET 2.0, which largely shares the same runtime as .NET 3.0 and 3.5, and the results are the same all the way up to .NET 4.0. The .NET 4.0 VM allegedly underwent many changes, and after performing a few preliminary tests, I can say that the above numbers are all completely different. RuntimeTypeHandle is now the slowest test on x86, and x64 is way faster than x86 on pretty much all the tests. System.Type is stable and fastest even for generics. So the conclusion seems pretty obvious now: stick to System.Type and avoid RuntimeTypeHandle at all costs.

As you can see from these updated results for .NET 4.0, the x64 VM is now comparable to the x86 VM, largely because the x86 VM appears to be slower than in .NET 2.0. The advantage of RuntimeTypeHandle in .NET 2.0 is completely gone, and it's now (surprisingly) the slowest means of comparing runtime types. Comparing instances of System.Type appears to be the fastest all around, and doesn't degrade if you're comparing generic type parameters.

Saturday, September 21, 2013

The CLR's value types are incredibly useful for reducing memory usage of programs, but they have a severe limitation in concurrent scenarios: structs larger than the atomic type on a given machine can suffer from torn reads.

Most typical applications won't encounter this because their concurrent accesses, both reads and writes, are protected by locks. However, there are some scenarios where locks just aren't viable for various reasons. For instance, if a few shared variables are read an order of magnitude more often than they're written, then all the lock contention is 90% wasted work among readers that aren't performing any updates.

In principle, the lock is really there to permit only one writer to modify the variable at a time. This means we can possibly use some other signalling mechanism to notify readers that a write is taking place, or has taken place. Ideally, this mechanism shouldn't cause contention among readers thus permitting more read parallelism.

The typical abstraction for this is a reader-writer lock (rwlock). Basically, a number of concurrent readers are permitted to access a resource protected by the rwlock, and writers have to request access and wait until all readers are done. Then the writer proceeds, and all readers must wait until the writer is finished. Unfortunately, rwlocks aren't all they're cracked up to be. Most of the problems stem from the fact that any amount of shared state will inherently limit scalability, but part of the problem is also because readers are performing writes, and thus they are introducing unnecessary contention.

Turns out, it's possible to solve this contention issue using only an additional piece of data: a version number. Writers still coordinate via a standard lock, and when a writer enters the critical section, it increments a version number and executes a memory barrier. The version number is now odd, which indicates that a write is in progress. Then the writer writes to the variable, executes another barrier, then increments the version number again. The version number is now even, indicating that no write is in progress:
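A sketch of the writer in C#; Guid stands in for any struct too large to write atomically, and the signatures here are illustrative, not necessarily Sasa's actual Atomics API:

```csharp
using System;
using System.Threading;

public static class AtomicSketch
{
    internal static int version;  // even: no write in progress; odd: write in progress
    static readonly object writeLock = new object();

    public static void Write(ref Guid location, Guid value)
    {
        lock (writeLock) // writers still serialize amongst themselves
        {
            Interlocked.Increment(ref version); // now odd: write in progress
            location = value;                   // the potentially-torn write
            Thread.MemoryBarrier();             // publish the write before...
            Interlocked.Increment(ref version); // ...the version becomes even again
        }
    }
}
```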

Readers instead only consult the version number to check whether a write is in progress, or whether a write has transpired since we first started the read. We read the version number into a local called 'old', and then spin until the version number is even, which indicates that no write is in progress. Then we read the value from 'location' into a local.

However, a write could have occurred since we exited the loop checking for an odd version number. So we read the version number again and compare it against 'old'. If it differs, that means a write occurred while we were reading, possibly corrupting our read. In that case, we abort and retry the whole process from the beginning. If the version number matches 'old', then the value we read is correct, and we can safely return it [1]:
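The reader side, as a sketch; 'version' is the counter maintained by the writer described above:

```csharp
using System;
using System.Threading;

public static class AtomicReadSketch
{
    internal static int version; // the counter incremented twice per write

    public static Guid Read(ref Guid location)
    {
        while (true)
        {
            // spin until no write is in progress (version is even)
            var old = Thread.VolatileRead(ref version);
            while ((old & 1) == 1)
                old = Thread.VolatileRead(ref version);

            var value = location; // the potentially-torn read

            Thread.MemoryBarrier();
            // unchanged version: no write overlapped our read, value is good
            if (old == Thread.VolatileRead(ref version))
                return value;
            // otherwise a write intervened: retry from the beginning
        }
    }
}
```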

So we've achieved our goal: we have fully concurrent reads that don't contend for resources, and that don't block writers. I've just included these two atomic read/write functions, and some useful variants, in Sasa's Atomics class.

Turns out this approach isn't new, and Duffy covered a similar approach in a later blog post which I only came across after writing this piece. He rightly points out that these are the sort of techniques employed in software transactional memory (STM), whereby we do as much as possible optimistically, then validate at the end that nothing untoward happened (like a write when we weren't expecting it). If the unexpected happens, we "abort" and retry. As pointed out in the comments to that post, this is essentially the seqlock locking mechanism as used in the Linux kernel.

I don't anticipate using this too often, but I wouldn't have included it in Sasa if it didn't have an important application within Sasa itself. Sasa.Reactive is the assembly that provides reactive variables that can be concurrently read, but only updated by a single writer. I designed the atomic read/write functions above while refactoring the Sasa.Reactive implementation to be more robust, yet more conservative in resource use. These atomic read/write functions allow concurrent readers to safely obtain a snapshot of a reactive variable's value, without resorting to storing values in atomic types, like a reference type.

Hopefully others will find it useful as well. If anyone spots a problem in the above algorithm please do let me know! The CLR provides a relatively strong memory model, so I'm pretty sure I can eliminate one memory barrier in the atomic write function, but I'd love to have some other input. Concurrency is hard after all!

[1] Technically, the version number could have been incremented so many times that it wrapped around until it matches the saved 'old' value, but IMO it's exceedingly unlikely that 2^32 writes occurred while we were trying to read a single variable.

Friday, September 20, 2013

A while back I had written about a type unification nuisance I had run into. In a nutshell, the problem occurs when a class with two type parameters tries to implement the same interface twice, once for each type parameter:
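The problematic shape looks like this (interface and member names follow the discussion below); the compiler rejects it with error CS0695, because IFoo<T0> and IFoo<T1> may unify when T0 = T1:

```csharp
public interface IFoo<T>
{
    void Bar(T value);
}

// error CS0695: 'Foo<T0, T1>' cannot implement both 'IFoo<T0>' and
// 'IFoo<T1>' because they may unify for some type parameter substitutions
public class Foo<T0, T1> : IFoo<T0>, IFoo<T1>
{
    public void Bar(T0 value) { }
    public void Bar(T1 value) { }
}
```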

As Doug McClean pointed out in the comments, the reason behind this error is because the two implementations of the interfaces may not be confluent, ie. behaviourally identical, in which case there's no legitimate way to choose between the two.

The application I had in mind at the time used marker interfaces, ie. interfaces with no methods or properties, so they were guaranteed to be confluent. I also had a sneaking suspicion that C# already permitted this structure elsewhere, and that the compiler only superficially enforces this rule, like some other annoying aesthetic constraints.

This turns out to be exactly the case: it is possible to implement the same interface twice for two type parameters. All you need to do is implement the interfaces in two separate classes in the same inheritance hierarchy, and C# lets this pass with nary a whimper. Here's the sample code:
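The sample isn't reproduced above; a minimal reconstruction of the shape described:

```csharp
public interface IFoo<T>
{
    void Bar(T value);
}

// the implementation for T1 lives in the base class...
public class FooBase<T1> : IFoo<T1>
{
    public virtual void Bar(T1 value) { }
}

// ...and the implementation for T0 in the derived class: no complaints
public class Foo<T0, T1> : FooBase<T1>, IFoo<T0>
{
    public void Bar(T0 value) { }
}
```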

So reduction is still confluent, because all views of IFoo<T> go through the most derived implementation in the inheritance hierarchy. The only way to call the Bar method on FooBase is by explicitly casting to an instance of FooBase and invoking Bar.

This recently bit me, since I was implementing an IObserver<T> that was observing two differently typed streams, but because the interfaces were declared at different levels of the inheritance hierarchy, the compiler never complained. Not very common, I agree, but for consistency, I'd suggest either that this structure too be ruled out, or that type unification on type parameters be permitted via some convention, just as it's allowed via inheritance. For instance, select the implementation for the first (or last) type parameter.

Tuesday, September 17, 2013

A bit of a departure from my usual practical posts in C#: I decided to try my hand at an abstract computer science topic. I've become interested in concatenative programming, and combinator calculi are right up that alley.

The quintessential combinator calculus is the well-known SKI calculus, which features two core combinators, S and K, and one optional combinator I. SKI is quite simple, and it's pretty straightforward to implement as an EDSL. I suspect the key stumbling block for most programmers unfamiliar with functional programming will be the pervasive partial application inherent to combinators.

Partial Application

Partial application is a pretty simple concept for anyone who's done even a little programming with LINQ. C#/.NET features first-class functions called "delegates" with types like:

public delegate T2 Func<T0, T1, T2>(T0 arg0, T1 arg1);

That's the type declaration for a delegate that accepts two arguments of types T0 and T1, and returns a value of type T2. Partial application is simply an operation that applies a function to only some of its arguments, with the result of the partial application being another function that accepts the remaining arguments.

We can simulate a partial application operation in any language featuring first-class functions as follows:
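For the two-argument delegate above, the simulation looks something like this (the Function.Partial naming follows the text; the exact overloads aren't shown):

```csharp
using System;

public static class Function
{
    // partially apply the first argument of a two-argument delegate,
    // returning a delegate that awaits the remaining argument
    public static Func<T1, T2> Partial<T0, T1, T2>(this Func<T0, T1, T2> f, T0 arg0)
    {
        return arg1 => f(arg0, arg1);
    }
}
```

For example, `add.Partial(1)` turns a two-argument addition into a one-argument increment.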

The above code sample simulates partial application for the two-arg delegate. For every delegate type accepting arguments of types T0, T1, ... Tn, partial application requires a set of Function.Partial overloads for partially applying 1 argument, 2 arguments, 3 arguments, ..., up to n-1 arguments. This reduces to the formula:

(n choose 1) + (n choose 2) + ... + (n choose n-1)
= 2^n - 2

Normally, partial application is provided by a language's compiler, so all these overloads aren't typically required in a language that has strong support for functional programming.

There is also another way to support something like partial application in a language without needing copious overloads: currying.

Aside: Currying

Most .NET languages feature uncurried functions by default, which is to say that functions can take multiple arguments (F# being the only exception I'm aware of). Curried functions only ever take a single argument, and return a single value, no exceptions. Functions that accept multiple arguments can be simulated by functions that return other single-argument functions that encapsulate the previous arguments.

For instance, consider simple decimal addition, which can be represented by the following delegate:

Func<decimal, decimal, decimal> add = decimal.Add;

If C# featured curried functions, this would instead look like:

Func<decimal, Func<decimal, decimal>> add = decimal.Add;

As you can see, add has now become a function that only accepts a single argument, and returns a single argument: it accepts an argument of type decimal, and returns a function that accepts a decimal argument and returns a decimal. In abstract, all N-argument function applications are then a series of N-1 partial applications terminated by a final application that invokes the underlying operation:
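In C#, the nested applications need explicit parentheses; for example:

```csharp
using System;

// curried addition: each application consumes exactly one argument
Func<decimal, Func<decimal, decimal>> add = x => y => x + y;

var addOne = add(1m);        // partial application: a Func<decimal, decimal>
var three = addOne(2m);      // final application invokes the addition: 3
var alsoThree = add(1m)(2m); // the same, written as one chain
```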

Currying is a pretty powerful concept because it simplifies functions into a standard form, taking and returning single-values, which allows easy composition of functions in higher-order patterns, like we find in LINQ.

Languages with currying by default avoid all the above parentheses by simply having whitespace as application. For instance, the above in F# is something like:

This syntax closely matches the concepts you'll find in combinator calculi like the SKI calculus.

The SKI Calculus

A minimal understanding of partial application is needed in order to understand how to implement the SKI calculus in C#. Basically, you can view the S, K and I combinators either like curried functions, or like operators supporting partial application by default. In abstract, the SKI combinators are described as:
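Their standard reduction rules, for reference:

```
I x     → x
K x y   → x
S x y z → x z (y z)
```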

Basically, I is a function that accepts one argument and simply returns it. K is a function that accepts two arguments, and simply returns the first argument. S is a function accepting 3 arguments, and first applies the first argument to the third, expecting it to return a function of some sort, and then applies that function to the result of applying the second argument to the third argument.

It's important to note here that all function applications above can be partial applications. So you can have a legitimate expression like "S x", which supplies S with only one argument, even though S takes 3 arguments; it's simply a partial application. The result of "S x" is then a function that accepts at least two more arguments.

This is where currying comes in handy, because partial application semantics is pervasive. The SKI calculus in a language with curried functions would look like:
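In an ML-style language with curried functions, the definitions would read something like:

```
let i x     = x
let k x y   = x
let s x y z = x z (y z)
```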

This is how the raw SKI calculus is generally presented in the academic literature.

Given that the SKI calculus supports pervasive partial application, we need to create a class hierarchy that represents the S, K and I terms, and all possible partial applications. We can do this naively in standard object-oriented fashion with the following set of classes:

This is a standard object-oriented representation of curried SKI combinators. Each combinator takes a single argument of type SK, and returns a single value of type SK. This is all well and good for combinators like I, which only take a single argument, but K and S take more. This where partial application comes in. We return intermediate terms K1 and S1 for partial applications of K and S, respectively:
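A sketch of the whole hierarchy, following the description above (the post's exact code isn't shown here):

```csharp
public abstract class SK
{
    public abstract SK Apply(SK arg);
}

public sealed class I : SK
{
    public override SK Apply(SK x) { return x; }           // I x = x
}

public sealed class K : SK
{
    public override SK Apply(SK x) { return new K1(x); }   // partial application
}

sealed class K1 : SK
{
    readonly SK x;
    public K1(SK x) { this.x = x; }
    public override SK Apply(SK y) { return x; }           // K x y = x
}

public sealed class S : SK
{
    public override SK Apply(SK x) { return new S1(x); }   // partial application
}

sealed class S1 : SK
{
    readonly SK x;
    public S1(SK x) { this.x = x; }
    public override SK Apply(SK y) { return new S2(x, y); }
}

sealed class S2 : SK
{
    readonly SK x, y;
    public S2(SK x, SK y) { this.x = x; this.y = y; }
    public override SK Apply(SK z)                         // S x y z = x z (y z)
    {
        return x.Apply(z).Apply(y.Apply(z));
    }
}
```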

And thus we have the full SKI combinator calculus. An example SKI term:

var SKSK = new S().Apply(new K()).Apply(new S()).Apply(new K());

This is pretty awkward, and we can do much better. You might have noticed that the core S, K, I classes were singletons, and that's because we only ever need the one instance since they contain no state. Combined with some additional properties on the base SK class, embedded SKI expressions can be pretty convenient:
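Something along these lines: singleton instances plus chaining properties on the base class, so `new S().I.I` reads as "S applied to I applied to I". The member names are illustrative; the combinator classes and their partial applications are as described above:

```csharp
public abstract class SK
{
    public static readonly SK SCombinator = new S();
    public static readonly SK KCombinator = new K();
    public static readonly SK ICombinator = new I();

    public abstract SK Apply(SK arg);

    // x.S == x.Apply(S), and so on
    public SK S { get { return Apply(SCombinator); } }
    public SK K { get { return Apply(KCombinator); } }
    public SK I { get { return Apply(ICombinator); } }
}
```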

The SKI calculus is actually Turing complete, so any computable function can be computed by some SKI expression. For instance, here's a simple infinite loop:

var SII_SII = new S().I.I.Apply(new S().I.I); // loops forever

The Functional Representation

The above is a canonical object-oriented representation of terms. C# provides us with some other options though, including a more functional encoding. Simply stated, we convert the class hierarchy into enums representing a disjoint union, and evaluation is a switch over the possible cases instead of a method dispatch:
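A sketch of that encoding: an opcode plus the arguments accumulated so far, with Apply extending the argument array and reducing once a combinator is saturated (names are illustrative):

```csharp
public enum Op { S, K, I }

public sealed class Term
{
    public readonly Op Op;
    public readonly Term[] Args; // arguments accumulated by partial application

    public Term(Op op, params Term[] args) { Op = op; Args = args; }

    public Term Apply(Term arg)
    {
        // append the new argument
        var args = new Term[Args.Length + 1];
        Args.CopyTo(args, 0);
        args[Args.Length] = arg;
        var t = new Term(Op, args);

        // reduce if the combinator is now saturated
        switch (t.Op)
        {
            case Op.I: return t.Args.Length == 1 ? t.Args[0] : t;
            case Op.K: return t.Args.Length == 2 ? t.Args[0] : t;
            case Op.S: return t.Args.Length == 3
                     ? t.Args[0].Apply(t.Args[2]).Apply(t.Args[1].Apply(t.Args[2]))
                     : t;
            default:   return t;
        }
    }
}
```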

Combinators are here represented in a form that's closer to an efficient instruction encoding for an abstract machine. We have an opcode representing the combinator, and we have a list of arguments represented as an array. The I combinator only ever gets 0 or 1 element arrays. The K combinator only ever gets 0, 1 or 2 element arrays, the first two of which represent partial applications. And finally, the S combinator only ever has 0, 1, 2, or 3 element arrays.

I tend to find this representation more convenient to work with, since the evaluation function is defined concisely in one place. It's also a small step from this encoding to an opcode array representing an instruction stream for an SKI abstract machine (I'll write about it if there's interest).

We can make embedded SKI expressions convenient with the following helper properties:
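One way those helpers could look: nullary singletons plus instance properties that apply the receiver to each combinator (a self-contained sketch, repeating the opcode-plus-arguments representation described above; all names are assumptions):

```csharp
public enum Op { S, K, I }

public sealed class Term
{
    public readonly Op Op;
    public readonly Term[] Args;
    public Term(Op op, params Term[] args) { Op = op; Args = args; }

    // Application appends an argument and reduces saturated combinators.
    public Term Apply(Term x)
    {
        var a = new Term[Args.Length + 1];
        Args.CopyTo(a, 0);
        a[Args.Length] = x;
        var t = new Term(Op, a);
        switch (t.Op)
        {
            case Op.I: return t.Args.Length == 1 ? t.Args[0] : t;
            case Op.K: return t.Args.Length == 2 ? t.Args[0] : t;
            case Op.S: return t.Args.Length == 3
                ? t.Args[0].Apply(t.Args[2]).Apply(t.Args[1].Apply(t.Args[2]))
                : t;
            default:   return t;
        }
    }

    // Hypothetical convenience members: singletons for the bare combinators,
    // and properties that apply the receiver to each one, so embedded
    // expressions read left to right.
    public static readonly Term SComb = new Term(Op.S);
    public static readonly Term KComb = new Term(Op.K);
    public static readonly Term IComb = new Term(Op.I);
    public Term S => Apply(SComb);
    public Term K => Apply(KComb);
    public Term I => Apply(IComb);
}
```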

This is the direct algebraic equivalent of the first object-oriented representation of the SKI calculus, with values representing the partial applications of combinators.

What's It All For?

This was really just a bit of fun in exploring abstract computer science concepts with C#, and hopefully can serve as a rudimentary introduction to convenient functional encodings for language terms. Combinator calculi are a good introduction because they are simple and concise, yet computationally complete.

Despite being simple and studied for a long time, there's still interesting work being done with these calculi. For instance, a great feature of some dynamically typed languages that still isn't well understood in typed terms is metacircularity. Dynamically typed languages can represent native programs as values, can inspect those values, and can execute those values (Lisp is the canonical example).

However, doing this in a statically typed language is still an open problem... except for typed combinator calculi! I believe the SFB combinator calculus is the first example of a statically typed language that can inspect and evaluate its own programs, and it doesn't need anything beyond the types of System F, which we also have in C#.

Like the SK calculus, the SFB calculus can represent lambda abstraction and application, and so in theory, we actually have a metacircular statically typed language! The downside is that the metacircularity is at the level of the combinator terms, not at the lambda abstraction level, so it's allegedly a little awkward to use for programs written using lambda abstraction.

Still, concatenative languages exist, and some are quite popular, like Factor, so perhaps there's some application of this approach to concatenative languages. My next post will probably be a C# encoding of the SFB calculus, which I've already written, along with some interesting terms showing its expressive power.

Friday, July 26, 2013

Some minor bugfixes to the new Sasa.Net.Message.ToRaw extension method prompted a minor release of Sasa v0.9.4.3. Nothing else was changed.

I apologize for the excessively long version numbers. The next major Sasa release will be v1.0, and henceforth, point releases will be reserved for bugfixes and other backwards-compatible API changes. Incompatible changes will increment the major version number.

Sasa.Enums

Sasa.Enums provides a statically typed API for working with enums, analogous to the dynamically typed System.Enum. Every System.Enum method that accepts a System.Type representing the enum type here instead accepts a type parameter constrained to be an enum.

Sasa.Enums.HasFlag

The Sasa.Enums.HasFlag extension method checks for the presence of flag bits set in an enum that has the FlagsAttribute applied:
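Usage looks roughly like the following stand-in (the real Sasa.Enums.HasFlag uses an enum constraint via ilrewrite, so this exact signature is an assumption):

```csharp
using System;

[Flags]
public enum Permissions { None = 0, Read = 1, Write = 2, Execute = 4 }

public static class EnumsSketch
{
    // Stand-in with roughly the shape of Sasa.Enums.HasFlag: true if all
    // bits in 'flags' are set in 'value'.
    public static bool HasFlag<T>(T value, T flags) where T : struct, IConvertible
    {
        long v = Convert.ToInt64(value);
        long f = Convert.ToInt64(flags);
        return (v & f) == f;
    }
}
```

So given `var p = Permissions.Read | Permissions.Write`, checking for Read succeeds while checking for Execute fails.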

Sasa.Enums.Names

The Sasa.Enums.Names static method provides an enumerable sequence of strings corresponding to the enum's descriptive names. Essentially, this is the equivalent of System.Enum.GetNames, except it's statically typed with an enum constraint, and it returns a cached immutable sequence, so it avoids the overhead of allocating an array for every call:

Sasa.Enums.Values

The Sasa.Enums.Values method provides all the underlying values of the given enum type. Like the Enums.Names method, it returns a cached immutable sequence so it avoids the overhead of allocating arrays for every call:
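Both Names and Values can be implemented with a per-enum-type cache. A stand-in sketch of that pattern (the shapes are assumptions; the real Sasa.Enums uses an enum constraint via ilrewrite rather than a struct constraint):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class EnumsCacheSketch
{
    // The static fields of a generic nested class are initialized once per
    // instantiation of T, giving a free per-enum-type cache.
    static class Cache<T> where T : struct
    {
        public static readonly ReadOnlyCollection<string> Names =
            Array.AsReadOnly(Enum.GetNames(typeof(T)));
        public static readonly ReadOnlyCollection<T> Values =
            Array.AsReadOnly((T[])Enum.GetValues(typeof(T)));
    }

    // Every call returns the same immutable, cached sequence.
    public static IEnumerable<string> Names<T>() where T : struct => Cache<T>.Names;
    public static IEnumerable<T> Values<T>() where T : struct => Cache<T>.Values;
}
```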

Saturday, June 29, 2013

I've just uploaded the final Sasa v0.9.4 release to Sourceforge and Nuget. The full API documentation for all assemblies in the Sasa framework is available here. The full changelog is available in the Sourceforge download, or directly in the repo here. Suffice it to say, changes since v0.9.3 include hundreds of bugfixes, and many, many new features.

A brief overview of what Sasa is, and what features it provides is covered on the wiki, and reproduced below.

Sasa Class Libraries for .NET

Sasa is a set of organized class libraries for the .NET framework 3.5 or higher. Here's an overview of the assemblies available and the features they provide:

After this post, I will have covered everything in the core Sasa assembly, Sasa.dll. The first two posts covered Sasa.Parsing.dll and Sasa.Dynamics.dll, and now that the core assembly has been covered, I have only a few final cleanups of the codebase before I release v0.9.4 final on Sourceforge and Nuget.

Of course, there still remains Sasa.Concurrency, Sasa.Binary, Sasa.Collections, Sasa.Numerics, Sasa.Linq, Sasa.Mime, Sasa.Net, and Sasa.Reactive to document, at the very least. The world of Sasa is far larger than what's been covered so far, so there's plenty left to explore! I will continue to write periodic blog posts or other sorts of documentation covering these assemblies, but I won't hold up the v0.9.4 release any further.

Sasa.Lazy<T>

.NET 4.0 was released with a Lazy<T> type, although the one in Sasa predates it by quite a bit and is somewhat simpler. In principle, there is little difference between a Lazy<T>, a Task<T>/Future<T>, and a reactive value, React<T>, like that found in the Sasa.Reactive assembly. In fact, the latter largely generalizes the semantics of the previous three, since you can easily register for notifications and construct chained computations à la promise pipelining. As such, Sasa.Lazy<T> may one day be replaced by React<T> or something even more general. But it's available in the meantime, it's simple, and it's well tested in production environments.

Sasa.Lazy<T> Interfaces

The lazy type implements the following core interfaces: IValue<T>, IOptional<T>, IResolvable<T>, IVolatile<T>, and IResult<T>. To summarize, this set of interfaces exports the following methods and properties:

// starts the lazy computation, if not already
// run, and returns the computed value
T IValue<T>.Value { get; }
// returns false if the computation has not yet started
// or it returned an error of some kind
bool IResolvable<T>.HasValue { get; }
// returns true and sets 'value' to the computed value if
// the computation completed successfully; returns false if
// it has not yet run or it returned an error of some kind
bool IVolatile<T>.TryGetValue(out T value);
// provides the exception generated by the lazy computation
// if any was generated; if an exception was generated, then
// HasValue and TryGetValue both return false
Exception IResult<T>.Error { get; }

Sasa.Values

Sasa.Values is a static class intended to encapsulate a set of useful methods defined for all values. It currently exports only a single overloaded extension method.

Sasa.Values.IsIn

Sasa.Values.IsIn is an overloaded extension method used to check if a value is contained within a collection of other items. Logically, it's equivalent to SQL's "IN" operator. In its most general form, IsIn is a shorthand for Enumerable.Contains, but the more specific overloads are far more efficient and don't require the construction of a collection to check membership:
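For illustration, a stand-in for one of the specific overloads (the shape is an assumption; Sasa's actual overloads may differ):

```csharp
using System.Collections.Generic;

public static class ValuesSketch
{
    // One specific IsIn overload: a direct chain of equality checks,
    // with no collection allocated.
    public static bool IsIn<T>(this T value, T a, T b, T c)
    {
        var eq = EqualityComparer<T>.Default;
        return eq.Equals(value, a) || eq.Equals(value, b) || eq.Equals(value, c);
    }
}
```

So `x.IsIn(1, 2, 4)` expands to a chain of comparisons rather than allocating an array and calling Contains.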

As explained before, you can already perform this check in standard C#, but it's far more verbose. You either have to create a switch statement, or a set of logical conjunctions in an if-statement, or you have to construct a collection and call Contains like so:

As you can see above, there are two overloads, one accepting a simple value, and one accepting a function that produces a value for the cases where the default value may be expensive to produce.

Note that there may be a few more abstractions or extension methods in the core Sasa dll that weren't entirely documented in this series, but these are abstractions whose API isn't entirely satisfactory, and they may soon be refactored or removed. For instance, this is the case with Dictionaries.FindOrOtherwise, which has been in Sasa for quite a while, and which I've used in some production code, but which will probably be replaced by the more general FindOrDefault.

It's well known that C# forbids certain core types from appearing as constraints, namely System.Enum, System.Delegate, and System.Array. Sasa.TypeConstraint consists of two purely declarative abstract classes that are recognized by Sasa's ilrewrite tool, and which permit programs to specify the above forbidden constraints. This pattern is used throughout Sasa to implement functionality allowed by the CLR but normally forbidden by C#.

There are two definitions of TypeConstraint, one with only a single type parameter, and one with two type parameters. The second extended definition is unfortunately required to express some corner cases. It's primarily used in Sasa.Operators, and it's generally only needed if you're going to use some methods defined with TypeConstraint within the same assembly. If you can factor out those constrained methods into a separate assembly, you shouldn't ever need it.

Sasa.TypeConstraint.Value

The Sasa.TypeConstraint.Value property allows code compiled assuming a value of type TypeConstraint<Foo> to access the underlying value of type Foo. The ilrewrite tool erases calls to this property as well, leaving a direct access of the raw value. The following example is actually the definition of Sasa.Func.Combine:
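To give the flavor of the pattern, here is a hypothetical, self-contained stand-in (not the Sasa source; all names and shapes are modeled on the description in the text, and the real TypeConstraint is abstract and erased by ilrewrite):

```csharp
using System;

// Stand-in for the two-parameter TypeConstraint: T is the concrete type,
// TBase the otherwise-forbidden base constraint (e.g. Delegate).
public class TypeConstraint<T, TBase>
{
    public TBase Value { get; private set; }

    // The implicit conversion lets callers pass a plain T.
    public static implicit operator TypeConstraint<T, TBase>(T value)
    {
        return new TypeConstraint<T, TBase> { Value = (TBase)(object)value };
    }
}

public static class FuncSketch
{
    // The shape a delegate-constrained Combine might take: T is effectively
    // constrained to Delegate via the TypeConstraint parameters, and .Value
    // exposes the underlying delegate.
    public static T Combine<T>(TypeConstraint<T, Delegate> first,
                               TypeConstraint<T, Delegate> second)
    {
        return (T)(object)Delegate.Combine(first.Value, second.Value);
    }
}
```

After ilrewrite, calls like these would compile down to a plain generic method with a real Delegate constraint and direct access to the raw delegates.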

TypeConstraint also defines implicit conversion from T to TypeConstraint<T> and TypeConstraint<T, TBase>, so you should never have to construct such an instance manually. If you forget to run ilrewrite on an assembly, attempting to construct or access any members of TypeConstraint will throw a NotSupportedException naming the assembly that needs rewriting.

Any options not listed above will simply be ignored by ilrewrite. The /verify option runs the platform's "peverify" tool to ensure the output passes the CLR verification tests. The /key option additionally creates strongly named assemblies from the given key file. This way you can keep strong names, you just need to defer signing to ilrewrite.

I should also note that this is how Sasa is complying with the LGPL while also providing strongly named assemblies with a privately held key. The LGPL stipulates that Sasa users ought to be able to replace my assemblies with their own whenever they wish, and this is possible using ilrewrite. An assembly that was built against my Sasa.dll simply needs to pass through ilrewrite that's given a different Sasa.dll signed with another key, and the output assembly will then be bound to the new assembly and key. After a brief exchange with an associate member of the EFF, these terms seemed satisfactory, although I should note that he isn't licensed to practice law, and his opinion does not constitute an official EFF response on this issue.

To integrate ilrewrite into my VS builds, I simply place ilrewrite in a bin\ folder under the root folder of my solution, then add the following line to my post-build event:
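The command takes a form along these lines (the exact option spellings are assumptions; $(SolutionDir), $(TargetPath) and $(ConfigurationName) are standard Visual Studio build macros):

```
$(SolutionDir)bin\ilrewrite /dll:"$(TargetPath)" /$(ConfigurationName)
```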

When running a DEBUG build, $(ConfigurationName) specifies the /DEBUG option, and any other configuration specifies an unknown option that ilrewrite simply ignores.

Sasa.Raw: Building on Sasa's Constrained Operations

From time to time, it may be necessary to build upon the constrained operations provided by Sasa, e.g. defining a generic but different Func.Combine from the one above, but which takes some additional parameters. Here you'll run into a little trouble, because you'll need to specify TypeConstraint<T> on your function, but the rewritten function in Sasa.dll is expecting a T. For this reason, Sasa also ships with Sasa.Raw.dll, which is Sasa.dll prior to rewriting and signing. This means all the TypeConstraint IL is intact, and you can write your extensions as needed.

The step by step procedure is:

1. Link to Sasa.Raw.dll, not Sasa.dll, when building your project.

2. In the post-build event, delete Sasa.Raw.dll and copy over Sasa.dll.

3. In the post-build event, run ilrewrite as you normally would.

This is exactly the same procedure you'd follow when replacing my Sasa release with someone else's. For instance, here's the post-build event for Sasa.Reactive:

Sasa.IO.DisposableFile is a simple object intended to automate the handling of temporary files. Like most IDisposable objects, a DisposableFile is intended to be used within a "using" block, and the file it references is deleted upon exiting the block:
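The behavior can be sketched with a stand-in like the following (the real Sasa.IO.DisposableFile constructor and members may differ):

```csharp
using System;
using System.IO;

// Stand-in with the behavior described: the wrapped file is deleted when
// the object is disposed, i.e. on exiting a "using" block.
public sealed class DisposableFile : IDisposable
{
    public string Path { get; private set; }
    public DisposableFile(string path) { Path = path; }
    public void Dispose()
    {
        if (File.Exists(Path)) File.Delete(Path);
    }
}
```

A typical use wraps a temporary file, so that `using (new DisposableFile(tmp)) { ... }` guarantees the file is cleaned up when the block exits.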