This gives us the ability to dispose objects without repeating the null check:

try {
  writer.DisposeSafely();
}
finally {
  writer = null;
}
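DisposeSafely itself isn't shown in this excerpt; a minimal sketch consistent with how it's used here (it handles only the null case and does not swallow exceptions, as the next example relies on) could be:

```csharp
using System;

public static class DisposableExtensions
{
  /// <summary>
  /// Disposes <paramref name="disposable"/>, if it is not null.
  /// Exceptions thrown by Dispose are intentionally not swallowed.
  /// </summary>
  public static void DisposeSafely(this IDisposable disposable)
  {
    if (disposable != null)
      disposable.Dispose();
  }
}
```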

Let's go further now. Frequently you must safely dispose two or more IDisposable objects. Note that "safely" is an essential word here. You can't write something like:

disposable1.DisposeSafely();
disposable2.DisposeSafely();

This code may fail: an exception can be thrown on execution of its first line, and if that happens, the second line won't be executed at all. So disposable2 has a chance of not being disposed. That's why it's a good idea to implement a few helpers allowing us to deal with such issues safely.

And now we're going to the final step. Let's add one more extension method to our DisposableExtensions:

/// <summary>
/// Joins the specified disposable objects by returning
/// a single <see cref="JoiningDisposable"/> that will
/// dispose both of them on its disposal.
/// </summary>
/// <param name="disposable">The first disposable.</param>
/// <param name="joinWith">The second disposable.</param>
/// <returns>New <see cref="JoiningDisposable"/> that will
/// dispose both of them on its disposal</returns>
public static JoiningDisposable Join(this IDisposable disposable, IDisposable joinWith)
{
  return new JoiningDisposable(disposable, joinWith);
}

When this is done, you can use the following code with Join to safely dispose two or more IDisposables:
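The original snippet didn't survive in this excerpt; based on the description above, the usage could look like this (a sketch - it assumes JoiningDisposable disposes the second object in a finally block, so it is disposed even if disposing the first one throws):

```csharp
// Joining disposables chains, since JoiningDisposable is itself
// an IDisposable; all three writers are disposed on a single Dispose call.
using (writer1.Join(writer2).Join(writer3)) {
  // ... work with the writers ...
}
```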

Monday, July 20, 2009

It's well-known that the Microsoft Help 2.0 SDK can be installed only as part of the Visual Studio 2005/2008 SDK - there is simply no separate installer for it. Thanks to us, now there is such an installer; moreover, it contains both the Microsoft Help 1.0 and Microsoft Help 2.0 SDKs.

You can download it here (as a .rar archive) - it's just 2.8 MB instead of the more than 100 MB you would normally have to install (the Visual Studio 2008 SDK).

We faced this issue while implementing compilation of executable providers for our RecordSet engine in DO4. We must be able to compile essentially any expression, such as the predicate passed to FilterProvider here. Even quite simple LINQ queries using just a single .Where invocation normally require compilation of two expressions: one for the original .Where criteria, and one for the index range definition for RangeSetProvider (we build it as an expression).

Obviously, we recommend that everyone use our cached queries in DO4 - they eliminate the problem. But what if you can't cache the query - e.g. if it is built dynamically, and actual instances almost always differ?

So we've implemented a LambdaExpression compilation cache. Implemented well, it solves all the problems above.

As you can see, the acceleration factor varies from ~3x to 26x! Even this result seems very good. But we decided to investigate why the results of standard .NET expression compilation differ so much here, and implemented one more test:

Always-new expression compilation test:
- Expression: (int a, int b) => i
- Without caching: 1,707 K compilations/s
- With caching: 27,960 K compilations/s

Here i is an integer whose value increases by one on each compilation attempt. So it looks like there is already some kind of caching in .NET expression compilation logic. But IT FAILS even on such a simple case (a difference in constant value)! Probably, they simply cache the compilation result using the expression instance as the key.
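The test code itself isn't shown in this excerpt; a sketch of how such an always-new expression could be produced (a hypothetical helper) is:

```csharp
using System;
using System.Linq.Expressions;

static class AlwaysNewExpression
{
  // Builds (a, b) => i with the current i baked in as a constant,
  // so every call yields a new Expression instance whose tree differs
  // only in that constant's value.
  public static Expression<Func<int, int, int>> Build(int i)
  {
    var a = Expression.Parameter(typeof(int), "a");
    var b = Expression.Parameter(typeof(int), "b");
    return Expression.Lambda<Func<int, int, int>>(Expression.Constant(i), a, b);
  }
}
```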

So "true" acceleration factor is at least 16x.

Implementation

Note: Further on I'll use GenericType(of T) instead of the standard C# notation with angle brackets, because Blogger hates them - it simply cuts them out.

There is the Xtensive.Core.Linq.ExpressionCompileExtensions class, providing a set of additional .CachingCompile() methods for the Expression type. Its usage is almost the same as that of the original .Compile method:
- Original code: var compiledLambda = lambda.Compile();
- New code: var compiledLambda = lambda.CachingCompile();

How it could work:
- Find the cached version of the provided expression tree in some dictionary
- Return its compiled version if it exists; otherwise, compile & cache it.

Actually, everything is much more complex:
- Expressions aren't comparable for equality: they override neither Equals nor GetHashCode. But comparison for equality is required to use them as keys in a dictionary. We can't use the default implementation either - Expression instances are almost always built anew rather than cached.
- Even if they were comparable for equality, they wouldn't really be equal because of closures: a new closure instance is referenced by the corresponding ConstantExpression on each subsequent creation of the expression tree. Even if we were able to compare such constants for equality, they would almost always differ.

So we've implemented:
- An ExpressionTree class implementing IEquatable(of ExpressionTree) and properly overriding the default GetHashCode & Equals. This allows us to compare expressions.
- A ConstantExtractor - a visitor rewriting the original expression and removing any dependencies on constants from it. In fact, we rewrite the original expression, replacing each constant of type T with ((T) additionalLambdaParameter[constNumber]). E.g. () => 1 would be rewritten to (consts) => ((int) consts[0]), and we would cache this expression. ConstantExtractor builds the array of constant values (the actual consts value) while processing the original expression. Since this happens on any attempt to compile an expression with our caching compiler (because first of all we must build the caching key - an ExpressionTree containing no constants), we always have this array of constants.
- So in the end we always have both the compiled expression (processed by ConstantExtractor) and the array of constants, and we should just "bind" this array to the compiled expression. "Bind" means converting a pair of f(consts,a,b,c, ...) and extractedConsts to g(a,b,c, ...) = f(extractedConsts,a,b,c, ...). This is actually done by the corresponding overload of the .CachingCompile method. Its typical code is:
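The original snippet is missing here; a sketch of what one such overload could look like (all internal names here are hypothetical - only CachingCompile and Bind come from the post):

```csharp
// Sketch of one CachingCompile overload for a parameterless lambda.
public static Func<TResult> CachingCompile<TResult>(
  this Expression<Func<TResult>> lambda)
{
  object[] constants;
  // Hypothetical helper: runs ConstantExtractor to build the constant-free
  // cache key and the constants array, then compiles & caches on a miss.
  Func<object[], TResult> compiled = GetCompiled(lambda, out constants);
  // "Bind" the extracted constants: g() = f(constants).
  return compiled.Bind(constants);
}
```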

The .Bind method is one more extension method, provided by the Xtensive.Core.Helpers.DelegateBindExtensions class. All .Bind and .CachingCompile versions are generated by T4 templates - finally we've found a place where we could use them ;)

And finally, we use our ThreadSafeDictionary as the actual cache, so a cached compilation result is never purged. Initially this may look like a serious drawback, but actually it isn't: currently .NET is incapable of unloading any IL code. Even if we don't use the cache, expression compilation eats some RAM, and this RAM is never released. So our "caching compiler" just eats a bit more - and only when we don't hit the cache.

Conclusions

Pros and cons:
- We've got much faster expression compilation.
- It's quite likely this solution significantly decreases the amount of RAM consumed by compiled expressions during the application lifetime, since it significantly decreases the number of actual compilations.
- The compiled expressions we return are a bit slower, because we replace constants with array accessors there and add one more delegate call (the .Bind method does this). But I feel this will be acceptable in 99.9% of cases.

Possible improvements:
- Use lightweight expression adapters instead of Expression descendants as the result of ConstantExtractor, such as the ones from MetaLinq. .NET expressions perform lots of checks during their construction, which aren't necessary here. We must just be able to compare such a tree for equality with another one, compute its hash code, and - much more rarely - convert it to .NET expressions to get it compiled. I feel this should improve the performance at least twofold.

Usage

- Download DO4 - it contains a compiled version of Xtensive.Core.dll, as well as its source code
- Add a reference to Xtensive.Core.dll to your project
- Add "using Xtensive.Core.Linq;" to the C# file containing lambda.Compile()
- Replace lambda.Compile() with lambda.CachingCompile().

Wednesday, May 6, 2009

Have you ever heard about the code generator built into VS 2008 by default? Microsoft named it "T4: Text Template Transformation Toolkit". It is integrated into Visual Studio and very easy to use. Let me demonstrate it with an example.

Let us suppose we need a class with methods like those for primitive types such as Int32, Double, Decimal, DateTime, TimeSpan and others.

Since it isn't very interesting to implement this manually, I suggest using the built-in code generator. Here is how we can do it:

1. Create a class SmartParser.cs and implement those methods for one type, e.g. for Int32.
2. Rename SmartParser.cs to SmartParser.tt.
3. Now it is possible to use ASP.NET-like tags to generate this code for all the types we need, and Visual Studio will automatically create the file SmartParser.cs.
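The original template didn't survive in this excerpt; a sketch of what SmartParser.tt could look like (the method shape here is a guess - only the SmartParser name and the list of types come from the post):

```
<#@ template language="C#" #>
<#@ output extension=".cs" #>
public static class SmartParser
{
<# foreach (var type in new[] { "Int32", "Double", "Decimal", "DateTime", "TimeSpan" }) { #>
    // Hypothetical method shape: returns null when parsing fails.
    public static <#= type #>? TryParse<#= type #>(string value)
    {
        <#= type #> result;
        return <#= type #>.TryParse(value, out result) ? result : default(<#= type #>?);
    }
<# } #>
}
```

On save, Visual Studio transforms the template and regenerates SmartParser.cs with one method per listed type.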

Sunday, November 9, 2008

Each C# developer knows the "internal" access modifier: internal types or members are accessible only within the same assembly.

But don't be fully confident in it when dealing with the microsofties. Those artful guys invented, in .NET Framework 2.0, a special attribute - InternalsVisibleToAttribute - and now your internal members can become - guess what? - public! (Of course, they will be visible only to the specified assemblies.)

No one knows the real reason for such an "invention", but this innovation has been highly appreciated by test-driven development adopters, because this attribute allows your test libraries to access internal classes and methods for additional testing and coverage.
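For reference, the typical usage is a single assembly-level attribute (the assembly name below is hypothetical):

```csharp
// In the assembly that exposes its internals, e.g. in AssemblyInfo.cs:
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("MyProduct.Tests")]
```

Note that if the friend assembly is strongly named, the string must also include its full public key.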

I think that any other usage of it can be considered a lack of architectural design, but what about the microsofties? Do they use it for product-level assemblies or for testing purposes only? Let's see. Oren Eini did some interesting investigation of how the attribute is used inside the .NET Framework:

System.Data allows:

System.Data.Entity

SqlAccess

System.Data.DataSetExtensions

Microsoft.NETCF.Tools allows:

System.Web.Services

Microsoft.Office.Tools.Common.v9.0 allows:

Microsoft.Office.Tools.Word.v9.0

Microsoft.VisualStudio.Tools.Office.Designer.Office2007

Microsoft.VisualStudio.Tools.Office.Designer.Office2007Tests

Microsoft.VisualStudio.Tools.Office.Outlook.UnitTests

Microsoft.Build.Conversion allows:

Microsoft.Build.Conversion.Unittest

Microsoft.Build.Engine allows:

Microsoft.Build.Engine.Unittest

System.Core allows:

Dlinq.Unittests

And so on.

As for Xtensive products, unit testing is the only application of the "InternalsVisibleTo" attribute.

Friday, November 7, 2008

This time I'm publishing a link to an article - the article itself is really excellent. A "must know" for any .NET developer.

Here are some quotations from it to make you a bit more interested:
- Do allow your Dispose method to be called more than once. The method may choose to do nothing after the first call. It should not generate an exception.
- Consider setting disposed fields to null before actually executing Dispose when reference cycles in an object graph are possible.
- Avoid throwing an exception from within Dispose except under critical situations where the containing process has been corrupted.
- Do not assume your finalizer will always run.
- Do write finalizers that are tolerant of partially constructed instances.
- Do write finalizers that are threading-agnostic. Finalizers can execute in any order, on any thread, can occur on multiple objects concurrently, and even on the same object simultaneously.
- Do gracefully handle situations in which your finalizer is invoked more than once.
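A minimal sketch of the first two guidelines in code (my own illustration, not taken from the article):

```csharp
using System;

public class Resource : IDisposable
{
  private bool disposed;
  private IDisposable inner; // some owned disposable

  public void Dispose()
  {
    if (disposed)
      return;        // allow Dispose to be called more than once
    disposed = true;
    var tmp = inner;
    inner = null;    // clear the field to break possible reference cycles
    if (tmp != null)
      tmp.Dispose();
    GC.SuppressFinalize(this);
  }
}
```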

The test is actually quite simple: we read the specified field in a loop. As before, on a Core 2 Duo @ 2.66 GHz. The code can be found in the DataObjects.Net 4.0 test suite; see Xtensive.Core\Xtensive.Core.Tests\DotNetFramework\FieldTypeTest.cs.

Now the main question: why? It isn't at all obvious why [ThreadStatic] access is ~60 times slower than static field access.

JITted [ThreadStatic] access code actually always consists of two parts:
- A call to a system routine returning the address of the [ThreadStatic] field by its token
- A regular field access instruction.

Obviously, the first part (the call) "eats" almost the whole execution time: there is no more efficient way to get the address of such a field by its token than using a hash table. As I've mentioned before, reading from a system hash table takes ~10x. So that's nearly what we have in this case.

Why is it implemented this way in .NET? I can't imagine why they didn't use some faster approach. E.g. I suspect calculating the lowest stack boundary (as well as the upper one) from the current stack pointer value is quite a simple operation - something like a bitwise AND. Why can't we store the address of the first [ThreadStatic] location at a fixed address nearby, and use a constant offset for each [ThreadStatic] field relative to the address of the first one? In this case it would take ~1x to access it...

Ok, this is what could be, but in reality we have 14x.

Finally, there are thread data slots as well. But they're 10 times slower than [ThreadStatic] fields, so it's almost always better to simulate their behavior with e.g. a Dictionary stored in a [ThreadStatic] field.
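A sketch of that simulation (a [ThreadStatic] dictionary playing the role of named data slots, as suggested above; all names are mine):

```csharp
using System.Collections.Generic;

public static class FastThreadSlots
{
  // One dictionary per thread; no locking is needed,
  // since each thread sees only its own instance.
  [ThreadStatic]
  private static Dictionary<string, object> slots;

  public static object Get(string name)
  {
    var d = slots;
    if (d == null)
      return null;
    object value;
    d.TryGetValue(name, out value); // value stays null when absent
    return value;
  }

  public static void Set(string name, object value)
  {
    var d = slots ?? (slots = new Dictionary<string, object>());
    d[name] = value;
  }
}
```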

Wednesday, October 29, 2008

Here I'll talk about the cost (or performance) of various ways of method invocation in .NET.

First of all, let's assume:
- Virtual method call time is 1x. This is about ~600M calls/s on a Core 2 Duo @ 2.66 GHz. To be precise, we're talking about an instance method getting no arguments and returning a single Int32 value (i.e. it is an average property getter).

So:
- Delegate method call time is 1.5x (the same method invoked by a delegate pointing to it)
- Interface method call time is 2x (the same method invoked on a reference of interface type)

A bit surprising, yes? The explanation is here. Briefly, interface method dispatch is more complex than delegate dispatch, since we must first locate the appropriate interface method table for the instance we have. If the count of implemented interfaces on a particular type is quite large (which is actually almost impossible), the only good way to do this is to use a hash table. But if it is small (the most frequent case), just a search in a small [possibly ordered] list is needed. Either way, this is more complex than with a delegate, since a delegate already contains the exact method address (this isn't correct for open delegates).

Now one more fact to think about: all the tests I'm talking about are loop-based, performed in the Release configuration, and their code is nearly the following:
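The loop itself didn't survive in this excerpt; it was presumably something like this sketch (hypothetical names), with the same method invoked through different "views":

```csharp
using System;

public interface ITestable { int GetValue(); }

public class Testee : ITestable
{
  private int value = 1;
  public virtual int GetValue() { return value; }
}

public static class CallBenchmark
{
  public static int Run(int count)
  {
    var instance = new Testee();
    ITestable viaInterface = instance;
    Func<int> viaDelegate = instance.GetValue;

    int sum = 0;
    for (int i = 0; i < count; i++)
      sum += viaInterface.GetValue(); // or instance.GetValue(), or viaDelegate()
    return sum; // consuming the result keeps the JIT from eliminating the loop
  }
}
```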

.NET is able to cache resolved interface method tables (and possibly even method addresses), so calling an interface method in a loop must be a bit faster than calling it once. So in general the cost of an interface method call in comparison to a delegate call is even bigger.

This explains why we have such types as Hasher(of T) (Comparer, Arithmetic, etc.). They cache, in their fields, delegates that perform a set of operations on type T faster than a similar interface would. See e.g. the Hasher(of T).GetHash field. Certainly, such an approach is used only when performance is essential - i.e. when it's well known these operations will be invoked many times, e.g. on any index seek.

Let's look at a few more metrics:
- Creating a delegate pointing to a non-virtual method: 7.5x
- Creating a delegate pointing to a virtual method: 50x
- Creating a delegate pointing to an interface method: 150x

So delegate construction isn't cheap at all.

Now we're turning to virtual generic methods:
- Virtual generic method call time is 10x - independently of whether it is called on an interface or not. There is almost no dependency on the count of generic arguments either - adding one more argument makes the call longer by ~0.5%.
- Creating a delegate pointing to a generic virtual method: 1000x. Not sure why - it looks like some bug in .NET. Since delegates in .NET may point to fully parameterized methods only (they store the method address), the time of calling such a delegate is 1.5x, as before.

Why is it so costly to call a virtual generic method? Because there can be generally any number of its implementations, depending on the argument substitution, so the .NET Framework resolves its address using an internal hash table that must be bound to the corresponding virtual (or interface) method table.

So we may also take it that:
- Internal hash table seek time is ~10x. I'll use this figure in my future posts to show how to estimate the cost of generally any basic operation in .NET.

And a few conclusions related to generic virtual methods:
- The smallest heap allocation time is 3.5x; Int32 boxing time is 4x (with its heap allocation). So it's almost always cheaper to have a non-generic virtual method with an object-typed argument than a generic virtual method.
- If a generic virtual method seems preferable anyway, you might think about implementing its "call caching" with delegates. E.g. we use the DelegateHelper.CreateDelegates and DelegateHelper.ExecuteDelegates methods to perform operations on Tuples of the same structure faster.

So what is an open delegate? It is a delegate bound only to a method, not to an instance, so you can use the same delegate to call the method it is bound to on many instances. That's it.
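A short illustration (my own example; the Person type is hypothetical) using the standard Delegate.CreateDelegate overload that produces open instance delegates:

```csharp
using System;

public class Person
{
  public string Name { get; set; }
  public string GetName() { return Name; }
}

public static class OpenDelegateDemo
{
  public static void Run()
  {
    // Open instance delegate: "this" becomes an explicit
    // first parameter of the delegate type.
    var getName = (Func<Person, string>) Delegate.CreateDelegate(
      typeof(Func<Person, string>),
      typeof(Person).GetMethod("GetName"));

    var a = new Person { Name = "Alice" };
    var b = new Person { Name = "Bob" };
    // The same delegate instance works for any Person.
    string first = getName(a);
    string second = getName(b);
  }
}
```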

And a few more notes:
- Open delegate invocation must take at least the same time as the underlying virtual or interface method invocation, since .NET can't use the same way of invocation as for a regular delegate (there is no single method call address to use). We haven't tested this yet, but if we do, I'll publish the results here.
- An open delegate bound to a virtual generic method must be the slowest one - for the same reason. Do you know why virtual generic methods are the slowest ones? My next post here will explain this.
- Btw, the various ways of invocation in .NET are perfectly described here. The article covers e.g. regular, virtual, and interface methods, and delegates.