.NET Powered

TLDR: The Reactive Extensions provide a TestScheduler class that abstracts time and allows for a precise control of a virtual time base. Using the Rx Schedulers mecanism instead of real timers enables the creation of fast running unit tests that validate time related algorithms without the inconvients of the actual time.

Granted, the title is a bit provocative, but nonetheless it's still a bad idea to use timer classes like System.Threading.Timer.

Why is that ? Because classes like this one are based on the actual time, and that makes it a problem because it is non-deterministic. This means that each time you want to test a piece of code that depends on time, you'll be having a somehow different result, and particularly if you're using very long delays, like a few hours, you probably will not want to wait that long to make sure your code works as expected.

What you want actually, to avoid the side effect of "real" time passing by, is virtual time.

Abstracting Time with the Reactive Extensions

The reactive extensions are pretty good at abstracting time, with the IScheduler interface and TestScheduler class.

Most operators in the Rx framework have an optional scheduler they can use, like the Dispatcher or the ThreadPool schedulers. These schedulers are used to change the context of execution of the OnNext, OnCompleted and OnError events.

But for the case of time, the point is to freeze the time and make it "advance" to the point in time we need, and most importantly when we need it.

Let's say that we have a view model with a command that performs a lengthy action on the network, and that we need that action to timeout after a few seconds.

Consider this service contract :

public interface IRemoteService
{
///
/// Gets the data from the server,
/// returns an observable call to the server
/// that will provide either no values, one
/// value or an error.
///
IObservable<string> GetData(string url);
}

This implementation is exposing an observable based API, where the consumer of this contract must take into account the fact that getting data must be performed asynchronously, because it may not provide any value for a long time or even not return anything at all.

The problem with this test is that it depends on actual time, meaning that it takes at least 16 seconds to complete. This is not acceptable in a automated tests scenario, where you want your tests to run as fast as possible.

Adding the IScheduler in the loop

We can introduce the use of an injected IScheduler instance into the view model, like this :

.Timeout(TimeSpan.FromSeconds(15), _scheduler)

Meaning that the both Start and Timeout will get executed on the scheduler we provide, for which the time is controlled.

TLDR: This post talks about how the Reactive Extensions ObserveOn operator changes the execution context (the thread) of the IObservable OnNext/OnComplete/OnError methods, whereas the SubscribeOn operator changes the execution context of the implementation of the Subscribe method in the chain of observers. Both methods can be useful to improve the performance of an application's UI by putting work on background threads.

When developing asynchronous code, or consuming asynchronous APIs, you find yourself forced to use specific methods on a specific thread.

The most prominent examples being WPF/Silverlight and Winforms, where you cannot use UI bound types outside of the UI Thread. In the context of WPF, you'll find yourself forced to use the Dispatcher.Invoke method to manipulate the UI in the proper context.

However, you don't want to execute everything on the UI Thread, because the UI performance relies on it. Doing too much on the UI thread can lead to a very bad percieved performance, and angry users...

This scheduler's purpose is to intercept calls to the ISchedule methods (You'll fill the missing Schedule methods by yourself) and flag them with a custom thread ID. That way, we'll know which scheduler is executing our code.

Note that this code will not work properly on Windows Phone 7, since ThreadStaticAttribute is not supported. And it's still not supported on 7.1... Seems like not enough people are using ThreadStatic to make its way to the WP7 CLR...

Each time a scheduler was specified, the following operators OnNext delegates were executed on that scheduler.

In this case, we're using the Do operator which does not take a scheduler as a parameter. There some operators though, like Delay, that implicitly use a scheduler that changes the context.

Using this operator is particularly useful when the OnNext delegate is performing a context sensitive operation, like manipulating the UI, or when the source scheduler is the UI and the OnNext delegate is not related to the UI and can be executed on an other thread.

You'll find that operator handy with the WebClient or GeoCoordinateWatcher classes, which both execute their handlers on the UI thread. Watchout for Windows Phone 7.1 (mango) though, this may have changed a bit.

An Rx Expression's life cycle

Using an Rx expression is performed in a least 5 stages :

The construction of the expression,

The subscription to the expression,

The optional execution of the OnNext delegates passed as parameters (whether it be observers or explicit OnNext delegates),

The observer chain gets disposed either explicitly or implicitly,

The observers can optionally get collected by the GC.

The third part's execution context is covered by ObserveOn. But for the first two, this is different.

The expression is constructed like this :

var o = Observable.Timer(TimeSpan.FromSeconds(1));

Almost nothing's been executed here, just the creation of the observers for the entire expression, in a similar way IEnumerable expressions work. Until you call the IEnumerator.MoveNext, nothing is performed. In Rx expressions, until the Subscribe method is called, nothing is happening.

Then you can subscribe to the expression :

var d = o.Subscribe(_ => Console.WriteLine(_));

At this point, the whole chain of operators get their Subscribe method called, meaning they can start sending OnNext/OnError/OnComplete messages.

This expression will merge the two observables into one that will provide two values, one from Return and one from the timer.

And this is the output :

Do(1):
Subscribed !
Do(1): 42

The Observable.Return OnNext was executed during the call to Subscribe, and has that thread has no SchedulerId, meaning that a whole lot of code has been executed in the context of the caller of Subscribe. You can imagine that if that expression is complex, and that the caller is the UI Thread, that can become a performance issue.

Building Windows Phone 7 applications in an agile way encourages the use of Continuous Integration, and that can be done using Team System 2010.

There are a few pitfalls to avoid to get there, but this can be acheived quite easily with great results.

I won't cover the goodness of automated builds, this has already been covered a lot.

Adding Unit Tests

Along with the continous integration to create hopefully successful builds out of every check-in of source code, you'll also find the automated execution of unit tests. Team System has the ability to provide nice code coverage and unit tests success rates in Reporting Services reports, where statistics can be viewed, which gives good health indicators of the project.

Unfortunately for us, at this point there are no ways to automatically execute tests using the WP7 .NET runtime. But if you successfuly use an MVVM approach, your view models and non UI code can be compiled for multiple platforms, because they do not rely on the UI components that may be specific to the WP7 platform. That way, we are still able to test our code using the .NET 4.0 runtime with MSTest and Visual Studio 2010 test projects.

To avoid repeating code, multi-targeted files can be integrated into single target projects either by :

Creating project files in the same folder and use the "include file" command when showing all files in the solution explorer. Make sure to change the output assembly name to something like "MyAssembly.Phone.dll" to avoid conflicts.

Multi-targeted files are using the #if directive and the WINDOWS_PHONE define, or the lack thereof, to compile code for the current runtime target.

Testing with MSTest ensures that your code runs successfully on .NET 4.0 runtime, but this does not test on the WP7 runtime. So to be sure that your code is successfully running on it, and since Windows Phone 7 is build on Silverlight 3, tools like the SL3 Unit Test framework can be used to manually run tests in the emulator. This cannot be integrated into the build for now, unfortunately; you'll have to place that in your QA tests.

TeamBuild with the WP7 toolkit

To be able to buid WP7 applications on your build machine, you need to install the SDK on your build machine, and a Team Build agent and/or controller.

Creating a build definition is done the same way as for any other build definition, except for one detail. You need to set the MSBuild platform to x86 instead of Auto if your build machine is running on a 64 Bits Windows. This forces the MSBuild runtime to use the 32 bits runtime, and not 64 bits, where the SDK does not work properly.

If you don't, when building your WP7 application, you'll find that intriguing message :

Could not load file or assembly 'System.Windows, Version=2.0.5.0'

Which is particularly odd considering that you've already installed the SDK, and that dll is definitely available.

You may also find that if you install that DLL from the SDK in the GAC, you'll get that other nice message :

Common Language Runtime detected an invalid program.

Which is most commonly found when mixing 32 bits and 64 bits assemblies, for which the architecture has been explicitly specified instead of "Any CPU". So don't install that DLL in the GAC and set the MSBuild architecture to x86.

TL;DR: While trying to fix the "Black Screen" issue of the Windows Phone 7 flickr app 1.3, I found out that HttpWebRequest is internally making a synchronous call to the UI Thread, making a network call negatively impact the UI. The entire building of an asynchronous web query is performed on the UI thread, and you can't do anything about it.

Edit: This post was formerly named "About the UI Thread performance and HttpWebRequest", but was in fact about Yahoo's Flickr application and was enhanced accordingly.

When programming on Windows Phone 7, you'll hear often that to improve the perceived performance, you'll need to get off of the UI Thread (i.e. the dispatcher) to perform non UI related operations. By good perceived performance, I mean having the UI respond immediately, not stall when some background processing is done.

To acheive this, you'll need to use the common asynchrony techniques like queueing in the ThreadPool, create a new thread, or use the Begin/End pattern.

All of this is very true, and one very good example of bad UI Thread use is the processing of the body of a web request, particularly when using the WebClient where the raised events are in the context of the dispatcher. From a beginner's perspective, not having to care about changing contexts when developing a simple app that updates the UI, provides a particularly good and simple experience.

But that has the annoying effect of degrading the perceived performance of the application, because many parts of the application tend to run on the UI thread.

HttpWebRequest to the rescue ?

You'll find that the HttpWebRequest is a better choice in that regard. It uses the Begin/End pattern and the execution of the AsyncCallback is performed in the context of ThreadPool. This performs the execution of the code in that callback in a way that does not impact the perceived performance of the application.

This code is basically beginning a request on the thread pool, while blocking the UI thread in the App.xaml.cs file. This makes the construction (but not the actual call on the network) of the WebRequest synchronous, and makes the application wait for the request to begin before showing any page to the user.

While this code is definitely not a best practice, there was a code path in the Flickr 1.3 application that was doing something remotely similar, in a more convoluted way. And if you try it for yourself, you'll find that the application hangs in a deadlock during the startup of the application, meaning that our event is never set.

What's happening ?

If you dig a bit, you'll find that the stack trace for a thread in the thread pool is the following :

The BeginGetResponse method is trying to execute something on the UI thread. And in our example, since the UI thread is blocked by the manual reset event, the application hangs in a deadlock between a resource in the dispatcher and our manual reset event.

This is also the case for the EndGetResponse method.

But if you dig even deeper, you'll find in the version of the System.Windows.dll assembly in the WP7 emulator (the one in the SDK is a stub for all public types), that the BeginGetResponse method is doing all the work of actually building the web query on the UI thread !

That is particularly disturbing. I'm still wondering why that network-only code would need to be executed to UI Thread.

What's the impact then ?

The impact is fairly simple : The more web requests you make, the less your UI will be responsive, both for processing the beginning and the end of a web request. Each call to the methods BeginGetResponse and EndGetResponse implicitly goes to the UI thread.

In the case of Remote Control applications like mine that are trying to have remote mouse control, all are affected by the same lagging behavior of the mouse. That's partially because the UI thread is particularly busy processing Manipulation events, this explains a lot about the performance issues of the web requests performed at the same time, even by using HttpWebRequest instead of WebClient. This also explains why until the user stops touching the screen, the web requests will be strongly slowed down.

The Flickr 1.3 "Black Screen" issue

In the Flickr application for which I've been able to work on, a lot of people were reporting a "black screen" issue, where the application stopped working after a few days.

The application was actually trying to update a resource from the application startup in an asynchronous fashion using the HttpWebRequest. Because of a race condition with an other lock in the application and UI Thread that was waiting in the app's initialization, this resulted in an infinite "Black Screen" that could only be bypassed by reinstalling the application.

Interestingly enough, at this point in the application's initialization, in the App's class constructor, the application is not killed after 10 seconds if it is not showing a page to the user. However, if the application stalls in the constructor of the first page, the application is automatically killed by the OS after something like 10 seconds.

Regarding the use of the UI Thread inside the HttpWebRequest code, applications that are network intensive to get a lot of small web resources like images, this is has a negative impact on the performance. The UI thread is constantly interrupted to process network resources query and responses.

Can I do something about it ?

During the analysis of the emulator version of the System.Windows.dll assembly, I noticed that the BeginGetResponse is checking whether the current context is the UI Thread, and does not push the execution on the dispacther.

This means that if you can group the calls to BeginGetResponse calls in the UI thread, you'll spend less time switching between contexts. That's not the panacea, but at the very least you can gain on this side.

What about future versions of Windows Phone ?

On the good news side, Scott Gu annouced at the Mix 11 that the manipulation events will be moved out the the UI thread, making the UI "buttery smooth" to take his words. This will a lot of applications benefit from this change.

Anyway, let's wait for Mango, I'm guessing that will this will change is a very positive way, and allow us to have high performance apps on the Windows Phone platform.

To improve the quality of the software, and be notified when an unhandled exception occurs somewhere in my code, or in someone else's code executed on my behalf, I've added a small opt-in unhandled exception reporting feature. This basically sends me back information about the device, most of what's available in DeviceExtendedProperties for device aggregation of exceptions, plus some informations like the culture and, of course, the exception stacktrace and details.

The MarketplaceDetailTask exception

A few recurring exceptions have popped up a lot recently, and one coming often is the following :

Exception : System.InvalidOperationException: Navigation is not allowed when the task is not in the foreground. Error: -2147220989 at Microsoft.Phone.Shell.Interop.ShellPageManagerNativeMethods.CheckHResult(Int32 hr) at Microsoft.Phone.Shell.Interop.ShellPageManager.NavigateToExternalPage(String pageUri, Byte[] args) at Microsoft.Phone.Tasks.ChooserHelper.Navigate(Uri appUri, ParameterPropertyBag ppb) at Microsoft.Phone.Tasks.MarketplaceLauncher.Show(MarketplaceContent content, MarketplaceOperation operation, String context) at Microsoft.Phone.Tasks.MarketplaceDetailTask.Show()

This code is called when a user clicks on the purchase image located on some page of the software, and it looks like this :

The call is performed directly on the image's ManipulationCompleted event.

I've been trying to reproduce it for a few times and I finally got it: The user is tapping more than once on the image.

I can see a few reasons why:

The ManipulationCompleted event is fairly sensitive and is raised multiple times when the user did not tap twice

The user did actually tap twice because the action did not answer fast enough.

The user tapped twice because he is used to always tap twice, as some PC users do... (You know, double clicking on hyperlinks in browsers, things like that)

What do we do about it ?

There may be actually more, be this actually tells me a lot.

First, I should be having some kind of visual feedback on the click of that image, to tell the user that he has done something (and also to actually follow the design guidelines)

Second, that even if the feedback is there, that there may always be two subsequent clicks, and one that may be executed after the first has called the MarketplaceDetailTask.Show(), and the application has been deactivated. I cannot do much about it, except handle the exception silently or track the actual application state and not call the method.

I'll go with the exception handler for now as it is not a very critical peace of code, but I'd rather have some way of that tell me that the application cannot do that reliably and not have to handle an exception. The API is rather limited on that side, where the PhoneApplicationService is only raising events and does not expose the current "activation" state.

Any other examples of exceptions ?

I'll talk more about some other findings this opt-in exception reporting feature has brought me, with some that seem to be pretty tricky.

One of the advantages of virtualization is the P2V (Physical to Virtual) process: Converting an "old" build machine to a VM so it can be moved around with the load as-is, snapshotted, backed-up and so on.

This is particularly useful when say, you have a build machine that's been there for a very long time, has a lot of dependencies over old third party software, has been customized by so many people (that have long left the company) that if you wanted to rebuild that machine from scratch, it would literally take you weeks of tweaking to get it to work properly. And that machine is running out of very old hardware that may break at any time. And that the edition of Windows that does not migrate easily to new hardware because of HAL or Mass Storage issues, requiring a reinstallation. That a lot of "ands".

That's the kind of choice you do not need to make: You just take the machine and virtualize it using SCVMM 2008 R2.

But still, even virtualized, the machine been there that long, and things have started falling apart, like having the services.exe process taking 100% of the CPU. And I did not want to have to rebuild that machine just because of that strange behavior.

If you read scott hanselman's blog, you've been recalled that Windows Server 2008 and later has the resource monitor that gives a wealth of information about the services running under services.exe. But if you're out of luck, like running under Windows Server 2003, you can still use Process Explorer. This will give you the similar kind of insight in the Windows Services that are running.

For my particular issue, this was actually the Event Log service that was taking all the resources.

How about I get my CPU back ?

After some digging around, I noticed that :

All Event Viewer logging sections in the MMC snap-in were all displaying the same thing, which was actually a mix of all the System, Application and Security logs.

Displaying any of these logs was taking a huge amount of time to display.

The HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\System\Source was containing something like "System System System System System System System" a hundred of times, the same thing for the keys for the other event logs

A whole bunches (thousands) of interesting sources named like some .NET application domain created by the application being built on this machine

To fix it, a few steps in that order :

Disable the Event Log service and reboot. You won't be able to stop it, but at the next reboot it will not start.

In the C:\WINDOWS\system32\config folder, move the files *.evt to a temporary folder, so they don't get picked up by the service when it'll restart

In the registry, for each HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\[System|Security|Application]\Source, replace the content with the one found on the same key on another very similar Windows Server 2003 machine. You can install a brand new machine and pick up the content.

If you have, like I did, a whole bunch of sources that look familiar and should not be there under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\[System|Security|Application], remove their keys if you removed them from the "Source" value.

Set the Event Log service to "Automatic" and reboot.

The interesting part about the virtualization of that build machine is like in many other occasions, the snapshots, where you can make destructive changes and go back if they were actually too destructive.

What's with the event log "interesting sources" ?

The application being built and tested is running tests on the build machine, and it makes use of application domains and log4net. Log4net has an EventLogAppender that allows the push of specific content to the Windows Event Log. Log4net defaults the name of the source to the application domain name, if there is no entry assembly.

Those tests were actually using a default configuration, and were logging Critical messages to the event log, but the domains were created using a new GUID to avoid supposed name collisions. This is something that did actually more harm than good in the long run, because each new appdomain that was logging to the event log was creating a new event source.

And the build system has been there for a long time. Hence the thousands of "oddly named" event sources.

During the development of a project, you may need at some point to automate the testing of your whole application. Virtual Machines make that very easy, especially with Visual Studio Lab Management in the loop.

You can have a step in your nightly build to have your application installed silently, and then have some acceptance tests running on it.

To make all this repeatable and mostly predictable, you can use snapshots of a clean environment and restore that environment before starting your test scenarios. That way, any previous test run does not affect the new run, as long as the tests can run confined in a single machine. For multiple machines tests, like a web front end and DB backend, you may need to synchronize all of the snapshots, but this is definitely doeable.

The case of the VM part of a Domain

Your tests may also need to have a domain account to run. To do this, you join your VM to the domain, then make a snapshot.

Your automated tests run fine for about a month, then start to fail for reasons like :

The trust relationship between this workstation and the primary domain failed.

Or something a bit different like :

This computer could not authenticate with \\MYDC, a Windows domain controller for domain MYDOMAIN, and therefore this computer might deny logon requests.

Which means that the machine account registered with the domain is out of sync.

For a bit of background, the machine account is used by windows subsystem, but also the "NT AUTHORITY\System" account to communicate with the domain to apply GPOs, for instance. This account is also used for many other things, like in SCVMM, to ensure that the server has access to a host without requiring a "real" account that has administrative access to that host.

Password Renewal

That machine trust account has a password like a normal user account, even though it is not accessible to the users. That password is changed periodically, every 30 days or so, and that change is initiated by a request of the machine tied to this account. It is designed in such a way that it allows machines to be offline for long periods and not get out of sync because the DC would have changed the password unilaterally.

At this point, you're probably seeing why that account information can be out of sync when using VMs and reverting to snapshots.

When joining the domain, the account is in sync, and it is possible to revert to that snapshot until that 30 days window is reached. Then, the live snapshot of the machine asks for password renewal of the trust account, and then you revert to your original snapshot. This is where the problem occurs, the machine cannot authenticate on the domain anymore because it is using the previous password.

Keeping the password in sync

There are a few techniques to keep that password in sync, the first being a leave and join domain sequence. This is fairly easy to do, but there are some caveats.

When you leave and re-join, you get a new SID for that machine, which means that if you gave access rights to the previous machine account on an other machine, you're forced to update your ACLs on that other machine.

But there is a lesser known technique based on the netdom command, to reset that trust account password :

Even if it is possible to be notified when an exception has been raised and not handled properly, using the AppDomain.UnhandledException event, most the time this leads to the application being terminated. This termination behavior has been introduced in .NET 2.0, to prevent unhandled exception to be silently ignored.

While this is a very appropriate default behavior, in an enterprise environment, I’m usually enforcing custom static analysis or NDepend rules to prevent the use of these classes directly. This forces new code to use wrappers that provide a very wide exception handler and logs and reports the exception, but does not terminate the process. That also implies that there is still a very valid bug to be investigated, because exceptions should not be handled that late.

The case of the Reactive Framework

Reactive operators like Timer, BufferWithTime, ObserveOn or SubscribeOn allow for specific Schedulers like ThreadPool, TaskPool or NewThread to be used, and if a subscriber does not handle exceptions properly, it ends up with a terminated application.

The Observable.Timer operator uses the System.Threading.Timer class and that makes it vulnerable to the same termination problems. Every subscriber needs to handle exceptions thrown in the OnNext delegate, or the application will terminate.

Also, do not think that the OnError delegate passed to Observable.Subscribe will handle exceptions thrown during the execution of OnNext code. OnError only notifies of errors generated by previous Reactive operators, not the current.

The IScheduler.AsSafe() extension method

Unfortunately, it is not possible for now to override the default schedulers used internally by the Reactive operators. The only way to handle all unhandled exceptions properly is to use the ObserveOn operator and intercept calls to IScheduler.Schedule methods. Calls can then be decorated with appropriate exception handlers to log and report the exception without terminating the process.

So, to be able to generalize this logging and reporting behavior, I created the AsSafe() extension that I place at the very top of a Reactive expression :

Depending on which corporation you work for, you may have to connect to your exchange server using a self-signed server certificate to be used with HTTPS protocol (using either TLS or SSL).

If you're unlucky enough to be in this situation, but are using a modern browser, you can install the certificate in either your windows certificate store, or using your browser's store. You can do that using this lengthy technique for IE8.

But if you're on a Windows Phone 7, if you try to connect to your exchange account, you'll get a nice message telling you that there is a problem with the server certificate. Well, neither Internet Explorer or the bundled Exchange tools give you the ability to install that custom certificate. And there is no access to the file system either.

Luckily, you can email your certificate on your GMail account for instance, and the WP7 mail client has the ability to install certificates !

So, use the lengty technique to export your certificate in the ".cer" format by connecting to your exchange server using its HTTPS address in Internet Explorer on your PC, email it to yourself, and tap on it on your Windows Phone 7 to install it.

Now you can enjoy having your work emails and calendars on your weekends, in case you don't have anything else better to do :)

EDIT: If it still does not work, you may need to also import the full chain of certificates, up to the root. To do so, in the certificate from your exchange server, open the "Certification Path" tab, then for each item in the tree, click "View Certificate", then "Details", then "Copy to File...". Email each certificate to your Windows Phone and you're done !

Cloud

About me

My name is Jerome Laban, I am a Software Architect, C# MVP and .NET enthustiast from Montréal, QC. You will find my blog on this site, where I'm adding my thoughts on current events, or the things I'm working on, such as the Remote Control for Windows Phone.