“GitHub for Windows uses the Reactive Extensions for almost everything it does, including network requests, UI events, managing child processes (git.exe). Using Rx and ReactiveUI, we've written a fast, nearly 100% asynchronous, responsive application, while still having 100% deterministic, reliable unit tests. The desktop developers at GitHub loved Rx so much, that the Mac team created their own version of Rx and ReactiveUI, called ReactiveCocoa, and are now using it on the Mac to obtain similar benefits.” – Paul Betts, GitHub

There’s a LOT included, so be stoked. It’s not just Rx.NET, but also the C++ library as well as RxJS for JavaScript! Now everyone gets to play with IObservable<T> and IObserver<T>.

Reactive Extensions:

Rx.NET: The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators.

RxJS: The Reactive Extensions for JavaScript (RxJS) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators in JavaScript which can target both the browser and Node.js.

Rx++: The Reactive Extensions for Native (RxC) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators in both C and C++.

Interactive Extensions

Ix: The Interactive Extensions (Ix) is a .NET library which extends LINQ to Objects to provide many of the operators available in Rx but targeted for IEnumerable<T>.

IxJS: An implementation of LINQ to Objects and the Interactive Extensions (Ix) in JavaScript.

Why do I think Rx matters? It’s a way to do asynchronous operations on event streams. Rather than hooking up click events and managing state with event handlers all over, you effectively “query” an infinite stream of events with LINQ. You can declaratively sequence events…no flags, no state machine.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

This is a pretty straightforward method that calls a .NET BCL (Base Class Library) method and filters the result with LINQ. Of course, when any function calls another one that you can't see inside (which is basically always) you've lost control. We have no idea what's going on in GetProcessesByName.

Let's look at the source to the .NET Framework method in Reflector. Our method calls Process.GetProcessesByName(string).

if we look inside GetProcesses(string), it's another loop. This is getting close to where .NET calls Win32 and as these classes are internal there's not much I can do to fix this function other than totally rewrite the internal implementation. However, I think I've illustrated that we've got at least two loops here, and more likely three or four.

This code is really typical of .NET circa 2002-2003 (not to mention Java, C++ and Pascal). Functions return arrays of stuff and other functions higher up filter and sort.

When using this .NET API and for looping over the results several times, I'm going for(), for(), for() in a chain, like O(4n) here.

Note: To be clear, it can be argued that O(4n) is just O(n), cause it is. Adding a number like I am isn't part of the O notation. I'm just saying we want to avoid O(cn) situations where c is a large enough number to affect perf.

Sometimes you'll see nested for()s like this, so O(n^3) here where things get messy fast.

LINQ is more significant than people really realize, I think. When it first came out some folks said "is that all?" I think that's unfortunate. LINQ and the concept of "deferred execution" is just so powerful but I think a number of .NET programmers just haven't taken the time to get their heads around the concept.

Here's a simple example juxtaposing spinning through a list vs. using yield. The array version is doing all the work up front, while the yield version can calculate. Imagine a GetFibonacci() method. A yield version could calculate values "just in time" and yield them, while an array version would have to pre-calculate and pre-allocate.

First, the underlying implementation where GetProcessesInfos eventually gets called is a bummer but it's that way because of how P/Invoke works and how the underlying Win32 API returns the data we need. It would certainly be nice if the underlying API was more granular. But that's less interesting to me than the larger meta-issue of a having (or in this case, not having) a LINQ-friendly API.

The second and more interesting issue (in my option) is the idea that the 2002-era .NET Base Class Library isn't really setup for LINQ-friendliness. None of the APIs return LINQ-friendly stuff or IEnumerable<anything> so that when you change together filters and filters of filters of arrays you end up with O(cn) issues as opposed to nice deferred LINQ chains.

When you find yourself returning arrays of arrays of arrays of other stuff while looping and filtering and sorting, you'll want to be aware of what's going on and consider that you might be looping inefficiently and it might be time for LINQ and deferred execution.

Here's a simple conversion attempt to change the first implementation from this classic "Array/List" style:

To this more LINQy way. Note that returning from a LINQ query defers execution as LINQ is chainable. We want to assemble a chain of sorting and filtering operations and execute them ONCE rather than for()ing over many lists many times.

return from p in processes where String.Equals(p.ProcessName, processName, StringComparison.OrdinalIgnoreCase) select p; //the value of the LINQ expression being returned is an IEnumerable<Process> object that uses "yield return" under the hood

This is a really interesting topic to me and I'm interested in your opinion as well, Dear Reader. As parts of the .NET Framework are being extended to include support for asynchronous operations, I'm wondering if there are other places in the BCL that should be updated to be more LINQ friendly. Or, perhaps it's not an issue at all.

Your thoughts?

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Do you like a big pile of source code? Well, there is an imperial buttload of source in the Visual Studio 2010 and .NET Framework 4 Training Kit. It's actually a 178 meg download, which is insane. Perhaps start your download now and get it in the morning when you get up. It's extremely well put together and I say Kudos to the folks that did it. They are better people than I.

I like to explore it while watching TV myself and found myself looking through tonight. I checked my blog and while I thought I'd shared this with you before, Dear Reader, I hadn't. My bad, because it's pure gold. With C# and VB, natch.

Here's an outline of what's inside. I've heard of folks setting up lunch-time study groups and going through each section.

There's Labs, Presentations, Demos, Labs and links to online Videos. It'll walk you step by step through loads of content and is a great starter if you're getting into what's new in .NET 4.

Here's a few of my favorite bits, and they aren't the parts you hear the marketing folks gabbing about.

Code Contracts

Remember the old coding adage to "Assert Your Expectations?" Well, sometimes Debug.Assert is either inappropriate or cumbersome and what you really need is a method contract. Methods have names and parameters, and those are contracts. Now they can have conditions like "don't even bother calling this method unless userId is greater than or equal to 0 and make sure the result isn't null!

Code Contracts continues to be revised, with a new version out just last month for both 2008 and 2010. The core types that you need are included in mscorlib with .NET 4.0, but you do need to download the tools to see them inside Visual Studio. If you have VS Pro, you'll get runtime checking and VS Ultimate gets that plus static checking. If I have static checking and the tools I'll see a nice new tab in Project Properties:

Here's a basic idea of what it looks like. If you have static analysis, you'll get squiggles on the lines I've highlighted as they are points where the Contract isn't being fulfilled. Otherwise you'll get a runtime ContractException. Code Contracts are a great tool when used in conjunction with Test Driven Development.

COM Interop sucks WAY less in .NET 4

I did a lot of COM Interop back in the day and it sucked. It wasn't fun and you always felt when you were leaving managed code and entering COM. You'd have to use Primary Interop Assemblies or PIAs and they were, well, PIAs. I talked about this a little bit last year in Beta 1, but it changed and got simpler in .NET 4 release.

Here's a nice little sample I use from the kit that gets the Processes on your system and then makes a list with LINQ of the big ones, makes a chart in Excel, then pastes the chart into Word.

If you've used Office Automation from managed code before, notice that you can say Range[] now, and not get_range(). You can call COM methods like ChartWizard with named parameters, and without including Type.Missing fifteen times. As an aside, notice also the default parameter value on the method.

You can also embed your PIAs in your assemblies rather than carrying them around and the runtime will use Type Equivalence to figure out that your embedded types are the same types it needs and it'll just work. One less thing to deploy.

Parallel Extensions

The #1 reason, IMHO, to look at .NET 4 is the parallelism. I say this not as a Microsoft Shill, but rather as a dude who owns a 6-core (12 with hyper-threading) processor. My most favorite app in the Training Kit is ContosoAutomotive. It's a little WPF app that loads a few hundred thousand cars into a grid. There's an interface, ICarQuery, that a bunch of plugins implement, and the app foreach's over the CarQueries.

This snippet here uses the new System.Threading.Task stuff and makes a background task. That's all one line there, from StartNew() all the way to the bottom. It says, "do this chunk in the background." and it's a wonderfully natural and fluent interface. It also keeps your UI thread painting so your app doesn't freeze up with that "curtain of not responding" that one sees all the time.

This code says "go do this in a background thread, and while you're there, parallelize this as you like." This loop is "embarrassingly parallel." It's a big for loop over 2 million cars in memory. No reason it can't be broken apart and made faster.

Here's the deal, though. It was SO fast, that Task Manager didn't update fast enough to show the work. The work was too easy. You can see it used more CPU and that there was a spike of load across 10 of the 12, but the work wasn't enough to peg the processors.

Did it even make a difference? Seems it was 5x faster and went from 2.389s to 0.4699 seconds. That's embarrassingly parallel. The team likes to call that "delightfully parallel" but I prefer "you're-an-idiot-for-not-doing-this-in-parallel parallel," but that was rejected.

Let's try something harder. How about a large analysis of Baby Names. How many Roberts born in the state of Washington over a 40 year period from a 500MB database?

Here's the normal single-threaded foreach version in Task Manager:

Here's the parallel version using 96% CPU.

And here's the timing. Looks like the difference between 20 seconds and under 4 seconds.

You can try this yourself. Notice the processor slider bar there at the bottom.

The .NET 4 Training Kit has Extensibility demos, and Office Demos and SharePoint Demos and Data Access Demos and on and on. It's great fun and it's a classroom in a box. I encourage you to go download it and use it as a teaching tool at your company or school. You could do brown bags, study groups, presentations (there's lots of PPTs), labs and more.

Hope you enjoy it as much as I do.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Remember good developers don't just write source code, they also READ it. You don't just become a great poet by writing lots of poems. Read and absorb as well. Do check out the Source Code category of my blog here, there is (as of today) 15 pages of posts on Source Code you can check out.

Recently my friend Jonathan Carter (OData Dude, my name for him) was working with a partner on some really weird stuff that was happening with a LINQ to SQL query. Remember that every abstraction sometimes leaks and that the whole port of an abstraction is "raise the level" so you don't have to worry about something.

Plumbing is great because it abstracts away water delivery. For all I know, there's a dude with a bucket who runs to my house when I turn on the tap. Doesn't matter to me, as long as I get water. However, sometimes something goes wrong with that dude, and I don't understand what's up with my water. This happened to JC and this partner.

CustomerQuery_Test_01 calls the GetByPrimaryKey method. That method takes a Func as a parameter. He's actually passing in a lamdba expression into the GetByPrimaryKey function. That makes the method reusable and is the beginning of some nice helper functions for his DAL (Data Access Layer). He's split up the query into two places. Seems reasonable, right?

Well, if you run this in Visual Studio - and in this example, I'll use the Intellitrace feature to see the actual SQL that was executed, although you can also use SQL Profiler - we see:

OK, there it is, but why does the second LINQ query cause a WHERE clause to be emitted but the first doesn't? They look like basically the same code path, just one is broken up.

The first query is clearly returning ALL rows to the caller, which then has to apply the LINQ operators to do the WHERE in memory, on the caller. The second query is using the SQL Server (as it should) to do the filter, then returns WAY less data.

Here's the deal. Remember that LINQ cares about two things, IEnumerable stuff and IQueryable. The first lets you foreach over a collection, and the other includes all sorts of fun stuff that lets you query that stuff. Folks build on top of those with LINQ to SQL, LINQ to XML, LINQ to YoMomma, etc.

When you are working with something that is IQueryable; that is, the source is IQueryable, you need to make sure you are actually usually the operators for an IQueruable, otherwise you might fall back onto an undesirable result, as in this database case with IEnumerable. You don't want to return more data from the database to a caller than is absolutely necessary.

From JC, with emphasis mine:

The IQueryable version of SingleOrDefault, that takes a lambda, actually takes an Expression>, whereas the IEnumerable version, takes a Func. Hence, in the below code, the call to SingleOrDefault, is treating the query as if it was LINQ To Objects, which executes the query via L2S, then performs the SingleOrDefault on the in memory collection. If they changed the signature of GetByPrimaryKey to take an Expression>, it would work as expected.

So, the partner doesn't want a Func (a Func that takes a customer and returns a bool, they want a compliable Expression with a Func that takes a Customer and returns a bool. I'll have to add "using System.Linq.Expressions;" as well.

Just changed that one line, so that GetByPrimaryKey takes a Expression> and I get the SQL I expected:

Someone famous once said, "My code has no bugs, it runs exactly as I wrote it."

Layers of Abstraction are tricky, and you should always assert your assumptions and always look at the SQL that gets generated/created/executed by your DAL before you put something into production. Trust no one, except the profiler.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

You can learn a lot by reading other people's source code. That's the idea behind this series, "The Weekly Source Code." You can certainly become a better programmer by writing code but I think good writers become better by reading as much as they can.

This code is synchronous, meaning basically that it'll happen on the same thread and we'll wait around until it's finished. Now, here's an asynchronous version of the same code. It's a nice combination of the the new (LINQ, in this case, LINQ to SQL) and the older (DataReaders, etc). The LINQ (to SQL) query is in query, then they call GetCommand to get the underlying SqlCommand for that query. Then, they call BeginExecuteReader on the SqlCommand which starts asynchronous execution of that command.

a) It's a standard approach to converting a LINQ query to a command to be executed with more control over how it's executed. That said, I don't see it done all that much, so in that capacity it's clever.

b) It's necessary to run the query asynchronously; otherwise, the call to MoveNext on the enumerator will block. And if ADO.NET's MARS support is used (multiple asynchronous result sets), you could have multiple outstanding operations in play.

c) TPL can't improve upon the interactions with SQL Server, i.e. BeginExecuteReader will still need to be called. However, TPL can be used to wrap the call such that you get a Task<Widget> back, which might be a nicer API to consume. Once you have it as a Task, you can do useful things like wait for it, schedule work for when its done, wait for multiple operations or schedule work when multiple operations are done, etc.

They fire off their task, which then does its database work asynchronously, and then it all comes together.

I'll leave (for now) the wrapping of the APIs to return a Task<TResult> as an exercise for the reader, but it'd be nice to see if this pattern can benefit from the Task Parallel Library or not.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.