Well, gosh. What a busy few months. What started as a bit of spit and polish to a website has turned into a huge rewrite and a great opportunity to learn and play with MVC(2), the Spark view engine, Lucene.net and NHibernate. But more on those some other time. Today, I’m interested in a performance problem that wasn’t what I was expecting it to be.

One core scenario of this website is that users can watch video files. We don’t want to allow them to view just anything, so we grant them access to a subset of the videos. Which means we have the following requirement: what programmes does the user have access to watch?

This is calculated like so: get the video access tokens for the user. Map each token to an episode. Strip out episodes which don’t have videos available. Retrieve the programmes for all viewable episodes.

For better or worse, this touches 5 tables across 3 separate databases.

We wrote a cache.

And you know what? It works great. The site is fast, the data is readily available, everybody is happy.

Except for one user.

We have a demo user that is granted access to a large number of episodes in a large number of programmes, ready to be used at conferences and other events.

This user killed the site.

Once the data was in the cache, everything was great. But it took 5 minutes to get into the cache. And killed the site while it was doing so.

I was expecting some trouble with this user – they didn’t work well on the existing site, and it’s obvious that we’re doing too much database activity. So I fired up the wonderful NHProf to see exactly what was going wrong, and immediately proved myself wrong. All the data was being retrieved with a minimum of fuss, SQL statements and round trips to the database. It took 5 seconds.

Now, we are definitely doing too much with the database here. We really could do with denormalising some of the data, using fewer databases and pulling back ids instead of unused entities. But it still only took 5 seconds. As an edge case populating a cache for a user only used by staff at conferences, 5 seconds is pretty much ok.

So what was happening for the remaining 4 minutes and 55 seconds?

The title of the post might help. The heart of this little process was a LINQ to objects query that joined up the lists from the various databases to create the final list of programmes, episodes and tokens that we could cache and present to the user. It went something like this:
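The original query didn’t survive into this copy of the post, but reconstructed from the details that follow, it was shaped roughly like this (all the names – programmes, tokens, EpisodeId and so on – are my illustrative stand-ins, not the production code):

```csharp
// Roughly the shape of the original query -- names are illustrative
// stand-ins, not the actual production code.
var viewable =
    from p in programmes
    from t in tokens.OrderBy(x => x.EpisodeId)   // re-sorted once per programme
    from e in episodes
    where t.EpisodeId == e.Id                    // string comparison
       && p.Episodes.Contains(e)                 // linear list traversal
    select new { Programme = p, Episode = e, Token = t };
```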

Now, there are two glaring issues going on here: we’re re-ordering the tokens list on every pass, when we could sort it once before anything else; and similarly, we could do something about the p.Episodes.Contains(e), which is a linear scan of a list and isn’t going to be very quick.

But that’s small fry. Where’s the real problem?

It took me a little while to realise that what looks deceptively like nice, declarative, friendly LINQ, is in fact (of course) nested loops. Three deep. This simple bit of data collation is O(n * m * p).
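To see why, it helps to know what the compiler does with stacked from clauses: each one becomes a SelectMany call, which is a nested foreach in disguise. A minimal illustration (generic list names, nothing to do with the real code):

```csharp
// Query syntax with multiple from clauses...
var combos = from a in listA
             from b in listB
             from c in listC
             select new { a, b, c };

// ...is translated by the compiler into (roughly) this chain,
var sameCombos = listA
    .SelectMany(a => listB, (a, b) => new { a, b })
    .SelectMany(ab => listC, (ab, c) => new { ab.a, ab.b, c });

// ...which at runtime behaves like three nested loops:
foreach (var a in listA)
    foreach (var b in listB)
        foreach (var c in listC)
        {
            // any where clause runs here, once per combination
        }
```

With no join condition pushed into the iteration itself, every combination of the three lists gets visited before the where clause filters anything out.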

I didn’t mention data size before. Turns out it’s not really that large. The demo user has about 1600 tokens, covering 1500 available episodes across about 300 programmes.

Not much, actually.

But when you count the looping, this gives us 300 * 1600 * 1500 = 720,000,000 iterations.

Seven hundred and twenty MILLION iterations. That’s 720 MILLION string comparisons, and 720 MILLION list traversals for the Contains calls. It’s also 299 unnecessary orderings of the tokens list.

That’s where my 5 minutes was going.

Rewriting this as a bunch of lookups has reduced this bottleneck to essentially the time it takes to read from the database – about 5 seconds.
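The rewritten version isn’t shown here, but the general shape of a lookup-based fix looks something like this (again, the names are my assumptions, not the real code):

```csharp
// Build each index once, up front -- O(n) each.
var availableIds    = new HashSet<string>(episodes.Select(e => e.Id));
var tokensByEpisode = tokens.ToLookup(t => t.EpisodeId);

// Then a single pass: every membership test and token fetch is now
// a hash probe instead of a scan of a whole list.
var viewable =
    from p in programmes
    from e in p.Episodes
    where availableIds.Contains(e.Id)      // O(1) instead of O(episodes)
    from t in tokensByEpisode[e.Id]        // only the matching tokens
    select new { Programme = p, Episode = e, Token = t };
```

That turns O(n * m * p) into roughly O(n + m + p): a few thousand hash operations instead of 720 million iterations.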

So the moral of the story is simple. LINQ is nice. It’s lovely. But when you’re writing multiple from clauses, don’t forget it’s now doing nested loops.
