How big is the MSDN Library – and why should you care?

As of September 2010, the MSDN Library provides online access to over 16 million topics of Microsoft technical and product documentation. What exactly is a topic? In layman’s terms a topic is simply a page authored for users to view. From a database perspective, it is a unique content item with an XHTML representation. Other statistics:

From Office to Windows to Servers to Developer Tools, there are over 20 product families represented in the Library. Most product families provide technical documentation for multiple products, past and present. For example, the Visual Studio product family includes all of the flavors of Visual Studio, LightSwitch, the Windows SDK, the .NET Framework, Silverlight, ASP.NET, Visual SourceSafe, Visual FoxPro, and many more.

We typically provide multiple versions of technical documentation for each product family. Some of this documentation reaches back over a decade. There are two main reasons for this: 1) we provide access to older technical documentation as part of the Microsoft Support Lifecycle and 2) we recognize that customers can’t always upgrade to a newer product or technology. If content is still receiving significant traffic in the Library, developers are still finding it valuable.

Finally, the Library is available in 14 locales. A locale represents a user’s language and country/region preference. Not all topics are translated, but many are: over 13 million topics are available in non-en-us locales.

How is all this content organized?

At the database level, topics are organized by Locale × Product Family × Product Family Version. The diagram below provides a visual representation. The leftmost table depicts the distribution of our 16 million topics across 14 supported locales. If we zoom in on just the en-us content, you can see the major product families that contribute to our 3 million en-us topics. If we then zoom in on our largest product family, you can see the distribution of topics across different versions of Visual Studio. VS.100 is the internal product family version number for Visual Studio 2010, which delivers over 374,000 topics. (A full list of product family version numbers for Visual Studio is documented in my MSDN URL Cheatsheet.)

We use internal version numbers to uniquely identify product family versions in the database. The easiest way to discover internal version numbers for a given product family is to open Control Panel, navigate to ‘Programs and Features’ and look at the Version column. We use a compact form of the full product family version, so ‘Visual Studio 10.0.30319’ collapses to ‘vs.100’.
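To make the collapsing rule concrete, here is a minimal Python sketch. It assumes the compact form is simply the family prefix followed by the major and minor version digits run together, which is inferred from the examples in this post (vs.100, VS.90, VS.71) rather than from any published MTPS rule. The function name `compact_version` is hypothetical.

```python
def compact_version(family: str, full_version: str) -> str:
    """Collapse a full version such as '10.0.30319' into a token such as 'vs.100'.

    Assumption: the compact token is '<family>.<major><minor>'; the build
    number (third component) is dropped.
    """
    parts = full_version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"expected 'major.minor[.build]', got {full_version!r}")
    major, minor = parts[0], parts[1]
    return f"{family.lower()}.{major}{minor}"


print(compact_version("vs", "10.0.30319"))  # vs.100
print(compact_version("vs", "9.0"))         # vs.90
```

Under that assumption, a Version column entry of 10.0.30319 maps to vs.100 and a 9.0 release maps to vs.90.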

You can see the (Locale × Product Family × Product Family Version) organization at work in the topic URL. For example, the URL for the German (Germany) translation of the .NET Framework 3.5 version of the “System.XML Namespace” topic is:

The locale is specified in the URL as “de-de”. The product-family/product-family-version is specified by the string “(v=VS.90)” where ‘VS’ is the Visual Studio product family and ‘90’ corresponds to Visual Studio 2008, the version that shipped .NET Framework 3.5.
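As a rough illustration of reading those two pieces back out of a topic URL, the Python sketch below pulls out the locale segment and the “(v=FAMILY.VERSION)” token with regular expressions. The sample URL uses a made-up host and topic name purely for illustration, and the names `TopicAddress` and `parse_topic_address` are hypothetical; only the “de-de” and “(v=VS.90)” conventions come from the description above.

```python
import re
from typing import NamedTuple, Optional


class TopicAddress(NamedTuple):
    locale: str    # e.g. "de-de"
    family: str    # e.g. "VS"
    version: str   # e.g. "90"


# A locale is assumed to be a path segment of the form xx-yy.
_LOCALE = re.compile(r"/([a-z]{2}-[a-z]{2})/", re.IGNORECASE)
# The version token is assumed to look like (v=VS.90).
_VERSION = re.compile(r"\(v=([a-z]+)\.(\d+)\)", re.IGNORECASE)


def parse_topic_address(url: str) -> Optional[TopicAddress]:
    """Return the locale, product family and version found in url, or None."""
    locale = _LOCALE.search(url)
    version = _VERSION.search(url)
    if not (locale and version):
        return None
    return TopicAddress(
        locale.group(1).lower(),
        version.group(1).upper(),
        version.group(2),
    )


# Hypothetical URL: host and topic name are placeholders, not a real address.
sample = "https://example.invalid/de-de/library/some-topic(v=VS.90).aspx"
print(parse_topic_address(sample))
```

Swapping the locale segment or the version token in such a URL is the manual equivalent of what the Version Selector does for you.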

The MSDN/TechNet Publishing System (MTPS) that sits beneath the MSDN Library also supports the TechNet Library (for IT Pros), the Expression Library (for designers), and a number of other smaller sites.

The TechNet Library has over 3 million topics in 19 locales.

The Expression Library has 8,904 topics, all in en-us.

Overall, there are around 20 million topics hosted in MTPS for a total of 1.3TB of data in production. Uff da!

So why does any of this matter?

The size, structure and age of the Library have a direct bearing on the ease with which you can find the Library information you need.

Let’s look at a specific example. When there are multiple versions of the same topic, search engines may return similar and confusing results. And the specific version you need may not even appear in the list. Once you enter the Library, it’s difficult to tell which version you’re viewing based on content alone since many topics change very little, if at all, from version to version. As I mentioned in a recent post, we’ve introduced a new Version Selector to improve this aspect of our library experience. All six versions of the System Namespace topic are now easily accessible from a dropdown list.

There are many other size- and structure-related challenges including:

How well do search engine queries help you locate the one topic you need (out of 16 million)?

Once you’re in the Library, how effectively can you navigate from the topic you’re looking at to the topic you may need?

If you know exactly which topic you’re looking for, how quickly can you get to it?

If you want to find information tightly scoped to a specific task, how easy is it to get that view (e.g. you only want to see content relevant to Windows Phone development)?

How quickly can you find a relevant code example for the API you’re working with?

If you’re in a non-English translation of the Library, how easily can you discover and consume topics that aren’t available in your locale?

These are some of the questions I plan to address in future posts and for which we are constantly working on better answers.

In the meantime, what challenges are you personally facing when using the MSDN Library? If you could make one change to the online Library user interface, what would it be?

One Change? An ignore list for entire product families. When I’m searching for “Select” I don’t want to see results for Select in Dynamics AX, but only SQL Server. Okay, bad example because SELECT is too generic and I can easily filter out SQL Server by adding T-SQL.

Also, I would like to filter out .net Framework versions before 3.5 completely as I either need VS.90 or VS.100 but not VS.71 anymore.

Great suggestions, Michael S., thank you. The scenarios you describe are top of mind for us. An example of a related scenario we’re thinking a lot about is: “Show me a view of the Library that contains all topics relevant to Windows Phone 7 development and only those topics.” I’ll provide an update on this blog whenever we have progress to share in this area.

The main issue for me is that the built-in search hardly comes up with any useful results. I always find myself going to Google and entering, e.g., “CreateFile msdn”, where the very first result is exactly what I’m looking for most of the time.

Always have the documentation proofread by a newbie who really programs with it, and use his feedback.

Many pages are of little use because they just repeat the obvious (e.g. StringLength: the length of the string), and hide or miss the relevant details (e.g. StringLength: number of single byte characters, terminating null excluded).

Yves and Michael G., my friends on the content team provided this response on how they approach documenting such a large surface area: “Thanks for taking time to comment on the documentation — we’re sorry the content didn’t meet your expectations. The documentation teams are working to add depth to the documentation, including code examples and detailed context in the topic remarks. Some API topics just include basic descriptions (which are used to generate IntelliSense). Sometimes this is by design because we’ve focused more of our documentation coverage on APIs that support more-used scenarios. After the product releases, we continue to add information to the topics via regular updates to online content. We use customer feedback and usage data to help us find areas that need more depth. We also really appreciate when the community helps enhance the topics with tips or code examples in the Community Content section.”

I always wondered why when, for example, I’m programming in C# and I press F1 on a method name to get the help page.. why doesn’t it automatically filter the help for C#?
For every definition or example I get a whole list of examples, each for a different language, when this could easily be filtered automatically for C#.
Sure, I can manually go to the top of the page to let it only show C#, but it makes me do that every single time!
Which is annoying.

When I’m programming I want the information I need as quickly as possible without jumping through too many hoops.

What annoys me more than the new hindrance system, with no ‘search’ facility on the home page, is the complete lack of EXAMPLES.
I learn very quickly by inspecting real (error-free) examples, not obscure jargon.
The new 2010 hindrance-system is a giant leap backwards.
I HATE it, and resort to Google for my Visual Studio help system.
Seriously.
It is a sad reflection on the criminally incompetent Microsoft ‘help’ team, that Google manages to offer a superior service.
Did I mention that I thoroughly DESPISE the new 2010 “Help” system?

1. Bring back a decent integrated reader inside Visual Studio. The current situation is pathetic where there is NO viewer that is integrated in Visual Studio.
2. The new help page layout with the related links pane is terrible. I want the entire tree to remain available.
3. Where is the sync contents button? I think that button is the most important feature of the entire help system! It lets you navigate to the start of a subject after landing at some page via a search. This is like “Finding the front door” to any subject you are trying to learn.

This may be more of a recommendation for the MDE (Microsoft Document Explorer) than the MSDN Library itself, but since you asked…

At this stage of the game one should be able to annotate/comment pages in the MSDN Library on their local system, as well as use tags to quickly find and group specific pages of the Library. The FAVORITES feature was cutting edge back in 1999 but is inexcusable in 2009 and later. Please do something about these.

Second the comments about the new help system. The old one was bad, but I kind of knew how to get it to do what I wanted. I really want the synchronise with contents feature back, and in-IDE help was really handy. For now, Google does OK, but it is a pain to switch to a web browser which I may be using for other things. Anyone know how to integrate it with context-sensitive help in the IDE?

Oh, and please please please fix F1 help for C++/CLI (and intellisense for that matter)… Don’t get me started.

Your diagram brings to mind a question: why do you organize .NET content by Visual Studio product family, as opposed to placing it in a discrete .NET product family?

I mean, I get the fact that there is a tight coupling between VS versions and framework versions, and, given that VS is used for the vast majority of .NET development, you can make a case that, conceptually, .NET 4 == VS 2010, .NET 2 == VS 2005, etc.

However, technically, .NET is independent of VS. Why should someone using .NET outside of VS be required to know which version of VS their framework version of interest was released in? For example, why should a PowerShell user looking for System.Foo member documentation have to carry around a VS-to-.NET cross reference, much less, even care?

While VS product families are warranted for content that is truly specific to the Visual Studio IDE itself, such as, say, how to add a file to a solution, it seems like it would be “more correct” for framework-specific content, such as class member topics, to have URLs with, say, (v=dotnet.40), instead of (v=vs.100).

Now, with all that said, I do give you props that you have put framework-specific content under its own topic in the TOC…so, at least from the standpoint of someone drilling through the TOC to get to the .NET class library documentation, the VS-to-.NET linkage never enters into play.

Content is organized this way for historical and practical reasons. Before framework multi-targeting was introduced, .NET wasn’t conceptually tied to a version of Visual Studio; it was *literally* tied to a version of Visual Studio. The framework documentation was delivered as part of its corresponding Visual Studio release, so .NET Framework and Visual Studio documentation had the same Product Family Version. While .NET and Visual Studio are now loosely coupled, major releases of the Framework still occur in conjunction with their corresponding tooling support. However, it’s not necessary to understand the Product Family Version relationship unless you want to manually edit Library URLs. That is especially true since our introduction of the new and improved Version Selector.