We've also received some questions and concerns about our data, our methodology, and what we plan to do with this project going forward. Here are some responses to the most common issues that have been brought up in the last 24 hours or so.

Before posting our analysis last night, I was not aware that Steam only started tracking "hours played" statistics on SteamCommunity.com in March 2009. This isn't a small oversight: games played solely before this date show up erroneously as "unplayed" in our data, and games released before that time may show fewer total hours than they should. This helps explain why older games like Ricochet and Deathmatch Classic seem so unpopular among the people who own them: most players probably put in their hours before March 2009.

To be clear, this issue should not affect the "ownership" data in the original piece; games bought at any time appear correctly in the scraped Steam data. For some of the other charts, it simply means that games from the pre-2009 period can't be compared with complete accuracy to those released after March 2009. I've noted this in the original piece and updated a number of charts to reflect it.

The biggest change in the actual data comes in the aggregated distribution of hours played. Restricting these charts to games with a release date after March of 2009 (i.e. the ones we have a "complete" gameplay picture of) shows that only 26.1 percent of registered copies are sitting unplayed, as you can see above, rather than the 36.9 percent cited in the original article. This is probably closer to the true number across all Steam games, though it's hard to say how the actual play data looks for the 800 or so Steam games released before gameplay tracking was activated.
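The restriction described above amounts to filtering out pre-tracking releases before counting zero-hour copies. Here is a minimal sketch, assuming each registered copy is available as a hypothetical `(app_id, hours_played)` pair and release dates sit in a separate lookup; the exact cutoff day (March 1, 2009) is also an assumption, since only the month is given.

```python
from datetime import date

# Assumed cutoff: Steam began recording playtime on community profiles
# in March 2009 (the exact day is an assumption for this sketch).
TRACKING_START = date(2009, 3, 1)

def unplayed_share(copies, release_dates, cutoff=TRACKING_START):
    """Fraction of registered copies with zero recorded hours, counting
    only games released on or after the cutoff.

    copies: list of (app_id, hours_played) tuples, one per registered copy.
    release_dates: dict mapping app_id -> release date.
    """
    eligible = [hours for app_id, hours in copies
                if release_dates[app_id] >= cutoff]
    if not eligible:
        return 0.0
    unplayed = sum(1 for hours in eligible if hours == 0)
    return unplayed / len(eligible)

# Hypothetical example: the pre-2009 game is ignored entirely, and one of
# the three remaining copies is unplayed.
release_dates = {"old_game": date(2005, 1, 1), "new_game": date(2011, 1, 1)}
copies = [("old_game", 0.0), ("new_game", 0.0),
          ("new_game", 5.0), ("new_game", 1.0)]
print(unplayed_share(copies, release_dates))
```

Dropping old games entirely, rather than trying to correct their hours, trades coverage for a cleaner comparison, which is the same trade-off the revised charts make.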

While this is an important limitation in the data, it doesn't overturn the general conclusions we've reached so far. Our "most played" rankings still accurately reflect which games have seen the most play in the last five years, which is the only period we have information about. That span covers the entire release history of over two-thirds of the games ever released on Steam. It's a pretty robust data set to study, even if it isn't perfect.

Steam doesn't measure my gameplay hours correctly, even since 2009. Is this common?

Since launching our piece last night, we've received a lot of anecdotal reports from people saying the gameplay hour numbers listed for their games on SteamCommunity.com are inaccurate. Most say the site sometimes failed to register huge chunks of gameplay, while some say the site shows more hours than they've actually spent with the game.

It's hard to tell exactly how widespread or impactful this problem is without someone out there who has independently tracked exactly how many hours they've put into their games, separate from Steam's reporting.

Absent that, there's not much we can do to account for these kinds of reported discrepancies. I will say that, in my personal experience, the hourly reporting from Steam seems to match up with real world gameplay time in most, if not all, cases. In any case, we're at the mercy of Valve's reporting here. If you really think there's a problem with their numbers, we encourage you to take it up with them.

What about offline gameplay and time spent idling on menus?

Both of these are also potential issues with the accuracy of Steam's "hours played" reporting. Players using Steam in offline mode don't seem to be counted in the service's aggregate statistics, and players who leave a game window open and idle for hours or days can inflate the numbers substantially (though stats like "median hours played" are more resistant to these problems). In the case of a game like Dota 2, time spent spectating other matches is also shown as "gameplay," which skews the data as well.
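To see why the median holds up better than the mean against idlers, consider a small illustration with made-up numbers (these are not figures from our data set). Five hypothetical owners play a game for a few hours each; in the second list, one of them instead leaves the game idling on a menu for days.

```python
from statistics import mean, median

# Hypothetical hours played by five owners of the same game.
typical = [2.0, 5.0, 6.0, 9.0, 10.0]
# Same players, except the last one left the game idling on a menu for days.
with_idler = [2.0, 5.0, 6.0, 9.0, 120.0]

print(mean(typical), median(typical))        # 6.4 6.0
print(mean(with_idler), median(with_idler))  # 28.4 6.0
```

A single idler more than quadruples the mean, while the median, which only depends on the middle value, does not move at all.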

Again, we're at the mercy of Steam's reporting here. There's no real way to accurately separate out or re-add these numbers into the gameplay data. It would be nice to think these issues cancel each other out to some extent, but in reality they probably slightly inflate the data for a lot of games that users are liable to leave running in the background for one reason or another. All we can suggest is that you take this into account when considering the data.

How accurate are your numbers?

Since publishing, we've had a few more developers reach out either privately or publicly to offer their own Steam sales data for comparison with our estimates. With over a dozen "real world" spot checks in hand now, we have yet to see an instance where our numbers are more than 10 percent off from the actual numbers developers have access to. Sometimes our error is much less than that, of course, and the error can go in either direction (though so far our numbers seem to overestimate slightly more often than they underestimate).

While 10 percent isn't a small margin of error, it's much better than a shot-in-the-dark guess. If we're reporting sales of two million units for a game, you can be pretty confident the actual sales number is somewhere between 1.8 and 2.2 million.
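The 1.8 to 2.2 million range reads the 10 percent as a fraction of our estimate. If the error is instead measured against the developer's actual figure (that is, |estimate - actual| <= 0.10 × actual), the implied range is slightly wider and shifted upward. The sketch below compares both readings; the function names are hypothetical and chosen only for illustration.

```python
def error_band(estimate, margin=0.10):
    """Range if the margin is a fraction of our estimate."""
    return estimate * (1 - margin), estimate * (1 + margin)

def actual_range(estimate, margin=0.10):
    """Range if the margin is a fraction of the actual figure:
    actual * (1 - margin) <= estimate <= actual * (1 + margin),
    so estimate / (1 + margin) <= actual <= estimate / (1 - margin)."""
    return estimate / (1 + margin), estimate / (1 - margin)

print(error_band(2_000_000))    # roughly 1.8 million to 2.2 million
print(actual_range(2_000_000))  # roughly 1.82 million to 2.22 million
```

For a 10 percent margin the two readings differ by only about 20,000 units at the two-million mark, so the rule of thumb in the text holds up.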

How are free-to-play games handled? Doesn't everyone on Steam "own" them?

In a way, everyone "owns" every free-to-play game, but not in our stats. If you go to the store page for a free-to-play game like Dota 2, Steam will indeed tell you that you "already own the game." However, the SteamCommunity.com profile pages don't seem to show the game on your account until you actually download and start playing it. Since we're taking our data from those Steam Community pages, the only people who show up as "owners" of free-to-play games in our reports are the ones who have downloaded and played those games at least once.

What about total revenue? How many of these games are selling during deep discounts?

No doubt a lot of copies are being sold through bundles or discounts. The problem is, we have no way of knowing when a particular Steam game was purchased if that happened before we started running our analysis in February. Thus, we don't know what the going price was for that copy of the game when it was acquired.

Even then, it's hard to tell if a game was purchased at full Steam asking price or through some sort of pay-what-you-want bundle. And that doesn't even begin to take into account in-game purchases in free-to-play games. We could look at how much the game library would be worth at current Steam asking prices, but that would probably be more misleading than illuminating.

That said, we are looking at sales rates since we started running our analysis and plan to examine how things like discounts and bundling promotions affect that sales trajectory in the future.

That long tail graph is pretty hard to read. Can you improve it at all?

Sure. Here's a version with a logarithmic vertical axis, which makes the vast middle ground easier to discern.
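A logarithmic axis puts 1,000, 10,000, and 100,000 copies at equal spacing, so each order of magnitude gets the same visual room. The same view can be expressed numerically by bucketing games by power of ten. This is a minimal sketch; the owner counts below are hypothetical, not drawn from our data.

```python
from collections import Counter

def games_per_decade(owner_counts):
    """Count games in each order-of-magnitude bucket of owner counts
    (1-9, 10-99, 100-999, ...), keyed by the exponent k in 10**k."""
    buckets = Counter()
    for owners in owner_counts:
        if owners > 0:
            # For a positive integer, digits - 1 == floor(log10(owners)),
            # computed without floating-point rounding.
            buckets[len(str(owners)) - 1] += 1
    return dict(sorted(buckets.items()))

# Hypothetical owner counts for six games.
print(games_per_decade([500, 4_000, 8_000, 12_000, 250_000, 3_000_000]))
# {2: 1, 3: 2, 4: 1, 5: 1, 6: 1}
```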

Is anyone else doing anything like this?

There are a few other notable projects that take a similar tack to aggregating public Steam data, though none that we think are quite the same as our project:

Steam Charts has been tracking data based on Steam's real-time and daily "most played" numbers. They archive and aggregate the results for posterity.

GaugePowered lets you view hourly play data for your own account, and it even gives aggregated estimates of how many dollars per hour you'll spend on a particular game.

The Global Stats Project looked at achievement data for 80 popular Steam games to get some interesting data on completion rates.

Steam Database gathers a bevy of basic information about every game and app on the service through Valve's API and serves it up in an easy-to-use format.

SteamPrices.com is one of a number of sites that keeps track of when games are discounted. Steam Alerts will even send you a notice when the price drops below a certain level.

Finally, the TF2 Backpack Examiner is an incredibly thorough database of which hats and items are owned by players of that game.

I'm sure there are plenty I am missing; please leave links to good ones in the comments.

Can we get your raw data? How about your code?

I'm a bit hesitant to simply give away the results of what has been nearly a year of on-and-off work on this concept before using them for at least a few more "exclusive" reports and analyses myself. And rest assured, there are a lot more reports on this data from a number of angles coming down the pike.

As for my code, it's a bit of a disorganized, uncommented mess at the moment. I have laid out my methodology in quite a bit of detail, though, so anyone with the server capacity and coding skill could probably replicate my work pretty easily (and do whatever they want with the numbers).

That said, I know there are a lot of people clamoring for a deeper dive into this treasure trove, so I've included an expanded list of the top 100 played games on the next page, complete with "ownership" data as well. Hopefully that will give all you data-heads something to chew over while we work on our next report.

If there's some particular analysis you'd like to see in the future which we might not have thought of, leave a comment and we'll take it under consideration. If you have a specific request for data about a particular game that you simply must know about, drop me a note and I'll see what I can do.

Kyle Orland
Kyle is the Senior Gaming Editor at Ars Technica, specializing in video game hardware and software. He has journalism and computer science degrees from the University of Maryland. He is based in the Washington, DC area. Email: kyle.orland@arstechnica.com // Twitter: @KyleOrl

91 Reader Comments

I had thought that LoL had run away with the market prior to this article. Considering that LoL is the most played game in North America and Europe, seeing the number 2 game in that category (DOTA2) with that many copies is even more impressive.

I'm kind of surprised you've had no pushback from Valve on this. If I saw a single IP doing 100K queries/day on something I ran, I'd likely block that. It does seem like a low enough rate (just over 1 query/second) to not really impact things though. Have they had anything to say about this?
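The "just over 1 query/second" figure checks out: a day has 86,400 seconds, so 100,000 evenly spaced queries works out to about 1.16 per second, or one query roughly every 0.86 seconds. The helper names below are hypothetical, and the 100,000/day figure is the commenter's.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def queries_per_second(queries_per_day):
    """Average request rate implied by a daily query count."""
    return queries_per_day / SECONDS_PER_DAY

def pause_between_queries(queries_per_day):
    """Gap in seconds between evenly spaced requests for that daily count."""
    return SECONDS_PER_DAY / queries_per_day

print(round(queries_per_second(100_000), 3))     # 1.157
print(round(pause_between_queries(100_000), 3))  # 0.864
```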

But LoL isn't launched through Steam, and the number of people who bother to link the two is insignificant, so it's likely not representative of the actual population.

Also I'm having issues with the graph on mobile. I can't see the right most columns. I'm on IE11 on WP8.1.

Wow, Deus Ex: Human Revolution isn't even on there? Not even a million players through Steam? I'm finding it hard to believe that FTL: Faster Than Light did so much better, as much as I love that space sim.

I think it's pretty universally known among Total War players that Rome: TW (which doesn't appear here) is more popular than Empire. The reason is that most Rome players don't use a Steam version. I wonder how much off-Steam distribution affects things. For example, I have over 100 hours in The Binding of Isaac, but my Steam version has 0 hours on record, because I play the DRM-free version. There are many more like this in my case. So the percentage unplayed is probably even lower than the revised figure, because games purchased DRM-free with a Steam key add-on (the Humble Bundle, most notably, and several Kickstarter games) only log hours for people who play through Steam.

This doesn't matter for the purposes of this reporting, since the focus is only on Steam, but the takeaway is this: games released after March 2009 that were only released on Steam (e.g., Skyrim) are likely dead-on accurate, with the exception of offline mode. If there are other release channels, the data only reflects a slice: the biggest slice in most cases, but not all. I already mentioned Rome: TW, but M&B: Warband has a rather large following that Steam doesn't even begin to scratch the surface of, and so does Arma 2 (aka the DayZ mod). Those are just off the top of my head.

Valve has always been rather cagey about this sort of thing, and I'm sure they don't like the fact that data they may very well consider trade secrets seems to be leaking at an alarming rate. Flatland manual dropped a few years ago; Greenlighters leaking all sorts of random bits because they don't know they shouldn't be doing that; and lastly this: the website and interface structure leaking something close to actual sales data within a 10% margin of error. I do have to wonder if Valve will 'plug the security hole' or if they have solidified their position as market leader to the point where they've stopped caring, and let the chips fall as they may. Still, the ability to render corporate secrets by aggregating relatively benign bits of data from the published content on their website is a fascinating exercise.

Sorry for a sort-of tangent, but this list right here is all the proof you need that PC gaming is alive and well. There are 100 titles from the past four to five years that are either PC-exclusive or multiplatform, and every single one has both sales AND players above 1 million!

And this is just from Steam... Imagine adding in data from GOG, Desura, and the publisher-created Steam ripoffs!!!

Just look at the number of people that stream it on twitch (though this will probably include them having the program running in the background and not actually playing, whilst they do other things - like play another game, chat or do toilet/food runs)

But then again, it's a pity that you can't see the statistics for League of Legends. The number of hours for that game should be massive, even if it's just Korea alone. Add in China and the rest of the world and it should be mind-blowing.

- Surprised to see Path of Exile on the list. Well, the surprise is to see it on Steam. It is a free game that I downloaded directly from the developers (though there seem to be fewer people playing nowadays).

Kyle, thanks for the log-lin plot. So about 2,600 Steam games sold 1,000 or more copies, 2,100 games sold 10,000 or more, 1,000 games sold 100,000 or more, 100 games sold 1 million or more, and only a few games sold 10 million or more. A small dev with a sales goal of ten thousand copies would need to get into the top 2,000 or so.

I would not give out your scraping code, for the simple reason that not everyone will have the discipline not to abuse it. Many people using it concurrently may register on Valve's radar, and impatient types will surely attempt scans with your query rate limits bypassed.

Playing both Dota2 and LoL and every other Moba, I sorta get why people who never went over to LoL went to Dota2 AND why people who never played either have gone to Dota2, but WTF is with people from LoL going to Dota2, I just simply cannot understand it - it seems very much like continuing on dota1 with a new graphics engine, in the most unfortunate way possible. *shrug* different strokes or something.

Just a note. I was trying to get lotro (lord of the rings online) running on my mac over the last couple days and one of the options was steam. While trying to use steam and lotro I could never get the game to load. It would hang on a white screen. Through this steam had me playing the game for fifteen minutes.

I appreciate the data and find it very interesting even if it is not realistic for it to be precise. The big picture information it provides is useful.

Something's gone wrong with the Publisher and Developer columns towards the end of the list - I know that Trine 2 is a Frozenbyte game and Dishonored is Arkane Studios, but this chart has the credits offset by one place, so Dishonored is credited to Frozenbyte and Amnesia: The Dark Descent is credited to Arkane Studios. I'm not sure exactly where the rot sets in but I suspect "War Inc: Battlezone" is not a Valve game.

It'd be interesting to see some sort of charts for genre/subgenre popularity based on these numbers. Both owned and actually played. Raptr does something like this in its tracking, but only on a per-user basis:

On the other hand, figuring out subgenres might be more trouble than it's worth, given it looks like Steam just reports main and probably overlapping genres in the store's advanced search.

edit: I think if you did an "actually played" chart using data from a given day or set of days, you could partially avoid skewing by games that are loaded at the time only long enough to get free trading cards. In my case, the chart shown above was surprising to me, because I actually don't think I play many of those genres; like most people, a lot of my bundled keys are just wasted.