
When I was in high school and learning about the basics of computer science, I was taught an acronym to underscore the importance of having clean data to work with: GIGO, or ‘Garbage in, garbage out.’ You can have all the fantastic algorithms and formulas you like, but if your data is in poor shape, you’ll never come close to the results you desire.

The same is true of chess data. You can buy the fanciest GUI (graphical user interface) the market has to offer, and you can collect all of the strongest engines around, but if you’re working with poor quality data, your research will suffer for it. Fortunately for us, there are a number of high quality databases out there, each fulfilling a specific set of needs for different types of users.

In this review I’ll look at four (or five, depending on how you look at things) of the most important databases out there, and as we will see, there is something useful for just about everyone. All of them are available in ChessBase’s native data format, and two (TWIC and Paramount) are also available in .pgn format, making them readable by those using GUIs other than ChessBase or Fritz.

Big / MegaBase 2016

There’s no way around it. You need a large reference database if you’re going to do any serious chess research or study. Online databases like chess-db.com, chessgames.com and ChessBase’s own online database are no substitute. They require an internet connection, and you can’t easily manipulate online data. The largest and best-known of these reference databases are Big Database (BigBase) and Mega Database (MegaBase) 2016 from ChessBase.

BigBase and MegaBase each contain over 6.46 million games, running from the earliest recorded games through October of this year. The database is searchable by player, tournament, and annotator (among other things), and you can access various indices or ‘keys’ for openings, endgames, and strategic and tactical themes. Note that the last three keys are not accessible in the default ChessBase 12/13 settings; you can enable them by going to Options – Misc – Use ‘Theme Keys.’

You might suspect, given the name of the product, that each year brings a new version of the database to the market. And you would be correct to do so. The 2015 release of MegaBase contained 6,161,344 games, and the data wranglers at ChessBase have bumped that total to 6,466,288 in the 2016 edition, an increase of 304,944 games. About half of these additions have already appeared in issues of ChessBase Magazine and ChessBase Magazine Extra, but 166,692 of them are entirely new to the ecosystem.

While the majority are from 2014 and 2015 events, there are some historical additions as well. Among them are 18 games played by Botvinnik, 14 by Alekhine, and 9 by Spassky.

There are a number of similarities between BigBase and MegaBase. The number of games in each product is identical, as are the indices and keys. So what distinguishes them? MegaBase comes with two additional features that BigBase lacks: the inclusion of annotated games and a year’s worth of weekly updates. [MegaBase also comes with an updated version of PlayerBase, which collects rating data and pictures for thousands of players, but since I don’t use the feature, I will refrain from commenting on it.]

The 2016 version of MegaBase includes over seventy-five thousand games with named annotators, an increase of 3,425 annotated games over the 2015 edition. While regulars like Atalik, Ftacnik and Marin provide notes to Super-GM games, there are also analyzed games by lesser-known combatants. Hundreds of games annotated by John Donaldson and Elliott Winslow are new to this edition, all of them from amateur contests at the Mechanics’ Institute in the past few years.

MegaBase also comes with an update service that provides weekly downloads of roughly 5,000 games for a year. As a point of comparison, we are currently at update number 49 for MegaBase 2015, and 245,713 games have been added to the database with all updates included.

This means, by the way, that not every game submitted to ChessBase is included in these weekly updates. An exact comparison isn’t possible, but roughly sixty thousand games are in the 2016 database and not in the fully updated 2015 version.
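For readers who want to check the sixty-thousand figure, here is the back-of-the-envelope arithmetic using only the game counts quoted in this review. It rests on a simplifying assumption of mine, not anything ChessBase has stated: that every game in the fully updated 2015 database carried over unchanged into the 2016 edition, with no deletions or merged doubles in between.

```python
# Game counts quoted in the review (MegaBase 2015/2016 and the 2015 updates).
MEGA_2015 = 6_161_344          # games in MegaBase 2015 at release
MEGA_2016 = 6_466_288          # games in MegaBase 2016 at release
ENTIRELY_NEW_2016 = 166_692    # 2016 additions that are new to the ecosystem
UPDATES_2015 = 245_713         # games added via 2015 weekly updates (through no. 49)

# Net growth from one edition to the next.
growth = MEGA_2016 - MEGA_2015

# Additions not flagged as entirely new (per the review, mostly CBM / CBM Extra games).
not_entirely_new = growth - ENTIRELY_NEW_2016

# Games in 2016 beyond a fully updated 2015 database.
# Assumption: the updated 2015 database is a subset of the 2016 edition.
gap = MEGA_2016 - (MEGA_2015 + UPDATES_2015)

print(growth, not_entirely_new, gap)  # 304944 138252 59231
```

The 138,252 figure is about 45% of the net growth, which squares with the ‘about half’ estimate for games that had already appeared in ChessBase Magazine.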

BigBase and MegaBase are the preeminent reference databases available today. They are not perfect. Tim Harding has remarked on problems (some of which appear to have been fixed) with Blackburne’s games, for example, and John Watson never played in the 1966 British U14 Championship. Doubtless there remain plenty of tournaments, like the 1995 MCC/ACF Summer International (whose bulletin sits on my desk), just waiting to be entered into the computer. But no other database comes close to these two in terms of comprehensiveness and cleanliness of data. Anyone doing serious chess work, from openings to history to biography, needs one of these two products.

BigBase 2016 is available for download or post for €59.90 ($55.42 without VAT for those outside the EU). MegaBase 2016, which includes the annotated games, the weekly updates and the PlayerBase, costs €159.90 ($147.93 without VAT), and updates from previous versions of MegaBase cost €59.90 ($55.42 without VAT). The Update option comes with the annotated games, weekly updates, etc.

Correspondence Database 2015

Opening theorists are increasingly turning to correspondence games in their work. In his newly released Grandmaster Repertoire 20: The Semi-Slav, for instance, Lars Schandorff makes extensive use of games by the Russian Correspondence Grandmaster Efremov in working out the theory of the Botvinnik Variation. Such scrutiny is entirely logical if you think about it. The best correspondence players use all possible resources – books, computers, whatever! – over a period of months to choose their moves, making their games a veritable gold mine for opening ideas and novelties.

This is one area in which both the Big and Mega Databases are lacking, as they contain only over-the-board games. It is possible to cobble together a database of correspondence games by going to the websites of major correspondence organizations (ICCF, IECC, BdF, LSS) and collecting published games, but instead you might consider the Correspondence Database 2015 from ChessBase.

The Correspondence Database 2015 (CorrBase) contains 1,274,161 games played by post and e-mail from 1804 through January 2015. (The dates in this database seem to refer to the start date for the games.) Of these, 5,649 are annotated. The 2015 version of CorrBase also contains over 200,000 new games when compared with its 2013 incarnation, and it includes games from all of the leading correspondence groups.

So what will you find here? Let’s look at the games of ICCF-GM Aleksandr Gennadiev Efremov, the ‘hero’ of the early chapters of Schandorff’s new book. CorrBase 2015 contains 577 of Efremov’s games, including dozens (with both colors) in the Semi-Slav. The latest of these began sometime in 2013, and just about every one of Schandorff’s citations can be found in CorrBase.

CorrBase 2015 is an incredibly useful resource for the serious opening theorist or correspondence player. Because there is no update service (the TeleChess sections of CBM notwithstanding), discerning users will want to search out the latest games each month at organizational websites and add them to their databases. The effort is entirely worth it.

The Correspondence Database 2015 is available via download or post for €99.90 ($92.42 without VAT). An upgrade from earlier versions is available for €59.90 ($55.42 without VAT).

The Week in Chess

Not everyone can afford to buy MegaBase, and for those who do buy BigBase, there remains the problem of keeping the database up-to-date. For both of these problems there is Mark Crowther’s indispensable e-magazine The Week in Chess (TWIC).

The first issue of TWIC appeared in September of 1994. Each week since then, Crowther has produced a text report on the week’s chess news along with a database of new games in ChessBase and .pgn formats. Because both have always been available to download at no cost, TWIC has become a weekly must-see for players of all strengths. Indeed, we get a sense of just how central Crowther’s work has become from a tweet by Anish Giri.

Every issue of TWIC, from #1 (Sept 17, 1994) through the current day (#1094 at the time of writing), can be downloaded from The Week in Chess website. The databases from issue #920 (June 25, 2012) forward are also available. Combining those 175 files, a user could create a free database with 495,966 (482,290 after killing doubles) games to study. Among them we find 640 games played by Vachier-Lagrave (the most in the database), 516 by Nakamura, 507 by Svidler, and 7 miserable efforts by Hartmann.
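Readers who would rather script the merge than do it in a GUI could try something like the following. It is a minimal sketch under my own assumptions: the weekly PGN files have already been unzipped into a local folder (the `twic/` folder and `twic*.pgn` pattern are hypothetical), and two games count as doubles when White, Black, Date and the movetext all match. ChessBase’s own doubles detection uses its own criteria, so this sketch need not reproduce the 482,290 figure above.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches a PGN tag pair such as [White "Kramnik,V"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_pgn_games(text: str) -> list[str]:
    """Split PGN text into games: a tag line seen after movetext starts a new game.

    A simple heuristic, not a full PGN parser.
    """
    games: list[str] = []
    current: list[str] = []
    seen_moves = False
    for line in text.splitlines():
        is_tag = TAG_RE.match(line) is not None
        if is_tag and seen_moves:
            games.append("\n".join(current).strip())
            current, seen_moves = [], False
        if line.strip() and not is_tag:
            seen_moves = True
        current.append(line)
    if any(ln.strip() for ln in current):
        games.append("\n".join(current).strip())
    return games


def game_key(game: str) -> tuple[str, str, str, str]:
    """Identify a game by White, Black, Date and whitespace-normalised movetext."""
    tags: dict[str, str] = {}
    moves: list[str] = []
    for line in game.splitlines():
        m = TAG_RE.match(line)
        if m:
            tags[m.group(1)] = m.group(2)
        else:
            moves.append(line)
    movetext = " ".join(" ".join(moves).split())
    return (tags.get("White", ""), tags.get("Black", ""), tags.get("Date", ""), movetext)


def merge_and_dedupe(pgn_texts: list[str]) -> list[str]:
    """Combine games from several PGN texts, keeping only the first copy of each double."""
    seen: set[tuple[str, str, str, str]] = set()
    unique: list[str] = []
    for text in pgn_texts:
        for game in split_pgn_games(text):
            key = game_key(game)
            if key not in seen:
                seen.add(key)
                unique.append(game)
    return unique


if __name__ == "__main__":
    # Hypothetical layout: unzipped weekly TWIC files in ./twic/
    files = sorted(Path("twic").glob("twic*.pgn"))
    # File encoding is an assumption; undecodable bytes are replaced.
    texts = [p.read_text(encoding="utf-8", errors="replace") for p in files]
    merged = merge_and_dedupe(texts)
    print(f"{len(files)} files, {len(merged)} unique games")
    if merged:
        Path("twic_merged.pgn").write_text("\n\n".join(merged) + "\n", encoding="utf-8")
```

Keying on movetext means a game re-published with added comments would count as a different game; a stricter or looser key is a matter of taste.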

Crowther’s £30 offer of the complete TWIC database is, in my opinion, very good value for the money. This is all the more true once you consider that you can keep it updated for free by downloading new issues of TWIC each week. I also suspect that you would boost your karmic standing by supporting Crowther’s tremendous efforts with a donation.

Owners of BigBase, who do not receive weekly updates as part of their purchase, can also use new issues of TWIC to update BigBase. Just keep in mind that the standardized names used by ChessBase and TWIC are different, so if you’re interested in studying (for instance) Kramnik’s games, you’ll have to look at ‘Kramnik,Vladimir’ (BigBase) and ‘Kramnik,V’ (TWIC) to find them all.
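If you keep games from both sources in PGN form, one crude workaround for the naming mismatch is to match players on surname plus first initial. The helper below is a hypothetical sketch of mine, not a ChessBase or TWIC feature; it assumes names follow the ‘Surname,Given’ pattern shown above.

```python
from __future__ import annotations


def name_key(name: str) -> str:
    """Reduce 'Surname,Given' to 'surname,initial' (hypothetical helper).

    'Kramnik,Vladimir' (BigBase style) and 'Kramnik,V' (TWIC style)
    both become 'kramnik,v'.
    """
    surname, _, given = name.partition(",")
    initial = given.strip()[:1]
    return f"{surname.strip().lower()},{initial.lower()}"


def games_of(player: str, games: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the games in which `player` had either colour, matching on name_key."""
    key = name_key(player)
    return [
        g for g in games
        if key in (name_key(g.get("White", "")), name_key(g.get("Black", "")))
    ]


if __name__ == "__main__":
    sample = [
        {"White": "Kramnik,Vladimir", "Black": "Anand,Viswanathan", "Source": "BigBase"},
        {"White": "Giri,A", "Black": "Kramnik,V", "Source": "TWIC"},
        {"White": "Carlsen,M", "Black": "Nakamura,H", "Source": "TWIC"},
    ]
    for g in games_of("Kramnik,Vladimir", sample):
        print(g["Source"], g["White"], "-", g["Black"])
```

The trade-off is false matches: ‘Polgar,Zsuzsa’ and ‘Polgar,Zsofia’ both reduce to ‘polgar,z’, so for players who share a surname and initial you would still have to check the results by hand.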

Paramount Chess Database

The Paramount Chess Database (Paramount) represents a complementary approach to chess research. Instead of the millions of games found in the databases discussed above, Paramount only contains 113,832 games with a roughly 70/40 split between complete games and fragments. What’s the value in that, you might ask? These are the collected games of issues 1-123 of the Chess Informant series of books, legendary among players since the first one was published in 1966. There are decades of history and knowledge collected in these games.

What has traditionally separated the Informant series from other chess publications was its annotators. It was a badge of honor to have your game selected for inclusion in the Informant, and just about every major player since the 1960s has annotated for the series. All of those annotations are collected in the Paramount Database, and that’s what differentiates this product from those discussed above.

Here are some examples: there are 60 games annotated by Kasparov in MegaBase 2016, and 592 in Paramount. Anand annotated 506 games in Paramount and 267 in MegaBase. Older players like Larsen, Petrosian and Tal each have hundreds of annotated games in Paramount, while their notes in MegaBase can cumulatively be counted on two hands.

Why is this important? Others might provide competent notes, especially in the age of the computer, but games annotated by the combatants themselves have a special value. This is where the Paramount database shines, albeit with one caveat. You are more likely to find annotations by today’s Super GMs in MegaBase than in Paramount due to editorial shifts in Belgrade.

How might a player use the Paramount database? Two avenues come to mind. First, this database is very well suited to doing the kind of historical opening research championed by Kasparov in Garry Kasparov on Modern Chess: Revolution in the 70s. It’s hard to think of a better way to gain insight into, say, the Zaitsev Ruy than to actually study the games and notes that created modern theory, most of which appear in Paramount. The database can also be used to study the most important games of specific players, many of which are (as noted above) annotated by the players themselves.

One nice feature of the Paramount package is the way in which the data is presented after installation. You get a complete database of all the games, but dozens of smaller databases organized by opening, player and annotator are also included. This makes studying a specific player or important opening very easy. Each issue of the Informant appears in its own file, and the data is also provided in .pgn format.

The Paramount Chess Database is available by download or post for $199 from the publisher, although you can find discounted deals at various chess retailers on the web.

Summary

There is no substitute for having a large research database such as MegaBase or BigBase at your disposal for pre-game preparation, opening research and general chess study. Because MegaBase comes with annotated games, weekly updates and the PlayerBase, it is the premier database product on the market today. Serious opening analysts and correspondence players should absolutely consider supplementing BigBase or MegaBase with CorrBase.

Not everyone can afford MegaBase. For those on a budget, BigBase is an adequate stand-in for MegaBase. For those less interested in historical games and more in recent examples, Mark Crowther’s complete The Week in Chess database is perhaps a more worthy and cost-effective replacement.

Downloading the free weekly updates of TWIC and maintaining a stand-alone TWIC database should be part of every ambitious player’s weekly schedule, even if you own MegaBase and use the update subscription service. Games appear at different times in the TWIC and MegaBase updates, so if you’re doing pre-game scouting on an opponent, you should have a look at both sources.

The Paramount Chess Database has a different role to play in your research portfolio. Paramount is a wonderful historical document, a font of opening ideas to be mined and a tremendous source of well-annotated games by the best players of the past half-century. It is a superb complement to your reference database of choice, but it does not replace the need for one.