Data Coverage

There is a lot of data on the site. Some of it is ours and much of it comes from RetroSheet.org. This page attempts to give a complete list of what we do and do not have on the site, including when stats were recorded and when they are missing. Please let us know if there is an item you would like to see reported here.

Full Season Stats

We consider the start of major league baseball to be 1871. The game has changed a lot since then, and so has the recordkeeping. Below we summarize the seasons for which we have complete data. An entry of "NO" means the data is completely missing for that season, "Partial" means we have it for some leagues or some players in that season, and "YES" means we have a value for everything. "YES" doesn't mean there are no errors, just that we have a value for it.

Batting

Pitching

Fielding

Minor Leagues

Play-by-Play

From 1974 on we have complete play-by-play (PBP) accounts for all games. We likewise have complete accounts for all postseason and all-star games. Pre-1973, we have a good deal of PBP, but a few games are missing and we can only present box scores for those games. This means that WPA, RE24 and other PBP-dependent stats are incomplete for those seasons. Below is a list of games for which we have full play-by-play or only box score data, along with the percentage of all games that year for which we have PBP.
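As a rough illustration of how a per-season PBP percentage like the one described above could be computed, here is a minimal sketch. The record layout (season, game ID, a has-PBP flag) and the sample rows are hypothetical and are not taken from the site's data.

```python
from collections import defaultdict

# Hypothetical records: (season, game_id, has_pbp). Illustration only.
games = [
    (1950, "G1", True),
    (1950, "G2", False),  # box score only
    (1950, "G3", True),
    (1950, "G4", True),
    (1974, "G5", True),
]

def pbp_coverage(games):
    """Return {season: percent of that season's games with full play-by-play}."""
    total = defaultdict(int)
    with_pbp = defaultdict(int)
    for season, _game_id, has_pbp in games:
        total[season] += 1
        if has_pbp:
            with_pbp[season] += 1
    return {season: 100.0 * with_pbp[season] / total[season] for season in total}

coverage = pbp_coverage(games)
```

A season below 100% here is one where PBP-dependent stats such as WPA and RE24 would be incomplete.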

Missing Play-by-Play

Hit Location Data and Batted Ball Type Data

Please note that this data is not 100% complete, and that locations
and trajectories have been measured differently in different years.
We have attempted to merge different sources whenever possible to have
as complete a dataset as possible. Here is the coverage for the 50+
years of data that we have on hand. The table below looks at all
balls in play and gives a breakout of the percent of the time we know
the trajectory (and the type, to show how this has changed), the
percent of the time we know the location and who fielded the ball,
and the percentage of plays that result in air_outs or ground_outs.
The fielder percentage won't be 100%, as there is no fielder for home
runs and some other plays such as ground rule doubles. Even in cases
where the trajectory and location are not exactly known, we may still
know the fielder (even for hits) and whether a ground ball or fly ball
out was recorded and by whom.

Note: for 2000-2002, home runs were classified with empty
batted ball types in our data source. We have reclassified all of
these hits as fly balls. Probably 20% of these home runs should be
line drives, and perhaps 1-2 per year should be ground balls. We
realize this is a simplification, so please adjust your expectations
of splits, etc. accordingly.
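The reclassification described in the note can be sketched as follows. This is only an illustration: the field names (`season`, `event`, `batted_ball_type`) and the value labels are assumptions made for the example, not the site's actual schema.

```python
# Hypothetical play records; field names and values are assumptions.
plays = [
    {"season": 2001, "event": "HR", "batted_ball_type": ""},
    {"season": 2001, "event": "1B", "batted_ball_type": ""},
    {"season": 2001, "event": "HR", "batted_ball_type": "line_drive"},
    {"season": 2005, "event": "HR", "batted_ball_type": ""},
]

def reclassify_home_runs(plays, first=2000, last=2002):
    """Return copies of plays where untyped home runs in first..last become fly balls."""
    fixed = []
    for play in plays:
        play = dict(play)  # copy so the input records are left untouched
        if (
            first <= play["season"] <= last
            and play["event"] == "HR"
            and not play["batted_ball_type"]
        ):
            play["batted_ball_type"] = "fly_ball"
        fixed.append(play)
    return fixed

fixed = reclassify_home_runs(plays)
```

Only home runs with an empty type inside the 2000-2002 window are changed. Other events and home runs that already carry a type are left alone.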

Hit Locations

Pitch Data

The pitch data is only given when we know the values for the entire
game and for all plays in the game. This report does not include
pitch type or velocity. Instead, it records the sequence of balls and
strikes, fouls, swinging strikes, pitchouts, etc.
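The whole-game rule above (pitch data is reported only when every play in the game has it) might be expressed like this. The tuple layout and the letter strings standing in for pitch sequences are placeholders for illustration, not a description of the actual data format.

```python
from collections import defaultdict

# Hypothetical (game_id, pitch_sequence) pairs; an empty string means
# the sequence for that play is unknown.
plays = [
    ("G1", "CBX"),
    ("G1", "BBFS"),
    ("G2", "X"),
    ("G2", ""),  # missing sequence, so G2 is excluded entirely
]

def games_with_complete_pitch_data(plays):
    """Return the set of game IDs in which every play has a pitch sequence."""
    by_game = defaultdict(list)
    for game_id, sequence in plays:
        by_game[game_id].append(sequence)
    return {game_id for game_id, seqs in by_game.items() if all(seqs)}

complete = games_with_complete_pitch_data(plays)
```

Under this rule a single play without a sequence removes the whole game from the pitch data, even if its other plays are known.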

Please note that this data is not 100% complete, and we have merged
several datasets when producing this data. The data is essentially
complete back to 1998, and there is a great deal of data for the
years back to 1988. Before 1988, only a few years have data. For
example, Allan Roth of the Dodgers compiled such data for many, many
Dodgers games from the '50s and '60s.

Below is the percentage of all plays in each season that are missing
pitch sequence data.

Pitch Results

Weather Data

The weather data is based on conditions at the start of the game.
Below we show the percentage of each data set (temperature, wind
speed and direction, etc.) that is not null (or unknown). This data is
included in the RetroSheet data files and is provided as is, and it
most certainly contains some errors. There is no weather data pre-1950.
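As a small sketch of the "percent not null" figure described above, the following computes, for each weather field, the share of games with a known value. The field names and sample rows are made up for the example and do not reflect the RetroSheet file layout.

```python
# Hypothetical per-game weather rows; None marks an unknown value.
games = [
    {"temp": 72, "wind_speed": 8, "wind_dir": "out to CF"},
    {"temp": None, "wind_speed": 5, "wind_dir": None},
    {"temp": 65, "wind_speed": None, "wind_dir": None},
    {"temp": 80, "wind_speed": 10, "wind_dir": "in from LF"},
]

def percent_known(games, fields):
    """Return {field: percent of games where that field is not None}."""
    if not games:
        return {field: 0.0 for field in fields}
    n = len(games)
    return {
        field: 100.0 * sum(game.get(field) is not None for game in games) / n
        for field in fields
    }

known = percent_known(games, ["temp", "wind_speed", "wind_dir"])
```

Note that a known value is not necessarily a correct one: this only measures whether a value is present.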