Hayong Yun and I have parsed all of the fields appearing in headers for 10-K forms
(including 10-K405, 10KSB, and 10KSB40 forms) available on the SEC’s EDGAR
website. Currently the data includes all filings from 1994-2010 (N = 170,413).
Note that the SEC did not require electronic filing until May 1996, so the
first two years of the sample are biased toward large firms. For each firm,
only the first 10-K filing in a given year is included in the sample.
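
The "first filing per firm per year" rule can be illustrated with a minimal Python sketch. This is not the code used to build the dataset. The record layout, the CIK values, and the choice to define "year" as the calendar year of the filing date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical filing records: (cik, filing_date, form_type). Values are made up.
filings = [
    ("0000000001", date(2009, 10, 27), "10-K"),
    ("0000000001", date(2009, 12, 1), "10-K"),    # later filing in the same year
    ("0000000001", date(2010, 10, 27), "10-K"),
    ("0000000002", date(2009, 7, 30), "10-K405"),
]

def first_filing_per_year(records):
    """Keep the earliest filing for each (cik, calendar year) pair.

    Assumption: "year" means the calendar year of the filing date.
    """
    earliest = {}
    for cik, filed, form in records:
        key = (cik, filed.year)
        if key not in earliest or filed < earliest[key][1]:
            earliest[key] = (cik, filed, form)
    return sorted(earliest.values())

sample = first_filing_per_year(filings)
```

Here the second 2009 filing for the first firm is dropped, leaving one observation per firm/year.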

There are 1,034 cases where the CIK in the file name (f_cik)
is not equal to the CIK reported in the document header (cik). These
cases occur when the header contains multiple “filings” (typically utilities with multiple
subsidiaries). When the header contains multiple filing fields, we provide data
for the first filing listed.
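
To show how such a mismatch can arise, here is a rough Python sketch under stated assumptions. The header excerpt, the company names, the CIK values, and the helper names are hypothetical. The sketch assumes each filer block carries a "CENTRAL INDEX KEY:" line, and it is not the parser used to build the dataset.

```python
import re

# Hypothetical header excerpt with two filer blocks; names and CIKs are made up.
header = """\
FILER:
    COMPANY DATA:
        COMPANY CONFORMED NAME:     EXAMPLE PARENT UTILITY CO
        CENTRAL INDEX KEY:          0000000111
FILER:
    COMPANY DATA:
        COMPANY CONFORMED NAME:     EXAMPLE SUBSIDIARY POWER CO
        CENTRAL INDEX KEY:          0000000222
"""

def header_ciks(text):
    """Return every CIK found in the header, in order of appearance."""
    return [int(m) for m in re.findall(r"CENTRAL INDEX KEY:\s*(\d+)", text)]

def compare_ciks(f_cik, text):
    """Use the first header CIK as cik and flag a mismatch with f_cik."""
    ciks = header_ciks(text)
    cik = ciks[0] if ciks else None
    return {"f_cik": f_cik, "cik": cik, "mismatch": cik != f_cik}

# The file name carries the subsidiary's CIK, but the parent is listed first.
result = compare_ciks(222, header)
```

In this example the file is stored under the second filer's CIK while the first filer's fields are the ones recorded, so f_cik and cik differ.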

The variables “lagzip” through “ma_state_fips” are derived data. If a firm’s
latitude and longitude differ from their one-year lagged values, the
firm/year observation is identified with a dummy variable (“mover”=1),
along with the distance in kilometers from the prior location
(“distance”).
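
The following Python sketch shows one way “mover” and “distance” could be computed from current and lagged coordinates. It assumes a great-circle (haversine) distance and a mean Earth radius of 6371 km. The source does not state which distance formula was used, and the function and argument names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the source

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def mover_and_distance(lag_lat, lag_lon, lat, lon):
    """Return (mover, distance): mover is 1 if the coordinates changed, else 0."""
    if (lag_lat, lag_lon) == (lat, lon):
        return 0, 0.0
    return 1, haversine_km(lag_lat, lag_lon, lat, lon)

# Hypothetical firm that stayed put, then one that moved one degree north.
stayed = mover_and_distance(40.0, -75.0, 40.0, -75.0)
moved = mover_and_distance(40.0, -75.0, 41.0, -75.0)
```

The sketch does not handle a firm's first year in the sample, where no lagged location exists. How such observations are coded in the dataset is not specified here.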

The data are in standard Stata .dta format. The size of the dataset
requires the Stata command “set mem 200m” before
the “use” statement importing the data. The variables and their definitions are as follows: