Abstract

Large blocks of stock play an important role in many studies of corporate governance and finance. Despite this important role, there is no standardized data set for these blocks, and the best available data source, Compact Disclosure, has many mistakes and biases. In this paper, we document these mistakes and show how to fix them. The mistakes and biases tend to increase with the level of reported blockholdings: in firms where Compact Disclosure reports that aggregate blockholdings are greater than 50%, these aggregate holdings are incorrect more than half the time, and average holdings for these incorrect firms are overstated by almost 30 percentage points. For researchers using uncorrected blockholder data as a dependent variable, these errors will increase the standard errors of coefficient estimates but do not appear to cause bias. However, we find that if blockholdings are used as an independent variable, economically significant errors-in-variables biases can occur. We demonstrate these biases using a representative analysis of the relationship between firm value and outside blockholders. An online appendix to our paper provides a “clean” data set for our sample firms and time period. For researchers who need to work outside of this sample, we also test the efficacy of alternative (cheaper) fixes to this data problem, and find that truncating or winsorizing the sample can remove about half of the bias in our representative application.

Introduction

Large-block shareholders play an important role in corporate governance. For this reason, the presence of such “blockholders” and the size of their holdings are common explanatory variables in financial research. In just the last few years, a representative sample of such studies includes analyses of the role of blockholders in executive turnover, executive compensation, firm diversification, discretionary expenses, market liquidity, IPO underpricing, organizational efficiency, and corporate performance. Furthermore, blockholder data is a crucial input in the analysis of the relationship between ownership structure and firm value, where seminal works by Demsetz and Lehn (1985) and Mørck et al. (1988) gave rise to a vast and growing literature.
Despite the common use of large shareholder data, there is no clean off-the-shelf database to facilitate research. Many of the papers cited above required their authors to gather their own data. This time-consuming task is necessary because of several weaknesses in the available databases. Of course, decentralized data gathering causes duplication of effort and a lack of standardization across projects. Also, because of the large time commitment necessary to clean the data for each firm, most researchers have gathered data for a relatively small number of firms. This paper aims to fill this data gap by documenting the problems with the currently available data, proposing a consistent set of solutions to these problems, and making a “clean” database freely available to all researchers. Furthermore, we demonstrate the superiority of clean (vs. raw) data with a representative study on the relationship between outside blockholders and firm value, and we discuss some alternatives to this exhaustive cleaning for other samples.
The Securities Exchange Act of 1934 (SEA) lays out the ownership disclosure requirements for public corporations in Regulation 14A and Schedule 14A. Virtually everything we know about blockholders in the United States comes from these disclosure requirements, which are described in detail in Appendix A of this paper. The two main types of data produced by the SEA are for holdings (once per year, reported in the annual proxy statement) and for transactions by corporate insiders and beneficial owners (updated through Forms 3, 4 and 5). While the trading data would appear to provide the most current and comprehensive information, past research has demonstrated that this data is difficult to work with and cannot be relied upon to infer the holdings of individual blockholders (Anderson and Lee, 1997a, 1997b; Jeng et al., 2003). Thus, we focus in this paper on the annual proxy data, which is more reliable and more commonly used by researchers.
Proxy data is available from many sources, including direct electronic access using the SEC's “Edgar” tool for all corporate filings since the mid-1990s. For large-scale data downloads, however, it is necessary to use a commercial product. The most widely used product is the Compact Disclosure (CD) database of Standard and Poor's. Anderson and Lee (1997a, 1997b) focus their analysis on the holdings of corporate officers and directors, and show that CD accurately reproduces the information in proxy statements for all firms except those with multiple classes of stock. While CD also reproduces data on blockholders from the tables in the proxy statement, there are additional problems with these data. We discuss these problems and their solutions in Section 2, and summarize the changes for a large sample of firms from 1996 to 2001. For researchers using blockholder data in regression analysis, the raw data present an errors-in-variables problem. If blockholder data is used as a dependent variable, then these errors only cause biases if they are correlated with the regressors. In Section 2 we demonstrate that the errors are independent of a set of logical regressors, so bias is unlikely for many applications.
If, however, blockholdings are used as an independent variable, then there are several possible biases. In Section 3, we perform a representative study using both raw CD data and a “clean” data set where the CD data problems have been fixed. In our sample, we find that the raw data is much noisier: in annual regressions of Tobin's Q on outside blockholder ownership and other control variables, the clean data set is far more likely to yield statistically significant point estimates for the ownership variables. Furthermore, bootstrap estimations demonstrate that improved precision is the typical outcome for this regression. The good news is that the bias appears restricted to the blockholder coefficients only, with no bias induced for the coefficients on the other regressors.
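The asymmetry between a mismeasured dependent variable and a mismeasured independent variable can be illustrated with a small simulation. The sketch below is not part of the analysis in this paper: it assumes classical measurement error (mean zero and independent of everything else), whereas the errors documented here grow with the level of reported holdings. All names and parameter values (`simulate`, `ols_slope`, `beta`, `noise_sd`) are hypothetical and chosen only for illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, beta=1.0, noise_sd=1.0, seed=0):
    """Compare slopes with clean data, a noisy regressor, and a noisy outcome.

    Assumption: measurement error is classical (Gaussian, mean zero,
    independent of the true values), which is a simplification.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y_true = [beta * x + rng.gauss(0, 1) for x in x_true]
    # Add independent noise to the regressor and, separately, to the outcome.
    x_noisy = [x + rng.gauss(0, noise_sd) for x in x_true]
    y_noisy = [y + rng.gauss(0, noise_sd) for y in y_true]
    return {
        "clean": ols_slope(x_true, y_true),
        "noisy_regressor": ols_slope(x_noisy, y_true),
        "noisy_outcome": ols_slope(x_true, y_noisy),
    }


results = simulate()
for label, slope in results.items():
    print(f"{label}: {slope:.3f}")
```

Under these assumptions, noise in the outcome leaves the slope estimate centered on `beta` and only makes it less precise, while noise in the regressor shrinks the slope toward zero by the factor var(x) / (var(x) + var(noise)), which is one half for the default values.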
Since our cleaned data is only available for a subset of firms and years, researchers will also be interested in the efficacy of alternative fixes for these data errors. In Section 4 we discuss several alternatives based on truncating, winsorizing, or partial cleaning. While several of these fixes can alleviate the errors-in-variables bias, an economically significant bias still remains in all cases, with the best fix eliminating approximately one-half of the bias. Section 5 summarizes and concludes. Two appendices supplement the text. Appendix A provides details on the 1934 SEA and the disclosure requirements it created, and Appendix B provides details on the construction of our sample.
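To make the distinction between truncating and winsorizing concrete, the following minimal sketch applies both to a list of aggregate blockholdings. The holdings values, the function names, and the 90th-percentile cutoff are all hypothetical; the cutoffs actually used in Section 4 are not reproduced here.

```python
def percentile_bounds(values, lower_pct, upper_pct):
    """Return the values at the given lower and upper percentile ranks.

    Uses a simple floor-of-rank rule; other percentile conventions exist.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower_pct * last)], ordered[int(upper_pct * last)]


def winsorize(values, lower_pct=0.0, upper_pct=0.95):
    """Keep every observation, but clamp extremes to the percentile bounds."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, low), high) for v in values]


def truncate(values, lower_pct=0.0, upper_pct=0.95):
    """Drop observations that fall outside the percentile bounds."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if low <= v <= high]


# Hypothetical aggregate blockholdings in percent; the last value mimics an
# implausibly large reported holding.
holdings = [0.0, 5.2, 7.5, 10.0, 12.3, 15.0, 18.4, 22.0, 30.5, 95.0]

winsorized = winsorize(holdings, upper_pct=0.9)
truncated = truncate(holdings, upper_pct=0.9)

print(winsorized)
print(truncated)
```

Winsorizing preserves the sample size and replaces the extreme value with the cutoff, while truncating removes the observation altogether. Neither approach corrects errors in the observations that remain below the cutoff, which is consistent with the finding that such fixes remove only part of the bias.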

Conclusion

Researchers rely on ownership data for many studies. The lack of a standardized source of data on large blockholders is an impediment to this work. In this paper, we document the weaknesses in the commonly used data, show how to fix them, and demonstrate that these fixes are both quantitatively large and important for some applications.
The measurement error in blockholder data creates several possibilities for bias. Our analysis suggests that empirical work with blockholder data as the dependent variable will produce unbiased results, as the measurement error is not correlated with the other firm-level characteristics that we tested. While the measurement error does increase standard errors, the increase is not severe.
Researchers who use blockholder data as an independent variable face a larger challenge. In a representative analysis of firm value and blockholdings, we find that using the uncorrected raw data leads to significant biases for the blockholder coefficients, and simple fixes such as truncating or winsorizing the sample provide only partial alleviation of these biases. The coefficients on other regressors do not appear to have biases. Thus, if the blockholder effect is the key independent variable, we believe it is necessary to work with a cleaned sample. If blockholder data is only being used as a control variable, then a cleaned sample is much less crucial.