Context Until recently, Web of Science was the only database available to track citation counts for published articles. Other databases are now available, but their relative performance has not been established.

Objective To compare the citation count profiles of articles published in general medical journals among the citation databases of Web of Science, Scopus, and Google Scholar.

Design Cohort study of 328 articles published in JAMA, Lancet, or the New England Journal of Medicine between October 1, 1999, and March 31, 2000. Total citation counts for each article up to June 2008 were retrieved from Web of Science, Scopus, and Google Scholar. Article characteristics were analyzed in linear regression models to determine interaction with the databases.

Main Outcome Measures Number of citations received by an article since publication and article characteristics associated with citation in databases.

Results Google Scholar and Scopus retrieved more citations per article with a median of 160 (interquartile range [IQR], 83 to 324) and 149 (IQR, 78 to 289), respectively, than Web of Science (median, 122; IQR, 66 to 241) (P < .001 for both comparisons). Compared with Web of Science, Scopus retrieved more citations from non–English-language sources (median, 10.2% vs 4.1%) and reviews (30.8% vs 18.2%), and fewer citations from articles (57.2% vs 70.5%), editorials (2.1% vs 5.9%), and letters (0.8% vs 2.6%) (all P < .001). On a log10-transformed scale, fewer citations were found in Google Scholar to articles with declared industry funding (nonstandardized regression coefficient, −0.09; 95% confidence interval [CI], −0.15 to −0.03), reporting a study of a drug or medical device (−0.05; 95% CI, −0.11 to 0.01), or with group authorship (−0.29; 95% CI, −0.35 to −0.23). In multivariable analysis, group authorship was the only characteristic that differed among the databases; Google Scholar had significantly fewer citations to group-authored articles (−0.30; 95% CI, −0.36 to −0.23) compared with Web of Science.

Conclusion Web of Science, Scopus, and Google Scholar produced quantitatively and qualitatively different citation counts for articles published in 3 general medical journals.

Citation counts are used to measure the impact of articles, journals, and researchers and are frequently incorporated in decisions of academic advancement. However, the validity and methods behind the procurement of citation counts have received limited attention. Until relatively recently, Web of Science was the only practical way to obtain citation counts.1 In the general medical literature, virtually all previous citation analysis studies have used this database exclusively,2-6 and checking the accuracy or validity of these citation counts against another measure was not feasible.

Now, however, several other citation databases have become available, including Scopus7 and Google Scholar,8 both introduced in 2004. Scopus, like Web of Science, requires a paid subscription, while Google Scholar is free. Each of these databases uses unique methods to record and count citations. The scope of these databases also differs9-12 in that Web of Science and Scopus claim strong coverage of selected peer-reviewed journals, while Google Scholar might be better able to record citations from books and nontraditional sources, such as Web sites, dissertations, and open-access online journals.

Previous studies in some scientific fields, such as computing, biology, physics, and oncology, have shown differences in citation counts among these databases.9,13,14 To our knowledge, this topic has not previously been addressed in general medicine. Differences in citation counts among the databases could have implications for citation analysis studies and in the use of citation counts for academic advancement decisions. If, however, the results across the databases are similar, then other features of the database, including cost and ease of use, may dictate preference. For this study, we chose a cohort of index articles published in 3 general medical journals and compared their citation profiles in Web of Science, Scopus, and Google Scholar.

Methods

We acquired, through hand-searching, a cohort of original research papers published in JAMA, Lancet, and the New England Journal of Medicine (NEJM) between October 1, 1999, and March 31, 2000.4 These journals were selected because they were rated the top 3 general medical journals by the impact factor. This 6-month period of publication allowed for the accrual of enough index articles to perform multivariable regression analysis. We included all articles under the following table of contents headings: Original Contributions in JAMA, Original Research–Articles in Lancet, and Original Articles in NEJM. This excluded all nonsystematic literature reviews, other original research articles, and systematic reviews that appeared under different table of contents headings in the journals. These articles were published within the time frame covered by all 3 databases (Web of Science has been shown to have an advantage in retrieving citations for older studies published before 199610). Therefore, all databases should have an equal opportunity to retrieve any citations received by these articles since publication.

We extracted 9 article characteristics for each article: (1) journal of publication (JAMA, Lancet, or NEJM); (2) study design (randomized trial, prospective observational study, retrospective study, meta-analysis, or survey); (3) clinical category: medical subspecialty to which the main conclusion of the article was most applicable (cardiovascular, general medicine, infectious disease, obstetrics/gynecology, oncology, or other); (4) whether the author byline included group authorship; (5) whether the research was performed partly or fully in the United States (meaning that research participants were recruited within the United States or, for studies that did not use research participants, the address of the corresponding author was within the United States); (6) sample size of the study (in cases of meta-analysis, the sample size was the total number of patients in all analyzed studies); (7) declared for-profit industry funding; (8) whether the article studied a drug or medical device; and (9) whether the study had been reported contemporaneously by the Associated Press in the news media based on a daily search of the Associated Press news wire during the 6-month period during which the articles were published (plus an additional 7 days). These data were extracted as previously described.4

In June 2008, 2 of us (B.A., I.S.), working independently, determined the total cumulative citation counts to date for all articles according to the Web of Science's Science Citation Index, Scopus, and Google Scholar. Repeat, independent citation searches were performed for the first 30 articles (based on their chronological order of publication) and a further 50 randomly selected articles by 2 of us (A.V.K., J.W.B.) to further confirm accuracy of data collection. No discrepancies were found. For any 1 article, no more than 7 days elapsed between our assessments of the 3 databases. For citations from Web of Science and Scopus, we also recorded the type of citing document as categorized by the databases (article, review, editorial, letter, or other) and the language of the citing document. This information was not available through Google Scholar.

To determine whether the citations retrieved by the databases were, in fact, true citations of the index articles, we reviewed a sample of citing documents for accuracy. Accuracy was defined as the percentage of citing documents that truly cited the index article. After eliminating 7 articles that had received fewer than 5 citations in any database, we randomly selected 60 index articles from the remaining 321 and checked the accuracy of 5 citing documents for each article from within each database (for a total of 300 citing documents per database). We used systematic sampling to select the 5 citing documents as follows: all citing documents for the index article were ranked by the number of times they had been cited within the database and we selected the first citing document followed by every nth document (where n = total number of citing documents divided by 5). The reference list for each citing document was reviewed to establish whether the index article had actually been cited. The sample size of 300 citing documents per database was selected to provide a 95% confidence interval (CI) of ±3% for the estimate of accuracy, assuming the accuracy was at least 95%.
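The systematic sampling step described above can be sketched as follows. This is an illustration only, not the procedure as actually run: the function name, the example data, the most-cited-first ordering, and the rounding down of n to a whole number are all assumptions, because the text does not specify how a non-integer n was handled.

```python
def systematic_sample(ranked_docs, k=5):
    """Pick the first document, then every n-th one, where n = len(ranked_docs) // k."""
    if len(ranked_docs) < k:
        raise ValueError("need at least k citing documents")
    step = len(ranked_docs) // k  # assumption: n is rounded down to an integer
    return [ranked_docs[i * step] for i in range(k)]


# Hypothetical citing documents as (identifier, times cited within the database)
cite_counts = [3, 41, 7, 0, 18, 95, 2, 11, 60, 5, 27, 1]
citing = [(f"doc{i}", cites) for i, cites in enumerate(cite_counts)]

# Assumption: ranking is from most cited to least cited
ranked = sorted(citing, key=lambda d: d[1], reverse=True)
sample = systematic_sample(ranked)
print([doc_id for doc_id, _ in sample])
```

With 12 citing documents, n is 2, so the sketch takes the documents ranked 1st, 3rd, 5th, 7th, and 9th. Spreading the picks across the ranking avoids checking only the most highly cited citing documents.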

Citations per year for each index article were calculated as the total number of citations received divided by the number of years since publication (ie, the number of months between publication of the index article and June 2008, divided by 12). To assess differences in citation counts among the databases, we used nonparametric Friedman analysis of variance and the Wilcoxon matched-pairs test. Differences in the percentage of citing documents by type (article, review, editorial, or letter) and language (English vs non-English) between Web of Science and Scopus were compared using the Wilcoxon matched-pairs test and the percentage increase in citations among the journals was compared using the Kruskal-Wallis test.
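As a minimal sketch of the citations-per-year calculation, assuming whole calendar months are counted and using a hypothetical publication date (the function names are illustrative and not part of the original analysis):

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (assumption: days are ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def citations_per_year(total_citations, published, searched=date(2008, 6, 1)):
    """Total citations divided by years since publication (months / 12)."""
    years = months_between(published, searched) / 12
    return total_citations / years


# Hypothetical article published in December 1999 with 122 citations
rate = citations_per_year(122, date(1999, 12, 1))
print(round(rate, 1))
```

An article published in December 1999 has 102 months, or 8.5 years, of follow-up to June 2008, so 122 citations correspond to roughly 14 citations per year.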

We explored whether article characteristics were associated with citation counts differently in the 3 databases using linear regression analysis. Because of the skewed nonnormal distribution of citation counts, we log10 transformed these data for analyses; the approximation to the normal distribution was confirmed with a Kolmogorov-Smirnov test (P > .36 for all 3 databases) and examination of probability plots. For each of the 9 article characteristics, we ran a separate linear regression analysis in which we included database, the article characteristic, and an interaction term between article characteristic and database. The interaction term assessed whether any article characteristic was associated with a relatively different citation count in any 1 database compared with the others. To account for the repeated observations among articles, we included article as a random-effect variable in the model (coded from 1-328). The dependent variable in all models was total citation count (log10 transformed). All article characteristics that demonstrated an interaction with database (at P < .10) were entered into a multivariable model. Variance inflation factors for all variables were less than 10, indicating no important multicollinearity.15 All comparisons were 2-tailed and P < .05 was considered statistically significant. All analyses were performed with SPSS Advanced Statistics version 17.0 (SPSS Inc, Chicago, Illinois). Data are reported as median and interquartile ranges (IQRs).
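For a single binary article characteristic, the model described above might be written as follows. The notation is illustrative and not taken from the original analysis; in particular, treating Web of Science as the reference database is an assumption based on how the comparisons are reported.

```latex
\log_{10}(y_{ij}) = \beta_0
  + \beta_S D^{S}_{j} + \beta_G D^{G}_{j}
  + \beta_x x_i
  + \beta_{xS}\, x_i D^{S}_{j} + \beta_{xG}\, x_i D^{G}_{j}
  + u_i + \varepsilon_{ij}
```

Here y_{ij} is the total citation count of article i in database j, D^S_j and D^G_j are indicators for Scopus and Google Scholar, x_i is the article characteristic, u_i is the random effect for article i (1 to 328), and \varepsilon_{ij} is the residual. The interaction coefficients \beta_{xS} and \beta_{xG} capture whether the characteristic is associated with a relatively different citation count in that database compared with the reference.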

Results

Characteristics of the 328 articles (median sample size of 642 [IQR, 147-6363]) are shown in Table 1. There were 15 meta-analyses with a median sample size of 5893 (IQR, 930-261 769) and 313 other studies with a median sample size of 607 (IQR, 145-5889). The 328 articles received a total of 68 088 citations in Web of Science, 82 076 citations in Scopus, and 83 538 in Google Scholar. All articles received at least 1 citation in all databases except for 1 article that could not be located in Google Scholar and was assigned 0 citations. The median number of citations per article was significantly different among the 3 databases with lower counts in Web of Science (122; IQR, 66-241) compared with either Scopus (149; IQR, 78-289) or Google Scholar (160; IQR, 83-324) (P < .001 for both comparisons; Table 2). Compared with Web of Science, Google Scholar provided a median of 37% additional citations for JAMA articles, 32% for Lancet articles, and 30% for NEJM articles (P = .22). Compared with Web of Science, Scopus provided a median of 19% additional citations for Lancet articles, 19% for NEJM articles, and 18% for JAMA articles (P = .48).

The citation accuracy for a sample of 300 citing documents differed among the databases: 98.0% (95% CI, 96.4% to 99.6%) for Google Scholar (6 of the 300 citing documents did not cite the index article) and 100% for both Scopus and Web of Science (P = .002 by χ2 test).
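The method used for the confidence interval is not stated. As a hedged check, a normal-approximation (Wald) interval, which is an assumption here, reproduces the reported figures for Google Scholar (294 of 300 citing documents accurate):

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (assumed method, not confirmed)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Observed accuracy for Google Scholar: 294 of 300 citing documents
low, high = wald_ci(294, 300)
print(round(low * 100, 1), round(high * 100, 1))

# Planning scenario from the Methods: accuracy of 95% with 300 documents
plan_low, plan_high = wald_ci(285, 300)
plan_half_width = (plan_high - plan_low) / 2
print(round(plan_half_width * 100, 1))
```

Under this assumption the interval is 96.4% to 99.6%, matching the text, and the planning scenario gives a half-width of about 2.5 percentage points, within the ±3% stated in the Methods.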

In a regression analysis, the article characteristics of presence of declared industry funding, study of a drug or medical device, and group authorship demonstrated interaction with database for Google Scholar compared with Web of Science (all P < .10; Table 3). In each case, these article characteristics were associated with higher citation counts in all databases, but the effect was significantly less in Google Scholar than in either Scopus or Web of Science. We entered all 3 article characteristics with their interaction terms into a multivariable model. The only interaction term that remained significant was group authorship. Compared with Web of Science, Google Scholar had significantly fewer citations to group-authored articles even though Google Scholar had more citations overall (Table 3).
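Because the models were fit on a log10 scale, a coefficient can be back-transformed to a ratio by raising 10 to its power. The sketch below applies this to the multivariable interaction coefficient for group authorship reported in the abstract (−0.30; 95% CI, −0.36 to −0.23). The function name is illustrative, and reading the result as a relative ratio is an interpretation, not a statement from the original analysis.

```python
def ratio_from_log10(coef):
    """Convert a coefficient on the log10 scale to a multiplicative ratio."""
    return 10 ** coef


# Interaction coefficient and 95% CI limits for group authorship (Google Scholar vs Web of Science)
estimate = ratio_from_log10(-0.30)
lower = ratio_from_log10(-0.36)
upper = ratio_from_log10(-0.23)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

On this reading, the citation advantage associated with group authorship in Google Scholar is about half (ratio of roughly 0.50, with limits of about 0.44 to 0.59) of that seen in Web of Science.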

Comment

We found that, for a sample of 328 high-profile general medicine articles, Google Scholar and Scopus retrieved a greater number of citations than Web of Science. Scopus retrieved a greater proportion of non-English and review citations, and Web of Science retrieved more citations from articles, editorials, and letters. Studies with declared industry funding, those that studied a drug or medical device, and those with group authorship were all associated with more citations in Scopus and Web of Science compared with Google Scholar. In multivariable analysis, however, only group-authored articles maintained a significant citation count difference among databases, and were associated with significantly fewer citations in Google Scholar. The citation accuracy of Google Scholar was found to be slightly lower than that of Scopus or Web of Science.

Strengths of our methods include having a sample of index articles representing a broad range of study designs and medical subspecialties. The 8 years between publication of the articles and our search of the databases allowed time for the articles to develop an established citation history in a variety of sources, including traditional peer-reviewed journals, open-access journals, conference proceedings, dissertations, and books. This was important to fully explore the putative advantages of each database. Our data collection was comprehensive and careful, including independent abstraction of data at all stages conducted by trained reviewers.

Our study has potential limitations. We did not identify the degree to which citations overlapped among the databases. Previous work has shown that the degree of overlap of citations appears to vary by field of study,9 but to be no more than 58% between Web of Science and Scopus,16 with no more than 31% of citations overlapping in all 3 databases.9,16 Our sample of articles was acquired from select journals in general medicine and our results might not apply to other journals. The citation databases in our study are evolving and our results only apply to the period of database access of our study. In addition, we did not include previously studied variables such as statistically significant results and industry-favoring results.4 Given the heterogeneous mix of studies in our sample, the concept of statistically significant results was difficult to determine uniformly or accurately. We have also determined that defining a result as industry-favoring is difficult to do objectively.

The Web of Science has long defined the standard for determining which citations are counted. The Web of Science claims as one of its strengths the selection process for only including certain journals in its content coverage. A description on the Web of Science Web site17 refers to Bradford's Law, first proposed in 1934, which states that the bulk of important scientific findings are reported in only a small number of journals. Therefore, the Web of Science emphasizes the quality of its content coverage, rather than the quantity. This scope of coverage, however, has been criticized for favoring North American–based, English-language journals18 and for not fully covering other citation sources, such as books.

Other citation databases offer alternative approaches to counting citations. Scopus, for example, covers more journals (approximately 15 000 peer-reviewed journals vs 10 000 for Web of Science)19 with greater relative coverage of non–North American sources.20 Scopus claims that more than half of its content originates from Europe, Latin America, and the Asia-Pacific region. Scopus also covers conference proceedings (which Web of Science also covers), trade publications, books, and several Web sources. Unlike Web of Science, however, whose content extends to 1900, Scopus is limited in its coverage of older publications, especially those before 1996.10 The automated, Web-based Google Scholar appears to include coverage of nontraditional online documents, including university theses and non–peer-reviewed Web sites. Google Scholar has been criticized,12 in part for including citations from what many would consider nonscholarly sources, such as student handbooks and administrative notes.21 Regardless, within a year of its introduction, Google Scholar was apparently responsible for bringing far more visitors to the BMJ Web site than PubMed.22

Although the content coverage of Web of Science and Scopus differs, the methods used to retrieve information are relatively similar; content is received directly from publishers, from which information, including citations, is extracted and then validated.23,24 Proprietary algorithms are used to match references to specific records. Many of the details of Google Scholar's methods have not been made public and it does not provide a list of all publishers with whom it has content agreements.13,25 Google Scholar extracts information from online content using automated robot Web crawlers, but the algorithm used to link records is not publicly known.26 Google Scholar is believed to be updated monthly,10 whereas Web of Science is updated weekly19 and Scopus daily (according to its developer, Elsevier).27

Conclusions

We found that Web of Science, Scopus, and Google Scholar produce quantitatively and qualitatively different citation counts for high-profile general medicine articles. In offering alternative scopes of coverage and search algorithms, new citation databases raise questions of how to count citations. For example, should a citation on a non–peer-reviewed Web page be viewed as quantitatively equivalent to a citation in a high-profile peer-reviewed medical journal? Future research should focus on the development of guidelines for the use and interpretation of different citation indexing databases.

Author Contributions: Dr Kulkarni had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Kulkarni, Aziz, Shams, Busse.

Acquisition of data: Kulkarni, Aziz, Shams, Busse.

Analysis and interpretation of data: Kulkarni, Busse.

Drafting of the manuscript: Kulkarni.

Critical revision of the manuscript for important intellectual content: Kulkarni, Aziz, Shams, Busse.

Statistical analysis: Kulkarni, Busse.

Administrative, technical, or material support: Aziz, Shams.

Financial Disclosures: None reported.

Funding/Support: Dr Busse is funded by a New Investigator Award from the Canadian Institutes of Health Research and the Canadian Chiropractic Research Foundation.

Role of the Sponsor: The Canadian Institutes of Health Research and the Canadian Chiropractic Research Foundation had no role in the design and conduct of the study; the collection, analysis, and interpretation of the study; or in the preparation, review, or approval of the manuscript.

Additional Contributions: We thank Stephen D. Walter, PhD (Department of Biostatistics and Clinical Epidemiology, McMaster University, Hamilton, Ontario, Canada), for his advice on the statistical analysis. Dr Walter did not receive compensation for his statistical advice.