Context: Since its launch in 2001, Wikipedia has become the most popular general reference site on the Internet and a popular source of health care information. To evaluate the accuracy of this resource, the authors compared Wikipedia articles on the most costly medical conditions with standard, evidence-based, peer-reviewed sources.

Methods: The top 10 most costly conditions in terms of public and private expenditure in the United States were identified, and a Wikipedia article corresponding to each topic was chosen. In a blinded process, 2 randomly assigned investigators independently reviewed each article and identified all assertions (ie, implication or statement of fact) made in it. Each reviewer then conducted a literature search to determine whether each assertion was supported by evidence. The assertions found by each reviewer were compared and analyzed to determine whether assertions made by Wikipedia for these conditions were supported by peer-reviewed sources.

Results: For commonly identified assertions, there was statistically significant discordance between 9 of the 10 selected Wikipedia articles (coronary artery disease, lung cancer, major depressive disorder, osteoarthritis, chronic obstructive pulmonary disease, hypertension, diabetes mellitus, back pain, and hyperlipidemia) and their corresponding peer-reviewed sources (P<.05). The same was true for all assertions made by Wikipedia for these medical conditions (P<.05 for all 9).

Conclusion: Most Wikipedia articles representing the 10 most costly medical conditions in the United States contain many errors when checked against standard peer-reviewed sources. Caution should be used when using Wikipedia to answer questions regarding patient care.

Since its 2001 launch, Wikipedia (http://www.wikipedia.org/) has become the most popular general reference site on the Internet, ranking 6th globally based on Internet traffic.1 As of March 2014, it contained more than 31 million articles in 285 languages.2 Wikipedia's prominence has been made possible by its fundamental design as a wiki, or collaborative database, allowing all users the ability to add, delete, and edit information at will. However, it is this very feature that has raised concern in the medical community regarding the reliability of the information it contains.

Despite these concerns, Wikipedia has become a popular source of health care information,3 with 47% to 70% of physicians and medical students admitting to using it as a reference.4-6 In actuality, these figures may be higher because some researchers suspect its use is underreported.7 Although the effect of Wikipedia's information on medical decision making is unclear, it almost certainly has an influence.

Wikipedia has several mechanisms in place to deal with unverifiable information and vandalism.8 Because of the frequency of editing and revisions, most instances of vandalism persist for only a few days after being identified, and half of the corrections are posted less than 3 minutes after identification.9 One study found that corrections were made almost instantaneously in 42% of cases.10 Wikipedia also encourages editors to back statements with references and to flag unverifiable statements for readers.11 Haigh12 observed that, in general, medically related articles on Wikipedia are accompanied by a sufficient number of reputable citations.

To evaluate Wikipedia's accuracy, we compared Wikipedia articles on the 10 most costly medical conditions in the United States with recognized peer-reviewed sources.

Methods

The 10 most costly conditions in the United States by public and private expenditure in 2008 (the year for which the most complete data were available for the present study) were identified from the publicly available database of the Agency for Healthcare Research and Quality.13 We then identified the 10 Wikipedia articles that we believed related most closely to each of those conditions. Because Wikipedia articles are dynamic and subject to frequent changes and updates, we printed the selected articles on April 25, 2012, for our research purposes.

In a blinded process, we randomly selected 10 reviewers, each of whom examined 2 of the selected Wikipedia articles. Each reviewer was an internal medicine resident or rotating intern at the time of the assignment. This arrangement created redundancy, giving the study 2 independent reviewers for each article. Also, by using physicians as reviewers, we ensured a baseline competency in medical literature interpretation and research. We used a Web-based randomizer (http://www.random.org) to assign the selected Wikipedia articles to each reviewer. Reviewers were asked to identify every assertion (ie, implication or statement of fact) in the Wikipedia article and to fact-check each assertion against a peer-reviewed source that was published or updated within the past 5 years. Reviewers were sent an e-mail containing examples of assertions (eg, “diuretics are the initial drug of choice for essential hypertension without co-morbidities”). We instructed the reviewers to use UpToDate (http://www.uptodate.com/) as the initial means by which to search for peer-reviewed sources. If UpToDate did not produce adequate results, each reviewer was instructed to use PubMed (http://www.ncbi.nlm.nih.gov/pubmed), Google Scholar (http://scholar.google.com/), or a search engine of his or her choice. Each reviewer then reported concordance or discordance between Wikipedia and the peer-reviewed sources. Two researchers who did not participate in the original review process then compared both reviews of each article for similar assertions as well as dissimilar assertions and tallied the concordance and discordance for each.
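The assignment described above (10 reviewers, 2 articles per reviewer, 2 independent reviewers per article) can be illustrated with a short sketch. This is not the procedure run through random.org; it is one possible scheme that satisfies the same constraints, assuming a shuffled "ring" of reviewers. The reviewer labels, the function name `assign_articles`, and the seed are hypothetical.

```python
import random

# Article titles as they appear in Table 3.
ARTICLES = [
    "Lung Cancer", "Diabetes Mellitus", "Osteoarthritis",
    "Coronary Artery Disease", "Chronic Obstructive Pulmonary Disease",
    "Hyperlipidemia", "Concussion", "Hypertension",
    "Major Depressive Disorder", "Back Pain",
]

# Hypothetical reviewer labels; the study does not name its reviewers.
REVIEWERS = [f"Reviewer {chr(ord('A') + i)}" for i in range(10)]


def assign_articles(articles, reviewers, seed=None):
    """Give every article 2 distinct reviewers and every reviewer 2 articles.

    Assumption: reviewers are shuffled into a ring, and each (shuffled)
    article is reviewed by two neighbours on that ring.
    """
    if len(articles) != len(reviewers) or len(reviewers) < 3:
        raise ValueError("this scheme needs equally many articles and reviewers (3 or more)")
    rng = random.Random(seed)
    ring = list(reviewers)
    rng.shuffle(ring)
    shuffled = list(articles)
    rng.shuffle(shuffled)
    n = len(ring)
    # Article i goes to ring positions i and i+1 (wrapping around),
    # so each reviewer appears in exactly two pairs.
    return {art: (ring[i], ring[(i + 1) % n]) for i, art in enumerate(shuffled)}


if __name__ == "__main__":
    for article, pair in assign_articles(ARTICLES, REVIEWERS, seed=2012).items():
        print(f"{article}: {pair[0]}, {pair[1]}")
```

The ring construction is only a convenient way to guarantee balance; any randomization that keeps the 2-by-2 constraint would serve the same purpose.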

The null hypothesis of the study was that there would be concordance between the Wikipedia article and the peer-reviewed sources (P>.05). The alternative hypothesis was that there would be discordance (ie, no concordance) between the Wikipedia article and the peer-reviewed sources (P<.05). A McNemar test for correlated proportions was conducted for the assertions that were similar, dissimilar, or both, as assessed by the blinded reviewers.14(pp171-178)
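For readers unfamiliar with the McNemar test, the following is a minimal sketch of how a P value is obtained from paired yes/no ratings. The test uses only the two discordant-pair counts, here called `b` and `c`. The study does not state which variant of the test was used, so both the exact (binomial) form and the continuity-corrected chi-square approximation are shown. The function names and example counts are hypothetical, and because Table 3 reports marginal tallies rather than paired cells, this sketch cannot reproduce the published P values.

```python
from math import comb, erfc, sqrt


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value.

    b and c are the two discordant-pair counts of a paired 2x2 table.
    Under the null hypothesis each discordant pair is equally likely to
    fall in either cell, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected chi-square approximation (1 degree of freedom)."""
    n = b + c
    if n == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only.
    print(mcnemar_exact(2, 12))
    print(mcnemar_chi2(2, 12))
```

The exact form is usually preferred when the number of discordant pairs is small; the chi-square form is the one most textbooks present first.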

Results

The Agency for Healthcare Research and Quality13 listed the following 10 conditions as the costliest: heart disease, cancer, mental disorders, trauma-related disorders, osteoarthritis, chronic obstructive pulmonary disease/asthma, hypertension, diabetes, back problems, and hyperlipidemia. The corresponding Wikipedia articles15-24 are listed in Table 1. Examples of the descriptive terms we used to categorize the findings of each reviewer are listed in Table 2.

Table 1.

Top 10 Most Costly Conditions in the United Statesa and Corresponding Wikipedia Articlesb

Reviewers found a statistically significant discordance between Wikipedia and peer-reviewed sources for assertions that were similar (P<.05) in all but 1 of the conditions: trauma-related disorders (ie, concussions). The same was true for all assertions found by the blinded reviewers of the articles (P<.05 for all conditions except concussions). In 4 articles (major depressive disorder, osteoarthritis, chronic obstructive pulmonary disease, and diabetes mellitus), there was a statistically significant discordance between Wikipedia articles and peer-reviewed sources for dissimilar assertions. The same interpretation of the P value applies to similar assertions between the 2 reviewers and to dissimilar assertions (Table 3).

Table 3.

No. of Similar and Dissimilar Assertions and Corresponding P Values of 10 Wikipedia Articles (a)

| Wikipedia Article | Row | Similar: Concordance | Similar: Discordance | Dissimilar: Concordance | Dissimilar: Discordance | Both: Concordance | Both: Discordance | Total |
|---|---|---|---|---|---|---|---|---|
| Lung Cancer | Reviewer 1 | 73 | 27 | 31 | 17 | 104 | 44 | 148 |
| | Reviewer 2 | 83 | 18 | 17 | 2 | 100 | 20 | 120 |
| | P value | <.001 | | .99 | | .001 | | |
| Diabetes Mellitus | Reviewer 1 | 37 | 1 | 15 | 3 | 52 | 4 | 56 |
| | Reviewer 2 | 34 | 2 | 40 | 7 | 74 | 9 | 83 |
| | P value | <.001 | | <.001 | | <.001 | | |
| Osteoarthritis | Reviewer 1 | 33 | 8 | 9 | 4 | 42 | 12 | 54 |
| | Reviewer 2 | 33 | 8 | 19 | 13 | 52 | 21 | 73 |
| | P value | .001 | | .003 | | <.001 | | |
| Coronary Artery Disease | Reviewer 1 | 17 | 7 | 24 | 4 | 41 | 11 | 52 |
| | Reviewer 2 | 19 | 9 | 8 | 5 | 27 | 14 | 41 |
| | P value | .029 | | .388 | | .012 | | |
| Chronic Obstructive Pulmonary Disease | Reviewer 1 | 36 | 16 | 8 | 3 | 44 | 19 | 63 |
| | Reviewer 2 | 63 | 10 | 24 | 3 | 87 | 13 | 100 |
| | P value | <.001 | | <.001 | | <.001 | | |
| Hyperlipidemia | Reviewer 1 | 17 | 0 | 11 | 0 | 28 | 0 | 28 |
| | Reviewer 2 | 19 | 4 | 4 | 2 | 23 | 6 | 29 |
| | P value | <.001 | | .375 | | .001 | | |
| Concussion | Reviewer 1 | 40 | 24 | 22 | 26 | 62 | 50 | 112 |
| | Reviewer 2 | 26 | 8 | 21 | 3 | 47 | 11 | 58 |
| | P value | .888 | | .56 | | .839 | | |
| Hypertension | Reviewer 1 | 27 | 13 | 29 | 11 | 56 | 24 | 80 |
| | Reviewer 2 | 62 | 12 | 7 | 0 | 69 | 11 | 80 |
| | P value | <.001 | | .481 | | <.001 | | |
| Major Depressive Disorder | Reviewer 1 | 36 | 9 | 20 | 7 | 56 | 16 | 72 |
| | Reviewer 2 | 48 | 31 | 45 | 48 | 93 | 79 | 172 |
| | P value | <.001 | | <.001 | | <.001 | | |
| Back Pain | Reviewer 1 | 34 | 2 | 36 | 8 | 70 | 12 | 82 |
| | Reviewer 2 | 29 | 2 | 13 | 2 | 42 | 4 | 46 |
| | P value | <.001 | | .383 | | <.001 | | |

(a) Concordance or discordance found between each blinded reviewer for assertions that he or she found to be similar, dissimilar, or both. P values were calculated using the McNemar test for concordance and represent the ratings of 2 researchers who did not participate in the original review process and who tallied the assertions that were found by all blinded reviewers. The terms assertions, similar, dissimilar, concordance, and discordance are defined in Table 2.

Discussion

A few studies12,25-27 have compared Wikipedia articles with standard peer-reviewed sources and have shown it to be roughly equivalent to these sources. The most notable study, by Giles,25 compared Wikipedia with the Encyclopedia Britannica. Other authors12,26,27 have compared Wikipedia with textbooks and national databases and showed comparable results. In contrast, other researchers28-30 have determined that Wikipedia is unsuitable as a reference for drugs. Except for psychiatric conditions,26 scientific research has never, to our knowledge, focused on Wikipedia's content on prevalent medical conditions. A recent study by Azer31 concluded that Wikipedia is not a reliable information source for medical students in gastroenterology and hepatology.

The present study demonstrated that most Wikipedia articles on the 10 most costly conditions in the United States contained assertions that are inconsistent with peer-reviewed sources. Because our standard was the peer-reviewed published literature, it can be argued that these assertions on Wikipedia represent factual errors.

A perplexing finding in our study was that most of the dissimilar assertions found by the reviewers failed to demonstrate discordance. A reporting bias may have plausibly occurred: each article reviewer was either an internal medicine resident or a rotating intern physician at the time of the review and may not have believed that every assertion was worth reporting. For example, the diabetes mellitus Wikipedia article stated that it is a condition in “which a person has high blood sugar.” One reviewer might have accurately recorded this statement as an assertion, whereas another might have assumed the statement to be common knowledge and erroneously not recorded it as an assertion. These incongruent criteria for assertions may explain the difference found between reviewers.

Although 9 of 10 articles demonstrated discordance between Wikipedia articles and the peer-reviewed sources, the article on concussions did not. This finding may have occurred because Wikipedia has a number of different contributors to each article and the contributors to this particular article were more expert.

The present study had 5 main limitations. First, it did not address errors of omission, but rather was designed to detect assertional errors. It is possible that the Wikipedia article did not contain important information about a topic. However, we opted not to examine errors of omission because of the subjectivity involved in determining what should be included in a review article on a specific medical topic. Second, the present study would have been stronger if more than 2 reviewers had been assigned to each article. A future study design could use additional reviewers with more varied specializations to strengthen its findings. Third, we accepted any peer-reviewed reference as a standard, and the search for references began with a subscription-only service (UpToDate). Fourth, we used physicians-in-training rather than content experts as reviewers, which may have created a bias that the present study was not designed to measure. Lastly, we did not check the assertions in the peer-reviewed sources, a limitation that may prove important because peer-reviewed sources are often not in agreement. Future studies might also examine how the convenience of Wikipedia may influence perception of the reliability of the information found.

Conclusion

Most Wikipedia articles for the 10 costliest conditions in the United States contain errors compared with standard peer-reviewed sources. Health care professionals, trainees, and patients should use caution when using Wikipedia to answer questions regarding patient care.

Our findings reinforce the idea that physicians and medical students who currently use Wikipedia as a medical reference should be discouraged from doing so because of the potential for errors.
