Essay # 4:

Errors in measurements and comparisons

This essay will review some of the difficulties that epidemiologists encounter when trying to gather good-quality data that are free of errors. There are many ways in which errors can cause discrepancies between what the epidemiologist would ideally like to observe and what is actually observed in practice.[Sackett, 1979] Errors may arise when the quality of the data differs between two comparison groups of people within a study. Errors can also occur if inadequate or incorrect methods of analysis are chosen to examine the data.

Errors in observing the data

We first consider the question of errors made at the stage of observing the data. These are sometimes referred to as measurement errors. Such errors often occur in epidemiology for the simple reason that many risk factors are inherently difficult to observe; a well-known example is the debate over the reliability of reported circumcision status.[Wynder, 1956; Lilienfeld, 1958] For instance, if an epidemiologist is studying the health effects of diet, participants may be required to remember what they have eaten over some period in the past, and how often. For most of us, it can be quite difficult to remember and then accurately report what we had for breakfast last week, so there will almost inevitably be some measurement errors in dietary data collected in this way.

Even when written documents are consulted to obtain the data, errors can still occur. For instance, medical records may fail to include any indication that a patient had suffered from headaches, because the physician may have considered them sufficiently unimportant and therefore not worthy of writing down.

Sometimes errors occur purely at random and not systematically in one direction or the other. For instance, in a study about the risk of myocardial infarction, laboratory test measurements of blood cholesterol levels will involve a degree of measurement error because of inadequacies of the methods involved or because of lack of precision of the measuring instruments. However, we might imagine that this type of error simply adds noise to the true cholesterol value so that the true “signal” is somewhat obscured. In this scenario, the errors are not linked to the outcome that one wants to study - a lab measurement error in whatever direction will not lead to a higher risk of myocardial infarction. The random measurement “noise” is rather like interference when listening to the weak signal from a distant radio station, making it difficult to hear the program clearly.
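This attenuating effect of random noise can be sketched in a small simulation. All of the numbers below (the cholesterol distribution, the risk score, and the size of the laboratory error) are invented for illustration; the point is only that a risk score genuinely correlated with true cholesterol appears less strongly correlated once random measurement error is added.

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n = 5000

# Hypothetical "true" cholesterol values (mmol/L) and a risk score
# that depends linearly on them plus biological variation.
true_chol = [random.gauss(5.2, 1.0) for _ in range(n)]
risk = [0.5 * c + random.gauss(0, 0.5) for c in true_chol]

# The laboratory measurement adds random, non-systematic error.
measured_chol = [c + random.gauss(0, 0.8) for c in true_chol]

r_true = correlation(true_chol, risk)
r_measured = correlation(measured_chol, risk)

print(f"correlation using true values:    {r_true:.2f}")
print(f"correlation using noisy measures: {r_measured:.2f}")
```

The observed correlation is systematically weaker than the true one, even though the errors themselves are unbiased, which is precisely how random noise obscures the "signal".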

In other situations, the measurements may be systematically biased in certain directions rather than occurring completely at random.[Copeland et al., 1977] For example, in a study of breast cancer, a woman with cancer might be reluctant to admit to an interviewer that she has a history of abortion, because she knows that there are (false) rumours that abortion might lead to breast cancer; the rate of abortion observed among women with breast cancer in the study sample would then be underestimated. Systematic errors of this kind will often occur for sensitive questions like abortion, and also for similar areas such as income level or other aspects of medical history that people are reluctant to discuss, particularly if they feel guilty that their behaviour has caused their disease. This leads to biased information in a study, and is called ‘information bias’.[Copeland et al., 1977]

However, even if study participants do not regard a question as sensitive or invasive, systematic errors can still occur. For example, many people inaccurately report the number of times they have seen their doctor over the past year or the number of times they have been in hospital. This recall might differ between people who are acutely ill, and suddenly remember every instance when they saw a doctor, and people who have no such worries and may tend to forget a doctor visit made for some trifling complaint.[Lane-Claypon, 1926] Similar recall differences have been noted, for example, in a study of cleft palate.[Gold et al., 1979] This is not a deliberate reluctance to provide the correct answer, but happens because memories are selective and shaped by recent events. Finally, there are physiological examples of systematic errors, such as when blood pressure is taken. It is well known that even healthy individuals become rather tense when they are undergoing a physical examination, and so the first measurement of blood pressure tends to be an over-estimate of that individual's blood pressure under normal circumstances.

As well as being a problem in observing risk factors, errors can also occur when observing study outcomes. For instance, if death certificates are used to establish the cause of death of individuals in an epidemiology study, the resulting data may not correctly identify the true underlying cause of death. Some elderly people, for instance, may have death certificates indicating their cause of death to be pneumonia, whereas in fact they had been suffering for a considerable time with the degenerative chronic diseases that were the true underlying causes of death. In some instances, therefore, researchers may be tempted to collect more detailed information about the cause of death. However, if they investigate causes of death in greater detail in the group they are most interested in than in the comparison group, this itself becomes a source of error. Such error might be called ‘assessment bias’.

The noise in the data arising from random errors makes it more difficult for epidemiologists to identify associations between risk factors and health outcomes, or even to estimate average values of health-related quantities in the population. Systematic errors, on the other hand, lead to bias and tend to distort the results of epidemiology studies, leading to erroneous conclusions.

Errors in data analysis

Errors can also occur at the data analysis stage of a study, and epidemiologists must be careful to avoid them. A good example is the evaluation of a cancer screening program. Here cancer cases are detected by screening earlier than they would be noticed by the patient when clinical symptoms first appear. At first glance, it might seem that survival is better for the screen-detected cancers than for cases that are not detected by screening. However, we must recognize that screening can provide an earlier date of diagnosis of the cancer even if it makes no difference to the ultimate prognosis.[Zelen, 1982] So the challenge here is to analyse the data correctly, separating the bias from this so-called lead-time effect from any real survival benefit achieved by screening.
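The lead-time effect can be made concrete with a toy calculation. The ages below are invented, and the sketch assumes the extreme case where screening advances the date of diagnosis by a fixed amount but does not postpone death at all; measured survival from diagnosis still appears longer in the screened group.

```python
import statistics

# Hypothetical cohort: each tuple is (age at clinical diagnosis,
# age at death), in years. Screening is assumed to detect the cancer
# LEAD_TIME years before symptoms, without changing the age at death,
# i.e. with no real benefit to the patient.
LEAD_TIME = 3.0

patients = [
    (64.0, 70.0),
    (58.0, 62.0),
    (73.0, 80.0),
    (69.0, 73.0),
]

# Survival measured from clinical diagnosis (no screening).
surv_clinical = [death - dx for dx, death in patients]

# Survival measured from screen detection: diagnosis moved earlier,
# death unchanged.
surv_screened = [death - (dx - LEAD_TIME) for dx, death in patients]

print(f"mean survival, clinical diagnosis: {statistics.mean(surv_clinical):.2f} years")
print(f"mean survival, screen-detected:    {statistics.mean(surv_screened):.2f} years")
```

The screen-detected group appears to survive exactly LEAD_TIME years longer even though not a single death was delayed, which is why a naive survival comparison overstates the benefit of screening.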

Errors arising from statistical variation in the data

Even if all of the problems of measurement error and bias in making comparisons are avoided, there is yet another potential problem, that of statistical uncertainty in the results. For almost every issue that is studied by epidemiologists, there will only be a certain probability that individuals exposed to a risk factor will experience the health event of interest, and another probability that individuals not exposed will experience the health event. If, as is often the case, the occurrence of a particular disease is rare, then even quite large study samples may result in only a limited number of disease cases being observed. For example, even if we observe a population sample of 1000 people for five years, we would expect to observe only about three cases of lung cancer, even though it is one of the more common types of cancer to occur.[Peto, 1976]
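The arithmetic behind the lung cancer example can be sketched as follows. The incidence rate used here (about 60 cases per 100,000 person-years) is simply the rate implied by the essay's figure of roughly three cases in 1000 people followed for five years, not a quoted statistic; treating the case count as approximately Poisson shows how variable such small counts are.

```python
import math

# Illustrative incidence rate implied by the essay's example:
# about 3 cases per 5000 person-years = 60 per 100,000 person-years.
rate_per_person_year = 60 / 100_000
person_years = 1000 * 5
expected = rate_per_person_year * person_years  # mean of the Poisson count

def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

print(f"expected cases: {expected:.1f}")
for k in range(7):
    print(f"P(exactly {k} cases) = {poisson_pmf(k, expected):.3f}")

p_at_most_one = poisson_pmf(0, expected) + poisson_pmf(1, expected)
print(f"P(0 or 1 cases) = {p_at_most_one:.3f}")
```

With an expected count of only three, observing zero or one case is far from unlikely, so two identical studies of this size could easily reach quite different impressions of the disease burden.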

A recent example of this problem was the review by the World Health Organization of the association between cell phone use and cancers. The WHO reviewed a large volume of data from sizable international studies of cell phone users and non-users, with a variety of types of cancer as the study outcomes. Despite the large sample sizes involved in some of these studies, the cancers involved are relatively rare, and the exposure patterns of study participants are difficult to measure. The net result was that, after considerable deliberation, the WHO was only able to conclude that cell phone use was “possibly” related to the risk of cancer, and that further study will be required to provide a definitive answer to the question. This created quite a media upheaval, but the only thing the researchers wanted to communicate was that although the data were far from firm, a link could not be ruled out.

When there is only limited information on the health outcomes, there is statistical uncertainty about the strength of any association with possible risk factors. The actual association that is observed may, by chance, be a considerable over- or under-estimate, and the range of plausible values for a summary measure such as the relative risk may be quite wide. Thus, chance fluctuations in the data may mean that we fail to detect any significant association of the risk factor with disease, even though such an association really exists. For this reason, epidemiologists are well aware of the principle that absence of proof is not proof of absence of an effect.
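The widening of that range of plausible values when cases are few can be illustrated with the standard log-scale approximation for a relative risk confidence interval. All of the counts below are invented; the sketch compares a small study and a large study with the same underlying relative risk of 2.

```python
import math

def rr_ci(a, n1, c, n0, z=1.96):
    """Relative risk and approximate 95% CI for a cohort study:
    a cases among n1 exposed vs c cases among n0 unexposed,
    using the usual standard error on the log scale."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Few cases: the interval is wide and easily includes RR = 1,
# so no significant association can be claimed.
small = rr_ci(4, 1000, 2, 1000)
print(f"small study:  RR = {small[0]:.2f}, 95% CI ({small[1]:.2f}, {small[2]:.2f})")

# Same true RR, a hundred times as many cases: a narrow interval
# that excludes 1.
large = rr_ci(400, 100_000, 200, 100_000)
print(f"large study:  RR = {large[0]:.2f}, 95% CI ({large[1]:.2f}, {large[2]:.2f})")
```

Both studies estimate the same relative risk, but only the large one can distinguish it from no effect at all, which is the sense in which absence of proof is not proof of absence.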

When data are limited, the opposite error can also occur: investigators may conclude that an association exists when in fact the observed association has arisen simply from chance variation in the data. Of course, the remedy to these difficulties is to gather more data, but then one encounters the practical and financial difficulties of actually doing so.

Conclusion

In designing and executing research studies, epidemiologists must be aware of the many errors in measurements, comparisons and analysis that might affect their results. As we have seen, random errors of measurement tend to obscure the picture in the data, and consequently study results will tend to be imprecise. Systematic errors of measurement may cause bias in the distribution of observed data values, and may also lead to biases when making comparisons between study groups. Other general categories of bias exist, such as selection bias,[Hernan et al., 2004] which can occur if the groups of people being compared differ in some systematic way that affects the study outcome. For instance, in evaluating the risks of disease or death in an occupational group, it would be inappropriate to compare people working in a certain industry with the population at large; the reason is the so-called “healthy worker effect”: individuals who are working in any occupation are generally healthier than those who are not working.

Epidemiologists respond to these challenges by trying to design measurement instruments that have good measurement properties, so that they are easily understood and interpretable, and give reliable and accurate answers. They also pay considerable attention to making sure that data analyses involving group comparisons do not suffer from the bias problems we have identified. Finally, they try to ensure that study sample sizes are sufficiently large that the problem of statistical uncertainty in the results is reduced. However, despite these best efforts, it remains true that many factors studied by epidemiologists are inherently hard to measure, and furthermore their associations with health outcomes may be only moderate or weak. This ultimately can lead to continuing uncertainty about the study findings. In such cases the epidemiologist must try to determine what additional evidence would be required to eliminate the uncertainty and arrive at more definitive conclusions.