Measurement of exposures and outcomes

Chapter 2. Measurement of exposures and outcomes

2.1 Introduction

Most epidemiological research involves the study of the relationship of one type of event or characteristic to another. Consider the following questions as examples:

* Does alcohol intake increase the risk of lung cancer?
  Alcohol (exposure) → lung cancer (outcome)
* Does hepatitis B vaccination protect against liver cancer?
  Hepatitis B vaccine (exposure) → liver cancer (outcome)

In these relationships, we assume that one event, the exposure, affects the other, the outcome.

The exposure of interest may be associated with either an increased or a decreased occurrence of disease or other specified health outcome, and may relate to the environment (e.g., air pollution, indoor radon), lifestyle (e.g., smoking habits, diet), or inborn or inherited characteristics (e.g., blood group A, fair skin). The term risk factor is often used to describe an exposure variable.

The outcome of a study is a broad term for any defined disease, state of health, health-related event or death. In some studies, there may be multiple outcomes.

The exposures and outcomes of interest are specific to study hypotheses and should always be clearly defined before the study starts. The exposure of interest in one study may be the outcome in another. For example, smoking is clearly the exposure of interest in a study that examines whether smokers are more likely to develop lung cancer than non-smokers, but would be the outcome in a study examining the effectiveness of an anti-smoking intervention programme in reducing the frequency of smoking in a certain population.

In most instances, it is not sufficient to collect information only on the exposure and outcome of interest. This is because their relationship may be mixed up with the effect of another exposure on the same outcome, the two exposures being correlated. This phenomenon is known as confounding. Consider again the relationship between alcohol intake and lung cancer.

* Does alcohol intake increase the risk of lung cancer?
  Alcohol (exposure) → lung cancer (outcome), with smoking (confounder) related to both

Suppose that a researcher observes that lung cancer occurs more often in people who drink alcohol than in those who do not. It would not be possible to conclude from this observation that exposure to alcohol increases the probability of developing lung cancer, unless the researcher can show that the observed relationship cannot be due to the fact that those who drink alcohol smoke more heavily than non-drinkers. In this example, smoking is acting as a confounder. Confounding can be dealt with when designing studies or when analysing the results, provided that the relevant data have been collected. These issues are discussed in detail in Chapters 13 and 14.

Thus, most epidemiological studies must collect information on three types of variable:

(1) the primary exposure(s) of interest,
(2) other exposure(s) that may influence the outcome (potential confounders), and
(3) the outcome(s).

It is impossible to select appropriate measurements for a particular investigation unless a specific and detailed statement of research objectives has been made. Without such a statement, information on key variables may be inadequate or missing.

This chapter discusses different ways of collecting data on exposures and outcomes.

2.2 Types of exposure

A wide range of exposures may be of interest in cancer epidemiology. These include genetic traits (e.g., blood group), demographic variables (e.g., sex, age, ethnicity, socioeconomic status), reproductive and sex-related variables, diet and body build, physical activity, smoking and alcohol habits, past medications (e.g., oral contraceptive use), environmental and occupational exposures, and infectious agents.

The characteristic of interest, the true exposure, may not be directly measurable, or it may be difficult or impossible to define. Socioeconomic status is an example of such an abstract concept. Epidemiologists commonly measure socioeconomic status using proxy variables such as occupation, income, education, and place of residence. Moreover, socioeconomic status is not per se a cause of disease, but rather an indicator of the level or probability of exposure to some underlying cause, which is often unknown.

2.3 Measurement of exposure

Data on the exposures of interest may be obtained through personal interviews (either face-to-face or by telephone), self-administered questionnaires, diaries of behaviour, reference to records, biological measurements and measurements in the environment. If a subject is too young, too ill, or dead, it is also common to obtain data from a proxy respondent, usually a member of their family.

The method chosen to collect data depends on many factors: the type of study; the type and detail of data required; availability of existing records collected for other purposes; lack of knowledge or poor recall of the exposure by subjects; sensitivity of the subjects to questioning about the exposure; frequency and level of the exposure, and their variability over time; availability of physical or chemical methods for measuring the exposure in the human body or in the environment; and the costs of the various possible methods. Often, more than one approach is used. Different components of the data often require different collection methods, and using several methods of data collection can help to validate data and to reduce error in measurement (see Section 2.6).

The information obtained should include details of the exact nature of the exposure, its amount or dose, and its distribution over time.

Nature of the exposure

The information collected should be as detailed as possible. For instance, it is better to enquire about different forms of tobacco smoking separately (cigarettes, pipes, cigars), rather than to enquire simply about smoking. Questions on types of cigarette may also be asked to obtain information on their tar content. Enquiries should also be made about the route of exposure to the agent (for example, in a study of contraceptives and breast cancer, it is important to distinguish oral contraceptives from other types of contraceptive), as well as about any behaviour that may protect against exposure (for example, in an occupational study, it is important to ask about any behaviour that may have protected the workers from being exposed to hazards, such as use of protective clothing).

Dose

Exposure is seldom simply present or absent. Most exposures of interest are quantitative variables. Smokers can be classified according to the number of cigarettes smoked daily; industrial exposures by the extent of exposure (often achieved by classifying workers according to the duration of employment and type of job); infections by dose of agent or age at exposure; breast-feeding by duration; and psychological exposures by some arbitrary scale of severity. Thus the simple situation of two groups, one exposed and one unexposed, is rare, and the conclusions of a study are greatly strengthened where there is a trend of increasing disease incidence with increasing exposure: an exposure-response relationship.

Dose may be measured either as the total accumulated dose (cumulative exposure), for example, the total number of packets of cigarettes ever smoked, or as the dose or exposure rate, for example, the number of cigarettes smoked daily. Exposure rate is a measurement of dose per unit time.

It is important to realize that although measurements of dose are usually made in the subject's external environment (e.g., levels of environmental pollution), this is not the dose that matters in biological terms. The biologically effective dose is the amount of the external agent or its active metabolite that affects cellular mechanisms in the target organs. The biologically effective dose cannot usually be measured directly, but it may be possible to obtain an estimate, an example being the measurement in humans of DNA adducts with nitrosamines or aflatoxins. Nevertheless, such measurements have their limitations: for instance, they may be useful markers of current or recent, but not of past, exposure (see Section 2.4.4).

Time

As far as possible, each exposure should be characterized as to when it began, when it ended (if at all), and how it was distributed during the intervening period (was it periodic or continuous? did the dose vary over time?). Similar details should also be obtained for any behaviour that may protect against the exposure.

There is thought to be a restricted period, the critical time window, during which the exposure could have caused cancer. Unfortunately, the beginning and end of this critical time window are not known, and its length is likely to vary between individuals. Collecting data on the timing of exposure allows the possible extent of this window to be estimated. Analyses may include examination of the effects of time since first exposure and time since last exposure.

Pattern of exposure may also be important.
Exposure that occurs periodically in intense bursts may have a different effect from a similar total amount of exposure that occurs continuously at low intensity (e.g., constant versus intermittent exposure to sunlight; chronic exposure to low levels of ionizing radiation versus acute exposure to high levels).

2.4 Sources of exposure data

Questionnaires

Questionnaires are used to collect exposure data in epidemiological studies by putting the same set of questions to each study participant in a standardized form. Questionnaires can be self-administered or may be administered by an interviewer.

The aim of a research questionnaire is to obtain, with minimal error, measurements of the exposure variables of interest for the study. Thus, the questions to be included in a questionnaire should relate directly to the objectives of the study. Some basic principles that should be taken into account when designing a questionnaire are discussed in Appendix 2.1. To ensure that the questions are properly understood and will elicit appropriate answers, questionnaires should be pre-tested on a sample of subjects from the population to be studied.

Self-administered questionnaires

Self-administered questionnaires are distributed to study subjects, who are asked to complete them. They can be delivered and returned either personally or by mail if this is feasible and more convenient. Such questionnaires are particularly appropriate when small amounts of reasonably simple data are required, or for documenting sensitive or socially undesirable behaviour. They are one of the cheapest ways of collecting information, but have the limitation that they can be used only in literate populations. The investigator also has relatively little control over the quality of the data collected.

Personal interviews (interviewer-administered questionnaires)

Using an interviewer to administer a questionnaire may reduce error by increasing the subjects' participation and motivating them to respond adequately. Moreover, an interviewer may probe to obtain more complete data. However, interviewers may also increase error if they influence the subject's responses, either directly or indirectly.

As an interview is a conversation between interviewer and respondent, it is essential that a rapport is established right from the start. Interviewers should be selected taking into account the cultural norms of the study population, so that they will be trusted by the study subjects. As a simple example, in some societies, male interviewers will not be allowed to interview women. Cultural characteristics of interviewers may also influence the degree of participation of respondents, and/or the accuracy of the information they give. The respondent must feel that the interviewer understands him or her and that there are no barriers to communication.
For collecting large amounts of complex data, face-to-face interviews are clearly best. However, when subjects are widely dispersed and the questionnaire is relatively brief, interviewing by telephone may be a better approach. Of course, this is feasible only where the telephone is widely used, which is not always the case. Even in societies where there is widespread use of telephones, certain groups of people will be excluded from the study either because they do not have a telephone or because they do not like to provide personal information over the telephone.

Proxy or surrogate respondents are people who provide information on exposure in place of the study subjects themselves (index subjects). They are used in epidemiology when the index subjects are for any reason unable to provide the data required. Studies involving children normally also rely on proxy respondents. Proxy respondents usually provide less valid information than the index subjects; for instance, they often tend to under-enumerate occupational exposures and to report the index subject as having a job of higher status than is actually the case. Closeness to the study subject is an important determinant of the quality of information obtained; in general, the most accurate information tends to come from spouses and, in the case of children, mothers.

Diaries

Diaries are detailed records of exposure kept by the subject. They are generally open-ended and take the form of a booklet in which the subject records each occurrence of a particular behaviour such as physical exercise, alcohol consumption, dietary intake, sexual activity, use of medication, etc.

Diaries are assumed to be highly accurate in measuring current behaviour, because they do not rely on memory. They also allow more detailed information about the exposure to be collected than with a questionnaire. For example, foods can be weighed by the subject before being eaten.

The main limitation of diaries is that only current exposures can be measured. In addition, diaries generally demand more of subjects in terms of time and skill than other methods, so compliance may be a problem. Training of subjects in the skills needed to keep an accurate diary can be time-consuming for both subjects and investigators. Thus, diaries are rarely used in countries in which many people are illiterate.

Records

Data on the exposure of interest may be available from census, employment, medical (in- and out-patient), cancer registry, birth certification and death certification records. Typically, as the data have already been collected for purposes other than epidemiological research, the researcher has no control over what items were recorded, how questions were phrased, and so on. Records are also often produced by a large number of people with little uniform training. Moreover, the availability and quality of records in many countries tend to be poor.

Despite these limitations, the use of records has several advantages over other methods of data collection.
Study costs are usually low, and the duration of the study is shorter because some or all of the data have already been collected. Records can also provide near-complete data on a well defined population, and information can be obtained without contacting the subjects or their relatives. Certain data items (for example, intake of medications or occupational exposures) may be recorded more accurately than information obtained in a personal interview; for instance, errors caused by poor recall or lack of knowledge of the exposure are eliminated.

Characteristics and limitations of some such routine data-collection systems are discussed in more detail in Section 2.9 and Chapter [...].

Biological measurements

In principle, the ideal approach to determining exposure involves measurements made directly on the human body or its products. Biological measurements will be more objective, in that they are independent both of the subject's perceptions and, where instrumental or laboratory methods are used, of the researcher. Biological measurements may also reflect more closely the biologically effective dose, i.e., the level of exposure that affects cells in the target organ(s).

Interest in the epidemiological application of measurements of exposure in the human body has recently been growing, with the development of increasingly refined laboratory techniques for measuring active metabolites of carcinogens and the products of their interaction with DNA or proteins (adducts). The term molecular epidemiology has been coined to describe epidemiological approaches that incorporate a laboratory component.

An example of the successful application of molecular epidemiology is the measurement of aflatoxin in the human diet. Aflatoxin is produced by the mould Aspergillus flavus, which grows on stored foods such as groundnuts in tropical climates, in particular in eastern Asia and sub-Saharan Africa. Although experiments have shown that aflatoxin is a potent inducer of liver cancer in laboratory animals, most epidemiological research has been hampered by the difficulty of measuring the amount of aflatoxin consumed by humans. Recently, biological markers for estimating current or recent aflatoxin consumption have been established, involving measurement of metabolites of aflatoxin and DNA adducts in the urine. Such measurements were made in a study undertaken in Shanghai (Qian et al., 1994), in which the incidence of liver cancer in approximately [...] Chinese men was related to urinary measurements of their exposure to aflatoxin. Results from this study have provided the most direct evidence that aflatoxin has an etiological role in human hepatocellular carcinogenesis. These biological markers are, however, not ideal, as they cannot measure past exposure, which may be crucial in studying the role of aflatoxin in liver cancer.
Laboratory assays have also been developed to ascertain exposure to infectious agents such as human papillomavirus (HPV) (Muñoz et al., 1992b) and Helicobacter pylori (IARC, 1994a). These assays have helped to clarify the role of HPV infection in the etiology of cervical cancer, and that of H. pylori in stomach cancer.

The possibility of using laboratory measurements in an epidemiological study is determined mainly by the availability of a suitable test, its feasibility (e.g., availability of laboratory equipment) and the cost. Moreover, most laboratory measurements are limited in that they can assess only current exposures, while past exposure is generally more relevant in cancer epidemiology. Thus, laboratory measurements are particularly useful when they assess attributes that remain stable, for example, genetic traits.

One way in which this limitation can be overcome is to use banks of biological specimens. Biological samples collected some time before the study subjects develop the outcome of interest can be analysed with the latest laboratory techniques. For instance, blood and urine samples may be collected from all individuals in a particular cohort at the time they enter the study and an aliquot stored frozen. These samples can be re-analysed later when more sophisticated techniques become available.

Measurements in the environment

Measurements in the environment include those of agents in the air (e.g., air pollutants, dust), water (e.g., fluoride), soil (e.g., elements), foods (e.g., nutrient composition), etc. The samples may come from homes, workplaces, recreational sites, or the ambient environment in general. Such measurements are particularly useful when the subjects are unaware of the exposure (e.g., indoor radiation levels) or cannot recall it accurately.

The value of environmental measurements depends on the procedures used both for sampling and for analysis. Ideally, environmental agents should be assessed for each study subject throughout the etiologically relevant period, so as to reflect personal attributes as accurately as possible. For example, individual measurements of exposure to ionizing radiation can be made by each study subject wearing a film-badge throughout the study period, and individual nutrient intake can be measured by analysing identical portions of all foods and beverages consumed by a subject during the study period. However, this approach is generally not feasible because of time and cost constraints, technical concerns and lack of subject compliance. Usually it is only possible to make measurements in a sample of study subjects at certain defined time points. The choice of the sample and the timing of the measurements are obviously of crucial importance to the validity of the measurements.

One limitation of environmental measurements is that they usually reflect only current exposure levels. In certain situations, it may be reasonable to assume that measurements made in the present environment are highly correlated with the exposure levels at etiologically relevant periods in the past.
Records of previous exposure measurements may be available, but should be used with caution: such measurements were usually made for other purposes using methods that may now be considered inadequate. When no such measurements are available, proxy measures of past exposures may be used. For example, in a study of occupational exposures, information on type of job, year of employment and duration of employment may be used to classify workers according to exposure status. This information may be extracted from employment records or obtained through questionnaires.

2.5 Measurement of outcome

As for measurements of exposure, data on the outcome(s) of interest may be obtained from various sources. Regular questionnaires or telephone calls may be used to ascertain subjects' health status. Periodic personal interviews with clinical check-ups may be arranged, which may include biological measurements and any other appropriate diagnostic procedures (e.g., radiography, endoscopy, ultrasound). Alternatively, information on the outcomes, and in particular on the occurrence of cancer, may be obtained from records, such as hospital records, cancer registrations, death certificates or some other specialized surveillance method (see Section 2.9). When records are used, the available data are limited to outcomes that are recorded routinely, and their usefulness depends on how completely those outcomes are recorded and on the way in which they are coded.

Because malignancies develop slowly and are relatively rare, studies of the relationship between suspected carcinogenic exposures and cancer may require the observation of many participants over a long period. One way to avoid this is to use intermediate end-points as cancer surrogates: that is, to use as an outcome a biological event that is believed to lie on the causal pathway between exposure and cancer. Studies that use intermediate end-points are, in principle, quicker, smaller and less expensive than those using malignancy as the outcome. For instance, a study of the relationship between diet and estrogen metabolism could be carried out on several dozen patients, whereas a dietary intervention study with breast cancer as the end-point would require tens of thousands of women with many years of follow-up (Schatzkin et al., 1990). The underlying assumption in these studies is that the observed relationship between exposure (e.g., diet) and the intermediate end-point (e.g., estrogen metabolism) reflects a similar relationship between exposure and the cancer of interest. Clearly, this assumption must be validated before the intermediate end-point can be used as a cancer surrogate (Toniolo et al., 1997).

2.6 Validity and reliability of measurements of exposure and outcome

Validity

Validity is defined as the extent to which an instrument (for example, a questionnaire or a laboratory test) measures what it is intended to measure.
Validity can be determined only if there is a reference procedure or 'gold standard': a definitive procedure to determine the characteristic being measured. For example, information on birth weight obtained from an interview can be validated against hospital records, and food-frequency questionnaires against food diaries and biological measurements. However, in some circumstances there is no obvious reference procedure and the best available method must be taken as the standard.

Consider the simple example of a test that can give only a positive or negative (i.e., binary) result. When the same subjects have been examined by both the study test and the gold standard, the findings can be expressed in a 2 × 2 table, as in Table 2.1.

Table 2.1. General layout of a 2 × 2 table to assess the validity of a test that can give only a binary result.

                          Gold standard
                          Positive   Negative
Study test   Positive        a          b
             Negative        c          d

a, true positives; b, false positives; c, false negatives; d, true negatives.

The sensitivity of the study test is the proportion of individuals classified as positives by the gold standard who are correctly identified by the study test:

Sensitivity = a/(a+c)

The specificity of the study test is the proportion of individuals classified as negatives by the gold standard who are correctly identified by the study test:

Specificity = d/(b+d)

The predictive value of a positive study test result represents the probability that someone with a positive study test result really has the characteristic of interest as determined by the gold standard:

Predictive value of a positive study test result = a/(a+b)

The predictive value of a negative study test result represents the probability that someone with a negative study test result does not have the characteristic of interest as determined by the gold standard:

Predictive value of a negative study test result = d/(c+d)

Example 2.1. A variety of laboratory methods have been developed for detecting human papillomavirus (HPV) infection of the cervix uteri. In a study conducted some years ago, the performance of a new commercially available dot-filter hybridization test (ViraPap) was assessed by comparing its results with those obtained using a gold standard test in a sample of 450 women who attended a clinic for sexually transmitted diseases in Washington state, USA during [...] (Kiviat et al., 1990). The Southern hybridization test, which is expensive and time-consuming, was taken as the gold standard in this study. The results are shown in Table 2.2.

Table 2.2. Comparison of ViraPap and Southern hybridization methods in the diagnosis of cervical HPV infection in a sample of women who attended a sexually transmitted disease clinic.(a)

                          Southern hybridization (gold standard test)
                          Positive   Negative   Total
ViraPap      Positive        62         22        84
(new test)   Negative         7        359       366
             Total           69        381       450

(a) Modified from Kiviat et al., 1990.

These data yield the following for the ViraPap test:

Sensitivity = 62/69 = 90%
Specificity = 359/381 = 94%
Predictive value of a positive ViraPap test = 62/84 = 74%
Predictive value of a negative ViraPap test = 359/366 = 98%
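The four measures defined above can be computed directly from the cells of a 2 × 2 table. The following Python sketch is an illustration only: the function name `validity_measures` and the dictionary keys are assumptions made here, and the cell counts are those implied by the fractions reported in Example 2.1 (a = 62, b = 22, c = 7, d = 359).

```python
def validity_measures(a, b, c, d):
    """Validity measures for a binary test compared with a gold standard.

    Cells follow the layout of Table 2.1:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    The function name and dictionary keys are illustrative choices.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),  # predictive value of a positive result
        "npv": d / (c + d),  # predictive value of a negative result
    }


# Cell counts implied by the fractions quoted in Example 2.1
# (ViraPap compared with Southern hybridization).
example_2_1 = validity_measures(a=62, b=22, c=7, d=359)

for name, value in example_2_1.items():
    print(f"{name}: {value:.0%}")
```

Rounded to whole percentages, the printed values agree with those quoted in Example 2.1 (90%, 94%, 74% and 98%).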

An ideal test has high sensitivity (correctly identifies a high proportion of truly exposed or diseased individuals) and high specificity (gives few positive results in unexposed or non-diseased individuals). In Example 2.1, the ViraPap test had both high sensitivity and high specificity, indicating that the test was highly valid in the detection of cervical HPV infection (as compared to the Southern hybridization test) and therefore that its results would be little affected by measurement error.

While the predictive value of a study test result strongly depends upon the frequency of the disease (or other characteristic of interest) in the population, sensitivity and specificity are essentially unaffected. When the disease frequency changes, the number of diseased people as determined by the gold standard (left-hand column) changes relative to the number of non-diseased people (right-hand column). Sensitivity and specificity are each calculated within a single column, whereas the predictive value of a study test result depends on the numbers in both columns, and will change if the frequency of the disease changes.

Example 2.2. Suppose that the same ViraPap test was used in a sample of 450 apparently healthy women who visited their general practitioners for a regular check-up. The results are given in Table 2.3.

Table 2.3. Comparison of ViraPap and Southern hybridization methods in the detection of cervical HPV infection among apparently healthy women: hypothetical data.

                          Southern hybridization (gold standard test)
                          Positive   Negative   Total
ViraPap      Positive        21         26        47
(new test)   Negative         2        401       403
             Total           23        427       450

These data yield the following for the ViraPap test:

Sensitivity = 21/23 = 91%
Specificity = 401/427 = 94%
Predictive value of a positive ViraPap test = 21/47 = 45%
Predictive value of a negative ViraPap test = 401/403 = 100%

In Example 2.2, the predictive value of a positive ViraPap test is markedly decreased (from 74% to 45%).
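The dependence of predictive value on disease frequency can be checked numerically. The Python sketch below expresses the predictive value of a positive result in terms of sensitivity, specificity and the proportion of truly positive individuals (prevalence). This is Bayes' theorem applied to the 2 × 2 table, a standard identity rather than a formula stated in this chapter; the function name `positive_predictive_value` is an assumption for illustration, and the inputs are the fractions reported in Examples 2.1 and 2.2.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Predictive value of a positive result from sensitivity,
    specificity and prevalence (Bayes' theorem; illustrative helper)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Example 2.1: women attending a sexually transmitted disease clinic.
clinic = positive_predictive_value(62 / 69, 359 / 381, 69 / 450)

# Example 2.2: apparently healthy women (hypothetical data).
healthy = positive_predictive_value(21 / 23, 401 / 427, 23 / 450)

print(f"Clinic sample:  {clinic:.0%}")
print(f"Healthy sample: {healthy:.0%}")
```

With nearly identical sensitivity and specificity in the two samples, the lower prevalence in the second sample accounts for the fall in predictive value from 74% to 45%.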
This decrease occurs because the proportion of HPV-infected women (as determined by the gold standard) was much higher (69/450 = 15%) in the sample of women who attended the clinic for sexually transmitted disease (Table 2.2) than among the group of apparently healthy women (23/450 = 5%).

Thus, diagnostic tests which are useful in clinical medicine may perform poorly in epidemiological surveys or in population screening programmes. In clinical medicine, diagnostic tests are applied to patients in populations already selected as having a high occurrence of the condition. In this situation, the test may have high predictive value. In an epidemiological survey of an unselected population, the same test may have poor predictive value because the frequency of the condition is much lower. For example, mammography has high predictive value as a test for breast cancer in women who consult doctors because of a lump in the breast, but low predictive value when used to screen apparently healthy women in the population. These issues are discussed further in Chapter 16.

The selection of a gold standard is a crucial aspect of evaluating the validity of any measurement. Unfortunately, in many cases there is no appropriate gold standard, and the investigator has to rely on the best available method. For instance, for many years, Southern hybridization was regarded as the gold standard method for detecting cervical HPV infection. However, with the development in recent years of the polymerase chain reaction (PCR) to amplify HPV-specific DNA sequences, PCR-based methods have become the accepted gold standard.

Example 2.3. The performance of the ViraPap test was compared with that of the polymerase chain reaction (PCR) in newly diagnosed cervical cancer patients. Results are shown in Table 2.4.

Table 2.4. Comparison of ViraPap and polymerase chain reaction (PCR) in the detection of cervical HPV infection.(a)

                          PCR (gold standard test)
                          Positive   Negative   Total
ViraPap      Positive       163         11       174
test         Negative       120         79       199
             Total          283         90       373

(a) From Muñoz et al. (unpublished).

These data yield the following for the ViraPap test:

Sensitivity = 163/283 = 58%
Specificity = 79/90 = 88%

In Example 2.3 the validity of the ViraPap test (as measured by its sensitivity and specificity) was much lower than when Southern hybridization was used as the gold standard method (Example 2.1). This is because the PCR method is more sensitive and more specific than the Southern hybridization technique.

Not all tests give a simple yes/no result. Some yield results that are numerical values along a continuous scale of measurement.
In these situations, high sensitivity is obtained at the cost of low specificity and vice versa. For example, the higher the blood pressure, the more probable is hypertensive disease. If a diagnostic or screening test for hypertension is set at a diastolic pressure of 90 mmHg, most hypertensive patients would be detected (high sensitivity) but many non-diseased subjects (with diastolic blood pressure higher than 90 mmHg) would be wrongly classified as hypertensive (low specificity). If the screening level for hypertensive disease is set at 110 mmHg for diastolic blood pressure, most non-diseased individuals would be excluded (high specificity), but many hypertensive patients (with diastolic blood pressures lower than 110 mmHg) would be missed (low sensitivity).

Example 2.4. A new laboratory assay measuring the concentration of a particular enzyme in the blood is developed. To assess its value in the diagnosis of a specific cancer, the new test is applied to 360 hospital patients and the results are compared with those from anatomo-pathological examination. Blood concentrations of the enzyme ≥40 IU are taken as positive results. The results are shown in Table 2.5.

Table 2.5. Comparison of a new laboratory assay with anatomo-pathological examination in the diagnosis of a specific cancer: hypothetical data.

                               Anatomo-pathological examination (gold standard test)
                               Positive   Negative   Total
Blood    Positive (≥40 IU)       190         80       270
assay    Negative (<40 IU)         0         90        90
         Total                   190        170       360

The following can be calculated for the new laboratory assay:

Sensitivity = 190/190 = 100%
Specificity = 90/170 = 53%

In Example 2.4, other blood concentration values could be taken as cut-off values to define the assay results as positive or negative. Table 2.6 gives the sensitivity and specificity of the blood assay for different cut-off values. The sensitivity of the laboratory assay decreases as the cut-off value increases, whereas the reverse is true for specificity. This is clearly illustrated in Figure 2.1.

One way to summarize the validity of a continuous measurement is to plot sensitivity against (1 − specificity) for different cut-off values. This curve is called the receiver operating characteristic (ROC) curve.
The ROC curve corresponding to the data in Table 2.6 is shown in Figure 2.2. The closer the ROC curve of a particular test is to the top left-hand corner of the box, where both the sensitivity and specificity are maximized, the better the test. A test with a curve that lies on the diagonal is for practical purposes useless, and no better than a complete guess.
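
As a rough illustration of these calculations, the sketch below computes sensitivity and specificity from the cells of a 2 × 2 table and then traces ROC points over a set of cut-off values. The counts used for Example 2.4 are those implied by the ratios quoted in the example (190/190 and 90/170 among 360 patients). The enzyme concentrations used for the cut-off sweep, and all function names, are invented for illustration; they are not the data of Table 2.6.

```python
def sensitivity(tp, fn):
    """Proportion of truly diseased subjects that the test calls positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of truly non-diseased subjects that the test calls negative."""
    return tn / (tn + fp)


def roc_points(diseased, healthy, cutoffs):
    """Return (cutoff, sensitivity, 1 - specificity) for each cut-off.

    A measurement is treated as a positive result when it is >= the cut-off,
    mirroring the ">= 40 IU" rule of Example 2.4.
    """
    points = []
    for c in cutoffs:
        tp = sum(v >= c for v in diseased)
        fn = len(diseased) - tp
        fp = sum(v >= c for v in healthy)
        tn = len(healthy) - fp
        points.append((c, sensitivity(tp, fn), 1 - specificity(tn, fp)))
    return points


# Example 2.4: 190 true positives, 0 false negatives,
# 90 true negatives, 80 false positives (190 + 0 + 90 + 80 = 360).
sens_24 = sensitivity(tp=190, fn=0)   # 1.0
spec_24 = specificity(tn=90, fp=80)   # about 0.53

# Hypothetical enzyme concentrations (IU), made up for this sketch only.
diseased_iu = [45, 60, 75, 90, 120]
healthy_iu = [20, 30, 40, 55, 70]
curve = roc_points(diseased_iu, healthy_iu, cutoffs=[40, 80, 130])

for cutoff, sens, one_minus_spec in curve:
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, 1-specificity={one_minus_spec:.2f}")
```

With these made-up values, raising the cut-off lowers sensitivity and raises specificity, which is the trade-off described in the text.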

Reliability

Reliability, sometimes also called repeatability or reproducibility, is a measure of the consistency of the performance of a test when used under similar circumstances. To be valid, a measurement must be reliable. However, reliability is not in itself sufficient for validity: in other words, a test may yield the same result consistently, but the result may not be the true (valid) one. Poor reliability of a measurement may be due to variation when a subject is tested on different occasions (biological variation), or to errors in the measurement technique (observer and instrument variation). Checks of the repeatability of measurements of the main exposures and outcomes should usually be included in an epidemiological study. These checks can take various forms.

(1) Intra-observer or intra-measurement reliability

Intra-observer or intra-measurement reliability can be determined by having the same observer perform the same measurements on the same subjects on two or more separate occasions. For example, data from medical records may be extracted by the same abstractor on two occasions; the same interviewer may re-interview subjects after a time interval; duplicate biological samples may be re-processed by the same laboratory technician. These separate measurements are then compared. The appropriate time interval between measurements varies according to the type of outcome or exposure measurement. If it is too short, subjects and/or observers may recall the previous result; if it is too long, the subject's exposure or outcome status may have changed (of course, this is not a problem when data are extracted from medical records).

(2) Inter-observer reliability

Inter-observer reliability can be assessed by having the same subjects measured by two or more independent observers.
For example, the performance of two or more data abstractors may be compared using information extracted independently from the same medical records, or the performance of two or more interviewers may be compared using independent interviews of the same subjects on two different occasions. Again, the interval used between measurements needs careful consideration.

Consider the simple example of a test that can give only a positive or negative (i.e. binary) result. The agreement between pairs of measurements carried out by two independent observers on the same subjects can be presented as a 2 × 2 table (Table 2.7). One measure of repeatability is the observed agreement (O) or mean pair agreement index, which can be calculated as:

O = (No. of agreements) / (Total no. of pairs) = (a + d) / N

[Figure 2.2. Receiver operating characteristic (ROC) curve for the data in Table 2.6 and Figure 2.1. Axes: sensitivity against 1 − specificity.]

Table 2.7. General layout of a table to assess reliability between two observers for a binary test. a, b, c and d refer to the numbers of pairs of observations where the observers gave the indicated result.

                   Observer B
Observer A         Positive    Negative    Row total
Positive           a           b           a+b
Negative           c           d           c+d
Column total       a+c         b+d         N (grand total)

The observed agreement index has the disadvantage that some agreement would be expected even if both observers simply guessed the result. The kappa statistic (κ) is an alternative measure that takes account of the agreement expected solely on the basis of chance. To calculate the kappa statistic, the number of pairs of observations that would be expected on the basis of chance in cells (++) and (−−) must first be calculated. The expected value in any cell is given by:

[(Total of relevant row) × (Total of relevant column)] / Grand total

Thus for cell (++), the expected value will equal:

[(a+b) × (a+c)] / N

and for cell (−−), the expected value will equal:

[(c+d) × (b+d)] / N

The expected agreement on the basis of chance (E) can now be calculated as:

E = [Expected value for cell (++) + Expected value for cell (−−)] / N

The actual agreement beyond chance is therefore:

Observed agreement (O) − Expected agreement (E)

This value is, however, difficult to interpret, as similar results may be obtained for different values of O and E. For instance, the actual agreement beyond chance is equal to 0.20 for values of O = 0.95 and E = 0.75, and for O = 0.75 and E = 0.55. What we need to know is how much this represents in relation to the maximum potential agreement beyond chance that could have been achieved. Complete agreement would imply that all the results would have fallen in cells (++) and (−−) and, therefore, (a+d)/N would have equalled 1. Thus, the potential for agreement beyond chance is 1 − E.
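
The quantities defined so far (O, E and the potential 1 − E) can be computed directly from the four cells of Table 2.7, and their ratio (O − E)/(1 − E) is the kappa statistic that the text introduces next. The sketch below is a minimal illustration: the cell counts (a = 40, b = 10, c = 5, d = 45) and the function names are invented for the example and do not come from the chapter.

```python
def kappa_from_agreement(observed, expected):
    """Kappa = (O - E) / (1 - E): agreement beyond chance relative to its maximum."""
    if expected == 1:
        # With E = 1 there is no room for agreement beyond chance.
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (observed - expected) / (1 - expected)


def kappa_2x2(a, b, c, d):
    """Return (O, E, kappa) for the layout of Table 2.7.

    a = both observers positive, d = both negative,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    observed = (a + d) / n                      # O = (a + d) / N
    expected_pos = (a + b) * (a + c) / n        # expected count in cell (++)
    expected_neg = (c + d) * (b + d) / n        # expected count in cell (--)
    expected = (expected_pos + expected_neg) / n
    return observed, expected, kappa_from_agreement(observed, expected)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
o, e, k = kappa_2x2(a=40, b=10, c=5, d=45)
print(f"O={o:.2f}, E={e:.2f}, kappa={k:.2f}")   # O=0.85, E=0.50, kappa=0.70

# The two cases from the text share O - E = 0.20 but differ in kappa.
print(round(kappa_from_agreement(0.95, 0.75), 2))   # 0.8
print(round(kappa_from_agreement(0.75, 0.55), 2))   # 0.44
```

The last two lines show why the raw difference O − E is hard to interpret on its own: the same 0.20 is a much larger share of the attainable agreement when chance agreement is already high.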

The kappa statistic indicates how much the actual agreement beyond chance (O − E) represents relative to this potential (1 − E).

Kappa (κ) = (O − E)/(1 − E)

The kappa statistic can be used in a similar way to measure intra-observer variability. The values of this coefficient may vary from −1.0 to +1.0. A value of 1.0 indicates perfect agreement and a value of zero means agreement is no better than would be expected on the basis of chance alone; a negative value indicates that the level of disagreement is greater than that expected on the basis of chance. While there is no value of kappa that can be regarded universally as indicating good agreement, in practice, a κ value considerably less than 0.5 indicates poor agreement. Landis and Koch (1977) suggested the following guidelines: kappa values ≤0.40 represent poor-to-fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, almost perfect agreement.

Use of the kappa statistic can be extended to situations where the results of the test are classified in more than two categories, as in Example 2.5. The kappa shows substantial agreement between observers A and B. Intra-observer agreement was calculated in a similar way: the kappa statistic equalled [value missing in source]. In general, intra-observer agreement tends to be better than inter-observer agreement.

Kappa values should not be presented alone, as they provide a summary measure of agreement without giving any indication where disagreements occurred. The results of a reliability study should therefore always be presented in a table similar to Table 2.8, so that the main areas of agreement and disagreement are apparent. If different importance is given to different types of agreement or disagreement, the kappa statistic may be weighted to take this into account (Landis & Koch, 1977).

Methods are also available for assessing the reliability of measurements that provide results on a continuous scale (e.g.
blood pressure measurements, blood glucose levels); however, these are beyond the scope of this chapter. A discussion of these methods can be found in Bland & Altman (1986).

2.7 Consequences of measurement error

Errors in measurement can lead to individuals being misclassified and to spurious conclusions about the relationship between the exposure and the outcome. The impact of measurement errors on the results of an epidemiological study depends essentially on the nature of any misclassification. Consider the following example. Suppose that, to determine whether cigarette smoking is associated with lung cancer, we rely on a questionnaire that asks 'Have you ever smoked?' and 'Do you have lung cancer?'. The questionnaire is administered to 10 000 men. Assume that the true smoking status in this study population (as determined by a perfect test, having both a sensitivity and a specificity of 100%) is as indicated in Table 2.9. This table shows that lung cancer is more common among people who have smoked (ever smokers) (150 of 2000 = 7.5%) than among those who have never smoked (never smokers) (50 of 8000 = 0.63%). Thus, if a perfect method could be used to measure smoking habits in this example, ever smokers would be found to be 12 times (7.5% / 0.63% = 12) more likely to develop lung cancer than never smokers.

Suppose now that when the questionnaire is applied, 20% of smokers, regardless of their disease status, answered that they had never smoked (sensitivity = 80%), but that all men who have never smoked reported this accurately (specificity = 100%). The results that would be obtained with this imperfect questionnaire are shown in Table 2.10.

Table 2.10. Distribution of a population by smoking and disease status as determined by a test for measuring smoking habits that has a sensitivity of 80% and a specificity of 100%: hypothetical data.

                 Cigarette smoking
Lung cancer      Ever                   Never                  Total
Yes              150 − 30 = 120         50 + 30 = 80           200
No               1850 − 370 = 1480      7950 + 370 = 8320      9800
Total            1600                   8400                   10 000

Using this imperfect questionnaire, the proportion of lung cancers in smokers is 120/1600 = 7.5%. This is about eight times the proportion in never smokers (80/8400 = 0.95%). Despite the poor quality of the data on smoking elicited by the questionnaire, the relationship between cigarette smoking and lung cancer, while appearing weaker than it truly is, is still evident.

Non-differential misclassification occurs when an exposure or outcome classification is incorrect for equal proportions of subjects in the groups being compared. In other words, the sensitivity and specificity of the exposure (or outcome) measurement are equal for both the diseased and non-diseased (or exposed and unexposed).
In these circumstances, the misclassification is random (i.e., all individuals have the same probability of being misclassified). In non-differential misclassification, individuals are wrongly classified, reducing the confidence that can be placed in each particular test result. Although this random misclassification has important implications in clinical medicine, it is of less concern in epidemiology, where groups rather than individuals are the main interest. Herein lies a great strength of epidemiology. In the above example, the association between smoking and lung cancer was weakened because those classifying themselves as never smokers were in fact a mixture of those who had never smoked and those who had. Although this type of misclassification makes it more difficult to reveal an association between the exposure and the outcome of interest, the problem can usually be overcome by increasing the sample size and/or replicating measurements (except, as discussed in Chapter 13, where there is non-differential misclassification of confounding variables). Thus, the epidemiologist can rely on simple, cheap and non-invasive tests which, despite being in general less valid than those used in clinical settings, are more appropriate for studies in the community.

This is an important aspect of epidemiological research that clinicians often find difficult to accept. Clinicians focus on individual patients, trying to obtain the most complete and valid information on which to base the most accurate diagnosis possible and the optimal treatment. Being accustomed to using specialized and high-technology procedures, they may find it hard to believe that one could undertake scientific studies based on relatively low-quality data such as those derived from questionnaires or death certificates.

Differential misclassification occurs when the sensitivity and/or specificity of the exposure measurement for the diseased group differs from that for the non-diseased group, or when the sensitivity and/or specificity of the outcome measurement for the exposed group differs from that for the unexposed group. In other words, differential misclassification may occur when errors in classification of outcome status are dependent upon exposure status, or vice versa. For example, clinicians may be more likely to diagnose leukaemia in children who live around nuclear power stations than in those living elsewhere, and women with breast cancer may be more likely to remember having taken oral contraceptives in the past than healthy women. In the example already considered, differential misclassification would have occurred if men with lung cancer were likely to report their smoking habits more or less accurately than men without lung cancer; in such circumstances, the resulting data could exaggerate, attenuate, or even reverse the relationship, and make the results misleading.
Differential misclassification is a consequence of defects in the design or execution of an epidemiological study. Unfortunately, it cannot be controlled for in the analysis, and its effect cannot be minimized by increasing the sample size. A more detailed discussion of the consequences of errors in the measurement of exposure and outcome in the interpretation of epidemiological studies is given elsewhere in this book; in particular, in Chapter 13.

2.8 How can misclassification of exposure and outcome be reduced?

All procedures used in the measurements should be described in sufficient detail in the study protocol to allow reproduction of the measurements, within the limits of biological and physical variability, by other investigators. The protocol should include not only a description of the method of measurement, but also instructions for its application. All other procedures involved should also be specified.
