Abstract

Purpose: Few physical activity (PA) questionnaires were designed to measure the lifestyles and activities of women. We sought to examine the test-retest reliability of a PA questionnaire used in the Women's Health Initiative (WHI) study. Differences in reliability were also explored by important covariates.

Understanding physical activity (PA) and its impact on health is an important public health challenge (35). Nearly half of the American population does not engage in enough PA to prevent disease or benefit health (9). Compared with men, women participate in less vigorous leisure activity (7,8) and may engage in more sedentary behaviors (8,24). Furthermore, previous studies have indicated that minority women report even less leisure time PA than white women (7,8,25,35). Additional research on PA behaviors in women and minority populations would help guide public health policy and interventions.

Previous research demonstrates that women engage in different types and patterns of PA than men (2,3,34). Women may have a different interpretation or understanding of what PA means to them (2,14,15,21,30,34). Because many PA questionnaires used in epidemiologic research were originally designed for white male populations, they may not accurately measure PA in women (2,21,33). This makes accurate and reliable measurement of PA in women and in minority populations especially challenging. Additionally, few PA questionnaires have detailed measurement properties reported among minority or other special populations of women. Furthermore, the validity and the reliability of PA questionnaires may be impacted by other attributes such as age, length of time between test and retest, or level of PA. These attributes may affect the ability of individuals to remember, comprehend, and answer questions.

One study that has attempted to address issues of PA measurement in women is the Women's Health Initiative (WHI) Observational Study. The WHI is a long-term, multicenter, racially and ethnically diverse national cohort study of 161,808 women. The WHI enables scientists to study relationships between lifestyle, health risk factors, and specific disease outcomes. To date, over 230 articles have been published using WHI data. Several of these articles have explored the associations of PA with major diseases (10,16,20,22,23). To adequately study risk factors, like PA, it is important for researchers to understand the questionnaires' measurement properties. The objective of this article is to examine the test-retest reliability of the WHI PA questionnaire in a random sample of the WHI participants overall and by race/ethnicity, age, time between test and retest, and level of PA.

METHODS

Between 1994 and 1998, 93,676 women aged 50 to 79 yr were enrolled at one of 40 clinic centers across the United States into the WHI Observational Study (19). Eligibility criteria for enrollment included the intention to reside in the area for at least 3 yr, freedom from any major medical condition that would impact survival within 3 yr of study entry, and no reported mental illness, dementia, alcoholism, or drug dependency. Full details on the study cohort and design are available elsewhere (19).

Between October 1996 and June 1997, a subsample of the women enrolled in the WHI Observational Study was selected to participate in the Measurement and Precision Study. Participants (n = 1092) were randomly recruited within the 40 clinic centers and were stratified by age and race/ethnicity (American Indian/Alaskan Native, Asian or Pacific Islander, black or African American, Hispanic/Latino, and white).

The purpose of the Measurement and Precision Study was to assess test-retest reliability of several self-administered questionnaires. Each clinic center was randomly assigned to repeat a set of baseline questionnaires (19). At approximately 12-wk intervals (range = 8-15 wk), half of the women (n = 567) repeated questions on exercise/recreational activities (Form 34) and the other half (n = 512) repeated questions related to household, yard, and sedentary activities (Form 42). The two questionnaires were distributed between the samples to reduce the time burden on the participants. The 12-wk time interval (8-15 wk) was chosen to minimize a "learned" response to the instrument so that participants would not recall their previous answers. Institutional review board approval was obtained by each participating WHI center before data collection, and participants provided their written informed consent.

PA questionnaire

The PA questionnaire was self-administered at enrollment. The questionnaire was intentionally worded without reference to a specific time frame (e.g., last week, last month, last year) to collect "usual" activity or patterns of activity. It was designed to collect different types of activities by grouping them together by intensity. This was done to reduce the burden and the time needed to complete the questionnaire. The questionnaire was divided into two forms to collect information on usual PA. On the first form, participants reported their usual exercise or recreational activity (mild, moderate, strenuous, and walking activities). On the second form, participants were asked about heavy indoor household activities and yard activities. Both forms were completed at the same time, either at the clinics or mailed to the participant, and then returned to the clinic for review.

The questionnaire grouped exercise or recreational activities into three separate intensities (mild, moderate, and strenuous) based on a range of MET values associated with the type of activities described. The participants then reported the usual frequency (six categories, from 0 to 5+ d·wk−1) and duration (four categories, from <20 min to ≥60 min) of activities performed at each intensity level. Episodes of walking outside of the home (10 min or more) were reported separately through frequency (six levels, 0-7 d·wk−1), duration (four levels, <20 min to ≥60 min), and usual speed (four levels, 2-5 mph). Questions on household activities were assessed as hours per week (five categories, from <1 to ≥10 h). Yard activities included the number of months per year (five categories, <1 month to ≥10 months) and hours per week (five categories, <1 to ≥10 h) the activities were performed. Participants were also asked to report the number of hours spent sitting and lying down, including sleep, each day (eight categories, <4 to ≥16 h). In addition, the women were asked to recall whether or not they engaged in strenuous activity (yes or no) at 18, 35, and 50 yr. The questionnaire and the scoring protocol can be found in Appendix 1.

The WHI PA measures were designed to be summarized into continuous variables estimating weekly energy expenditure (MET·h·wk−1) from each type of activity (mild, moderate, strenuous, walking, household, and yard). An estimated MET level for each type of activity was assigned from a compendium of activities (1) (Appendix 2), where the MET level is kilocalories per kilogram of body weight expended each hour during a specific activity. Summary variables were created by combining frequency, duration, and MET-estimated intensity in the following equation: [(frequency of activity per week × minutes per session × MET for that activity) / (60 min·h−1)]. These summary variables in "MET-hours" quantify the total kilocalories expended per kilogram per week. MET units are independent of body weight.
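As a rough illustration of the summary equation above, the following Python sketch computes MET·h·wk−1 for one hypothetical respondent. The function name and the frequency, duration, and MET values are assumptions chosen for the example; they are not taken from the WHI scoring protocol or the compendium. Because the questionnaire collected frequency and duration as categories, an actual scoring rule would first have to map each category to a numeric value.

```python
def met_hours_per_week(frequency_per_week, minutes_per_session, met_value):
    """Weekly energy expenditure (MET-h/wk) for one activity type.

    Implements (frequency x minutes per session x MET) / 60 min per hour.
    """
    return frequency_per_week * minutes_per_session * met_value / 60.0


# Hypothetical respondent: (days per week, minutes per session, assumed MET).
# These numbers are illustrative only, not WHI-assigned values.
activities = {
    "walking": (5, 30, 3.0),
    "moderate": (2, 45, 4.0),
    "strenuous": (1, 20, 7.0),
}

per_activity = {
    name: met_hours_per_week(*values) for name, values in activities.items()
}
total_met_hours = sum(per_activity.values())

for name, value in per_activity.items():
    print(f"{name}: {value:.2f} MET-h/wk")
print(f"total: {total_met_hours:.2f} MET-h/wk")
```

Summing the per-activity values gives a total recreational score of the kind that can then be compared between test and retest.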

Sociodemographic measures

Participants answered questions on several important health behaviors and demographic attributes. Race/ethnicity (American Indian/Alaskan Native, Asian or Pacific Islander, black or African American, Hispanic/Latino, and white), education (10 levels), main occupation (professional/managerial, technical/sales/administrative, service/labor, and homemaker), retirement status, marital status, smoking status, and general health were all self-reported at the first clinic visit. Additionally, height and weight for each individual were measured at this visit and were used to calculate body mass index (BMI; weight in kilograms divided by height in meters squared), which was categorized as underweight (<18.5 kg·m−2), normal weight (18.5 to <25 kg·m−2), overweight (25 to <30 kg·m−2), and obese (≥30 kg·m−2) (37).
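The BMI calculation and cut points described above can be sketched in a few lines of Python. The function names are hypothetical and the example weight and height are invented for illustration; only the formula and the category boundaries come from the text.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value (kg/m^2) to the four categories used in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Illustrative participant (not study data): 70 kg, 1.65 m.
example_bmi = bmi(70.0, 1.65)
print(round(example_bmi, 1), bmi_category(example_bmi))
```

Each lower bound is inclusive and each upper bound exclusive, matching the "18.5 to <25" style of the cut points.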

Statistical analysis

Two-level kappa and weighted kappa (three to eight levels) statistics were used to assess the test-retest reliability of each individual question or corresponding component (e.g., frequency, duration). Weighting for the kappa statistics was applied using the default in SAS, the Cicchetti-Allison form, taking into account the degree of nonagreement between the test and the retest. Agreement between the test and the retest was categorized into five categories: poor (0 to <0.2), fair (0.2 to <0.4), moderate (0.4 to <0.6), substantial (0.6 to <0.8), and almost perfect (0.8-1.0) (18). Test-retest reliability of the continuous variables was assessed with the Shrout and Fleiss (29) intraclass correlation coefficient (ICC1,1). This ICC1,1 uses test and retest measures to estimate single trial reliability instead of the average of repeated measures. More specifically, we calculated the ICC1,1 and the 95% confidence intervals (CI) using a one-way ANOVA model (29,31,32) and then assessed the proportion of the total variance (true variability and measurement error) that was attributable to participant variability.
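For readers who want to see the arithmetic behind these statistics, the following Python sketch shows one way to compute a weighted kappa with linear (Cicchetti-Allison) agreement weights and an ICC1,1 from one-way ANOVA mean squares. It is a minimal illustration under stated assumptions, not the SAS code used in the study: the function names are hypothetical, categories are assumed to be coded as consecutive integers starting at 0, the example responses are invented, and confidence intervals are omitted.

```python
def weighted_kappa(test, retest, n_levels):
    """Weighted kappa with linear weights w_ij = 1 - |i - j| / (n_levels - 1).

    test, retest: equal-length sequences of category codes in 0..n_levels-1.
    With n_levels == 2 this reduces to the simple (unweighted) kappa.
    """
    n = len(test)
    # Joint proportions of (test, retest) category pairs.
    table = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(test, retest):
        table[a][b] += 1.0 / n
    row = [sum(table[i]) for i in range(n_levels)]
    col = [sum(table[i][j] for i in range(n_levels)) for j in range(n_levels)]
    observed = expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = 1.0 - abs(i - j) / (n_levels - 1)
            observed += weight * table[i][j]
            expected += weight * row[i] * col[j]
    return (observed - expected) / (1.0 - expected)


def icc_1_1(test, retest):
    """Shrout-Fleiss ICC(1,1) for two measurements per participant."""
    n, k = len(test), 2
    means = [(a + b) / 2.0 for a, b in zip(test, retest)]
    grand_mean = sum(means) / n
    # Between-participant and within-participant mean squares (one-way ANOVA).
    bms = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    wms = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(test, retest, means)
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


def agreement_label(kappa):
    """Five agreement categories as defined in the text (values below 0 not covered there)."""
    if kappa < 0.2:
        return "poor"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    return "almost perfect"


# Invented responses for illustration only (not study data).
kappa_example = weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], n_levels=2)
icc_example = icc_1_1([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(kappa_example, agreement_label(kappa_example), icc_example)
```

The ICC1,1 expression is the proportion of total variance attributable to differences between participants, which is why identical test and retest values give a coefficient of 1.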

Stratified analyses were performed by race/ethnicity, time between test and retest (≤3 months vs >3 months), age (50 to ≤65 yr, >65 to 79 yr), and level of recreational activity (one or more episodes vs none). Lastly, because the participants were not randomized to the type of activity form (exercise/recreation form vs household/yard form), differences between the two samples were also examined.

RESULTS

Study sample

The majority of the sample (n = 1092) reported good, very good, or excellent health (90%), and the average age was 64 yr (Table 1). The population was predominantly white (66%) followed by Hispanic (14%), African American (13%), and Asian/Pacific Islander (7%). Only 1% of the women identified themselves as American Indian/Alaskan Natives. Because of this inadequate sample size, these women were excluded from the race/ethnicity-stratified analysis but retained in all other analyses. Most women had completed high school (93%) and reported an occupation (current or former) other than being a homemaker (90%). More than half of them (55%) were retired. Approximately half of the sample (51%) reported never smoking and more than half (57%) were overweight or obese. The majority of the women were married, whereas one third were either widowed, separated, or divorced.

Although participants were randomly chosen from within each center, each center was assigned to only one of the two PA forms (exercise/recreational activity vs yard/household). Several differences were found between the populations completing the two forms. Differences of 5% or more were observed between the two samples for the following variables: race/ethnicity, education, and BMI. A greater proportion of the participants who answered the questionnaire on exercise/recreation activities were normal weight (43% vs 36%), white (69% vs 63%), and college graduates (40% vs 34%) compared with the sample that answered the questions on household/yard activities. Differences were not observed in general health, occupational status, marital status, or smoking.

At baseline, 73% of the women were not strenuously active, and more than half had not participated in regular strenuous activity in their earlier adulthood (aged 18, 35, and 50 yr; data not shown). At least 80% of the women reported some walking. However, when all exercise was combined, more than half of the women reported fewer than 10 MET·h·wk−1 (median = 9.0 MET·h·wk−1; SD = 14.3). Whites and Asian/Pacific Islanders had higher median levels of total recreational activity than Hispanics and African Americans (9.8, 8.7, 7.5, and 7.5 MET·h·wk−1, respectively). A similar pattern was observed for strenuous recreational activity and moderate to strenuous recreational activity by race/ethnicity (data not shown). More women reported at least one episode of moderate recreational activity (e.g., easy swimming, biking, or dancing) than mild recreational activity (e.g., bowling or golf; Table 2).

Test-retest reliability

Within the entire sample, substantial test-retest reliability was demonstrated in most summary measures, with the exception of mild recreational activity, which had lower reliability (Table 3). The continuous estimate of total PA (ICC1,1) was 0.76 (95% CI = 0.71-0.79), and the categorical estimate of total PA (weighted kappa) was 0.61 (95% CI = 0.56-0.66; Tables 3 and 4).

Reliability was similar when the sample was reduced to only those women who reported at least one episode of exercise or recreational activity (Table 3). Stratifying by race/ethnicity resulted in a loss of precision, but the associations were similar (Table 5). The exception was mild recreational activity, which consistently demonstrated the lowest reliability, especially in nonwhite participants. When stratified by age, women who were ≤65 yr demonstrated higher reliability than women >65 yr (Table 6). However, the magnitude of these differences was small, as the measures in both strata remained similar to the reliability of the entire sample. Additionally, women who repeated the tests within 3 months tended to have slightly higher reliability than women for whom more than 3 months had passed at retest (Table 6).

In general, the reliability of the individual questions on the components of frequency and duration of exercise (strenuous, moderate, mild, and walking) was between 0.36 and 0.62 for the entire sample (Table 4). Better reliability was observed for the strenuous and walking components than moderate or mild components. The reliability estimates of hours spent sitting and lying down as well as yard and indoor household activities ranged from 0.60 to 0.71 for the entire sample (Table 3).

Reliability of the history of strenuous activity at the ages of 18, 35, and 50 yr, measured by kappa statistics, ranged between 0.53 and 0.55 overall (Table 4). Similar to the summary measures, reliability was not meaningfully influenced by restricting the analysis to only women who reported at least one episode of exercise or recreational activity. When stratified by the other relevant covariates (age, race/ethnicity, and time between tests), the reliability of moderate, strenuous, and walking PA was fair to moderate in all strata.

DISCUSSION

The WHI PA questionnaire demonstrated moderate to substantial test-retest reliability in a racially diverse sample of postmenopausal women. The reliability estimates observed in this sample are similar to reliability measures from other self-reported questionnaires designed for women (6) and for older adults (36). Additionally, the PA in this population generally paralleled activity patterns observed in the US population of adults (7,8,35).

The most consistent difference in the test-retest reliability estimates appeared to be lower reliability in the mild exercise or activity measures. Although it is possible that the lower reliability observed in the mild intensity questions may be an artifact of reduced precision, it is consistent with other research (27,36). Activities of mild intensity are less memorable and less likely to be recalled and are consequently less well captured by self-report questionnaires. Another potential explanation for the weaker performance of the mild activity measures is the questionnaire design. Mild walking, a popular recreational activity in this population, was assessed separately from other mild-intensity activities and showed higher reliability than mild activity. Therefore, if walking had been included in the mild activity measure, instead of assessed separately, mild activity might have shown higher reliability.

Differences in test-retest reliability were not observed when reducing the sample to only women who reported at least one episode of any exercise or recreational activity. Interestingly, there were also no meaningful differences in reliability observed across race/ethnic groups. Previous studies have been mixed in their reporting of differences in reliability by race/ethnicity (5,12,28). However, it is also important to consider the wide CI in the race/ethnicity estimates because stratifying the data resulted in a loss of precision.

Although we did not observe differences in reliability between the different race/ethnic groups or by level of activity, some patterns were observed by age and length of time between test and retest. Women who were 65 yr or younger demonstrated better test-retest reliability than women who were older. Variability of PA in older women may be influenced by many factors, such as changing health status (e.g., fatigue, injury, disease progression), retirement, or loss of a spouse (4,11,13). Any of these changes within the study period could impact questionnaire reliability as women's activity patterns are affected. Additionally, aging is associated with cognitive decline that can impact memory and could in turn affect reliability (26).

Not surprisingly, slightly higher reliability was observed in some measures among the women who repeated tests within 3 months compared with women who experienced more than 3 months between the tests. One explanation could be that tests repeated within a shorter time frame are more likely to be given in the same season or at a comparable time of year with regard to weather. Furthermore, a change in activity (either increase or decrease) could have occurred after the administration of the first questionnaire, and such a change would lower the reliability estimates.

Although reliability could be explored with these data, validation of the WHI PA questionnaire could not be assessed. However, the questionnaire's validity was recently explored among 74 women enrolled in the Women's Healthy Eating and Living Study (17). In this convenience sample of women, the WHI PA questionnaire was correlated with both the accelerometer (Actigraph 7164) and the 7-d PA recall (r = 0.73 and 0.88, respectively). Although the WHI questionnaire had 100% sensitivity for identifying women who met the PA guidelines, the specificity was only 60%. The questionnaire tended to underestimate moderate activities and overestimate vigorous activities.

Study limitations

Despite the diverse and large sample, this study had several limitations. The WHI sample was not population based and may not be representative of a specific source population. White women comprised a larger share of the sample than other racial/ethnic groups. Due to the small sample sizes representing Hispanic, African American, and Asian/Pacific Islander women, the lower bounds of the CI were estimated below zero in several of the stratified analyses. Additionally, the level of education in our sample was very high, and we were unable to examine variation in test-retest reliability by education. Another limitation of this study was that participants were not randomized to the two forms, and some differences were observed between the two groups.

Several other considerations should be made when using the questionnaire. Although the WHI PA assessment included a measure of yard and household activity, it was not a comprehensive measure of women's potential activities. Several domains of activity such as nonmotorized transportation (active travel), child or elder care activity, and work or occupational PA were not included in the WHI PA questionnaire.

CONCLUSIONS

Reliable and valid questionnaires are a cost-effective and useful method for collecting PA information in large cohort studies, such as in the WHI Observational Study. However, measurement of PA is challenging because many questionnaires do not collect detailed information on types of activities and use terminology many women do not identify with (2,21,33,34). The WHI PA questionnaire is one of the first questionnaires to examine different types of PA in a large, multiethnic sample of women. This analysis shows that the different domains of PA behavior, such as recreational, yard, and household activity, can be reliably estimated in an ethnically diverse sample of postmenopausal women.

The WHI study was funded by the National Institutes of Health (NIH)/National Heart, Lung, and Blood Institute (NHLBI). This work was supported in part by the NIH/NHLBI #5-T32-HL007055. The results of the present study do not constitute endorsement by the NIH or the ACSM.

The authors are also indebted to Dr. David Couper, Dr. Gerardo Heiss, Dr. Steve Marshall, and Dr. June Stevens for their valuable feedback on the analysis and manuscript.
