Abstract
Temporal stability of behaviorally evoked cardiovascular responses is important for theoretical (concept of activation) and practical (risk for cardiovascular diseases) reasons. As in test psychology, the reliability of physiological responsivity depends on the degree of data aggregation across several measurements. This paper describes a statistical approach based on intra-class correlations. The approach makes it possible to define stability measures from variance components that represent different levels of data aggregation. An empirical investigation is presented comprising 58 subjects, three physiological parameters (heart rate, systolic and diastolic blood pressure), two mental tasks, two sequences of the tasks within one session, and 2 days with an interval of 4 weeks between them. In addition to the finding that data aggregation can generally increase stability, the different sources of aggregation (across phases within a task, across tasks, and across task sequences) and their combinations are systematically compared with regard to their contribution to this enhancement. Finally, it is shown how the approach can be used to explain aggregation effects for other psychophysiological research questions such as covariation, consistency, and ambulatory assessment of cardiovascular functioning.

Introduction
Behavioral stimuli often evoke responses of the autonomic nervous system, and the magnitude of such responses varies considerably among subjects. It has been hypothesized that exaggerated physiologic responses to psychological challenge may be implicated in the etiology of cardiovascular diseases, including coronary heart disease and essential hypertension (Fredrikson and Matthews, 1990; Pickering and Gerin, 1990). One requirement for using cardiovascular reactivity as a risk factor is a certain degree of temporal stability and trans-situational consistency of cardiovascular responsiveness. In recent years, the issue of temporal stability has been increasingly investigated. While Manuck et al. (1989) summarized the results of 12 published studies that tested the stability of cardiovascular responses, Swain and Suls (1996) presented a meta-analysis of 31 papers with 95 coefficients for heart rate (HR) and 73 coefficients for systolic blood pressure (SBP) and diastolic blood pressure (DBP). Mean stability coefficients in the Swain and Suls study were: HR, r=0.55; SBP, r=0.41; and DBP, r=0.35. The studies incorporated in this meta-analysis differed with regard to several criteria: size and composition of the subject sample, type and duration of the mental tasks, definition of baseline values and change scores, and interval between the replications. Some of these aspects are explicitly addressed by Swain and Suls (1996). One of their results was that the stability of blood pressure reactivity depends on the degree of data aggregation. When the blood pressure assessments comprised only a few readings, stability coefficients were lower than in cases with more readings. Similar results were obtained by Kamarck et al. (1992) and Manuck et al. (1993), who found that the reliability of reactivity assessments can be enhanced by aggregating an individual's responses across several different tasks.
From a psychometric point of view, this finding is not surprising. Epstein (1979, 1983, 1986) showed how temporal reliability increases when the data are aggregated.
Generalizability theory (Cronbach et al., 1972; Brennan, 1992) is a very useful tool to describe how single measurements can generalize across several conditions. The statistical basis of this theory is an ANOVA model with random factors. For simple (single-facet) designs the generalizability coefficient corresponds to the intra-class correlation. One possible application of generalizability coefficients is the assessment of temporal stability, understood as generalizability over time. Llabre et al. (1993) use generalizability coefficients for this purpose as an alternative to the usually adopted Pearson correlations. In a multi-facet design it is possible to assess the generalizability across multiple sources of variation. For example, Llabre et al. (1988) demonstrate the generalizability of cardiovascular responses across replications, across settings (laboratory, home and work), and across instruments. Gerin et al. (1996) also show how the assessment of generalizability or reliability of cardiovascular responses can be qualified when several sources of variance (replications under identical conditions and changes in the setting) are differentiated. Another application of this theory concerns the number of measurements that are necessary in order to assess the ‘true’ score accurately. For example, Llabre et al. (1988) conclude from generalizability coefficients that six readings of systolic blood pressure are needed at home or at work to assess systolic blood pressure with sufficient accuracy.
A problem that sometimes occurs in the application of generalizability theory is that of negative estimates of variance components. These negative estimates are usually replaced with zero (Llabre et al., 1988, Table 3; Van Doornen et al., 1994, Table 6; Gerin et al., 1996, Table 4; Marwitz and Stemmler, 1998, Table 3a). Such estimates indicate that the model assumptions have not been perfectly met and that there are inconsistencies in the estimated variance components.
Based on the technique of covariance partitioning (Stemmler and Fahrenberg, 1989; Stemmler, 1992), which was developed to specify several aspects of covariation between physiological variables, Hinz et al. (1994) constructed several measures for temporal stability which also take into account the degree of data aggregation. Stability measures which include aggregation across tasks proved to be higher than those without such aggregation. There is thus theoretical and empirical evidence that data aggregation may enhance stability coefficients. The aim of this paper is to develop stability measures which consider several different aspects of aggregation (including combinations of these aspects) simultaneously. An empirical study will be presented which was designed to study the impact of aggregation across three dimensions (phases within a task, tasks, and occasions) on temporal stability. Furthermore, a statistical approach will be demonstrated which is suited to clarify the underlying relationships among these measures.
1.1. Measures of stability depending on types of data aggregation
A deeper understanding of the impact of data aggregation can be obtained when the total variability of the data is divided into subcomponents, and when the components which indicate different facets of temporal stability are compared. Measures which are derived from intra-class correlations are suited to clarify these relationships.
Hinz et al. (1994) considered a three-dimensional data set (individuals, situations, and time), where the situations represented various mental tasks. Two inter-individual stability measures were defined: between-individuals stability and within-situations stability. While the former measure was defined for values which are aggregated across situations before calculating stability, the latter one defined stability without such initial averaging.
We now consider an even more complex data set. Reactivity is assumed to be investigated during several mental tasks, during several sequences of these tasks (occasions) within 1 day, and during several days. This results in a four-dimensional data set with the following factors: individuals (IND), tasks (TASK), sequences or occasions (OCC), and days (DAY). The total sum of squares of the data can then be partitioned into 15 components, which represent the main effects and interaction effects of the four factors (cf. Table 3). The fourfold interaction IND×TASK×OCC×DAY is confounded with error variance. Because the sums of squares are additive, they provide insight into the relative weight of the components. However, since the degrees of freedom are not taken into account, the sums of squares cannot be directly considered as variance components. Several types of stability can now be defined on the basis of intra-class correlations.
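The partition into 15 components can be illustrated with a short sketch. It assumes a balanced design stored as a four-dimensional NumPy array with axes ordered IND, TASK, OCC, DAY; the function name `partition_ss` and the simulated data are hypothetical and serve only as an illustration, not as the procedure used in the study.

```python
from itertools import combinations

import numpy as np

FACTORS = ("I", "T", "O", "D")  # individuals, tasks, occasions, days


def partition_ss(x):
    """Split the total sum of squares of a balanced 4-D array into 15 components."""
    x = np.asarray(x, dtype=float)
    axes = range(x.ndim)
    grand = x.mean()
    effects = {}  # tuple of axes -> effect array (broadcastable to x.shape)
    ss = {}
    for size in range(1, x.ndim + 1):
        for subset in combinations(axes, size):
            others = tuple(a for a in axes if a not in subset)
            # Marginal means over all factors that are not part of this effect.
            cell = x.mean(axis=others, keepdims=True) if others else x
            # Remove the grand mean and every lower-order effect nested in it.
            effect = cell - grand
            for lower, lower_effect in effects.items():
                if set(lower) < set(subset):
                    effect = effect - lower_effect
            effects[subset] = effect
            name = "x".join(FACTORS[a] for a in subset)
            ss[name] = float((np.broadcast_to(effect, x.shape) ** 2).sum())
    return ss


# Simulated data (not the study data): 6 individuals, 2 tasks, 2 occasions, 2 days.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 2, 2, 2))
ss = partition_ss(data)
total = ((data - data.mean()) ** 2).sum()
shares = {name: 100 * value / total for name, value in ss.items()}  # in %
```

In a balanced design the 15 components add up to the total sum of squares, so the percentages in `shares` sum to 100, which is the form in which the components are reported in Table 3.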
The usual way to express these intra-class correlations (IC-correlations) between variables (VAR) reads as follows (Cronbach et al., 1972; Shrout and Fleiss, 1979):
\[ r = \frac{MS_{IND} - MS_{IND\times VAR}}{MS_{IND} + (n_v - 1)\,MS_{IND\times VAR}} \]
In this formula, MS are the mean squares, and the number of variables to be correlated is n_v. This formula can be transformed into a term which contains sums of squares (SS) instead of mean squares, which is more convenient for the following calculations:
\[ r = \frac{SS_{IND} - SS_{IND\times VAR}/(n_v - 1)}{SS_{IND} + SS_{IND\times VAR}} \]
The formula is directly applicable to assess stability in the two-dimensional case where the test replications (days) are the variables (one task, one occasion). When only 2 days are considered, Pearson correlations are usually calculated to indicate stability. Though intra-class correlations and Pearson correlations are not exactly identical, they are very similar when there are no great differences in the standard deviations of the two variables (days).
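The relation between the two coefficients can be checked numerically. The sketch below assumes the sums-of-squares form of the intra-class correlation that is used in the worked example of Section 3.3, r = [SS(IND) − SS(IND×VAR)/(n_v − 1)] / [SS(IND) + SS(IND×VAR)]; the function name `icc_from_ss` and the toy scores are hypothetical.

```python
import numpy as np


def icc_from_ss(scores):
    """Intra-class correlation for an (individuals x variables) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_v = scores.shape[1]
    grand = scores.mean()
    row = scores.mean(axis=1, keepdims=True)   # individual means
    col = scores.mean(axis=0, keepdims=True)   # variable (day) means
    ss_ind = n_v * ((row - grand) ** 2).sum()
    ss_ind_var = ((scores - row - col + grand) ** 2).sum()
    return (ss_ind - ss_ind_var / (n_v - 1)) / (ss_ind + ss_ind_var)


# Toy scores for five individuals on two days (illustration only).
day1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
day2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0]) + 10.0   # same spread, shifted mean
equal_sd = np.column_stack([day1, day2])
unequal_sd = np.column_stack([day1, 3.0 * day2])    # spread of day 2 tripled

pearson = np.corrcoef(day1, day2)[0, 1]
icc_equal = icc_from_ss(equal_sd)
icc_unequal = icc_from_ss(unequal_sd)
```

With equal standard deviations on both days the two coefficients coincide (both 0.80 for these toy scores), even though the day means differ. When the spread of the second day is tripled, the Pearson correlation is unchanged while the intra-class correlation drops to 0.48.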
In the four-dimensional design (cf. Table 3), there are several types of temporal stability which consider different degrees of data aggregation. Four such types shall be discussed below.
1.2. Type 1: (I)
The stability of Type 1 is defined by the components IND and IND×DAY according to the formula given above. This type of stability refers to those data which are averaged across all tasks and all occasions. The number of days is denoted by nd. For reasons of simplicity, the four dimensions (IND, TASK, OCC, DAY) are further abbreviated: I, T, O, D.
\[ r_{\text{Type 1}} = \frac{SS_{I} - SS_{I\times D}/(n_d - 1)}{SS_{I} + SS_{I\times D}} \]
1.3. Type 2: (I, I×T)
Type 2 stability means the inter-individual stability within the tasks (averaged across occasions). This corresponds with the ‘within-situations stability’ introduced in Hinz et al. (1994), when the situations are assumed to be the tasks. The component IND in the Type 1 formula has to be replaced by the combination of IND and IND×TASK.
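\[ r_{\text{Type 2}} = \frac{SS_{I} + SS_{I\times T} - (SS_{I\times D} + SS_{I\times T\times D})/(n_d - 1)}{SS_{I} + SS_{I\times T} + SS_{I\times D} + SS_{I\times T\times D}} \]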
1.4. Type 3: (I, I×O)
Type 3 stability refers to the inter-individual stability within the occasions (averaged across tasks). The formula is similar to that of Type 2, only the dimensions tasks and occasions are interchanged.
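\[ r_{\text{Type 3}} = \frac{SS_{I} + SS_{I\times O} - (SS_{I\times D} + SS_{I\times O\times D})/(n_d - 1)}{SS_{I} + SS_{I\times O} + SS_{I\times D} + SS_{I\times O\times D}} \]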
1.5. Type 4: (I, I×T, I×O, I×T×O)
Type 4 stability means the inter-individual stability within tasks and within occasions (no averaging across tasks or occasions). The formula is more complex:
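\[ r_{\text{Type 4}} = \frac{SS_{I} + SS_{I\times T} + SS_{I\times O} + SS_{I\times T\times O} - (SS_{I\times D} + SS_{I\times T\times D} + SS_{I\times O\times D} + SS_{I\times T\times O\times D})/(n_d - 1)}{SS_{I} + SS_{I\times T} + SS_{I\times O} + SS_{I\times T\times O} + SS_{I\times D} + SS_{I\times T\times D} + SS_{I\times O\times D} + SS_{I\times T\times O\times D}} \]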
The formulae given above help to understand the impact of data aggregation on stability. Comparing Type 1 and Type 2 stability measures elucidates the background of averaging across tasks. For this comparison, an alternative expression of Type 2 stability is instructive:
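\[ r_{\text{Type 2}} = \frac{\left[SS_{I} - SS_{I\times D}/(n_d - 1)\right] + \left[SS_{I\times T} - SS_{I\times T\times D}/(n_d - 1)\right]}{\left[SS_{I} + SS_{I\times D}\right] + \left[SS_{I\times T} + SS_{I\times T\times D}\right]} \]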
This formula consists of two parts. The left part is identical with Type 1 stability. The right part refers to the stability of the interaction between individuals and tasks:
\[ r_{(I\times T)} = \frac{SS_{I\times T} - SS_{I\times T\times D}/(n_d - 1)}{SS_{I\times T} + SS_{I\times T\times D}} \]
This was called ‘residual stability’ in Hinz et al. (1994), referring to a three-dimensional design with individuals, situations (tasks), and time (days). In the context of a four-dimensional design including occasions, however, the term ‘residual’ is not appropriate since there are several other components beyond the main components of individuals and situations. Type 2 stability can be considered as a compromise between the stability of Type 1 and the stability of r(I×T). The question of whether averaging across tasks enhances stability depends on the relationship between these two measures:
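\[ r_{\text{Type 1}} > r_{(I\times T)} \;\Rightarrow\; r_{\text{Type 1}} > r_{\text{Type 2}} \]
\[ r_{\text{Type 1}} = r_{(I\times T)} \;\Rightarrow\; r_{\text{Type 1}} = r_{\text{Type 2}} \]
\[ r_{\text{Type 1}} < r_{(I\times T)} \;\Rightarrow\; r_{\text{Type 1}} < r_{\text{Type 2}} \]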
Analogously, the relationship between Type 1 and Type 3 stability can be considered; the dimensions tasks and occasions simply have to be interchanged. Finally, Type 4 stability can be understood as a composition of the stability components related to IND, IND×TASK, IND×OCC, and IND×TASK×OCC.
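Why Type 2 stability behaves as a compromise between Type 1 stability and r(I×T) can be made explicit with a short derivation. The sketch assumes the sums-of-squares form of the stability measures that is used in the worked example of Section 3.3; the symbols N, D, and w are introduced here for illustration only and are not part of the original notation.

```latex
\begin{align*}
N_{I}  &= SS_{I} - \frac{SS_{I\times D}}{n_d - 1}, &
D_{I}  &= SS_{I} + SS_{I\times D}, \\
N_{IT} &= SS_{I\times T} - \frac{SS_{I\times T\times D}}{n_d - 1}, &
D_{IT} &= SS_{I\times T} + SS_{I\times T\times D}, \\
r_{\text{Type 1}} &= \frac{N_{I}}{D_{I}}, &
r_{(I\times T)} &= \frac{N_{IT}}{D_{IT}}, \\
r_{\text{Type 2}} &= \frac{N_{I} + N_{IT}}{D_{I} + D_{IT}}
  = w\, r_{\text{Type 1}} + (1 - w)\, r_{(I\times T)}, &
w &= \frac{D_{I}}{D_{I} + D_{IT}}.
\end{align*}
```

Since sums of squares are non-negative, w lies between 0 and 1 whenever both denominators are positive, so Type 2 stability is a weighted mean of the other two coefficients. Under these assumptions, averaging across tasks (moving from Type 2 to Type 1) raises stability exactly when r(I×T) is smaller than Type 1 stability.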

Results
3.1. Task scores and change scores
Table 1 presents task scores and change scores (difference scores) for both tasks, both occasions (test sequences within 1 day), and both days. Mean task scores (averaged across days, occasions, and tasks) were significantly higher than the corresponding mean baseline scores (P<0.1%) for heart rate, systolic and diastolic blood pressure.
Table 1.
Mean values of the physiological parameters.
Var   Day   Occ   STM-task        TR-task         STM-change      TR-change
                  Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.
HR    1     1     86.3    13.7    85.1    12.7    6.5     6.1     5.4     5.6
HR    1     2     81.2    11.5    81.2    10.9    4.6     5.1     4.5     5.2
HR    2     1     85.3    13.2    84.5    12.4    5.3     6.5     4.5     5.1
HR    2     2     82.1    12.2    81.4    10.7    3.7     5.3     3.0     3.8
SBP   1     1     128.1   14.4    130.2   15.5    9.4     7.1     11.5    7.4
SBP   1     2     122.4   14.5    124.0   15.3    5.8     6.6     7.3     7.7
SBP   2     1     122.1   14.9    123.9   16.0    5.0     7.0     6.8     7.0
SBP   2     2     119.0   15.4    120.1   15.4    2.9     6.8     4.0     5.6
DBP   1     1     85.1    10.9    87.1    10.6    4.3     7.4     6.3     7.2
DBP   1     2     82.7    10.5    85.2    10.7    2.3     5.7     4.8     6.6
DBP   2     1     82.6    10.7    83.4    10.5    3.9     6.0     4.7     6.1
DBP   2     2     79.9    10.0    82.5    9.6     1.4     4.8     4.0     5.0
Occ, occasion; STM, memory task; TR, tracking task.
Task scores and change scores were lower during the second occasion than during the first. While heart rate reactivity was approximately constant on both days, blood pressure responses were more pronounced on the first day. The differences between the tasks were small in magnitude; blood pressure reactivity was slightly higher during the tracking task than during the memory task.
3.2. Calculation of stability with Pearson correlations
Table 2 shows stability coefficients of the physiological responses for different degrees of data aggregation, based on Pearson correlations. The figures in the rightmost column of each half (phases, tasks, occasions) are the stability coefficients (correlations between Day 1 and Day 2) for scores which were initially averaged across phases, tasks, and occasions (eight scores) for each individual before calculating the correlation. For the column to its left (phases, occasions), the data were averaged across phases and occasions for each individual and each task. The correlations were calculated separately for both tasks, and the figures in Table 2 are the means of these two values. In the third column, the data were averaged across tasks instead of occasions before correlating them. In the second column (phases), the data were averaged across phases only, and the leftmost column contains (mean) stability coefficients obtained with no data aggregation before the correlations were calculated. There was only one blood pressure reading per phase (2-min section of a task); hence the task score stabilities in this column represent the reproducibility of single blood pressure readings during mental challenge. For systolic blood pressure task scores, the eight values range between 0.75 and 0.83; their mean value (0.79) is given in Table 2.
Table 2.
Temporal stability, depending on different degrees of averaging.
Variables   Task scores, averaged across        Change scores, averaged across
            –     P     P,T   P,O   P,T,O       –     P     P,T   P,O   P,T,O
HR          0.73  0.75  0.76  0.76  0.77        0.46  0.53  0.53  0.61  0.61
SBP         0.79  0.82  0.85  0.85  0.87        0.24  0.28  0.29  0.44  0.47
DBP         0.68  0.74  0.78  0.78  0.81        0.11  0.16  0.16  0.23  0.30
–, no averaging; P, phases; T, tasks; O, occasions.
For task scores and change scores, stability coefficients can be enhanced when the data are averaged before the calculation of their stability. Task scores are characterized by higher stability coefficients than change scores, and the influence of averaging is less pronounced than for change scores. Blood pressure reactivity in particular is unstable unless averaging is performed.
Stability calculations typically reported in the literature refer to single tasks with a duration of approximately 4 min; the corresponding values in Table 2 are given in the second column of the right half (change scores, averaged across phases).
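The two ends of the averaging scheme in Table 2 can be sketched with simulated data. The example below is not the study data: it assumes a stable individual level plus independent noise of equal variance for each of eight scores per day (2 phases × 2 tasks × 2 occasions), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_scores = 2000, 8   # 8 = 2 phases x 2 tasks x 2 occasions

# Simulated scores with axes (subject, day, score).
trait = rng.normal(size=(n_subjects, 1, 1))         # stable individual level
noise = rng.normal(size=(n_subjects, 2, n_scores))  # reading-specific noise
scores = trait + noise


def pearson(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# No aggregation: correlate Day 1 with Day 2 for each score, then average the r values.
r_single = float(np.mean(
    [pearson(scores[:, 0, k], scores[:, 1, k]) for k in range(n_scores)]
))

# Full aggregation: average the eight scores within each day, then correlate.
day_means = scores.mean(axis=2)
r_aggregated = pearson(day_means[:, 0], day_means[:, 1])
```

Under these assumptions the coefficient for the aggregated scores is clearly higher than the mean coefficient for single scores, which mirrors the direction of the effect in Table 2. The size of the gain depends on the simulated noise level and should not be read as an estimate for real data.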
3.3. Calculation of stability with intra-class correlations
We now demonstrate the application of stability measures (Types 1–4) based on intra-class correlations. The sums of squares (Table 3) give insight into the relative weight of the components. Since only ratios of the sums of squares are of importance for the intra-class correlations, Table 3 displays the proportions (in %) instead of the absolute values. For task scores, the predominant proportion (approx. 80%) is related to the component of the individuals (IND). There is only one further component (IND×DAY) which yields values above 5%. In the change score analyses, the distribution is somewhat different. The IND component remains the most important one (between 30% and 50%), but four other components are also worth noting, with values which can exceed 10%: IND×TASK, IND×OCC, IND×DAY, and IND×OCC×DAY.
Table 3.
Sums of squares (in %).
Components   Heart rate        Systolic BP       Diastolic BP
             Task    Change    Task    Change    Task    Change
I            80.47   48.68     81.73   32.92     78.41   32.21
T            0.08    0.40      0.28    1.24      0.86    2.45
O            2.43    1.78      2.31    4.72      0.90    1.73
D            0.00    1.09      2.53    6.80      1.87    0.53
I×T          2.25    11.44     1.69    7.49      1.82    5.16
I×O          1.80    9.85      1.99    11.64     2.16    14.56
I×D          10.32   11.70     5.91    11.83     8.19    18.61
T×O          0.02    0.08      0.01    0.05      0.08    0.22
T×D          0.00    0.01      0.00    0.02      0.01    0.04
O×D          0.07    0.00      0.16    0.24      0.01    0.01
I×T×O        0.59    2.99      0.89    3.93      1.40    3.97
I×T×D        0.54    2.77      0.70    3.11      1.95    5.55
I×O×D        0.96    6.80      0.97    12.38     1.63    12.93
T×O×D        0.01    0.06      0.00    0.00      0.02    0.06
I×T×O×D      0.46    2.34      0.82    3.63      0.69    1.95
Stability calculations are based on relationships between the components in Table 3. For each physiological variable (task scores and change scores), the four measures of stability (Types 1–4) were calculated using the sums of squares. For example, Type 1 stability for heart rate task scores is calculated as follows: r=[80.47−10.32/(2−1)]/(80.47+10.32)=0.77, which corresponds with the value given in Table 2. Types of stability based on intra-class correlation are theoretically related to Pearson correlations with data aggregation (Table 2) as follows: Type 1: phases, tasks, occasions; Type 2: phases, occasions; Type 3: phases, tasks; Type 4: phases. Pearson correlations within phases (left column in Table 2) have no counterpart in the formulae based on intra-class correlations for the four-dimensional design since the sums of squares did not include the dimension ‘phases’.
The intra-class correlation coefficients proved to be very similar to the corresponding (averaged) Pearson coefficients given in Table 2. For the task score analyses, no difference is greater than 0.01. For the analyses of change scores, there were four scores with a difference of 0.02 (HR, Type 4: 0.51; SBP, Type 2: 0.46; DBP, Type 2: 0.21; DBP, Type 4: 0.18) and two scores with a difference of 0.03 (DBP, Type 1: 0.27; DBP, Type 3: 0.19). Generally speaking, the two approaches (Pearson correlation and intra-class correlation) yield similar results.
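The coefficients quoted above can be recomputed from the percentages in Table 3. The sketch below generalizes the worked Type 1 example: the components that count as stable are summed, and their interactions with DAY are summed as the unstable part. The names `TABLE3`, `TYPES`, and `stability` are hypothetical helpers, and only the eight components involving individuals are copied from the table.

```python
# Table 3 percentages for the components that involve individuals (I).
# Keys: I, IT = I x T, IO = I x O, ID = I x D, ITO, ITD, IOD, ITOD.
TABLE3 = {
    "HR task":    dict(I=80.47, IT=2.25, IO=1.80, ID=10.32, ITO=0.59, ITD=0.54, IOD=0.96, ITOD=0.46),
    "HR change":  dict(I=48.68, IT=11.44, IO=9.85, ID=11.70, ITO=2.99, ITD=2.77, IOD=6.80, ITOD=2.34),
    "SBP task":   dict(I=81.73, IT=1.69, IO=1.99, ID=5.91, ITO=0.89, ITD=0.70, IOD=0.97, ITOD=0.82),
    "SBP change": dict(I=32.92, IT=7.49, IO=11.64, ID=11.83, ITO=3.93, ITD=3.11, IOD=12.38, ITOD=3.63),
    "DBP task":   dict(I=78.41, IT=1.82, IO=2.16, ID=8.19, ITO=1.40, ITD=1.95, IOD=1.63, ITOD=0.69),
    "DBP change": dict(I=32.21, IT=5.16, IO=14.56, ID=18.61, ITO=3.97, ITD=5.55, IOD=12.93, ITOD=1.95),
}

# Components treated as stable for each type of stability.
TYPES = {
    1: ["I"],
    2: ["I", "IT"],
    3: ["I", "IO"],
    4: ["I", "IT", "IO", "ITO"],
}


def stability(ss, stability_type, n_days=2):
    """Intra-class stability from sums of squares (or their percentages)."""
    components = TYPES[stability_type]
    stable = sum(ss[c] for c in components)
    # The interaction of each stable component with DAY is the unstable part.
    unstable = sum(ss[c + "D"] for c in components)
    return (stable - unstable / (n_days - 1)) / (stable + unstable)


for label, ss in TABLE3.items():
    values = ", ".join(f"Type {t}: {stability(ss, t):.2f}" for t in TYPES)
    print(f"{label:<11} {values}")
```

Because only ratios of sums of squares enter the formula, percentages can be used in place of absolute values. Rounded to two decimals, the sketch gives 0.77 for Type 1 stability of heart rate task scores and matches the change score coefficients listed in the text (for example 0.51 for HR, Type 4, and 0.27 for DBP, Type 1).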