
Worksheet on using SPSS to analyse and compare cross-sectional developmental trajectories

The following worksheet steps through the use of SPSS to characterise cross-sectional developmental trajectories. In particular, we focus on comparing typically developing trajectories with those derived from a group of individuals with a developmental disorder. This worksheet accompanies the submitted paper: Thomas, M. S. C., Annaz, D., Ansari, D., Scerif, G., Jarrold, C., & Karmiloff-Smith, A. (2007). The use of developmental trajectories in studying genetic developmental disorders.

In the first section, we begin by characterising a single developmental trajectory, including generating and plotting confidence intervals around the trajectory, checking for outliers, assessing the linearity of the trajectory, and comparing goodness-of-fit of different linear and non-linear functions. Readers familiar with linear regression methods may wish to skip this section.

Focusing on the use of linear methods, we then introduce a between-groups comparison of trajectories that allows one to evaluate whether developmental trajectories generated from different groups differ significantly in terms of their gradients or intercepts. We contrast comparisons between the groups for trajectories plotted according to chronological age versus those plotted according to mental age.

In cases where the typically developing group produces a reliable trajectory but the disorder group does not, we then offer a new method to distinguish between two different types of null trajectory in the disorder group: a zero trajectory, where there is no improvement with age; and no-systematic-relationship, where the task performance is essentially random with respect to the participant’s age.

In the third section, we show how SPSS can be used to carry out repeated measures linear regressions to compare two trajectories generated by a single group based on performance in two tasks. With these trajectories, we also show how confidence intervals can be used to demonstrate the age at which the trajectories for the two tasks reliably converge or diverge.

In the fourth section, we demonstrate the use of SPSS to analyse mixed design regressions, for example with one between-groups factor and one repeated measure. For instance, where the typically developing group is characterised by a divergence in development on two tasks, one might want to assess whether a disorder group demonstrates the same pattern of divergence.

Where appropriate, analyses are illustrated with worked examples using sample data. Charts are mostly generated within Excel and statistical results in SPSS 12.0 for Windows.

The linear developmental trajectory, predicting task performance based on the chronological age of the children, is therefore

$$\text{Task} = 0.006 \times \text{CA} + 0.013$$

Here is an Excel chart of these data (employing the XY (Scatter) chart function; after the chart was created, the Add Trendline function under the Chart menu was used to add a best-fit linear trend; the Options tab in this dialogue permits display of the R2

Several further pieces of information are important to explore a single trajectory. First, we wish to find out whether any of the data points exerts undue influence on the trajectory (i.e., constitutes an outlier). For this, we use Cook’s distance (Cook’s D). Second, we wish to assess the reliability of the parameters (the intercept and gradient). Third, we may wish to generate confidence intervals for the trajectory itself (i.e., the region within which the best-fit line falls with 95% confidence). To generate the additional bits of information, in the Linear Regression dialogue, click on the Statistics button and make sure both Estimates and Confidence intervals are selected. Click on the Save button and select Cook’s under Distances and Mean under Prediction Intervals (95% confidence is offered as the default value; this may be changed as desired). Then run the regression again.

(useful for identifying potential outliers in the independent variable, here CA) to identify unusually influential observations on the regression line (see Howell, 2007, pp. 516-520 for discussion). Cook’s D assesses how much the residuals of all cases would change if a particular case were to be excluded from the calculation of the regression coefficients. This is repeated for each case so that each data point is assigned a value measuring influence. There is no general rule for what value of Cook’s D definitively indicates that a given point is an outlier. However, as a rule of thumb, a Cook’s D of over 1.00 suggests that a data point exerts undue influence on the regression. In this case, the analysis should be re-run without the data point in question.

results must be reported for the analyses both with and without the identified data point(s). Treatment of variability in disorders is an important issue, since behaviourally defined developmental disorders frequently exhibit marked variability, while variability is also found in disorders where an independent genetic diagnosis is available (see Thomas, 2003, for discussion and related analytical techniques). For the sample trajectory, no value of Cook’s D exceeds 0.2 and so no value is identified as a potential outlier.
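As a cross-check on the values that SPSS saves, Cook’s D for a regression with a single predictor can also be computed by hand. The following is a minimal sketch in Python, not the SPSS routine itself: the function name `cooks_distance` and the toy data are assumptions introduced here for illustration, and the formula is the standard leverage-based form for a model with two fitted parameters (intercept and gradient).

```python
def cooks_distance(x, y):
    """Cook's D for each case in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    residuals = [yi - (intercept + gradient * xi) for xi, yi in zip(x, y)]
    n_params = 2  # intercept and gradient
    mse = sum(e ** 2 for e in residuals) / (n - n_params)
    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of this case: how far its x value sits from the mean of x.
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append((e ** 2 / (n_params * mse)) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: five well-behaved cases plus one extreme case.
ages = [1, 2, 3, 4, 5, 10]
scores = [1, 2, 3, 4, 5, 20]
d_values = cooks_distance(ages, scores)

# Apply the rule of thumb from the text: flag cases with Cook's D over 1.00.
flagged = [i for i, d in enumerate(d_values) if d > 1.0]
print(flagged)
```

With these made-up values only the last case is flagged, which is the situation in which the text recommends re-running the analysis without the data point in question.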

The two additional variables (LMCI_1, UMCI_1) created by the Save function are the lower mean confidence interval and the upper mean confidence interval for the dependent variable Task for each value of the predictor (i.e., for each age). The ‘mean’ confidence interval represents the region within which there is 95% chance that the actual mean (trendline) sits. Note that SPSS also gives you the option to save the ‘individual’ confidence intervals. These demarcate the region within which there is 95% probability that the individual data points sit (these confidence intervals are typically wider).
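The difference between the ‘mean’ and ‘individual’ intervals can be made concrete with a short Python sketch. It uses the textbook formulas for a one-predictor regression and is not a reproduction of SPSS internals. The function name `interval_bands`, the toy data and the critical t value passed in by hand (2.776 for 4 residual degrees of freedom; about 2.069 would apply to the 23 residual degrees of freedom of the sample trajectory) are all assumptions made for illustration.

```python
import math


def interval_bands(x, y, x_new, t_crit):
    """Return (mean_lo, mean_hi, indiv_lo, indiv_hi) at predictor value x_new."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    sse = sum((yi - (intercept + gradient * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    fit = intercept + gradient * x_new
    core = 1 / n + (x_new - mean_x) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(core)       # band for the trendline
    half_indiv = t_crit * s * math.sqrt(1 + core)  # band for single cases
    return fit - half_mean, fit + half_mean, fit - half_indiv, fit + half_indiv


# Hypothetical ages (months) and proportion-correct scores.
ages = [40, 60, 80, 100, 120, 140]
scores = [0.26, 0.36, 0.51, 0.60, 0.75, 0.84]

mid_band = interval_bands(ages, scores, 90, 2.776)
edge_band = interval_bands(ages, scores, 140, 2.776)
print(mid_band)
print(edge_band)
```

The individual band always encloses the mean band, and both widen as the age moves away from the centre of the sampled range, which is why the plotted intervals flare at the ends of the trajectory.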

Confidence intervals can be added to the Excel chart by adding (CA, LMCI_1) and (CA, UMCI_1) as two additional X-Y scatter series. Click on your original chart to select it; under the Chart menu, select Chart-Source, select the Data-Series tab; click on the Add button and select the column of CA values and the column of LMCI_1 values as the X values and Y values respectively; repeat for CA and UMCI_1. To present these as thin lines as depicted below, select the data series (click on any point), right-click to Format

nicely. No idea why. This can be done by selecting the full chart data range, using the Data-Sort function, and sorting in ascending mode by the column that contains age).

Excel dialogues:

Lastly, in SPSS, selecting ‘confidence intervals’ under Statistics in the Linear Regression dialogue provides information on the upper and lower bounds of the intercept and gradient of the developmental trajectory. The results are shown in the table below. For our sample trajectory, since the upper and lower bounds of the confidence interval on the intercept span zero (-.076 to .102), this indicates that the intercept is not significantly different from zero (reflected in the non-significant t-test result on this coefficient). By contrast, the gradient is reliably greater than zero, indicating improvement with age.

The sample data represent a trajectory that is reasonably linear. What if we are not confident that a simple line gives a best fit to the data? We can, of course, check that the residuals (the difference between the actual performance of each individual and the performance that is predicted by the trajectory given each individual’s age) are normally distributed and do not vary systematically across the age range. These indicators (along with the R2) would warn if a linear function does not capture the cross-sectional trajectory very well. SPSS, however, also allows us to assess whether another non-linear function fits the data better.

SPSS permits multiple functions to be simultaneously fitted to the same set of data points using the Analyze-Regression-Curve Estimation function. Parameter information and proportion of variance explained can be derived for several functions at once, linking performance (Y) to age (t) using parameters b0, b1, b2 . . . in the following ways:

(Taken from SPSS 12.0 for Windows help function under ‘Curve Estimation’)

Linear. Model whose equation is Y = b0 + (b1 * t). The series values are modeled as a linear function of time

Logarithmic. Model whose equation is Y = b0 + (b1 * ln(t))

Inverse. Model whose equation is Y = b0 + (b1 / t)

Quadratic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2). The quadratic model can be used to model a series which "takes off" or a series which dampens

How do we decide which is the best function to fit the data? Here, the heuristic of parsimony comes in. We want to explain the most variance using the fewest parameters. For example, although the Quadratic function fits the data better than the Linear (i.e., has a larger R2), it does so with one more parameter; the Cubic fits marginally better than the Quadratic but uses one more parameter again.

There are two statistical tests that can be used to determine which model/function to choose in these cases. One method, called the ‘extra sum-of-squares’ test, is only applicable for nested models. A nested model is one where one model is a subset of the other, that is, the first model is a version of the second model but with one or more of the parameters set to zero. Thus the Linear model is a version of the Quadratic model with the x^2 coefficient set to zero, and a version of the Cubic model with the x^2 and x^3 coefficients set to zero. Nested models will have different degrees of freedom. The extra sum-of-squares approach derives an F-ratio from the relative increase in the sum-of-squares and the relative increase in the degrees of freedom reflecting the number of parameters used (this information is available in the ANOVA table for each regression fit). These two values are played off against each other, where an increase in model fit is good and an increase in parameters is bad. For regression fits 1 and 2, the equation is

$$F = \frac{(SS_1 - SS_2)/SS_2}{(DF_1 - DF_2)/DF_2}$$

where SS stands for sum-of-squares and DF for degrees of freedom. This F-ratio has DF1-DF2 degrees of freedom for the numerator and DF2 degrees of freedom for the denominator (see Motulsky & Christopoulos, 2004, for more details).

Let us compare the Linear and Cubic fits to the sample data. The SPSS printouts provide the sum-of-squares and degrees-of-freedom information (relevant information highlighted in blue):

Dependent variable.. task     Method.. LINEAR

Listwise Deletion of Missing Data

Multiple R           .93793
R Square             .87972
Adjusted R Square    .87449
Standard Error       .07629

Analysis of Variance:

              DF   Sum of Squares   Mean Square
Regression     1        .97897124     .97897124
Residuals     23        .13385276     .00581969

F = 168.21722     Signif F = .0000

-------------------- Variables in the Equation --------------------

Variable             B       SE B       Beta         T    Sig T
CA             .006060    .000467    .937933    12.970    .0000
(Constant)     .013136    .043018                 .305    .7628

Dependent variable.. task     Method.. CUBIC

Listwise Deletion of Missing Data

Multiple R           .93797
R Square             .87979
Adjusted R Square    .86261
Standard Error       .07981

Analysis of Variance:

              DF   Sum of Squares   Mean Square
Regression     3        .97904851     .32634950
Residuals     21        .13377549     .00637026

F = 51.23016     Signif F = .0000

-------------------- Variables in the Equation --------------------

Variable                         B         SE B        Beta        T    Sig T
CA                         .007201      .011874    1.114523     .606    .5507
CA**2         -1.254574265500E-05      .000141    -.351645    -.089    .9298
CA**3          4.212775775628E-08   5.1479E-07     .179103     .082    .9356
(Constant)                -.017479      .303698               -.058    .9546

For this model comparison, an F-test produces the following outcome: F(2,3)=.0009, p=.999. This indicates that the greater number of parameters in the Cubic model is much more expensive than the slightly greater fit to the data that the function provides. Therefore the simpler Linear model is the better model.
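The arithmetic of the extra sum-of-squares test can be sketched in Python as follows. This is an illustration under the assumption that the Residuals rows of the two ANOVA tables are the intended inputs; the function name is introduced here, and because the result depends on that choice of inputs it need not coincide with the figure quoted above. Obtaining a p-value would additionally require the F distribution, which is not computed in this sketch.

```python
def extra_sum_of_squares_f(ss1, df1, ss2, df2):
    """F-ratio comparing a simpler fit (1) with a nested, more complex fit (2).

    ss1, ss2: residual sums of squares; df1, df2: residual degrees of freedom.
    Algebraically the same as ((SS1 - SS2) / SS2) / ((DF1 - DF2) / DF2).
    """
    numerator = (ss1 - ss2) / (df1 - df2)
    denominator = ss2 / df2
    return numerator / denominator, df1 - df2, df2


# Residual rows of the LINEAR and CUBIC printouts for the sample data.
f_ratio, df_num, df_den = extra_sum_of_squares_f(0.13385276, 23, 0.13377549, 21)
print(f_ratio, df_num, df_den)
```

On these inputs the F-ratio is far below 1, which points the same way as the conclusion in the text: the two extra parameters of the Cubic model buy almost no reduction in unexplained variance.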

degrees of freedom but the difference (DF1-DF2) is now zero. Division by zero generally causes things to go badly wrong.

In the case of non-nested models, a hypothesis testing approach to comparing the models may be replaced by one drawn from information theory. This technique employs Akaike’s Information Criterion (see Motulsky & Christopoulos, 2004, p.143, for further details). In this case, the result of the test does not indicate the likelihood of the more complicated model fitting the data better by chance. Instead, it computes the relative likelihood of each model being correct.

The Akaike’s Information Criterion (AIC) for each model is

$$AIC = N \ln\left(\frac{SS}{N}\right) + 2K$$

where N is the number of data points, K is the number of parameters fit by the regression plus one, and SS is the sum of the squares value taken from the regression equation. Let us compare the AIC scores for the Linear and Logistic models for the sample data, both of which fit two parameters (‘CA’ and ‘constant’ below).

compute the relative probability that you are correct if you choose one or other model).
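The AIC calculation and the resulting relative probability can be sketched in Python. Because the Logistic printout is not reproduced in this section, the example instead uses the residual sums of squares of the Linear and Cubic fits printed earlier. That substitution, the function names, and the reading of SS as the residual sum of squares are assumptions made for illustration. The relative probability follows the form given by Motulsky and Christopoulos (2004), based on the difference between the two AIC scores.

```python
import math


def aic(n_points, n_fitted, ss):
    """AIC = N * ln(SS / N) + 2K, with K = fitted parameters plus one."""
    k = n_fitted + 1
    return n_points * math.log(ss / n_points) + 2 * k


def probability_first_is_correct(aic_first, aic_second):
    """Relative probability that the first of two candidate models is correct."""
    delta = aic_second - aic_first
    return 1 / (1 + math.exp(-0.5 * delta))


# N = 25 children; Linear fits 2 parameters, Cubic fits 4.
aic_linear = aic(25, 2, 0.13385276)
aic_cubic = aic(25, 4, 0.13377549)
p_linear = probability_first_is_correct(aic_linear, aic_cubic)
print(aic_linear, aic_cubic, p_linear)
```

The model with the lower AIC score is the more likely to be correct; on these inputs that is the Linear model.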

What do you do if a non-linear function provides a better fit to the data? In the following, we focus on linear methods to compare trajectories. The primary motivation for this is that linear methods render interaction terms more interpretable and thus allow us to distinguish different types of descriptive delays. It is a practical rather than a theoretical decision, since there is no requirement that development should occur at a constant rate, and in many cases it does not do so (e.g., the rate of vocabulary acquisition in children is famously non-linear). However, the methods presented with linear functions are in principle extendible to non-linear regression methods, where for example, differences in the intercept parameter can index delays in onset and other parameters can index differences in rates of non-linear growth (see Motulsky & Christopoulos, 2004, for a review of non-linear regression methods with biological data).

Linear methods may be used in cases where the relationship between age and performance is non-linear by transforming either or both of these dimensions to improve the linearity of the relationship between them (so long as the transformation is applied to both typical and disorder group). Alternatively, subsections of the full non-linear trajectory may be explored where development appears to be more linear. For example, in cases where there are early floor or late ceiling effects, the portion of the trajectory between floor and ceiling may be more linear. Thus, for an S-shaped curve, only the central part of the trajectory might be considered with linear methods. The more restricted analysis would allow you to identify differences in the average age that experimental groups reach ceiling performance on the task. Analyses run over subsets of the experimental group will, of course, compromise statistical power.

We now turn to consider methods to compare the developmental trajectory generated by a disorder group to the typically developing profile.

contains data for two groups, the typically developing (TD) group of 25 children with ages spanning from 2;9 to 12;5, and a group of 16 children with a developmental disorder with chronological ages ranging from 5;4 to 11;2. Note that all children have also been given a standardised test to produce a mental age (i.e., a test age equivalent score). For the TD group, their mental ages range from 3;3 to 12;10, with the average MA 4.7 months in advance of CA (that is, in advance of the sample of TD kids on whom the standardised test was normed. Information about sampling would be required to decide whether the TD or norming sample is in some sense any more or less ‘normal’).

For the disorder group, MAs range from 4;7 to 10;4, with the average MA 18.1 months behind CA. In the SPSS file, group is encoded with the Group variable (coded 1 for TD, 2 for disorder in our example). CA and MA are coded in months and task performance is coded in proportion correct on the experimental task.

Note that, by design, the TD group’s age range spans from the youngest mental age of the disorder group on any of the standardised tests used to assess this group, to the oldest CA of the disorder group. This is because it is only sensible to compare developmental trajectories for overlapping chronological or mental age ranges. Comparing non-overlapping trajectories necessitates extrapolating a prediction of task performance for one or other group outside of the age or ability range over which performance has been measured.

Are the individuals with the disorder performing on the task as you would expect given their chronological age? One way to answer this question is on a case-by-case basis. This can be done using the confidence intervals around the TD trajectory derived in the previous section. For each child with the disorder, we can see whether, when their performance is plotted on the chart according to their chronological age, the data point falls within the 95% confidence intervals around the TD trajectory.

However, our intention here is to characterise a developmental trajectory for the disorder group as a whole, given that we have reasonable participant numbers (N=16) across an age range. We therefore need to generate a trajectory for the disorder group and compare it to the TD trajectory. Are the two trajectories significantly different, and if so, in what way?

SPSS does not include a direct method to compare linear regressions. Instead, we need to adapt the Analysis of Covariance function within the General Linear Model.

Assuming we have already verified the approximate linearity of the disorder trajectory on its own, we begin by comparing the developmental trajectories for the two groups as they are predicted by chronological age. Select Analyze-General Linear Model-Univariate. Add Task to the Dependent Variable box, Group to the Fixed Factor(s) box, and CA to the Covariate(s) box. (The Save dialogue may be used to generate Cook’s D values for the two trajectories if this has not been

Importantly, as it stands, the SPSS ANCOVA function has a default configuration to ‘partial out’ differences in the dependent variable due to differences in the covariate. For example, the ANCOVA is frequently used in behavioural studies where one wants to ‘partial out’ differences in IQ and so focus on differences in performance that solely arise from manipulation of the independent variables. In ‘partialling out’, the influence of the Group factor is effectively evaluated after each participant’s dependent variable score (performance) has been adjusted for their covariate score (age), and the procedure implicitly assumes a linear relationship between performance and the covariate.

However, the default setting of SPSS prevents us from examining whether for our data the relationship between performance and age differs between the two groups; that is, there is no Group x Age interaction term included in the statistical model design. We must therefore add this interaction term by hand in a Custom Model. (Note, the Univariate Model dialogue box seems to imply that its default model is already Full factorial. This is not the case since the interaction term is missing. We have to use the Custom mode to construct the fully factorial model).

We add the Group x Age interaction term as follows. Click on the Model button. In the Model dialogue, select Custom. Set the Build Terms drop down to Main effects. Click on Group(F) and CA(C) and use the right arrow button to move these across to the Model box. (N.B., the F and C stand for Fixed and Covariate, respectively). Then set the Build Terms drop down to Interaction. Select both Group(F) and CA(C) by clicking on them in turn, then click on the right arrow button to add this interaction term to the model. The dialogue box should now look like this:

Click on Continue, and then run the analysis by clicking on OK in the main Univariate dialogue.
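What the interaction term tests can be illustrated outside SPSS. The Python sketch below compares a model in which each group has its own gradient against one in which the groups share a common gradient but keep separate intercepts, and forms an F-ratio from the extra unexplained variance of the simpler model. It is an illustration of the logic rather than a reproduction of the SPSS output: the function names and the data are hypothetical, the data are generated from round-number lines with small alternating errors, and no p-value is computed.

```python
def centred_sums(x, y):
    """Return (Sxx, Sxy, Syy) computed about the group's own means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxx, sxy, syy


def interaction_f(groups):
    """F-ratio for Group x Age: separate gradients versus a common gradient."""
    parts = [centred_sums(x, y) for x, y in groups]
    n_total = sum(len(x) for x, _ in groups)
    # Full model: each group has its own intercept and gradient.
    sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)
    # Reduced model: separate intercepts, one gradient shared by all groups.
    pooled_sxx = sum(p[0] for p in parts)
    pooled_sxy = sum(p[1] for p in parts)
    pooled_syy = sum(p[2] for p in parts)
    sse_common = pooled_syy - pooled_sxy ** 2 / pooled_sxx
    df_num = len(groups) - 1
    df_den = n_total - 2 * len(groups)
    f_ratio = ((sse_common - sse_full) / df_num) / (sse_full / df_den)
    return f_ratio, df_num, df_den


# Hypothetical groups: ages in months, scores as proportion correct.
td_ages = [40, 60, 80, 100, 120, 140]
td_errors = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
td_scores = [0.014 + 0.006 * a + e for a, e in zip(td_ages, td_errors)]

dis_ages = [64, 80, 96, 112, 128]
dis_errors = [-0.01, 0.01, 0.0, 0.01, -0.01]
dis_scores = [0.118 + 0.002 * a + e for a, e in zip(dis_ages, dis_errors)]

f_ratio, df_num, df_den = interaction_f([(td_ages, td_scores), (dis_ages, dis_scores)])
print(f_ratio, df_num, df_den)
```

A large F-ratio here means that forcing a single gradient on both groups leaves much more variance unexplained, which is what a reliable Group x Age interaction expresses.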

Two results tables are now of interest. The Tests of Between-Subjects Effects allows us to assess how much of the variance in the data we have explained. The overall R2 is .774 (calculated by dividing the sum-of-squares for the Error, .409, by the Corrected Total sum-of-squares, 1.809, and subtracting the result from 1). The model

Inspection of the results for each factor indicates that there is no overall effect of Group [F(1, 37)=.52, p=.474, η2=.014]. This tells us that the intercepts of the two groups are not reliably different. There is no Delayed Onset in development for the disorder group here.

With the groups combined, chronological age significantly predicts level of performance [F(1, 37)=32.88, p<.001, η2=.470]. However, crucially, there is a significant Group x CA interaction [F(1, 37)=7.40, p=.010, η2=.167]. The disorder group is developing more slowly on this task: it is exhibiting a Slower Rate of development.

Interpretation: There is a subtlety in interpreting these results. When there is no difference in rate, differences in onset are unambiguous. But when there is a difference in rate, an absence of a difference in onset can be more ambiguous.

Clearly, the disorder trajectory falls below that of the TD group. The lack of significance in the Group factor (which is notionally the statistic to tell us that the disorder group is performing at a different level) is because the difference in intercepts is evaluated where the trajectories meet the y-axis, i.e., when age is zero. This is not especially meaningful because our analyses only pertain to the age range for which we have measured performance (let alone the idea that individuals could be good at the task the moment they are born!). Across the range we are measuring, the disorder group trajectory falls well below the TD group. However, in terms of statistically characterising this difference, it arises from a difference in rate rather than onset. The two trajectories appear to have begun at the same level at some point in the past, but to have developed at different rates.

The Parameter Estimates table allows us to reconstruct the regression equations for the two trajectories (and includes confidence intervals on these estimates). These parameters allow us to quantify the difference between the trajectories.

In interpreting this table, note that SPSS selects one group to have the derived values of the intercept and gradient (CA), and then provides a modifier to these values if the group membership is different. Thus the intercept for the disorder group (Group 2) is .118 and the gradient is .002, while the intercept for the TD group (Group 1) is (.118-.104)=.014 and the gradient is (.002+.004)=.006. These values should correspond to the parameters generated either by carrying out individual linear regressions for each trajectory in SPSS or by Excel’s trendline fitting algorithm in the X-Y Scatter-plot Chart.

Straightforwardly, we can say that the disorder group is developing at a third of the rate of the TD group (i.e., .002/.006=.333).
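The reconstruction of the two regression lines from the Parameter Estimates table amounts to a few additions, sketched here in Python. The coefficient values are those quoted in the text; the variable and function names are introduced for illustration only.

```python
def group_line(reference, modifier):
    """Add a group's modifier to the reference group's (intercept, gradient)."""
    return reference[0] + modifier[0], reference[1] + modifier[1]


# Reference group is the disorder group (Group 2): intercept .118, gradient .002.
disorder_line = (0.118, 0.002)
# Modifiers SPSS reports for the TD group (Group 1): -.104 on the intercept,
# +.004 on the gradient.
td_line = group_line(disorder_line, (-0.104, 0.004))

# Ratio of the gradients: how fast the disorder group develops relative to TD.
rate_ratio = disorder_line[1] / td_line[1]
print(td_line, rate_ratio)
```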

In terms of the onset, we can describe this difference in one of two ways. Remembering that the comparison must take place within the range of ages where the two trajectories overlap (i.e., the ages for which we have collected data), we can either:

(a) report the performance difference between the groups at the youngest age of the disorder group, corresponding to a performance deficit of 15%; or

(b) report the age difference at the lowest performance of the disorder group derived in (a), corresponding to an onset delay of 25 months [predicted performance at youngest age of 64 months: .002x64+.118=.246; age for TD trajectory at which performance is .246: (.246-.014)/.006=38.67; difference: 64-39=25 months].

The statistical comparison of the trajectories tells us that, in this case, there is a 15% performance deficit / 25-month disparity owing not to a difference in onset but to a difference in the rate at which the two groups are developing.
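Both ways of describing the disparity follow directly from the two regression equations, as this short Python sketch shows. The intercepts, gradients and the youngest age of 64 months are taken from the text; the helper names are hypothetical.

```python
def predict(intercept, gradient, age):
    """Performance predicted by a linear trajectory at a given age (months)."""
    return intercept + gradient * age


def age_at(intercept, gradient, performance):
    """Age (months) at which a linear trajectory reaches a given performance."""
    return (performance - intercept) / gradient


youngest_age = 64  # youngest child in the disorder group, in months

# (a) Performance deficit at the youngest age of the disorder group.
disorder_perf = predict(0.118, 0.002, youngest_age)
td_perf = predict(0.014, 0.006, youngest_age)
deficit = td_perf - disorder_perf

# (b) Age at which the TD trajectory reaches the disorder group's level,
#     and the resulting delay in months.
td_age_equivalent = age_at(0.014, 0.006, disorder_perf)
delay = youngest_age - td_age_equivalent
print(deficit, td_age_equivalent, delay)
```

The deficit comes out at roughly 0.15 in proportion correct and the delay at roughly 25 months, in line with the figures reported above.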

Plotting performance against mental age

The analysis so far has indicated that the disorder group is not at a level we would expect given their chronological age. In some cases, this is unsurprising for a disorder, particularly if we are examining an area of apparent weakness (such as reading in developmental dyslexia). In other cases, the CA analysis may be central, such as when we believe we are examining an area of potential strength or normal development in the disorder (e.g., non-verbal reasoning in developmental dyslexia).

Given that performance is not at CA level in the disorder group, our next question becomes: is the performance of the disorder group at a level we would expect given their level of cognitive development, as measured by our selected standardised test? If we plotted the disorder trajectory according to mental age (MA) rather than CA, would the disorder trajectory now fall on top of the TD trajectory?

Note that this is a theory-dependent comparison, because it relies on us having made the right, theoretically motivated choice about which standardised test is appropriate to evaluate the level of cognitive development in the domain that pertains to our experimental task (e.g., one might use a test of receptive grammar to generate MAs for a disorder group in an experimental task investigating sentence repetition; one might, more tentatively, use Matrices; one might be less likely to use a block design or face recognition test, although one could of course try).

To carry out this second comparison, we run the statistical test again, but now substituting MA as the covariate. Remember, this requires that we re-specify the Custom model to include the following factors: Group, MA, and Group*MA.

When task performance is plotted against MA, the trajectories look like this:

The results of comparing disorder and TD trajectories based on MA yield the following SPSS results tables:

As suggested by the data plot, the two trajectories are now parallel: there is no reliable interaction of Group x Age [F(1, 37)=.21, p=.885, η2=.001]. While mental age is a strong predictor of performance over all participants [F(1, 37)=127.29, p<.001, η2=.775], there is no group difference [F(1, 37)=1.88, p=.178, η2=.048]. While the disorder group performs at a marginally lower level than the TD group, this difference is not statistically reliable.

In short, there is no delay in onset or rate: statistically, the trajectory is normalised by plotting according to MA. Therefore, in the disorder group, performance on this task is in line with general development in the domain.

Note, however, we could not say that the disorder group is developing normally on this task. We have already established that according to CA, the disorder group is developing at a slower rate. The MA analysis merely demonstrates that the slower rate is in keeping with the slower development exhibited by this domain as a whole (to the extent that the standardised test we have chosen is a valid measure of the domain).

Comparing CA- and MA-based trajectories

Note that the R2 value for the disorder group increases from 0.104 in the CA plot [F(1, 14)=1.60, p=.227, η2=.103] to 0.76 for the MA plot [F(1, 14)=42.37, p<.001, η2=.752]. For disorders with learning disability, this is a frequently observed pattern. Where the standardised test is indeed relevant to the experimental task, or where the standardised test loads heavily on the general factor of intelligence (such as Raven’s matrices), MA will usually be a better predictor of task performance than CA.

This is especially the case in a cross-sectional design. In a longitudinal design, MA and CA may correlate more strongly and therefore their predictive power may be more equal. (See main paper for a theoretical discussion of the predictive power of CA vs. MA).

For the disorder group in our sample data, the correlation between CA and MA was only 0.58. Stepwise regression indicates that MA predicts reliably more of the variance in task performance than CA [R2

CA and MA-based analyses may be compared by examining the confidence intervals on the parameters to see whether they overlap or not. They may be compared more directly by including age_type as an additional factor in a between-subjects comparison (ensuring all 2-way and 3-way interactions are specified in the Custom model), although this is a conservative comparison since the two trajectories are treated as between- rather than repeated measures. We won’t go into any further details on this analysis here.

Interpreting null trajectories in the disorder group

Consider the following data, drawn from one author’s studies (DA). These relate to the development of perceptual thresholds in two tasks. In both cases, the TD group generates a reliable cross-sectional trajectory showing improvement on perceptual recognition. In both cases, the disorder group fails to produce a reliable trajectory, so that chronological age does not predict performance (in this case, nor did any of the mental ages derived from standardised tests deemed to be relevant to the task).

Closer inspection of the disorder trajectories for the two tasks suggests there does seem to be a difference between them. In Task 2, performance seems to be random with regard to the

case that the children with the disorder have prematurely reached the best level they can achieve (note the measure is still in the sensitive range; the children are not at floor performance). This looks like a real trajectory whose gradient is zero. Such a pattern would be consistent with the interpretation that the processing constraints of these children’s cognitive systems mean that development simply cannot get past a certain level of performance for the age range we are examining. Let us call this a zero trajectory.

Unfortunately, however, despite this apparent visual difference, statistically the two cases are identical with respect to linear regression methods: the disorder group produces null trajectories in both tasks.

These two types of (theoretically different) null results appear the same statistically. Why is this? It is for two reasons. First, the linear regression model simply evaluates whether the gradient of the line is statistically different from zero. If it is not, then values of the predictor (age) are of no use in predicting values of the dependent variable (performance). They may be of no use because the dependent variable is random or because it is always the same value.

Second, and more technically, the regression equation is calculated according to the standardised residuals of the data points from the derived trajectory. In other words, although the data points for the disorder group may appear to be clustered more tightly around the flat trajectory in Task 1 than Task 2, the regression model rescales this difference so that the two flat trajectories look equally noisy.

However, statistical models depend on the assumptions built into them. In this case, because of the experimental design (a comparison between TD and disorder groups), we may add an additional assumption into the statistical model: the TD group provides us with an independent verification of the range of variability expected in the task. Therefore, we can add in the assumption that residuals need not be standardised. If we pay attention to the tighter clustering of the disorder data points in Task 1 than Task 2, it becomes possible to distinguish between the two types of null result.

The method we have developed to do this relies on rotating the data in X-Y coordinates. A flat line produces a gradient of zero. A line at 45 degrees produces a gradient of 1. If the tightly clustered trajectory is real, the R2 value should change from around zero to around 1 (for the ideal case) if the graph is simply rotated by 45 degrees anti-clockwise. However, if one rotates a random data cloud by 45 degrees anti-clockwise, it should make no difference. The cloud should still be a cloud with a flat line through it; the R2 value of the originally-null trajectory should remain close to zero. In this way, repeating the linear regressions on rotated data should distinguish between a zero trajectory and one where there is no systematic relationship.
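The logic of the rotation can be tried out on two idealised data sets with a short Python sketch. This is one possible reading of the idea, not the procedure of the accompanying Excel file: the function names and data are hypothetical, and the sketch assumes that both axes have already been put on comparable scales, since a rotation mixes the units of X and Y and the outcome depends on how they are scaled.

```python
import math


def r_squared(x, y):
    """Squared correlation between x and y (R2 of a simple linear fit)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def rotate(x, y, degrees):
    """Rotate every (x, y) point anti-clockwise about the origin."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    x_rot = [c * a - s * b for a, b in zip(x, y)]
    y_rot = [s * a + c * b for a, b in zip(x, y)]
    return x_rot, y_rot


# Idealised zero trajectory: performance barely varies around a flat line.
flat_x = list(range(10))
flat_y = [0.5 + 0.01 * (-1) ** i for i in range(10)]

# Idealised no-systematic-relationship: a square cloud with equal spread in
# both directions and no association between x and y.
cloud_x = [i for i in range(5) for _ in range(5)]
cloud_y = [j for _ in range(5) for j in range(5)]

flat_before = r_squared(flat_x, flat_y)
flat_after = r_squared(*rotate(flat_x, flat_y, 45))
cloud_before = r_squared(cloud_x, cloud_y)
cloud_after = r_squared(*rotate(cloud_x, cloud_y, 45))
print(flat_before, flat_after, cloud_before, cloud_after)
```

In these idealised cases the flat, tightly clustered trajectory moves from an R2 near zero to an R2 near 1 after rotation, while the cloud stays near zero both before and after.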

The Excel file rotating trajectories two sample.xls includes formulae for this transformation for two idealised cases of a zero trajectory and no-systematic-relationship. The method involves 2 steps:

has become reliable after rotation while our no-systematic-relationship has not. Statistically, therefore, the rotation method has proved sufficient to distinguish between the two null results.

Remember, the rotation method only works because we have independent grounds for not standardising the residuals: we know the variability produced by the TD group on the same task. The rotation method would not be applicable to characterising a single trajectory, nor would it be applicable for comparing two null trajectories produced by a single group (unless we have some strong reason to believe variability in one measure should be related to variability in a second).
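The logic of the rotation check can be sketched in a few lines of Python. This is an illustration only, not a reproduction of the formulae in the Excel file. The data are invented, the function names are hypothetical, and the sketch assumes that age and score have already been placed on a common scale so that a random cloud has equal spread on both axes (how the worksheet calibrates this against the TD variability is not shown here).

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R² of a simple linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def rotate(xs, ys, degrees=45.0):
    """Rotate every (x, y) point anti-clockwise about the origin."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    new_x = [x * c - y * s for x, y in zip(xs, ys)]
    new_y = [x * s + y * c for x, y in zip(xs, ys)]
    return new_x, new_y

# Invented, idealised data: age and score both on an arbitrary 1-5 scale.
grid = [(a, b) for a in range(1, 6) for b in range(1, 6)]
age = [float(a) for a, _ in grid]
zero_traj = [3.0 + 0.05 * (b - 3) for _, b in grid]  # tightly clustered, flat
cloud = [float(b) for _, b in grid]                   # unrelated to age

for label, score in (("zero trajectory", zero_traj), ("no relationship", cloud)):
    rx, ry = rotate(age, score)
    print(label, "before:", round(r_squared(age, score), 3),
          "after:", round(r_squared(rx, ry), 3))
```

In this idealised case both data sets give an R² of about zero before rotation. After rotation only the tightly clustered set yields an R² close to 1, while the cloud stays near zero. Note that the result for the cloud depends on the equal-spread assumption stated above.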

To compare these two trajectories treating Task as a repeated (within-participants) measure, select Analyze-General Linear Model-Repeated Measures. Define a Within-Subject Factor ‘task’ with two levels. Click on Define. In the Repeated Measures dialogue, add the two variables Task1 and Task2 as Within-Subjects Variables and TD_CA as the covariate. Select Estimates of effect size and Parameter Estimates in the Options dialogue. Cook’s distance information may also be generated using the Save dialogue, to check for outliers. Then run the analysis by clicking on OK in the Repeated Measures dialogue.

Overall, performance significantly improves with age [F(1, 23)=197.52, p<.001, η²=.896]. The repeated measure indicates a significant difference between the tasks, with performance on Task 2 producing reliably higher scores with a medium effect size [F(1, 23)=14.28, p=.001, η²=.383]. Finally, there is a marginally significant interaction between age and task [F(1, 23)=3.88, p=.061, η²=.144], suggesting a difference in the rate of development on the two tasks.

Since 100% is the maximum score in both tasks, there is a possibility that this interaction stems from a ceiling effect for Task 2. The performance advantage of 20% for Task 2 over Task 1 at around 30 months could not be replicated at 150 months since Task 1 is already at 90%: Task 2 would have to exceed the ceiling score.

Finally, the Parameter Estimates provide the coefficients for the two separate task trajectories in the TD group.
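For readers who want to see what these coefficients represent outside SPSS, the following Python sketch fits a straight line to each task and to the Task 2 minus Task 1 difference score. The data and names are invented for illustration and are not the worksheet's sample data. Because ordinary least squares is linear in the scores, the gradient of the difference-score line equals the difference between the two task gradients. With a two-level within-participants factor, this is the quantity that the age x task interaction concerns.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, gradient)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return my - gradient * mx, gradient

# Invented scores (percent correct) for seven hypothetical children.
age_months = [30, 50, 70, 90, 110, 130, 150]
task1 = [20, 33, 44, 57, 68, 81, 90]
task2 = [40, 50, 57, 67, 75, 85, 92]
difference = [t2 - t1 for t1, t2 in zip(task1, task2)]

intercept1, gradient1 = fit_line(age_months, task1)
intercept2, gradient2 = fit_line(age_months, task2)
intercept_d, gradient_d = fit_line(age_months, difference)

print("Task 1: score = %.3f + %.4f * age" % (intercept1, gradient1))
print("Task 2: score = %.3f + %.4f * age" % (intercept2, gradient2))
print("Difference in gradients:", round(gradient2 - gradient1, 4))
print("Gradient of difference score:", round(gradient_d, 4))
```

A negative gradient for the difference score means the Task 2 advantage shrinks with age, which is the pattern one would also expect if Task 2 were approaching ceiling.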

Using confidence intervals to assess when trajectories diverge/converge

In some circumstances, theory predicts that trajectories should converge or diverge. For instance, as children get better at recognising faces, they find it increasingly hard to detect differences in faces when they are presented upside-down. Therefore, normal development should produce an increasing trajectory of accuracy in upright face recognition and a decreasing trajectory of accuracy in inverted face recognition, and this is the pattern that has been observed in trajectory analysis for children between 6 and 12 years (Annaz, 2006). In cases like this, it is useful to derive the age at which two trajectories reliably diverge. This can be achieved by plotting the trajectories for the two tasks (or two groups for a between-participants comparison) along with their 95% confidence intervals, which can be generated using the linear regression function on each trajectory on its own (see section 1). The point at which the upper confidence interval of one trajectory and the lower confidence interval of the other cease to overlap provides an estimate of the age (or mental age) at which the trajectories diverge. The following plot illustrates this method for the repeated measures TD sample data. It indicates that the trajectories for the two tasks reliably converge above the age of 114 months (or reliably diverge below the age of 114 months).
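The same idea can be sketched numerically rather than read off a plot. The Python example below uses invented data and hypothetical function names. It fits each task separately, computes the 95% confidence limits of the fitted line (the interval for the mean predicted value), and scans ages month by month for the first point at which the two intervals overlap. The critical t value is supplied by hand from a t table: 2.571 applies to the 5 degrees of freedom of this toy data set, whereas a sample of 25 children would have 23 degrees of freedom and a value of about 2.069.

```python
import math

def fit_with_band(ages, scores, t_crit):
    """Fit a straight line and return a function giving the confidence
    limits (lower, upper) of the fitted trajectory at any age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    gradient = sxy / sxx
    intercept = mean_score - gradient * mean_age
    sse = sum((s - (intercept + gradient * a)) ** 2 for a, s in zip(ages, scores))
    resid_sd = math.sqrt(sse / (n - 2))

    def band(age):
        fitted = intercept + gradient * age
        half = t_crit * resid_sd * math.sqrt(1 / n + (age - mean_age) ** 2 / sxx)
        return fitted - half, fitted + half

    return band

def first_overlap_age(lower_task_band, higher_task_band, candidate_ages):
    """First age at which the upper limit of the lower trajectory reaches
    the lower limit of the higher trajectory (None if they never overlap)."""
    for age in candidate_ages:
        upper_of_lower = lower_task_band(age)[1]
        lower_of_higher = higher_task_band(age)[0]
        if upper_of_lower >= lower_of_higher:
            return age
    return None

# Invented scores (percent correct); Task 2 starts higher and the gap narrows.
age_months = [30, 50, 70, 90, 110, 130, 150]
task1 = [20, 33, 44, 57, 68, 81, 90]
task2 = [40, 50, 57, 67, 75, 85, 92]
T_CRIT = 2.571  # assumed: two-tailed 95% value for df = 7 - 2 = 5

band1 = fit_with_band(age_months, task1, T_CRIT)
band2 = fit_with_band(age_months, task2, T_CRIT)
converge_age = first_overlap_age(band1, band2, range(30, 151))
print("Intervals first overlap at", converge_age, "months")
```

The scan is restricted to the measured age range (30 to 150 months here) so that the estimate does not rely on extrapolating either trajectory.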

4. Analysing mixed design linear regressions: does the disorder group show the same relationship between the development of two abilities as the TD group?

The SPSS data file sample TD disorder mixed design.sav contains performance data for the typically developing (TD) group of 25 children (ages 2;9 to 12;5) and the 16 children in the disorder group (ages 5;4 to 11;2) on two experimental tasks. We will therefore be constructing and comparing four developmental trajectories, two for each group.

We can assess the performance of the disorder group on an individual basis using the confidence intervals around the two TD trajectories. Say that a given individual from the disorder group scores at 20% on Task 1 and 33% on Task 2, a disparity of 13%. We can evaluate whether this pattern (20, 33) is found anywhere across the TD trajectories, with each point inside the respective TD trajectory’s confidence intervals; or, indeed, whether the disparity size of 13% is found anywhere across the TD trajectories. These questions can be answered irrespective of the age at which any such patterns are exhibited in the TD group. This is a theory-neutral comparison of individuals with the disorder to the normal pattern of development. However, as before, our primary interest lies with group comparisons.

To compare these trajectories statistically, we need to construct a mixed-design linear regression model in SPSS, with Group as a between-participants factor and Task as a within-participants factor. Select Analyze-General Linear Model-Repeated Measures. Define a Within-Subject Factor ‘task’ with two levels. Click on Define.

In the Repeated Measures dialogue, add the two variables Task1 and Task2 as Within-Subjects Variables. Add Group to the Between-Subjects Factor(s) box and CA to the Covariates box. Select Estimates of effect size and Parameter Estimates in the Options dialogue. Cook’s distance information may also be generated using the Save dialogue to check for outliers.

Between-Subjects Model box. Then click on Continue and run the analyses by clicking on OK in the Repeated Measures dialogue.

We start with the key theoretical question: Does the disorder group show the same developmental relationship between the two tasks as that found in the TD group?

This corresponds to a 3-way interaction between task x Group x CA. The answer is yes, for this interaction is non-significant [F(1, 37)=2.11, p=.155, η²=.054]. However, the two groups do show a different pattern of accuracy on each task, with the TD group performing more accurately on Task 2 than Task 1, while the disorder group performs more accurately on Task 1 than 2 [task x Group: F(1, 37)=7.07, p=.012, η²=.160; this interaction then renders the main effect of task non-significant]. Chronological age is a strong predictor of performance overall [CA: F(1, 37)=60.15, p<.001, η²=.619], but development once again occurs more slowly in the disorder group [Group x CA: F(1, 37)=6.06, p=.019, η²=.141].

The Excel chart clearly shows the disorder trajectories falling below the TD trajectories: why isn’t there a significant main effect of Group? As in Section 2, this occurs because of the slower rate of the disorder group and the fact that the comparison of intercepts is carried out at the y-axis (i.e., when x=0). Using the method outlined in the previous section for deriving a numerical value of onset delay from the regression equations, at the youngest age measured for the disorder group, there is already a performance decrement of 15% on Task 1 and 47% on Task 2. However, the analysis suggests that these accuracy differences stem from two systems for which accuracy levels did not initially differ (at an age before we started measuring) but which have diverged based on their different rates of growth. (Remember, trajectory analyses aim to offer a richer set of statistical descriptors that allow us to distinguish different ways in which trajectories can differ: in onset, rate, linearity, and so forth.)

The Parameter Estimates table allows the equations for each of the four trajectories to be constructed. Parameters are listed separately for each task. Within the task, the default intercept and gradient (onset and rate) are listed for Group 2, with Group 1 values corresponding to modifiers to these default values. These parameter values should correspond to the regression equations on the Excel Scatter-plot chart trendlines.
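As a worked illustration of how the default and modifier values combine, the Python sketch below builds the two group equations for one task and evaluates the disparity between them at a chosen age. All four parameter values are invented and the function names are hypothetical; the real values must be read from the Parameter Estimates table. The age of 64 months corresponds to 5;4, the youngest age given above for the disorder group, assuming age is coded in months.

```python
def group_lines(intercept, gradient, group1_intercept_mod, group1_gradient_mod):
    """Return ((intercept, gradient) for Group 1, (intercept, gradient) for Group 2).

    Group 2 takes the default values; Group 1 adds its modifiers to them."""
    group2 = (intercept, gradient)
    group1 = (intercept + group1_intercept_mod, gradient + group1_gradient_mod)
    return group1, group2

def predict(line, age):
    """Predicted score at a given age for an (intercept, gradient) pair."""
    return line[0] + line[1] * age

# Invented parameter values for a single task (NOT taken from the sample data).
group1_line, group2_line = group_lines(
    intercept=5.0,              # default intercept (Group 2)
    gradient=0.30,              # default gradient per month of age (Group 2)
    group1_intercept_mod=10.0,  # Group 1 modifier to the intercept
    group1_gradient_mod=0.25,   # Group 1 modifier to the gradient
)

youngest_age = 64  # months, i.e. 5;4
disparity = predict(group1_line, youngest_age) - predict(group2_line, youngest_age)
print("Group 1: score = %.2f + %.2f * age" % group1_line)
print("Group 2: score = %.2f + %.2f * age" % group2_line)
print("Disparity at %d months: %.1f%%" % (youngest_age, disparity))
```

Repeating the calculation with the parameters for the second task gives the other pair of equations, so that all four trajectories can be compared at any age of interest.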

Our next question is, is the developmental relation between the two tasks what we would expect in the disorder group given their level of cognitive development in the domain, as measured by our selected standardised test? In some circumstances, disorder groups can show an apparently atypical relationship between performance on two tasks that is in fact a sign of immaturity, i.e., it is commensurate with the overall stage of development in the cognitive domain. If this is the case, we would expect the developmental relationship to normalise when trajectories are constructed against MA instead of CA. (In that case, normalisation would be marked by a significant 3-way task x Group x CA interaction but a non-significant task x Group x MA interaction). Alternatively, a different relationship between the development of the two tasks according to MA would be suggestive of an atypically developing cognitive system (see Karmiloff-Smith et al., 2004; Thomas et al., 2001; Thomas et al., 2006, for examples of the application of the mixed-design method to test for atypical development in visuospatial and language domains, respectively).

For our current sample data, the trajectories plotted against MA are as follows:

To carry out the analysis, replace CA with MA as the covariate in the Repeated Measures dialogue, remembering to ensure that the custom Model now contains Group, MA, and Group x MA as the Between-Subjects Factors:

For the sample data, the 3-way interaction of Task x Group x MA remains non-significant [F(1, 37)=1.05, p=.313, η²=.028], indicating a normal developmental relationship between the tasks in the disorder group.

What has changed in assessing task development with reference to the level of ability in the general domain (as indexed by the standardised test) as opposed to chronological age? The Task x Group effect has become non-significant, a result of the disorder group’s two task trajectories becoming less distinguishable at younger MAs. Group x MA is also non-significant: the disorder group’s task trajectories are mildly diverging and the TD group’s mildly converging, so the average trajectory for each group is now roughly similar. Averaging over tasks, the disorder group is developing at the rate one would expect given their level of mental ability (again, note that the CA comparison shows it is not developing at a normal rate). However, mean task performance is now at a reliably lower level in the disorder group [F(1, 37)=5.08, p=.030, η²=.121]. That is, given their level of mental ability, development is at the normal rate but there is an onset delay. Performance is below the level one would expect based on the standardised test.

How much below? Plugging the youngest MA measured in the disorder group into the equations for the four lines, shown on the chart and also derivable from the Parameter Estimates table, the initial task disparity between the groups is 14% for Task 1 and 33% for Task 2. This is an average performance disparity of 23%.

Lastly, note that in this mixed-design, we have principally focused on reporting the results involving the Group factor, either as a main effect or in interactions. We have generally found that mixed-designs are mostly useful for evaluating how disorder group status modifies the normal pattern of development (either when plotted against CA or against MA). Particularly when the variability within the groups is different, it is often not useful to explore main effects of task or CA or MA in the mixed-design analysis, since this serves to conflate the groups. Instead, our usual practice for data of the kind presented here would be to characterise the pattern of normal development using a repeated measures design just with the TD group, then characterise the pattern observed in the disorder group, again using a repeated measures design just with the disorder group, and then finally to test whether differences between the TD and disorder pattern are reliable by using a mixed-design and noting the involvement of the Group factor. (Since one is theoretically interested both in normal development and development within the disorder group, these individual group trajectories are planned comparisons. Therefore, the omnibus comparison need not be the first statistical analysis).