Cluster-based permutation tests on time-frequency data

Introduction

The objective of this tutorial is to give an introduction to the statistical analysis of EEG and MEG data (denoted as MEEG data in the following) by means of cluster-based permutation tests.

The tutorial starts with a long background section that sketches the background of permutation tests. The next sections are more tutorial-like. They deal with the analysis of an actual MEG dataset. In a step-by-step fashion, it will be shown how to statistically compare the data observed in two experimental conditions in a between-trials, in a within-trials and in a within-subjects design. For this, we will use planar TFR's.

This tutorial contains hands-on material that we use for the MEG/EEG toolkit course and is complemented by this lecture.

Background

The multiple comparisons problem

In the statistical analysis of MEEG-data we have to deal with the multiple comparisons problem (MCP). This problem originates from the fact that MEEG-data are multidimensional. MEEG-data have a spatiotemporal structure: the signal is sampled at multiple channels and multiple time points (as determined by the sampling frequency). The MCP arises from the fact that the effect of interest (i.e., a difference between experimental conditions) is evaluated at an extremely large number of (channel,time)-pairs. This number is usually in the order of several thousands. Now, the MCP involves that, due to the large number of statistical comparisons (one per (channel,time)-pair), it is not possible to control the so called family-wise error rate (FWER) by means of the standard statistical procedures that operate at the level of single (channel,time)-pairs. The FWER is the probability under the hypothesis of no effect of falsely concluding that there is a difference between the experimental conditions at one or more (channel,time)-pairs. A solution of the MCP requires a procedure that controls the FWER at some critical alpha-level (typically, 0.05 or 0.01). The FWER is also called the false alarm rate.

In this tutorial, we describe nonparametric statistical testing of MEEG-data. Contrary to the familiar parametric statistical framework, it is straightforward to solve the MCP in the nonparametric framework. Fieldtrip contains a set of statistical functions that provide a solution for the MCP. These are the functions ft_timelockstatistics and ft_freqstatistics which compute significance probabilities using a non-parametric permutation tests. Besides nonparametric statistical tests, ft_timelockstatistics and ft_freqstatistics can also perform parametric statistical tests (see the Parametric and non-parametric statistics on event related fields tutorial). However, in this tutorial the focus will be on non-parametric testing.

Structure in experiment and data

Different types of data

MEEG-data have a spatiotemporal structure: the signal is sampled at multiple channels and multiple time points. Thus, MEEG-data are two-dimensional: channels (the spatial dimension) X time-points (the temporal dimension). In the following, these (channel,time)-pairs will be called samples. In some studies, these two-dimensional data are reduced to one-dimensional data by averaging over either a selected set of channels or a selected set of time-points.

In many studies, there is an interest in the modulation of oscillatory brain activity. Oscillatory brain activity is measured by power estimates obtained from Fourier or wavelet analyses of single-trial spatiotemporal data. Very often, power estimates are calculated for multiple data segments obtained by sliding a short time window over the complete data segment. In this way, spatio-spectral-temporal data are obtained from the raw spatiotemporal data. The spectral dimension consists of the different frequency bins for which the power is calculated. In the following, these (channel,frequency,time)-triplets will be called samples. In some studies, these three-dimensional data are reduced to one- or two-dimensional data by averaging over a selected set of channels, a selected set of time points, and/or a selected set of frequencies.

Structure in an experiment

The data are typically collected in different experimental conditions and the experimenter wants to know if there is a significant difference between the data observed in these conditions. To answer this question, statistical tests are used. To understand these statistical tests one has to know how the structure in the experiment affects the structure in the data. Two concepts are crucial for describing the structure in an experiment:

The concept of a unit of observation (UO).

The concept of a between- or a within-UO design.

In MEEG experiments, there are two types of UO: subjects and trials (i.e., epochs of time within a subject). Concerning the concept of a between- or a within-UO design, there are two schemes according to which the UOs can be assigned to the experimental conditions: (1) every UO is assigned to only one of a number of experimental conditions (between UO-design; independent samples), or (2) every UO is assigned to multiple experimental conditions in a particular order (within UO-design; dependent samples). Therefore, in a between-UO design, there is only one experimental condition per UO, and in a within-UO design there are multiple experimental conditions per UO. These two experimental designs require different test statistics.

The two UO-types (subjects and trials) and the two experimental designs (between- and within-UO) can be combined and this results in four different types of experiments.

To correct for multiple comparisons, one has to specify the field cfg.correctm. This field can have the values 'no', 'max', 'cluster', 'bonferoni', 'holms', or 'fdr'. In this tutorial, we only use cfg.correctm = 'cluster', which solves the MCP by calculating a so-called cluster-based test statistic and its significance probability.

The calculation of this cluster-based test statistic

For every sample (a (channel,time)-pair or a (channel,frequency,time)-triplet) the experimental conditions are compared by means of a t-value or some other number that quantifies the effect at this sample. It must be noted that this t-value is not the cluster-based test statistic for which we will calculate the significance probability; it is just an ingredient in the calculation of this cluster-based test statistic. Quantifying the effect at the sample level is possible by means of different measures. These measures are specified in cfg.statistic, which can have the values 'ft_statfun_indepsamplesT', 'ft_statfun_depsamplesT', 'ft_statfun_actvsblT', and many others. We will return to this in the following.

All samples are selected whose t-value is larger than some threshold as specified in cfg.clusteralpha. It must be noted that the value of cfg.clusteralpha does not affect the false alarm rate of the statistical test; it only sets a threshold for considering a sample as a candidate member of some cluster of samples. If cfg.clusteralpha is equal to 0.05, the t-values are thresholded at the 95-th quantile for a one-sided t-test, and at the 2.5-th and the 97.5-th quantiles for a two-sided t-test.

Selected samples are clustered in connected sets on the basis of temporal, spatial and spectral adjacency.

Cluster-level statistics are calculated by taking the sum of the t-values within every cluster.

The maximum of the cluster-level statistics is taken. This step and the previous one (step 4) are controlled by cfg.clusterstatistic, which can have the values 'maxsum', 'maxsize', or 'wcm'. In this tutorial, we will only use 'maxsum', which is the default value.

The result from step 5 is the test statistic by means of which we evaluate the effect of the experimental conditions.

The calculation of the significance probability

If the MCP is solved by choosing for a cluster-based test statistic (i.e., with cfg.correctm = 'cluster'), the significance probability can only be calculated by means of the so-called Monte Carlo method. This is because the reference distribution for a permutation test can be approximated (to any degree of accuracy) by means of the Monte Carlo method. In the configuration, the Monte Carlo method is chosen as follows: cfg.method = 'montecarlo'.

The Monte Carlo significance probability is calculated differently for a within- and between-UO design. We now show how the Monte Carlo significance probability is calculated for in a between-trials experiment (a between-UO design with trials as UO):

Collect the trials of the different experimental conditions in a single set.

Randomly draw as many trials from this combined data set as there were trials in condition 1 and place them into subset 1. The remaining trials are placed in subset 2. The result of this procedure is called a random partition.

Calculate the test statistic (i.e., the maximum of the cluster-level summed t-values) on this random partition.

Repeat steps 2 and 3 a large number of times and construct a histogram of the test statistics. The computation of this Monte Carlo approximation involves a user-specified number of random draws (specified in cfg.numrandomization). The accuracy of the Monte Carlo approximation can be increased by choosing a larger value for cfg.numrandomization.

From the test statistic that was actually observed and the histogram in step 4, calculate the proportion of random partitions that resulted in a larger test statistic than the observed one. This proportion is the Monte Carlo significance probability, which is also called a p-value.

If the p-value is smaller than the critical alpha-level (typically 0.05) then we conclude that the data in the two experimental conditions are significantly different. This p-value can also be calculated for the second largest cluster-level statistic, the third largest, etc. It is common practice to say that a cluster is significant if its p-value is less than the critical alpha-level. This practice suggests that it is possible to do spatio-spectral-temporal localization by means of the cluster-based permutation test. However, from a statistical perspective, this suggestion is not justified. Consult Maris and Oostenveld, Journal of Neuroscience Methods, 2007 for an elaboration on this topic.

2009/04/06 21:20

Procedure

In this tutorial we will consider a between-trials experiment, in which we analyze the data of a single subject. For the statistical analysis for this experiment we calculate the planar TFRs. The steps we perform are as follows:

Figure 3. Pipeline of statistical analysis of planar TFR's in a within-subjects design

Between-trial experiments

In a between-trials experiment, we analyze the data of a single subject. By means of a statistical test, we want to answer the question whether there is a systematic difference in the MEG recorded on trials with a fully congruent and trials with a fully incongruent sentence ending.

Calculation of the planar gradient and time-frequency analysis

Before calculating the TFRs we calculate the planar gradient with ft_megplanar.
This requires preprocessed data (see above), which is also available from the FieldTrip FTP server (dataFIC.mat and dataFC.mat).

Without a prior hypothesis (i.e., an hypothesis that is independent of the present data), we must test the difference between the FC and the FIC condition for all frequency bands that may be of interest. However, such an analysis is not suited as a first step in a tutorial. Therefore, we assume that we have a prior hypothesis about the frequency band in which the FC and FIC condition may differ. This frequency band is centered at 20 Hz. We will now investigate if there is a difference between the FC and the FIC condition at this frequency. To calculate the TFRs, we use ft_freqanalysis with the configuration below. See the tutorial Time-frequency analysis using Hanning window, multitapers and wavelets for an explanation of the settings. Note that cfg.keeptrials = ‘yes’, which is necessary for the subsequent statistical analysis.

Plotting the results

By inspecting stat.posclusters and stat.negclusters, you will see that there is one large cluster that shows a negative effect and no large clusters showing a positive effect.

To show the topography of the negative cluster, we make use of ft_clusterplot. This is a function that displays the channels that contribute to large clusters, based on which the null-hypothesis can be rejected. First we use ft_freqdescriptives to average over the trials.

Within trial experiments

We will now show how to statistically test the difference between the TFRs in the pre-stimulus (baseline) and the post-stimulus (activation) period of the fully incongruent sentence endings.
To perform this comparison by means of a permutation test, we have to select equal-length non-overlapping time intervals in the baseline and the activation period. For the baseline period we choose [-1 0], which is the time interval from 1 to 0 seconds before stimulus onset. And for the activation period we choose [0.6 1.6], which is the time interval from 0.6 to 1.6 seconds after stimulus onset.

It must be stressed that the time windows we choose to compare are nonoverlapping and of equal length. This constraint follows from the null hypothesis that is tested with a permutation test. This null hypothesis involves that the data (spatiotemporal matrices) observed in the two experimental conditions are drawn from the same probability distribution. This null hypothesis only makes sense if the dimensions of the data matrices in the two experimental conditions are equal. In other words, the number of channels and the number of time points of these spatiotemporal data matrices must be equal. This also applies to a within trials experiment in which a baseline and an activation condition are compared: the number of channels and time points of the baseline and the activation data matrices must be equal.

Preprocessing and freqanalysis on planar data

We first select equal-length non-overlapping time intervals in the baseline and the activation period. For the baseline period we choose [-1 0], the time interval from 1 to 0 seconds before stimulus onset. And for the activation period we choose [0.6 1.6], the time interval from 0.6 to 1.6 seconds after stimulus onset. We will again focus on the power in the beta band (around 20 Hz).

ft_freqstatistics will always compare data at the same time point, and therefore you get an error if you try to statistically test differences between data structures with non-overlapping time axes. This implies that, in order to statistically test the differences between baseline and activation period data, we have to trick the function by making the time axis the same. We do this by copying the time axis of dataFIC_activation onto dataFIC_baseline.

dataFIC_baseline.time = dataFIC_activation.time;

To calculate the TFRs for the synthetic planar gradient data, we must run the following code:

This configuration for a within-trials experiment is very similar to the configuration for the within-subjects experiment in the "Cluster-based permutation tests on event related fields" tutorial in which we compared the evoked responses to fully incongruent and fully congruent sentence endings. The main difference is the measure that we use to evaluate the effect at the sample level (cfg.statistic = 'ft_statfun_actvsblT' instead of cfg.statistic = 'ft_statfun_depsamplesT'). With cfg.statistic = 'ft_statfun_actvsblT', we choose the so-called activation-versus-baseline T-statistic. This statistic compares the power in every sample (i.e., a (channel,frequency,time)-triplet) in the activation period with the corresponding time-averaged power (i.e., the average over the temporal dimension) in the baseline period. The comparison of the activation and the time-averaged baseline power is performed by means of a dependent samples T-statistic.
You can read more about the ft_prepare_neighbours function in the FAQ's.

Figure 2: Largest cluster that shows a difference between activation and baseline, plotted on top of the T-statistic of the difference.

Within subjects experiments

In this paragraph we describe permutation testing for TFRs obtained in experiments involving multiple subjects that are each observed in multiple experimental conditions. Every subject is observed in a large number of trials, each one belonging to one experimental condition. For every subject, averages are computed over all trials belonging to each of the experimental conditions. Thus, for every subject, the data are summarized in an array of condition-specific averages. The permutation test that will be described in the following informs us about the following null hypothesis: the probability distribution of the condition-specific averages is identical for all experimental conditions.

Exercise

Try calling clusterplot with cfg.alpha = 0.05;

Summary and suggested further readings

In this tutorial, we showed how to do non-parametric statistical test, cluster-based permutation test on planar TFR's with between-trials, with within-trials and with within-subjects experimental designs. It was also shown how to plot the results.