SPM Maps: "Blobs"

How are those activation "blobs" on an fMRI image created, and what exactly do they represent?

Brightly colored "blobs" on fMRI maps represent brain regions or voxels where "statistically significant" levels of activation or correlation are thought to have occurred. The size and number of these blobs are somewhat arbitrarily chosen, however, involving tradeoffs between excluding false positives (saying an area activates when it does not) and accepting false negatives (considering an area to be silent when it really does activate).

The first-level coloring decision is typically based on calculation of a test statistic (e.g., a T-, F-, or Z-score) for each voxel or brain region from the fMRI data. Under the null hypothesis that no true activation has occurred, a p-value can be computed, representing the probability that a test statistic at least as large as the one calculated would have occurred by chance. Whenever the p-value is less than an arbitrary preselected level of significance, we conclude the measurement is unlikely to have occurred by chance and classify the voxel as "activated/correlated".
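As a minimal sketch of this voxel-wise decision, the snippet below converts a Z-score into a two-sided p-value using the complementary error function and then applies an (arbitrary) significance threshold. The function names and the alpha = 0.05 default are illustrative choices, not part of any particular fMRI package.

```python
import math

def z_to_p_two_sided(z):
    """Two-sided p-value for a Z-score: P(|Z| >= z) under the null hypothesis."""
    # erfc gives the standard-normal tail probability directly
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_activated(z, alpha=0.05):
    """Classify a voxel as 'activated' if its p-value falls below alpha.

    alpha is the arbitrary preselected level of significance.
    """
    return z_to_p_two_sided(z) < alpha

# A voxel with Z = 3.0 has p ≈ 0.0027 and would be colored at alpha = 0.05;
# a voxel with Z = 1.0 (p ≈ 0.32) would not.
```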

Most fMRI analysis programs include "slider bars" allowing interactive display of blobs at various p-value thresholds. As seen below, moving this slider bar toward more restrictive p-values reveals successively smaller areas of activation. But which level is the correct one?

BOLD word generation study. The same image has been processed using T-score thresholds ranging from 0 (upper left) to 4.5 (lower right), corresponding to two-sided p-values between 1.000000 and 0.000007. Which one is correct? For simple eloquent cortex mapping, an arbitrary selection in the intermediate range is often made by choosing a "visually pleasing" and/or "modest" amount of activity.

There is no simple answer. Although p-values of 0.05 are commonly used in "standard" scientific experiments, this threshold is inappropriate for fMRI studies, where simultaneous statistical testing must be performed on 100,000 or more voxels. Setting a p-value threshold of 0.05 for each voxel means that up to 5000 (100,000 × 0.05) of these voxels could appear falsely activated. This is an example of the so-called multiple comparisons problem, a major issue that affects genetic testing as well. Methods for handling it, including the Bonferroni correction, are described in the Advanced Discussion.
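The arithmetic behind the multiple comparisons problem can be laid out in a few lines. The numbers below simply reproduce the figures quoted in the text (100,000 voxels, α = 0.05):

```python
n_voxels = 100_000          # simultaneous tests (one per voxel)
alpha = 0.05                # conventional per-test significance level

# Expected number of falsely "activated" voxels if every voxel is null
expected_false_positives = n_voxels * alpha        # 5000 voxels

# Bonferroni-corrected per-voxel threshold needed to keep the
# family-wise error rate at alpha
bonferroni_threshold = alpha / n_voxels            # 5 x 10^-7
```

Even when no true activation exists anywhere in the brain, thousands of voxels would survive an uncorrected 0.05 threshold, which is why some form of correction is unavoidable.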

In addition to selecting a p-value threshold, several other arbitrary decisions about blobs must be made. These include: 1) choosing the actual color palette, including the range of p-values corresponding to different shades or hues; 2) whether to employ additional spatial smoothing (which makes the map less noisy but smears data anatomically); 3) whether and how to perform clustering (e.g., deciding to color a voxel only when a certain small number of immediate neighbors also appear to be activated, to reduce false positives). Each of these decisions can significantly affect the appearance of the map.
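The clustering idea in point 3 can be sketched as a toy neighbor-counting rule on a small 2D grid. This is only an illustration of the principle, not the cluster-extent algorithm of any actual analysis package; the function name and the 4-connected neighborhood are assumptions made for this example.

```python
def cluster_filter(mask, min_neighbors=2):
    """Keep a suprathreshold voxel only if at least `min_neighbors` of its
    4-connected neighbors are also suprathreshold (a toy clustering rule).

    `mask` is a 2D list of booleans (True = voxel passed the p-value threshold).
    Isolated suprathreshold voxels, likely false positives, are removed.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Count suprathreshold neighbors above, below, left, and right
            n = sum(
                mask[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            )
            out[r][c] = n >= min_neighbors
    return out

# A 2x2 block of activated voxels survives; a lone activated voxel does not.
mask = [[True,  True,  False],
        [True,  True,  False],
        [False, False, True]]
filtered = cluster_filter(mask)
```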

Because of the problems in defining the limits and meanings of "blobs", some neuroscientists have disparagingly called the field of fMRI "blobology". The next two Q&A's will address issues concerning false activation and problems with various statistical methods.

Perhaps the most widely known procedure to account for multiple comparison errors in standard statistics is the Bonferroni correction. In its simplest form, the Bonferroni method merely divides the required Type I error level (α) by the number of independent tests (N) performed. Thus, if one wishes to maintain an α = 0.05 error level across 10 tests, the per-test p-value threshold would need to be set at 0.05/10 = 0.005. You can see that for an fMRI data set with N ≈ 100,000 voxels being tested, the required p-value would be on the order of 5 x 10−7, an extremely stringent requirement. Using such a strict criterion to avoid Type I errors would severely impact the power of the fMRI data analysis, leading to an increased number of false negative results (Type II errors). Accordingly, several Bonferroni variants (Holm, Hochberg, Simes), including step-wise sequential testing procedures, have been devised. An alternative and increasingly popular approach is to control the false discovery rate (FDR), the expected proportion of falsely rejected voxels.
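The contrast between Bonferroni and FDR control can be made concrete with a short implementation of the Benjamini-Hochberg step-up procedure, the standard way to control the FDR. The p-values in the example are made up for illustration.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR control: return the set of indices rejected.

    Sort p-values ascending; find the largest rank k with p(k) <= (k/N) * q;
    reject the k smallest p-values.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= (rank / n) * q:
            k_max = rank
    return set(order[:k_max])

# Eight hypothetical voxel p-values
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

fdr_rejected = benjamini_hochberg(pvals, q=0.05)           # two voxels
bonf_rejected = {i for i, p in enumerate(pvals)
                 if p < 0.05 / len(pvals)}                 # only one voxel
```

Note that Bonferroni (per-test threshold 0.05/8 = 0.00625) rejects only the smallest p-value here, while Benjamini-Hochberg also rejects the second, illustrating why FDR control preserves more statistical power.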