Preparing a combined MEG/fMRI dataset for sharing

This example describes how to prepare a combined MEG/fMRI dataset for sharing. The study involved 204 subjects, who participated in either an auditory or a visual version of a language experiment. The data and the experiment are described in more detail in the references that you find at the end of this page.

Although we acquired slightly more data, the part that we now want to share for these subjects consists of:

- anatomical MRI
- diffusion-weighted MRI
- functional MRI
  - resting state
  - functional task
- MEG
  - resting state
  - functional task

MEG was acquired with a 275-channel CTF system; MRI was acquired with a 3T Siemens scanner. For the coregistration of the MEG with the anatomical MRI, the head shape was recorded using a Polhemus electromagnetic tracker, and digital photos of the anatomical landmarks at both ears were taken. Furthermore, head localizer coils were used for MEG coregistration as usual.

Stimulus presentation was done using NBS Presentation. For MEG, the events are coded as triggers in the dataset. For both MEG and MRI, the Presentation log files are shared and are used to create/extend the events.tsv sidecar files for the task data.

The shared data is organized according to BIDS, the Brain Imaging Data Structure. This not only gives structure to the organization of the data files, but also helps to ensure that appropriate metadata and documentation are shared. We believe that this will facilitate reuse and increase the value of the shared data.

Procedure

Prior to conversion, the total dataset contains some 2000 files per subject (many of them DICOMs), which amounts to around 400,000 files in total. After conversion, the dataset consists of approximately 10,000 files and is about 1 TB in volume.

The original data is organized in a separate directory for each subject. The actual data files (CTF, DICOM, Presentation log files, Polhemus, etc.) are located in these directories.

The procedure for converting the original data consists of a number of steps:

1. Create an empty directory structure according to BIDS
2. Collect and convert MRI data from DICOM to NIfTI
3. Collect and rename the CTF MEG datasets
4. Collect the NBS Presentation log files
5. Collect the MEG coregistered anatomical MRIs
6. Create the sidecar files for each dataset
7. Create the general sidecar files
8. Finalize

Steps 1-5 and step 7 are implemented as Bash scripts. The construction of the sidecar files in step 6 is implemented using the data2bids function that is part of FieldTrip. The final step is not automated, but consists of some manual work.
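As an illustration of step 1, the empty BIDS skeleton can be created with a short Bash loop. The subject labels, the target path, and the set of modality subdirectories below are placeholders, not the actual study layout.

```shell
#!/bin/bash
# Illustration of step 1: create an empty BIDS directory skeleton.
# Subject labels and the target path are placeholders.
BIDSROOT=/tmp/bids_example
for sub in sub-001 sub-002 sub-003; do
  mkdir -p "$BIDSROOT/$sub"/{anat,dwi,func,meg}
done
mkdir -p "$BIDSROOT"/{sourcedata,stimuli}
# quick check: count the directories that were created
find "$BIDSROOT" -type d | wc -l
```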

After each of the automated steps the results should be checked. For this I have been using command-line tools such as “find DIR -name PATTERN | wc -l” to count the number of files, but also a graphical file browser to check the directory structure and a text editor to check the content of the JSON and TSV sidecar files.
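A minimal, self-contained illustration of such a count check; the directory layout and filenames are made up for this example.

```shell
# Illustration of a post-step sanity check: count files matching a pattern.
# The directory layout and filenames are made up for this example.
mkdir -p /tmp/check_example/sub-001/anat /tmp/check_example/sub-002/anat
touch /tmp/check_example/sub-001/anat/sub-001_T1w.nii.gz
touch /tmp/check_example/sub-002/anat/sub-002_T1w.nii.gz
# one T1w image per subject is expected
find /tmp/check_example -name "*_T1w.nii.gz" | wc -l
```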

It is important that you use appropriate tools. Command-line utilities are very handy, but so is a good graphical (code) editor that allows you to navigate the full directory structure and check file content. I have been using the Atom editor with the network directory mounted on my desktop computer; there are good alternatives.

Step 2: collect and convert MRI data from DICOM to NIFTI

In this step we use dcm2niix not only to convert the DICOMs to NIfTI, but also to create the initial JSON sidecar files with the information about the MR scan parameters. In step 6 we will update the sidecar files with information that is not available in the DICOMs, such as the task instructions.
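The conversion can be sketched as a dry-run loop that only echoes the commands. The dcm2niix options shown (-b y for the JSON sidecar, -z y for gzipped NIfTI, -f and -o for naming) exist in dcm2niix, but the DICOM layout and the sequence directory name are placeholders.

```shell
# Dry-run sketch of step 2: one dcm2niix call per subject.
# -b y writes the JSON sidecar, -z y gzips the NIfTI output;
# the DICOM layout and sequence directory name are placeholders.
DICOMROOT=/tmp/dicom_example
BIDSROOT=/tmp/bids_example
for sub in sub-001 sub-002; do
  echo "dcm2niix -b y -z y -f ${sub}_T1w -o $BIDSROOT/$sub/anat $DICOMROOT/$sub/t1_mprage"
done > /tmp/dcm2niix_cmds.txt
cat /tmp/dcm2niix_cmds.txt
```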

Step 3: collect and rename the CTF MEG datasets

In this step we copy and rename the CTF datasets to the target location using a CTF command-line utility. During this process, identifying information about the subject (i.e., the name) is removed from the dataset. Since the “newDs -anon” option does not remove the time and date of the recording from the dataset, at the end we do another step to remove the date of acquisition from the res4 header file. We keep the time, as it is not unique enough to identify which recording goes with which participant. See also this frequently asked question.
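A dry-run sketch of the copy/rename for a single dataset; the paths and the target filename are placeholders, and the exact newDs invocation should be checked against the CTF documentation.

```shell
# Dry-run sketch of step 3: copy/rename one CTF dataset with newDs -anon.
# Paths and the target filename are placeholders; check the exact newDs
# invocation against the CTF documentation.
RAWROOT=/tmp/raw_meg
BIDSROOT=/tmp/bids_example
sub=sub-001
echo "newDs -anon $RAWROOT/subj001_task.ds $BIDSROOT/$sub/meg/${sub}_task-visual_meg.ds" > /tmp/newds_cmd.txt
cat /tmp/newds_cmd.txt
```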

You can see a few exceptions, which reflect datasets that did not convert well automatically. The reason is that during data acquisition the data ended up in two different *.ds datasets; according to BIDS, these are to be represented as different 'runs'.

Step 4: collect the NBS Presentation log files

All Presentation log files are copied from their original location to the sourcedata folder. Although in step 6 the events in the log files will be used to construct the events.tsv files, we want to keep (and share) the Presentation log files, as they contain slightly more information than can be represented in the events.tsv files.

One issue is that the Presentation log files contain the exact date and time of the experiment. To avoid possible identification of participants, we use sed to replace the time and date in the files.
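A minimal sketch of such a sed-based scrub, replacing the date and time with a fixed placeholder. The log line below is made up; the real Presentation log format differs.

```shell
# Sketch of the sed-based scrubbing: replace the recording date and time with
# a fixed placeholder. The log line below is made up; the real Presentation
# log format differs.
mkdir -p /tmp/logs_example
printf 'Scenario - language task\nLogfile written - 03/14/2016 10:31:02\n' > /tmp/logs_example/sub-001.log
sed 's|[0-9][0-9]/[0-9][0-9]/[0-9]\{4\} [0-9:]*|01/01/1900 00:00:00|' \
  /tmp/logs_example/sub-001.log > /tmp/logs_example/sub-001_anon.log
cat /tmp/logs_example/sub-001_anon.log
```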

Step 5: collect the MEG coregistered anatomical MRIs

The coregistration of the MEG recording with the anatomical MRI was done on the basis of the head localizer coils (placed at the nasion and on two ear molds on either side), the anatomical landmarks (nasion, LPA, RPA), and the scalp surface recorded with the Polhemus. This coregistration was done using ft_volumerealign, and the resulting anatomical MRI was saved back to disk in NIfTI format.

Since the orientation of the CTF-coregistered MRI is flipped relative to the NIfTI file generated by dcm2niix, we share both. The native one is most convenient for processing the functional MRI and DWI data, whereas the one in CTF space is most convenient for processing the MEG data.

The CTF-coregistered MRI gets the same JSON sidecar file as the one converted by dcm2niix; in step 6 this sidecar will be updated with the correct coordinate system.

The script deals with some dataset-specific exceptions. Given that we are working with real data, one-size-fits-all automatic conversions are likely to fail occasionally for various reasons.

In the current context, the tricky part was the creation of the events.tsv files for the MEG task data. To create these files, data2bids attempts to align the experimental events extracted from the Presentation log file with those extracted from the digital trigger channel in the MEG data files. This only works unambiguously if there is a one-to-one mapping of the events (or of a specific type of event) between the two representations.

In the current example, there were occasional issues with the digital trigger channel, which precluded fully automatic processing of all files. The example script above is therefore the result of several iterations to deal with the exceptions.

Step 7: create the general sidecar files

This step is again done on the Linux command line, using some tools that are shared here. Some of the other tools might be useful for creating scripts to gather and/or reorganize your EEG, MEG, Presentation, or DICOM data.

Finalize

Some things are not implemented as a script, for example filling out the details in the top-level dataset_description.json file, adding a README file, and updating the CHANGES file.
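As a sketch, a minimal dataset_description.json could be written from the shell as follows; the dataset name, authors, license, and BIDS version are placeholders.

```shell
# Sketch of a minimal top-level dataset_description.json; the name, authors,
# license, and BIDS version are placeholders.
mkdir -p /tmp/bids_example
cat > /tmp/bids_example/dataset_description.json <<'EOF'
{
  "Name": "Example MEG/fMRI language dataset",
  "BIDSVersion": "1.2.0",
  "Authors": ["Author One", "Author Two"],
  "License": "CC0"
}
EOF
cat /tmp/bids_example/dataset_description.json
```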

I also manually renamed the subdirectories with the Presentation log files in the sourcedata directory, and added the Presentation source code and stimulus material to the stimuli directory.

Throughout the development of the scripts, and after having completed the conversion, I used the bids-validator to check compliance with BIDS. During script development it revealed errors and inconsistencies, which I fixed in the scripts (and then reran them). After the final conversion some warnings remained, but the dataset passed the validator.

Issues

Although the scripts are presented in a linear fashion, the actual conversion of the whole dataset took some effort, especially in dealing with unexpected features or with exceptions in a few subjects. This section describes some of the issues that we encountered.

Due to CTF hardware problems, the task MEG data of some subjects was not recorded in a single CTF dataset but in two. We dealt with this by copying them explicitly (outside the for-loop) in step 3.

Due to a misconfiguration of the Bitsi box (“level mode”), the trigger codes of some subjects' task MEG data are represented incorrectly. The consequence is that the individual bits of the triggers overlap in time, causing the default trigger detection to fail. This is dealt with in step 6 by using the mous_read_event_audio function from the MOUS GitHub repository.

In some of the MEG recordings, the default settings for event detection from the digital trigger channel left a small number of events undetected, causing occasional failures of the alignment procedure between shared events. This was mostly caused by two events being spaced too closely in time, sometimes in combination with a too-wide trigger pulse, resulting in “staircase-shaped” pulses. In case of such a mismatch between the number of events extracted from the trigger channel and the number extracted from the Presentation log file, we defined another shared event for alignment. This is dealt with in step 6.

The Presentation log files for the visual stimuli had an \<enter\> after the period (.) at the end of each sentence. This caused the line in the log file to be broken in two, resulting in incorrect parsing of the log file in step 6. We dealt with this by removing the \<enter\> from the log files prior to step 6, i.e. in step 4.