I am trying to link the “MIMIC II Waveform Matched database” with the “MIMIC III Clinical database”, but my results are suspicious.

According to the documentation, the “sNNNNN” identifier of the matched waveform database should be the same as the SUBJECT_ID of the Clinical database.

Citation from the documentation:

…Each subdirectory of this directory contains one or more MIMIC II
Waveform Database records that have been matched with a single subject
(whose MIMIC II Clinical Database Subject_ID is the name of the
subdirectory). The name of each mimic2wdb/matched waveform record is of
the form sNNNNN/sNNNNN-YYYY-MM-DD-hh-mm where NNNNN is the matching
MIMIC II Clinical Database Subject_ID, and YYYY, MM, DD, hh, and mm
are the surrogate year, month (01-12), and day (01-31), and the real
hour (00-23) and minute (00-59), derived from the starting date and
time of day of the mimic2wdb/matched record. The surrogate dates match
those of the corresponding MIMIC II Clinical Database version 2.6 (or
later) records; note that surrogate dates in previous versions of the
MIMIC II Clinical Database differ from those in version 2.6 and later.
…

However, doing so produces suspicious results. For example:

The first two patients (in terms of numerical ordering) of the waveform matched dataset are “s00001” and “s00020”. The first two patients of the clinical dataset are “2” and “3”.

The first patient "in common" between the two databases is patient “20” (s00020 = 20). However, the record dates do not match: the record of patient “20” starts at “2567-03-30 17:47:00” in the matched waveform dataset, while it starts at “2183-04-28 09:45:00” in the clinical dataset.
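
For reference, the matching described above can be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: the function names `parse_record_name` and `common_subjects` are hypothetical, the record name for patient 20 is constructed from the start time quoted above, and the clinical SUBJECT_ID values are assumed to be available as plain integers.

```python
import re
from datetime import datetime

# Record names follow sNNNNN-YYYY-MM-DD-hh-mm, as stated in the quoted documentation.
# The pattern is not anchored at the end, so any trailing characters are ignored.
RECORD_RE = re.compile(r"^s(\d{5})-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})")


def parse_record_name(name):
    """Return (subject_id, start_time) for a matched waveform record name."""
    # Accept either "sNNNNN/sNNNNN-..." or just "sNNNNN-...".
    base = name.rsplit("/", 1)[-1]
    match = RECORD_RE.match(base)
    if match is None:
        raise ValueError(f"unexpected record name: {name!r}")
    subject_id = int(match.group(1))
    year, month, day, hour, minute = (int(g) for g in match.groups()[1:])
    return subject_id, datetime(year, month, day, hour, minute)


def common_subjects(record_names, clinical_ids):
    """Return the sorted subject IDs that appear in both datasets."""
    waveform_ids = {parse_record_name(n)[0] for n in record_names}
    return sorted(waveform_ids & set(clinical_ids))


# Record name built from the start time quoted in the question (an assumption).
records = ["s00020/s00020-2567-03-30-17-47"]
# SUBJECT_ID values mentioned in the question.
clinical_ids = [2, 3, 20]

subject_id, waveform_start = parse_record_name(records[0])
clinical_start = datetime(2183, 4, 28, 9, 45)  # clinical start time from the question

print(common_subjects(records, clinical_ids))
print(waveform_start - clinical_start)  # gap between the two surrogate dates
```

The standard-library `datetime` is used here because the surrogate years lie far in the future.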

1 Answer

The missing records are expected. Some patient records were removed from the MIMIC database after they were added, possibly because of privacy concerns or corrupted data, so you can only work with the patient records that remain.

It seems that the times were regenerated from scratch for MIMIC III. My suggestion is for you to get your hands on the MIMIC II v2.6 dataset and use the times from there.

Here are the details for Patient 20 (I double-checked that the admit age and sex matched between MIMIC II and MIMIC III.)