The MIMIC-III Waveform Database contains thousands of recordings of multiple
physiologic signals ("waveforms") and time series of vital signs ("numerics")
collected from bedside patient monitors in adult and neonatal intensive care
units (ICUs). It is a companion to
the MIMIC-III Clinical Database,
which contains detailed clinical information for many of the patients
represented in the Waveform Database. The MIMIC-III
Waveform Database Matched Subset contains 22,317 waveform records and
22,247 numerics records, which have been matched and time-aligned with 10,282
MIMIC-III Clinical Database records.

Recorded waveforms and numerics vary depending on choices made by the ICU
staff. Waveforms almost always include one or more ECG signals, and often
include continuous arterial blood pressure (ABP) waveforms, fingertip
photoplethysmogram (PPG) signals, and respiration, with additional waveforms
(up to 8 simultaneously) as available. Numerics typically include heart and
respiration rates, SpO2, and systolic, mean, and diastolic blood pressure,
together with others as available. Recording lengths also vary; most are a few
days in duration, but some are shorter and others are several weeks long.

Use the PhysioBank ATM to view any desired record in
this database, to export it in a variety of formats, or to perform a variety of
other operations on it.

What's New

This database is closely related to the
older MIMIC II Waveform Database,
which (as of version 3.2) shares the same underlying set of records. The only
difference between the two databases lies in the Matched
Subset, which has been matched with the MIMIC-III Clinical Database (version
1.4).

Organization of the Database

Each recording comprises two records (a waveform record and a matching
numerics record) in a single record directory ("folder") with the name
of the record.
To reduce access time, the record directories have been distributed among ten
intermediate-level directories (listed below). The names of these intermediate
directories (30, 31, ..., 39) match the first two digits of the record
directories they contain.
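
The naming rule above can be sketched in Python as follows. This is only an illustration; the function name record_dir is hypothetical and is not part of the WFDB Software Package.

```python
def record_dir(record_name: str) -> str:
    """Return the relative directory that holds a record's files.

    Assumes the layout described above: the intermediate directory
    (30, 31, ..., 39) is the first two digits of the record directory name.
    """
    intermediate = record_name[:2]
    if not record_name.isdigit() or not ("30" <= intermediate <= "39"):
        raise ValueError(f"unexpected record name: {record_name!r}")
    return f"{intermediate}/{record_name}"


print(record_dir("3141595"))  # 31/3141595
```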

In almost all cases, the waveform records comprise multiple segments,
each of which can be read as a separate record. Each segment contains an
uninterrupted recording of a set of simultaneously observed signals, and the
signal gains do not change at any time during the segment. Whenever the
ICU staff changed the signals being monitored or adjusted the amplitude of
a signal being monitored, this event was recorded in the raw data dump,
and a new segment begins at that time.

Each composite waveform record includes a list of the segments that comprise it
in its master header file. The list begins on the second line of the
master header with a layout header file that specifies all of the
signals that are observed in any segment belonging to the record. Each segment
has its own header file and (except for the layout header) a matching
(binary) signal (.dat) file. Occasionally, the monitor may be
disconnected entirely for a short time; these intervals are recorded as gaps in
the master header file, but there are no header or signal files corresponding
to gaps.
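
The following Python sketch illustrates how such a master header might be read. The header text is a made-up, shortened example (it is not the content of any real record), and the helper name parse_master_header is hypothetical. The sketch assumes the multi-segment layout described in header(5): a first line giving the record name and number of segments, the number of signals, the sampling frequency, and the total length, followed by one line per segment with its name and length. It further assumes that the layout header is listed with a length of 0 and that a gap is listed under the name ~.

```python
# Hypothetical master header, for illustration only.
SAMPLE_HEADER = """\
3999999/4 3 125 5000
3999999_layout 0
3999999_0001 2000
~ 500
3999999_0002 2500
# Location: nicu
"""


def parse_master_header(text):
    """Parse the record line, segment list, and comments of a master header."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    comments = [ln.lstrip("#").strip() for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if not ln.startswith("#")]

    # First line: name/entries, number of signals, frequency, total samples.
    name_field, n_signals, fs, n_samples = body[0].split()[:4]
    name, n_entries = name_field.split("/")

    entries = []
    for ln in body[1:]:
        entry_name, length = ln.split()[:2]
        length = int(length)
        if entry_name == "~":        # assumed marker for a gap
            kind = "gap"
        elif length == 0:            # assumed marker for the layout header
            kind = "layout"
        else:
            kind = "segment"
        entries.append((kind, entry_name, length))

    return {
        "name": name,
        "declared_entries": int(n_entries),
        "signals": int(n_signals),
        "fs": float(fs),
        "samples": int(n_samples),
        "entries": entries,
        "comments": comments,
    }


header = parse_master_header(SAMPLE_HEADER)
gaps = [e for e in header["entries"] if e[0] == "gap"]
print(header["name"], header["samples"], len(gaps))  # 3999999 5000 1
```

In this example the lengths of the segments and the gap add up to the total length given on the first line, which is a useful consistency check when reading a header by hand.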

The numerics records (designated by the letter n appended to the
record name) are not divided into segments, since doing so would save
relatively little storage.

Physiologic waveform records in this database contain up to eight simultaneously
recorded signals digitized at 125 Hz with 8-, 10-, or (occasionally) 12-bit
resolution. Numerics records typically contain 10 or more time series of vital
signs sampled once per second or once per minute.

An example will make this arrangement clear:

Intermediate directory 31 contains all records with
names that begin with 31.

Record directory 3141595 is contained within
intermediate directory 31.

All files associated with physiologic waveform record 3141595 and its
companion numerics record 3141595n are contained within record directory
31/3141595.

The first line of the master header file for waveform record 3141595
(31/3141595/3141595.hea) indicates that
the record is 242353557 sample intervals (about 22 days at 125 samples per
second) in duration, and that
it contains 427 segments and
gaps. (See header(5) in the WFDB Applications
Guide for details on the format of this text file.) The first segment is named
3141595_0001, and it is 2888500 sample intervals (6 hours, 25 minutes, and 8
seconds, at 125 samples per second) in duration. At the end of the master
header file, a comment (# Location: nicu) specifies the ICU in which
the recording was made (the neonatal ICU in this case).

The layout header file for this record
(31/3141595/3141595_layout.hea)
indicates that five ECG signals (I, II, III, AVR, and "V"), a respiration
signal, and a PPG signal are available during portions of the record.
(The five ECG signals are not all available simultaneously.)

The header file for the first segment of this record
(31/3141595/3141595_0001.hea) shows
that a PPG signal ("PLETH"), a respiration signal, and ECG leads II and AVR are
available throughout this initial segment.

The matching numerics record is named 3141595n, and its header file
(31/3141595/3141595n.hea) shows that it
is 1938730 sample intervals (about 22 days at 1 sample per second) in duration,
and that it contains heart rate (HR, from ECG, as well as PULSE, from one or
more pulsatile signals), noninvasive blood pressure (raw as well as systolic,
diastolic, and mean), respiration rate, and SpO2.
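
The durations quoted in this example follow from dividing the number of sample intervals by the sampling frequency. A minimal Python sketch of that arithmetic (the function name duration_days is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def duration_days(sample_intervals: int, samples_per_second: float) -> float:
    """Convert a length in sample intervals to a duration in days."""
    return sample_intervals / samples_per_second / SECONDS_PER_DAY


# Waveform record 3141595: 242353557 sample intervals at 125 samples per second.
print(round(duration_days(242353557, 125), 2))  # 22.44

# Numerics record 3141595n: 1938730 sample intervals at 1 sample per second.
print(round(duration_days(1938730, 1), 2))      # 22.44
```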

Any WFDB application can read any
waveform record from this database directly from the PhysioNet web server
(i.e., without downloading the record first) using a record name of the
form mimic3wdb/3x/3xyyyyy/. Numerics records can be
read using the longer
form mimic3wdb/3x/3xyyyyy/3xyyyyyn (note
that the final 3xyyyyy must be repeated and followed
by n to specify the numerics record).

For example, if you have installed the WFDB
Software Package, you can read the first 10 seconds of waveform record
3141595 using this rdsamp command:

rdsamp -r mimic3wdb/31/3141595/ -p -v -t 10

To read the first 10 seconds of the matching numerics record 3141595n, use
this command instead:

rdsamp -r mimic3wdb/31/3141595/3141595n -p -v -t 10

Notice that the first command produces 1250 samples of each waveform (125
samples per second, for 10 seconds), but the second command produces only
10 samples of each vital sign (1 sample per second, for 10 seconds). See
How to obtain PhysioBank data in
text form for details about using rdsamp.
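
As a sketch of how the two record-name forms are put together, the Python helpers below build the names and the matching rdsamp argument lists. The function names are hypothetical; the flags are exactly those used in the commands above, and nothing is executed here.

```python
def wfdb_record_name(record: str, numerics: bool = False) -> str:
    """Build the record name used to read from the PhysioNet web server."""
    path = f"mimic3wdb/{record[:2]}/{record}/"
    # For numerics, the record name is repeated and followed by "n".
    return path + record + "n" if numerics else path


def rdsamp_args(record: str, numerics: bool = False, seconds: int = 10) -> list:
    """Return the argument list for the rdsamp commands shown above."""
    name = wfdb_record_name(record, numerics)
    return ["rdsamp", "-r", name, "-p", "-v", "-t", str(seconds)]


print(" ".join(rdsamp_args("3141595")))
# rdsamp -r mimic3wdb/31/3141595/ -p -v -t 10
print(" ".join(rdsamp_args("3141595", numerics=True)))
# rdsamp -r mimic3wdb/31/3141595/3141595n -p -v -t 10
```

If the WFDB Software Package is installed, either list could be passed to subprocess.run to execute the command.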

Clinical Correlates

The MIMIC-III Clinical Database contains detailed clinical information about
most of the subjects represented in the MIMIC-III Waveform Database. Since
the contents of each database were collected independently, in partially
deidentified form, matching the clinical data with the waveform data is a
non-trivial task, and only a subset of MIMIC-III Waveform Database records,
located in the MIMIC-III Waveform Database Matched
Subset, has been matched with MIMIC-III Clinical Database records.

In these cases, the matches provide additional information about
the subjects, including age, gender, and detailed clinical information
collected during (and in some cases before and after) the periods that have
been recorded in the Waveform Database records. For more information,
apply for access
to the MIMIC-III Clinical Database (a data use agreement is required).

Multiple recordings of a given patient, which may exist (for example) if that
patient was admitted more than once to any of the study ICUs during the study
period, do not have related MIMIC-III Waveform Database record names; it will be
necessary to refer to the Matched Subset to discover any such cases.

Technical Limitations

Waveforms or numerics missing: Occasionally, because of technical
limitations of the data acquisition system, a physiologic waveform
record exists without a matching numerics record, or vice versa.

A given signal may not be available throughout an entire record.
Records in the MIMIC-III Waveform Database vary in length; some are several
weeks in duration. It is common for the physiologic signals to be interrupted
or changed occasionally during recordings of such long duration. When using a
viewer such as the PhysioBank ATM, all signals
available at any time during a record are listed, although in most cases only
a subset is visible at any given time.

Gaps and patient identification: The waveform and
numerics records have been extracted from raw data dumps collected from the
bedside monitors using a facility provided by the monitor manufacturer. The
raw data dumps contain files of data collected from a single patient monitor
during a single monitoring session (which may last days or weeks). Usually the
monitoring session ends when the patient is discharged, so that the data in a
single file come from a single patient. Occasionally, however, the monitor is
not reset when the patient is discharged, and the session continues after a new
patient has been admitted; in this case the raw data file contains data from
two (or more) patients, with a gap (an interval during which no waveforms or
numerics are recorded) that is typically an hour or more in duration. Such
gaps may also appear if the monitor is temporarily disconnected (for example,
for a laboratory test) and then reconnected to the same patient. Since the raw
data files do not usually contain patient identifiers, it is not trivial to
determine with certainty whether the data before and after a gap were collected from
the same patient.

Ideally, each MIMIC-III Waveform Database record should contain data from
only one patient. All raw data files containing gaps of an hour or more have
been split into separate records in order to decrease the likelihood that
any record contains data from multiple patients. An ongoing project is to
examine the sets of records created this way, matching them with MIMIC-III
Clinical Database records when possible, to determine if and how they should
be reassembled.
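
The splitting rule can be illustrated with a short Python sketch. This is not the procedure actually used to build the database, only a minimal illustration of the rule as stated; the function name, the interval representation (start and end times in seconds), and the sample data are all hypothetical.

```python
ONE_HOUR_S = 3600


def split_at_gaps(intervals, min_gap_s=ONE_HOUR_S):
    """Group recorded intervals into records, splitting at long gaps.

    intervals: list of (start_s, end_s) tuples, sorted by start time and
    non-overlapping, each describing a stretch of uninterrupted data.
    A new record begins whenever the gap since the previous interval is
    at least min_gap_s seconds.
    """
    records = []
    current = []
    for start, end in intervals:
        if current and start - current[-1][1] >= min_gap_s:
            records.append(current)
            current = []
        current.append((start, end))
    if current:
        records.append(current)
    return records


# Made-up session with gaps of 60 s, 5000 s, and 10 s between intervals.
session = [(0, 7200), (7260, 20000), (25000, 40000), (40010, 50000)]
print(len(split_at_gaps(session)))  # 2
```

Only the 5000-second gap is an hour or more, so the session is split into two records there; the shorter gaps remain inside a record.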

Inter-waveform alignment problems: The method used for MIMIC
waveform data extraction was not designed for inter-waveform analysis. The
waveform data contain unspecified/unknown filtering delays and/or unknown
inter-channel delays, which may not be constant in a given record.
Therefore, although the ECGs are time-aligned with each other, there may be
a (changing) delay of up to 500 ms between any of the other waveforms in the
data. For example, the pulse transit time measured between different
waveforms may be unreliable (either in absolute or relative terms).

ECG limitations: The ECG signals in the waveform records were
originally sampled with 12-bit precision at a high sampling rate, and
were then scaled and decimated to 500 samples per second (per signal).
The scaling reduced the effective amplitude resolution from 12 bits to
9 or 10 bits in typical cases, and as little as 7 bits in some cases.
From each set of 4 consecutive decimated samples of the same ECG
signal, one was recorded (chosen using a turning-point compressor, a
technique sometimes called "peak-picking"). The result is an ECG
signal sampled 125 times per second, but at intervals that vary
between 2 and 14 ms (averaging 8 ms). Since the interval between any
given pair of samples was not available to us, the reconstructions of
the ECG signals assume uniform 8 ms intervals. These signals with
reduced time and amplitude resolution, and sampling jitter introduced
by the "peak-picking", were the only ECG signals that it was possible to
capture from the ICU monitors. Although ECGs reconstructed in this
way can be readily interpreted visually, they are unsuitable as input
for certain algorithms for ECG analysis, particularly those that are
sensitive to frequency-domain features of the signal. Note that these
limitations apply only to the ECG signals, not to the other signals,
which were originally sampled at uniform 8 ms intervals (125 samples
per second) and were not scaled prior to capture.
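
The range of ECG sampling intervals quoted above follows from simple arithmetic, which the Python sketch below reproduces. It does not implement the turning-point compressor itself; it only enumerates the possible spacings when one sample is kept from each block of 4 samples taken 2 ms apart. The variable names are illustrative.

```python
DECIMATED_INTERVAL_MS = 2  # 500 samples per second
BLOCK = 4                  # one sample is kept from each block of 4

# i is the position (0-3) kept in one block, j the position kept in the next.
intervals = sorted({
    (BLOCK + j - i) * DECIMATED_INTERVAL_MS
    for i in range(BLOCK)
    for j in range(BLOCK)
})

print(intervals)                        # [2, 4, 6, 8, 10, 12, 14]
print(min(intervals), max(intervals))   # 2 14
```

Because exactly one sample is kept per 8 ms block, the long-run average interval is 8 ms regardless of which positions are chosen.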