ABSTRACT

Guidance is needed to help clinicians decide which out-of-center (OOC) testing devices are appropriate for diagnosing obstructive sleep apnea (OSA). A new classification system that details the type of signals measured by these devices is presented. This proposed system categorizes OOC devices based on measurements of Sleep, Cardiovascular, Oximetry, Position, Effort, and Respiratory (SCOPER) parameters.

Criteria for evaluating the devices are also presented, which were generated from chosen pre-test and post-test probabilities. These criteria state that in patients with a high pretest probability of having OSA, the OOC testing device has a positive likelihood ratio (LR+) of 5 or greater coinciding with an in-lab-polysomnography (PSG)-generated apnea hypopnea index (AHI) ≥ 5, and an adequate sensitivity (at least 0.825).

Since oximetry is a mandatory signal for scoring AHI using PSG, devices that do not incorporate oximetry were excluded. English peer-reviewed literature on FDA-approved devices utilizing more than 1 signal was reviewed according to the above criteria for 6 questions. These questions specifically addressed the adequacy of different respiratory and effort sensors and combinations thereof to diagnose OSA. In summary, the literature is currently inadequate to state with confidence that a thermistor alone without any effort sensor is adequate to diagnose OSA; if a thermal sensing device is used as the only measure of respiration, 2 effort belts are required as part of the montage and piezoelectric belts are acceptable in this context; nasal pressure can be an adequate measurement of respiration with no effort measure with the caveat that this may be device specific; nasal pressure may be used in combination with either 2 piezoelectric or respiratory inductance plethysmographic (RIP) belts (but not 1 piezoelectric belt); and there is insufficient evidence to state that both nasal pressure and thermistor are required to adequately diagnose OSA. With respect to alternative devices for diagnosing OSA, the data indicate that peripheral arterial tonometry (PAT) devices are adequate for the proposed use; the device based on cardiac signals shows promise, but more study is required as it has not been tested in the home setting; for the device based on end-tidal CO2 (ETCO2), it appears to be adequate for a hospital population; and for devices utilizing acoustic signals, the data are insufficient to determine whether the use of acoustic signals with other signals as a substitute for airflow is adequate to diagnose OSA.

Standardized research is needed on OOC devices that report LR+ at the appropriate AHI (≥ 5) and scored according to the recommended definitions, while using appropriate research reporting and methodology to minimize bias.

1.0 Introduction

The first widely used classification system for describing sleep testing devices was published by the American Academy of Sleep Medicine (then the American Sleep Disorders Association) in 1994, placing available devices into 4 categories based upon the number and type of “leads” used and the circumstances in which the device was used. This schema closely mirrored available Current Procedural Terminology (CPT) codes, worked for the majority of the then-available devices, and served to foster development of practice guidelines and reimbursement decisions. However, since that time, a plethora of innovative testing devices have been developed, many of which do not fit well within that classification scheme. In 2010, the Board of the American Academy of Sleep Medicine (AASM) commissioned a task force to determine a more specific and inclusive method of classifying and evaluating sleep testing devices other than polysomnography (PSG) used as aids in the diagnosis of obstructive sleep apnea (OSA) in the out-of-center (OOC) setting. The scope of this work was specifically limited to classification and evaluation of the performance characteristics of the technology itself; it does not address the use of these devices in practice guidelines, accreditation standards, or management principles.

There are many issues involved in classifying and evaluating the performance characteristics of the wide array of devices purporting to diagnose OSA outside of the realm of attended polysomnography: (1) many different sensors might be used to measure the same physiologic parameter; (2) sensors may be combined in varied ways in an effort to enhance accuracy; (3) different physiologic parameters might be measured in one device vs. another; (4) signals may be modified by analog or digital processing to arrive at derived measures; and (5) studies evaluating and comparing devices may have employed varied “gold standards” or outcome measures. These and other factors make comparisons and generalizations between studies of even similar devices difficult.

The overarching purpose of our technology assessment is to provide a means of answering some pertinent clinical questions:

Is a thermal sensing device without an effort measure adequate to diagnose OSA?

Is a thermal sensing device with a measure of effort adequate to diagnose OSA?

Is nasal pressure without an effort measure adequate to diagnose OSA?

Is nasal pressure with an effort measure adequate to diagnose OSA?

With an effort measure, is nasal pressure in combination with a thermal sensing device significantly better than either a thermal sensing device or nasal pressure alone to warrant the requirement of both sensors?

What is the evidence for alternative devices to diagnose OSA?

We will first discuss an approach to addressing the complexities and challenges involved in assessing the performance and characteristics of OOC devices, and then use the methods devised to address the clinical questions above.

Variations in the Standards and Outcomes Used to Evaluate and Compare Devices

In addition to variability in sensors used in PSG, the definitions for apneas, hypopneas, apnea-hypopnea index (AHI), respiratory disturbance index (RDI), and OSA are also variable. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications1 was written in an attempt to standardize the scoring definitions, but even the manual has 2 acceptable definitions of hypopnea. To emphasize the impact that different hypopnea scoring has on resultant AHI, Ruehland et al.2 reported that the AHI of studies scored utilizing the “Chicago” criteria (hypopnea defined by a 50% reduction in airflow or < 50% reduction in airflow associated with a 3% oxygen desaturation and/or an arousal) was 3 times the AHI of the same studies scored using the recommended hypopnea definition from the AASM scoring manual. To further complicate matters, the scoring used for an OOC device may or may not be the same as that used for the in-lab PSG because different sensors are used (including lack of a standard sleep measurement). Additionally, the definition of what level of AHI is considered adequate to confirm a diagnosis of OSA that qualifies for treatment is variable across studies. In this paper, we will use the definition of OSA-positive as an AHI ≥ 5.

For this document, the following definitions will apply, except as used by the study's author in which case their terminology was used:

The standard definition of AHI (AHIs) as determined during attended laboratory PSG is:

The nonstandard definition of AHI (AHIns) is defined as:

The respiratory event index (REI) is defined in the context of OOC testing devices as:

In addition to obtaining physiologic measures, many studies examining OOC devices report a variety of additional outcomes measures, such as compliance with positive airway pressure or change in subjective sleepiness. This makes summarizing the evidence across studies exceedingly challenging.

Keeping these challenges in mind, we have attempted to account for them in 2 ways.

We have translated the varied outcome measures of studies evaluating portable testing devices into a dimensionless, useful parameter: the positive likelihood ratio (LR+) delivered by applying a given test and obtaining a “positive” result. This allows comparisons across a wide variety of devices that are less sensitive to variations in case definitions.

We have developed a device categorization scheme that is adaptable, descriptive, and we believe workably specific (see SCOPER system below).

2.0 Determining Criteria for Evaluating Devices

Criteria need to be established to evaluate the appropriateness of OOC devices to diagnose OSA. The 2007 Clinical Guidelines3 prescribe that OOC devices should be used in patients with a “high pretest probability” for OSA. The following section outlines the logic used in establishing the definition of “high pretest probability.” The clinical determination of the pre-test probability is beyond the scope of this document, but will be addressed in a companion paper.

Essentially, the OOC device should be used to increase the pretest probability to a sufficiently high post-test probability that one is very certain that the patient has OSA. For the purposes of this paper, we will recommend that to be considered as having OSA, the post-test probability should be ≥ 95%. The relationship between the pre- and post-test probability may be described by the likelihood ratio, or LR. We are most concerned with using OOC devices to “rule in” OSA, and therefore we are interested in the positive likelihood ratio, LR+.4 The criteria for using OOC devices to “rule out” OSA may be different and are not addressed here.

The combination of the following 3 equations describes the relationship between the LR+, pretest probability, and post-test probability and is shown graphically in Figure 1:

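$$\mathrm{odds}_{\mathrm{pre}} = \frac{\mathrm{probability}_{\mathrm{pre}}}{1 - \mathrm{probability}_{\mathrm{pre}}}$$

$$\mathrm{odds}_{\mathrm{post}} = \mathrm{odds}_{\mathrm{pre}} \times \mathrm{LR}{+}$$

$$\mathrm{probability}_{\mathrm{post}} = \frac{\mathrm{odds}_{\mathrm{post}}}{1 + \mathrm{odds}_{\mathrm{post}}}$$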

Figure 1. The relationship between LR+, pretest probability, and post-test probability

This foundation enables a judgment to be made as to a device's adequacy to be used to help diagnose OSA given a patient's specific pretest probability. Figure 1 shows the relationship between post-test probability and LR+ at a variety of pretest probabilities between 0.5 and 0.95. The required device LR+ to achieve a post-test probability of 95% increases as the pretest probability decreases. For example, if the pretest probability is only 50%, the required LR+ would be 19 (which is off the scale on this figure). For the purposes of this paper, a minimum pretest probability of 80% is suggested such that a reasonable requirement for LR+ can be employed for assessing devices. From Figure 1, at the given pretest (80%) and post-test (95%) probabilities, the device must have an LR+ of at least 5 to be clinically useful. This LR+ cutoff value will change if different pretest or post-test probabilities are employed. Similarly, if the device has an LR+ greater than 5, it could be acceptable at lower pretest probabilities. Also, if the device has an LR+ of 5 and the population has a pretest probability of only 50%, the post-test probability of disease drops to only 83%.
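As an illustration only, the three equations can be written as a short Python sketch. The function names `post_test_probability` and `required_lr_pos` are hypothetical and do not come from the task force's materials; the second function simply solves the same equations for LR+. At a pretest probability of 0.80 and a target post-test probability of 0.95, the formula gives 4.75, which is consistent with the cutoff of 5 used in this paper.

```python
def post_test_probability(pretest: float, lr_pos: float) -> float:
    """Apply the three equations: probability -> odds -> scaled odds -> probability."""
    odds_pre = pretest / (1.0 - pretest)
    odds_post = odds_pre * lr_pos
    return odds_post / (1.0 + odds_post)


def required_lr_pos(pretest: float, target_post: float = 0.95) -> float:
    """LR+ needed to lift `pretest` to `target_post` (the same equations solved for LR+)."""
    odds_pre = pretest / (1.0 - pretest)
    odds_target = target_post / (1.0 - target_post)
    return odds_target / odds_pre


if __name__ == "__main__":
    # Required LR+ for a 95% post-test probability at several pretest probabilities.
    for pretest in (0.80, 0.90, 0.95):
        print(f"pretest {pretest:.2f}: required LR+ = {required_lr_pos(pretest):.2f}")
    # Post-test probability when a device with LR+ = 5 is used at a pretest probability of 50%.
    print(f"pretest 0.50, LR+ 5: post-test = {post_test_probability(0.50, 5.0):.3f}")
```

Because the required LR+ depends only on the ratio of the target odds to the pretest odds, any change to either probability shifts the cutoff, as noted in the text.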

We also wanted to be sure that we are not increasing the burden of testing to the patients. That is, will the burden of some patients getting 2 tests (complex and simple) outweigh the benefit of some patients needing only 1 simpler test? This depends on how much burden the simple test causes relative to the gold standard as well as other factors (e.g., patient doesn't follow up after initial negative test). We suggest that we want to have at least two-thirds of the population (66%) be diagnosed accurately as positive with the simple test; at the minimum pretest probability of 80%, this corresponds to a sensitivity of 0.66/0.80 = 0.825, which we set as the minimum value for the sensitivity. We have also included in the results (Tables 3–11) the negative likelihood ratios, but have concentrated on the positive LR in our schema.

Additionally, the definition of what level of AHI is considered adequate to confirm a diagnosis of OSA is variable across studies. According to the ICSD,5 an AHI ≥ 5 with symptoms is indicative of OSA. But should that definition remain the same for OOC testing considering the variety of event definitions, event detection technologies, as well as a difference in clinical management protocol? The authors judged that it should, for reasons including the desire to minimize the number of follow-up in-lab PSGs. Figure 2 illustrates the concept of setting the AHI cutoff at 5 to maximize true positives, true negatives, and minimize false negatives in the context of a high pretest probability for OSA. To summarize, devices are judged on whether or not they can produce an LR+ of at least 5 and a sensitivity of at least 0.825 at an in-lab AHI of at least 5.
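The two criteria summarized above can be sketched in Python using the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The `TwoByTwo` container, the helper names, and the example counts are hypothetical and are not taken from any study reviewed here; the sketch assumes the table contains at least one PSG-positive and one PSG-negative patient.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of OOC-device results against in-lab PSG (hypothetical container)."""
    tp: int  # device positive, PSG positive
    fp: int  # device positive, PSG negative
    fn: int  # device negative, PSG positive
    tn: int  # device negative, PSG negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_pos(self) -> float:
        # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
        false_positive_rate = self.fp / (self.fp + self.tn)
        if false_positive_rate == 0:
            return math.inf
        return self.sensitivity / false_positive_rate

    @property
    def lr_neg(self) -> float:
        # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
        if self.specificity == 0:
            return math.inf
        return (1.0 - self.sensitivity) / self.specificity


def meets_criteria(table: TwoByTwo, min_lr_pos: float = 5.0,
                   min_sensitivity: float = 0.825) -> bool:
    """Check both cutoffs used in this paper: LR+ >= 5 and sensitivity >= 0.825."""
    return table.lr_pos >= min_lr_pos and table.sensitivity >= min_sensitivity


def true_positive_fraction(pretest: float, sensitivity: float) -> float:
    """Fraction of all tested patients correctly diagnosed as positive."""
    return pretest * sensitivity


if __name__ == "__main__":
    example = TwoByTwo(tp=70, fp=2, fn=10, tn=18)  # invented counts, for illustration only
    print(f"sensitivity={example.sensitivity:.3f} LR+={example.lr_pos:.2f} "
          f"LR-={example.lr_neg:.3f} adequate={meets_criteria(example)}")
    print(f"true-positive fraction at 0.80 pretest: {true_positive_fraction(0.80, 0.825):.2f}")
```

An infinite LR+ arises whenever the device produces no false positives at the chosen cutoff, which is how values of ∞ can appear in results of this kind.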

Figure 2. Illustration of the combination of the populations of patients with and without OSA with respect to the AHI cutoff, high pretest probability, true positive, true negative, and false positive results

Only if the authors defined OSA at a different cutoff or if they used the Chicago criteria6 (see footnote following article) were different cutoffs considered in this evaluation of LR+. As described previously, this is the proposed criterion because AHI values scored using the “Chicago criteria” have been found to be roughly 3 times those scored using the recommended rule of the 2007 Scoring Manual.1,2 Therefore an AHI of 15 determined using the 1999 rules would be roughly equivalent to an AHI of 5 using the current recommended rule.

3.0 The SCOPER Categorization System

3.1 Development

In 1994, the AASM (formerly the ASDA) published Practice Parameters for the Use of Portable Recording in the Assessment of Obstructive Sleep Apnea7 and an associated review paper8 that categorized out-of-center (portable) devices into 4 levels: (1) standard attended PSG; (2) comprehensive portable PSG (unattended); (3) modified portable sleep apnea testing (unattended, minimum of 4 channels including ventilation [at least 2 channels of respiratory movement or a combination of respiratory movement and airflow], heart rate or electrocardiography (ECG), and oxygen saturation); and (4) continuous single- or dual-bioparameter recording (unattended).8 However, it has become increasingly apparent that with the continual technological changes that occur over time, this categorization is no longer useful. Many devices do not fit into these categories. Therefore, a new categorization scheme is needed.

A new scheme is suggested based upon the sensors used to measure each of the following: Sleep, Cardiovascular, Oximetry, Position, Effort, and Respiratory (SCOPER) parameters.

3.2 Assessment of Each SCOPER Category

Each category of SCOPER was assessed individually as described below:

Sleep: The presence of a measurement of sleep was not quantitatively evaluated. The logic employed is that the measurement of sleep relates in large part to the final assessment of the sleep disordered breathing index, i.e., whether the denominator of the index is per hour of sleep or per hour of recording time. This will predominantly affect the cutoffs of positive or negative diagnoses, which can be addressed with some calibration of the device to in-laboratory PSG studies. At least 1 study9 stated that the addition of sleep surrogate measurement (e.g., actigraphy) did not improve the device's performance for patients with a high pretest probability for obstructive sleep apnea.

Cardiovascular: The cardiovascular measurement evaluation was focused on devices that used either the cardiac signal or a vascular signal (e.g., peripheral arterial tonometry) to derive a respiratory event index. This is the only signal in which the usual PSG signal (one lead of ECG) is “demoted” to a lower level as it is used on a PSG typically for rate and rhythm analysis, rather than assessment of disordered breathing. We realize that these signals often measure different physiologic signals (cardiac vs. vascular tone) but felt that the novel PAT signal fit best in this category, although one could also argue to put it in a different category (respiratory or sleep).

Oximetry: Since the definition of AHI as measured by conventional parameters relies on desaturation to identify many events, for the determination of REI, a device must include an oximeter.

Position: Although we felt it important to note in the categorization, the presence of a measurement of position was not quantitatively evaluated since it is not routinely used to diagnose OSA. P1 is considered video or visual confirmation of body position, and P2 is considered any other method to determine body position that is non-visual in nature. The effect of positional variations on OSA is a topic for research.

Effort: The addition of a measurement of respiratory effort was included in the key questions that follow. The best effort measure as noted in the scoring manual1 is respiratory inductance plethysmography (RIP) with 2 belts. Beyond this gold standard, research on the use of 1 RIP belt, 1 or 2 piezoelectric belts, and other effort measures is scant with regard to OOC devices.

Respiratory: The evaluation focused on the measurement of airflow, either by conventional or alternative methods.

Levels of each category of SCOPER are outlined in Table 1. These levels are numbered from 1 up to 5 (depending on category) and are based on the type of sensor or measurement that the device uses for that category. Note that when the device does not measure a certain category, that category is not included in its SCOPER identification and a “0” is listed in that category in Table 2. Also, if the type of signal has not been adequately defined in the study to allow a number to be assigned, it is denoted by an “x”. In particular, for oximetry, the sampling rates have typically not been described in the literature, which has led to most devices being designated “O1x”, indicating a finger or ear oximeter where the sampling time and/or rate have not been adequately described.
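To make the identification scheme concrete, the following Python sketch splits a SCOPER identifier into its per-category levels. The parser and its name `parse_scoper` are hypothetical and not part of the SCOPER proposal. It assumes that an identifier lists categories in S-C-O-P-E-R order, that each level is a number optionally followed by “x” (as in “O1x”) or a bare “x” (as in “Ex”), and that categories absent from the identifier are reported as “0”, following the Table 2 convention described above.

```python
import re

CATEGORIES = "SCOPER"  # Sleep, Cardiovascular, Oximetry, Position, Effort, Respiratory

# One token = a category letter followed by a level: digits with optional "x", or "x" alone.
_TOKEN = re.compile(r"([SCOPER])(\d+x?|x)")


def parse_scoper(code: str) -> dict[str, str]:
    """Split an identifier such as 'C4O1xExR3' into a level for each SCOPER category."""
    tokens = _TOKEN.findall(code)
    # Reject identifiers containing anything other than well-formed tokens.
    if "".join(letter + level for letter, level in tokens) != code:
        raise ValueError(f"not a SCOPER identifier: {code!r}")
    # Reject duplicated or out-of-order categories.
    letters = [letter for letter, _ in tokens]
    if letters != sorted(set(letters), key=CATEGORIES.index):
        raise ValueError(f"categories duplicated or out of order: {code!r}")
    levels = dict.fromkeys(CATEGORIES, "0")  # "0" marks a category the device does not measure
    levels.update(tokens)
    return levels


if __name__ == "__main__":
    for identifier in ("S3C4O2P2E3R2", "C4O1xExR3", "C2O1xP2"):
        print(identifier, parse_scoper(identifier))
```

Levels are kept as strings rather than integers so that the “x” marker for inadequately described signals is preserved.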

Table 1. SCOPER Categorization System

Sleep: S1 – Sleep by 3 EEG channels+ with EOG and chin EMG; S2 – Sleep by less than 3 EEG+ with or without EOG or chin EMG

Cardiovascular: C1 – more than 1 ECG lead – can derive events; C2 – Peripheral arterial tonometry

Oximetry: O1 – Oximetry (finger or ear) with recommended sampling; O1x – Oximetry (finger or ear) without recommended sampling (per Scoring Manual) or not described

Position: P1 – Video or visual position measurement

Effort: E1 – 2 RIP belts

Respiratory: R1 – Nasal pressure and thermal device

Table 2

4.0 Classification of OOC Devices by SCOPER

A systematic search of the literature was performed, and when peer-reviewed literature in English was available for an FDA-approved device, data were extracted according to standardized methodology (see Appendix I). These data were used to categorize the devices according to the SCOPER scheme as shown in Table 2. Devices that were used in more than 1 configuration have more than 1 SCOPER categorization. We limited the categorization to those configurations with appropriate literature and did not list all the configurations possible per manufacturers' specifications. We have not included (1) FDA-approved devices for which there is no literature; (2) non-FDA-approved devices that do not have a current related FDA-approved device on the market; (3) single-channel devices; (4) therapeutic devices used in diagnostic mode; or (5) devices without oximeters.

5.0 Methods and Key Questions

We developed a series of questions to evaluate the OOC devices. Because oxygen desaturation is currently required for scoring certain defined sleep related breathing events, acceptable devices must have oximetry as a mandatory signal. Therefore any device without an oximeter is currently not considered an acceptable device. Consequently, all devices reported on herein have an oximeter. Because the effort and respiratory categories of SCOPER are significant parameters for diagnosing OSA, they will be the focus of this evaluation, which addresses the following key questions:

Is a thermal sensing device without an effort measure adequate to diagnose OSA?

Is a thermal sensing device with a measure of effort adequate to diagnose OSA?

Is nasal pressure without an effort measure adequate to diagnose OSA?

Is nasal pressure with an effort measure adequate to diagnose OSA?

With an effort measure, is nasal pressure in combination with a thermal sensing device significantly better than either a thermal sensing device or nasal pressure alone to warrant the requirement of both sensors?

What is the evidence for alternative devices to diagnose OSA?

Details on the methodology used to find, grade, and extract the literature to answer these questions are described in Appendix I (pages 544-6), as well as the grading results for each study. In particular, the setting of the tests is considered important. The best evidence for an OOC device is when tested concurrently with PSG and also tested OOC (designated as “home/lab” or H/L). Studies designated H/H (“home/home”) tested the device in the home against a comprehensive portable PSG device also used in the home, and studies designated L/L (“lab/lab”) tested the device only simultaneously against PSG in the lab. The sensitivity of the devices was also reported in order to show the rate of false negatives. In some cases, the LR+ was calculated from the data in the paper based on our definitions and what was reported by the authors. Lastly, although results at AHI/REI ≥ 5 were desired, often they were not presented at this cutoff value; in those cases the data are presented at the cutoff reported. Also ideally, the LR+ and sensitivity cutoffs should be met in the home setting, but because the data are so sparse, for this paper, if the criteria were met in any setting the device configuration described was considered acceptable.
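The acceptance rule described above (criteria met in any setting) can be sketched as follows. The `StudyResult` record, the function name, and all numbers in the example are hypothetical placeholders and are not data from the reviewed studies; only the setting labels and the two cutoffs come from this paper.

```python
from dataclasses import dataclass

# Setting designations used in this paper.
SETTINGS = {
    "H/L": "device tested in the home and compared with in-lab PSG",
    "H/H": "device tested in the home against portable PSG in the home",
    "L/L": "device tested simultaneously against PSG in the lab",
}


@dataclass(frozen=True)
class StudyResult:
    """One reported comparison for a device configuration (hypothetical record)."""
    setting: str        # one of the keys of SETTINGS
    cutoff: float       # AHI/REI cutoff at which the result was reported
    lr_pos: float
    sensitivity: float

    def __post_init__(self) -> None:
        if self.setting not in SETTINGS:
            raise ValueError(f"unknown setting: {self.setting!r}")


def configuration_acceptable(results: list[StudyResult], min_lr_pos: float = 5.0,
                             min_sensitivity: float = 0.825) -> bool:
    """Acceptable if at least one result, in any setting, meets both cutoffs."""
    return any(r.lr_pos >= min_lr_pos and r.sensitivity >= min_sensitivity
               for r in results)


if __name__ == "__main__":
    # Invented numbers: adequate in the lab, inadequate at home.
    results = [
        StudyResult(setting="L/L", cutoff=5, lr_pos=6.2, sensitivity=0.90),
        StudyResult(setting="H/L", cutoff=5, lr_pos=4.1, sensitivity=0.88),
    ]
    print(configuration_acceptable(results))
```

A stricter variant that required the cutoffs to be met in a home setting would filter `results` to the H/L and H/H entries before applying the same check.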

Appendix II (page 547) lists the studies that were excluded with the reasons for their exclusion. Appendix III (page 548) lists other additional outcomes information that was found in the excluded literature that did not specifically provide data for this assessment.

6.0 Results

6.1 Key Question 1

Is a thermal sensing device without an effort measure adequate to diagnose OSA?

There was 1 paper10 that contained data that could be used to address this question. Table 3 summarizes the data. The Apnoescreen I, which did not have any effort measurement, did not produce an LR+ greater than 5 at the only reported REI/AHIns (≥ 10).

Table 3

Summary

The literature is currently inadequate to state with confidence that a thermal sensing device alone is adequate to diagnose OSA.

6.2 Key Question 2

Is a thermal sensing device with a measure of effort adequate to diagnose OSA?

Three papers addressed this question. The data are summarized in Table 4. Two of the papers compared 2 OOC devices to each other: in Iber et al.,11 the same device (with a thermocouple) was used both in the home and in the lab (S2C3O1xE1R3/PS-2 System), and in Takama and Kurabayashi,12 a simpler device (C4O1xExR3/Somté Morpheus) was compared to a more complex one (S2C4O1xE1R3/P-Series Plus) both with thermistors in the laboratory. Interestingly, the direct comparison of the same device in different environments (Iber et al.) provided an LR+ of only 3.1. The authors stated that their observed differences were equivalent to the variability of repeated studies in the same setting. Positional variation was a reason posed for the variability, but unfortunately, their device did not measure position so a detailed analysis was not possible. In Takama and Kurabayashi's study, an LR+ of 5.8 was found for REI ≥ 20 for the simpler device in the home as compared to the complex one in the laboratory. No data were provided on lower REIs. This was only a Level IVb study because of lack of blinded scoring.

Devices with thermistor as the only measure of respiration (R3) with effort measurement

Table 4

One other study9 compared a simpler device to laboratory PSG. The Apnoescreen II, which uses a thermistor, produced an LR+ greater than 5 at both REI/AHIns cutoffs reported (10 and 15). The Apnoescreen II (S3C3O1xP2E4R3) used 2 piezoelectric effort belts, and it had an ECG lead. Two of the 3 studies met the sensitivity cutoff.

Summary

This small data set indicates that if a thermal sensing device is used as the only measure of respiration, 2 effort belts are required as part of the montage. It appears that piezoelectric belts are acceptable in this context.

6.3 Key Question 3

Is nasal pressure without an effort measure adequate to diagnose OSA?

Three papers13–15 addressed this question. The data are summarized in Table 5. The first device (C4O1xR2) had an adequate LR+ on 50 patients, but it has not been tested in the home.13 Further validation in the home setting is needed to confirm this result.

Devices using only nasal pressure as a measurement of respiration (R2)

Table 5

The other papers14,15 discuss the ARES device (S3C4O2P2E3R2). According to the manufacturer, the ARES relies on a change in nasal pressure and SpO2, and confirms each event based on an arousal identified by changes in snoring, head movements, or sympathetic arousal (pulse rate increase). In the first study (Ayappa et al., Level Ia)14 of over 90 patients, the lab/lab portion of the study showed adequate LR+ (6.0 at REI/AHIs ≥ 5), but the home/lab portion showed inadequate LR+ (4.4 at REI/AHIs ≥ 5). It is acknowledged that meeting the LR+ criterion in the home setting is more challenging than in the lab setting. This evaluation is based on the AHIR4%ARES vs AHI4% NPSG data (other definitions and results were presented, but this definition most closely matches the Scoring Criteria). In a lab/lab comparison in the second study, To et al.15 (Level IIa) reported adequate results (∞ at AHI ≥ 5) in a large study on 141 patients.

Summary

Although the 2 devices that fit into this Key Question have different configurations, they both show adequate LR+ and sensitivity values that indicate nasal pressure (with oximetry) without an effort measure is adequate to diagnose OSA. However, the uniqueness of the ARES scoring may mean that this configuration (nasal pressure alone without effort measure) is not broadly applicable, and further studies with this configuration are needed.

6.4 Key Question 4

Is nasal pressure with an effort measure adequate to diagnose OSA?

There were 4 papers that contained the required data to address this question. Table 6 summarizes the data. One of them had adequate LR+ results (Ng et al.),16 and one had adequate LR+ results at an REI that had been previously calibrated to the in-lab results (Dingli et al.).17 Two studies reported inadequate LR+ (Yin et al.,18 Santos-Silva et al.19).

Table 6

Dingli et al., who used 2 piezoelectric belts (O1xP2E4R2/Embletta) with a nasal pressure sensor, achieved a calculated LR+ of ∞ at AHI ≥ 15 on PSG, which was defined as OSA-positive. The lab/lab portion of the study was used to calibrate the device and cutoffs to in-lab results. For the LR+ calculation, the authors' category of “possible OSA” [10 < AHI < 20] was added to the negative fraction for the results reported in Table 6. When the “possible OSA” category was recategorized as positive, the LR+ decreased to 4.

Two studies present LR+ results that were inadequate, and they were in the category C4O1xP2E4R2. One piezoelectric belt was used to measure respiratory effort. Since the scoring criteria were equivalent or almost equivalent for the OOC and in-lab tests, it is unlikely that the inadequate LR+ results are due to scoring issues. The only time that the LR+ was adequate was at extremely high REIs (≥ 30 for Santos-Silva et al.19 and ≥ 50 for Yin et al.18).

The sensitivity criterion was met in 3 of the 4 studies.

Summary

The data indicate that nasal pressure can be an adequate measurement of respiration when either 2 piezoelectric or RIP belts are used to measure effort (but not 1 piezoelectric belt).

6.5 Key Question 5

With an effort measure, is nasal pressure in combination with a thermal sensing device significantly better than either a thermal sensing device or nasal pressure alone to warrant the requirement of both sensors?

There was only 1 Level Ia study20 where both a nasal pressure transducer and a thermistor were used to report an LR+ outcome, and this device used 2 piezoelectric bands. The data, presented in Table 7, showed inadequate LR+ at REI/AHI ≥ 5. This is a counterintuitive result, and more data are needed. It is possible that more complex devices may not give better results.

Table 7

Summary

There is insufficient evidence to state that both nasal pressure and thermal sensing device are required to adequately diagnose OSA.

6.6 Key Question 6

What is the evidence for alternative devices to diagnose OSA?

The final question involves evaluating alternative devices that derive or calculate REI from signals other than those that directly measure respiration with either a thermistor and/or nasal pressure. Four alternate methods of determining REI were identified, based on the following signals:

Peripheral Arterial Tonometry (PAT)

Cardiac signals plus oximetry

End-tidal carbon dioxide (ETCO2) as an alternative measure of airflow

Acoustic signals as a substitute for airflow

The following is a summary of the data for the nonstandard methods of determining REI.

6.6.1 PAT Signal

There were 7 studies meeting inclusion criteria that compared WatchPAT (either S3C2O1xP2 or C2O1xP2) to in-lab PSG. This device is based on the PAT signal, oximetry +/- actigraphy. There were 3 Level Ia studies, 3 Level IIa studies, and 1 Level IIb study on these devices. The data are summarized in Table 8.

Table 8

The 3 Level Ia studies included Bar et al. (lab/lab portion),21 Zou et al.,22 and Pang et al.23 In Bar et al., 102 patients included both healthy volunteers and those with suspected OSA. LR+ was calculated as 7 at REI/AHIns ≥ 10 (from Figure 7 in the original paper); REI was scored according to Chicago criteria, and the LR+ was adequate. Zou et al. performed a home/home study of 106 patients; the LR+ was calculated as 9 from Figure 4 in the original paper and was an adequate value. Pang et al. performed a lab/lab study on 37 patients with suspected OSA. The LR+ at REI/AHIns > 5 was 4.7, which is marginal.

The 3 Level IIa studies included Pittman et al.,24 Pittman et al.,25 and Ayas et al.26 Pittman et al.24 included both lab/lab and home/lab comparisons in this study on 30 patients. REIs were presented for the in-lab PSG using both the Chicago and standard AHI criteria and compared to REIs generated with a proprietary algorithm for the WatchPAT. The LR+ at REI ≥ 5 using standard AHI scoring criteria was 13.0 in the lab and ∞ at home, which is adequate.

Pittman et al.25 assessed residual SDB during CPAP therapy with the WatchPAT in a lab/lab study. Using the Chicago criteria, the LR+ at REI > 5 was 1.6, which is inadequate. Ayas et al. performed a lab/lab study of 30 persons with and without suspected OSA. For the AHIns threshold of 10 using Chicago criteria, the optimal LR+ (not defined at what REI cutoff) was 2.9, which is inadequate. For a threshold of AHI ≥ 15, it was also inadequate at LR+ = 3.5.

The remaining study, by Choi et al.27 (Level IIb), reported the LR+ as 5.9 at REI/AHIs ≥ 5 on 25 subjects studied in-lab and in the hospital; all lab PSGs were scored according to the 2007 scoring manual.

The sensitivity results were adequate for 6 of the 7 papers.
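The two evaluation criteria used throughout this section (LR+ of 5 or greater and sensitivity of at least 0.825) can be illustrated with a minimal sketch. The function names and the 2×2 counts below are hypothetical and are not taken from any of the reviewed studies; the sketch only shows how LR+ follows from sensitivity and specificity, and why a study with no false positives is reported as LR+ = ∞.

```python
import math

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and LR+ from a 2x2 table (device vs. PSG)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    if false_positive_rate == 0:
        # No false positives: LR+ is unbounded (undefined if sensitivity is also 0).
        lr_plus = math.inf if sensitivity > 0 else math.nan
    else:
        lr_plus = sensitivity / false_positive_rate
    return {"sensitivity": sensitivity, "specificity": specificity, "lr_plus": lr_plus}

def meets_criteria(metrics: dict, min_lr_plus: float = 5.0,
                   min_sensitivity: float = 0.825) -> bool:
    """Apply the LR+ >= 5 and sensitivity >= 0.825 thresholds stated in the paper."""
    return (metrics["lr_plus"] >= min_lr_plus
            and metrics["sensitivity"] >= min_sensitivity)

# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=45, fp=4, fn=5, tn=46)
print(example["sensitivity"], round(example["lr_plus"], 2), meets_criteria(example))
```

The "marginal" label used in the text is a judgment made by the authors and is not encoded here; the sketch only separates values that meet both thresholds from those that do not.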

Summary

Although the scoring criteria were variable and the results at PSG-AHI cutoff of 5 were not always reported, overall the data indicate that this device is adequate for the proposed use. Two of the 3 Level Ia studies reported adequate LR+ and one was marginal. One Level IIa study reported adequate LR+, 1 was marginal (depending on scoring and AHI cutoffs), and the other inadequate. The Level IIb study reported adequate LR+.

6.6.2 Cardiac Signals Plus Oximetry

One device (a Northeast Monitoring Holter-oximeter) based the REI on 1 ECG channel plus oximetry (C3O1x). The REI is calculated by a pattern recognition algorithm based on a combination of cyclic variations in heart rate associated with apnea, ECG-derived respiration, and SpO2. In this Level Ia study by Heneghan et al.,28 a lab/lab comparison was made using this device vs. PSG. The data are summarized in Table 9. The LR+ was adequate (∞ at REI/AHIns ≥ 5, 8.6 at REI/AHIns ≥ 10, 20.8 at REI/AHIns ≥ 15) from the 63 patients studied. Sensitivity was adequate for the REI/AHIns ≥ 5 criteria.

Table 11

There were 2 studies, 1 Level Ia (Reichert et al.)30 and 1 Level IVa (Claman et al., Level IV because blinding not stated),31 on the device Novasom QSG (C4O1xExR5 and C4O1xE4R5, respectively; also known as Bedbugg or Silent Night). This device reports REI using an algorithm based on sound measurements obtained from 2 microphones at the upper lip that record snoring intensity and ambient noise. The device also has a finger oximeter and an effort measure (1 identified as a pressure transducer; the other was not identified). For REI/AHIns ≥ 10, Claman et al. reported an LR+ of 14 (adequate) in a lab/lab study. Reichert et al. reported an LR+ of 10.6 in a lab/lab comparison and 5.4 in a home/lab comparison at REI/AHIns ≥ 15. Data were not reported for lower AHIs.

There were 2 studies on 2 versions of the SNAP (C4O1xExR5 and C4O1xR5). These devices are based on measures of acoustic oronasal airflow and analyzed using a proprietary algorithm to define REI. Su et al. (Level Ia study)32 used a version of the device that included 1 chest belt (type undefined), and reported an inadequate LR+ of 1.6 at REI/AHIns ≥ 5. Michaelson et al. (Level IIa study in a lab/lab comparison),33 who did not report the use of any effort channel, reported an adequate LR+ (7.1) at REI/AHIns ≥ 5.

Another device used sound plus oxygen desaturation as an alternative method for determining REI. In a Level Ia study on Remmers/Snoresat (O1xP2R5), Jobin et al.34 compared the device-determined REI results to 3 different REIs obtained from the Suzanne recorder (C4O1xP2E4R2) in a home/home study. Unfortunately, none of the compared definitions of REI were similar to that recommended in the scoring manual. Nonetheless, the reported LR+ at REI/AHIns ≥ 5 ranged from 3.68 to 4.42, values that are inadequate.

Finally, a Level Ia study by Westbrook et al.35 evaluated an earlier version of the ARES device (S3C4O2P2R5) that utilized quantitative acoustic scoring measures plus behavioral/autonomic arousal detection. The results were adequate for the lab/lab comparison (although only results at AHI ≥ 10 were reported) but inadequate in the home setting; it is acknowledged that achieving adequate results in the home setting is more challenging. The sensitivity results were adequate in both settings.

Summary

For this alternative measure, the data are insufficient to determine whether the use of acoustic signals with other signals as a substitute for airflow is adequate to determine REI. This is because of the varied literature base where (1) only 1 study reported adequate LR+ at REI/AHIns ≥ 5, but it was only a lab/lab study (Michaelson et al.)33; (2) the reported results were at AHI cutoffs that were too high (Reichert et al., Claman et al.)30,31 and only performed lab/lab (Claman et al., Westbrook et al.)31,35; and (3) the reported LR+ were too low at REI/AHIns ≥ 5 (Su et al., Jobin et al., Westbrook et al. H/L).32,34,35

Sensitivity was adequate for all studies in this category.

7.0 Conclusions

Due to the variety of OOC devices now available, the previously accepted method of categorization of these devices is unsuitable; therefore, a new classification system, SCOPER, was proposed in this paper. SCOPER will allow the easy classification of OOC devices based on the types of sensors that they use to aid in the diagnosis of OSA, including sleep, cardiac, oximetry, position, effort, and respiratory measures.
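As an illustration of how SCOPER labels such as S3C4O2P2E3R2 or C4O1xExR5 decompose into their categories, the following is a minimal parsing sketch. The function name and the token pattern are assumptions inferred from the labels that appear in this paper; the sketch keeps each level as a raw string (for example "1x" or "x") and does not attempt to interpret what a level means, since those definitions live in the paper's SCOPER table.

```python
import re

# One SCOPER token: a category letter followed by a level such as "3", "1x" or "x".
_TOKEN = re.compile(r"([SCOPER])(\d+x?|x)")

def parse_scoper(code: str) -> dict:
    """Split a SCOPER label into {category letter: raw level string}."""
    tokens = _TOKEN.findall(code)
    # Reject labels containing anything other than well-formed tokens.
    if not tokens or "".join(letter + level for letter, level in tokens) != code:
        raise ValueError(f"not a well-formed SCOPER label: {code!r}")
    letters = [letter for letter, _ in tokens]
    if len(set(letters)) != len(letters):
        raise ValueError(f"category repeated in SCOPER label: {code!r}")
    return dict(tokens)

print(parse_scoper("S3C4O2P2E3R2"))
print(parse_scoper("C4O1xExR5"))
```

Categories that a device does not measure are simply absent from the result, which mirrors how labels such as O1xP2R5 omit the S, C, and E components.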

The results of this technology evaluation are that the literature is currently inadequate to state with confidence that a thermistor alone without any effort sensor is adequate to diagnose OSA. If a thermistor is used as the only measure of respiration, 2 effort belts are required as part of the montage. It appears that piezoelectric belts are acceptable in this context. The data indicate that nasal pressure can be an adequate measurement of respiration without an effort measure; however, at this time, this may be device specific and further research is required before recommending broad usage. Nasal pressure may also be used when either 2 piezoelectric or RIP belts are used to measure effort (but not 1 piezoelectric belt). There is insufficient evidence to state that both nasal pressure and thermistor are required to adequately diagnose OSA. With respect to alternative devices for diagnosing OSA, the data indicate that peripheral arterial tonometry (PAT) devices are adequate for the proposed use; the device based on cardiac signals shows promise, but more study is required as it has not been tested in the home setting; for the device based on end-tidal CO2 (ETCO2), it appears to be adequate for a hospital population; and for devices utilizing acoustic signals, the data are insufficient to determine whether the use of acoustic signals with other signals as a substitute for airflow is adequate to diagnose OSA.

8.0 Future Directions

This paper is meant to be the first step in a comprehensive process to evaluate and subsequently make recommendations on how to use OOC testing devices in an outpatient population. It is anticipated that the next paper will address the important issues of determining pretest probability, interpreting study results, developing testing algorithms and treatment decisions.

In 2003, Flemons et al.36 published an evidence review on the home diagnosis of sleep apnea. In that paper, the authors nicely outlined the types of parameters that should be followed to properly assess OOC testing devices. Unfortunately, in the 8 years since that publication, many studies still lack the important information required to make useful comparisons. A more recent paper provides very specific details about performing research with these devices and should be referred to for more detail on this subject.37 For the evaluation of OOC testing devices, future studies would greatly benefit from the use of consistent outcome measures to facilitate direct comparisons and meta-analyses of studies. As described herein, at a minimum, LR+ at an REI/AHIs cutoff of 5 along with sensitivity are the desired outcome measures.
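To show why LR+ is a useful common outcome measure, the sketch below converts a pretest probability into a post-test probability using the standard odds form of Bayes' theorem. The function name and the pretest value of 0.5 are illustrative assumptions only; they are not the pretest or post-test probabilities chosen by the authors.

```python
import math

def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Post-test probability from pretest probability and a likelihood ratio."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if math.isinf(likelihood_ratio):
        return 1.0  # an unbounded LR+ drives the post-test probability to 1
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)

# Illustrative pretest probability of 0.5 with the minimum acceptable LR+ of 5.
print(round(post_test_probability(0.5, 5.0), 3))
```

Because the calculation depends only on the pretest probability and LR+, studies that report LR+ at a common cutoff can be compared directly, which is the motivation for the outcome measures requested above.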

Comparison to the gold standard in-lab PSG with respect to sensor selection, montage selection, and scoring methods would unify the data with respect to the device that is being compared. The recommended montage is listed in the scoring manual, and the companion paper defines the recommended method of scoring. Currently, the definition of AHI is based on the signal from an oximeter, hence the requirement for the presence of that sensor. Emerging technologies allowing respiratory events to be defined in novel ways will be welcomed innovations.

Systematically testing various sensors and combinations of sensors with OOC devices would help to answer the question of the minimum number and type of sensors that are required to acceptably diagnose patients with OSA in an unattended home setting. This was particularly noted in examining the effort signals in which there was often minimal or no data on the type of signal employed or whether 1 or 2 belts were utilized. In addition, the effect of measuring actual sleep time with various sensors including EEG, actigraphy, and other technologies versus recording time is an area to revisit in the future. This will allow for further refinement in the definition of REI as well.

In addition, following modern experimental designs to minimize bias (such as blinded scoring; prospective, randomized, controlled designs; assessing validated standard measures such as PSG on all patients; ensuring low data loss and high percentage of patients who complete studies; and fully describing the PSG and device sensors, montages, and scoring criteria) would enhance the level of confidence in the results. Funding from sources not invested in the results, or an explanation of the role that the funding source played in directing the study and/or interpreting the results, would help address any conflict-of-interest concerns. In addition, the use of experimental designs that simulate clinical use would be ideal, including the degree and detail of instruction given to patients. Obviously the sample size is important, and a description of the sample with information about eligibility, dropouts, missing data, and refusal to participate should be included.

Other important issues relate to the study population; most studies have concentrated on white males without comorbidities, although some studies are beginning to branch out to more diverse populations, which include more females, other ethnic groups, and patients with comorbid diseases such as heart failure. In fact, 4 devices reported in 5 studies were tested only on patients with heart failure.38–42 A related issue is the use of OOC devices to differentiate obstructive from central sleep apnea. This becomes more important when broader population groups are studied. Future iterations of the paper will address this in more detail as more information is available.

Another problematic issue in assessment of these studies includes the scoring of the OOC device. Many devices have proprietary algorithms which cannot necessarily be checked by the interpreting physician for accuracy. Again, clear discussion of the algorithm is mandatory, and the ability to review raw data is also required.

FOOTNOTE

These events must fulfill criterion 1 or 2, plus criterion 3 of the following:

(1) A clear decrease (> 50%) from baseline in the amplitude of a valid measure of breathing during sleep. Baseline is defined as the mean amplitude of stable breathing and oxygenation in the 2 minutes preceding onset of the event (in individuals who have a stable breathing pattern during sleep) or the mean amplitude of the 3 largest breaths in the 2 minutes preceding onset of the event (in individuals without a stable breathing pattern).

(2) A clear amplitude reduction of a validated measure of breathing during sleep that does not reach the above criterion but is associated with either an oxygen desaturation of > 3% or an arousal.

(3) The event lasts 10 seconds or longer.
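The logic of these three criteria can be sketched as follows. This is a minimal illustration, not a scoring algorithm: the class and function names are hypothetical, and whether a reduction is "clear" is left as a flag supplied by the scorer because the footnote does not give a numeric threshold for it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

@dataclass
class CandidateEvent:
    amplitude_drop_fraction: float  # decrease from baseline, e.g. 0.6 for 60%
    clear_reduction: bool           # scorer's judgment (assumption: not quantified here)
    desaturation_pct: float         # associated oxygen desaturation, in percent
    arousal: bool                   # associated arousal present
    duration_s: float               # event duration in seconds

def baseline_amplitude(breath_amplitudes: Sequence[float], stable_breathing: bool) -> float:
    """Baseline from breaths in the 2 minutes preceding event onset."""
    if stable_breathing:
        return mean(breath_amplitudes)
    # Unstable breathing: mean of the 3 largest breaths.
    return mean(sorted(breath_amplitudes, reverse=True)[:3])

def fulfills_event_criteria(event: CandidateEvent) -> bool:
    """Criterion 1 or 2, plus criterion 3."""
    criterion_1 = event.amplitude_drop_fraction > 0.50
    criterion_2 = (not criterion_1 and event.clear_reduction
                   and (event.desaturation_pct > 3.0 or event.arousal))
    criterion_3 = event.duration_s >= 10.0
    return (criterion_1 or criterion_2) and criterion_3

print(fulfills_event_criteria(CandidateEvent(0.3, True, 4.0, False, 12.0)))
```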

DISCLOSURE STATEMENT

This was not an industry supported study. Dr. Mehra has participated in paid speaking engagements. Dr. Tracy is an employee of the American Academy of Sleep Medicine. The other authors have indicated no financial conflicts of interest.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the contributions of the following members of the AASM staff: Christine Stepanski, M.S., for literature search contributions and help in coordinating the project; Kathleen McCann for help coordinating the review process; Sherene Thomas, Ph.D., for help with editing; Carolyn Winter-Rosenberg for help in coordinating the project; and Judy Coy, R.N., for help in formulating the project direction.

Sleep-related breathing disorders in adults: recommendations for syndrome definition and measurement techniques in clinical research. The Report of an American Academy of Sleep Medicine Task Force. Sleep. 1999;22:667–89.

7

Practice parameters for the use of portable recording in the assessment of obstructive sleep apnea. Standards of Practice Committee of the American Sleep Disorders Association. Sleep. 1994;17:372–7.

Flemons WW, Littner MR, Rowley JA, et al. Home diagnosis of sleep apnea: a systematic review of the literature. An evidence review cosponsored by the American Academy of Sleep Medicine, the American College of Chest Physicians, and the American Thoracic Society. Chest. 2003;124:1543–79.

Nakayama-Ashida Y, Takegami M, Chin K, et al. Sleep-disordered breathing in the usual lifestyle setting as detected with home monitoring in a population of working men in Japan. Sleep. 2008;31:419–25.

Appendix I: Methods

Search Strategy

The search was performed in 2 parts. The first was a general search using PubMed of MEDLINE for relevant, original, peer-reviewed literature in English in the last decade (back to January 2000) and is shown in Table A1. The second consisted of device-specific searches in PubMed since its inception using the device names as search terms.

Device names were obtained in 4 ways: (1) by pearling the first literature search results for device names; (2) from FDA 510(k) device listings in the FDA database; (3) by literature obtained by company representatives at the SLEEP conference in 2010; and (4) from the September 2009 Sleep Review Magazine Home Sleep Testing Comparison Guide.

Table A1

Inclusion and Exclusion Criteria

Only peer-reviewed English literature was included in the analyses; therefore, data available from manufacturers on their websites, in product brochures, or published in non-peer-reviewed magazines were excluded. Only devices measuring 2 or more bioparameters were included.

Grading

The grading scheme was a modified form of that used in the 2003 “Home Diagnosis of Sleep Apnea: A Systematic Review of the Literature.”36 The results of the evidence grading process are shown in Table A2.

Table A2

The presence or absence of three key indicators of quality dictated the assignment of evidence level based on an approach published by Sackett et al.43 The definitions of these evidence levels are listed below as follows:

Level I: blinded comparison, consecutive patients, reference standard performed on all patients;

Level II: blinded comparison, nonconsecutive patients, reference standard performed on all patients;

Table A3

The definitions of the three indicators used to assign level of evidence were as follows:

Blinded comparison: the portable monitor and polysomnogram were scored separately and without knowledge of the results of the other investigation; or the portable monitor study was scored automatically, and it was performed after the PSG was scored. If the investigators failed to mention whether or not the scorers were blinded, this criterion was deemed not to have been met.

Consecutive or random patients: the investigators did not participate in deciding what patients were included in the study. This criterion was met if patients were referred to a sleep clinic rather than a sleep laboratory (unless the investigators explicitly stated that they did not participate in selecting the patients referred to the laboratory). Either consecutively or randomly chosen patients were enrolled.

Reference standard was performed on all patients: all patients entered into the study must have undergone both a portable monitor test and a polysomnogram (either in-lab or with a comparison device, e.g., comprehensive portable PSG, depending on the aims of the study). If the results of one test influenced the decision to perform the other, then this criterion was deemed not to have been met.

Seven other aspects of a study's methodology were scored, and a quality rating was assigned based on the number of indicators for which the study met the criteria. Although the random assignment of testing was an important indicator, it was not applicable to studies that had studied a portable monitor simultaneously with polysomnography. Thus, in some circumstances studies were rated on 6 indicators rather than 7. The quality indicator (a to d) was based on the number of indicators for which that study did not meet the criteria, as follows:

(a) zero or one quality indicators not met;

(b) two quality indicators not met;

(c) three quality indicators not met;

(d) four or more quality indicators not met.
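The mapping from the number of unmet quality indicators to the quality letter, and its combination with the evidence level into grades such as "Level Ia" or "Level IIb", can be sketched as below. The function names are hypothetical; the thresholds are those listed above.

```python
def quality_rating(indicators_not_met: int) -> str:
    """Quality letter (a to d) from the number of unmet quality indicators."""
    if indicators_not_met < 0:
        raise ValueError("number of unmet indicators cannot be negative")
    if indicators_not_met <= 1:
        return "a"
    if indicators_not_met == 2:
        return "b"
    if indicators_not_met == 3:
        return "c"
    return "d"

def evidence_grade(evidence_level: str, indicators_not_met: int) -> str:
    """Combine an evidence level (e.g. "I", "II") with the quality letter."""
    return f"{evidence_level}{quality_rating(indicators_not_met)}"

print(evidence_grade("I", 0))   # Ia
print(evidence_grade("II", 2))  # IIb
```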

Table A4

The seven indicators and their definitions are listed below as follows:

Prospective recruitment of patients: the portable monitoring test and the polysomnogram were performed as patients were recruited into the study rather than reviewing a series of patients who had previously been studied.

Random order of testing: patients were assigned to undergo portable monitoring testing or polysomnography first at random rather than at the discretion of the investigators. If the portable monitoring study was performed simultaneously with the polysomnogram, this indicator was not rated.

Low data loss (< 10%): there were < 10% of patients whose results could not be compared because of the loss of polysomnography or portable monitoring data. This indicator allows for the repetition of studies to obtain acceptable results.

High percentage completed (> 90%): of the patients who were initially enrolled into the study (not counting a priori exclusions), > 90% completed the study protocol.

Polysomnography methodology/definitions fully described: the polysomnography methods must include the following:

characterization of the equipment used;

definitions and criteria of all types of breathing events scored and used in comparisons.

Portable monitor methodology/definitions fully described: the portable monitor methods must include the following:

characterization of the equipment used;

definitions and criteria of all types of breathing events scored and used in comparisons.

Portable monitor scoring fully described: includes a clear statement of whether manual or automated scoring was used, and, if automated, whether there was manual review/revision done.
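A minimal sketch of how the indicators above might be tallied for one study follows. The two numeric thresholds (< 10% data loss, > 90% completion) come from the definitions above; the function names, dictionary keys, and patient counts are hypothetical. An indicator that was not rated, such as random order of testing in a simultaneous recording, is represented as None and excluded from the count.

```python
from typing import Dict, Optional, Tuple

def low_data_loss(n_lost: int, n_total: int) -> bool:
    """Met when fewer than 10% of patients have results lost to data failure."""
    return n_lost / n_total < 0.10

def high_completion(n_completed: int, n_enrolled: int) -> bool:
    """Met when more than 90% of enrolled patients complete the protocol."""
    return n_completed / n_enrolled > 0.90

def count_unmet(indicators: Dict[str, Optional[bool]]) -> Tuple[int, int]:
    """Return (indicators not met, indicators rated); None means not rated."""
    rated = [met for met in indicators.values() if met is not None]
    unmet = sum(1 for met in rated if not met)
    return unmet, len(rated)

# Hypothetical simultaneous lab/lab study: random order of testing is not rated.
study = {
    "prospective_recruitment": True,
    "random_order": None,
    "low_data_loss": low_data_loss(3, 60),
    "high_completion": high_completion(75, 90),
    "psg_methods_described": True,
    "monitor_methods_described": True,
    "monitor_scoring_described": True,
}
print(count_unmet(study))  # (1, 6)
```

Here the study is rated on 6 indicators rather than 7, and the single unmet indicator is completion (75 of 90 enrolled patients is below the 90% threshold).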

Data Extraction

The strategy was to extract data that were succinct and complete, yet not overwhelming. The following data were extracted from the studies:

Table A5

Thermal Sensing Device

In a Level IVa study, BaHammam47 [S1C3O1xP2ExR3 (Alice 4)] reported on the failure rate of signals and sensors when using the same equipment attended in-lab versus unattended in a hospital. Hook-up was by a trained technologist and there was a pre-investigation into causes of failures of signals and modifications thereof. The failure rate for the different signals ranged from 0.128 min in electrocardiography (EKG) to 67.36 min in the thoracic belt signal. However, that did not affect the success rate of the studies. Acceptable scorable data were available in 97% of the performed unattended PSGs.

Quintana-Gallego et al.68 [S3C3O1xP2E4R3 (Apnoescreen II, effort not defined, but assumed to be the same as that reported for Apnoescreen II from Garcia-Diaz et al.9)] reported on a cohort of patients with stable congestive heart failure (CHF). Since this is a specific population with comorbidity, these data were not included in this evaluation. The paper was Level Ib, was conducted in the home and lab setting, and good LR+ results were obtained (7.2 at AHI ≥ 5 and 39.5 at AHI ≥ 10) on 90 enrolled patients (75 completed).

Nasal Pressure Devices

Bridevaux et al.49 reported in a Level IVb study on the interobserver agreement for a home study of a C4O1xP2E4R2 (Embletta) device (that used 2 piezoelectric effort belts). The ICC for AHI was reported to be 0.73.

Skomro et al.69 reported in a Level Ib study on the use of a C4O1xP2E1R2 (Embletta) device using home vs. lab management schemes. They reported that there were no significant differences in ESS, PSQI, SF-36, BP, or CPAP adherence when the subjects were diagnosed and prescribed CPAP treatment with the OOL device vs. in-lab PSG.

One study each reported on reliability [Levendowski et al.57 Level IVb home/lab reliability] and effort sensor validation [Popovic et al.66 Level IIa Lab/Lab data on effort sensors] of the ARES Unicorder (S3C4O2P2E3R2). Levendowski et al. reported that night-to-night variability in the home was actually 50% less than that in the lab. Popovic et al. reported in a small study (n = 14 completed) on the intrarater and interrater reliability of forehead venous pressure (FVP) as an alternative to esophageal manometry or chest or abdominal piezoelectric belts for the measurement of respiratory effort. With respect to interrater κ scores, the chest belt was superior to the other measures. For intrarater agreement versus the gold standard of esophageal manometry, the other 3 measures showed near perfect agreement. FVP was superior to either effort belt in the detection of obstructive apneas and hypopneas, similar in the detection of persistent flow limitation and physiological changes in ventilation, and inferior in the detection of central events.

Nakayama-Ashida et al.62 reported on the reliability of the Somté plus actigraphy (S3C4O1xP2E1R2) in a Level IIIb home study. The ICC was 0.98 for interscorer reliability and 0.95 for night-to-night variability.

Chung et al.51 reported, in a Level Ia lab/lab study on the validity of the Embletta X100 (S2C1O1xP2E1R2) versus attended in-lab PSG, that manual scoring was superior to automated scoring for reliability. In a second part of the study (Level IIIa), in a perioperative home or hospital unattended setting, 88.7% of the recordings were technically good, 9% technically acceptable, and 2.3% were failures.

Yin et al.73 used the C4O1xP2E2R2 (Stardust II) in a Level IIIa home study of automated vs. manual scoring. At AHI < 30, the maximum LR+ was 2.6; for AHI ≥ 40 the LR+ was 5.5, and for AHI ≥ 50, the LR+ was 13.3. The authors concluded that analyses should be done manually.

Smith et al.70 studied a population of patients with chronic heart failure with O1xP2E4R2 (Embletta) in a Level Ib study with both H/L and L/L comparisons on 20 patients. Two piezoelectric belts were used. LR+ was calculated as 2 (not clinically useful). Scoring depended on either a reduction in airflow or thoracic-abdominal movements. Since piezoelectric belts were used, this could explain the poor agreement with lab PSG. Alternative conclusions are that this population should not be studied out-of-lab or that nasal pressure without a thermal sensor is not recommended in this situation.

Thermal Sensing Device Plus Nasal Pressure Devices

Jurado-Gámez et al.56 reported in a Level Ib home/lab study on the performance of the Compumedics P-Series Screener (C4O1xP2E1R1). Fifty-two patients completed the study. No ROC curves were presented, but the AUCs were reported as 0.804 at AHI ≥ 10 and 1.00 at AHI ≥ 30. The diagnoses coincided in 88.4% of cases with an ICC of 0.963. Using a visual analog scale, patients preferred home testing over lab PSG testing, with a median score of 7 for lab PSG and 9 for home testing (p < 0.0001).