Introduction

Obstructive sleep apnea (OSA) is a heterogeneous disease of great importance for public health. Untreated, it is associated with numerous significant comorbidities, symptoms, and health consequences [1,2,3,4,5,6]. Studies indicate that OSA is an underdiagnosed disease with increasing prevalence and clinical and public health interest [7,8,9,10,11,12]. The morbidity of OSA depends strongly on disease severity, which is conventionally determined by the apnea–hypopnea index (AHI), the number of respiratory events per hour of sleep [13]. Nocturnal continuous positive airway pressure (CPAP) therapy is accepted as a primary treatment for moderate to severe OSA and has a positive effect on mortality [14]. Still, therapy initiation and follow-up need to be controlled. To meet the high demand for OSA testing under therapy, cost-effective and reliable systems are needed.

While the current American Academy of Sleep Medicine (AASM) recommendations suggest polysomnography (PSG) as the diagnostic test to assess the grade of severity [13], they also consider therapy initiation using autotitrating PAP (APAP) at home and telemonitoring-guided interventions during the initial period of PAP therapy [15]. A recent meta-analysis demonstrated equivalent effects on patient outcomes and no difference in residual OSA severity after the initiation of PAP at home compared to an in-laboratory titration. However, few studies have compared the device AHI with a full polysomnography [16, 17].

The data provided by the devices can be used as a reference for therapy adherence (usage time) and effectiveness (residual AHI), but the devices can also be adjusted and optimized remotely, using their built-in systems. Moreover, sleep-disordered breathing (SDB) is a chronic disease that requires regular checkups. Some countries already use the transmitted data to control CPAP therapy adherence. Yet, the health care and data protection regulations require clear responsibility management [18]. Reliability of the data and its use for patient outcome is key, especially when the telemedicine approaches escalate to further ventilation therapies with more advanced devices for home non-invasive mechanical ventilation (NIV) in other respiratory diseases [19].

The current PAP devices detect and record usage time, leakages, and residual respiratory events including flow limitations (FL), hypopneas, and apneas. Integrated sensors continuously monitor pressure changes, airflow changes, and vibrations during therapy to identify any residual SDB. When operating in automatic PAP mode, internal algorithms evaluate the signals to adjust the pressure as necessary. While the more accurate sleep/wake detection in PSG allows the number of events to be referenced to the actual sleep time (total sleep time, TST), the PAP devices usually record and store the data throughout the entire time of operation (total recording time, TRT). Some devices might also register and remove artifacts (e.g., excessive air leakages) or wake periods. To address these distinctions, a change in terminology for PAP-only recorded residual respiratory disorders to AHIFlow has already been suggested by the American Thoracic Society (ATS) [20].
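
The difference between the two reference durations can be illustrated with a minimal Python sketch. The function name and the numbers are hypothetical and serve only as an illustration; the same event count yields a lower index when it is referenced to TRT instead of TST.

```python
def events_per_hour(event_count: int, duration_minutes: float) -> float:
    """Return an event index (events per hour) for a given reference duration."""
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    return event_count * 60.0 / duration_minutes


# Hypothetical night: 42 residual events, 480 min of recording, 400 min of sleep.
n_events = 42
ahi_psg = events_per_hour(n_events, 400.0)   # referenced to total sleep time (TST)
ahi_flow = events_per_hour(n_events, 480.0)  # referenced to total recording time (TRT)

print(f"AHI (TST): {ahi_psg:.2f}/h, AHIFlow (TRT): {ahi_flow:.2f}/h")
```

In practice the event counts themselves may also differ between PSG and device, so this sketch isolates only the effect of the denominator.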

Since the devices of the different manufacturers present unique algorithms to detect residual SDB, it is important to determine the reliability of individual therapy devices and detection techniques against the gold standard PSG before important therapeutic decisions with concrete consequences for individual patients can be drawn from the device-based analyses in practice. The aim of this monocentric prospective study was to compare the recordings and analyses of the current generation of prismaLINE devices (Loewenstein Medical Technology, Germany) to a manual evaluation of a PSG in the sleep laboratory.

Methods

Inclusion criteria were age ≥ 18 years, moderate to severe sleep apnea syndrome (defined as AHI ≥ 15/h) recently diagnosed by PSG, indication for CPAP therapy, and use of a nasal mask. Exclusion criteria were a contraindication to CPAP and participation in another study that influences the setting of CPAP therapy by specifications regarding device setting or deviating titration needs. Patients needing an oronasal mask (OM) were excluded to ensure comparability. An acceptable study was one with a recorded TST > 4 h and < 10% of excessive leakage. During critically high leakage, there is a limited device response and possibly therapy ineffectiveness. From a leakage value of ≥ 50 L per minute, the “High leakage” event is recorded in the therapy device.

The Ethics Committee of Witten-Herdecke University, Germany approved the study under number 192/2018. The trial was retrospectively registered under ClinicalTrials.gov Identifier: NCT04407949. Preliminary results were previously presented at the European Respiratory Society Congress in 2020 [21]. All patients provided consent to participate in the study after receiving oral and written information about the study.

Anthropometric baseline data were recorded at hospital admission. The cardiorespiratory PSG nights were performed in the certified sleep laboratory under the supervision of a sleep technician. The study-relevant therapy night with CPAP therapy under PSG took place immediately after the diagnosis night. The pressure titration procedure was predetermined for the study (see Fig. 1). Pressure started at 4 cm H2O, a subtherapeutic phase that allowed more residual events, and was increased every hour after sleep onset up to a pressure of 13 cm H2O, when sleep duration allowed this procedure.

Fig. 1 Study flow chart

Manual event detection (polysomnography)

The approved AASM guidelines version 2.4 [13] were used to define sleep and respiratory events for both diagnostic and therapeutic PSG recordings. The relevant flow signal was the nasal pressure signal in the diagnostic night, and the PAP device flow signal, which was fed into the PSG recording, in the titration study, as per recommendations of the AASM [13].

Recommended: According to the AASM guidelines, respiratory events during sleep were considered apneas if the peak airflow decreased by ≥ 90% for at least 10 s, and hypopneas if the flow signal decreased by ≥ 30% of the baseline flow for at least 10 s in association with a ≥ 3% oxygen desaturation and/or an arousal (hypopnea criterion 1A).

Acceptable: In a secondary analysis, we scored respiratory events as hypopneas according to the acceptable AASM criterion 1B, when the flow signal decreased by ≥ 30% for at least 10 s, and this event was associated with a ≥ 4% oxygen desaturation from the pre-event baseline values.
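
The two hypopnea rules can be written as simple predicates. The following Python sketch is an illustration only: the `CandidateEvent` structure and its field names are hypothetical, and treating apneas as excluded from the hypopnea rules is an assumption made here for clarity. It shows why the same recording can yield a lower AHI under criterion 1B than under 1A.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    """Hypothetical summary of one manually marked flow reduction."""
    flow_reduction_pct: float  # drop in flow relative to pre-event baseline
    duration_s: float          # duration of the flow reduction
    desaturation_pct: float    # oxygen desaturation from pre-event baseline
    arousal: bool              # event associated with an arousal


def is_apnea(ev: CandidateEvent) -> bool:
    return ev.flow_reduction_pct >= 90 and ev.duration_s >= 10


def _is_flow_reduction(ev: CandidateEvent) -> bool:
    # Assumption: events already meeting the apnea rule are not counted as hypopneas.
    return (not is_apnea(ev)) and ev.flow_reduction_pct >= 30 and ev.duration_s >= 10


def is_hypopnea_1a(ev: CandidateEvent) -> bool:
    """Criterion 1A: >= 3% desaturation and/or an arousal."""
    return _is_flow_reduction(ev) and (ev.desaturation_pct >= 3 or ev.arousal)


def is_hypopnea_1b(ev: CandidateEvent) -> bool:
    """Criterion 1B: >= 4% desaturation; arousals are not considered."""
    return _is_flow_reduction(ev) and ev.desaturation_pct >= 4


# Hypothetical event: 40% flow reduction for 12 s with a 3% desaturation.
event = CandidateEvent(40.0, 12.0, 3.0, arousal=False)
print(is_hypopnea_1a(event), is_hypopnea_1b(event))
```

An event of this kind counts toward the AHI under 1A but not under 1B.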

Automatic event detection (therapy device)

The prismaLINE type WM100TD CPAP device from Loewenstein Medical Technology is a CE-approved Class IIa medical device for the positive pressure treatment of sleep-related breathing disorders (PAP therapy). All devices used were of the same series and technically equivalent. The algorithm of the devices identifies residual AHI-relevant respiratory events (obstructive and central apneas and hypopneas) and other less severe events such as RERAflow (respiratory effort related arousal detected by the device), flow limitations, and snoring, which are also used to control the auto-CPAP or auto-EPAP function [22]. The devices recognize a respiratory event as apnea if there is a respiratory flow interruption of at least 90% for at least 10 s.

Technique: An apnea is tested for possible upper airway obstruction by a 5 Hz FOT (forced oscillation technique) signal. If the oscillation is found mainly in the pressure signal, the current apnea is assessed as obstructive apnea. If the device detects the oscillation in the flow signal, the current apnea is categorized as central apnea. Hypopneas, on the other hand, are categorized as such when the respiratory flow curve is reduced by at least 50% for at least 10 s and two breaths. The current device class also differentiates hypopneas into central and obstructive. Mild obstructions during spontaneous breathing are detected by the degree of “flattening” in the inspiratory flow curve or by snoring measurement deduced from the flow curve. If such an obstruction is detected during a hypopnea, the hypopnea is classified as obstructive, otherwise as central.
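
The decision rules described above can be restated as a small Python sketch. This is not the manufacturer's algorithm, which is proprietary; the function name and all parameters are hypothetical, and the signal processing that would produce the inputs (FOT analysis, flattening and snoring detection) is assumed to have happened elsewhere.

```python
def classify_device_event(
    flow_reduction_pct: float,
    duration_s: float,
    breaths: int,
    fot_mainly_in_pressure: bool,
    flattening_or_snoring: bool,
) -> str:
    """Restate the textual device rules as a decision tree (illustrative only)."""
    # Apnea: flow interruption of at least 90% for at least 10 s.
    if flow_reduction_pct >= 90 and duration_s >= 10:
        # 5 Hz oscillation mainly in the pressure signal suggests a closed airway.
        return "obstructive apnea" if fot_mainly_in_pressure else "central apnea"
    # Hypopnea: flow reduction of at least 50% for at least 10 s and two breaths.
    if flow_reduction_pct >= 50 and duration_s >= 10 and breaths >= 2:
        # Flattening or snoring during the event marks it as obstructive.
        return "obstructive hypopnea" if flattening_or_snoring else "central hypopnea"
    return "none"


print(classify_device_event(95, 12, 0, True, False))
print(classify_device_event(60, 15, 3, False, False))
```

Note that the device threshold for hypopneas (≥ 50% flow reduction, no oximetry or arousal information) differs from the manual AASM rules, which is one reason why device and PSG indices need not coincide.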

Data comparison and statistical considerations

The evaluated PSG data as well as the recorded raw data of the devices were anonymized and evaluated separately in order to perform crosschecks. The scorers were blinded to the CPAP evaluation. In order to achieve a sufficiently large number of relevant residual respiratory events in total, 50 evaluable recordings were deemed necessary. This sample size made an evaluation in accordance with the methods of descriptive statistics reasonable.

The respiratory event indices detected by the devices were compared to the manually scored PSG event indices over the entire recording time of the device (AHIFlow), similar to daily routine in the sleep lab. For this purpose, the data set, consisting of the scored PSG file and the device file, was synchronized by means of a cross-correlation function. Periods without data overlap, i.e., when only one of the two devices recorded data, were cut out, and the events occurring in these periods were not considered. This applied to either the beginning or the end of the recordings, in case one of the recorders (device or PSG) ran longer or shorter than the other.
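
A minimal sketch of such a synchronization step is given below in plain Python. It is an assumption about how the procedure could look, not the implementation used in the study: the function names, the brute-force lag search, the mean removal, and the toy signals are all hypothetical.

```python
from statistics import mean


def best_lag(reference, other, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A lag of k means that other[i] is aligned with reference[i + k].
    """
    ref_mean, other_mean = mean(reference), mean(other)

    def score(lag):
        total = 0.0
        for i, value in enumerate(other):
            j = i + lag
            if 0 <= j < len(reference):
                total += (value - other_mean) * (reference[j] - ref_mean)
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def overlap_window(lag, len_reference, len_other):
    """Return (start, end) in reference samples covered by both recordings."""
    return max(0, lag), min(len_reference, lag + len_other)


def events_in_overlap(event_onsets, start, end):
    """Keep only events whose onset lies inside the common window."""
    return [t for t in event_onsets if start <= t < end]


# Toy flow signals: the device started 2 samples after the PSG and ran longer.
psg_flow = [0, 0, 0, 1, 3, 1, 0, 0, 2, 5, 2, 0]
device_flow = [0, 1, 3, 1, 0, 0, 2, 5, 2, 0, 0, 0, 0]

lag = best_lag(psg_flow, device_flow, max_lag=5)
start, end = overlap_window(lag, len(psg_flow), len(device_flow))
kept = events_in_overlap([1, 4, 11, 13], start, end)
print(lag, (start, end), kept)
```

Events with an onset outside the common window (here the onsets 1 and 13) are dropped, which mirrors the exclusion of non-overlapping periods described above.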

Data are presented as mean ± standard error for continuous variables or as numbers and percentages. The Bland–Altman plot was chosen as the statistical method to visualize the deviation and agreement. The specificity, sensitivity, and positive and negative predictive values for the given device limits were calculated and plotted in receiver operating characteristic (ROC) curves.
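
The quantities behind a Bland–Altman plot can be computed as in the following sketch. The function name and the example values are hypothetical, and the use of ± 1.96 standard deviations for the limits of agreement is the conventional choice assumed here; the study does not state which limits were used.

```python
from statistics import mean, stdev


def bland_altman(manual, device):
    """Return bias, lower and upper limits of agreement, and the plot coordinates.

    Differences are manual minus device; they are plotted against pairwise means.
    """
    if len(manual) != len(device):
        raise ValueError("both series must have the same length")
    diffs = [m - d for m, d in zip(manual, device)]
    means = [(m + d) / 2 for m, d in zip(manual, device)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # assumption: conventional 95% limits
    return bias, bias - spread, bias + spread, means, diffs


# Hypothetical AHI values (events/h) for four recordings.
psg_values = [4.0, 12.0, 7.0, 20.0]
device_values = [3.0, 10.0, 8.0, 17.0]

bias, lower, upper, x_values, y_values = bland_altman(psg_values, device_values)
print(f"bias = {bias:.2f}/h, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

A positive bias in this convention means that the manual scoring yields higher indices than the device.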

Results

Seventy patients were recruited for the study. Twenty (29%) recordings were deemed unacceptable for final analysis (8 technical PSG issues with major artifacts in flow or neurologic leads, 6 human errors (wrong device or titration), 5 patients did not meet the required minimal TST, and 1 presented unsolvable mask problems with excessive leakage). Fifty patients (female: 13; age: 55.1 ± 11.9 years; body mass index: 35.5 ± 7.4 kg/m²) were included in the final analysis. Baseline AHI was 53.3 ± 24.2. Three patients had a history of coronary heart disease, and 10 patients showed a baseline central AI > 5. Mean baseline AI was 3.4 ± 5.3 (number of central apneas: 19.8 ± 30.9).

We obtained a mean manually evaluated PSG AHI of 10.5 ± 13.8/h according to criterion 1A and 7.4 ± 12.6/h according to criterion 1B compared to a mean device AHIFlow of 8.4 ± 10.0/h (p = 0.004). All results and correlations are presented in Table 1. The positive and negative predictive values for an AHI > 10, scored according to criterion 1A, were 0.909 and 0.795, respectively.

Table 1 Comparison of visual AHI scoring versus AHI indicated by CPAP therapy device

The Bland–Altman plots visualize the difference between the total AHI-PSG according to criterion 1A and 1B, respectively, and the device AHIFlow (manually assessed events minus events detected by the device), plotted against their respective average values (see Figs. 2 and 3). Overall, there is little variation in the deviation. The ROC curves describe the quality of the test procedure by plotting sensitivity (true positive rate) against the false positive rate (1 − specificity). In our case, the PSG was the standard against which the device AHI (AHIFlow) was measured (see Fig. 4). The two cut-off values of AHI (AHI5 and AHI10) separate the states “still sick” and “healthy.” A pair of sensitivity and specificity values was calculated for each value in the list of device AHIFlow values and plotted as a curve. The area under the curve (AUC) can assume values of 0.5–1, where 0.5 corresponds to the angle bisector (chance level). Sensitivity and specificity results are presented in Table 2.
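
The construction of such a ROC curve and the derived measures can be sketched as follows. All names and values are hypothetical; the sketch assumes that a PSG AHI above the cut-off defines the reference state and uses the trapezoidal rule for the area under the curve, which is one common choice and not necessarily the one used in the study.

```python
def roc_points(psg_ahi, device_ahi, cutoff):
    """Return (false positive rate, sensitivity) pairs over all device thresholds."""
    truth = [value > cutoff for value in psg_ahi]  # reference state from PSG
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both reference states must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(device_ahi), reverse=True):
        flagged = [value >= threshold for value in device_ahi]
        tp = sum(t and f for t, f in zip(truth, flagged))
        fp = sum((not t) and f for t, f in zip(truth, flagged))
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points ordered by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def metrics_at_cutoff(psg_ahi, device_ahi, cutoff):
    """Sensitivity, specificity, PPV, and NPV when both indices use one cut-off."""
    pairs = [(p > cutoff, d > cutoff) for p, d in zip(psg_ahi, device_ahi)]
    tp = sum(t and f for t, f in pairs)
    fp = sum((not t) and f for t, f in pairs)
    fn = sum(t and (not f) for t, f in pairs)
    tn = sum((not t) and (not f) for t, f in pairs)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}


# Hypothetical residual indices (events/h) for six recordings.
psg = [2.0, 4.0, 6.0, 12.0, 15.0, 30.0]
device = [1.0, 5.0, 4.0, 9.0, 14.0, 25.0]

auc = area_under_curve(roc_points(psg, device, cutoff=5))
result = metrics_at_cutoff(psg, device, cutoff=5)
print(auc, result)
```

The curve starts at (0, 0) and ends at (1, 1); the closer it runs to the upper left corner, the larger the AUC.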

Fig. 2 Bland–Altman analysis PSG 1A vs AHIFlow

Fig. 3 Bland–Altman analysis PSG 1B vs AHIFlow

Fig. 4 ROC analysis for AHIFlow cut-off values of 10 and 5

Table 2 Specificity and sensitivity values for AHI > 5 and AHI > 10

Discussion

The data collected in this study demonstrated a high correlation in the determination of the residual apnea and hypopnea indices between the CPAP therapy device and a simultaneous PSG in patients with moderate to severe obstructive sleep apnea. To our knowledge, this is the first study that also examines the influence of AASM criteria 1A and 1B. Compared to the evaluation according to AASM criterion 1A, the device results were only slightly lower, and compared to the evaluation according to 1B, slightly higher. The relevant flow signal for the therapy night was the CPAP flow signal, as per current AASM recommendations [14]. A PSG-specific flow signal could have led to other results. In 2012, the AASM recommended routine use of the device flow, but some subsequent studies found different results when using a PSG flow signal versus the device flow signal of an APAP machine during titration [23, 24]. The discrepancies were found in the absolute agreement of the respiratory event classification in spite of good diagnostic accuracy concerning the AHI results. Although these results relate to older devices of another manufacturer and older PSG systems, it is nevertheless reasonable to assume that close clinical follow-up is necessary for patients with complex SDB (e.g., high central apnea or high arousal), or if response to treatment is poor.

Other studies on PAP devices that compared the respiratory event detection found the strongest correlations in the detection of obstructive events [25, 26]. In our study, we also found only minor deviations concerning central events, despite the fact that the PAP devices’ evaluation of residual SDB must rely on flow/pressure analysis alone. While only a limited number of central events were available for comparison in this predominantly obstructive sleep apnea group, this indicates sufficient comparability for the standard patient with OSA. A PSG recording, on the other hand, provides further bio-signals such as respiratory effort (thorax and abdomen belts), pulse oximetry, electrocardiogram (ECG), brain waves by electroencephalogram (EEG), eye movements by electrooculogram (EOG), and muscle tone by electromyogram (EMG). When analyzing a PSG, the recording of respiratory effort helps to distinguish apnea types, while the neurologic leads allow a differentiation of wakefulness and sleep and an evaluation of different sleep stages.

There are only a few studies that have compared respiratory event detection algorithms from PAP devices with manually assessed events obtained from PSG. Ueno et al. and Berry compared a PAP device with PSG and found higher AHI values at lower pressure levels compared with the PSG and lower AHI values at higher pressure levels. In summary, an AHI < 10/h of the studied PAP device was highly predictive, but an AHI > 10/h was moderately predictive. The event-by-event analysis showed that the automatic event detection had a high specificity but only modest sensitivity (high number of false negatives). The correlation of hypopneas, however, was rather low. The software was also unable to distinguish central apneas [25, 27].

In our study, the CPAP device showed the highest sensitivity for a threshold level for the AHI between 5 and 10/h. When comparing the indices of the PSG 1A and 1B evaluations, a lower sensitivity resulted (see Table 2). A less sensitive scoring rule might have a negative impact when events remain undetected and residual symptoms unexplained. On the other hand, the more sensitive rule may lead to the prescription of a higher therapy pressure with an effect on adherence. A device AHI that lies between these two might actually be favorable. The lowest correlations were found with regard to central hypopneas, although there were only a few events of that type to be compared in our patient population. Future studies could focus on patients with more complex and central sleep apnea.

Previous studies showed a high percentage of patients with presumed good PAP therapy success, but still considerable residual respiratory events [28, 29]. Reliable data with regard to the residual event indices are of clinical importance, but patient-reported daytime sleepiness and relevant comorbidities must be included when assessing clinical outcomes [30]. Controlled PAP recordings may not only provide an explanation for possible residual symptoms but may also contribute to the decision to readmit the patient to a sleep center for a new titration.

The COVID-19 pandemic posed a challenge to sleep laboratories and patients with OSA worldwide. On the one hand, the necessary social distancing measures and the fear of contamination kept patients with non-acute medical conditions from going to the sleep lab, and on the other hand, the need to reschedule ward capacities with temporary and possibly longer-term closures of sleep lab units dramatically reduced sleep medicine services. Regional surveys as well as international studies researched the situation during the pandemic [31]. In Europe, in-lab PSG was reduced from 93 to 20%, and 72% of sleep centers stopped in-lab PAP titration. A third of the European sleep centers started to implement telemedicine services [32]. In the USA and Canada, in-lab testing was reduced by 90% [33], as it was in China, where the sleep medicine services recovered to 50% from baseline after the first wave [34].

With certainty, there are numerous aspects of the diagnostic process of OSA that require polysomnographic surveillance. The association of respiratory events, body position, sleep stages, or the extent of nocturnal hypoxia is useful in assessing the disease severity and the underlying physiological causes. A possible strategy to improve the detection sensitivity of a CPAP device could be the integration of oximetry, most preferably in a wireless setting. While the prevalence of SDB increases and sleep center capacities are being reduced, new methods for the diagnosis and treatment control are developing. Cost efficient new technologies are demanded, and telemedicine approaches are implemented, which allow diagnosis and surveillance with minimal patient interface [35].

The high correlation of the respiratory event detection between the tested CPAP device and the information from all bio-signals of the PSG in our study indicates that analysis of respiration alone may be sufficient for risk assessment under CPAP therapy. However, patient-reported side effects, e.g., via questionnaires, will also play a major role when assessing therapy efficacy and effectiveness by telemedicine. In particular, the opportunity to record and monitor the residual events with adequate precision over extended periods of time promises to have a positive effect on CPAP-treated patients. Given the night-by-night variability of AHI in a PSG [36], a long-term serial measurement of AHIFlow with a reliable PAP device may actually be favorable.

Limitations

Our study population consisted of predominantly male, obese patients with moderate to severe disease and reflects a typical population with OSA. Therefore, there is at most a small selection bias. The recordings took place under optimal laboratory conditions. The findings must therefore be translated to the outpatient setting with caution. The study was conducted using the device of Loewenstein Medical Technology with its proprietary technology, and the results do not necessarily apply to devices from other companies. This could be the subject of further studies. Patients with high leakage (> 10% of excessive leakage of > 40 L/min) were excluded, as were patients with oronasal masks (OM). The mask choice makes results more comparable, and generally, nasal masks should always be the first choice and are the most used interfaces. Yet, 28% of the patients of a recent large study on mask side effects were fitted with an OM [37]. Patients titrated with oronasal masks require higher pressure levels and present higher leak and higher residual AHI when compared to nasal masks [38]. It remains unclear to what extent the results would have been influenced by a subgroup with OM and presumably higher leakages. We chose to exclude patients needing OM in this study because the presumed subgroup would have been too small to present reliable statistics. It would be reasonable to study this patient group separately.

Our study design did not allow for evaluation at pressure levels higher than 13 cm H2O; however, such pressures are uncommonly applied. Further studies are needed to verify the diagnostic agreement in settings where higher pressures are required. Whether and to what extent comfort features like pressure relief, an oronasal mask, or even the frequently used humidifier may affect event detection was not part of this research.

Conclusion

The results of this study demonstrated a high correlation in the determination of residual sleep apnea derived from a CPAP device compared with a polysomnography in a typical population with OSA. The reliable information about residual events, along with leakage and usage time, may be available for telemedicine applications. However, symptom health assessment should always be included in therapy decisions. The agreement of PSG findings and device findings should be investigated separately for different CPAP devices and the different algorithms used by manufacturers.