Research
Open access
Published: 05 November 2024

ECG signal reconstruction from PPG using a hybrid attention-based deep learning network

Ahmed Ezzat ORCID: orcid.org/0009-0002-2590-851X^1,2,
Osama A. Omer¹,
Usama S. Mohamed^3,4 &
…
Ahmed S. Mubarak¹

EURASIP Journal on Advances in Signal Processing volume 2024, Article number: 95 (2024)

Abstract

Electrocardiography (ECG) and photoplethysmography (PPG) are non-invasive methods for measuring signals that originate from the cardiovascular system. Although the cycles of the two signals are strongly correlated, the relationship between their waveforms has received little attention in research. PPG is significantly simpler and more convenient to measure than ECG. Recent research has demonstrated that PPG signals can be used to reconstruct ECG signals, suggesting that healthcare professionals could gain a comprehensive view of a patient's cardiovascular health by measuring PPG alone. Deep learning models have made significant progress in reconstructing ECG signals from PPG signals, but the quality of the reconstructed ECG signal still needs improvement. To improve it, we propose a hybrid attention-based bidirectional recurrent neural network (Bi-RNN) that incorporates a dilated convolutional neural network (DCNN) for reconstructing ECG signals from PPG signals. This design addresses two problems of standard DCNN models: they neglect contextual relationships and suffer from gradient dispersion. The proposed approach exploits both the DCNN and the Bi-RNN unit to provide fused features. To make the model robust to deformation, spatial features are first extracted with convolutional neural networks (CNNs). Once the spatial features have been obtained, a BiLSTM extracts temporal features. The proposed model is called hybrid attention-based CNN and BiLSTM (HA-CNN-BiLSTM). BiLSTM mitigates vanishing and exploding gradients while maintaining accuracy. To show the advantages of the proposed model, we compare HA-CNN-BiLSTM with several state-of-the-art methods. The simulation results indicate that the proposed method yields a lower root mean square error (RMSE) when reconstructing the ECG signal across different optimizers. The lowest RMSE, 0.031, is obtained with the SGDM optimizer.

1 Introduction

According to a survey conducted by the World Health Organization in November 2021 [1], cardiovascular disorders accounted for 31% of global deaths in 2019. Cardiovascular illnesses continue to be a significant contributor to global mortality [2]. Currently, the most widespread method for detecting cardiac illness is to analyze heart activity using electrocardiography (ECG) [3]. ECG is a technique that captures the electrical signals generated by the heart on the surface of the body using specialized equipment [4]. ECG signals provide essential data about an individual's vital signs and offer insights into the condition of the heart and cardiovascular system. The increasing popularity of wearable medical diagnostic devices has made long-term ECG measurement more accessible and feasible. The current standard duration for ambulatory ECG examinations is 24 h, 48 h, or 72 h [5].

Cardiovascular diseases (CVDs) are a group of conditions that affect the heart and blood vessels. They include coronary heart disease, cerebrovascular disease, peripheral arterial disease, rheumatic heart disease, congenital heart disease, deep vein thrombosis, and pulmonary embolism, as well as acute events such as heart attacks and strokes.

The necessity for accurate and non-invasive monitoring:

Early detection of CVDs is crucial to initiate timely interventions.
Non-invasive monitoring methods, such as blood pressure measurement, ECG, and echocardiography, allow for early identification of risk factors and abnormalities.
Regular monitoring enables healthcare professionals to tailor preventive strategies and manage CVDs effectively.

Electrocardiography (ECG) is a gold standard for cardiovascular diagnostics [6]. An ECG beat has five characteristic waves (P, Q, R, S, and T) that correspond to different phases of the heart's electrical activity. ECG can assess heart rhythm and impulse strength, helping to diagnose cardiac conditions such as hypertrophy, congenital heart abnormalities, and inflammation. ECG monitoring is important for the early diagnosis of cardiovascular diseases (CVDs), which improves survival chances, especially in high-risk or elderly patients. However, the most common ECG-based wearables for daily tracking may restrict users' activities, which makes continuous heart monitoring difficult. More importantly, short-term cardiac monitoring may miss important asymptomatic and irregular events.

ECG abnormalities can take the form of arrhythmias, ischemia, and conduction disorders. Arrhythmias range from harmless to life-threatening, while ischemia is a reduction in blood flow to the heart muscle that often causes chest pain.

On the other hand, photoplethysmography (PPG) data are simple to acquire, but their clinical application is restricted by limited accuracy. PPG uses an optical signal to detect changes in blood volume in the microvascular bed of the tissue [7]. It is a more convenient, cost-effective, and less intrusive alternative to ECG, and because it requires no user interaction, it facilitates long-term continuous cardiac monitoring. The quality of the PPG signal varies significantly with factors unrelated to cardiac activity, such as skin color and motion artifacts. Subsequent investigation has shown that PPG is a highly effective non-invasive digital marker for CVD and associated conditions such as diabetes [8]. A convolutional neural network has been used to assess the utility of PPG [9], with two significant findings: first, PPG contains clinically valuable information, and second, deep learning techniques can enhance its applicability even in the presence of noise.

PPG and ECG are physiologically interconnected, as they represent the same cardiac activity sensed in different signal domains [10]. The sinoatrial node regulates the cardiac electrical signals that cause the heart muscles to contract and relax, which in turn drives the variation in peripheral blood volume measured by PPG. This research is primarily motivated by this inherent association between PPG and ECG. As a result, a new digital biomarker, PPG with reconstructed ECG, has been developed. This biomarker combines the best features of both signals, allowing easy PPG monitoring and precise ECG analysis, and it enables an efficient, uninterrupted cardiac monitoring system that reconstructs the ECG from the PPG. Recent studies have used diverse deep learning techniques to combine the benefits of both signals. However, the performance of these models is ultimately limited by the absence of contextual information and a restricted ability to remove noise from biomedical data.

Existing research has demonstrated a strong correlation between features extracted from PPG data and the corresponding metrics obtained from ECG data. However, PPG has limitations, including inaccuracies in ECG estimation caused by skin tone variations, diverse skin types, motion artifacts, and signal interference. ECG waveforms play a crucial role in assessing cardiac function: P-wave patterns indicate sinus rhythm, and a prolonged PR interval suggests first-degree heart block. Consequently, cardiologists rely on ECG for comprehensive cardiac evaluations.

The proposed hybrid architecture combines the strengths of CNNs for feature extraction, BiLSTMs for temporal context, and attention mechanisms for selective focus and importance weighting in ECG signal reconstruction.

The primary objective of this research is to establish a systematic approach for reconstructing the complete ECG waveform based on the PPG waveform. This methodology seeks to enhance patient monitoring comprehensively, ensuring the acquisition of essential medical data while minimizing potential inaccuracies inherent in ECG measurement devices.

The contributions of the proposed ECG monitoring approach are as follows:

Our approach diverges from the methodology of the previous study [15]: instead of estimating ECG data on a signal-wide basis, we adopt a beat-by-beat approach.
Unlike the prior study in [14], our approach uses a hybrid attention-based CNN and BiLSTM model for ECG estimation.
We present a dilated CNN as an improvement over the traditional ConvBiLSTM described in [15]. The dilated CNN offers reduced memory usage, increased computational efficiency, and improved robustness, and it achieves wider coverage without additional computational cost.
We introduce an attention mechanism that combines the dilated CNN and BiLSTM to identify the most relevant features from both networks, which improves the performance of ECG signal reconstruction.
In contrast with the approach in [16], where a hybrid attention mechanism was employed for ECG classification, our method uses a hierarchical attention-based dual-structured RNN combined with a dilated CNN for ECG signal reconstruction. We also compare the complexity of the two systems.
We perform a comparative analysis with an existing hybrid deep neural network (DNN), the hybrid attention-based deep learning network of [17], which combines a residual network (ResNet) and a Bi-LSTM architecture with an attention mechanism, to show the advantage of the proposed system. We also compare the complexity of the two systems.

The rest of the paper is organized as follows: Sect. 2 discusses the related works, Sect. 3 presents the proposed HA-CNN-BiLSTM for ECG signal reconstruction, Sect. 4 introduces the experimental results, and finally, the paper is concluded in Sect. 5.

2 Related work

Several studies have attempted to estimate electrocardiogram (ECG) signals from photoplethysmogram (PPG) signals using diverse approaches. Research on reconstructing ECG signals falls into two main categories: the first generates synthetic ECG using signal processing or mathematical models, while the second uses machine learning techniques to synthesize ECG signals from PPG data through PPG-to-ECG translation. Two studies used the DCT approach [18] and collaborative dictionary learning across domains [19] to estimate ECG signals on a sequential basis. These techniques synchronize the start of the PPG signal with the peak of the R wave in the ECG signal in order to eliminate the pulse arrival time (PAT). Separate cycles are then extracted from the ECG and PPG data, and a mapping between the pulse wave and the ECG cycle is learned. The ECG signal is derived from the PPG signal using this mapping. However, problems arise when only certain features are included: for ECG reconstruction, the accuracy of these models suffers from the lack of non-linear information representation, which also restricts their usefulness in actual medical situations. A BiLSTM DNN was used in [20] to reconstruct the ECG signal from the PPG signal with an RMSE of 0.083; the limitation of this method is its low accuracy. Three studies [13, 21, 22] use deep neural networks to reconstruct ECG from PPG. In [13], a generative adversarial network (GAN) architecture with a generator and a discriminator was used to produce synthetic data for augmentation in the synthesis of ECG from PPG. Nevertheless, instability during training led to sporadic yet significant random fluctuations. In [21], an end-to-end deep learning method with a game-theoretic training technique was presented. This system is capable of reliably estimating ECG waveforms from raw PPG data. The primary contribution of that study was P2E-WGAN, a GAN-based neural network model that directly processes 1-D time-series data to generate realistic ECG signals from PPG signals. Furthermore, the model can easily be extended to other similar pairs of signals. For sequence-to-sequence learning, Chiu et al. [22] employed recurrent neural networks (RNNs) with an encoder/decoder architecture. However, the model's susceptibility to signal noise and its reliance on gradient-descent optimization restrict its effectiveness in tasks that require retaining information over long periods.

The objective of the study in [23] is to create a specialized model capable of accurately reproducing ECG signals that closely resemble actual ECG signals, without requiring the calculation or adjustment of the PAT. This investigation is limited by the presence of fluctuations in PAT signals. The approach developed in this study is tailored to a particular individual, capturing their PAT during the training phase. Hence, the straightforward application of the model to numerous subjects poses a substantial obstacle due to the discrepancies in PATs among people. This necessitates the creation of an inter-subject model to address the distinct nature of the problem.

Nevertheless, these approaches require thorough preprocessing of the raw data, which may introduce unforeseen biases. Other researchers have generated realistic ECG for data augmentation in order to address data scarcity [24, 25]. Research on generative adversarial networks (GANs) for ECG synthesis has focused on LSTM-CNN architectures, which can create ECG signals from random noise, as described in [24]. A personalized GAN (PGAN) is proposed in [25] to create patient-specific synthetic ECG waveforms that are free of input noise. GAN-based methods are limited by low accuracy and high complexity.

In [26], a machine learning technique was introduced to estimate typical ECG values and the ranges of the RR, PR, QRS, and QT intervals from clean PPG features, achieving 90% accuracy. The features were time- and frequency-domain patterns extracted from a fingertip PPG signal. Nevertheless, estimating specific ECG parameters is inadequate for direct ECG screening. In a recent pilot study [27], a signal model was developed to establish a linear relationship between PPG and ECG in the discrete cosine transform (DCT) domain, which allows the ECG waveform to be reconstructed via the inverse DCT. This approach attains a mean reconstruction correlation of 0.98 in subject-specific cases. Even so, there is ample room for improvement in the subject-independent case, where machine learning algorithms need to learn a comprehensive mapping that covers a broader range of ECG patterns. Moreover, such approaches offer little control over the generated ECG, which makes them unsuitable for conditioning synthetic signals on PPG signals.

The literature also covers the estimation of physiological parameters from electrocardiogram (ECG) or photoplethysmogram (PPG) signals, for example [28]. These works take time-series signals as input and produce a small set of numbers or parameters that describe the physiological condition.

3 The proposed ECG signal reconstruction from PPG signal

In this study, all systems employed PPG beats to estimate ECG beats due to their structural similarity. The process for estimating ECG beats using deep learning based on PPG beats is delineated through several distinct stages: (1) feature domain, (2) proposed deep learning NN (HA-CNN-BiLSTM) training models, (3) ECG estimation.

3.1 Transformation features domain

In our earlier work [15], we provided a comparison among four feature domains for transformation: time domain (TD), discrete cosine transform (DCT), discrete wavelet transform (DWT), and wavelet scattering transform (WST). Our findings indicate that the WST is not affected by the shifting and scaling of the PPG beats, unlike the discrete wavelet transform (DWT). Hence, WST is a suitable choice for feature extraction in order to help the deep learning network in learning the relationship between PPG and ECG. This research utilizes a combination of WST and DWT to achieve our novel method HA-CNN-BiLSTM for reconstructing ECG signals.

3.2 Proposed deep learning method

We propose a technique for reconstructing ECG signals from PPG signals using a dilated CNN and a BiLSTM whose outputs are concatenated and weighted by an attention mechanism, as depicted in Fig. 1. Estimating the ECG signal from the PPG signal is treated as a prediction problem and is solved with deep learning. The hybrid model combines three networks: a dilated CNN, a Bi-LSTM, and an attention mechanism module. We refer to this fusion of the attention mechanism with deep learning as HA-CNN-BiLSTM. The dilated CNN and the BiLSTM perform automated feature extraction for reconstructing the ECG signal. The deep stack of convolution layers is designed to uncover hierarchical and detailed characteristics of the input, and pertinent local features are extracted quickly in the dilated CNN layer. In contrast with conventional CNNs, the dilated CNN layer extracts localized features while addressing the issue of gradient dispersion as the number of network layers grows. In parallel, the BiLSTM network extracts the global features. BiLSTM mitigates the vanishing and exploding gradient problems present in traditional RNN models while maintaining high accuracy. Nevertheless, as noted earlier, noise can distort a PPG signal.

The hybrid feature is created by combining the global feature extracted from Bi-LSTM and the local feature obtained from dilated CNN. Next, the weighting parameter in the attention mechanism is computed based on hybrid features. The features obtained from the suggested HA-CNN-BiLSTM model are utilized to produce an electrocardiogram (ECG) signal.

3.2.1 Dilated convolutional neural networks

Dilated convolutional neural networks (DCNNs) are powerful deep learning networks [15] owing to their capacity for extracting spatial features. The main stages of the DCNN consist of the dilation operation, the ReLU function, and batch normalization.

The concept of dilated convolutions was initially introduced for wavelet transforms by Holschneider et al. in 1990 and was later proposed by Yu and Koltun in 2016 for multiscale context aggregation. Dilated CNNs outperform normal convolution layers [29] owing to their larger receptive field. Their advantages include increased computational efficiency and robustness (wide coverage at the same computational cost) [30], as well as reduced memory consumption, because the pooling step is skipped and the resolution of the output is preserved through dilation rather than pooling [30].

The output $y$ of a dilated convolution of a 1D signal $x$ with a kernel $w$ is defined as

$$\left( x *_{l} w \right)\left[ p \right] = \sum\limits_{k = 0}^{N - 1} w\left[ k \right] x\left[ p - lk \right]$$

(1)

Here $N$ is the kernel size, $k$ is the kernel index, and $l$ is the dilation factor; $*_{l}$ denotes the $l$-dilated convolution. When $l$ is equal to 1, the dilated convolution is equivalent to the standard convolution. Figure 2 depicts the 1D dilated convolution process using dilation rates of 1, 2, and 4, with a kernel size of 3. The dilated convolution skips $l - 1$ signal samples between consecutive kernel taps, as illustrated in Fig. 2. In the input signal, the unit of interest is represented by the orange block in the first row. The orange blocks in the second, third, and fourth rows show the receptive field for each dilation rate.
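As an illustration of Eq. 1, the following minimal NumPy sketch evaluates the dilated sum directly. The function name `dilated_conv1d` and the zero treatment of samples before the start of the signal are assumptions made for this example; the paper does not state its padding scheme.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Evaluate y[p] = sum_k w[k] * x[p - dilation * k] for every p (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    y = np.zeros_like(x)
    for p in range(len(x)):
        for k in range(len(w)):
            idx = p - dilation * k
            if idx >= 0:  # assumption: samples before the signal start count as zero
                y[p] += w[k] * x[idx]
    return y

x = np.arange(8, dtype=float)    # toy input 0, 1, ..., 7
w = np.array([1.0, 1.0, 1.0])    # kernel size N = 3
y1 = dilated_conv1d(x, w, dilation=1)  # standard convolution
y2 = dilated_conv1d(x, w, dilation=2)  # skips one sample between kernel taps
```

With a dilation of 1 the result matches the first samples of `np.convolve(x, w)`, which is the equivalence with standard convolution noted in the text.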

Dilated convolutions are employed to enlarge the network's receptive field, that is, the region of the input space that influences a CNN feature. In conventional convolutions, the receptive field grows linearly with the layer depth. By using exponentially increasing dilation rates ($l = 1, 2, 4, \ldots$), dilated convolutions obtain a receptive field that grows exponentially with the layer depth.
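The difference between linear and exponential growth can be checked with a short calculation. The sketch below assumes stride-1 layers with a common kernel size, for which each layer adds (kernel size - 1) times its dilation rate to the receptive field; `receptive_field` is a hypothetical helper, not part of the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 convolution layers."""
    rf = 1
    for rate in dilations:
        rf += (kernel_size - 1) * rate
    return rf

# Depths 1..4 with kernel size 3: fixed rate 1 versus rates 1, 2, 4, 8.
standard = [receptive_field(3, [1] * depth) for depth in range(1, 5)]
dilated = [receptive_field(3, [2 ** i for i in range(depth)]) for depth in range(1, 5)]
```

Here `standard` grows by two samples per layer, while `dilated` roughly doubles with each added layer.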

The advantages of dilated convolution include a larger receptive field with no loss of coverage compared to regular convolution; computational efficiency, since it offers greater coverage at the same computational cost; and reduced memory use, because the pooling step is omitted. The output signal keeps its resolution because dilation is used instead of pooling. The structure of this convolution also helps preserve the sequential order of the data.

To speed up convergence and mitigate vanishing and exploding gradients within the feature extraction module, we use the Rectified Linear Unit (ReLU) as the activation function. The computation of the dilated CNN is given by Eq. 2, where $*_{l_{i}}$ is the $l_{i}$-dilated convolution in layer $i$ and $l_{i}$ is the dilation width. The architecture consists of multiple layers of dilated convolutions whose dilation width increases exponentially, and element $t$ of the feature map in layer $i$, denoted $c_{t}^{i}$, is calculated as:

$$c_{t}^{i} = \mathrm{ReLU}\left( \left( x *_{l_{i}} w \right)\left[ t \right] \right)$$

(2)

In the context of deep neural networks, an effective regularization technique involves incorporating a batch normalization (BN) algorithm after each convolutional layer.
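A minimal sketch of the two stages that follow the dilated convolution, ReLU and batch normalization, is given below. The order (ReLU first, then normalization), the batch of 20 beats with 8 features, and the training-mode statistics are assumptions for illustration only.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: negative values are set to zero."""
    return np.maximum(z, 0.0)

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each feature (column) with the mean and variance of the batch."""
    mean = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
pre_activation = rng.normal(loc=2.0, scale=3.0, size=(20, 8))  # hypothetical conv outputs
features = relu(pre_activation)
normalised = batch_norm(features)  # zero mean, unit variance per feature
```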

3.2.2 BiLSTM

The convolution layer produces multidimensional output data, whereas the BiLSTM layer processes one-dimensional input data. A flatten layer is therefore used to transform the convolution layer's output. The LSTM network, a type of recurrent neural network (RNN), excels at analyzing time-sequence data because it extracts temporal features effectively. The key components of an LSTM are the LSTM block (memory cell), input gate, forget gate, and output gate.

The hidden state ($h_{t}$) is generated as follows:

The forget gate $(f_{t} )$ selectively suppresses entries of the previous cell state, effectively instructing the cell to ignore the corresponding information.

$$f_{t} = sigmoid\left( {W_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right)$$

(3)

The input gate determines which information is permitted to enter the cell state.

$$i_{t} = sigmoid\left( {W_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right)$$

(4)

The candidate cell state $\tilde{c}_{t}$, produced by the input modulation gate, holds the new information that may be written to the cell state.

$$\tilde{c}_{t} = tanh\left( {W_{c} \left[ {h_{t - 1} ,x_{t} } \right] + b_{c} } \right)$$

(5)

The output gate plays a crucial role in determining the upcoming hidden state.

$$o_{t} = sigmoid\left( {W_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right)$$

(6)

The weight matrices $W_{i}$, $W_{o}$, $W_{f}$, and $W_{c}$ correspond to the input gate, output gate, forget gate, and candidate cell state, respectively, and $b_{i}$, $b_{o}$, $b_{f}$, and $b_{c}$ are the associated biases. The sigmoid function is applied to each gate. The hidden state $(h_{t} )$ represents the working memory; it encodes information from previous inputs and is used to generate predictions.

$$h_{t} = o_{t} \odot \tanh \left( {c_{t} } \right)$$

(7)

The current cell state is denoted by $c_{t}$:

$$c_{t} = f_{t} \odot c_{t - 1} + i_{t} \odot \tilde{c}_{t}$$

(8)

Here $\tanh$ is the hyperbolic tangent activation function and $\odot$ denotes element-wise multiplication.
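Eqs. 3 to 8 can be read as a single update step. The NumPy sketch below is one way to write it; the dictionary layout of the weights, the layer sizes, and the random initialisation are assumptions for illustration and do not reflect the trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. Each W[g] has shape (hidden, hidden + input)."""
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])       # forget gate, Eq. 3
    i_t = sigmoid(W["i"] @ concat + b["i"])       # input gate, Eq. 4
    c_tilde = np.tanh(W["c"] @ concat + b["c"])   # candidate cell state, Eq. 5
    o_t = sigmoid(W["o"] @ concat + b["o"])       # output gate, Eq. 6
    c_t = f_t * c_prev + i_t * c_tilde            # cell state, Eq. 8
    h_t = o_t * np.tanh(c_t)                      # hidden state, Eq. 7
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3                             # hypothetical sizes
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for g in "fico"}
b = {g: np.zeros(n_hidden) for g in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):            # run over a 5-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
```

The cell state is computed before the hidden state because Eq. 7 depends on the result of Eq. 8.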

In a BiLSTM model, $\overrightarrow{h}_{t}$ and $\overleftarrow{h}_{t}$ are the hidden sequences in the forward and backward directions, respectively, ${\mathcal{Y}}_{t}$ denotes the output sequence, and $\mathcal{H}$ denotes the hidden-layer function (here the LSTM cell of Eqs. 3 to 8). Because the backward layer processes the sequence from its end to its start, its state at time $t$ depends on the state at $t + 1$.

$$\overrightarrow{h}_{t} = \mathcal{H}\left( W_{x\overrightarrow{h}} x_{t} + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t - 1} + b_{\overrightarrow{h}} \right)$$

(9)

$$\overleftarrow{h}_{t} = \mathcal{H}\left( W_{x\overleftarrow{h}} x_{t} + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t + 1} + b_{\overleftarrow{h}} \right)$$

(10)

$${\mathcal{Y}}_{t} = W_{\overrightarrow{h}\mathcal{Y}} \overrightarrow{h}_{t} + W_{\overleftarrow{h}\mathcal{Y}} \overleftarrow{h}_{t} + b_{\mathcal{Y}}$$

(11)
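The bidirectional structure of Eqs. 9 to 11 can be sketched as two passes over the same sequence, one from the first time step to the last and one in the opposite direction. To keep the example short, a plain tanh recurrence stands in for the hidden-layer function; this stand-in, the helper names, and all sizes are assumptions, not the authors' implementation.

```python
import numpy as np

def run_direction(xs, W_x, W_h, b, reverse=False):
    """Run a recurrent layer over xs (T x input); return hidden states (T x hidden)."""
    T = xs.shape[0]
    h = np.zeros(W_h.shape[0])
    out = np.zeros((T, W_h.shape[0]))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        h = np.tanh(W_x @ xs[t] + W_h @ h + b)  # assumption: tanh cell instead of an LSTM
        out[t] = h
    return out

def bidirectional(xs, fwd, bwd, W_fy, W_by, b_y):
    h_fwd = run_direction(xs, *fwd)                 # forward states, Eq. 9
    h_bwd = run_direction(xs, *bwd, reverse=True)   # backward states, Eq. 10
    return h_fwd @ W_fy.T + h_bwd @ W_by.T + b_y    # output sequence, Eq. 11

def make_params(rng, n_in, n_hidden):
    return (rng.normal(size=(n_hidden, n_in)),
            rng.normal(size=(n_hidden, n_hidden)),
            np.zeros(n_hidden))

rng = np.random.default_rng(1)
T, n_in, n_hidden, n_out = 6, 2, 3, 4               # hypothetical sizes
fwd, bwd = make_params(rng, n_in, n_hidden), make_params(rng, n_in, n_hidden)
W_fy = rng.normal(size=(n_out, n_hidden))
W_by = rng.normal(size=(n_out, n_hidden))
xs = rng.normal(size=(T, n_in))
ys = bidirectional(xs, fwd, bwd, W_fy, W_by, np.zeros(n_out))
```

Running the backward pass is equivalent to running the forward recurrence on the time-reversed input and reversing the result, which is why each output step can draw on both past and future context.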

3.2.3 Attention mechanism

The attention mechanism (AM) is a suitable approach for emphasizing vital information, drawing inspiration from the human visual system. Human vision typically does not scan an entire scene from start to finish but concentrates on a particular part as necessary. The AM selectively prioritizes influential information, disregards unneeded information, and enhances desired information [9]. To extract the key features for ECG reconstruction, we combine the CNN and BiLSTM with the AM. The attention mechanism learns weights from the feature maps, and these weights are then applied to the original feature maps, altering the distribution of the original features. As a result, important features receive more attention while redundant features are suppressed.

AM uses weight allocation to highlight the most useful information by assigning it greater weights, which benefits the optimization of classical models [10]. The attention function maps a Query and a set of Key-Value pairs to an output. Calculating attention comprises three steps, as depicted in Fig. 1. In the first step, we calculate the similarity between each Key and the Query as follows:

$$s_{t} = tanh\left( {W_{c} *c_{t} + W_{b} *{\mathcal{Y}}_{t} } \right)$$

(12)

The attention score is denoted by $s_{t}$. $W_{c}$ and $W_{b}$ are the weights of the fully connected layers of the AM. The input vectors from the DCNN and BiLSTM are denoted as $c_{t}$ and ${\mathcal{Y}}_{t}$, respectively. In the second step, the scores obtained in the first step are normalized with the softmax function to obtain the attention weights:

$$a_{t} = \frac{{{\text{exp}}\left( {s_{t} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{T} {\text{exp}}\left( {s_{j} } \right)}}$$

(13)

As shown in the formula, the final attention value is obtained by a weighted accumulation of values:

$$s = \mathop \sum \limits_{t = 1}^{T} a_{t} {*}c_{t}$$

(14)

AM, commonly employed following CNN and RNN networks, serves to prioritize the characteristics that have a substantial impact on output variables, hence enhancing the model's performance.
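The three steps of Eqs. 12 to 14 can be sketched in a few lines of NumPy. The sketch assumes that each time step receives one scalar score, so the weights are vectors rather than matrices; the function name `attention_fuse` and all sizes are hypothetical.

```python
import numpy as np

def attention_fuse(C, Y, w_c, w_b):
    """C: (T, d_c) DCNN features, Y: (T, d_y) BiLSTM outputs.

    Returns the attention-weighted feature vector and the attention weights.
    """
    scores = np.tanh(C @ w_c + Y @ w_b)          # Eq. 12: one score per time step
    exp_scores = np.exp(scores - scores.max())   # shift by the maximum for stability
    weights = exp_scores / exp_scores.sum()      # Eq. 13: softmax over time steps
    context = weights @ C                        # Eq. 14: weighted sum of c_t
    return context, weights

rng = np.random.default_rng(2)
T, d_c, d_y = 10, 5, 4                           # hypothetical sizes
C = rng.normal(size=(T, d_c))
Y = rng.normal(size=(T, d_y))
context, weights = attention_fuse(C, Y, rng.normal(size=d_c), rng.normal(size=d_y))
```

Subtracting the maximum score before exponentiation leaves the softmax result unchanged and only guards against overflow.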

3.2.4 Regression layer

The dropout layer, situated between fully connected layers, serves as a mechanism to mitigate overfitting. By randomly deactivating a subset of neurons during training iterations, dropout encourages the network to focus on essential features, enhancing the model’s adaptability.

The prediction block, which includes the output layer, contains two fully connected layers. Once the attention block has gathered the feature values, the fully connected layers apply a series of nonlinear transformations to them to produce the final predictions.
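A minimal sketch of such a prediction block follows, with dropout placed between two fully connected layers. The ReLU activation, the dropout rate of 0.5, the inverted-dropout scaling, and the layer sizes are all assumptions; the text does not specify them.

```python
import numpy as np

def dropout(a, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

def regression_head(features, params, rng, rate=0.5, training=True):
    W1, b1, W2, b2 = params
    hidden = np.maximum(W1 @ features + b1, 0.0)    # first fully connected layer + ReLU
    hidden = dropout(hidden, rate, rng, training)   # active only during training
    return W2 @ hidden + b2                         # second fully connected (output) layer

rng = np.random.default_rng(3)
n_features, n_hidden, n_out = 16, 32, 120           # hypothetical sizes
params = (rng.normal(scale=0.1, size=(n_hidden, n_features)), np.zeros(n_hidden),
          rng.normal(scale=0.1, size=(n_out, n_hidden)), np.zeros(n_out))
features = rng.normal(size=n_features)
prediction = regression_head(features, params, rng, training=False)
```

At inference time dropout is disabled, so repeated calls with the same features return the same prediction.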

4 Experimental results

Our models were trained on 90% of the dataset, and the remaining 10% was reserved for testing. The training and testing sets were completely disjoint and did not share any data points. The network is trained on the training set, and its parameters are updated in response to the training error. The Adaptive Moment Estimation (Adam), Stochastic Gradient Descent with Momentum (SGDM), and Root Mean Square Propagation (RMSProp) optimizers were used to train the model, as they are commonly used for parameter estimation. The root mean square error (RMSE) was used as the loss function and to evaluate the accuracy of the reconstructed ECG signal. The initial learning rate, maximum number of epochs, and mini-batch size were set to 0.001, 50, and 20, respectively. Both the learning rate and the batch size were tuned experimentally. The network's specifications are given in Table 1.
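The evaluation metric and the 90/10 split can be sketched as follows. The random shuffling of sample indices and the helper names are assumptions; the text does not say whether the split was drawn per beat, per record, or per subject.

```python
import numpy as np

def rmse(reference, estimate):
    """Root mean square error between a reference and an estimated signal."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))

def train_test_split(n_samples, train_fraction=0.9, seed=0):
    """Return disjoint index arrays for training and testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

train_idx, test_idx = train_test_split(1000)   # 1000 hypothetical beats
error = rmse([0.0, 0.0], [3.0, 4.0])           # sqrt((9 + 16) / 2)
```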

Table 1 Network specifications

5 Data setup

The combined PPG/ECG data used for training the deep learning network are available from the PhysioNet MIMIC II (Multi-parameter Intelligent Monitoring in Intensive Care) dataset [31]. The dataset covers 12,000 subjects and was collected at Boston's Beth Israel Deaconess Medical Center between 2001 and 2007. The authors of [32] provided a more systematically organized version of the same dataset. This collection contains about 12,000 records. Each record comprises ECG (channel II), PPG (fingertip), and ABP (invasive arterial blood pressure, in mmHg) signals sampled at 125 samples per second. We are interested only in the PPG signals and the ECG signals that serve as labels. The records are divided into 1024-sample segments to ease manipulation and filtering. To attain optimal performance, the dataset must be free of artifacts, so a combined PPG/ECG cleaning technique is applied before the data are used to train and evaluate the deep learning estimator. We use a dataset of 175,000 carefully processed beats instead of 309,000 uncleaned beats, because uncleaned data can produce misleading results in deep neural networks.
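The segmentation step can be sketched as a reshape into fixed-length rows. The 10-minute toy record and the choice to discard the incomplete tail are assumptions for this example.

```python
import numpy as np

def segment_record(signal, segment_length=1024):
    """Split a 1D record into non-overlapping segments, dropping the incomplete tail."""
    signal = np.asarray(signal)
    n_segments = len(signal) // segment_length
    return signal[: n_segments * segment_length].reshape(n_segments, segment_length)

fs = 125                                   # sampling rate in samples per second
record = np.arange(10 * 60 * fs)           # hypothetical 10-minute record, 75,000 samples
segments = segment_record(record)
segment_seconds = segments.shape[1] / fs   # duration of one segment
```

At 125 samples per second, a 1024-sample segment spans about 8.2 s, which covers several cardiac cycles.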

6 Data preprocessing

PPG signals can be enhanced by preprocessing as long as the morphology of the signal is not visibly altered. Specifically, bandpass filtering in the range of 0.5 to 8 Hz can be applied. ECG signals or beats that show substantial distortion are excluded from further analysis. The preprocessed signals were used to extract features and train the learning models.
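The text gives the passband but not the filter design. As a stand-in, the sketch below applies an ideal frequency-domain mask with NumPy's FFT; this is an assumption chosen to keep the example dependency-free, and a practical pipeline would more likely use a designed filter such as a zero-phase Butterworth.

```python
import numpy as np

def fft_bandpass(signal, fs, low_hz=0.5, high_hz=8.0):
    """Zero all frequency components outside [low_hz, high_hz]."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

fs = 125
t = np.arange(1000) / fs                               # 8 s, so FFT bins fall every 0.125 Hz
pulse = np.sin(2 * np.pi * 1.25 * t)                   # toy pulse component (75 beats/min)
drift = 0.8 * np.sin(2 * np.pi * 0.125 * t) + 0.3      # baseline wander plus offset
mains = 0.2 * np.sin(2 * np.pi * 50.0 * t)             # high-frequency interference
filtered = fft_bandpass(pulse + drift + mains, fs)
```

In this toy case every component sits exactly on an FFT bin, so `filtered` recovers `pulse`; real PPG recordings will show some spectral leakage at the band edges.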

7 ECG waveform simulation analysis results

In our study, we evaluated the performance of the proposed neural network (NN) for estimating ECG signals from PPG data by comparing it with other advanced networks. Specifically, the HA-CNN-BiLSTM network, which has a 120 × 1 sequence regression output layer, predicts ECG beats from the corresponding PPG beats. For ECG estimation, an important advantage of the WST is its inherent invariance to shifting and scaling, so the ability to recognize the ECG signal is unaffected by any shifting or scaling of the PPG signal. The discrete wavelet transform (DWT) is preferred over the wavelet scattering transform (WST) for the ECG signals (the labels in the training phase) because the WST lacks an inverse transform. Therefore, the DWT replaces the WST in the processing of ECG signals.
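The reason an invertible transform is needed for the labels can be illustrated with a single-level Haar DWT, whose inverse restores the original beat exactly. The Haar wavelet, the single decomposition level, and the synthetic 120-sample beat are assumptions for this sketch; the text does not state which wavelet or depth was used.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT of an even-length signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the recovered even and odd samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

beat = np.sin(np.linspace(0.0, 2.0 * np.pi, 120, endpoint=False))  # toy 120-sample beat
approx, detail = haar_dwt(beat)
reconstructed = haar_idwt(approx, detail)
```

A network trained to predict DWT coefficients can therefore be mapped back to a time-domain ECG beat, which is not possible with WST coefficients.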

8 ECG signal reconstruction results

Table 2 compares the proposed HA-CNN-BiLSTM with the hybrid CNN-BiLSTM in reconstructing ECG signals from PPG signals under various optimization algorithms. The simulation results show that the SGDM optimizer achieves the best RMSE, 0.031, for the HA-CNN-BiLSTM reconstruction model.
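For reference, the RMSE between a ground truth ECG beat and its reconstruction can be computed as in the sketch below. The sample values are made up for illustration, and any amplitude normalization applied before computing the metric in the paper is not assumed here.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error between two equally shaped signals."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if reference.shape != estimate.shape:
        raise ValueError("signals must have the same shape")
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


# Hypothetical four-sample signals, each reconstructed sample off by 0.1.
truth = np.array([0.0, 1.0, 0.0, -1.0])
recon = np.array([0.1, 0.9, 0.1, -0.9])
print(rmse(truth, recon))
```

A lower value indicates a reconstruction that lies closer, sample by sample, to the reference ECG.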

Table 2 PPG/ECG RMSE comparison of different optimizers for the proposed HADConvBiLSTM

Figure 3 shows the ECG signal reconstructed by the proposed method with three optimization techniques, compared with ConvBiLSTM and the ground truth ECG signal for one patient. The figure shows that the SGDM optimizer gives the closest match to the ground truth, which is consistent with the RMSE results in Table 2. Furthermore, the HA-CNN-BiLSTM model produces better ECG reconstruction results than ConvBiLSTM for all optimizers.

Figure 4 shows that the proposed system reconstructs the ECG signal accurately for signals from different patients and with different optimization algorithms, in close alignment with the ground truth.

Figure 5 depicts the ECG signals reconstructed using the ADAM optimizer with different dilation factors in the proposed model, to demonstrate the effect of the dilation factor on the reconstructed ECG signals. The proposed system with a dilation factor closely matches the ground truth, in contrast to the proposed system without a dilation factor.
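The role of the dilation factor can be illustrated with a small 1-D dilated convolution. The kernel length, kernel weights, and dilation values below are hypothetical and are not the settings of the proposed model; the sketch only shows how dilation widens the span of input samples seen by each output sample without adding weights.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (kernel.size - 1) * dilation + 1  # input samples covered by one output
    if x.size < span:
        raise ValueError("input shorter than the receptive field")
    n_out = x.size - span + 1
    out = np.zeros(n_out)
    for k, w in enumerate(kernel):
        # Tap k reads the input shifted by k * dilation samples.
        out += w * x[k * dilation : k * dilation + n_out]
    return out


x = np.arange(10, dtype=float)
kernel = [1.0, 1.0, 1.0]
print(dilated_conv1d(x, kernel, dilation=1))  # each output sums 3 adjacent samples
print(dilated_conv1d(x, kernel, dilation=2))  # each output sums samples spaced 2 apart
```

With three weights, a dilation of 2 covers five input samples per output instead of three, which is how a dilated CNN captures longer context in the PPG signal at the same parameter count.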

Prior studies reconstructed ECG signals from PPG signals using hybrid DNN systems. Figure 6 compares the proposed system with these models and shows that the proposed PPG/ECG model outperforms existing hybrid models such as the attention-based ResNet-BiLSTM [17], the dilated CNN–attention-based BiLSTM/BiGRU model [16], and ConvBiLSTM [15].

As shown in Table 3, the proposed model outperforms previous hybrid DNN models in terms of the root mean square error of the ECG signal reconstructed from the PPG signal.

Table 3 PPG/ECG RMSE and complexity comparison of different hybrid DNNs with the ADAM optimizer

9 Complexity calculation analysis

In this section, we use the total number of floating-point operations (FLOPs) as an indicator of the system's complexity. The total is obtained by counting the real-valued operations performed during the neural network's inference, and it is the sum of two components. The first component covers the weight matrices and bias terms: the real-valued multiplications and additions needed to multiply the values from the input or previous layer by the weight matrix and then add the bias. The second component covers the activation functions: the real-valued multiplications and additions needed to evaluate the activation functions in the neurons of the hidden layers.
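The two-component count described above can be sketched for fully connected layers as follows. The layer sizes and the cost of one operation per activation are hypothetical, and the function names are introduced only for this example; convolutional and recurrent layers follow the same bookkeeping with their own operation counts, which are not reproduced here.

```python
def dense_layer_flops(n_inputs, n_outputs, activation_flops_per_unit=1):
    """Return (linear_flops, activation_flops) for one fully connected layer."""
    multiplications = n_inputs * n_outputs
    # (n_inputs - 1) additions to accumulate the products, plus 1 to add the bias.
    additions = (n_inputs - 1) * n_outputs + n_outputs
    linear = multiplications + additions
    activation = activation_flops_per_unit * n_outputs
    return linear, activation


def network_flops(layer_sizes, activation_flops_per_unit=1):
    """Sum both components over consecutive layers of a dense network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        linear, activation = dense_layer_flops(n_in, n_out, activation_flops_per_unit)
        total += linear + activation
    return total


# Hypothetical 120-64-120 network, not the architecture of the proposed model.
print(network_flops([120, 64, 120]))
```

Counting operations this way is independent of the hardware and software platform, which is why it is convenient for comparing models.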

The results of the complexity evaluation are presented in Table 3. They indicate that the proposed system requires 0.97 M FLOPs (floating-point operations), which is lower than the other hybrid attention-based approaches in prior studies [16, 17]. However, it is slightly more complex than the model in [15].

10 Conclusions

This research introduces an HA-CNN-BiLSTM system for reconstructing a single-lead ECG signal from PPG signals. The proposed system uses deep learning with the wavelet scattering transform (WST) as the feature domain. One primary benefit of the WST is its inherent invariance to shifting and scaling, so the electrocardiogram (ECG) signal can be recovered regardless of shifts or scaling of the photoplethysmogram (PPG) signal. The proposed hybrid system combines the benefits of the dilated CNN and BiLSTM architectures to obtain fusion features that encompass both local and global information, and the attention mechanism improves the interpretability of the model. It offers significant advantages over state-of-the-art techniques.

This approach offers a promising means of improving the precision and interpretability of therapeutic applications. The model was designed to improve PPG-to-ECG inference, using an attention-based dilated CNN-BiLSTM network for ECG reconstruction. The dilated CNN layer is a convolutional layer designed to extract features from PPG signals, and the bidirectional long short-term memory (BiLSTM) block captures important long-range features. The proposed technique obtained root mean square error (RMSE) values of 0.058, 0.031, and 0.0593 for the ADAM, SGDM, and RmsProp optimizers, respectively. Complexity is measured in platform-independent FLOPs: the proposed model requires 0.97 million FLOPs, which is lower than the existing hybrid attention-based deep neural network (DNN) models [16, 17].

The experimental findings confirm the efficacy of the suggested model in reconstructing ECG signals that closely resemble the reference ECG signals, exhibiting minimal phase discrepancy.

Availability of data and materials

Data will be made available on request.

Abbreviations

ECG: Electrocardiography
PPG: Photoplethysmography
DCNN: Dilated convolutional neural network
Bi-RNN: Bidirectional recurrent neural network
BiLSTM: Bidirectional long short-term memory
RMSE: Root mean square error
CVDs: Cardiovascular diseases
ResNet: Residual network
GAN: Generative adversarial network
PGAN: Personalized GAN
TD: Time domain
DCT: Discrete cosine transform
DWT: Discrete wavelet transform
WST: Wavelet scattering transform
AM: Attention mechanism
ADAM: Adaptive moment estimation optimizer
SGDM: Stochastic gradient descent with momentum
RmsProp: Root mean square propagation
FLOPs: Floating-point operations

References

1. Cardiovascular diseases. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). Accessed 8 Oct 2022
2. S. Mendis, P. Puska, B. Norrving, World Health Organization, Global Atlas on Cardiovascular Disease Prevention and Control (World Health Organization, Geneva, 2011)
3. P. Kumar, V.K. Sharma, Cardiac signals based methods for recognizing heart disease: a review, in 3rd International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pp. 1375–1377. IEEE, Tirunelveli, India (2021)
4. P. Hao, X. Gao, Z. Li, J. Zhang, F. Wu, C. Bai, Multi-branch fusion network for Myocardial infarction screening from 12-lead ECG images. Comput. Methods Programs Biomed. 184, 105286 (2020)
5. K. van der Bijl, M. Elgendi, C. Menon, Automatic ECG quality assessment techniques: a systematic review. Diagnostics 12, 2578 (2022)
6. S. Somani, A.J. Russak, F. Richter, S. Zhao, A. Vaid, F. Chaudhry et al., Deep learning and the electrocardiogram: review of the current state-of-the-art. EP Europace 23(8), 1179–1191 (2021)
7. M. Elgendi, R. Fletcher, Y. Liang, N. Howard, N.H. Lovell, D. Abbott et al., The use of photoplethysmography for assessing hypertension. NPJ Digit Med 2(1), 60 (2019)
8. K.B. Kim, H.J. Baek, Photoplethysmography in wearable devices: a comprehensive review of technological advances, current challenges, and future directions. Electronics 12(13), 2923 (2023)
9. R. Avram, J.E. Olgin, P. Kuhar, J.W. Hughes, G.M. Marcus, M.J. Pletcher et al., A digital biomarker of diabetes from smartphone-based vascular signals. Nat. Med. 26(10), 1576–1582 (2020)
10. E. Gil, M. Orini, R. Bailon, J.M. Vergara, L. Mainardi, P. Laguna, Photoplethysmography pulse rate variability as a surrogate measurement of heart rate variability during non-stationary conditions. Physiol. Meas. 31(9), 1271 (2010)
11. P.E. McSharry, G.D. Clifford, L. Tarassenko, L.A. Smith, A dynamical model for generating synthetic electrocardiogram signals. IEEE Trans. Biomed. Eng. 50(3), 289–294 (2003)
12. T. Golany, G. Lavee, S.T. Yarden, K. Radinsky, Improving ECG classification using generative adversarial networks, in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 13280–13285 (2020)
13. P. Sarkar, A. Etemad, CardioGAN: attentive generative adversarial network with dual discriminators for synthesis of ECG from PPG, in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 488–496. Delhi, India (2021)
14. O.A. Omer, M. Salah, A.M. Hassan, A.S. Mubarak, Beat-by-beat ECG monitoring from photoplythmography based on scattering wavelet transform. Traitement du Signal 39(5), 1483–1488 (2022)
15. A. Ezzat, O.A. Omer, U.S. Mohamed, A.S. Mubarak, ECG Signal Reconstruction from PPG using Hybrid Deep Neural Networks. Revue d’Intelligence Artificielle (RIA) 38(1), 251–260 (2024)
16. M. Jiang, J. Gu, Y. Li, B. Wei, J. Zhang, Z. Wang, L. Xia, HADLN: hybrid attention-based deep learning network for automated arrhythmia classification. Front. Physiol. 12, 683025 (2021)
17. M.S. Islam, K.F. Hasan, S. Sultana, S. Uddin, J.M. Quinn, M.A. Moni, HARDC: a novel ECG-based heartbeat classification method to detect arrhythmia using hierarchical attention based dual structured RNN with dilated CNN. Neural Netw. 162, 271–287 (2023)
18. Q. Zhu, X. Tian, C.W. Wong, M. Wu, Learning your heart actions from pulse: ECG waveform reconstruction from PPG. IEEE Internet Things J. 8, 16734–16748 (2021)
19. X. Tian, Q. Zhu, Y. Li, M. Wu, Cross-domain joint dictionary learning for ECG reconstruction from PPG, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 936–940. Barcelona, Spain (2020)
20. Q. Tang, Z. Chen, Y. Guo, Y. Liang, R. Ward, C. Menon, M. Elgendi, Robust reconstruction of electrocardiogram using photoplethysmography: a subject-based Model. Front. Physiol. 13, 859763 (2022)
21. K. Vo, E.K. Naeini, A. Naderi, D. Jilani, A.M. Rahmani, N. Dutt, H. Cao, P2E-WGAN: ECG waveform synthesis from PPG with conditional Wasserstein generative adversarial networks, in Proceedings of the 36th Annual ACM Symposium on Applied Computing, pp. 1030–1036. New York, United States (2021)
22. H.Y. Chiu, H.H. Shuai, P.C.P. Chao, Reconstructing QRS complex from PPG by transformed attentional neural networks. IEEE Sens. J. 20, 12374–12383 (2020)
23. Q. Tang, Z. Chen, R. Ward, C. Menon, M. Elgendi, PPG2ECGps: an end-to-end subject-specific deep neural network model for electrocardiogram reconstruction from photoplethysmography signals without pulse arrival time adjustments. Bioengineering 10(6), 630 (2023)
24. F. Zhu, F. Ye, Y. Fu, Q. Liu, B. Shen, Electrocardiogram generation with a bidirectional LSTM-CNN generative adversarial network. Sci. Rep. 9(1), 6734 (2019)
25. T. Golany, K. Radinsky, PGANs: personalized generative adversarial networks for ECG synthesis to improve patient-specific deep ECG classification, in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 557–564 (2019)
26. R. Banerjee, A. Sinha, A.D. Choudhury, A. Visvanathan, PhotoECG: Photoplethysmography to estimate ECG parameters, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4404–4408. IEEE. Florence, Italy (2014)
27. Q. Zhu, X. Tian, C.W. Wong, M. Wu, ECG reconstruction via PPG: a pilot study, in IEEE EMBS international conference on biomedical & health informatics (BHI), pp. 1–4. IEEE. Chicago, IL, USA (2019)
28. G.M. Lin, H.H.S. Lu, A 12-lead ECG-based system with physiological parameters and machine learning to identify right ventricular hypertrophy in young adults. IEEE J. Transl. Eng. Health Med. 8, 1–10 (2020)
29. H. Dang, M. Sun, G. Zhang, X. Qi, X. Zhou, Q. Chang, A novel deep arrhythmia-diagnosis network for atrial fibrillation classification using electrocardiogram signals. IEEE Access 7, 75577–75590 (2019)
30. R. Shaddeli, N. Yazdanjue, S. Ebadollahi, M.M. Saberi, B. Gill, Noise removal from ECG signals by adaptive filter based on variable step size LMS using evolutionary algorithms, in IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1–7. IEEE. ON, Canada (2021)
31. M. Salah, O.A. Omer, L. Hassan, M. Ragab, A.M. Hassan, A. Abdelreheem, Beat-based PPG-ABP cleaning technique for blood pressure estimation. IEEE Access 10, 55616–55626 (2022)
32. O. Yildirim, R. San Tan, U.R. Acharya, An efficient compression of ECG signals using deep convolutional autoencoders. Cogn. Syst. Res. 52, 198–211 (2018)

Funding

There is no funding.

Author information

Authors and Affiliations

Faculty of Engineering, Aswan University, Aswan, 81542, Egypt
Ahmed Ezzat, Osama A. Omer & Ahmed S. Mubarak
Department of Electronics and Communications, Luxor Higher Institute of Engineering and Technology, Luxor, 85834, Egypt
Ahmed Ezzat
Department of Electrical Engineering, Faculty of Engineering, Assiut University, Assiut, 71518, Egypt
Usama S. Mohamed
Faculty of Engineering, Sphinx University, Assiut, 71515, Egypt
Usama S. Mohamed

Contributions

All authors took part in planning, carrying out, and analyzing the study. All authors have read and approved the final submitted version.

Corresponding authors

Correspondence to Ahmed Ezzat or Osama A. Omer.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

The contents of this manuscript have not been copyrighted or published previously; the contents of this manuscript are not now under consideration for publication elsewhere; the contents of this manuscript will not be copyrighted, submitted, or published elsewhere while acceptance by the journal is under consideration; and there are no directly related manuscripts or abstracts, published or unpublished, by any authors of this paper.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ezzat, A., Omer, O.A., Mohamed, U.S. et al. ECG signal reconstruction from PPG using a hybrid attention-based deep learning network. EURASIP J. Adv. Signal Process. 2024, 95 (2024). https://doi.org/10.1186/s13634-024-01158-8

Received: 20 December 2023
Accepted: 23 April 2024
Published: 05 November 2024
DOI: https://doi.org/10.1186/s13634-024-01158-8
