Introduction

Ventricular septal defect (VSD), a type of congenital heart disease (CHD) caused by developmental defects of the interventricular septum, is the most common heart malformation present at birth. It occurs in approximately 2–6 of every 1000 live births and accounts for approximately 30% of all CHDs in children and adolescents1,2,3,4. The clinical presentation of a VSD is correlated with the size of the defect5. Small VSDs are usually asymptomatic and commonly close spontaneously6. Patients with medium-sized defects often suffer from dyspnea. Patients with severe VSDs exhibit cyanosis, dyspnea, syncope, or heart failure and require surgical repair unless the defects spontaneously decrease in size7,8,9. VSDs can also be classified into four types according to the morphology and anatomical location of the defect: type I (outlet, supracristal, subarterial, or infundibular), type II (perimembranous, paramembranous, or conoventricular), type III (inlet, atrioventricular canal, or atrioventricular septal defect), and type IV (muscular or trabecular)10,11,12. The perimembranous type is the most common (~ 80%), followed by the muscular (15–20%), inlet (~ 5%), and outlet (~ 5%) types.

As with many other heart malformations, heart murmurs can be heard in patients with VSD13. Patients with a VSD commonly present with holosystolic murmurs, owing to the turbulent blood flow between the left and right ventricles14,15. Murmur recognition by auscultation is conventionally used for the screening and diagnosis of VSD16. However, the accuracy of this method largely depends on clinical experience, which poses a challenge for young and inexperienced clinicians17. Therefore, tools that automatically recognize heart-sound patterns can help physicians diagnose heart disease.

Artificial intelligence has recently been widely used in computer-aided diagnosis18,19. For example, many algorithms that automatically recognize and classify medical images have been developed using deep learning20,21,22. Recent efforts have shown significant advances in using artificial neural networks (ANNs) or deep neural networks (DNNs) to detect and classify heart sounds23,24,25. Convolutional neural networks (CNNs) have also been used to identify heart murmurs26. The aim of this study was to develop an algorithm that automatically recognizes the systolic murmurs of VSD patients using a novel temporal attentive pooling–convolutional recurrent neural network (TAP-CRNN) model27.

Results

Heart sounds from 76 subjects, including 51 VSD patients and 25 patients without significant heart malformations, were included in this study. Table 1 shows the mean age, height, weight, and sex distribution of these subjects. There were no statistically significant differences between the VSD group and the normal group with regard to these clinical variables. Regarding the types of VSDs, most patients were diagnosed with a type II (perimembranous) VSD, and most defects were minor. The details of the VSD types are listed in Table 2.

Table 1 Basic information of the subjects in this study.
Table 2 Details of the VSD types included in this study.

Two repeated heart sound recordings were obtained at each of the five standard auscultation sites. For subjects whose recordings did not qualify owing to the presence of noise, more than two recordings were obtained at the same auscultation site to ensure the quality of the soundtracks. A total of 776 heart soundtracks were recorded from the 76 subjects, including 525 soundtracks from VSD patients and 251 soundtracks from normal subjects. The numbers of soundtracks in the training and test sets are shown in Table 3.

Table 3 Number of subjects and heart sound recordings in this study.

The TAP-CRNN model was used to recognize systolic murmurs in the current study. The structure of TAP-CRNN is shown in Fig. 1. The phonocardiogram (PCG) signals were first converted into the spectral domain using a short-time Fourier transform (STFT) with a frame length of 512 and a frame shift of 256, so that each frame of the PCG signal is represented by a 257-dimensional log-power spectral feature vector. For training the TAP-CRNN model, each input signal was labeled as a systolic murmur or a normal signal. The model consists of four parts: convolutional, recurrent, temporal attentive pooling (TAP), and dense layers. The convolutional layers extract invariant spatial–temporal representations from the spectral features. The recurrent layers then extract long temporal-context information from these representations. The TAP layers assign an importance weight to each frame in the systolic regions. Finally, the dense layers generate the classification results from the temporal attentive pooling features output by the TAP layers.
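
To make the feature-extraction step concrete, the following is a minimal sketch, assuming the recordings are mono WAV files and using librosa for the STFT; the file name is hypothetical, and this is an illustration rather than the authors' implementation.

```python
# Minimal sketch of the feature extraction described above (assumption:
# librosa for I/O and STFT; the file name is hypothetical).
import numpy as np
import librosa

def log_power_spectrum(wav_path, n_fft=512, hop_length=256):
    """Convert a PCG recording into 257-dimensional log-power spectral frames."""
    signal, sr = librosa.load(wav_path, sr=None, mono=True)  # keep native rate
    # STFT with frame length 512 and frame shift 256, as in the paper
    spec = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)
    log_power = np.log(np.abs(spec) ** 2 + 1e-10)  # small epsilon avoids log(0)
    return log_power.T  # shape (N frames, 257), one feature vector per frame

features = log_power_spectrum("pcg_recording.wav")  # hypothetical file
print(features.shape)
```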

Figure 1

The structure of the TAP-CRNN model. In the first step, the STFT is used to transform the phonocardiogram (PCG) signals into spectral features. In the second step, the CNN extracts invariant spatial–temporal representations from the spectral features. In the third step, the RNN extracts long temporal-context information from the representations for classification. Finally, in the fourth step, TAP assigns importance weights to each frame in the systolic regions. STFT: short-time Fourier transform; LSTM: long short-term memory; TAP: temporal attentive pooling.

The performance of the algorithm for systolic murmur recognition was analyzed in two different tasks: a train–test split and K-fold cross-validation. Table 4 shows the performance of TAP-CRNN for systolic murmur recognition in the train–test split task. CNN and CRNN models for systolic murmur recognition were also analyzed for comparison. The CNN model comprised three convolutional layers, where the first layer consisted of 32 filters with a kernel size of 1 × 4, the second layer included 32 filters with a kernel size of 1 × 4, and the third layer contained 32 filters with a kernel size of 4 × 4; the model also comprised two dense layers, each composed of 512 neurons. The CRNN model comprised two convolutional layers, each consisting of 16 filters with a kernel size of 1 × 4; two recurrent layers (long short-term memory units), each including 256 neurons; and two dense layers, each containing 256 neurons. Compared with the CRNN architecture, TAP-CRNN comprises an additional TAP layer. Hyperbolic tangent units were used in all the models, and a softmax unit was used in the last output layer. Adaptive moment estimation (Adam)28 was used as the optimizer. For the train–test split task, the entire dataset was divided into 70% (191 normal sounds and 351 systolic murmur sounds) for training the murmur recognition models and 30% (60 normal sounds and 178 systolic murmur sounds) for testing their performance. For this task, the sensitivity and specificity scores were 88% and 85% for CNN, 92% and 93% for CRNN, and 97% and 98% for TAP-CRNN, respectively. The 2 × 2 tables of positive and negative events are shown in Supplementary Tables 1–3. The receiver operating characteristic (ROC) curves of CNN, CRNN, and TAP-CRNN are shown in Fig. 2. The results show that the TAP-CRNN model achieves better accuracy for systolic murmur recognition than the CNN and CRNN models.

The K-fold cross-validation task was used to further verify the reliability of the system performance. We conducted experiments using a fourfold (K = 4) setup. We first divided the entire set of PCG data into four groups, with roughly equal numbers of VSD patients and normal subjects assigned to each group. We used data from three of these four groups for training the TAP-CRNN model and the remaining group for testing, with no overlapping subjects between the training and test sets. We carried out this procedure four times; the results are listed in Table 5. The fourfold results are quite consistent and share the same trends as the results of the train–test split task reported in Table 4. The average sensitivity and specificity scores over the four folds were 97.18% and 91.98% for TAP-CRNN, confirming that the proposed model can reliably produce satisfactory results for all evaluation metrics.
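
As an illustration of the CRNN baseline described above, the following PyTorch sketch wires up the stated layer sizes (two 16-filter 1 × 4 convolutional layers, two 256-unit LSTM layers, and two 256-unit dense layers with tanh activations). The layer ordering and the orientation of the 1 × 4 kernels along the frequency axis are our assumptions, not the authors' exact code.

```python
# Sketch of the CRNN baseline under stated assumptions; not the authors' code.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_freq=257, n_classes=2):
        super().__init__()
        # Input: (batch, 1, time, freq); 1x4 kernels slide along the frequency axis
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 4)), nn.Tanh(),
            nn.Conv2d(16, 16, kernel_size=(1, 4)), nn.Tanh(),
        )
        conv_out = 16 * (n_freq - 3 - 3)  # frequency bins left after two valid 1x4 convs
        self.rnn = nn.LSTM(conv_out, 256, num_layers=2, batch_first=True)
        self.dense = nn.Sequential(
            nn.Linear(256, 256), nn.Tanh(),
            nn.Linear(256, n_classes),  # softmax applied at the output / in the loss
        )

    def forward(self, x):                     # x: (batch, time, freq)
        z = self.conv(x.unsqueeze(1))         # (batch, 16, time, freq')
        z = z.permute(0, 2, 1, 3).flatten(2)  # (batch, time, 16 * freq')
        h, _ = self.rnn(z)                    # (batch, time, 256)
        return self.dense(h[:, -1])           # classify from the last time step

model = CRNN()
logits = model(torch.randn(8, 100, 257))      # 8 clips, 100 frames each
probs = torch.softmax(logits, dim=-1)
```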

Table 4 Results of testing the algorithm’s ability to distinguish systolic murmurs from normal heart sounds.
Figure 2

The ROC curves of the CNN, CRNN, and TAP-CRNN models.

Table 5 Results of fourfold cross validation of TAP-CRNN.

The capability of the TAP-CRNN model to recognize systolic murmurs at the five standard auscultation sites was also analyzed (Table 6, Supplementary Tables 4–8). Both the second aortic area and the tricuspid area showed 100% sensitivity and 100% specificity. Sensitivity was lower at the other sites, namely the aortic (95.5%), pulmonic (94.1%), and mitral (94.1%) areas.

Table 6 Test results of the TAP-CRNN model’s ability to distinguish systolic murmurs from normal heart sounds at the five standard auscultation sites.

Discussion

A murmur is a sound generated by the turbulent blood flow in the heart. Under normal conditions, the blood flow in a vascular bed is smooth and silent. However, blood flow can be turbulent and produce extra noise when the heart has a structural defect29. Murmurs can be classified based on their timing, duration, intensity, pitch, and shape. Specific murmur patterns may occur as a result of many types of structural heart diseases14. For example, holosystolic murmurs, which are characterized by uniform intensity during the systolic period, usually appear in patients with mitral regurgitation (MR), tricuspid regurgitation (TR), or VSD30,31,32. Murmurs that occur during the systolic period with a crescendo-decrescendo shape are called systolic ejection murmurs and are often heard in patients with aortic stenosis (AS), pulmonic stenosis (PS), and atrial septal defect (ASD)30. Experienced cardiologists may successfully distinguish these specific heart sound patterns during routine auscultation, and this capability is important in disease diagnosis. However, it is always a challenge for young and inexperienced physicians to make a correct diagnosis based on auscultation17,33. Therefore, the development of tools that can automatically classify specific murmur types is necessary and clinically significant34,35.

In recent years, CNNs have been widely used in computer-aided diagnosis36,37. Previous studies have used CNNs to classify pathological heart sounds38,39. The recurrent neural network (RNN) is another model frequently used in computer-aided diagnosis40,41. In this study, we combined CNN and RNN models into a CRNN model to recognize the systolic murmurs of VSD patients. We used the convolutional unit to extract invariant spatial–temporal representations and the recurrent unit to capture long temporal-context information for systolic murmur recognition. In addition, the TAP mechanism was applied in the CRNN model to assign an importance weight to each frame within the murmur regions; the overall model is called TAP-CRNN. In our experiments, the TAP-CRNN model demonstrated an accuracy of 96% in distinguishing systolic murmurs from normal heart sounds, outperforming both the CNN and the CRNN without TAP.

For heart sounds recorded in the tricuspid and second aortic areas (Erb’s point), both the sensitivity and specificity reached 100% when the TAP-CRNN model was used. A high accuracy in these two areas is reasonable because the murmurs caused by the blood flow between the right and left ventricles can be most clearly heard in the tricuspid area or the lower left sternal border, which overlies the defect42.

The intensity of the murmur is inversely proportional to the size of the VSD. The ability of the algorithm to recognize the murmurs caused by moderate or large VSDs was also tested in the current study. The test set included 63 soundtracks from 6 patients with moderate/large VSDs. Using the TAP-CRNN model, the murmurs in these soundtracks were accurately recognized, except for two soundtracks recorded in the mitral area. Although the results obtained by TAP-CRNN are encouraging, we will further test its performance on a larger dataset of heart sounds in the future.

This study has several limitations. The major limitation is that this study focused on the specific heart sound patterns of VSD and did not consider other types of structural heart disease. Although heart murmurs can be heard in many other congenital and valvular heart diseases, such as atrial septal defect, patent ductus arteriosus, mitral regurgitation, and aortic regurgitation, patients with these diseases were not included in this study. Innocent (harmless) heart murmurs, which occasionally occur in normal subjects, were also not included43,44,45. A larger heart sound database is currently being established to comprehensively collect heart sounds from patients with all types of structural heart disease. An advanced version of the proposed TAP-CRNN algorithm that can recognize the specific murmur types of these diseases is also under development.

Conclusions

We demonstrated that a TAP-CRNN model can accurately recognize the systolic murmurs of VSD patients. Compared with the CNN and the CRNN without TAP, the proposed TAP-CRNN achieves higher sensitivity and specificity scores for systolic murmur detection in patients with VSDs. The results suggest that by incorporating the attention mechanism, the CRNN-based model can more accurately detect murmur signals. We also noted that sounds recorded from the second aortic and tricuspid areas yield more accurate murmur detection than those from other auscultation sites. The experimental results of the present study confirm that the proposed TAP-CRNN is a promising model for the development of software to classify the heart murmurs of many other types of structural heart disease.

Methods

In this section, we introduce our data source, algorithm, and analysis method.

Data source

The sound dataset used in this study comprised heart sounds recorded from subjects at the National Taiwan University Hospital (NTUH) using an iMediPlus electronic stethoscope. This study was approved by the research ethics committee of NTUH, and informed consent was obtained from all subjects or, for subjects under 18 years of age, from a parent and/or legal guardian, in accordance with the Declaration of Helsinki. All methods were carried out in accordance with the relevant guidelines and regulations.

Sounds from patients diagnosed with VSD were categorized as the VSD group, and sounds from patients without a significant heart malformation were categorized as the normal group. Auscultation was performed on each subject by a cardiologist with 30 years of experience to confirm whether a pathological systolic murmur was present in patients with VSD. Normal subjects with innocent murmurs were not included in this study. Echocardiography was conducted on all subjects to confirm the diagnosis46.

For each subject, two repeated heart sound recordings lasting 10 s each were made at each of the following sites: the aortic area (the second intercostal space on the right sternal border), the pulmonic area (the second intercostal space on the left sternal border), the second aortic area/Erb’s point (the third intercostal space on the left sternal border), the tricuspid area (the fourth intercostal space on the left sternal border), and the mitral area/apex (the fifth intercostal space at the left midclavicular line)30,47. The sounds were recorded by trained study nurses under the supervision of an experienced cardiologist. The soundtracks were saved as WAV files.

The soundtracks collected were divided into training and test sets. Notably, the training and test sets were mutually exclusive, with no overlapping subjects.
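
As an illustration of such a subject-exclusive split, the sketch below uses scikit-learn's group-aware splitters; the array names and contents are hypothetical. GroupKFold can be used in the same way for the fourfold cross-validation described in the Results.

```python
# Sketch of a subject-exclusive train/test split (assumption: one entry per
# soundtrack, with `subject_ids` giving each recording's subject index).
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, GroupKFold

features = np.random.randn(776, 100, 257)         # hypothetical soundtrack features
labels = np.random.randint(0, 2, size=776)        # 1 = systolic murmur, 0 = normal
subject_ids = np.random.randint(0, 76, size=776)  # hypothetical subject index

# ~70/30 split with no subject appearing in both sets
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(features, labels, groups=subject_ids))
assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])

# Fourfold cross-validation with subject-exclusive folds
for tr, te in GroupKFold(n_splits=4).split(features, labels, groups=subject_ids):
    pass  # train on tr, evaluate on te
```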

Algorithm characteristics

In this study, a short-time Fourier transform (STFT) was used to transform the phonocardiogram (PCG) signal into a time–frequency representation (spectral features), where \(\mathbf{X} = \left[ \mathbf{x}(1), \ldots, \mathbf{x}(n), \ldots, \mathbf{x}(N) \right]\) denotes the input feature and N is the number of frames of X. Each frame is represented by a 257-dimensional log-power spectral feature vector. The collection of frames in X forms a spectrogram, which is generally used to visualize how the characteristics of a temporal signal vary over time (Fig. 3). In this study, the TAP-CRNN structure was used for classification27. Figure 1 shows the network architecture of TAP-CRNN, in which convolutional layers48 are used to extract invariant frequency-shift features \(\mathbf{Y} = \left[ \mathbf{y}(1), \ldots, \mathbf{y}(n), \ldots, \mathbf{y}(N) \right]\), recurrent layers49 are used to extract the global temporal feature \(\mathbf{h}(N)\) of a sequence from the recurrent layers’ outputs, and the TAP layers then extract the temporal attentive feature and weight the spectral features when generating the classification results. Figure 4 shows the CRNN with the TAP mechanism. The idea here is to focus on important features or regions by introducing attention blocks. Two different attention approaches, global and local, were used to exploit the effectiveness of the TAP mechanism. The back-propagation algorithm is adopted to train the TAP-CRNN parameters to minimize the cross entropy50. With global attention, the model attends equally to all regions, whereas local attention focuses on small regions. The idea of global attention is to consider all outputs of the convolutional layer together with the temporal summarization of the output of the recurrent layer. For the global attention of the TAP, we employ a simple concatenation layer to construct the global attentive vector \(\mathbf{c}(n)\) by combining the information from the output of the convolutional layer \(\mathbf{y}(n)\) and the output of the recurrent layer \(\mathbf{h}(N)\), as follows:

$$\mathbf{c}(n) = \begin{bmatrix} \mathbf{W}_{c}\, \mathbf{y}(n) \\ \mathbf{W}_{r}\, \mathbf{h}(N) \end{bmatrix},$$
(1)

where \(\mathbf{h}(N)\) is the output of the recurrent layer at the last time step, and \(\mathbf{W}_{c}\) and \(\mathbf{W}_{r}\) are the parameter matrices used to transform \(\mathbf{y}(n)\) and \(\mathbf{h}(N)\) before concatenation, i.e., \(\mathbf{W}_{c} \in R^{cnn_{dim} \times cnn_{dim}}\) and \(\mathbf{W}_{r} \in R^{rnn_{dim} \times rnn_{dim}}\), where \(cnn_{dim}\) and \(rnn_{dim}\) are the output dimensions of the convolutional and recurrent layers, respectively.

The global attentive vector \(\mathbf{c}(n)\) is subsequently fed into the global attention block to produce the global attention weight \(\alpha_{global}(n)\) (a scalar), as follows:

$$\alpha_{global}(n) = softmax\left( \mathbf{u}^{T} \tanh\left( \mathbf{c}(n) + \mathbf{b}_{global} \right) \right),$$
(2)

where \(\mathbf{u} \in R^{\left( cnn_{dim} + rnn_{dim} \right) \times 1}\) is the vector used to calculate the global attention weights and is shared by all time steps, and \(\mathbf{b}_{global} \in R^{\left( cnn_{dim} + rnn_{dim} \right) \times 1}\) is the global bias vector. The global attention weights are used to weight the local features from the convolutional layer at each time step as follows:

$$\mathbf{z}(n) = \alpha_{global}(n)\, \mathbf{y}(n),$$
(3)
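
For illustration, a minimal PyTorch sketch of the global attention computation in Eqs. (1)–(3) is given below; the variable names are ours, and the softmax is assumed to normalize over the N time steps.

```python
# Sketch of the global attention in Eqs. (1)-(3), under stated assumptions.
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    def __init__(self, cnn_dim, rnn_dim):
        super().__init__()
        self.W_c = nn.Linear(cnn_dim, cnn_dim, bias=False)            # W_c in Eq. (1)
        self.W_r = nn.Linear(rnn_dim, rnn_dim, bias=False)            # W_r in Eq. (1)
        self.b_global = nn.Parameter(torch.zeros(cnn_dim + rnn_dim))  # b_global in Eq. (2)
        self.u = nn.Linear(cnn_dim + rnn_dim, 1, bias=False)          # u in Eq. (2)

    def forward(self, y, h_N):
        # y: (batch, N, cnn_dim) convolutional outputs
        # h_N: (batch, rnn_dim) last-time-step output of the recurrent layer
        h_rep = self.W_r(h_N).unsqueeze(1).expand(-1, y.size(1), -1)
        c = torch.cat([self.W_c(y), h_rep], dim=-1)                   # Eq. (1)
        scores = self.u(torch.tanh(c + self.b_global)).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)                          # Eq. (2)
        z = alpha.unsqueeze(-1) * y                                   # Eq. (3)
        return z, alpha
```
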
Figure 3

Spectrograms of heart sounds from normal subjects (a) and subjects with VSD (b). The spectrogram shows how the spectrum of a sound varies with time. S1 (empty triangle) and S2 (solid triangle) are visible in the spectrogram of the normal heart sound. A systolic murmur (white arrow) is visible in the spectrogram of the VSD heart sound.

Figure 4

The mechanism of TAP-CRNN.

In addition to the global attention, local attention is used to further refine the feature extraction and is calculated as follows:

$$\beta_{local}(n) = softmax\left( \mathbf{v}^{T} \tanh\left( \mathbf{W}_{l}\, \mathbf{z}(n) + \mathbf{b}_{l} \right) \right),$$
(4)

where \(\mathbf{W}_{l} \in R^{cnn_{dim} \times cnn_{dim}}\), \(\mathbf{b}_{l} \in R^{cnn_{dim} \times 1}\), and \(\mathbf{v} \in R^{cnn_{dim} \times 1}\) are the parameters used for the local attention weight calculation. These local attention weights are used to weight the features as follows:

$$\mathbf{f}(n) = \alpha_{global}(n)\, \beta_{local}(n)\, \mathbf{y}(n),$$
(5)

where \(\beta_{local}(n)\) is the local attention weight. The final attentive context is calculated as the average of the weighted outputs, as follows:

$$\hat{\mathbf{f}} = \frac{1}{N} \sum_{n=1}^{N} \alpha_{global}(n)\, \beta_{local}(n)\, \mathbf{y}(n).$$
(6)

After obtaining the attentive context \(\hat{\mathbf{f}}\), we concatenate it with the last time-step output \(\mathbf{h}(N)\) of the CRNN to form the input \(\mathbf{s}\) of the dense layers, as follows:

$$\mathbf{s} = \begin{bmatrix} \hat{\mathbf{f}} \\ \mathbf{W}_{g}\, \mathbf{h}(N) \end{bmatrix},$$
(7)

where \(\mathbf{W}_{g}\) is a trainable parameter matrix. The dense layers are constructed using fully connected units. The relationship between the feature \(\mathbf{s}\) and the output of the first hidden layer is described as follows:

$$\mathbf{a}_{1} = F\left( \mathbf{W}_{1}\, \mathbf{s} + \mathbf{b}_{1} \right),$$
(8)

where \(\mathbf{W}_{1}\) and \(\mathbf{b}_{1}\) are the weight matrix and bias vector of the first layer, and F(·) is the activation function. After obtaining the output of the first hidden layer, the relationship between the current and next hidden layers can be expressed as follows:

$$\mathbf{a}_{l} = F\left( \mathbf{W}_{l}\, \mathbf{a}_{l-1} + \mathbf{b}_{l} \right), \quad l = 2, \ldots, L,$$
(9)

where L is the total number of layers, the L-th being the output layer. The relationship for the classification (output) layer can thus be described as follows:

$$\mathbf{o} = G\left( \mathbf{a}_{L} \right),$$
(10)

where G(·) is the softmax function, and \(\mathbf{o}\) is the final output of TAP-CRNN.
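
The following sketch assembles the remaining TAP steps, Eqs. (4)–(10), reusing the GlobalAttention module from the previous sketch; the layer sizes and names are illustrative assumptions. In a full model, y would be the flattened convolutional output sequence and h_N the last LSTM state, as in the CRNN sketch in the Results section. The per-frame product of the two attention weights is the event presence likelihood (EPL) discussed next.

```python
# Sketch of the local attention, attentive pooling, and dense head in
# Eqs. (4)-(10); reuses GlobalAttention from the earlier snippet.
import torch
import torch.nn as nn

class TAPHead(nn.Module):
    def __init__(self, cnn_dim, rnn_dim, hidden=256, n_classes=2):
        super().__init__()
        self.global_att = GlobalAttention(cnn_dim, rnn_dim)    # Eqs. (1)-(3)
        self.W_l = nn.Linear(cnn_dim, cnn_dim, bias=True)      # W_l, b_l in Eq. (4)
        self.v = nn.Linear(cnn_dim, 1, bias=False)             # v in Eq. (4)
        self.W_g = nn.Linear(rnn_dim, rnn_dim, bias=False)     # W_g in Eq. (7)
        self.dense = nn.Sequential(                            # Eqs. (8)-(10)
            nn.Linear(cnn_dim + rnn_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, y, h_N):
        z, alpha = self.global_att(y, h_N)
        beta = torch.softmax(
            self.v(torch.tanh(self.W_l(z))).squeeze(-1), dim=1)  # Eq. (4)
        epl = alpha * beta                          # frame-based EPL, see text
        f_hat = (epl.unsqueeze(-1) * y).mean(dim=1)              # Eqs. (5)-(6)
        s = torch.cat([f_hat, self.W_g(h_N)], dim=-1)            # Eq. (7)
        return torch.softmax(self.dense(s), dim=-1), epl         # Eq. (10)
```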

The importance coefficients provided by the global and local attention, i.e., \(\alpha_{global}(n)\, \beta_{local}(n)\), were regarded as a frame-based event presence likelihood (EPL). When determining the classification result, frames with low EPLs are suppressed, while frames with high EPLs are emphasized. Figure 5 illustrates the spectrogram (Fig. 5a) and the EPL scores (Fig. 5b) of heart sounds from subjects with VSD; the murmur regions show high EPLs when the product of the global and local attention coefficients is calculated. The features of murmur regions with a high EPL are thus emphasized during feature extraction.

Figure 5

(a) The spectrogram of heart sounds from subjects with VSD, and (b) the product of the global and local attention coefficients.

Statistical analysis

The sex distribution, mean age, mean height, and mean weight of the subjects were calculated. An independent-sample t-test and a chi-square test were used to compare the VSD and normal groups in terms of continuous and categorical variables, respectively.
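
A minimal sketch of these comparisons using scipy is shown below; the data file and column names are hypothetical.

```python
# Sketch of the group comparisons (assumption: subject-level data in a CSV
# with hypothetical columns "group", "age", and "sex").
import pandas as pd
from scipy import stats

df = pd.read_csv("subjects.csv")  # hypothetical file
vsd = df[df["group"] == "VSD"]
normal = df[df["group"] == "normal"]

# Independent-sample t-test for a continuous variable, e.g., age
t_stat, p_age = stats.ttest_ind(vsd["age"], normal["age"])

# Chi-square test for a categorical variable, e.g., sex distribution
chi2, p_sex, dof, _ = stats.chi2_contingency(pd.crosstab(df["group"], df["sex"]))

print(f"age: p = {p_age:.3f}; sex: p = {p_sex:.3f}")
```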

The soundtracks in the test set were used to evaluate the recognition performance. The accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for distinguishing the systolic murmurs of VSD patients from the normal heart sounds of healthy volunteers were calculated51,52,53. Diagnosis by echocardiography was used as the gold standard for these calculations9,10,54. The equations are as follows:

$$Accuracy = \frac{Tp + Tn}{Tp + Tn + Fp + Fn}$$
(11)
$$Sensitivity = \frac{Tp}{Tp + Fn}$$
(12)
$$Specificity = \frac{Tn}{Fp + Tn}$$
(13)
$$PPV = \frac{Tp}{Tp + Fp}$$
(14)
$$NPV = \frac{Tn}{Fn + Tn}$$
(15)

where Tp indicates a true positive, Tn indicates a true negative, Fp indicates a false positive, and Fn indicates a false negative.
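
The following sketch implements Eqs. (11)–(15) directly from binary labels and predictions; the example inputs are illustrative.

```python
# Direct implementation of Eqs. (11)-(15); example inputs are illustrative.
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (11)
        "sensitivity": tp / (tp + fn),                # Eq. (12)
        "specificity": tn / (fp + tn),                # Eq. (13)
        "PPV": tp / (tp + fp),                        # Eq. (14)
        "NPV": tn / (fn + tn),                        # Eq. (15)
    }

print(metrics(*confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])))
```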