Deep Learning-Based Heart Sound Classification: A CNN-Transformer Approach Using Mel-Frequency Cepstral Coefficients (Published)
Detecting heart sound anomalies is crucial for the early diagnosis of cardiovascular disorders, particularly in resource-limited settings. We propose a hybrid deep learning architecture that combines a Convolutional Neural Network (CNN) with a Transformer encoder to classify heart sounds as normal or abnormal. Mel-Frequency Cepstral Coefficients (MFCCs) serve as the time-frequency input representation. The model was evaluated against baseline approaches, including conventional CNN and LSTM-based architectures. Our CNN-Transformer model achieved a classification accuracy of 96.35% and an AUC of 0.9922, outperforming the baseline models. The hybrid architecture captures local acoustic patterns through its convolutional layers and models long-range dependencies through self-attention. Confusion matrix analysis and spectrogram visualizations support the model's interpretability and clinical reliability. These findings demonstrate the potential of attention-augmented architectures for automated cardiac auscultation and suggest promising directions for real-time heart sound monitoring systems.
Keywords: Biomedical signal processing, CNN, Deep learning, Heart sound classification, Phonocardiogram, Transformer
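
To make the hybrid idea in the abstract concrete, the following is a minimal NumPy sketch of one forward pass: a 1D convolution over MFCC frames for local acoustic patterns, followed by single-head self-attention for long-range dependencies, then pooling and a sigmoid output. It is an illustration only, not the published model. The layer sizes, kernel width, residual connection, mean pooling, random untrained weights, and the names `conv1d_relu`, `self_attention`, and `classify` are all assumptions made for this example, and the input is random data standing in for real MFCCs.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels, bias):
    """Same-padded 1D convolution along time, then ReLU.

    x: (T, C_in), kernels: (K, C_in, C_out) with odd K, bias: (C_out,)
    """
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # Stack the K time-shifted views so each output frame sees its neighbours.
    windows = np.stack([xp[i:i + x.shape[0]] for i in range(k)], axis=0)
    out = np.einsum("ktc,kcd->td", windows, kernels) + bias
    return np.maximum(out, 0.0)

def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all frames."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def classify(mfcc, params):
    """Return an (untrained) abnormality probability and the attention map."""
    h = conv1d_relu(mfcc, params["conv_w"], params["conv_b"])    # local patterns
    attended, weights = self_attention(
        h, params["wq"], params["wk"], params["wv"])             # global context
    h = h + attended                     # residual connection (assumed)
    pooled = h.mean(axis=0)              # average over time (assumed)
    logit = pooled @ params["out_w"] + params["out_b"]
    return float(1.0 / (1.0 + np.exp(-logit))), weights

# Hypothetical sizes: 200 frames of 13 MFCCs, 16 hidden channels, kernel width 5.
n_frames, n_mfcc, d_model, kernel = 200, 13, 16, 5
params = {
    "conv_w": 0.1 * rng.standard_normal((kernel, n_mfcc, d_model)),
    "conv_b": np.zeros(d_model),
    "wq": 0.1 * rng.standard_normal((d_model, d_model)),
    "wk": 0.1 * rng.standard_normal((d_model, d_model)),
    "wv": 0.1 * rng.standard_normal((d_model, d_model)),
    "out_w": 0.1 * rng.standard_normal(d_model),
    "out_b": 0.0,
}
mfcc = rng.standard_normal((n_frames, n_mfcc))  # placeholder for real MFCCs

prob, attn = classify(mfcc, params)
print(f"P(abnormal) = {prob:.3f}, attention map shape = {attn.shape}")
```

Each row of `attn` is a probability distribution over all frames of the recording, which is the sense in which self-attention lets one frame draw on context from any other frame, whereas the convolution only sees a window of `kernel` neighbouring frames. A trained model would typically use a deep learning framework with multiple layers and heads; this sketch only shows how the two components compose.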