INVESTIGATIONS ON APPLICATIONS OF IMPROVED DEEP CONVOLUTIONAL NEURAL NETWORK BASED SPEECH EMOTION DETECTION

Authors

  • Savita Jain, Dr. Tarun Shrimali

Abstract

Speech emotion detection (SED) has become a pivotal research area in human-computer interaction, aiming to improve user experience by enabling machines to understand and respond to human emotions. This paper explores the application of an improved deep convolutional neural network (CNN) to speech emotion detection, with emphasis on advances in network architecture and training methodology.

Traditional SED methods often rely on handcrafted features and conventional machine learning techniques, which are limited in their ability to capture the complex, nuanced characteristics of emotion in speech. To address these limitations, this study proposes an enhanced CNN model that learns relevant features automatically from raw audio signals. The architecture combines multiple convolutional layers, batch normalization, and residual connections to support deeper networks and mitigate the vanishing gradient problem. The model also integrates long short-term memory (LSTM) networks to capture temporal dependencies in the speech data, further improving the accuracy of emotion recognition.
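To illustrate how convolution, batch normalization, and a residual connection fit together, the following is a minimal single-channel sketch in plain Python. It is not the authors' architecture: the function names, kernel values, and the layer order (convolution, normalization, ReLU, convolution, normalization, skip connection, ReLU) are assumptions chosen for illustration, and the LSTM stage is omitted.

```python
import math


def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded so the length is kept."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def batch_norm(batch, eps=1e-5):
    """Normalize one channel using statistics over the whole batch and time axis.

    Assumption: no learned scale or shift (gamma = 1, beta = 0).
    """
    values = [v for seq in batch for v in seq]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in seq] for seq in batch]


def relu(batch):
    return [[max(0.0, v) for v in seq] for seq in batch]


def residual_block(batch, kernel1, kernel2):
    """Hypothetical residual block: two conv + batch-norm stages plus a skip connection."""
    out = [conv1d_same(seq, kernel1) for seq in batch]
    out = relu(batch_norm(out))
    out = [conv1d_same(seq, kernel2) for seq in out]
    out = batch_norm(out)
    # Skip connection: add the block input back to the transformed signal.
    out = [
        [o + x for o, x in zip(o_seq, x_seq)]
        for o_seq, x_seq in zip(out, batch)
    ]
    return relu(out)


if __name__ == "__main__":
    # Two toy "audio" frames of six samples each (made-up values).
    frames = [[0.1, 0.4, -0.2, 0.3, 0.0, 0.5], [0.2, -0.1, 0.6, 0.1, -0.3, 0.4]]
    print(residual_block(frames, [0.25, 0.5, 0.25], [-1.0, 0.0, 1.0]))
```

Because the input is added back after the second normalization, a block whose convolutions contribute nothing still passes its input through unchanged. This identity path is what lets gradients reach earlier layers in deep stacks.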

The experimental results demonstrate that the improved CNN model significantly outperforms traditional methods and existing CNN-based models in terms of accuracy, precision, and recall across multiple standard emotion datasets, including the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The study also highlights the model's robustness in real-world scenarios, where variations in speech intensity, pitch, and background noise are prevalent.
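For reference, the three reported metrics can be computed from predicted and true emotion labels as sketched below. The abstract does not state how precision and recall are averaged across emotion classes, so macro-averaging is an assumption here, and the labels in the example are made up rather than drawn from IEMOCAP or RAVDESS.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision and recall.

    Assumption: each class contributes equally to the averages (macro-averaging).
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A class with no predictions (or no true samples) scores 0.0 by convention here.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
    }


if __name__ == "__main__":
    # Hypothetical labels for four utterances.
    y_true = ["angry", "happy", "sad", "happy"]
    y_pred = ["angry", "sad", "sad", "happy"]
    print(classification_metrics(y_true, y_pred))
```

Macro-averaging weights rare emotions as heavily as frequent ones, which matters when emotion corpora are imbalanced across classes.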

The findings underscore the potential of improved deep CNNs in advancing SED technology, paving the way for more intuitive and emotionally aware human-computer interactions. Future work will focus on optimizing the model for real-time applications and exploring its integration into various domains such as customer service, healthcare, and entertainment.

 

Published

2023-12-10

Section

Articles

How to Cite

INVESTIGATIONS ON APPLICATIONS OF IMPROVED DEEP CONVOLUTIONAL NEURAL NETWORK BASED SPEECH EMOTION DETECTION. (2023). JOURNAL OF BASIC SCIENCE AND ENGINEERING, 20(1), 36-51. https://yigkx.org.cn/index.php/jbse/article/view/185