Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features
https://doi.org/10.35596/1729-7648-2024-22-3-93-100
Abstract
This paper studies emotion recognition in speech signals using mel-frequency cepstral coefficients and a classifier based on the support vector machine. The RAVDESS data set was used in the experiments. The proposed model feeds a 306-component suprasegmental feature vector to a support vector machine classifier. Model quality was assessed using unweighted average recall (UAR). Linear, polynomial and radial basis function kernels were considered for the classifier, and signal analysis frame sizes from 23 to 341 ms were investigated at the stage of extracting mel-frequency cepstral coefficients. The resulting model achieved UAR = 48 %. The proposed approach shows potential for applications such as voice assistants, virtual agents, and mental health diagnostics.
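As a reading aid for the UAR figure quoted in the abstract, the sketch below computes unweighted average recall under its usual definition: the mean of the per-class recalls, so that every emotion class contributes equally regardless of how many recordings it has. The function name and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper or from RAVDESS.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognized samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, deliberately imbalanced labels invented for illustration (not RAVDESS data).
y_true = ["angry"] * 6 + ["sad"] * 2 + ["happy"] * 2
y_pred = ["angry"] * 6 + ["sad", "angry"] + ["angry", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Baseline: a classifier that always answers with the majority class.
uar_constant = unweighted_average_recall(y_true, ["angry"] * len(y_true))
```

In this toy case plain accuracy is 0.7 while UAR is only 0.5, because the two small classes are recognized poorly. A classifier that always predicts a single class out of K gets a UAR of 1/K, which is the natural baseline when reading a reported UAR value.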
About the Authors
D. V. Krasnoproshin
Belarus
Master’s Student at the Department of Electronic Computing Facilities
220013, Minsk, P. Brovki St., 6
M. I. Vashkevich
Belarus
Vashkevich Maxim Iosifovich, Dr. of Sci. (Tech.), Professor at the Department of Electronic Computing Facilities
220013, Minsk, P. Brovki St., 6
Tel.: +375 17 293-84-78
For citations:
Krasnoproshin D.V., Vashkevich M.I. Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features. Doklady BGUIR. 2024;22(3):93-100. (In Russ.) https://doi.org/10.35596/1729-7648-2024-22-3-93-100