Doklady BGUIR

Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features

https://doi.org/10.35596/1729-7648-2024-22-3-93-100

Abstract

The problem of recognizing emotions in a speech signal from mel-frequency cepstral coefficients with a classifier based on the support vector machine is studied. The RAVDESS dataset was used in the experiments. A model is proposed that feeds a 306-component suprasegmental feature vector to a support vector machine classifier. Model quality was assessed using the unweighted average recall (UAR). The use of linear, polynomial and radial basis function kernels in the support-vector-machine-based classifier is considered. The effect of the signal analysis frame size (from 23 to 341 ms) at the stage of extracting mel-frequency cepstral coefficients was also investigated. The resulting model achieved UAR = 48 %. The proposed approach shows potential for applications such as voice assistants, virtual agents, and mental health diagnostics.
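The two key ingredients named in the abstract can be illustrated with a minimal sketch: collapsing per-frame MFCC values into a fixed-length suprasegmental (utterance-level) vector of statistics, and scoring a classifier with unweighted average recall. This is not the authors' code; the particular statistics chosen below are a hypothetical example, and the paper's exact 306-component feature set is not reproduced here.

```python
import statistics
from collections import defaultdict

def suprasegmental_stats(frame_values):
    """Collapse a variable-length sequence of per-frame values of one
    MFCC coefficient into fixed-length utterance-level statistics.
    The choice of functionals here (mean, std, min, max) is illustrative
    only; the paper's 306-component vector is not specified on this page."""
    return [statistics.mean(frame_values),
            statistics.stdev(frame_values),
            min(frame_values),
            max(frame_values)]

def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls, so every emotion class
    contributes equally regardless of how many samples it has."""
    totals = defaultdict(int)   # samples per true class
    correct = defaultdict(int)  # correctly recognized per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical labels for two emotion classes:
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "sad",   "sad"]
print(round(unweighted_average_recall(y_true, y_pred), 4))  # → 0.8333
```

Note that UAR differs from plain accuracy on class-imbalanced data: here accuracy is 0.75, while UAR averages the recalls 2/3 ("happy") and 1.0 ("sad"), which is why it is the customary metric for emotion corpora with uneven class sizes.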

About the Authors

D. V. Krasnoproshin
Belarusian State University of Informatics and Radioelectronics
Belarus

Master’s Student at the Department of Electronic Computing Facilities

220013, Minsk, P. Brovki St., 6



M. I. Vashkevich
Belarusian State University of Informatics and Radioelectronics
Belarus

Vashkevich Maxim Iosifovich, Dr. of Sci. (Tech.), Professor at the Department of Electronic Computing Facilities

220013, Minsk, P. Brovki St., 6

Tel.: +375 17 293-84-78



References

1. Issa D., Demirci F. M., Yazici A. (2020) Speech Emotion Recognition with Deep Convolutional Neural Networks. Biomedical Signal Processing and Control. 59.

2. Luna-Jiménez C., Griol D., Callejas Z., Kleinlein R., Montero J. M., Fernández-Martínez F. (2021) Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors. 21 (22), 1–29.

3. Xiao H., Li W., Zeng G., Wu Y., Xue J., Zhang J., et al. (2022) On-Road Driver Emotion Recognition Using Facial Expression. Applied Sciences. 12.

4. Sadok S., Leglaive S., Séguier R. (2023) A Vector Quantized Masked Autoencoder for Speech Emotion Recognition. arXiv Preprint. arXiv:2304.

5. Bhavan A., Chauhan P., Shah R. R. (2019) Bagged Support Vector Machines for Emotion Recognition from Speech. Knowledge-Based Systems. 184, 1–7.

6. Baruah M., Banerjee B. (2022) Speech Emotion Recognition via Generation Using an Attention-Based Variational Recurrent Neural Network. Proc. Interspeech. 4710–4714.

7. Yu C., Tian Q., Cheng F., Zhang S. (2011) Speech Emotion Recognition Using Support Vector Machines. Advanced Research on Computer Science and Information Engineering. Communications in Computer and Information Science. 152.

8. Huang X., Acero A., Hon H.-W., Foreword By-Reddy R. (2001) Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR.

9. Hastie T., Tibshirani R., Friedman J. H. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

10. On C. K., Pandiyan P. M., Yaacob S., Saudi A. (2006) Mel-Frequency Cepstral Coefficient Analysis in Speech Recognition. In 2006 International Conference on Computing & Informatics. 1–5.

11. Livingstone S. R., Russo F. A. (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English. PloS One. 13 (5).

12. Luna-Jiménez C., Griol D., Callejas Z., Kleinlein R., Montero J. M., Fernández-Martínez F. (2021) Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors. 21.


For citations:

Krasnoproshin D.V., Vashkevich M.I. Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features. Doklady BGUIR. 2024;22(3):93-100. (In Russ.) https://doi.org/10.35596/1729-7648-2024-22-3-93-100


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)