<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">bsuir</journal-id><journal-title-group><journal-title xml:lang="ru">Доклады БГУИР</journal-title><trans-title-group xml:lang="en"><trans-title>Doklady BGUIR</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1729-7648</issn><issn pub-type="epub">2708-0382</issn><publisher><publisher-name>БГУИР</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.35596/1729-7648-2024-22-3-93-100</article-id><article-id custom-type="elpub" pub-id-type="custom">bsuir-3938</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Метод распознавания эмоций в речевом сигнале с использованием машины опорных векторов и надсегментных акустических признаков</article-title><trans-title-group xml:lang="en"><trans-title>Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Краснопрошин</surname><given-names>Д. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Krasnoproshin</surname><given-names>D. V.</given-names></name></name-alternatives><bio xml:lang="ru"><p>магистрант каф. электронных вычислительных средств</p><p>220013, г. Минск, ул. П. Бровки, 6</p></bio><bio xml:lang="en"><p>Master’s Student at the Department of Electronic Computing Facilities</p><p>220013, Minsk, P. 
Brovki St., 6</p></bio><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Вашкевич</surname><given-names>М. И.</given-names></name><name name-style="western" xml:lang="en"><surname>Vashkevich</surname><given-names>M. I.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Вашкевич Максим Иосифович, д-р техн. наук, проф. каф. электронных вычислительных средств</p><p>220013, г. Минск, ул. П. Бровки, 6</p><p>Тел.: +375 17 293-84-78</p></bio><bio xml:lang="en"><p>Vashkevich Maxim Iosifovich, Dr. of Sci. (Tech.), Professor at the Department of Electronic Computing Facilities</p><p>220013, Minsk, P. Brovki St., 6</p><p>Tel.: +375 17 293-84-78</p></bio><email xlink:type="simple">vashkevich@bsuir.by</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Белорусский государственный университет информатики и радиоэлектроники</institution></aff><aff xml:lang="en"><institution>Belarusian State University of Informatics and Radioelectronics</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2024</year></pub-date><pub-date pub-type="epub"><day>24</day><month>06</month><year>2024</year></pub-date><volume>22</volume><issue>3</issue><fpage>93</fpage><lpage>100</lpage><permissions><copyright-statement>Copyright &#x00A9; Краснопрошин Д.В., Вашкевич М.И., 2024</copyright-statement><copyright-year>2024</copyright-year><copyright-holder xml:lang="ru">Краснопрошин Д.В., Вашкевич М.И.</copyright-holder><copyright-holder xml:lang="en">Krasnoproshin D.V., Vashkevich M.I.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" 
license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://doklady.bsuir.by/jour/article/view/3938">https://doklady.bsuir.by/jour/article/view/3938</self-uri><abstract><p>Исследована задача распознавания эмоций в речевом сигнале с использованием мел-частотных кепстральных коэффициентов при помощи классификатора на основе метода опорных векторов. При проведении экспериментов применялся набор данных RAVDESS. Предложена модель, которая использует 306-компонентный вектор надсегментных признаков в качестве входных данных для классификатора на основе метода опорных векторов. Оценка качества модели проводилась с помощью невзвешенного среднего значения полноты (UAR). Рассмотрено применение в классификаторе на основе метода опорных векторов в качестве ядра линейной, полиномиальной и радиальной базисной функций. Исследовано использование разных размеров фрейма анализа сигнала (от 23 до 341 мс) на этапе извлечения мел-частотных кепстральных коэффициентов. Результаты исследований выявили значительную точность полученной модели (UAR = 48 %). Предлагаемый подход демонстрирует потенциал для таких приложений, как голосовые помощники, виртуальные агенты и диагностика психического здоровья.</p></abstract><trans-abstract xml:lang="en"><p>Speech emotion recognition using mel-frequency cepstral coefficients and a support vector machine classifier was studied. The RAVDESS dataset was used in the experiments. A model is proposed that uses a 306-component suprasegmental feature vector as input to a support vector machine classifier. Model quality was assessed using unweighted average recall (UAR). 
The use of linear, polynomial, and radial basis function kernels in the support vector machine classifier is considered. Different signal analysis frame sizes (from 23 to 341 ms) at the mel-frequency cepstral coefficient extraction stage were investigated. The resulting model achieved a UAR of 48 %. The proposed approach shows potential for applications such as voice assistants, virtual agents, and mental health diagnostics.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>голосовой сигнал</kwd><kwd>мел-частотные кепстральные коэффициенты</kwd><kwd>извлечение аудиопризнаков</kwd><kwd>распознавание</kwd><kwd>машинное обучение</kwd></kwd-group><kwd-group xml:lang="en"><kwd>voice signal</kwd><kwd>mel-frequency cepstral coefficients</kwd><kwd>audio feature extraction</kwd><kwd>recognition</kwd><kwd>machine learning</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Issa D., Demirci F. M., Yazici A. (2020) Speech Emotion Recognition with Deep Convolutional Neural Networks. Biomedical Signal Processing and Control. 59.</mixed-citation><mixed-citation xml:lang="en">Issa D., Demirci F. M., Yazici A. (2020) Speech Emotion Recognition with Deep Convolutional Neural Networks. Biomedical Signal Processing and Control. 59.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Luna-Jiménez C., Griol D., Callejas Z., Kleinlein R., Montero J. M., Fernández-Martínez F. (2021) Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors. 21 (22), 1–29.</mixed-citation><mixed-citation xml:lang="en">Luna-Jiménez C., Griol D., Callejas Z., Kleinlein R., Montero J. M., Fernández-Martínez F. (2021) Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors. 
21 (22), 1–29.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Xiao H., Li W., Zeng G., Wu Y., Xue J., Zhang J., [et al.] (2022) On-Road Driver Emotion Recognition Using Facial Expression. Appl. Sci. 12.</mixed-citation><mixed-citation xml:lang="en">Xiao H., Li W., Zeng G., Wu Y., Xue J., Zhang J., [et al.] (2022) On-Road Driver Emotion Recognition Using Facial Expression. Appl. Sci. 12.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Sadok S., Leglaive S., Séguier R. (2023) A Vector Quantized Masked Autoencoder for Speech Emotion Recognition. arXiv preprint arXiv. 2304.</mixed-citation><mixed-citation xml:lang="en">Sadok S., Leglaive S., Séguier R. (2023) A Vector Quantized Masked Autoencoder for Speech Emotion Recognition. arXiv preprint arXiv. 2304.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Bhavan A., Chauhan P., Shah R. R. (2019) Bagged Support Vector Machines for Emotion Recognition from Speech. Knowledge-Based Systems. 184, 1–7.</mixed-citation><mixed-citation xml:lang="en">Bhavan A., Chauhan P., Shah R. R. (2019) Bagged Support Vector Machines for Emotion Recognition from Speech. Knowledge-Based Systems. 184, 1–7.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Baruah M., Banerjee B. (2022) Speech Emotion Recognition via Generation Using an Attention-Based Variational Recurrent Neural Network. Proc. Interspeech. 4710–4714.</mixed-citation><mixed-citation xml:lang="en">Baruah M., Banerjee B. (2022) Speech Emotion Recognition via Generation Using an Attention-Based Variational Recurrent Neural Network. Proc. Interspeech. 
4710–4714.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Yu C., Tian Q., Cheng F., Zhang S. (2011) Speech Emotion Recognition Using Support Vector Machines. Advanced Research on Computer Science and Information Engineering. Communications in Computer and Information Science. 152.</mixed-citation><mixed-citation xml:lang="en">Yu C., Tian Q., Cheng F., Zhang S. (2011) Speech Emotion Recognition Using Support Vector Machines. Advanced Research on Computer Science and Information Engineering. Communications in Computer and Information Science. 152.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Huang X., Acero A., Hon H.-W. (2001) Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR.</mixed-citation><mixed-citation xml:lang="en">Huang X., Acero A., Hon H.-W. (2001) Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Hastie T., Tibshirani R., Friedman J. H. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.</mixed-citation><mixed-citation xml:lang="en">Hastie T., Tibshirani R., Friedman J. H. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">On C. K., Pandiyan P. M., Yaacob S., Saudi A. (2006) Mel-Frequency Cepstral Coefficient Analysis in Speech Recognition. In 2006 International Conference on Computing &amp; Informatics. 1–5.</mixed-citation><mixed-citation xml:lang="en">On C. 
K., Pandiyan P. M., Yaacob S., Saudi A. (2006) Mel-Frequency Cepstral Coefficient Analysis in Speech Recognition. In 2006 International Conference on Computing &amp; Informatics. 1–5.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Livingstone S. R., Russo F. A. (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English. PloS One. 13 (5).</mixed-citation><mixed-citation xml:lang="en">Livingstone S. R., Russo F. A. (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English. PloS One. 13 (5).</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Luna-Jiménez C., Griol D., Callejas Z., Kleinlein R., Montero J. M., Fernández-Martínez F. (2021) Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors. 21.</mixed-citation><mixed-citation xml:lang="en">Luna-Jiménez C., Griol D., Callejas Z., Kleinlein R., Montero J. M., Fernández-Martínez F. (2021) Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors. 21.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
