<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">bsuir</journal-id><journal-title-group><journal-title xml:lang="ru">Доклады БГУИР</journal-title><trans-title-group xml:lang="en"><trans-title>Doklady BGUIR</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1729-7648</issn><issn pub-type="epub">2708-0382</issn><publisher><publisher-name>БГУИР</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">bsuir-982</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>СИСТЕМА КЛАССИФИКАЦИИ ЗВУКОВ ОКРУЖАЮЩЕЙ СРЕДЫ</article-title><trans-title-group xml:lang="en"><trans-title>Environmental sound classification system</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Жук</surname><given-names>И. Н.</given-names></name><name name-style="western" xml:lang="en"><surname>Zhuk</surname><given-names>I. 
N.</given-names></name></name-alternatives><email xlink:type="simple">ivan.nikolaevich.zhuk@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Белорусский государственный университет информатики и радиоэлектроники</institution></aff><aff xml:lang="en"><institution>Belarusian State University of Informatics and Radioelectronics</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2018</year></pub-date><pub-date pub-type="epub"><day>03</day><month>06</month><year>2019</year></pub-date><volume>0</volume><issue>3</issue><fpage>54</fpage><lpage>58</lpage><permissions><copyright-statement>Copyright &#x00A9; Жук И.Н., 2019</copyright-statement><copyright-year>2019</copyright-year><copyright-holder xml:lang="ru">Жук И.Н.</copyright-holder><copyright-holder xml:lang="en">Zhuk I.N.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://doklady.bsuir.by/jour/article/view/982">https://doklady.bsuir.by/jour/article/view/982</self-uri><abstract><p>В статье представлена система классификации звуков окружающей среды и результаты сравнения производительности с другими системами на звуковой базе ESC 10. В представленной системе формирование признаков звукового сигнала осуществляется с помощью модели внутреннего уха и импульсов слухового нерва. Классификация звуков осуществляется с помощью различных конфигураций сверточных нейронных сетей. 
Доля правильных ответов классификации значительно выше результатов оригинальной статьи звуковой базы ESC 10.</p></abstract><trans-abstract xml:lang="en"><p>This paper presents an environmental sound classification system and compares its performance with that of other systems on the ESC 10 dataset. Features are extracted from the audio signal using models of the cochlea and the auditory nerve. Classification is performed by several classic convolutional neural network architectures, and the experiments combine these architectures with the proposed feature extraction method. The model outperforms the baseline implementations and achieves results comparable to other state-of-the-art approaches.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>внутреннее ухо</kwd><kwd>формирование признаков</kwd><kwd>классификация звуков</kwd><kwd>сверточные нейронные сети</kwd></kwd-group><kwd-group xml:lang="en"><kwd>ESC 10</kwd><kwd>cochlea</kwd><kwd>auditory nerve spikes</kwd><kwd>feature extraction</kwd><kwd>sound classification</kwd><kwd>convolutional neural networks</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Brian hears: online auditory processing using vectorization over channels / B. Fontaine [et al.] // Front. Neuroinform. 5:9. 2011. doi: 10.3389/fninf.2011.00009.</mixed-citation><mixed-citation xml:lang="en">Brian hears: online auditory processing using vectorization over channels / B. Fontaine [et al.] // Front. Neuroinform. 5:9. 2011. doi: 10.3389/fninf.2011.00009.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Palaz D., Magimai-Doss M., Collobert R. Convolutional Neural Networks-based Continuous Speech Recognition using Raw Speech Signal. Idiap-RR-18-2014.</mixed-citation><mixed-citation xml:lang="en">Palaz D., Magimai-Doss M., Collobert R. 
Convolutional Neural Networks-based Continuous Speech Recognition using Raw Speech Signal. Idiap-RR-18-2014.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI / D. Povey [et al.] // Proc. Interspeech. 2016. P. 2751-2755.</mixed-citation><mixed-citation xml:lang="en">Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI / D. Povey [et al.] // Proc. Interspeech. 2016. P. 2751-2755.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Speaker adaptation of neural network acoustic models using i-vectors / G. Saon [et al.] // ASRU. 2013. P. 55-59.</mixed-citation><mixed-citation xml:lang="en">Speaker adaptation of neural network acoustic models using i-vectors / G. Saon [et al.] // ASRU. 2013. P. 55-59.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Deep Speech 2: End-to-End Speech Recognition in English and Mandarin / Amodei D. [et al.] // arXiv:1512.02595 [cs.CL]. December 2015.</mixed-citation><mixed-citation xml:lang="en">Deep Speech 2: End-to-End Speech Recognition in English and Mandarin / Amodei D. [et al.] // arXiv:1512.02595 [cs.CL]. December 2015.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Goodman D.F., Brette R. The Brian simulator // Front. Neurosci. 3,2:192-197. doi: 10.3389/neuro.01.026.2009.</mixed-citation><mixed-citation xml:lang="en">Goodman D.F., Brette R. The Brian simulator // Front. Neurosci. 3,2:192-197. 
doi: 10.3389/neuro.01.026.2009.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Equation-oriented specification of neural models for simulations / Stimberg M. [et al.] // Frontiers Neuroinf. 2014. doi:10.3389/fninf.2014.00006.</mixed-citation><mixed-citation xml:lang="en">Equation-oriented specification of neural models for simulations / Stimberg M. [et al.] // Frontiers Neuroinf. 2014. doi:10.3389/fninf.2014.00006.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">An auditory-based feature for robust speech recognition / Y. Shao [et al.] // Acoustics, Speech and Signal Processing. April 2009. P. 4625-4628.</mixed-citation><mixed-citation xml:lang="en">An auditory-based feature for robust speech recognition / Y. Shao [et al.] // Acoustics, Speech and Signal Processing. April 2009. P. 4625-4628.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Automatic Speech Recognition with Neural Spike Trains / M.H. Holmberg [et al.] // Interspeech. Lisbon, Portugal, September 4-8, 2006.</mixed-citation><mixed-citation xml:lang="en">Automatic Speech Recognition with Neural Spike Trains / M.H. Holmberg [et al.] // Interspeech. Lisbon, Portugal, September 4-8, 2006.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Ivanov A.V., Likhachov D.S., Petrovsky A.A. Spiking neuron auditory model for speech processing systems // 9th International Workshop on Systems, Signals and Image Processing IWSSIP. Manchester, United Kingdom, 2002.</mixed-citation><mixed-citation xml:lang="en">Ivanov A.V., Likhachov D.S., Petrovsky A.A. Spiking neuron auditory model for speech processing systems // 9th International Workshop on Systems, Signals and Image Processing IWSSIP. 
Manchester, United Kingdom, 2002.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Gerstner W., Kistler W. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.</mixed-citation><mixed-citation xml:lang="en">Gerstner W., Kistler W. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Piczak K.J. ESC: Dataset for Environmental Sound Classification // Proceedings of the 23rd ACM international conference on Multimedia. 2015. P. 1015-1018.</mixed-citation><mixed-citation xml:lang="en">Piczak K.J. ESC: Dataset for Environmental Sound Classification // Proceedings of the 23rd ACM international conference on Multimedia. 2015. P. 1015-1018.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">A real-time environmental sound recognition system for the Android OS / Pillos A. [et al.] // Detection and Classification of Acoustic Scenes and Events. 2016.</mixed-citation><mixed-citation xml:lang="en">A real-time environmental sound recognition system for the Android OS / Pillos A. [et al.] // Detection and Classification of Acoustic Scenes and Events. 2016.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Zeiler M.D. ADADELTA: An Adaptive Learning Rate Method. arXiv:1212.5701v1 [cs.LG]. December 2012.</mixed-citation><mixed-citation xml:lang="en">Zeiler M.D. ADADELTA: An Adaptive Learning Rate Method. arXiv:1212.5701v1 [cs.LG]. December 2012.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Breiman L. 
Random Forests // Machine Learning. Kluwer Academic Publishers, 2001. Vol. 45. P. 5-32. doi: 10.1023/A:1010933404324.</mixed-citation><mixed-citation xml:lang="en">Breiman L. Random Forests // Machine Learning. Kluwer Academic Publishers, 2001. Vol. 45. P. 5-32. doi: 10.1023/A:1010933404324.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The author declares no conflict of interest.</p></fn></fn-group></back></article>
