bsuir

Doklady BGUIR

1729-7648, 2708-0382

BSUIR

bsuir-982

Research Article

Articles

Environmental Sound Classification System

Zhuk I. N.

ivan.nikolaevich.zhuk@gmail.com

Belarusian State University of Informatics and Radioelectronics

2018

03.06.2019

Issue 3, pp. 54-58

2019

Zhuk I.N.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://doklady.bsuir.by/jour/article/view/982

This paper presents an environmental sound classification system and compares its performance with that of other systems on the ESC-10 dataset. Features are extracted from the audio signal using models of the cochlea and of auditory nerve spikes. Classification is performed with several configurations of classic convolutional neural network architectures, and the experiments combine these architectures with the proposed feature extraction method. The classification accuracy is significantly higher than the results reported in the original ESC-10 paper: the model outperforms the baseline implementations and achieves results comparable to other state-of-the-art approaches.

ESC-10, cochlea, auditory nerve spikes, feature extraction, sound classification, convolutional neural networks

The authors declare that there are no conflicts of interest present.