Automatic recognition and representation of text in the form of audio stream

L. V. Serebryanaya; I. E. Lasy

doi:10.35596/1729-7648-2021-19-6-51-58

Abstract

The paper considers the problem of automatic speech generation from a text file. An analytical review of existing software for recognizing text and converting it to an audio stream is presented, and the advantages and disadvantages of these products are assessed. The review shows that software for automatically generating an audio stream from Russian text is worth developing. Artificial neural network models used for speech synthesis are analyzed, and a mathematical model of the software is then built. It consists of three components: a convolutional encoder, a convolutional decoder, and a transformer. The designed software architecture includes a graphical interface, an application server, and a speech synthesis system. Several algorithms have been developed: preprocessing text before it is loaded into the software, converting the audio files of the training sample and training the network, and generating speech from arbitrary text files. The resulting software is a single-page application with a web interface for user interaction. Its quality was assessed with a metric that averages the scores given in individual opinions (a mean opinion score). The aggregated score was sufficiently high, which suggests that all the stated tasks have been solved.
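The quality metric described in the abstract, an average over individual opinion scores, can be illustrated with a minimal sketch. It assumes the conventional mean opinion score setup in which each rater gives an integer score from 1 to 5; the abstract does not state the scale, the number of raters, or whether a confidence interval was reported, so the function name `mean_opinion_score`, the scale check, the interval, and the sample ratings are all illustrative assumptions rather than the authors' procedure.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (mean score, half-width of an approximate 95% interval).

    Assumes ratings on a 1-5 scale; this scale is an assumption,
    not something stated in the abstract.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation interval; width is zero for a single rating.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical ratings for illustration only, not data from the paper.
ratings = [4, 5, 4, 3, 5, 4, 4, 5]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean is a common design choice, since an average over a handful of raters can look high while still being quite uncertain.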

Keywords

artificial neural network model, audio stream, encoder and decoder, speech generation, spectrogram

About the Authors

L. V. Serebryanaya

Belarusian State University of Informatics and Radioelectronics
Belarus

Serebryanaya Liya V., PhD, Associate Professor, Associate Professor at the Information Technologies Software Department

220013, Minsk, P. Brovka str., 6

I. E. Lasy

Belarusian State University of Informatics and Radioelectronics
Belarus

Lasy Ilya E., Graduate of the Information Technologies Software Department

Minsk

For citations:

Serebryanaya L.V., Lasy I.E. Automatic recognition and representation of text in the form of audio stream. Doklady BGUIR. 2021;19(6):51-58. (In Russ.) https://doi.org/10.35596/1729-7648-2021-19-6-51-58

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)
