Voice Analysis and Classification System Based on Perturbation Parameters and Cepstral Presentation in Psychoacoustic Scales

M. I. Vashkevich; D. S. Likhachov; E. S. Azarov

doi:10.35596/1729-7648-2022-20-1-73-82



Abstract

The paper describes an approach to designing a system for the analysis and classification of voice signals based on perturbation parameters and cepstral representation. Two variants of the cepstral representation of the voice signal are considered: one based on mel-frequency cepstral coefficients (MFCC) and one based on bark-frequency cepstral coefficients (BFCC). The MFCC were calculated with the generally accepted approach: time-frequency analysis by the discrete Fourier transform (DFT) followed by summation of energy in subbands. This method approximates the frequency resolution of human hearing but has a fixed temporal resolution. As an alternative, a cepstral representation based on the BFCC is proposed. The BFCC were calculated with a warped DFT-modulated filter bank, which approximates both the frequency and the temporal resolution of hearing. The aim of the work was to compare the effectiveness of MFCC-based and BFCC-based features in designing systems for the analysis and classification of voice signals. In the experiment, a voice classification system using MFCC-based acoustic features achieved an average recall of 80.6 %, while one using BFCC-based features achieved 83.7 %. Supplementing the MFCC feature set with voice perturbation parameters raised the average recall to 94.1 %; the same supplement to the BFCC feature set raised it to 96.7 %.
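
To illustrate the MFCC computation mentioned above (DFT analysis followed by summation of energy in subbands), here is a minimal sketch for a single frame. It is not the authors' implementation. The mel formula, the triangular filter shape, the Hamming window, the number of filters (26) and the number of coefficients (13) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # A common mel-scale formula (assumption; the paper may use another).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced uniformly on the mel scale."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, clipped at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCC of one frame: DFT -> subband energies -> log -> DCT-II."""
    n_fft = len(frame)
    windowed = frame * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    # Summation of spectral energy in mel-spaced subbands.
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-12)  # small offset avoids log(0)
    # Unnormalized DCT-II written out explicitly.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_basis @ log_energies

# Usage: one 512-sample frame of a 200 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 200 * np.arange(512) / sr)
coeffs = mfcc_frame(tone, sr)
```

Because every frame has the same length, the temporal resolution of this analysis is the same in all subbands, which is the limitation the abstract attributes to the DFT-based MFCC.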

Keywords

voice signal, MFCC, BFCC, vocal pathology

About the Authors

M. I. Vashkevich

Belarusian State University of Informatics and Radioelectronics
Belarus

Vashkevich Maksim Iosifovich - Cand. of Sci., Associate Professor at the Computer Engineering Department.

220013, Minsk, P. Brovki st., 6, tel. +375-17-293-84-78

D. S. Likhachov

Belarusian State University of Informatics and Radioelectronics
Belarus

Cand. of Sci., Associate Professor at the Computer Engineering Department.

Minsk

E. S. Azarov

Belarusian State University of Informatics and Radioelectronics
Belarus

Dr. of Sci., Head of the Computer Engineering Department.

Minsk



For citations:

Vashkevich M.I., Likhachov D.S., Azarov E.S. Voice Analysis and Classification System Based on Perturbation Parameters and Cepstral Presentation in Psychoacoustic Scales. Doklady BGUIR. 2022;20(1):73-82. (In Russ.) https://doi.org/10.35596/1729-7648-2022-20-1-73-82

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)
