Doklady BGUIR

Experimental Studies on the Application of Data Balancing Methods in Classification Problems

https://doi.org/10.35596/1729-7648-2025-23-5-66-74

Abstract

This article examines methods for handling imbalanced data when building machine learning models for classification problems. Balancing methods are studied to determine their impact on the performance of classical and ensemble models. Five datasets of varying size and degree of imbalance are selected and preprocessed. The study assesses the imbalanced-learn library's oversampling methods (which enlarge the smaller class) and undersampling methods (which reduce the larger class), applied both separately and in combination. The optimal class ratio after balancing is determined within the range from 1:1 to 2:1 (the first number corresponds to the number of objects in the initially smaller class), and the effect of hyperparameter selection with Optuna is assessed. The results show that hyperparameter optimization does not compensate for a lack of data balancing, and that the best model performance is achieved by an integrated approach combining two different types of balancing methods, an ensemble model, and hyperparameter selection. The largest gain in model quality came from combining a single balancing method with ensemble modeling, so this combination is recommended when time and computational resources are limited. Adding a method that reduces the larger class, together with hyperparameter tuning, is advisable when resources are sufficient and the requirements on model quality are high.
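The abstract does not name the specific resampling methods, so the following is only a minimal, stdlib-only Python sketch of the combined idea (enlarging the smaller class while reducing the larger one), assuming the simplest variants: random oversampling, which duplicates existing rows, and random undersampling, which drops rows. The function name `resample_to_counts` and the toy data are hypothetical and are not the authors' code; the target class ratio is expressed as explicit per-class counts.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def resample_to_counts(
    X: Sequence, y: Sequence[Hashable], target_counts: Dict[Hashable, int], seed: int = 0
) -> Tuple[List, List]:
    """Hypothetical helper: randomly over- or undersample each class to a target size.

    A class whose target is at least its current size is oversampled: every
    original row is kept and random duplicates are added (with replacement).
    A class whose target is smaller is undersampled without replacement.
    Classes missing from target_counts are kept unchanged.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)

    chosen: List[int] = []
    for label, idx in by_class.items():
        target = target_counts.get(label, len(idx))
        if target >= len(idx):
            # Random oversampling: keep all rows, then add random duplicates.
            chosen.extend(idx)
            chosen.extend(rng.choices(idx, k=target - len(idx)))
        else:
            # Random undersampling: keep a random subset, no repeats.
            chosen.extend(rng.sample(idx, target))

    # Shuffle so the classes are interleaved in the output.
    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Hypothetical toy data: 900 rows of the larger class (0), 100 of the smaller class (1).
X = [[float(i)] for i in range(1000)]
y = [0] * 900 + [1] * 100

# Combined strategy: reduce class 0 to 400 rows and enlarge class 1 to 400 rows (1:1).
X_bal, y_bal = resample_to_counts(X, y, {0: 400, 1: 400}, seed=42)
print(sorted(Counter(y).items()))      # [(0, 900), (1, 100)]
print(sorted(Counter(y_bal).items()))  # [(0, 400), (1, 400)]
```

In imbalanced-learn itself, `RandomOverSampler` and `RandomUnderSampler` accept a `sampling_strategy` dict of per-class target counts, and placing samplers inside an `imblearn.pipeline.Pipeline` applies resampling only while fitting, so validation data keeps its original class distribution. Oversamplers such as SMOTE synthesize new rows by interpolation instead of duplicating existing ones.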

About the Authors

M. M. Lukashevich
Belarusian State University
Belarus

Lukashevich Marina Mikhailovna, Cand. Sci. (Tech.), Associate Professor, Associate Professor at the Department of Information Management Systems, 4, Nezavisimosti Ave., Minsk, 220030. Tel.: +375 29 709-06-08.

K. Klitsunova
Belarusian State University
Belarus

Kateryna Klitsunova, Bachelor of Computer Science, Minsk.

For citations:


Lukashevich M.M., Klitsunova K. Experimental Studies on the Application of Data Balancing Methods in Classification Problems. Doklady BGUIR. 2025;23(5):66-74. (In Russ.) https://doi.org/10.35596/1729-7648-2025-23-5-66-74

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)