Experimental Studies on the Application of Data Balancing Methods in Classification Problems

M. M. Lukashevich; K. Klitsunova

doi:10.35596/1729-7648-2025-23-5-66-74


Abstract

This article examines methods for handling imbalanced data when building machine learning models for classification problems. Balancing methods are studied to determine their impact on the performance of classical and ensemble models. Five datasets of varying size and degree of imbalance are selected and preprocessed. The study evaluates methods from the imbalanced-learn library that oversample the smaller (minority) class and undersample the larger (majority) class, applied both separately and in combination. The optimal class ratio after balancing is determined (from 1:1 to 2:1, where the first number corresponds to the number of objects in the initially smaller class), and the effect of hyperparameter selection with Optuna is assessed. The results show that hyperparameter optimization does not compensate for a lack of data balancing, and that the best model performance is achieved with an integrated approach that combines two different types of balancing methods, an ensemble, and hyperparameter selection. The greatest improvement in model quality comes from using a single balancing method together with an ensemble model, so this combination is recommended when time and computational resources are limited. Adding a majority-class undersampling method and hyperparameter tuning is advisable when resources are sufficient and model quality requirements are high.
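The class-ratio idea in the abstract can be illustrated with a small, library-free sketch. It is not the authors' pipeline: the study uses imbalanced-learn methods, whereas the code below swaps in plain random undersampling and random duplication of minority rows, and it handles only two classes. The function name `combine_resample` and the parameters `majority_keep` and `ratio` are hypothetical names introduced for illustration; `ratio` follows the abstract's convention, with the initially smaller class counted first.

```python
import random
from collections import Counter


def combine_resample(X, y, majority_keep, ratio, seed=0):
    """Binary-class sketch (assumption: exactly two labels in y).

    1. Randomly undersample the majority class to `majority_keep` rows.
    2. Randomly duplicate minority rows until
       minority : majority == ratio (e.g. 1.0 for 1:1, 2.0 for 2:1).
    """
    rng = random.Random(seed)
    (maj_label, n_maj), (min_label, n_min) = Counter(y).most_common(2)

    maj_idx = [i for i, label in enumerate(y) if label == maj_label]
    min_idx = [i for i, label in enumerate(y) if label == min_label]

    # Undersampling: keep a random subset of the majority class.
    keep = min(majority_keep, n_maj)
    maj_idx = rng.sample(maj_idx, keep)

    # Oversampling: draw minority rows with replacement up to the target.
    # The minority class is never shrunk below its original size.
    target_min = max(n_min, round(ratio * keep))
    extra = [rng.choice(min_idx) for _ in range(target_min - n_min)]

    idx = maj_idx + min_idx + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 900 majority rows (label 0) and 100 minority rows (label 1).
y = [0] * 900 + [1] * 100
X = [[i] for i in range(len(y))]

X_1to1, y_1to1 = combine_resample(X, y, majority_keep=300, ratio=1.0)
X_2to1, y_2to1 = combine_resample(X, y, majority_keep=300, ratio=2.0)

print(Counter(y_1to1))  # class counts at a 1:1 ratio
print(Counter(y_2to1))  # class counts at a 2:1 ratio
```

In practice resampling of this kind would be applied to the training split only, so that the evaluation data keeps its original class distribution.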

Keywords

classification, machine learning, imbalanced data, data balancing, comparative analysis, classical models, ensembles, hyperparameter tuning

About the Authors

M. M. Lukashevich

Belarusian State University
Belarus

Lukashevich Marina Mikhailovna, Cand. Sci. (Tech.), Associate Professor, Associate Professor at the Department of Information Management Systems,

4, Nezavisimosti Ave., Minsk, 220030.

Tel.: +375 29 709-06-08.

K. Klitsunova

Belarusian State University
Belarus

Kateryna Klitsunova, Bachelor of Computer Science,

Minsk.

For citations:

Lukashevich M.M., Klitsunova K. Experimental Studies on the Application of Data Balancing Methods in Classification Problems. Doklady BGUIR. 2025;23(5):66-74. (In Russ.) https://doi.org/10.35596/1729-7648-2025-23-5-66-74

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)
