<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">bsuir</journal-id><journal-title-group><journal-title xml:lang="ru">Доклады БГУИР</journal-title><trans-title-group xml:lang="en"><trans-title>Doklady BGUIR</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1729-7648</issn><issn pub-type="epub">2708-0382</issn><publisher><publisher-name>БГУИР</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.35596/1729-7648-2025-23-5-66-74</article-id><article-id custom-type="elpub" pub-id-type="custom">bsuir-4209</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Экспериментальные исследования по применению методов балансировки данных в задачах классификации</article-title><trans-title-group xml:lang="en"><trans-title>Experimental Studies on the Application of Data Balancing Methods in Classification Problems</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Лукашевич</surname><given-names>М. М.</given-names></name><name name-style="western" xml:lang="en"><surname>Lukashevich</surname><given-names>M. M.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Лукашевич Марина Михайловна, канд. техн. наук, доц., доц. каф. информационных систем управления,</p><p>220030, Минск, просп. Независимости, 4.</p><p>Тел.: +375 29 709-06-08.</p></bio><bio xml:lang="en"><p>Lukashevich Marina Mikhailovna, Cand. Sci. 
(Tech.), Associate Professor, Associate Professor at the Department of Information Management Systems,</p><p>4, Nezavisimosti Ave., Minsk, 220030.</p><p>Tel.: +375 29 709-06-08.</p></bio><email xlink:type="simple">lukashevichmm@bsu.by</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Клицунова</surname><given-names>Е.</given-names></name><name name-style="western" xml:lang="en"><surname>Klitsunova</surname><given-names>K.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Клицунова Е., бакалавр информатики,</p><p>Минск.</p></bio><bio xml:lang="en"><p>Kateryna Klitsunova, Bachelor of Computer Science,</p><p>Minsk.</p></bio><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Белорусский государственный университет</institution></aff><aff xml:lang="en"><institution>Belarusian State University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>29</day><month>10</month><year>2025</year></pub-date><volume>23</volume><issue>5</issue><fpage>66</fpage><lpage>74</lpage><permissions><copyright-statement>Copyright &#x00A9; Лукашевич М.М., Клицунова Е., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Лукашевич М.М., Клицунова Е.</copyright-holder><copyright-holder xml:lang="en">Lukashevich M.M., Klitsunova K.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under 
a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://doklady.bsuir.by/jour/article/view/4209">https://doklady.bsuir.by/jour/article/view/4209</self-uri><abstract><p>Рассмотрены методы работы с несбалансированными данными при построении моделей машинного обучения для решения задачи классификации. Проведено исследование методов балансировки с определением их влияния на эффективность классических и ансамблевых моделей. Выбраны пять наборов данных различного объема и степени дисбаланса, выполнена их предобработка. Изучено влияние реализованных в библиотеке imbalanced-learn методов увеличения меньшего класса, уменьшения большего класса как при изолированном применении, так и при их комбинации. Определен диапазон оптимального соотношения классов после балансировки (от 1:1 до 2:1, где первое число соотносится с количеством объектов изначально меньшего класса) и оценено влияние подбора гиперпараметров при помощи Optuna. Установлено, что оптимизация гиперпараметров не компенсирует отсутствие балансировки данных, а наилучшие показатели качества моделей достигаются применением комплексного подхода с комбинацией двух методов балансировок различных типов, использованием ансамбля и подбором гиперпараметров. Наибольший вклад в качество моделей дало применение одного метода балансировки вместе с использованием ансамбля, поэтому такую комбинацию можно рекомендовать в условиях ограниченных временных и вычислительных ресурсов. Добавление метода уменьшения большего класса и подбор гиперпараметров целесообразно проводить при достаточном количестве ресурсов и высоких требованиях к качеству модели. </p></abstract><trans-abstract xml:lang="en"><p>This article examines methods for working with imbalanced data when building machine learning models for classification problems. Balancing methods are studied to determine their impact on the performance of classical and ensemble models. 
Five datasets of varying sizes and degrees of imbalance are selected and preprocessed. The impact of the minority-class oversampling and majority-class undersampling methods implemented in the imbalanced-learn library is studied, both in isolation and in combination. The optimal range of class ratios after balancing is determined (from 1:1 to 2:1, where the first number corresponds to the number of objects in the initially smaller class), and the impact of hyperparameter tuning with Optuna is assessed. It is established that hyperparameter optimization does not compensate for the lack of data balancing, and that the best model performance is achieved with an integrated approach that combines two balancing methods of different types, an ensemble, and hyperparameter tuning. The greatest contribution to model quality came from using a single balancing method together with an ensemble, so this combination can be recommended when time and computational resources are limited. Adding a majority-class undersampling method and hyperparameter tuning is advisable when resources are sufficient and model quality requirements are high.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>классификация</kwd><kwd>машинное обучение</kwd><kwd>несбалансированные данные</kwd><kwd>балансировка данных</kwd><kwd>сравнительный анализ</kwd><kwd>классические модели</kwd><kwd>ансамбли</kwd><kwd>оптимизация гиперпараметров</kwd></kwd-group><kwd-group xml:lang="en"><kwd>classification</kwd><kwd>machine learning</kwd><kwd>imbalanced data</kwd><kwd>data balancing</kwd><kwd>comparative analysis</kwd><kwd>classical models</kwd><kwd>ensembles</kwd><kwd>hyperparameter tuning</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Classification of Imbalanced Data: Review of Methods and Applications / P. Kumar [et al.] // IOP Conference Series: Materials Science and Engineering. IOP Publishing. 2021. Vol. 
1099, No 1.</mixed-citation><mixed-citation xml:lang="en">Kumar P., Bhatnagar R., Gaur K., Bhatnagar A. (2021) Classification of Imbalanced Data: Review of Methods and Applications. IOP Conference Series: Materials Science and Engineering. IOP Publishing. 1099 (1).</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Krawczyk, B. Learning from Imbalanced Data: Open Challenges and Future Directions / B. Krawczyk // Progress in Artificial Intelligence. 2016. Vol. 5, No 4. P. 221–232.</mixed-citation><mixed-citation xml:lang="en">Krawczyk B. (2016) Learning from Imbalanced Data: Open Challenges and Future Directions. Progress in Artificial Intelligence. 5 (4), 221–232.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Branco, P. A Survey of Predictive Modeling on Imbalanced Domains / P. Branco, L. Torgo, R. Ribeiro // ACM Computing Surveys (CSUR). 2016. Vol. 49, No 2. P. 1–50.</mixed-citation><mixed-citation xml:lang="en">Branco P., Torgo L., Ribeiro R. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys (CSUR). 49 (2), 1–50.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Sun, Y. Classification of Imbalanced Data: A Review / Y. Sun, A. K. C. Wong, M. S. Kamel // International Journal of Pattern Recognition and Artificial Intelligence. 2009. Vol. 23, No 4. P. 687–719.</mixed-citation><mixed-citation xml:lang="en">Sun Y., Wong A. K. C., Kamel M. S. (2009) Classification of Imbalanced Data: A Review. International Journal of Pattern Recognition and Artificial Intelligence. 23 (4), 687–719.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Kim, M. 
An Empirical Evaluation of Sampling Methods for the Classification of Imbalanced Data / M. Kim, K. B. Hwang // PLoS One. 2022. Vol. 17, No 7.</mixed-citation><mixed-citation xml:lang="en">Kim M., Hwang K. B. (2022) An Empirical Evaluation of Sampling Methods for the Classification of Imbalanced Data. PLoS One. 17 (7).</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Dube, L. Enhancing Classification Performance in Imbalanced Datasets: A Comparative Analysis of Machine Learning Models / L. Dube, T. Verster // Data Science in Finance and Economics. 2023. Vol. 3, No 4. P. 354–379.</mixed-citation><mixed-citation xml:lang="en">Dube L., Verster T. (2023) Enhancing Classification Performance in Imbalanced Datasets: A Comparative Analysis of Machine Learning Models. Data Science in Finance and Economics. 3 (4), 354–379.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Khan, A. A Review of Ensemble Learning and Data Augmentation Models for Class Imbalanced Problems: Combination, Implementation and Evaluation / A. Khan, O. Chaudhari, R. Chandra // Expert Systems with Applications. 2024. Vol. 244.</mixed-citation><mixed-citation xml:lang="en">Khan A., Chaudhari O., Chandra R. (2024) A Review of Ensemble Learning and Data Augmentation Models for Class Imbalanced Problems: Combination, Implementation and Evaluation. Expert Systems with Applications. 244.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Клицунова, Е. Сравнительный анализ методов балансировки данных для задач машинного обучения / Е. Клицунова, М. М. Лукашевич // BIG DATA и анализ высокого уровня: сб. науч. ст. XI Междунар. науч.-практ. конф. Минск: Белор. гос. ун-т информ. и радиоэлек., 2025. С. 
74–83.</mixed-citation><mixed-citation xml:lang="en">Klitsunova K., Lukashevich M. M. (2025) Comparative Analysis of Data Balancing Methods. BIG DATA and Advanced Analytics, Collection of Scientific Articles of XI International Scientific and Practical Conference. Minsk, Belarusian State University of Informatics and Radioelectronics. 74–83 (in Russian).</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
