<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">bsuir</journal-id><journal-title-group><journal-title xml:lang="ru">Доклады БГУИР</journal-title><trans-title-group xml:lang="en"><trans-title>Doklady BGUIR</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1729-7648</issn><issn pub-type="epub">2708-0382</issn><publisher><publisher-name>БГУИР</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.35596/1729-7648-2022-20-2-46-52</article-id><article-id custom-type="elpub" pub-id-type="custom">bsuir-3310</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ЭЛЕКТРОНИКА, РАДИОФИЗИКА, РАДИОТЕХНИКА, ИНФОРМАТИКА</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>ELECTRONICS, RADIOPHYSICS, RADIOENGINEERING, INFORMATICS</subject></subj-group></article-categories><title-group><article-title>Вложенное преобразование с сохранением семантики исходных данных</article-title><trans-title-group xml:lang="en"><trans-title>Embedding With Preservation of Semantics of the Original Data</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Ваткин</surname><given-names>М. Е.</given-names></name><name name-style="western" xml:lang="en"><surname>Vatkin</surname><given-names>M. E.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Ваткин Максим Евгеньевич - к.т.н., главный специалист по данным</p><p>220005, г. Минск, Бульвар Мулявина 6</p><p>тел. 
+375-29-278-13-78</p></bio><bio xml:lang="en"><p>Vatkin Maksim Evgenyevich - Cand. of Sci., Chief Data Scientist</p><p>220005, Minsk, Mulyavina blv., 6</p><p>tel. +375-29-278-13-78</p></bio><email xlink:type="simple">mevatkin@bps-sberbank.by</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Воробей</surname><given-names>Д. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Vorobey</surname><given-names>D. A.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Специалист по данным</p><p>220005, г. Минск, Бульвар Мулявина 6</p><p>тел. +375-29-278-13-78</p></bio><bio xml:lang="en"><p>Data Scientist</p><p>220005, Minsk, Mulyavina blv., 6</p><p>tel. +375-29-278-13-78</p></bio><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Яковлев</surname><given-names>М. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Yakovlev</surname><given-names>M. V.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Специалист по данным</p><p>220005, г. Минск, Бульвар Мулявина 6</p><p>тел. +375-29-278-13-78</p></bio><bio xml:lang="en"><p>Data Scientist</p><p>220005, Minsk, Mulyavina blv., 6</p><p>tel. +375-29-278-13-78</p></bio><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Кривова</surname><given-names>М. Г.</given-names></name><name name-style="western" xml:lang="en"><surname>Krivova</surname><given-names>M. G.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Специалист по данным</p><p>220005, г. Минск, Бульвар Мулявина 6</p><p>тел. +375-29-278-13-78</p></bio><bio xml:lang="en"><p>Data Scientist</p><p>220005, Minsk, Mulyavina blv., 6</p><p>tel. 
+375-29-278-13-78</p></bio><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>ОАО «Сбер Банк»</institution></aff><aff xml:lang="en"><institution>“Sber Bank”</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2022</year></pub-date><pub-date pub-type="epub"><day>05</day><month>04</month><year>2022</year></pub-date><volume>20</volume><issue>2</issue><fpage>46</fpage><lpage>52</lpage><permissions><copyright-statement>Copyright &#x00A9; Ваткин М.Е., Воробей Д.А., Яковлев М.В., Кривова М.Г., 2022</copyright-statement><copyright-year>2022</copyright-year><copyright-holder xml:lang="ru">Ваткин М.Е., Воробей Д.А., Яковлев М.В., Кривова М.Г.</copyright-holder><copyright-holder xml:lang="en">Vatkin M.E., Vorobey D.A., Yakovlev M.V., Krivova M.G.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://doklady.bsuir.by/jour/article/view/3310">https://doklady.bsuir.by/jour/article/view/3310</self-uri><abstract><p>В современном мире данные, используемые для описания объектов, часто представлены в виде разреженных векторов с большим количеством признаков. Работа с такими данными является вычислительно неэффективной, что зачастую приводит к переобучению при моделировании. Поэтому используются алгоритмы понижения размерности данных, одними из которых являются автокодировщики. 
В статье предложен новый подход для оценки свойств полученных векторов меньшей размерности, а также основанная на этом подходе функция потерь. Идея предложенной функции потерь состоит в вычислении качества сохранения семантической структуры в пространстве вложений и добавлении этой метрики в функцию потерь, что позволяет сохранить отношения объектов в пространстве вложений и таким образом сохранить больше полезной информации об объектах. Полученные результаты показывают, что использование комбинации среднеквадратичной функции потерь вместе с предложенной позволяет улучшить качество полученных вложений.</p></abstract><trans-abstract xml:lang="en"><p>In the modern world, the data used to describe objects is often represented as sparse vectors with a large number of features. Working with such data can be computationally inefficient and often leads to overfitting; therefore, dimensionality reduction algorithms are used, one of which is the autoencoder. In this article, we propose a new approach for evaluating the properties of the resulting lower-dimensional vectors, as well as a loss function based on this approach. The idea of the proposed loss function is to measure how well the semantic structure is preserved in the embedding space and to add that metric to the loss function, which preserves the relations between objects in the embedding space and thus retains more useful information about the objects. 
The results show that combining the mean squared error loss function with the proposed one improves the quality of the embeddings.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>данные</kwd><kwd>вложение</kwd><kwd>вектор</kwd><kwd>функция потерь</kwd><kwd>линейное пространство</kwd><kwd>автокодировщик</kwd><kwd>машинное обучение</kwd></kwd-group><kwd-group xml:lang="en"><kwd>data</kwd><kwd>embedding</kwd><kwd>vector</kwd><kwd>loss function</kwd><kwd>linear space</kwd><kwd>autoencoder</kwd><kwd>machine learning</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Gupta P., Banchs R.E., and Rosso P. Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities. Neurocomputing. 2016;175:1001–1008.</mixed-citation><mixed-citation xml:lang="en">Gupta P., Banchs R.E., and Rosso P. Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities. Neurocomputing. 2016;175:1001–1008.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J. Distributed representations of words and phrases and their compositionality. NIPS. 2013:3111–3119.</mixed-citation><mixed-citation xml:lang="en">Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J. Distributed representations of words and phrases and their compositionality. NIPS. 2013:3111–3119.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Bourlard H., Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 1988;59(September (4)):291–294. DOI: 10.1007/bf00332918.</mixed-citation><mixed-citation xml:lang="en">Bourlard H., Kamp Y. 
Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 1988;59(September (4)):291–294. DOI: 10.1007/bf00332918.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Al-Shabi M.A. Credit Card Fraud Detection Using Autoencoder Model in Unbalanced Datasets. JAMCS. 2019;33(5):1–16.</mixed-citation><mixed-citation xml:lang="en">Al-Shabi M.A. Credit Card Fraud Detection Using Autoencoder Model in Unbalanced Datasets. JAMCS. 2019;33(5):1–16.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Saito T., Rehmsmeier M. The Precision-Recall Plot is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS One. 2015;10(3).</mixed-citation><mixed-citation xml:lang="en">Saito T., Rehmsmeier M. The Precision-Recall Plot is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS One. 2015;10(3).</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Husejinović A. Credit card fraud detection using naive Bayesian and C4.5 decision tree classifiers. Periodicals of Engineering and Natural Sciences. 2020;8(1):1–5.</mixed-citation><mixed-citation xml:lang="en">Husejinović A. Credit card fraud detection using naive Bayesian and C4.5 decision tree classifiers. Periodicals of Engineering and Natural Sciences. 2020;8(1):1–5.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest.</p></fn></fn-group></back></article>
