
Doklady BGUIR


Comparison of Methods for Assessing the Semantic Similarity of Text Fragments

https://doi.org/10.35596/1729-7648-2026-24-2-85-91

Abstract

The rapid growth of text data creates a need for methods that can compare text fragments by meaning, including cases of paraphrasing, synonym substitution, and sentence restructuring. A pressing challenge is to compare the results of semantic similarity methods built on various models with human perception of semantic similarity. This article presents an expert method for assessing the semantic similarity of text fragments, based on the ratings of survey participants. The method builds an interpretable semantic similarity scale from human perception of text content and uses it as a reference for analyzing the consistency of various methods. To obtain the “human” assessment, a survey of 138 participants was conducted. The comparative analysis showed that semantic similarity assessment methods differ in how consistent they are with human perception of the semantic similarity of texts.

About the Authors

K. Krez
Belarusian State University of Informatics and Radioelectronics
Belarus

Krez Karina, Postgraduate, Assistant at the Department of Information and Computer Systems Design

220013, Minsk, P. Brovki St., 6

Tel.: +375 29 952-75-56



E. Shneiderov
Belarusian State University of Informatics and Radioelectronics
Belarus

Cand. Sci. (Tech.), Associate Professor at the Department of Information and Computer Systems Design

Minsk



P. Shish
Belarusian State University of Informatics and Radioelectronics
Belarus

Student

Minsk



E. Kondratenko
Belarusian State University of Informatics and Radioelectronics
Belarus

Student

Minsk






For citations:


Krez K., Shneiderov E., Shish P., Kondratenko E. Comparison of Methods for Assessing the Semantic Similarity of Text Fragments. Doklady BGUIR. 2026;24(2):85-91. (In Russ.) https://doi.org/10.35596/1729-7648-2026-24-2-85-91



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)