Comparison of Methods for Assessing the Semantic Similarity of Text Fragments
https://doi.org/10.35596/1729-7648-2026-24-2-85-91
Abstract
With the rapid growth of text data, there is a growing need for methods that can compare text fragments by meaning, including cases of paraphrasing, synonym substitution, and sentence restructuring. One pressing challenge is to determine how well the results of semantic similarity methods built on different models agree with the human perception of semantic similarity. This article discusses an expert method for assessing the semantic similarity of text fragments based on ratings given by survey participants. The method builds an interpretable semantic similarity scale from human perception of text content, and this scale is then used to analyze the consistency of various automatic methods. To obtain the “human” assessment, a survey of 138 participants was conducted. A comparative analysis showed that the semantic similarity assessment methods differ in how consistent they are with the human perception of the semantic similarity of texts.
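The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the consistency measure or the data, so the sentence pairs, the 1 to 5 human scores, the term-count cosine baseline, and the use of Spearman rank correlation are all assumptions chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of raw term-count vectors (a simple lexical baseline)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(x, y) -> float:
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(rx, ry))
    sd_x = math.sqrt(sum((p - mean_x) ** 2 for p in rx))
    sd_y = math.sqrt(sum((q - mean_y) ** 2 for q in ry))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical sentence pairs and hypothetical mean human ratings (1 = unrelated, 5 = same meaning).
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("the cat sat on the mat", "a cat was sitting on the mat"),
    ("the cat sat on the mat", "stock prices fell sharply today"),
]
human_scores = [5.0, 4.0, 1.0]

model_scores = [cosine_similarity(a, b) for a, b in pairs]
rho = spearman(model_scores, human_scores)
print(f"model scores: {model_scores}")
print(f"Spearman correlation with human scores: {rho:.3f}")
```

In such a setup, any other scoring function (for example, cosine similarity over sentence embeddings) could replace `cosine_similarity`, and the resulting correlations could then be compared across methods.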
About the Authors
K. Krez
Belarus
Krez Karina, Postgraduate, Assistant at the Department of Information and Computer Systems Design
220013, Minsk, P. Brovki St., 6
Tel.: +375 29 952-75-56
E. Shneiderov
Belarus
Cand. Sci. (Tech.), Associate Professor at the Department of Information and Computer Systems Design
Minsk
P. Shish
Belarus
Student
Minsk
E. Kondratenko
Belarus
Student
Minsk
References
1. Devlin J., Chang M.-W., Lee K., Toutanova K. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019). 4171–4186. DOI: 10.18653/v1/N19-1423.
2. Reimers N., Gurevych I. (2019) Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019). 3982–3992. DOI: 10.18653/v1/D19-1410.
3. Salton G., Buckley C. (1988) Term-Weighting Approaches in Automatic Text Retrieval. Information Processing & Management. 24 (5), 513–523. DOI: 10.1016/0306-4573(88)90021-0.
4. Gao T., Yao X., Chen D. (2021) SimCSE: Simple Contrastive Learning of Sentence Embeddings. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 6894–6910. DOI: 10.18653/v1/2021.emnlp-main.552.
5. Feng F., Yang Y., Cer D., Arivazhagan N., Wang W. (2022) Language-Agnostic BERT Sentence Embedding. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 878–891. DOI: 10.18653/v1/2022.acl-long.62.
For citations:
Krez K., Shneiderov E., Shish P., Kondratenko E. Comparison of Methods for Assessing the Semantic Similarity of Text Fragments. Doklady BGUIR. 2026;24(2):85-91. (In Russ.) https://doi.org/10.35596/1729-7648-2026-24-2-85-91