<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">bsuir</journal-id><journal-title-group><journal-title xml:lang="ru">Доклады БГУИР</journal-title><trans-title-group xml:lang="en"><trans-title>Doklady BGUIR</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1729-7648</issn><issn pub-type="epub">2708-0382</issn><publisher><publisher-name>БГУИР</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">bsuir-186</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>АЛГОРИТМ ЭКСТРАКЦИИ ЗНАЧИМОЙ ИНФОРМАЦИИ ИЗ СТРАНИЦ WEB-САЙТОВ</article-title><trans-title-group xml:lang="en"><trans-title>ALGORITHM FOR MINING OF CORE WEBSITES PARTS FOR INFORMATIONAL SEARCH EFFICIENCY</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Шоркин</surname><given-names>А. П.</given-names></name><name name-style="western" xml:lang="en"><surname>Shorkin</surname><given-names>A. P.</given-names></name></name-alternatives><email xlink:type="simple">noemail@neicon.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff xml:lang="ru" id="aff-1"><institution>Белорусский государственный университет информатики и радиоэлектроники</institution><country>Belarus</country></aff><pub-date pub-type="collection"><year>2013</year></pub-date><pub-date pub-type="epub"><day>03</day><month>06</month><year>2019</year></pub-date><volume>0</volume><issue>4</issue><fpage>33</fpage><lpage>37</lpage><permissions><copyright-statement>Copyright &amp;#x00A9; Шоркин А.П., 2019</copyright-statement><copyright-year>2019</copyright-year><copyright-holder xml:lang="ru">Шоркин А.П.</copyright-holder><copyright-holder xml:lang="en">Shorkin A.P.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://doklady.bsuir.by/jour/article/view/186">https://doklady.bsuir.by/jour/article/view/186</self-uri><abstract><p>Описаны разработанные алгоритмические подходы к выделению из web-страниц отдельных семантических частей: содержание и служебно-навигационная часть. Подходы базируются на механизме определения единых элементов на разных страницах одного сайта. Главной задачей исследования является улучшение качества информационного поиска посредством исключения из поискового массива web-страниц служебно-навигационной части. Реализованный эксперимент по анализу качества информационного поиска в web на основе тестовых данных с использованием реализованного алгоритма привел к определенному улучшению средней оценки точности поиска. В статье приведен детализированный анализ результатов информационного поиска с использованием описанного алгоритма.</p></abstract><trans-abstract xml:lang="en"><p>Algorithm for automatic dividing of web page into 2 parts: service-navigational and contend parts is described. The method is based on the mining of repeatable elements in html-pages from same website. Main theory is that the quality of information search can be improved by tagging / deleting navigational elements of html pages. Developed method successfully mine service and content parts from html-pages. On the other hand, deleting of service part does not guarantee perfect improvement of web information search quality.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>алгоритм экстракции</kwd><kwd>информация</kwd></kwd-group><kwd-group xml:lang="en"><kwd>web-сайт</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Ландэ Д.В. Основы интеграции информационных потоков. М., 2006.</mixed-citation><mixed-citation xml:lang="en">Ландэ Д.В. Основы интеграции информационных потоков. М., 2006.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Barfourosh A., Nezhad H., Anderson M. et. al. Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition. Michigan, 2002</mixed-citation><mixed-citation xml:lang="en">Barfourosh A., Nezhad H., Anderson M. et. al. Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition. Michigan, 2002</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Sebastiani F. // ACM Computing Surveys. 2002. Vol. 1. P. 1-47.</mixed-citation><mixed-citation xml:lang="en">Sebastiani F. // ACM Computing Surveys. 2002. Vol. 1. P. 1-47.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Liao C., Alpha S., Dixon P. // Proceedings of Australian Data Mining Conference. Canberra, 2003.</mixed-citation><mixed-citation xml:lang="en">Liao C., Alpha S., Dixon P. // Proceedings of Australian Data Mining Conference. Canberra, 2003.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
