Doklady BGUIR

ALGORITHM FOR MINING OF CORE WEBSITES PARTS FOR INFORMATIONAL SEARCH EFFICIENCY

Abstract

An algorithm is described for automatically dividing a web page into two parts: a service-navigational part and a content part. The method is based on mining elements that repeat across HTML pages of the same website. The underlying hypothesis is that the quality of information search can be improved by tagging or deleting the navigational elements of HTML pages. The developed method successfully extracts the service and content parts from HTML pages. However, deleting the service part does not by itself guarantee an improvement in the quality of web information search.
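The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of mining repeated elements, not the author's method. It assumes that a "repeatable element" can be approximated by a text chunk that appears on a large share of a site's pages; the names `BlockExtractor`, `extract_blocks`, `split_pages` and the `min_share` threshold are hypothetical and chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Collects the visible text chunks of a page, skipping scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and not self._skip_depth:
            self.blocks.append(text)


def extract_blocks(html):
    """Return the list of text chunks found in one HTML page."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def split_pages(pages, min_share=0.8):
    """Split each page into (service, content) lists of text chunks.

    Assumption: a chunk is 'service' when it occurs on at least
    min_share of the pages taken from the same website.
    """
    per_page = [extract_blocks(page) for page in pages]
    page_count = Counter()
    for blocks in per_page:
        page_count.update(set(blocks))  # count each chunk once per page
    threshold = min_share * len(pages)
    result = []
    for blocks in per_page:
        service = [b for b in blocks if page_count[b] >= threshold]
        content = [b for b in blocks if page_count[b] < threshold]
        result.append((service, content))
    return result
```

Given several pages that share the same menu and footer, `split_pages` would place those shared chunks in the service list and leave page-specific text in the content list. Comparing text chunks rather than DOM subtrees is a deliberate simplification; a real implementation would likely need structural matching and a tuned threshold.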


About the Author

A. P. Shorkin
Belarusian State University of Informatics and Radioelectronics
Belarus



For citations:


Shorkin A.P. ALGORITHM FOR MINING OF CORE WEBSITES PARTS FOR INFORMATIONAL SEARCH EFFICIENCY. Doklady BGUIR. 2013;(4):33-37. (In Russ.)


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1729-7648 (Print)
ISSN 2708-0382 (Online)