<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">bsuir</journal-id><journal-title-group><journal-title xml:lang="ru">Доклады БГУИР</journal-title><trans-title-group xml:lang="en"><trans-title>Doklady BGUIR</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1729-7648</issn><issn pub-type="epub">2708-0382</issn><publisher><publisher-name>БГУИР</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.35596/1729-7648-2020-18-2-62-70</article-id><article-id custom-type="elpub" pub-id-type="custom">bsuir-2642</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ЭЛЕКТРОНИКА, РАДИОФИЗИКА, РАДИОТЕХНИКА, ИНФОРМАТИКА</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>ELECTRONICS, RADIOPHYSICS, RADIOENGINEERING, INFORMATICS</subject></subj-group></article-categories><title-group><article-title>ПОДХОД К АНАЛИЗУ ИЗОБРАЖЕНИЙ ДЛЯ СИСТЕМ ТЕХНИЧЕСКОГО ЗРЕНИЯ</article-title><trans-title-group xml:lang="en"><trans-title>APPROACH TO IMAGE ANALYSIS FOR COMPUTER VISION SYSTEMS</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Искра</surname><given-names>Н. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Iskra</surname><given-names>N. A.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Искра Наталья Александровна, магистр технических наук, старший преподаватель кафедры электронных вычислительных машин</p><p>220013, Республика Беларусь, г. Минск, ул. П. Бровки, д. 6; тел. 
+375-29-586-93-52</p></bio><bio xml:lang="en"><p>Iskra Natalia Alexandrovna, M.Sc., Senior Lecturer at the Department of Electronic Computing Machines</p><p>220013, Republic of Belarus, Minsk, P. Brovka str., 6; tel. +375-29-586-93-52</p></bio><email xlink:type="simple">niskra@bsuir.by</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Белорусский государственный университет информатики и радиоэлектроники</institution></aff><aff xml:lang="en"><institution>Belarusian State University of Informatics and Radioelectronics</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2020</year></pub-date><pub-date pub-type="epub"><day>31</day><month>03</month><year>2020</year></pub-date><volume>18</volume><issue>2</issue><fpage>62</fpage><lpage>70</lpage><permissions><copyright-statement>Copyright &#x00A9; Искра Н.А., 2020</copyright-statement><copyright-year>2020</copyright-year><copyright-holder xml:lang="ru">Искра Н.А.</copyright-holder><copyright-holder xml:lang="en">Iskra N.A.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://doklady.bsuir.by/jour/article/view/2642">https://doklady.bsuir.by/jour/article/view/2642</self-uri><abstract><p>В данной работе предлагается подход к семантическому анализу изображений, который можно использовать в системах технического зрения. 
Целью работы является разработка метода автоматического построения семантической модели, формализующей пространственные связи между объектами на изображении, а также ее исследование. Отличительной особенностью данной модели является определение значимых объектов, благодаря чему алгоритм построения анализирует на порядок меньше отношений между объектами, что позволяет существенно сократить время обработки изображения и объем используемых ресурсов. Уделено внимание выбору нейросетевого алгоритма детекции объектов на изображении как предварительного этапа построения модели. Проведены эксперименты на тестовых наборах из базы Visual Genome, разработанной исследователями из Стэнфордского университета для оценки алгоритмов детекции объектов, аннотирования регионов и других актуальных задач анализа изображений. При оценке работы модели оценивалась точность определения пространственных отношений. Также были проведены эксперименты по интерпретации полученной модели, а именно аннотированию, т. е. получению текстового описания содержания изображения. Результаты экспериментов сравнивались с аналогичными результатами нейросетевой генерации аннотаций изображений, полученными на той же базе другими исследователями, а также автором данной работы ранее. Показано улучшение качества аннотирования изображений до 60 % (в соответствии с метрикой METEOR) по сравнению с нейросетевыми методами. Кроме того, использование данной модели позволяет частично очистить и нормализовать данные для обучения, в том числе нейросетевых архитектур, широко применяющихся в анализе изображений. Рассматриваются перспективы использования данной методики в ситуационном мониторинге. В качестве недостатков данного подхода можно отметить некоторые упрощения при построении модели, которые будут учтены в дальнейшем развитии модели.</p></abstract><trans-abstract xml:lang="en"><p>This paper proposes an approach to semantic image analysis for use in computer vision systems. 
The aim of the work is to develop a method for automatically constructing a semantic model that formalizes the spatial relationships between objects in an image, and to study this model. A distinctive feature of the model is the detection of salient objects, so that the construction algorithm analyzes an order of magnitude fewer relations between objects, which substantially reduces image processing time and resource consumption. Attention is paid to selecting a neural network algorithm for object detection in an image as a preliminary stage of model construction. Experiments were conducted on test sets from the Visual Genome database, developed by researchers at Stanford University to evaluate object detection algorithms, region captioning, and other relevant image analysis tasks. The model was assessed by the accuracy with which it recognizes spatial relations. Experiments were also conducted on interpreting the resulting model, namely image annotation, i.e. generating a textual description of the image content. The experimental results were compared with results of neural network image caption generation obtained on the same dataset by other researchers and, earlier, by the author of this paper. An improvement in image captioning quality of up to 60 % (according to the METEOR metric) over neural network methods is shown. In addition, the model makes it possible to partially clean and normalize training data, including data for the neural network architectures widely used in image analysis. The prospects of using this technique in situational monitoring are considered. 
The drawbacks of this approach are certain simplifications made in constructing the model, which will be addressed in its further development.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>детекция объектов</kwd><kwd>семантическая модель</kwd><kwd>нейронные сети</kwd><kwd>обработка изображений</kwd><kwd>обработка языка</kwd><kwd>R-CNN</kwd><kwd>WordNet</kwd><kwd>ситуационный мониторинг</kwd><kwd>видеонаблюдение</kwd></kwd-group><kwd-group xml:lang="en"><kwd>object detection</kwd><kwd>semantic model</kwd><kwd>neural networks</kwd><kwd>image processing</kwd><kwd>natural language processing</kwd><kwd>R-CNN</kwd><kwd>WordNet</kwd><kwd>situational monitoring</kwd><kwd>video surveillance</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Liu L., Ouyang W., Wang X., Fieguth P., Chen J., Liu X., Pietikäinen M. Deep learning for generic object detection: A survey. International journal of computer vision. 2019. DOI: 10.1007/s11263-019-01247-4.</mixed-citation><mixed-citation xml:lang="en">Liu L., Ouyang W., Wang X., Fieguth P., Chen J., Liu X., Pietikäinen M. Deep learning for generic object detection: A survey. International journal of computer vision. 2019. DOI: 10.1007/s11263-019-01247-4.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Müller J., Fregin A., Dietmayer K. Disparity sliding window: object proposals from disparity images. IEEE/RSJ International conference on intelligent robots and systems. New York: IEEE, 2018: 5777-5784. ISBN 978-1-5386-8094-0.</mixed-citation><mixed-citation xml:lang="en">Müller J., Fregin A., Dietmayer K. Disparity sliding window: object proposals from disparity images. IEEE/RSJ International conference on intelligent robots and systems. New York: IEEE, 2018: 5777-5784. 
ISBN 978-1-5386-8094-0.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Girshick R., Donahue J., Darrell T., Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 580-587. DOI: 10.1109/CVPR.2014.81.</mixed-citation><mixed-citation xml:lang="en">Girshick R., Donahue J., Darrell T., Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 580-587. DOI: 10.1109/CVPR.2014.81.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.Y., Berg A.C. Ssd: Single shot multibox detector. European conference on computer vision. Springer, Cham, 2016: 21-37. DOI: 10.1007/978-3-319-46448-0_2.</mixed-citation><mixed-citation xml:lang="en">Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.Y., Berg A.C. Ssd: Single shot multibox detector. European conference on computer vision. Springer, Cham, 2016: 21-37. DOI: 10.1007/978-3-319-46448-0_2.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Girshick R. Fast r-cnn. Proceedings of the IEEE international conference on computer vision. 2015: 1440-1448. DOI: 10.1109/ICCV.2015.169.</mixed-citation><mixed-citation xml:lang="en">Girshick R. Fast r-cnn. Proceedings of the IEEE international conference on computer vision. 2015: 1440-1448. DOI: 10.1109/ICCV.2015.169.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Ren S., He K., Girshick R., Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. 
Advances in neural information processing systems. 2015: 91-99. DOI: 10.5555/2969239.2969250.</mixed-citation><mixed-citation xml:lang="en">Ren S., He K., Girshick R., Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems. 2015: 91-99. DOI: 10.5555/2969239.2969250.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">He K., Gkioxari G., Dollár P., Girshick R. Mask r-cnn. Proceedings of the IEEE international conference on computer vision. 2017: 2961-2969. DOI: 10.1109/ICCV.2017.322.</mixed-citation><mixed-citation xml:lang="en">He K., Gkioxari G., Dollár P., Girshick R. Mask r-cnn. Proceedings of the IEEE international conference on computer vision. 2017: 2961-2969. DOI: 10.1109/ICCV.2017.322.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Xu D., Zhu Y., Choy C.B., Fei-Fei L. Scene graph generation by iterative message passing. Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 5410-5419. DOI: 10.1109/CVPR.2017.330.</mixed-citation><mixed-citation xml:lang="en">Xu D., Zhu Y., Choy C.B., Fei-Fei L. Scene graph generation by iterative message passing. Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 5410-5419. DOI: 10.1109/CVPR.2017.330.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Krishna R., Zhu Y., Groth O., Johnson J., Hata K., Kravitz J., Chen S., Kalantidis Y., Li L.J., Shamma D.A., Bernstein M.S. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International journal of computer vision. 2017;123(1):32-73. 
DOI: 10.1007/s11263-016-0981-7.</mixed-citation><mixed-citation xml:lang="en">Krishna R., Zhu Y., Groth O., Johnson J., Hata K., Kravitz J., Chen S., Kalantidis Y., Li L.J., Shamma D.A., Bernstein M.S. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International journal of computer vision. 2017;123(1):32-73. DOI: 10.1007/s11263-016-0981-7.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Miller G.A. WordNet: An electronic lexical database. First edition. Cambridge: MIT Press; 1998. ISBN 9780262061971.</mixed-citation><mixed-citation xml:lang="en">Miller G.A. WordNet: An electronic lexical database. First edition. Cambridge: MIT Press; 1998. ISBN 9780262061971.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Yang J., Lu J., Lee S., Batra D., Parikh D. Graph r-cnn for scene graph generation. Proceedings of the european conference on computer vision. 2018: 690-706. DOI: 10.1007/978-3-030-01246-5_41.</mixed-citation><mixed-citation xml:lang="en">Yang J., Lu J., Lee S., Batra D., Parikh D. Graph r-cnn for scene graph generation. Proceedings of the european conference on computer vision. 2018: 690-706. DOI: 10.1007/978-3-030-01246-5_41.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Borji A., Cheng M.M., Hou Q., Jiang H., Li J. Salient object detection: A survey. Computational visual media. 2019;5(2):117-150. DOI: 10.1007/s41095-019-0149-9.</mixed-citation><mixed-citation xml:lang="en">Borji A., Cheng M.M., Hou Q., Jiang H., Li J. Salient object detection: A survey. Computational visual media. 2019;5(2):117-150. 
DOI: 10.1007/s41095-019-0149-9.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Michigan: Association for computational linguistics. 2005: 65-72. Anthology ID: W05-0909.</mixed-citation><mixed-citation xml:lang="en">Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Michigan: Association for computational linguistics. 2005: 65-72. Anthology ID: W05-0909.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Johnson J., Karpathy A., Fei-Fei L. Densecap: Fully convolutional localization networks for dense captioning. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 4565-4574. DOI: 10.1109/CVPR.2016.494.</mixed-citation><mixed-citation xml:lang="en">Johnson J., Karpathy A., Fei-Fei L. Densecap: Fully convolutional localization networks for dense captioning. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 4565-4574. DOI: 10.1109/CVPR.2016.494.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Iskra N., Iskra V. Temporal Convolutional and Recurrent Networks for Image Captioning. Communications in Computer and Information Science. 2019; 1055. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-35430-5_21.</mixed-citation><mixed-citation xml:lang="en">Iskra N., Iskra V. 
Temporal Convolutional and Recurrent Networks for Image Captioning. Communications in Computer and Information Science. 2019; 1055. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-35430-5_21.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The author declares no conflict of interest.</p></fn></fn-group></back></article>
