KR102104548B1

KR102104548B1 - The visual detecting system and visual detecting method for operating by the same

Info

Publication number: KR102104548B1
Application number: KR1020180091880A
Authority: KR
Inventors: 이준환; 김병준
Original assignee: 전북대학교산학협력단
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2020-04-24
Also published as: KR20200019280A; WO2020032506A1

Abstract

A visual detection system according to an embodiment of the present invention may include: a feature extraction unit that generates a feature map of an image using a neural network; a spatial visual detection unit that estimates the location and name of an object in the image using the generated feature map; a temporal visual detection unit that detects the object in the image in time order based on the generated feature map; and a fusion determination unit that determines the object detection result for the image based on the estimated result from the spatial visual detection unit and the detected result from the temporal visual detection unit.

Description

Visual detection system and visual detection method using the same {THE VISUAL DETECTING SYSTEM AND VISUAL DETECTING METHOD FOR OPERATING BY THE SAME}

The present invention relates to a visual detection system and a visual detection method using the same. More specifically, to raise the object recognition rate of deep learning object detection models (e.g., R-CNN, YOLO, SSD), it relates to a visual detection system, and a visual detection method using the same, that fuses the spatial visual detection result with a temporal visual detection result obtained by applying a deep learning model (e.g., MLP, LSTM, GRU) to temporal information, thereby reducing misrecognition in images and improving accuracy.

Deep learning is defined as a set of algorithms that attempt a high level of abstraction (summarizing the key content or functions in a large amount of data) through a combination of several nonlinear transformation techniques. It is applied in fields such as image-based object detection, recognition, and classification, as well as speech recognition and natural language processing. Object detection in particular has long been studied in computer vision, and as deep learning and machine learning methods have recently drawn attention, a wide range of research has pushed image recognition and detection performance toward the level of human vision. However, despite the development of deep learning techniques such as convolutional neural networks, misrecognition still occurs, and techniques for reducing it are urgently needed.

FIG. 1 is an exemplary view showing objects in an image being recognized using a conventional visual detection method. As shown in FIG. 1(a), there are cases where only one person is recognized as a pedestrian even though two pedestrians are present, and, as shown in FIG. 1(b), when trying to detect a license plate, a part other than the license plate may be misrecognized as the license plate. In addition, as in FIG. 1(c), even though a fire has broken out near the trees on the left side of the road, the fire is not detected at all and a different part is recognized instead.

Moreover, existing visual detection methods have difficulty accounting for diverse environments (e.g., weather, location), and they use only the spatial feature information within a single image; techniques for recognizing objects through analysis over time have not been developed.

1. Korean Registered Patent Publication No. 10-1415001, "Object detection and tracking apparatus and method, and intelligent surveillance system using the same" (registration date: 2013.01.31)

The present invention has been devised to solve the problems described above, and its object is to improve the accuracy of object recognition by having spatial visual detection and temporal visual detection performed simultaneously.

A further object is to reduce the amount of computation for feature extraction and shorten execution time by configuring spatial visual detection and temporal visual detection to share the feature extraction step.

The technical objects of the present invention are not limited to those mentioned above; other technical problems not mentioned may be considered by a person of ordinary skill in the art to which the present invention pertains from the embodiments described below.

As an embodiment of the present invention, a visual detection system for detecting an object in an input image may be provided.

In the visual detection system according to an embodiment of the present invention, the neural network of the feature extraction unit may include a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

The spatial visual detection unit of the visual detection system according to an embodiment of the present invention may include a fully connected layer composed of at least one hidden layer.

The visual detection system according to an embodiment of the present invention may further include a storage unit that stores the feature map generated by the feature extraction unit. The temporal visual detection unit detects the object in the image over time using a recurrent neural network, and the recurrent neural network may be trained on the feature maps stored in the storage unit.

In the visual detection system according to an embodiment of the present invention, the neural network may be trained on input images to which flipping, scaling, and rotation have been applied.

As an embodiment of the present invention, a visual detection method using the visual detection system may be provided.

The visual detection method according to an embodiment of the present invention may include: a step in which an image is input; a step in which a feature map of the image is generated using a neural network; a step in which the location and name of an object in the image are estimated using the feature map; a step in which the object in the image is detected in time order based on the feature map; and a step in which the object detection result for the image is determined based on the estimation result of the estimating step and the detection result of the detecting step.

In the visual detection method according to an embodiment of the present invention, the step in which the location and name of the object in the image are estimated using the feature map and the step in which the object in the image is detected in time order based on the feature map may be performed simultaneously.

The neural network in the visual detection method according to an embodiment of the present invention may include a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

In the visual detection method according to an embodiment of the present invention, the step in which the location and name of the object in the image are estimated using the feature map may be performed using a fully connected layer composed of at least one hidden layer.

The visual detection method according to an embodiment of the present invention may further include a step in which the feature map generated in the feature map generation step is stored in the storage unit of the visual detection system. The step in which the object in the image is detected in time order based on the feature map is performed using a recurrent neural network, and the recurrent neural network may be trained on the feature maps stored from the feature map generation step.

In the visual detection method according to an embodiment of the present invention, the neural network may be trained on input images to which flipping, scaling, and rotation have been applied.

Meanwhile, as an embodiment of the present invention, a computer-readable recording medium on which a program for implementing the above-described method is recorded may be provided.

According to the present invention, performing spatial visual detection and temporal visual detection simultaneously can reduce misrecognition when recognizing objects in an image.

In addition, because spatial visual detection and temporal visual detection are configured to share the feature extraction step, the amount of computation for feature extraction and the execution time can be reduced.

The effects obtainable from the embodiments of the present invention are not limited to those mentioned above; other effects not mentioned can be clearly derived and understood by a person of ordinary skill in the art to which the present invention pertains from the following description of the embodiments. That is, unintended effects of practicing the present invention can also be derived from the embodiments by a person of ordinary skill in the art.

FIG. 1 is an exemplary view showing objects in an image being recognized using a conventional visual detection method.
FIG. 2 is a block diagram of a visual detection system according to an embodiment of the present invention.
FIGS. 3 and 4 are exemplary views of a visual detection system according to an embodiment of the present invention.
FIG. 5 is an exemplary view showing a fire detection result obtained using a visual detection system according to an embodiment of the present invention.
FIGS. 6A to 6C are exemplary views showing data for training a visual detection system according to an embodiment of the present invention.
FIGS. 7A and 7B are exemplary views showing forest fire detection results obtained using a visual detection system according to an embodiment of the present invention.
FIGS. 8 to 10 are flowcharts of a visual detection method using a visual detection system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

The terms used in this specification are briefly explained, and then the present invention is described in detail.

The terms used in the present invention were selected, as far as possible, from general terms in wide current use in view of their functions in the present invention, but they may vary with the intent of those working in the field, precedent, the emergence of new technology, and so on. In certain cases a term has been chosen arbitrarily by the applicant, in which case its meaning is set out in detail in the relevant part of the description. The terms used in the present invention should therefore be defined by their meaning and the overall content of the present invention, not simply by their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "... unit" and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two. Also, throughout the specification, when a part is said to be "connected" to another part, this covers not only being "directly connected" but also being connected "with another element in between".

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is an exemplary view showing objects in an image (10) being recognized using a conventional visual detection method. Various techniques for recognizing and classifying objects in an input image, as in FIG. 1, are being developed. In particular, techniques that recognize objects in an image using deep learning algorithms (e.g., convolutional neural networks) are raising detection performance toward the level of human vision. However, even though such techniques have improved detection performance, misrecognition still occurs, and techniques for reducing it are urgently needed.

Specifically, as shown in FIG. 1(a), there are cases where only one person is recognized as a pedestrian even though two pedestrians are present, and, as shown in FIG. 1(b), when trying to detect a license plate, a part other than the license plate may be misrecognized as the license plate. In addition, as in FIG. 1(c), there is the problem that a fire is not detected at all.

Moreover, existing visual detection methods have difficulty accounting for diverse environments (e.g., weather, location), and they use only the spatial feature information within a single image (10); techniques for recognizing objects through analysis over time have not been developed.

The following describes a visual detection system that, when recognizing objects in an image, considers not only spatial visual detection but also object recognition over time.

FIG. 2 is a block diagram of a visual detection system according to an embodiment of the present invention.

Referring to FIG. 2, a visual detection system according to an embodiment of the present invention may include: a feature extraction unit (100) that generates a feature map of an image (10) using a neural network; a spatial visual detection unit (200) that estimates the location and name of an object in the image (10) using the generated feature map; a temporal visual detection unit (300) that detects the object in the image (10) in time order based on the generated feature map; and a fusion determination unit (400) that determines the object detection result for the image (10) based on the estimated result from the spatial visual detection unit (200) and the detected result from the temporal visual detection unit (300).

The feature extraction unit (100) of the visual detection system according to an embodiment of the present invention may generate the feature map of the input image (10) using any of various neural network models. In particular, when a convolutional neural network (CNN) is used, the neural network may include a plurality of convolution layers for generating the feature map of the image (10) and a pooling layer for sampling the feature map.
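
The convolution and pooling operations named above can be illustrated with a minimal pure-Python sketch. It is not the network of the patent: the function names `conv2d_valid` and `max_pool2d`, the 2x2 summing kernel, and the 2x2 pooling window are assumptions chosen only to show how a feature map is produced and then downsampled.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) 2-D cross-correlation, the core of a convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 "image" and a hypothetical 2x2 summing kernel (illustration only).
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 1.0], [1.0, 1.0]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(feature_map)           # 2x2 map after pooling
```

A real feature extractor would stack several such layers with learned kernels and nonlinearities; the sketch only shows the shape-reducing effect of each operation.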

As described below, the feature map generated by the feature extraction unit (100) is shared by the spatial visual detection unit (200) and the temporal visual detection unit (300). That is, in the visual detection system of the present invention, estimation of the object in the image (10) by the spatial visual detection unit (200) and time-ordered detection of the object in the image (10) by the temporal visual detection unit (300) may be performed simultaneously. For this simultaneous detection, the step in which the feature extraction unit (100) generates the feature map may be shared by both the spatial visual detection unit (200) and the temporal visual detection unit (300). Sharing the feature map between the two units in this way can reduce the amount of computation for feature extraction from the image (10) and shorten execution time.
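
The sharing described above can be sketched as follows. Everything named here (`process_frame`, the toy extractor, and the two toy detection heads) is a hypothetical stand-in rather than the patent's implementation, and the two heads are called one after the other for simplicity even though the text describes them as running simultaneously. The point of the sketch is that feature extraction runs once per frame.

```python
from typing import Callable, Dict, List, Tuple

FeatureMap = List[float]


def process_frame(
    frame: List[float],
    history: List[FeatureMap],
    extract: Callable[[List[float]], FeatureMap],
    spatial_head: Callable[[FeatureMap], bool],
    temporal_head: Callable[[List[FeatureMap]], bool],
) -> Tuple[bool, bool]:
    """Extract features once, then hand the same map to both detection heads."""
    features = extract(frame)          # the only feature-extraction call per frame
    history.append(features)           # kept so the temporal head sees earlier frames
    spatial = spatial_head(features)   # looks at the current frame only
    temporal = temporal_head(history)  # looks at the sequence so far
    return spatial, temporal


# Hypothetical stand-ins so the sketch runs without a real network.
calls: Dict[str, int] = {"extract": 0}


def toy_extract(frame: List[float]) -> FeatureMap:
    calls["extract"] += 1
    return [sum(frame) / len(frame)]


def toy_spatial(features: FeatureMap) -> bool:
    return features[0] > 0.5


def toy_temporal(history: List[FeatureMap]) -> bool:
    # Fires when the mean feature rose between the two most recent frames.
    return len(history) >= 2 and history[-1][0] > history[-2][0]


history: List[FeatureMap] = []
frames = ([0.1, 0.2], [0.4, 0.5], [0.8, 0.9])
results = [process_frame(f, history, toy_extract, toy_spatial, toy_temporal)
           for f in frames]
```

With three frames the extractor is called three times in total, whereas two independent detectors would each have needed their own pass over every frame.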

The spatial visual detection unit (200) of the visual detection system according to an embodiment of the present invention may estimate the location and name of an object in the input image (10) based on the feature map generated by the feature extraction unit (100).

Specifically, because the spatial visual detection unit (200) of the present invention includes a fully connected layer composed of at least one hidden layer, it can form a convolutional neural network (CNN) together with the feature extraction unit (100) of the present invention.

That is, since the neural network of the feature extraction unit (100) is shared by the spatial visual detection unit (200) and the temporal visual detection unit (300) as described above, an object in a single input image (10) can be recognized by combining the fully connected layer of the spatial visual detection unit (200) with the convolution layers and pooling layer of the feature extraction unit (100).
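
As a rough illustration of how a fully connected layer with one hidden layer sits on top of a feature map, consider the sketch below. The helper names (`flatten`, `dense`, `relu`), the ReLU activation, and all weight values are assumptions made for the example; the patent does not specify them.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def flatten(fmap: Matrix) -> Vector:
    """Turn a 2-D feature map into the 1-D input of a fully connected layer."""
    return [v for row in fmap for v in row]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


# Hypothetical 2x2 feature map and hand-picked weights (illustration only).
fmap = [[1.0, 2.0], [3.0, 4.0]]
w_hidden = [[1.0, 0.0, 0.0, -1.0], [0.5, 0.5, 0.5, 0.5]]
b_hidden = [0.0, -1.0]
w_out = [[2.0, 1.0], [0.0, -1.0]]
b_out = [0.5, 0.0]

hidden = relu(dense(flatten(fmap), w_hidden, b_hidden))  # hidden layer activations
scores = dense(hidden, w_out, b_out)                     # one score per class
```

In a trained network the weights would be learned, and a detector would also output box coordinates alongside the class scores.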

However, the spatial visual detection unit (200) is not limited to being combined with the feature extraction unit (100) as a convolutional neural network (CNN) model; spatial object detection models such as R-CNN (Region-based CNN), Faster R-CNN, SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once), as well as machine learning models, may be applied according to the user's purpose.

In addition, the temporal visual detection unit (300) of the visual detection system according to an embodiment of the present invention may detect the object in the image (10) in time order based on the feature maps generated by the feature extraction unit (100). That is, in the temporal visual detection unit (300), the feature map for the image (10) input at a specific time point (T_n) and the feature map for the image (10) input before that time point (T_{n-1}) may be considered together to detect the object in the image (10).
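
One simple way to consider the feature maps at T_{n-1} and T_n together is to keep a short sliding window of recent feature maps and pass the concatenation to the temporal model. The class below is a sketch under that assumption; the name `FeatureWindow` and the window length of 2 are illustrative and do not come from the patent.

```python
from collections import deque
from typing import Deque, List, Optional

FeatureMap = List[float]


class FeatureWindow:
    """Keeps the most recent `length` feature maps, oldest first."""

    def __init__(self, length: int = 2) -> None:
        self._length = length
        self._buf: Deque[FeatureMap] = deque(maxlen=length)

    def push(self, features: FeatureMap) -> Optional[List[float]]:
        """Store a new feature map; return the concatenated window once it is full."""
        self._buf.append(features)  # the oldest entry is dropped automatically
        if len(self._buf) < self._length:
            return None             # not enough history yet
        return [v for fmap in self._buf for v in fmap]


window = FeatureWindow(length=2)
first = window.push([1.0, 2.0])   # T_0: window not full yet
second = window.push([3.0, 4.0])  # T_1: features of T_0 and T_1 together
third = window.push([5.0, 6.0])   # T_2: features of T_1 and T_2 together
```

The concatenated vector could then be fed to an MLP, while an RNN or LSTM would instead consume the feature maps one time step at a time.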

Object detection in the temporal visual detection unit (300) is carried out separately from the detection result of the spatial visual detection unit (200) described above, so the temporal visual detection unit (300) can detect an object even when the spatial visual detection unit (200) fails to estimate the object's location or name. The spatial visual detection unit (200) and the temporal visual detection unit (300) thus complement each other in recognizing objects in the input image (10).

Specifically, the temporal visual detection unit (300) of the present invention can detect the object in the image (10) in time order by considering the whole input sequence, and various temporal visual detection algorithms (310) may be used for this purpose. For example, the temporal visual detection algorithm (310) may include a multi-layer perceptron (MLP), a recurrent neural network (RNN), long short-term memory (LSTM), and the like. The accuracy of object detection can be improved by combining the results classified by such RNN- or LSTM-based object detection classification models using algorithms such as voting and ensembling.
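
The voting step mentioned above can be sketched as a plain majority vote over the labels returned by several classifiers. The function name `majority_vote` and the label strings are assumptions for illustration; the patent does not say which voting scheme is used.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the label predicted most often; ties go to the label seen first."""
    if not labels:
        raise ValueError("at least one prediction is required")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three temporal classifiers (e.g., MLP, RNN, LSTM).
predictions = ["fire", "none", "fire"]
decision = majority_vote(predictions)
```

Weighted voting or averaging of class probabilities are common alternatives when the classifiers expose confidence scores.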

In the visual detection system according to an embodiment of the present invention, the feature map generated by the feature extraction unit (100) is shared by the spatial visual detection unit (200) and the temporal visual detection unit (300), each of which detects objects in the image (10), and the fusion determination unit (400) may make an overall judgment from the two detection results.

Specifically, the fusion determination unit (400) of the visual detection system according to an embodiment of the present invention may judge the fusion result to be 'no alarm' when both the spatial visual detection unit (200) and the temporal visual detection unit (300) report no detection, and 'caution' when only one of the two reports a detection. When both the spatial visual detection unit (200) and the temporal visual detection unit (300) report a detection, the fusion determination unit (400) may judge the result to be 'alarm'. Among these results, 'no alarm' means that no object (e.g., a pedestrian or a fire) was detected in the input image (10). 'Caution' means that the spatial and temporal visual detection results differ, so an object may be present in the input image (10), or that the case is ambiguous and the judgment should be deferred or made again. 'Alarm' means that both the spatial and the temporal visual detection results indicate a detection, so an object such as a pedestrian or a fire is judged to have been detected.

As in the example above, the way the fusion determination unit (400) of the present invention combines the detection results of the spatial visual detection unit (200) and the temporal visual detection unit (300) can be summarized as in [Table 1] below.

[Table 1]
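
The fusion rule stated in the text (both undetected gives 'no alarm', exactly one detected gives 'caution', both detected gives 'alarm') can be written as a small function. The names `FusionResult` and `fuse` are illustrative, not taken from the patent.

```python
from enum import Enum


class FusionResult(Enum):
    NO_ALARM = "no alarm"
    CAUTION = "caution"
    ALARM = "alarm"


def fuse(spatial_detected: bool, temporal_detected: bool) -> FusionResult:
    """Combine the two detection results according to the rule described above."""
    if spatial_detected and temporal_detected:
        return FusionResult.ALARM      # both units agree on a detection
    if spatial_detected or temporal_detected:
        return FusionResult.CAUTION    # the units disagree
    return FusionResult.NO_ALARM       # neither unit detected anything


# Enumerate all four input combinations.
table = {(s, t): fuse(s, t).value for s in (False, True) for t in (False, True)}
```

Printing `table` lists the outcome for every combination of spatial and temporal results.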

In addition, when the detection result of the spatial visual detection unit (200) is no detection while the temporal visual detection result is a detection, the fusion determination unit of the present invention may readjust the prediction threshold of the spatial visual detection unit (200) and have it judge again. When the feature extraction unit (100) and the spatial visual detection unit (200) of the present invention are combined and applied as a CNN, R-CNN, or the like, the prediction threshold may correspond to one of the various weight parameters of a node or hidden layer of the neural network.
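
The re-judgment described above might look like the following sketch. It makes a simplifying assumption that is not in the patent: the prediction threshold is treated as a cutoff on a spatial confidence score in [0, 1], and the readjustment is a single retry with a lower cutoff. The function name and both threshold values (0.5 and 0.3) are hypothetical.

```python
def fuse_with_retry(
    spatial_score: float,
    temporal_detected: bool,
    threshold: float = 0.5,          # assumed default cutoff
    relaxed_threshold: float = 0.3,  # assumed cutoff used for the second look
) -> str:
    """Fuse both results, re-checking the spatial score when only the temporal unit fires."""
    spatial_detected = spatial_score >= threshold
    if not spatial_detected and temporal_detected:
        # Temporal evidence exists, so judge the spatial score again with a lower cutoff.
        spatial_detected = spatial_score >= relaxed_threshold
    if spatial_detected and temporal_detected:
        return "alarm"
    if spatial_detected or temporal_detected:
        return "caution"
    return "no alarm"
```

With these assumed values, a spatial score of 0.4 alone yields 'no alarm', but the same score combined with a temporal detection is promoted to 'alarm' after the retry.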

FIGS. 3 and 4 are exemplary views of a visual detection system according to an embodiment of the present invention. FIG. 3 shows the process by which a fire is detected from input images (10) when the visual detection system according to an embodiment of the present invention is applied to fire detection.

Specifically, as in FIG. 3, images (10) are input in time order, and the feature extraction unit (100) of the present invention may generate a feature map for each image (10). The feature map generated by the feature extraction unit (100) is shared as input by both the spatial visual detection unit (200) and the temporal visual detection unit (300), each of which may then detect objects. FIG. 3 shows the case where a multi-layer perceptron (MLP) is used as the detection algorithm model of the temporal visual detection unit (300). As described above, the fusion determination unit (400) of the present invention may determine the detection result based on the detection result of the spatial visual detection unit (200) and that of the temporal visual detection unit (300).

In the visual detection system according to an embodiment of the present invention, perception or recognition of an object in the input image 10 may be achieved, as described above, by forming a feature map through the neural network of the feature extraction unit 100. Object detection or recognition may mean identifying a region recognized as an object in a given image 10 as one of a plurality of predefined classes. For example, in the case of fire detection, sparks, flames, smoke, and the like in the input image 10 may be the target objects. Such object detection or recognition may be performed through machine learning or deep learning. When an object is detected or recognized by machine learning or deep learning, a classification model is first trained using a dataset (train/validation/test dataset), and it may then be determined which of the plurality of classes the input image 10 belongs to.

The image 10 input to the visual detection system according to an embodiment of the present invention may be an image that has undergone flip, scaling, and rotation processing. That is, by applying such flip, scaling, and rotation processing to the input image, a variety of environments are taken into account, so that robust detection can be achieved with the visual detection system of the present invention.

Specifically, in the present invention, the dataset for training the neural network of the feature extraction unit 100 may further include images 10 pre-processed by flipping, scaling, rotation, or the like. Flipping diversifies the training results by mirroring an image 10 of the existing dataset horizontally or vertically. Scaling corresponds to resizing an existing image 10, and the scale rate may be set to various values. Likewise, rotation adds to the dataset versions of an input image rotated by various angles.
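
As a rough illustration of the pre-processing described above, the following sketch builds augmented copies of one image using only the Python standard library. The image is assumed to be a list of pixel rows; the function names, the nearest-neighbour resizing, and the restriction of rotation to multiples of 90 degrees are simplifications made for this sketch, not details taken from the patent, which allows rotation by various angles.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def scale_nearest(img, rate):
    """Resize by `rate` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * rate)), max(1, round(w * rate))
    return [[img[min(h - 1, int(y / rate))][min(w - 1, int(x / rate))]
             for x in range(new_w)]
            for y in range(new_h)]


def augment(img, scale_rates=(0.5, 2.0)):
    """Return the original image plus flipped, rotated, and scaled copies."""
    out = [img, flip_horizontal(img), flip_vertical(img)]
    rotated = img
    for _ in range(3):  # 90, 180, and 270 degrees
        rotated = rotate90(rotated)
        out.append(rotated)
    out.extend(scale_nearest(img, r) for r in scale_rates)
    return out
```

With the default scale rates, `augment` yields eight images per input: the original, two flips, three rotations, and two rescaled copies.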

An example of how input data may be generated by diversifying the dataset in this way, so that the feature extraction unit 100 can generate feature maps, can be summarized as in [Table 2] below.

[Table 2]

The visual detection system according to an embodiment of the present invention may further include a storage unit that stores the feature map generated by the feature extraction unit 100. The temporal visual detection unit 300 detects an object in the image 10 over time using a recurrent neural network, and the recurrent neural network may be trained based on the feature maps stored in the storage unit.

Specifically, in the visual detection system of the present invention, the feature maps learned through the neural network of the feature extraction unit 100 are stored in the storage unit, and the recurrent neural network of the temporal visual detection unit 300 may be trained using these stored feature maps, separately from the neural network of the feature extraction unit 100. That is, apart from the execution of the visual detection system according to an embodiment of the present invention, with regard to learning, the neural network structure used for training the neural network of the feature extraction unit 100 may also be shared by the recurrent neural network of the temporal visual detection unit 300.
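
One way to picture the role of the storage unit is the sketch below, which keeps one flattened feature map per frame and then cuts the stored history into fixed-length, time-ordered sequences that a recurrent model could be trained on separately. The class name `FeatureStore`, the flattening step, and the window length are assumptions made for illustration; the patent does not specify how the stored feature maps are organized.

```python
class FeatureStore:
    """Hypothetical stand-in for the storage unit: one feature vector per frame."""

    def __init__(self):
        self._vectors = []

    def save(self, feature_map):
        # Flatten a 2-D feature map (a list of rows) into one vector and store it
        # in arrival order, which is assumed to be the time order of the frames.
        self._vectors.append([v for row in feature_map for v in row])

    def sequences(self, length):
        """Return every run of `length` consecutive stored vectors, in time order."""
        return [self._vectors[i:i + length]
                for i in range(len(self._vectors) - length + 1)]
```

Because the sequences are built from stored feature maps rather than from raw images, the recurrent model in this sketch can be trained without re-running the feature extractor.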

The visual detection system of the present invention can be applied to various fields that require detecting objects in an input image 10 (e.g., vehicle license plate detection, pedestrian detection, CCTV surveillance, defective product inspection, or fire detection).

Hereinafter, a case in which the visual detection system according to an embodiment of the present invention is applied to fire detection is described as an example.

FIG. 5 is an exemplary view showing fire detection results obtained using the visual detection system. FIG. 5(a) shows a state in which a fire is detected through temporal visual detection alone, and FIG. 5(b) shows a state in which spatial visual detection and temporal visual detection are performed together by the visual detection system according to an embodiment of the present invention.

Referring to FIG. 5(a), no sparks or flames are visible above the vehicle and only dark gray smoke is observed; the temporal visual detection result 301 at the upper left indicates that a fire was detected as a result of temporal visual detection over the consecutive images 10. In contrast, FIG. 5(b) shows the vehicle in flames, and the fire is detected, with the spatial visual detection result 201 marking the sparks or flames with a blue box. As in FIG. 5(a), the temporal visual detection result also indicates that a fire was detected. Based on the above, the fusion determination unit 400 of the present invention may issue a "fire alarm" determination for FIG. 5(b) based on the spatial and temporal visual detection results.

FIGS. 6A to 6C are exemplary views showing data for training the visual detection system according to an embodiment of the present invention. The visual detection system of the present invention may be trained using the dataset described above (train/validation/test dataset); when it is applied to fire monitoring, various smoke or fire images 10 such as those shown in FIGS. 6A to 6C may be used as the dataset.

Referring to FIGS. 6A to 6C, so that the dataset consists of a variety of fire images 10, the dataset for fire monitoring may include not only flame images 10 with a background, as in FIG. 6A, and smoke images 10 with a background, as in FIG. 6B, but also smoke images 10 without a background, as in FIG. 6C. The dataset may also include images 10 covering various stages of a fire, from its outbreak to the moment it is extinguished.

FIGS. 7A and 7B are exemplary views showing forest fire detection results obtained using the visual detection system according to an embodiment of the present invention.

Referring to FIG. 7A, an image 10 showing cloud and mist on a mountainside is shown. The spatial visual detection unit 200 of the present invention detects the mist as smoke, so FIG. 7A marks the region judged to contain a fire with a blue box 201. However, the temporal visual detection unit 300 of the present invention determines the detection result over time to be "not detected", and FIG. 7A shows the fusion result 401 as not detected. Taking into account the respective detection results of the spatial visual detection unit 200 and the temporal visual detection unit 300, the fusion determination unit 400 may determine "caution".

In contrast, referring to FIG. 7B, an image 10 of a fire at a building in the mountains is shown. The spatial visual detection unit 200 of the present invention detects fire and smoke, so FIG. 7B marks the fire region with a blue box. The temporal visual detection unit 300 of the present invention also determines the detection result over time to be "detected", and FIG. 7B shows the fusion result 401 as detected. Accordingly, since both spatial and temporal visual detection indicate a detection, the fusion determination unit 400 may fuse these results and determine the detection result to be a "fire alarm".
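
The fusion outcomes described for FIGS. 7A and 7B can be summarized in a small decision function. Only two outcomes are taken from the description: "fire alarm" when both branches detect, and "caution" when only the spatial branch detects. The labels used here for the remaining two combinations, and the function name `fuse`, are assumptions for illustration.

```python
def fuse(spatial_detected: bool, temporal_detected: bool) -> str:
    """Combine the spatial and temporal detection results into one judgment."""
    if spatial_detected and temporal_detected:
        return "fire alarm"  # both branches agree, as in FIG. 7B
    if spatial_detected:
        return "caution"     # spatial only, as in FIG. 7A (mist seen as smoke)
    if temporal_detected:
        # Temporal only: the description says the spatial prediction threshold
        # is readjusted and the frame judged again; this label is assumed.
        return "recheck spatial threshold"
    return "no detection"    # assumed label for the case where neither detects
```

Keeping the temporal-only case as a separate outcome reflects the asymmetry in the description: a temporal detection without spatial support triggers a second look rather than an immediate alarm.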

FIGS. 8 to 10 are flowcharts showing a visual detection method using the visual detection system according to an embodiment of the present invention.

Referring to FIG. 8, the visual detection method according to an embodiment of the present invention may include: a step in which an image 10 is input; a step in which a feature map of the image 10 is generated using a neural network; a step in which the location and name of an object in the image 10 are estimated using the feature map; a step in which an object in the image 10 is detected over time based on the feature map; and a step in which the object detection result for the image 10 is determined based on the estimation result and the detection result of the two preceding steps.

Referring to FIG. 9, in the visual detection method according to an embodiment of the present invention, the step of estimating the location and name of an object in the image 10 using the feature map and the step of detecting an object in the image 10 over time based on the feature map may be performed simultaneously.

Specifically, according to an embodiment of the present invention, spatial visual detection and temporal visual detection are performed simultaneously in parallel, and the results obtained can be combined and used for object detection. In addition, according to an embodiment of the present invention, applying what is obtained in spatial visual detection (e.g., the features extracted from the image 10) to the temporal visual detection process has the advantage that the amount of computation for feature extraction from the image 10 does not increase and the time required for object detection can be greatly reduced.
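
The idea of extracting features once per frame and feeding both branches in parallel can be sketched as follows. All three functions are toy stand-ins invented for this sketch (mean brightness as the "feature map", a fixed cutoff for the spatial branch, a rising trend for the temporal branch); they only illustrate the data flow, not the networks of the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(frame):
    """Stand-in for unit 100: the mean pixel value serves as the 'feature map'."""
    flat = [p for row in frame for p in row]
    return sum(flat) / len(flat)


def spatial_branch(feature):
    """Stand-in for unit 200: judges a single frame on its own."""
    return feature > 0.5


def temporal_branch(history):
    """Stand-in for unit 300: judges the trend over the frames seen so far."""
    return len(history) >= 2 and all(b > a for a, b in zip(history, history[1:]))


def process_stream(frames):
    """Extract features once per frame, then run both branches concurrently."""
    history, results = [], []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for frame in frames:
            feature = extract_features(frame)  # computed once, shared by both
            history.append(feature)
            spatial = pool.submit(spatial_branch, feature)
            temporal = pool.submit(temporal_branch, list(history))
            results.append((spatial.result(), temporal.result()))
    return results
```

Each result is a `(spatial, temporal)` pair per frame, which is the input a fusion step would consume. Threads are used here only to show the structure; in a real system the two branches would more likely run as parts of one network or on separate devices.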

The neural network of the visual detection method according to an embodiment of the present invention may include a plurality of convolution layers for generating the feature map of the image 10 and a pooling layer for sampling the feature map.
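
To make the roles of the two layer types concrete, the sketch below applies one convolution and one 2x2 max-pooling step to a small image in plain Python. It assumes a single channel, no padding, stride 1 for the convolution, and max pooling as the sampling operation; the patent does not state these choices. As in most deep learning frameworks, the "convolution" here does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def max_pool2x2(feature_map):
    """Downsample by keeping the maximum of each non-overlapping 2x2 block."""
    return [[max(feature_map[y][x], feature_map[y][x + 1],
                 feature_map[y + 1][x], feature_map[y + 1][x + 1])
             for x in range(0, len(feature_map[0]) - 1, 2)]
            for y in range(0, len(feature_map) - 1, 2)]
```

A 5x5 image convolved with a 2x2 kernel gives a 4x4 feature map, which the pooling step reduces to 2x2; stacking several such layers is what yields the compact feature map shared by the two detection branches.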

In the visual detection method according to an embodiment of the present invention, the step of estimating the location and name of an object in the image 10 using the feature map may be performed using a fully connected layer composed of at least one hidden layer.
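
A fully connected layer with one hidden layer can be sketched as a small forward pass. The sketch covers only the estimation of the object name (class); estimating the location is omitted. The ReLU activation, the softmax output, the parameter layout in `params`, and the class names are assumptions for illustration and are not specified in the patent.

```python
import math


def dense(x, weights, bias):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, params, class_names):
    """Forward pass: input -> hidden layer -> class probabilities."""
    hidden = relu(dense(features, params["w1"], params["b1"]))
    probs = softmax(dense(hidden, params["w2"], params["b2"]))
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs
```

In practice the weights in `params` would come from training on the dataset described earlier; here they are simply passed in so the forward pass can be inspected in isolation.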

The visual detection method according to an embodiment of the present invention may further include a step in which the feature map generated in the feature map generation step is stored in a storage unit of the visual detection system. The step of detecting an object in the image 10 over time based on the feature map is performed using a recurrent neural network, and the recurrent neural network may be trained based on the stored feature map.

In the visual detection method according to an embodiment of the present invention, the neural network may be trained on input images that have undergone flip, scaling, and rotation processing.

In addition, the method described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program using a computer-readable medium. The structure of the data used in the method described above can be recorded on a computer-readable medium by various means. A recording medium that records an executable computer program or code for performing the various methods of the present invention should not be understood to include transitory objects such as carrier waves or signals. The computer-readable medium may include storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk, etc.) and optically readable media (e.g., CD-ROM, DVD, etc.).

The foregoing description of the present invention is provided for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: image 100: feature extraction unit
200: spatial visual sensing unit 201: spatial visual sensing results
300: temporal visual sensing unit 301: temporal visual sensing results
310: temporal visual detection algorithm 320: classification model
400: fusion determination unit 401: fusion results

Claims

A visual detection system for detecting an object in an input image, comprising:
a feature extraction unit that generates a feature map of the image using a neural network;
a spatial visual detection unit that estimates the location and name of an object in the image using the generated feature map;
a temporal visual detection unit that detects an object in the image according to a time sequence based on the generated feature map; and
a fusion determination unit that determines an object detection result for the image based on the result estimated by the spatial visual detection unit and the result detected by the temporal visual detection unit,
wherein the estimation by the spatial visual detection unit and the detection by the temporal visual detection unit are performed simultaneously, and
wherein, when the result estimated by the spatial visual detection unit differs from the result detected by the temporal visual detection unit, the fusion determination unit readjusts the prediction threshold of the spatial visual detection unit and determines the object detection result for the image.

The visual detection system according to claim 1,
wherein the neural network of the feature extraction unit includes a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

The visual detection system according to claim 1,
wherein the spatial visual detection unit comprises a fully connected layer composed of at least one hidden layer.

The visual detection system according to claim 1,
further comprising a storage unit that stores the feature map generated by the feature extraction unit,
wherein the temporal visual detection unit detects an object in the image over time using a recurrent neural network, and
wherein the recurrent neural network is trained based on the feature map stored in the storage unit.

The visual detection system according to claim 1,
wherein the neural network is trained on input images that have undergone flip, scaling, and rotation processing.

A visual detection method using a visual detection system, comprising:
(a) inputting an image;
(b) generating a feature map of the image using a neural network;
(c) estimating the location and name of an object in the image using the feature map;
(d) detecting an object in the image according to a time sequence based on the feature map; and
(e) determining an object detection result for the image based on the estimation result of step (c) and the detection result of step (d),
wherein step (c) and step (d) are performed simultaneously, and
wherein, when the estimation result of step (c) differs from the detection result of step (d), the prediction threshold of step (c) is readjusted and the object detection result for the image is determined.

(Deleted)

The method of claim 6,
wherein the neural network includes a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

The method of claim 6,
wherein step (c) is performed using a fully connected layer composed of at least one hidden layer.

The method of claim 6,
further comprising storing the feature map generated in step (b) in a storage unit of the visual detection system,
wherein step (d) is performed using a recurrent neural network, and
wherein the recurrent neural network is trained based on the stored feature map generated in step (b).

The method of claim 6,
wherein the neural network is trained on input images that have undergone flip, scaling, and rotation processing.

A computer-readable recording medium on which a program for implementing the method of any one of claims 6, 8, 9, 10, or 11 is recorded.