KR20190079047A

KR20190079047A - A supporting system and method that assist partial inspections of suspicious objects in cctv video streams by using multi-level object recognition technology to reduce workload of human-eye based inspectors

Info

Publication number: KR20190079047A
Application number: KR1020170180960A
Authority: KR
Inventors: 송동호; 인연진
Original assignee: 소프트온넷(주)
Priority date: 2017-12-27
Filing date: 2017-12-27
Publication date: 2019-07-05
Also published as: KR102035592B1

Abstract

According to the present invention, a system for supporting partial inspection of suspicious objects in CCTV video, using level-based image recognition technology to reduce the workload of visual inspectors, comprises: an external source that provides publicly known open data in a preset format; a CCTV system device that inputs video information including events in real time; and an image recognition support device that receives the open data from the external source and real-time video data from the CCTV system device to support image recognition. The image recognition support device includes: a data input mechanism; an artificial intelligence mechanism including a neural network artificial intelligence generation unit that generates an essence neural network intelligence, which accumulates reference information for distinguishing elementary, intermediate, and advanced image recognition information and recognizes target elements; and an essence data providing mechanism that uses the essence neural network intelligence to automatically recognize, in real-time CCTV video from a real-time data input unit, the occurrence of an event and the target elements associated with it, and provides the recognition result as essence data, which is at least one of the elementary, intermediate, and advanced image recognition information. The inspection manpower required for the elementary and intermediate image recognition functions can therefore be reduced and concentrated on the advanced image recognition function, improving both efficiency and quality.

Description

A supporting system and method that assist partial inspections of suspicious objects in CCTV video streams by using multi-level object recognition technology to reduce workload of human-eye based inspectors

The present invention relates to a system and method for supporting partial inspection of suspicious objects in CCTV video, using moving-object detection and level-based image recognition technology to reduce the workload of visual inspectors. In particular, it relates to a system and method that apply recognition by image differencing, or machine learning, to CCTV video in order to record as targets the meaningful scenes within the full video under inspection, such as the movement of a specific object. Visual inspectors then need to perform only a partial inspection, which reduces the manpower required for a full inspection.

The terms used in the present invention are as follows.

Target element (or target, or target image): information on the image object under inspection, on change states of the image object such as stopping and moving, and on any outcome of image-object activity that is useful for identifying incidents and accidents.

Environmental element: information covering objects other than the target, the actions of those objects and their results, objects that are stationary or moving around the target, and changes in the shading or shape of the surroundings caused by temporal or physical change.

MPEG4: an international standard for video and audio compression. The low bit rate is 64 kbps or less, the medium bit rate is 4 to 384 kbps, and the high bit rate is 384 kbps to 4 Mbps. MPEG4 supports both constant bit rate (CBR) and variable bit rate (VBR).

Real-time data: real-time video data entering the system from CCTV footage.

Open data: pre-recorded images, video, and analysis data obtained from external sources and needed to train the vehicle and pedestrian object recognition system.

DB-archived data: data that has been input to the system and stored in the DB, comprising both open data and real-time data. It is the data needed for training the artificial intelligence or for CCTV image processing or recognition.

Event: a concept covering incidents, accidents, occasions, and the resulting physical and phenomenal changes.

Cascading software mechanism: a structure in which the software subsystems that handle decoding, detection (including convolution and pooling), tracking, and similar tasks in the preprocessing process or the essence data providing process are connected in sequence, so that the output of one subsystem serves as the input of the next. Under a control mechanism, the different video streams to be processed advance through the stages alternately and concurrently, which avoids bottlenecks.
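The cascading structure defined above can be illustrated with a minimal sketch. It assumes three placeholder stages chained as Python generators and a simple round-robin controller that lets several streams take turns. The stage bodies, the `"car"` keyword check, and all function names are hypothetical stand-ins, not the patent's implementation.

```python
from collections import deque

def decode_stage(packets):
    # Hypothetical decoder: turns (stream_id, payload) packets into frames.
    for stream_id, payload in packets:
        yield {"stream": stream_id, "frame": payload}

def detect_stage(frames):
    # Hypothetical detector: a keyword check stands in for real detection.
    for f in frames:
        yield {**f, "detected": "car" in f["frame"]}

def track_stage(detections):
    # Hypothetical tracker: counts detections seen so far in this stream.
    hits = 0
    for d in detections:
        if d["detected"]:
            hits += 1
        yield {**d, "hits_so_far": hits}

def build_cascade(packets):
    # Each stage consumes the output of the previous one.
    return track_stage(detect_stage(decode_stage(packets)))

def round_robin(pipelines):
    # Controller: advance each stream by one item in turn so that no
    # single stream monopolizes the stages.
    active = deque(pipelines)
    while active:
        pipe = active.popleft()
        try:
            item = next(pipe)
        except StopIteration:
            continue  # this stream is exhausted; drop it
        active.append(pipe)
        yield item

cam1 = [("cam1", "empty road"), ("cam1", "car passing")]
cam2 = [("cam2", "car parked")]
results = list(round_robin([build_cascade(cam1), build_cascade(cam2)]))
print([r["stream"] for r in results])  # ['cam1', 'cam2', 'cam1']
```

Because the generators are lazy, a frame is decoded only when the controller asks for the next tracked result, so work on the two streams is interleaved rather than queued behind one stream.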

Cascading processing: work carried out using the cascading software structure.

Convolution: a sliding-window method for finding local patterns, used mainly in deep learning.

Pooling: a process for deciding whether a pattern is present.

Feature map: indicates how closely regions of an image match a local pattern.
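The three definitions above (convolution, pooling, feature map) can be made concrete with a small pure-Python sketch. The 5x5 test image and the 2x2 vertical-edge kernel are illustrative assumptions. As in most deep learning libraries, the kernel is applied without flipping, so strictly speaking this is cross-correlation.

```python
def convolve2d(image, kernel):
    # Slide the kernel over the image (no padding, stride 1) and record
    # how strongly each window matches the local pattern.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    # Keep the strongest response in each size x size block, i.e. decide
    # whether the pattern is present anywhere in that block.
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in cols
        ]
        for r in rows
    ]

# Dark-to-bright vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)
print(feature_map[0])         # [0, 2, 0, 0]
print(max_pool(feature_map))  # [[2, 0], [2, 0]]
```

The feature map peaks exactly where the edge sits, and pooling keeps that peak while halving each dimension.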

Detection: a stage of image processing that searches the input video for the appearance of a meaningful object. Methods range from simply comparing consecutive frames to find whether an object has appeared, to detection through deep learning.
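The simplest method mentioned above, comparing consecutive frames, might look like the following sketch. Frames are assumed to be grayscale grids of integers; the function names and both thresholds are hypothetical choices for illustration.

```python
def frame_difference(prev, curr, pixel_threshold=30):
    # Mark pixels whose brightness changed by more than the threshold.
    return [
        [abs(c - p) > pixel_threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

def detect_motion(prev, curr, pixel_threshold=30, min_changed=2):
    # Return the bounding box (top, left, bottom, right), inclusive, of the
    # changed pixels, or None when too few pixels changed.
    mask = frame_difference(prev, curr, pixel_threshold)
    changed = [(r, c) for r, row in enumerate(mask)
               for c, on in enumerate(row) if on]
    if len(changed) < min_changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows), max(cols))

prev = [[0] * 4 for _ in range(3)]
curr = [row[:] for row in prev]
curr[1][1] = curr[1][2] = 200      # a bright object appears

print(detect_motion(prev, curr))   # (1, 1, 1, 2)
```

A detector this simple fires on any sufficiently large brightness change, whatever its cause, so it cannot by itself tell a target from an environmental change.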

Tracking: a technique by which, once the image recognition system has recognized an object and that object then moves within the same view, the system does not treat it as a new object but judges that the already-recognized object is moving, and follows it.
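One common way to realize this idea is to match each new detection to an existing track by bounding-box overlap (intersection over union, IoU). The sketch below is an illustrative assumption rather than the method of the patent; the `Tracker` class, its greedy matching, and the 0.3 threshold are all hypothetical.

```python
from itertools import count

def iou(a, b):
    # Boxes are (x1, y1, x2, y2); returns intersection over union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

class Tracker:
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}        # track id -> last known box
        self._ids = count(1)

    def update(self, boxes):
        # Greedily match each detection to the best-overlapping unmatched
        # track; otherwise start a new track. Tracks with no match in this
        # frame are dropped.
        unmatched = dict(self.tracks)
        assigned = {}
        for box in boxes:
            best = max(unmatched, key=lambda t: iou(unmatched[t], box),
                       default=None)
            if best is not None and iou(unmatched[best], box) >= self.iou_threshold:
                del unmatched[best]
            else:
                best = next(self._ids)
            assigned[best] = box
        self.tracks = assigned
        return assigned

tracker = Tracker()
print(tracker.update([(0, 0, 10, 10)]))                    # {1: (0, 0, 10, 10)}
print(tracker.update([(2, 0, 12, 10)]))                    # {1: (2, 0, 12, 10)}
print(tracker.update([(2, 0, 12, 10), (50, 50, 60, 60)]))  # ids 1 and 2
```

The object that shifted two pixels keeps id 1, while the box that appears far away receives a new id.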

Intelligent IP camera: an intelligent IP camera (part of the configuration included in the CCTV system device) adds an embedded computer to conventional analog CCTV camera technology. It converts the input analog video to digital, compresses it with a video encoding technique such as MPEG, and outputs the encoded video over IP communication. An IP camera uses an IP-based communication network and can also receive its power through PoE (Power over Ethernet). Furthermore, some IP cameras are advanced intelligent CCTV cameras that also perform simple image recognition, such as motion detection, vehicle detection, and pedestrian detection, on the built-in computer and send out only the information in which there is movement on screen. At this level, algorithms and hardware design are integrated to enable real-time video analysis. According to the present invention, at this level the basic functions of video analysis are mounted on the embedded computer of the IP camera and the detected information is sent to the server, so that the server can handle the problems that must be solved through cooperation across the CCTV camera network at the intermediate and advanced levels described below.

CCTV system device: a concept that covers intelligent IP CCTV, IP CCTV, general CCTV, and CCTV, as well as every accessory device that is associated with CCTV and is used, essentially or supplementarily, to maintain or enhance its functions. Here, intelligent IP CCTV or IP CCTV includes an intelligent IP camera, whereas plain CCTV or general CCTV refers to a general CCTV camera that is not an intelligent IP camera.

Elementary image recognition level: the level of image recognition at which visual event modeling and algorithms reliably detect, track, and classify objects and, in particular, perform tasks such as motion detection for abnormal states that differ from the normal state. The elementary level serves as a bridge from the "signal processing level" (e.g., changes in brightness, color, and motion) to the "semantic level" (e.g., traffic accident detection, terrorism detection). Hierarchical interpretation of semantically complex information in the video increases analytic capability and reduces ambiguity; within that hierarchy, the elementary level is the lowest stage.

Intermediate image recognition level: whereas the elementary level handles only tasks such as motion detection, the intermediate level handles, for the video of a single CCTV camera, semantic-level tasks such as traffic accident detection and terrorism detection at a more complex level of image recognition.

Advanced image recognition level: the stage that extends to analyzing information input from a network of multiple cameras rather than a single CCTV camera. By linking image processing among CCTV cameras installed at multiple intersections, it enables, for example, tracking of a fleeing vehicle. It does so by building a cooperative model, using a video-analysis data fusion model and maximizing the information from multiple observers of the same visual phenomenon, to solve problems that involve a complex CCTV camera network.

Conventionally, the CCTV cameras that local governments install and operate for purposes such as public safety and condition monitoring are placed mainly on main arterial roads, back roads, and alleys. Their number is vast, averaging roughly 1,500 units per jurisdiction, and the output video, recorded 24 hours a day, is stored in full on a video server called a VMS (Video Management System). Inspecting this stored video at the elementary image recognition level described in the definitions, whether for footage related to a specific incident or accident or for meaningful footage related to incidents not yet known, currently depends mostly on the naked eye of inspectors. Inspecting the vast body of CCTV footage by inspector judgment alone therefore takes a great deal of time and labor.

One CCTV image recognition technique that assists inspectors in this work is, for example, Patent Application Publication No. 1997-0014321 (first patent document). In the first patent document, when camera signals captured by a plurality of surveillance cameras are recorded on a single recording tape, an identification code is generated for each event such as an alarm, video loss, or intrusion, and identification codes for the date and time at which the event occurs are recorded in the vertical blanking interval of the camera signal. During playback, the recorded identification codes are detected and compared with the event and time period set by the user, and only the camera signals whose codes match are played back, which shortens inspection time. In particular, the first patent document proposes a technique that uses inter-frame differences to find meaningful objects or targets through the movement or change of objects.

In the first patent document, however, designating in advance the identification codes that correspond to meaningful image targets requires an inspector to view the vast volume of CCTV footage with the naked eye, judge which targets are meaningful, and enter the codes one by one, which inevitably demands enormous labor and time. Moreover, because CCTV footage accumulates continuously, this work of visually finding targets in advance and preparing the corresponding identification codes must be repeated without end, and finding all meaningful material in real time without error is nearly impossible. In addition, the frame-difference approach reacts to any movement or change of an object, so it responds to every environmental change, such as objects moving in the wind, changes in light and shade over time, the meaningless swaying of trees, and the movement of non-target objects. It therefore yields a vast amount of meaningless information, and because separating the meaningful results from the meaningless ones again relies on human visual sorting, much labor must be spent here as well.

As another conventional technique, a method that records captured CCTV video against time and, at inspection, examines the incidents and accidents corresponding to a given time is described in, for example, Patent Publication No. 10-2017-0096838 (second patent document).

The second patent document simply records and stores CCTV video by time, so it requires no prior visual inspection to assign identification codes. But because the only reference code is time, it is useful only for examining an incident that corresponds to a known time in the past. Even for a specific time period, the footage recorded by all CCTV cameras, for example 1,500 units, is very large, and there is no provision for image targets that fall outside that time period and may be missed. Consequently, with the second patent document as well, inspecting all recorded CCTV footage for meaningful targets consumes a great deal of labor and time.

Meanwhile, Patent Registration No. 10-1589823 (third patent document) proposes a technique that provides a display unit for monitoring captured video, lets the inspector enter a region of interest at a specific location in the displayed video, and displays the movement of meaningful objects in that monitored region as event-occurrence video, so that the required footage can be inspected quickly.

The third patent document can inspect events occurring in a specific local area within a vast region, which may be useful for security at a particular institution (for example, a bank). However, incidents and accidents often unfold across wide-ranging places and times that cannot be predicted at all. For example, when an offender commits a robbery on an unspecified road and then flees, or is still fleeing, by car, restricting attention to a specific place and time may leave the escape route uncovered. Moreover, inspecting all meaningful targets in the CCTV footage of a vast area in relation to past, present, and future still requires many personnel. In this respect the third patent document shares the problems of the first and second patent documents.

As a result, the conventional techniques, including the first to third patent documents, require much time and labor to inspect CCTV video, in particular to preset meaningful target images in captured footage so as to ease later inspection, or to inspect the preset content afterward. In real-time inspection there is always a risk of missing a meaningful target element, and it is difficult to respond quickly in real time.

In addition, in conventional techniques such as the first to third patent documents, factors that delay inspection include the software subsystems for video decoding, in-video object detection, video classification, and tracking, whose processing consumes large amounts of CPU, memory buffer, and GPU resources. When dozens of video streams undergo heavy software processing such as decoding and detection at the same time, concentrated on one of these subsystems, a bottleneck can arise there. Therefore, to process as many simultaneously arriving CCTV video streams as possible in parallel in software on given hardware, the bottlenecks that can arise in stream processing must be identified and resolved.

Patent Application Publication No. 1997-0014321 (published March 29, 1997)
Patent Application Publication No. 10-2017-0096838 (published August 25, 2017)
Patent Registration No. 10-1589823 (announced January 29, 2016)

The present invention is intended to solve the above problems of the prior art. An object according to one embodiment of the present invention is to provide a system and method for supporting partial inspection of suspicious objects in CCTV video, using level-based image recognition technology to reduce the workload of visual inspectors, that can drastically reduce the labor and time spent inspecting a large number of CCTV videos for meaningful target elements and can also shorten inspection time.

An object according to another embodiment of the present invention is to minimize the elementary image recognition work of visual video monitors, as described in the definitions, by filtering out of the full-length CCTV footage the parts that need not be viewed, such as meaningless background video, and leaving for visual review only the meaningful parts, such as scenes in which meaningful objects like specific vehicles or pedestrians are moving. The movement of objects in the video is recognized, but the swaying of trees in the wind, waste paper blowing about, animal movement, and the like are filtered out, while the parts that are kept are those containing recognized image objects such as moving vehicles and pedestrians. The object is to provide a system and method for supporting partial inspection of suspicious objects in CCTV video, using level-based image recognition technology to reduce the workload of visual inspectors, that has this capability.

An object according to another embodiment of the present invention is to provide a system and method for supporting partial inspection of suspicious objects in CCTV video, using level-based image recognition technology to reduce the workload of visual inspectors, that prevent bottlenecks even when many video streams simultaneously undergo heavy software processing such as decoding, detection, and tracking during the video decoding, in-video object detection, video classification, and tracking steps of inspection, and can thus process as many video data streams as possible at the same time.

An object according to yet another embodiment of the present invention concerns applying the intelligent IP CCTV camera described in the definitions to the present invention. When elementary-level image recognition such as the motion detection described above is performed directly on the IP camera and only meaningful video is sent to the server, inspectors performing visual inspection can devote themselves to higher-level work that requires compound judgment, such as intermediate and advanced event recognition. The present invention provides a deep-learning object recognition and behavior recognition system and method that helps inspectors do this work by performing, on the server, the advanced-level image recognition tasks described in the definitions: understanding compound events such as traffic accidents by combining face recognition, recognition of a pedestrian's appearance and clothing, recognition of items a pedestrian carries, vehicle model recognition, and color recognition, or tracking a vehicle that flees through several intersections. It can thereby further reduce the number of video inspectors needed while helping them work effectively.

A first aspect of the present invention for achieving the above objects is a system for supporting partial inspection of suspicious objects in CCTV video using level-based image recognition technology to reduce the workload of visual inspectors. Its purpose is to use machine learning to record the meaningful scenes, including at least the movement of a specific object, within the full video under inspection, so that only a partial visual inspection is needed and the manpower required for a full inspection can be reduced. To that end the system provides: an elementary image recognition function that at least detects movement in an abnormal state differing from the normal state; an intermediate image recognition function that aggregates the elementary image recognition information detected by the elementary function and generates semantic-level meaning defining a specific incident at a single CCTV camera; and an advanced image recognition function that aggregates the intermediate image recognition information detected by the intermediate function and generates compound linked information across a plurality of CCTV cameras.

The system comprises: an external source that provides, in a pre-arranged format, publicly known open data including at least CCTV video information capturing events that cover incidents, accidents, occasions, and predictable or unpredictable occurrences on roads within a general administrative district; a CCTV system device that inputs video information including the events in real time; and an image recognition support device that receives the open data from the external source and real-time video data from the CCTV system device and supports image recognition.

The image recognition support device comprises a data input mechanism, an artificial intelligence mechanism, and an essence data providing mechanism. The data input mechanism includes an open data input unit that receives the open data from the external source and a real-time data input unit that receives real-time video data from the CCTV system device capturing the events in real time.

The artificial intelligence mechanism includes a simulated target marking learning unit and a neural network artificial intelligence generation unit. The simulated target marking learning unit uses the open data received from the open data input unit to extract and prepare, as simulated target elements, the events, the motions or actions related to them, and the results of those motions or actions. It likewise extracts and prepares, as simulated environmental elements, objects other than the target elements, the motions or actions related to those objects and their results, and the temporal and natural-phenomenon changes expected to occur on roads. It then applies the simulated environmental elements and simulated target elements to the CCTV video in the open data that contains events, and performs machine learning that distinguishes the target elements of an event from the environmental elements. The neural network artificial intelligence generation unit accumulates the learning results of the simulated target marking learning unit as reference information for discriminating the elementary, intermediate, and advanced image recognition information, and generates an essence neural network artificial intelligence capable of recognizing target elements on its own.

The essence data providing mechanism uses the essence neural network artificial intelligence to automatically recognize, in the real-time CCTV video information received from the real-time data input unit, the occurrence of an event and the target elements related to it, and provides the recognition result as essence data, which is at least one of the elementary image recognition information, the intermediate image recognition information, and the advanced image recognition information.

Preferably, the image recognition support device further includes an image preprocessing mechanism. Before the essence data providing mechanism recognizes targets in the real-time CCTV video information and provides essence data, the image preprocessing mechanism generates preprocessing information that reduces the amount of data to be recognized by removing or shrinking the parts of the real-time CCTV video information that are not directly related to target elements. The artificial intelligence mechanism further includes a simulated preprocessing element preparation unit that prepares preprocessing elements, which are the identification elements related to the preprocessing, and a simulated preprocessing learning unit that, based on the simulated preprocessing elements, learns simulated preprocessing by machine learning using image differencing or deep learning. The neural network artificial intelligence generation unit further generates a preprocessing neural network artificial intelligence that can perform preprocessing automatically, based on the learning results accumulated in the simulated preprocessing learning unit. The image preprocessing mechanism generates the preprocessing information automatically using the preprocessing neural network artificial intelligence, and the essence data is generated based on the preprocessing information.

Preferably, the image preprocessing mechanism includes: a resolution reduction unit that lowers the resolution of the CCTV image; a color conversion unit that simplifies the range of colors in the CCTV image; a required area setting unit that excludes objects unrelated to the target in the CCTV image and sets the target, or the portion related to the target, as the recognition target area; a frame reduction unit that lowers the number of redundant frames per second in the CCTV image; and a preprocessing information generation unit that generates at least part of the outputs of the resolution reduction unit, the color conversion unit, the required area setting unit, and the frame reduction unit as preprocessing information for image recognition.

Preferably, the elementary image recognition information includes at least a result extracted by inspection personnel or at least a result formed directly by an IP CCTV camera, and the essence data providing mechanism provides at least one of the intermediate image recognition information and the advanced image recognition information as essence data with a completeness of 80% to 99% or more, corresponding to the exclusion rate of inspection personnel.

Preferably, the essence data is data in which the essence neural network artificial intelligence automatically marks, in the real-time CCTV image information input from the real-time data input unit, the occurrence of a meaningful event, the position of that event in the original video, and the target element, and in which these marks are converted to metadata and stored so that, when needed, only the meaningful scenes recorded in the metadata can be inspected; or it is data for which inspection has already been completed.
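The metadata described above could be represented as in the following minimal sketch. The `EventMark` record, its field names, the JSON layout, and the `scenes_to_inspect` helper are all hypothetical illustrations; the specification does not define a concrete schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EventMark:
    """One meaningful event marked in the original video (hypothetical schema)."""
    camera_id: str            # which CCTV stream the event came from
    start_sec: float          # position of the event in the original video
    end_sec: float
    target: str               # recognized target element, e.g. "vehicle"
    inspected: bool = False   # True once an inspector has reviewed the scene


def to_metadata(marks):
    """Serialize the marks to JSON so they can be stored alongside the video."""
    return json.dumps([asdict(m) for m in marks])


def scenes_to_inspect(metadata_json):
    """Return (camera_id, start, end) for marked scenes not yet inspected."""
    return [
        (m["camera_id"], m["start_sec"], m["end_sec"])
        for m in json.loads(metadata_json)
        if not m["inspected"]
    ]


marks = [
    EventMark("cam-01", 12.0, 19.5, "vehicle"),
    EventMark("cam-01", 340.0, 352.0, "person", inspected=True),
]
stored = to_metadata(marks)
pending = scenes_to_inspect(stored)
```

In this sketch an inspector would only be shown the scenes in `pending`, which corresponds to the partial inspection the text describes.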

Preferably, the essence data providing mechanism further includes an image difference processing unit that detects changes between preceding and following CCTV images and generates the difference between them as image difference information, thereby signaling the occurrence of an event or a change of situation in a specific area. The essence data providing mechanism either reflects the image difference information in the essence metadata or essence video, or generates the image difference information directly from the preprocessing information output by the preprocessing information generation unit of the image preprocessing mechanism.
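As an illustration of image differencing between a preceding and a following frame, the sketch below counts the pixels whose grayscale value changed by more than a threshold. The function names and both threshold values are assumptions chosen for the example, not values from the specification, and frames are modeled as plain nested lists of grayscale values.

```python
def frame_difference(prev, curr, pixel_threshold=25):
    """Count pixels whose grayscale value changed by more than pixel_threshold."""
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed


def has_changed(prev, curr, pixel_threshold=25, min_changed_ratio=0.01):
    """Report a change when enough of the frame differs from the previous one."""
    total = len(curr) * len(curr[0])
    return frame_difference(prev, curr, pixel_threshold) / total >= min_changed_ratio


# An 8x8 uniform background, and the same scene with a 3x3 bright object added.
background = [[10] * 8 for _ in range(8)]
with_object = [row[:] for row in background]
for y in range(2, 5):
    for x in range(3, 6):
        with_object[y][x] = 200
```

Here `has_changed(background, with_object)` would flag the second frame as containing a change, while two identical frames would not be flagged.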

Preferably, the system further includes a control mechanism that controls the system as a whole, the control mechanism being either integrated with the image recognition support device or directly connected to it by telecommunication. The control mechanism includes a cascading processing unit that performs cascading processing, in which the decoding, detection, tracking, and similar tasks in the CCTV image processing of the image preprocessing mechanism and the essence data providing mechanism are separated and assigned alternately, one at a time, to different video streams to be recognized. The artificial intelligence mechanism is updated periodically or intermittently in response to the open data from the external source, the real-time data provided by the CCTV system device, and the addition or modification of environmental elements and target elements.

Preferably, the control mechanism includes at least: an input/output unit that sets the operating conditions of the image preprocessing mechanism or the essence data providing mechanism, monitors the operation of the image recognition support device, and provides a user interface (UI) function; a GUI unit that provides a function for manually or automatically generating those parts of the simulated preprocessing elements, simulated environmental elements, and simulated target elements that can be prepared in advance as shapes; and an alarm unit that reports the appearance of a target element in real time while the essence data providing mechanism is recognizing the CCTV image and providing essence data. The machine learning includes at least one of deep learning and shallow learning, and the image recognition support device further includes a comprehensive database (DB) that stores data generated during the operation of the device as a whole.

A second aspect of the present invention for achieving the above object is a method for supporting partial inspection of suspicious objects in CCTV video using level-based image recognition technology to reduce the workload of visual inspectors. To reduce the personnel needed for full inspection, the method uses machine learning to record the meaningful scenes, which at least include the movement of a specific object, within the entire video to be inspected, so that only a partial visual inspection is required. To this end the method provides: an elementary image recognition function that at least detects movement in an abnormal state different from the normal state; an intermediate image recognition function that combines the elementary image recognition information detected by the elementary function to generate semantic-level meaning defining a specific incident on a single CCTV camera; and an advanced image recognition function that combines the intermediate image recognition information detected by the intermediate function to generate composite linked information across multiple CCTV cameras.

The method is used in a system for supporting partial visual inspection of suspicious objects in CCTV video using moving object detection technology, the system comprising: an external source that provides, in a previously prepared form, publicly known open data including at least CCTV image information capturing events, which cover incidents, accidents, organized events, and predictable or unpredictable occurrences on roads under the jurisdiction of a general administrative district; a CCTV system device that inputs image information including the event in real time; and an image recognition support device that receives the open data of the external source and real-time image data from the CCTV system device to support image recognition.

The method comprises: a data input step in which the image recognition support device receives the open data from the external source and receives real-time image data from the CCTV system device capturing the event in real time; a target element preparation step in which the image recognition support device extracts and prepares from the open data, as target elements, the event, the motions or actions related to the event, and the results of those motions or actions; an environmental element preparation step in which the image recognition support device extracts and prepares, as simulated environmental elements, objects other than the target elements, the motions or actions related to those objects and their results, and the temporal and natural changes expected to occur on the road; a simulated target marking learning step in which the image recognition support device applies the simulated environmental elements and the simulated target elements to the CCTV images in the open data that contain events, and performs machine learning to distinguish the target element of the event from the environmental elements; a neural network artificial intelligence generation step in which the neural network artificial intelligence generation unit of the image recognition support device accumulates the learning results of the simulated target marking learning unit as reference information for discriminating the elementary, intermediate, and advanced image recognition information, and generates an essence neural network artificial intelligence capable of recognizing target elements by itself; and an essence data providing step in which the image recognition support device uses the essence neural network artificial intelligence to automatically recognize, in the real-time CCTV image information input from the real-time data input unit, the occurrence of an event and the target element related to that event, and provides the recognition result as essence data that is at least one of the elementary image recognition information, the intermediate image recognition information, and the advanced image recognition information.
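The relationship between the three recognition levels can be illustrated with the rough sketch below. It is not the patented method: the record types, the grouping rule (detections on one camera within `max_gap_sec` form one event), and the linking rule (events with the same label on different cameras within `max_delay_sec`) are simple stand-ins assumed for illustration, where the specification relies on a trained neural network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Elementary level: an abnormal movement detected in one frame."""
    camera_id: str
    time_sec: float
    label: str


@dataclass(frozen=True)
class CameraEvent:
    """Intermediate level: a semantic event on a single camera."""
    camera_id: str
    label: str
    start_sec: float
    end_sec: float


def to_camera_events(detections, max_gap_sec=2.0):
    """Group same-camera, same-label detections that are close in time."""
    events = []
    ordered = sorted(detections, key=lambda d: (d.camera_id, d.label, d.time_sec))
    for d in ordered:
        last = events[-1] if events else None
        if (last is not None and last.camera_id == d.camera_id
                and last.label == d.label
                and d.time_sec - last.end_sec <= max_gap_sec):
            # Extend the event that is still open.
            events[-1] = CameraEvent(last.camera_id, last.label,
                                     last.start_sec, d.time_sec)
        else:
            events.append(CameraEvent(d.camera_id, d.label,
                                      d.time_sec, d.time_sec))
    return events


def link_across_cameras(events, max_delay_sec=60.0):
    """Advanced level: pair events with the same label seen on different cameras."""
    links = []
    ordered = sorted(events, key=lambda e: e.start_sec)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if (second.camera_id != first.camera_id
                    and second.label == first.label
                    and 0 <= second.start_sec - first.end_sec <= max_delay_sec):
                links.append((first, second))
    return links


detections = [
    Detection("cam-A", 10.0, "vehicle"),
    Detection("cam-A", 11.0, "vehicle"),
    Detection("cam-A", 12.5, "vehicle"),
    Detection("cam-B", 40.0, "vehicle"),
    Detection("cam-B", 41.0, "vehicle"),
]
events = to_camera_events(detections)
links = link_across_cameras(events)
```

Each level consumes only the output of the level below it, which is the layering the text describes.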

Preferably, the method further includes, before the essence data providing step, an image preprocessing step of generating preprocessing information that reduces the amount of real-time CCTV image data to be recognized by removing or shrinking the portions of the real-time CCTV image information not directly related to the target element. Before the image preprocessing step, the method further includes: a simulated preprocessing element preparation step of preparing preprocessing elements, which are identification elements related to the preprocessing; a simulated preprocessing learning step of performing machine learning of simulated preprocessing by a deep learning method based on the simulated preprocessing elements; and a step in which the neural network artificial intelligence generation unit generates a preprocessing neural network artificial intelligence capable of performing preprocessing automatically, based on the simulated preprocessing learning results accumulated in the simulated preprocessing learning unit. In the image preprocessing step, the preprocessing information is generated automatically using the preprocessing neural network artificial intelligence, and the essence data is generated based on the preprocessing information.

Preferably, the image preprocessing step includes: a resolution reduction sub-step of lowering the resolution of the CCTV image; a color conversion sub-step of simplifying the range of colors in the CCTV image; a required area setting sub-step of excluding objects unrelated to the target in the CCTV image and setting the target, or the portion related to the target, as the recognition target area; a frame reduction sub-step of lowering the number of redundant frames per second in the CCTV image; and a preprocessing information generation sub-step of generating at least part of the outputs of the resolution reduction, color conversion, required area setting, and frame reduction sub-steps as preprocessing information for image recognition.

Preferably, in the method, the elementary image recognition information includes at least a result extracted by inspection personnel or at least a result formed directly by an IP CCTV camera, and the essence data providing mechanism provides at least one of the intermediate image recognition information and the advanced image recognition information as essence data with a completeness of 80% to 99% or more, corresponding to the exclusion rate of inspection personnel.

Preferably, in the method, the essence data is data in which the essence neural network artificial intelligence automatically marks, in the real-time CCTV image information input from the real-time data input unit, the occurrence of a meaningful event, the position of that event in the original video, and the target element, and in which these marks are converted to metadata and stored so that, when needed, only the meaningful scenes recorded in the metadata can be inspected; or it is data for which inspection has already been completed; or it is semi-processed data for which the final inspection is completed through additional inspection by inspection personnel.

Preferably, the method further includes, after the preprocessing step, an image difference processing step in which the essence data providing mechanism detects changes between preceding and following CCTV images and generates the difference between them as image difference information, thereby signaling the occurrence of an event or a change of situation in a specific area. In the image difference processing step, the essence data providing mechanism either reflects the image difference information in the essence metadata or essence video, or generates the image difference information directly from the preprocessing information output by the preprocessing information generation unit of the image preprocessing mechanism.

Preferably, in the method, the system further includes a control mechanism that controls the system as a whole, the control mechanism being either integrated with the image recognition support device or directly connected to it by telecommunication. The method further includes: a cascading processing step, performed during the image preprocessing step or the essence data providing step, in which the cascading processing unit of the control mechanism separates the decoding, detection, tracking, and similar tasks in the CCTV image processing of the image preprocessing mechanism and the essence data providing mechanism and assigns them alternately, one at a time, to different video streams to be recognized; and a step in which the neural network artificial intelligence is updated periodically or intermittently in response to the open data from the external source, the real-time data provided by the CCTV system device, and the addition or modification of environmental elements and target elements.

Preferably, in the method, the system further includes a control mechanism that controls the system as a whole, the control mechanism being either integrated with the image recognition support device or directly connected to it by telecommunication. At least one of the simulated target element preparation step, the simulated environmental element preparation step, and the simulated preprocessing element preparation step includes at least: an input/output step in which the control mechanism sets the operating conditions of the image preprocessing step or the essence data providing step and monitors the operation of the image recognition support device; a GUI step that provides a function for manually or automatically generating those parts of the simulated preprocessing elements, simulated environmental elements, and simulated target elements that can be prepared in advance as shapes; and an alarm step that reports the appearance of a target element in real time while the essence data providing mechanism is recognizing the CCTV image and providing essence data. The machine learning includes at least one of deep learning and shallow learning, and the method further includes a step in which the comprehensive database (DB) of the image recognition support device stores data generated during the operation of the device as a whole.

With the configuration described above, the present invention provides the following effects.

To begin with, in image recognition technology in this field, one of the core capabilities the market demands is the ability to process simultaneously the real-time video streams from as many CCTV cameras as possible on a single host server equipped with multiple CPUs and GPUs. In other words, for a given hardware specification, the value of a system depends on how efficiently the software responsible for the required functions processes the video data streams. One software system may process 30 CCTV video streams simultaneously on a single host server, while another may process 50 or more on the same host server. When the output is of the same or similar quality, the software system that processes more CCTV video streams is naturally the more valuable.

Accordingly, what is needed is the ability to formalize and identify environmental elements and target elements, together with work to simplify the environmental elements that hinder separation of the target elements. The present invention responds to this need: its core effect is to maximize the number of video data streams that can be processed simultaneously in parallel while maintaining an accuracy of 90% or more in detecting specific objects in a video stream. The specific sub-effects that support this core effect are described below.

First, in conventional image recognition systems, an assessor or inspector manually performed the work of visually identifying every image in advance and assigning codes, so that meaningful target footage could be found later. This code assignment also had to be performed manually and continuously for every CCTV image arriving in real time. Even when real-time CCTV video is inspected in real time, every image requires visual inspection by an assessor or inspector, so events are missed or results are inaccurate, and a great deal of labor and time is consumed.

By contrast, in the system and method of the present invention for supporting partial visual inspection of suspicious objects in CCTV video using moving object detection technology, a neural network artificial intelligence generated through learning in similar simulated environments handles both the advance code marking work and the real-time CCTV inspection work. Meaningful targets in CCTV video can therefore be inspected, or an environment for inspecting them can be provided, with no or minimal human labor. The invention thus offers capabilities beyond the prior art in inspection accuracy, labor minimization, and speed of inspection results.

Second, in the prior art, a fully automatic object recognition and behavior recognition system based on image object recognition requires a very large host server capacity. The reason is that the input video is HD class, that is, at least 1,080 x 720 (typically 1,280 x 720) pixels, and each pixel occupies 3 bytes of memory as an RGB color value.
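To make the memory figures concrete, the following short calculation works out the uncompressed size of one typical HD frame and the resulting raw data rate. The frame rate of 30 frames per second and the stream count passed to `streams_total_mb_per_sec` are assumptions chosen for illustration.

```python
WIDTH, HEIGHT = 1280, 720   # typical HD frame size cited in the text
BYTES_PER_PIXEL = 3         # one byte each for R, G, and B
FPS = 30                    # assumed frame rate

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 2,764,800 bytes
bytes_per_second = bytes_per_frame * FPS             # 82,944,000 bytes


def streams_total_mb_per_sec(n_streams):
    """Raw decoded data rate, in MB/s, for n_streams simultaneous streams."""
    return n_streams * bytes_per_second / 1_000_000
```

Under these assumptions a single decoded stream amounts to roughly 83 MB of pixel data per second, so a host handling dozens of streams has to move several gigabytes per second before any recognition work begins.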

However, the artificial-intelligence-based image preprocessing function of the present invention can greatly reduce the amount of data in the video to be processed, so that for the same host capacity it can process several times to several tens of times more video streams than the prior art. Specifically, the preprocessing function greatly reduces the data volume of the video to be searched through: a resolution reduction technique that lowers the resolution of the CCTV image; a color conversion function that simplifies the range of colors in the CCTV image; a required area setting function that excludes objects unrelated to the target and sets the target, or the portion related to it, as the image processing target area; and a frame reduction function that lowers the number of redundant frames per second in the CCTV image.

More specifically:

a. The resolution reduction technique reduces the data volume by shrinking HD-class video as far as possible, converting it to video compressed or reduced to CIF (352 x 240, in the NTSC system) or QCIF (176 x 120) size;

b. The color conversion function reduces the amount of data to process by converting color video to black-and-white video;

c. The required area setting function significantly reduces the amount of bitmap data to be processed in each image. Because the background around the road in a road image, such as roadside trees, is not the main interest of video observation, it is excluded, and a discrimination area is set for determining the presence or absence of vehicles on the road surface and people on the sidewalk;

d. In the frame reduction function, a video normally consists of 30 frames per second, which compression schemes such as MPEG represent using I-frames, B-frames, and P-frames. To minimize the image processing load, rather than processing all 30 frames per second, only one to a few frames per second are selected, reducing the amount of data to process. In addition, a function that determines the difference between the preceding and following frames can further simplify or reduce the inspection volume.
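The four reductions in items a to d can be sketched as plain functions over frames modeled as nested lists of RGB tuples. This is a minimal illustration under stated assumptions, not the learned preprocessing the specification describes: the function names, the fixed rectangular region of interest, the nearest-pixel downscaling, and the use of the common BT.601 luma weights for grayscale conversion are all choices made for the example.

```python
def to_grayscale(frame):
    """Color conversion: RGB triples to one luminance value per pixel."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in frame]


def downscale(frame, factor):
    """Resolution reduction: keep every factor-th pixel in both directions."""
    return [row[::factor] for row in frame[::factor]]


def crop(frame, top, left, height, width):
    """Required area setting: keep only a rectangular region of interest."""
    return [row[left:left + width] for row in frame[top:top + height]]


def sample_frames(frames, source_fps, target_fps):
    """Frame reduction: keep about target_fps frames out of every source_fps."""
    step = max(1, source_fps // target_fps)
    return frames[::step]


def preprocess(frames, source_fps=30, target_fps=1, factor=4, roi=None):
    """Apply frame sampling, grayscale conversion, downscaling, and cropping."""
    out = []
    for frame in sample_frames(frames, source_fps, target_fps):
        small = downscale(to_grayscale(frame), factor)
        if roi is not None:
            small = crop(small, *roi)
        out.append(small)
    return out


# Two seconds of tiny all-white 32x16 "video" at 30 frames per second.
frames = [[[(255, 255, 255)] * 32 for _ in range(16)] for _ in range(60)]
result = preprocess(frames, roi=(1, 2, 2, 4))

# Raw byte rates: HD RGB at 30 fps versus CIF grayscale at 1 fps.
raw_rate = 1280 * 720 * 3 * 30
reduced_rate = 352 * 240 * 1 * 1
reduction_ratio = raw_rate / reduced_rate
```

The final lines compare the raw byte rate of the typical HD input with that of a CIF grayscale stream sampled once per second. The ratio covers pixel data only and says nothing about recognition cost, so it should be read as an illustration of scale rather than as a throughput claim.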

Third, the recognition rate of most fully automatic object and behavior recognition systems currently on the market, whether based on image difference, deep learning, or shallow learning, realistically remains at around 50% to 80%. In the present invention, targets such as moving vehicles and people are detected using the neural network artificial intelligence after the preprocessing function has greatly reduced the amount of data to inspect, so accuracy can be raised further, specifically to 90% or more. The meaningful scenes recognized in this way are marked with metadata or by another method, and the video corresponding to the marked scenes is provided when inspection is later required. As a result, a partial inspection rather than a full inspection is enough to complete the inspection nearly perfectly with minimal time and effort, and the labor and time spent searching the entire video can be reduced greatly compared with the prior art.

Fourth, as described above, in the prior art a bottleneck can arise in data stream processing when software on given hardware tries to process as many simultaneously input CCTV video streams as possible in parallel. For example, video decoding, object detection in video, video classification, and tracking use large amounts of CPU, memory buffer, and GPU resources, so when dozens of video streams run heavy software processes such as decoding, detection, and tracking at the same time, these processes can become a bottleneck. To relieve this bottleneck, the present invention uses a so-called cascading or pipelining technique (hereinafter “cascading processing”), in which decoding, detection, and tracking are separated so that they run simultaneously on different video data streams. That is, under cascading processing, while video stream 1 is in the decoding module the other video streams wait in a queue, and video 2 begins decoding after the decoding of video 1 finishes. Meanwhile, video 1 proceeds to the video detection process once its decoding is complete. This removes the bottleneck in which every video stream waits on the heavy video detection process. To restate: assuming there is a video decoder module and a video detection module, the decoder module processes video 1 and hands the result to the detection module; the decoder module then receives video 2, decodes it, and hands that result to the detection module. In this arrangement, when the detection module is a heavy process, the bottleneck can be resolved by placing lighter processing modules before or after the detection module.
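The decode-then-detect hand-off described above can be sketched as a two-stage pipeline in which each stage runs in its own thread and passes work through a queue. The stage functions here are placeholders that only sleep; their names, the timings, and the use of Python threads are assumptions for illustration and are not taken from the specification.

```python
import queue
import threading
import time


def decode(stream_id):
    time.sleep(0.01)                  # stand-in for real decoding work
    return f"frames-of-{stream_id}"


def detect(decoded):
    time.sleep(0.03)                  # detection is assumed to be the heavier stage
    return f"objects-in-{decoded}"


def stage(work, inbox, outbox):
    """Apply `work` to each item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is None:              # sentinel: no more input
            outbox.put(None)
            return
        outbox.put(work(item))


def run_pipeline(stream_ids):
    to_decode, to_detect, done = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(decode, to_decode, to_detect)),
        threading.Thread(target=stage, args=(detect, to_detect, done)),
    ]
    for t in threads:
        t.start()
    for sid in stream_ids:
        to_decode.put(sid)
    to_decode.put(None)
    results = list(iter(done.get, None))   # collect until the sentinel arrives
    for t in threads:
        t.join()
    return results


results = run_pipeline(["video-1", "video-2", "video-3"])
```

While the detection stage works on video 1, the decoding stage is already working on video 2, which is the overlap the text describes. In pure Python, threads overlap only while a stage is waiting or running native code, so CPU-bound stages would more likely be placed in separate processes or delegated to native libraries.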

이러한 점에서, 그 만큼 환경 요소 및 타겟 요소를 정형화하여 식별하는 능력과 타겟 요소의 분리에 방해가 되는 환경 요소를 단순화하는 작업이 필요하다. 본 발명은 이러한 요구에 부응하는 기술로서, 가능한 한 영상스트림에서 특정 물체의 감지 (detection)를 90% 이상 정확도를 유지하면서, 또한, 가능한한 많은 영상 데이터 스트림을 동시병렬로 처리할 수 있는 능력을 최대화 할 수 있는 핵심 효과를 가진다. 이러한 핵심 효과를 뒷받침하는 구체적인 하위 효과를 설명하면 다음과 같다. In this regard, there is a need to simplify the ability to identify and formalize environmental and target elements as well as the environmental elements that interfere with the separation of target elements. The present invention addresses this need by providing a technique capable of simultaneously processing as many image data streams as possible while maintaining accuracy of 90% or more of detection of a specific object in a video stream as much as possible It has a key effect that can be maximized. The specific sub-effects that support these key effects are as follows.

보다 구체적으로는,More specifically,

Fourth, as described above, in the prior art a bottleneck can arise in the data stream processing path when as many simultaneously input CCTV video streams as possible are to be processed concurrently, in software, on a given set of hardware. For example, video decoding, object detection within video, video classification, and tracking each consume large amounts of CPU, memory buffer, and GPU resources, so when dozens of video streams run heavy software stages such as decoding, detection, and tracking at the same time, those stages can become the bottleneck. To relieve this bottleneck, the present invention uses a so-called cascading or pipelining technique (hereinafter "cascading processing"), in which decoding, detection, and tracking are separated and executed simultaneously, each on a different video data stream. That is, under cascading processing, while video stream 1 is in the decoding module the other video streams wait in a queue, and video 2 begins decoding once decoding of video 1 has finished. Meanwhile, video 1, having finished decoding, proceeds to the video detection stage. This removes the bottleneck in which every video stream waits on a heavy software stage such as video detection. To restate: assuming a video decoder module and a video detection module, the decoder module processes video 1 and hands the result to the detection module, then takes video 2, decodes it, and hands that result to the detection module. In this arrangement, when the detection module is the heavy stage, the bottleneck can be relieved by placing lighter processing modules before or after it.
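The timing effect of cascading processing can be illustrated with a small schedule calculation. The following is a minimal sketch, not the implementation of the invention: it assumes one worker per stage, and the stage costs in `STAGE_MS`, the function names, and the strictly serial baseline used for comparison are all hypothetical choices made for illustration.

```python
# Illustrative stage costs in milliseconds: decode, detect, track.
# These numbers are assumptions, not values taken from the specification.
STAGE_MS = [5, 20, 5]


def serial_finish_times(n_streams, stage_ms):
    """Baseline: each stream runs every stage before the next stream starts."""
    per_stream = sum(stage_ms)
    return [per_stream * (i + 1) for i in range(n_streams)]


def cascaded_finish_times(n_streams, stage_ms):
    """Cascading: one worker per stage. A stream enters a stage as soon as
    that stage is free and the stream has left the previous stage."""
    stage_free_at = [0] * len(stage_ms)
    finish = []
    for _ in range(n_streams):
        t = 0  # time at which this stream is ready for its next stage
        for s, cost in enumerate(stage_ms):
            start = max(t, stage_free_at[s])
            t = start + cost
            stage_free_at[s] = t
        finish.append(t)
    return finish


if __name__ == "__main__":
    print("serial  :", serial_finish_times(3, STAGE_MS))
    print("cascaded:", cascaded_finish_times(3, STAGE_MS))
```

Under these assumed costs, decoding of the second stream overlaps with detection of the first, so later streams finish earlier than in the serial baseline, while overall throughput remains bounded by the heaviest stage (detection in this example).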

Fifth, as described above, in the prior art some recently released IP cameras are high-end models that run part of the first through fourth functions above directly on the embedded computer mounted in the camera, discarding portions without motion and outputting only scenes that contain meaningful motion, or performing basic image recognition, motion recognition, and the like and outputting the results. With such high-end IP CCTV cameras, elementary-level image recognition such as the motion detection described above runs directly on the intelligent IP camera, and only meaningful footage is sent to the server. In that case, the present invention can provide a deep learning object recognition and behavior recognition system and method in which the server performs face recognition, recognition of pedestrians' appearance and clothing, recognition of items carried by pedestrians, vehicle model recognition, and color recognition, thereby helping the inspector who performs visual inspection to carry out intermediate and advanced event recognition such as tracking suspects and suspect vehicles.

In other words, when IP CCTV is used, almost no manpower is needed to detect meaningful image changes at the elementary image recognition level, so on the server side the manpower formerly spent at the elementary level can be reassigned to intermediate or advanced event recognition. Furthermore, when the artificial intelligence function according to the present invention performs at least one of intermediate and advanced image recognition, the intermediate or advanced image recognition work performed by search personnel either becomes unnecessary or is largely taken over by the artificial intelligence function, so the intermediate and advanced image recognition workload of the search personnel can be greatly reduced.

FIG. 1 is a schematic configuration diagram of a system for supporting partial inspection of suspicious objects in CCTV video using level-based image recognition technology to reduce the load on visual inspectors, according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the system and method for supporting partial inspection of suspicious objects in CCTV video using level-based image recognition technology to reduce the load on visual inspectors, according to an embodiment of the present invention.
FIG. 3 is a flowchart of an image preprocessing process according to an embodiment of the present invention.
FIG. 4 is a diagram showing a process of generating neural network artificial intelligence by image differencing or machine learning according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic configuration diagram of a system for supporting partial inspection of suspicious objects in CCTV video using level-based image recognition technology to reduce the load on visual inspectors (hereinafter used interchangeably with "partial inspection support system"), configured according to an embodiment of the present invention. Referring to FIG. 1, the partial inspection support system according to an embodiment of the present invention comprises: an image recognition support device (4000) that learns by itself the ability to find, in the raw video input from CCTV, the target data needed to identify meaningful reference material and to exclude unnecessary data, and that as a result generates essence metadata, or an essence image based on that essence metadata, in which the amount of data needed for inspection is reduced to a minimum; a plurality of CCTV system devices (1000, 1100) that monitor streets or surrounding conditions in real time; an external source (2000) that can provide, as publicly announced open data, a large body of existing information usable as machine learning material; and a control mechanism (3000) that is included in or connected to the image recognition support device (4000) and controls the image recognition support device (4000) or the system as a whole. The CCTV system devices (1000, 1100) can be divided into a CCTV system device (1000) that includes a general CCTV camera (1010) and an IP CCTV system device (1100) that includes an IP CCTV camera (1110); their configuration is otherwise identical. Functionally, the difference between the two in the present invention is that the IP CCTV system device (1100) can embed artificial intelligence able to generate basic image recognition information on its own, whereas the general CCTV system device (1000) has no such artificial intelligence and outputs basic image recognition information with the help of inspection personnel and the artificial intelligence mechanism (4300) of the present invention described below. However, both the IP CCTV system device (1100) and the general CCTV system device (1000) may also generate basic image recognition information with additional partial assistance from the artificial intelligence mechanism (4300).

The external source (2000) referred to in the present invention is a concept that covers CCTVs installed and operated by administrative agencies, research institutes, government offices, private institutions, and the like, which can provide still images and video in which temporal, seasonal, and weather changes of a fixed subject such as a street, traffic, road, or landscape scene are expressed continuously as a time series. One feature of the present invention is that the open data provided by the external source (2000) is used to learn, through machine learning, the material that is meaningful in a given case, and thereby to generate neural network artificial intelligence that can by itself separate meaningless environmental data from meaningful target images at the scene of an incident or accident. The following terms are used. An "event" covers incidents, accidents, and organized occasions together with the resulting physical and phenomenal changes. An "object" is any physical entity taking part in an event, for example a vehicle, a criminal, a detective, a bystander, or a participant in an occasion. A "target" is an object together with the video scene containing its movement or change; the target is what is searched for when inspecting a large volume of video.

First, the image recognition support device (4000) includes: a data input mechanism (4100) that inputs image information directly from the external source (2000) or the CCTV system device (1000); an artificial intelligence mechanism (4300) that uses, as training material, the external source data input through the data input mechanism (4100), that is, open data (generally data already recorded in the past, as distinct from real-time data), or real-time data (data input live from CCTV), and through machine learning such as shallow learning or deep learning generates preprocessing neural network artificial intelligence and essence neural network artificial intelligence capable of extracting inspection data useful for examining the final target; an image preprocessing mechanism (4400) that uses the preprocessing neural network artificial intelligence generated by the artificial intelligence mechanism (4300) to cut unnecessary data from the massive real-time data and so reduce the amount of data needed for inspection; and an essence data providing mechanism (4500) that, using the preprocessing information supplied by the image preprocessing mechanism (4400) or the real-time data, generates essence metadata containing the essential targets on the basis of the essence neural network artificial intelligence, or provides an essence image based on the essence metadata. The image recognition support device (4000) also includes a comprehensive DB (4200) that stores the results of every flow, occurrence, and generation of information within the system. The essence data provided by the essence data providing mechanism (4500) is a concept that includes all of, or at least one of, the basic image recognition information, intermediate image recognition information, and advanced image recognition information defined in the present invention.

In addition, the image recognition support device (4000) further includes a control mechanism (3000) that controls the partial inspection support system or the image recognition support device (4000) as a whole. In another embodiment, the control mechanism (3000) may be provided separately from the image recognition support device (4000) and control the image recognition support device (4000) or the partial inspection support system as a whole from outside.

First, in the image recognition support device (4000), the open data input unit (4110) of the data input mechanism (4100) receives, as described above, open data already prepared by the external source (2000), for example still or moving images of roads and the movements of people on streets and in city centers, and stores it in the comprehensive DB (4200) as open data. The data input mechanism (4100) further includes a real-time data input unit (4120) that receives in real time, from the CCTV system devices (1000) currently deployed within the administrative district of a particular local government, footage of road traffic and of residents' daily activity. Real-time data from the real-time data input unit (4120) is stored in the comprehensive DB (4200) and may also be fed directly, in real time, to the image preprocessing mechanism (4400) and the essence data providing mechanism (4500).

The artificial intelligence mechanism (4300) basically provides the preprocessing neural network artificial intelligence that supports the function of the image preprocessing mechanism (4400) and the essence neural network artificial intelligence that supports the function of the essence data providing mechanism (4500). To this end it may include: a simulated preprocessing element preparation unit (4310) that prepares simulated preprocessing elements in advance; a simulated preprocessing learning unit (4320) that performs simulated preprocessing learning by machine learning based on the prepared elements; a simulated environment element preparation unit (4330) that prepares simulated environment elements including climatic changes on the street, temporal changes, movement of surrounding objects, and the like; a simulated target element preparation unit (4340) that prepares simulated target elements including simulated object state information and simulated object movement information; a simulated target marking learning unit (4350) that, based on the simulated environment elements and simulated target elements, searches for and marks simulated targets that are meaningful for events such as incidents, accidents, and occasions; and a neural network artificial intelligence generation unit (4360) that turns the capability machine-learned by the simulated target marking learning unit (4350) into the preprocessing neural network artificial intelligence and the essence neural network artificial intelligence.

The simulated preprocessing elements of the simulated preprocessing element preparation unit (4310), the simulated environment elements of the simulated environment element preparation unit (4330), and the simulated target elements of the simulated target element preparation unit (4340) are extracted from at least one of the open data and the real-time data stored in the comprehensive DB (4200). This is explained more specifically below in connection with the description of the image preprocessing mechanism (4400) or the target data providing mechanism (4600).

The image preprocessing mechanism (4400) is a functional mechanism focused on reducing the amount of data subject to recognition. To this end it includes: a resolution reduction unit (4410) that reduces the data volume by converting resolution, for example compressing or downscaling HD-resolution CCTV video to CIF (352 × 240, NTSC basis) or QCIF (176 × 120); a color conversion unit (4420) that reduces the data volume by converting color to black and white; a recognition-required region setting unit (4430) that excludes regions of no interest and sets the portions that are, or are associated with, the target, since, for example, roadside background such as street trees in a given area is unlikely to belong to the target image under inspection; and a frame reduction unit (4430) that cuts the number of image frames per second, for example from 30 frames to between one and a few frames. The frame rate is generally around 30 frames per second, and by removing frames that nearly duplicate one another (especially in static scenes), the number of frames per second can be reduced by anywhere from 1% up to 99%.
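The scale of the reduction described above can be checked with simple arithmetic. The following is a rough sketch under stated assumptions: it treats video as uncompressed raw pixels, takes "HD" to mean 1280 × 720 (the specification does not give the pixel dimensions), and uses 3 bytes per color pixel and 1 byte per grayscale pixel. The function names are hypothetical.

```python
def raw_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate of a video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


def subsample_frames(frames, src_fps, dst_fps):
    """Keep roughly dst_fps frames out of every src_fps by fixed-step skipping.
    A fixed step is an illustrative stand-in for dropping near-duplicate frames."""
    step = max(1, src_fps // dst_fps)
    return frames[::step]


if __name__ == "__main__":
    # Assumed source: 1280x720 color at 30 frames per second.
    before = raw_bytes_per_second(1280, 720, 3, 30)
    # After preprocessing: CIF (352x240), grayscale, 1 frame per second.
    after = raw_bytes_per_second(352, 240, 1, 1)
    print(f"before: {before} B/s, after: {after} B/s, ratio: {before / after:.1f}x")

    two_seconds = list(range(60))  # frame indices of two seconds at 30 fps
    print(subsample_frames(two_seconds, 30, 1))
```

Real CCTV streams are compressed, so the absolute figures differ in practice; the sketch only shows how resolution, color depth, and frame rate multiply together in the data volume that the recognition stage has to handle.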

To generate the preprocessing neural network artificial intelligence used for the preprocessing operation of the image preprocessing mechanism (4400), the simulated preprocessing element preparation unit (4310) collects in advance, as simulated preprocessing elements, the resolution, color, number of similar frames per second, required regions, unnecessary regions, and so on of the picture. Required and unnecessary regions in particular can be collected beforehand as numerous examples from the open data gathered through the open data input unit (4110), and where they are model shapes that can be standardized, they may be prepared through the GUI unit (3200) of the control mechanism (3000). For the other elements of the preprocessing element preparation unit (4310), input conditions can be selected in advance, through user settings at the input/output unit (3100) of the control mechanism (3000), from parameters such as type, time, shading level, shape scale, and resolution range. The input/output unit (3100) also provides a user interface (UI), through which an additional visual inspection of the essence metadata or essence image described later can be carried out to produce final inspection-completed data, which may be output or stored in the database (4200). Based on the prepared preprocessing elements, the simulated preprocessing learning unit (4320) machine-learns the very large number of open data images stored in the comprehensive DB (4200), for example using shallow learning or deep learning techniques, and as a result can provide the preprocessing neural network artificial intelligence as the engine for preprocessing work.

The essence data providing mechanism (4500) may include an essence metadata generation unit (4510) that, using the essence neural network artificial intelligence generated by the neural network artificial intelligence generation unit (4360), removes environmental elements directly from the video input in real time from the real-time data input unit (4120) and generates essence metadata for the image portions that correspond or are similar to target elements, and an essence image generation unit (4520) that generates an essence image corresponding to the essence metadata from the essence metadata generation unit (4510).

The essence data providing mechanism (4500) may further include an image difference processing unit (4530) that detects changes between earlier and later CCTV images, generates the difference between them as image difference information, and thereby signals the occurrence of an event or a change of situation in a particular area. The essence data providing mechanism (4500) may reflect this image difference information in the essence metadata or essence image, or may generate image difference information directly from the preprocessing information output by the preprocessing information generation unit (4450) of the image preprocessing mechanism (4400), and provide it to the input/output unit (3100) of the control mechanism (3000), for example for examination by a visual inspector. When the image difference processing unit (4530) supplies image difference information to the input/output unit (3100), visual or audible alarm information from the alarm unit (3300) may be provided along with it to indicate the change in the image.
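Image differencing between an earlier and a later frame can be sketched as follows. This is a minimal illustration, assuming grayscale frames represented as nested lists of integer intensities; the mean-absolute-difference measure, the `threshold` value, and the function names are hypothetical choices, since the specification does not fix how the difference is computed or when an alarm is raised.

```python
def mean_abs_diff(prev_frame, curr_frame):
    """Average absolute intensity change per pixel between two frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def scene_changed(prev_frame, curr_frame, threshold=10.0):
    """Return True when the change is large enough to notify the inspector.
    The threshold is an assumed tuning parameter."""
    return mean_abs_diff(prev_frame, curr_frame) > threshold


if __name__ == "__main__":
    empty_street = [[0, 0], [0, 0]]
    with_object = [[0, 0], [0, 200]]
    if scene_changed(empty_street, with_object):
        print("change detected: raise visual or audible alarm")
```

In a deployment along the lines described above, the boolean result would be what triggers the alarm unit, while the per-pixel differences themselves would be passed on as the image difference information.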

The essence metadata or essence image may correspond to the target-related image for a particular event, or it may be a reduced inspection-target image from which part of the environmental elements has been excluded, that is, an image in which the full video has been cut down by, for example, 50 to 99% but which is not yet limited to the target-related image. From such a reduced inspection-target image, the final target-related image alone may then be extracted, for example by human visual inspection.

The control mechanism (3000) may be responsible for controlling the entire partial inspection support system, or only the image recognition support device (4000). It may include: a GUI unit (3200) for graphically setting the simulated elements (simulated preprocessing elements, simulated environment elements, simulated target elements) of the artificial intelligence mechanism (4300); an alarm unit (3300) that gives the inspector an alarm signal (sound, on-screen notification, etc.) when a meaningful target image appears in the video being processed (real-time video, preprocessed video, essence metadata, essence image, etc.); an input/output unit (3100) for setting the degree or condition parameters of the simulated elements and outputting the results of particular operations of the image recognition support device (4000); and an update unit (3500) for updating the neural network artificial intelligence.

In addition, the control mechanism (3000) may further include a cascading processing unit (3400) that performs cascading processing, in which decoding, detection, tracking, and similar tasks are allocated to the individual video streams with a time offset. The cascading processing unit (3400) contains software subsystems such as decoding and detection whose processing stages are connected in sequence, and it regulates the video data so that it is processed by these subsystems in order. As described above, in the prior art a bottleneck can arise when many simultaneously input CCTV video streams are processed concurrently, in software, on a given set of hardware. For example, video decoding, object detection within video, video classification, and tracking consume large amounts of CPU, memory buffer, and GPU resources, so when a single software subsystem for a heavy stage such as decoding, detection, or tracking has to handle dozens of video streams at the same time, that stage can become the bottleneck. To relieve this, the cascading processing unit of the present invention, a cascading software mechanism, uses a so-called cascading technique in which decoding, detection, tracking, and so on are separated and applied in sequence to different video data streams. That is, while a video stream is being processed to generate metadata during operation of the image preprocessing mechanism (4400) or the essence data providing mechanism (4500), the cascading processing unit (3400) separates the decoding and detection tasks and assigns them to different video streams according to the cascading or pipelining technique: while video stream 1 is being decoded, the other video streams wait in a queue, and decoding of video stream 2 begins once decoding of video stream 1 has finished. Meanwhile, video stream 1, having finished decoding, proceeds to video detection. In other words, as one video stream passes through decoding, detection, and tracking in turn, each following video stream passes through the same stages one step behind. This removes the bottleneck in which every video stream waits during a processing-intensive software stage such as video detection. More specifically, assuming a video decoder module (one piece of sub-cascading software) and a video detection module (another piece of sub-cascading software), the decoder module processes video 1 and hands the result to the detection module, then takes video 2, decodes it, and hands that result to the detection module. When the detection module is the processing-intensive (heavy) stage, the bottleneck can thus be relieved by placing lighter processing modules before or after it.
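The hand-off between a decoder module and a detection module can be sketched with two worker threads connected by queues. This is an illustrative structure only, assuming one thread per stage; the `decode` and `detect` bodies are placeholders, and every name in the sketch (`run_cascade`, `_stage`, `_STOP`) is hypothetical rather than taken from the specification.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage no more work is coming


def _stage(name, work, inbox, outbox, log):
    """Run one stage: take an item, process it, hand the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        log.append((name, item["stream"]))
        outbox.put(work(item))


def run_cascade(stream_ids):
    """Push each stream through decode, then detect, one stage per thread."""
    decode_q, detect_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()
    log = []

    def decode(item):  # placeholder for a real video decoder
        return {**item, "frames": f"frames-of-{item['stream']}"}

    def detect(item):  # placeholder for a real object detector
        return {**item, "objects": []}

    workers = [
        threading.Thread(target=_stage, args=("decode", decode, decode_q, detect_q, log)),
        threading.Thread(target=_stage, args=("detect", detect, detect_q, done_q, log)),
    ]
    for w in workers:
        w.start()
    for sid in stream_ids:
        decode_q.put({"stream": sid})  # streams wait here for the decoder
    decode_q.put(_STOP)

    results = []
    while True:
        item = done_q.get()
        if item is _STOP:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results, log


if __name__ == "__main__":
    results, log = run_cascade([1, 2, 3])
    print([r["stream"] for r in results])
```

Because each stage has its own thread, the decoder can start on video 2 as soon as it has handed video 1 to the detector, which is the staggered behavior described above. A tracking stage could be added as a third thread and queue in the same way.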

Next, with reference to FIG. 2, the operation of the system and method for supporting partial inspection of suspicious objects in CCTV video using level-based image recognition technology to reduce the load on visual inspectors, according to an embodiment of the present invention, is described.

Step S1: The open data input unit (4110) of the data input mechanism (4100) of the image recognition support device (4000) receives from the external source (2000) recorded video of alleys, roads, and the like, and stores it in the comprehensive DB (4200). Meanwhile, the real-time data input unit (4120) receives the live CCTV video currently being input from the CCTV system devices (1000) installed in residential areas, along streets, and on roads within a particular administrative district, stores it in the comprehensive DB (4200), and may also pass it to the image preprocessing mechanism (4400) or the essence data providing mechanism (4500).

Step S2: The simulated preprocessing element preparation unit (4310) of the artificial intelligence mechanism (4300) prepares simulated preprocessing elements using the open data stored in the comprehensive DB (4200). Simulated preprocessing elements may be, for example, the resolution of a given video, its color, its image frames, and portions unrelated to the target image such as street trees and roadside background. These elements are chosen because the aim is to reduce the overall volume of video data by removing from the video whatever is unrelated, or only weakly related, to the target image. Accordingly, any image portion with little relevance to the target image can be something the simulated preprocessing element preparation unit (4310) prepares as a simulated preprocessing element. Elements that can be expressed as shapes, such as street trees, may be prepared beforehand in every foreseeable shape through the GUI unit (3200) of the control mechanism (3000).

For the simulated preprocessing elements prepared in this way, the simulated preprocessing learning unit (4320) performs simulated preprocessing learning on the basis of simulated images and the simulated preprocessing elements, using the open data, for example by shallow learning or deep learning. That is, the simulated preprocessing learning unit (4320) recognizes the resolution of an open-data image and may lower the resolution of the simulated open-data image when it is higher than the simulated resolution element, or higher than a resolution element preset through the input/output unit (3100) of the control mechanism (3000). For example, if the simulated open-data image is HD and the simulated resolution element serving as the reference for compression or reduction is set to CIF (352 × 240) or QCIF (176 × 120), resolution reduction learning is performed in which the simulated image is converted to a CIF or QCIF image accordingly. If the simulated image of the open data is in color, color conversion learning may be performed in which color is converted to a lower grade of color such as black and white. If the simulated image of the open data consists of, for example, 30 frames per second, frame reduction learning may be performed in which the image is reduced to one to several of those frames. If the simulated image of the open data shows a road carrying vehicle traffic, required-region setting learning may be performed in which the image is limited to the areas that can contain a target, such as the road, vehicles on the road, and people on the sidewalk, excluding the surrounding background and roadside trees. In this way, the simulated preprocessing learning unit (4320) performs resolution reduction learning, color conversion learning, frame reduction learning, and required-region setting learning on a large number of simulated images from the open data, and accumulates the learning results so that the volume of image data can be reduced in a variety of situations. Shallow learning and deep learning are known techniques whose capability has already been confirmed in the field of artificial intelligence; the application of deep learning to the target element marking learning unit is briefly described later with reference to FIG. 4.
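The four reductions described above can be illustrated by a minimal sketch. It shows only the deterministic operations that the learning is meant to reproduce, not the learning itself. The frame representation (nested lists of RGB tuples), the function names, the nearest-neighbour sampling, and the use of the ITU-R BT.601 luma weights are assumptions made for illustration and do not appear in the specification.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0..255 (assumed representation)
ColorFrame = List[List[Pixel]]    # rows of pixels
GrayFrame = List[List[int]]


def reduce_resolution(frame: ColorFrame, out_w: int, out_h: int) -> ColorFrame:
    """Nearest-neighbour downscale of one frame to out_w x out_h pixels."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_grayscale(frame: ColorFrame) -> GrayFrame:
    """Convert color to a single luminance channel (BT.601 weights, integer math)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row] for row in frame]


def reduce_frames(frames: list, src_fps: int, dst_fps: int) -> list:
    """Keep roughly dst_fps frames out of every src_fps by regular sampling."""
    step = max(1, src_fps // dst_fps)
    return frames[::step]


def crop_region(frame: list, top: int, left: int, height: int, width: int) -> list:
    """Keep only the rectangular region that may contain a target."""
    return [row[left:left + width] for row in frame[top:top + height]]
```

Under these assumptions, a 30 frames-per-second clip reduced with `reduce_frames(frames, 30, 3)` keeps every tenth frame, and an HD frame passed to `reduce_resolution(frame, 352, 240)` is brought down to the CIF size given in the text.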

Step S3: Based on the simulated preprocessing learning results that the simulated preprocessing learning unit (4320) has learned over a very large variety of simulated images, the artificial intelligence generation unit (4360) generates a preprocessing neural network artificial intelligence as the engine for a preprocessing process that can be applied to real-time CCTV video. The greatest difference between the prior art and the present invention is that the agent performing this preprocessing is the preprocessing neural network artificial intelligence. According to the present invention, people do not check each real-time CCTV image one by one in order to preprocess it; instead, the neural network artificial intelligence, acting as a search and processing engine, performs the preprocessing in the manner it has already learned. Human labor is thereby saved, and in this respect the effect of an embodiment of the present invention differs from that of the prior art.

Step S4: The simulated environment element preparation unit (4330) of the artificial intelligence mechanism (4300) collects simulated environment elements from the open data stored in the comprehensive DB (4200). Simulated environment elements may include, for example, changes in light and shade over the course of a day, seasonal changes, weather changes, deformation of roadside trees or signboards caused by wind, and the movement of animals such as dogs and cats. Where a standardized model needs to be set in advance, environment elements of various models may be provided by rendering them as figures, manually or automatically, through the GUI unit (3200) of the control mechanism (3000).

Meanwhile, the simulated target element preparation unit (4340) may include in the target elements any object that may be subject to inspection, for example a criminal or a criminal's vehicle, together with every act related to that object or the result of such an act, such as the movement of the object and the effect of that movement on other objects (for example, the movement of a person following a collision, or damage to another vehicle or to property). As with the simulated environment elements, portions of the simulated target elements that need to be standardized as shapes may be prepared as figures through the GUI unit (3200).

Based on the prepared simulated environment elements and simulated target elements, the simulated target marking learning unit (4350) performs simulated learning tens of thousands to tens of millions of times over the various situations in the open data, and thereby attains, for example by shallow learning or deep learning, an artificial intelligence capability of distinguishing simulated environment elements from simulated target elements by itself.

Step S5: The neural network artificial intelligence generation unit (4360) generates the essence neural network artificial intelligence, which serves as the search and processing engine for targets, based on the learning data accumulated through the iterative learning of the simulated target marking learning unit (4350). The process is as follows.

Referring to FIG. 4, the simulated environment element preparation unit (4330) may, for example, take a suspect vehicle as the object and cut target elements, which comprise this object and all actions or situations related to it, manually or automatically through the GUI unit (3200) into small video clips or snapshot photographs (D100a, D100b), thereby generating simulated elements (D100) that are simulated target elements or simulated environment elements. Environment elements unrelated to the target elements may be generated as a snapshot photograph (D100c). In the snapshot photographs of target elements (D100a, D100b), a rectangular box denotes the object or an element related to the object, that is, a target element, and a circle or ellipse denotes a surrounding environment element unrelated to the target element. In this way, target element data (D100) is generated for learning to distinguish objects from non-objects by varying the shape, color, and spatio-temporal position of the polygon that marks a specific object in the target element. For convenience of explanation, the object or target element in the embodiment is assumed to be a black sedan bearing a specific license plate number. In snapshot D100a, the environment elements are the shade (around evening) and the surrounding trees, and the target element is the suspect vehicle. Snapshot D100b shows a specific place (for example, "천안" (Cheonan), displayed inside the red rectangle of the image in FIG. 4(b)), a target element containing the object, and the surrounding forest as an environment element. Snapshot D100c shows only environment elements and no target element: the vehicle is white and therefore not a target, and the site is a parking area rather than a road.

Referring to FIG. 4(b), the simulated target marking learning unit (4350) uses open data (D111a, D111b, D111c) (for example, a road traffic situation data set) to perform simulated target marking learning by which simulated target elements can be distinguished from the environment elements of clear, cloudy, rainy, or snowy weather contained in the open data.

The open data includes open data (D111a to D111c) reflecting a large number of road-situation images, and one or more target elements and environment elements. In addition, for any one item of open data (D111a), one XML file (D111b) holding road region information is included under the same name, which makes it possible to distinguish the regions contained in the image.
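The pairing of an image with a same-named XML region file can be sketched as follows. The specification does not disclose the XML schema, so the element name `region`, its attributes, and the sample content below are hypothetical and serve only to show how such a file could be located and read.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Hypothetical annotation content; the real schema is not given in the source.
SAMPLE_XML = """<annotation>
  <region kind="target" label="black_sedan" x="120" y="80" w="64" h="40"/>
  <region kind="environment" label="roadside_tree" x="10" y="5" w="30" h="90"/>
</annotation>"""


def annotation_name(image_name: str) -> str:
    """Return the XML file name that shares the image's base name."""
    return str(PurePath(image_name).with_suffix(".xml"))


def parse_regions(xml_text: str) -> list:
    """Read every region into a dict with its kind, label, and bounding box."""
    root = ET.fromstring(xml_text)
    regions = []
    for node in root.iter("region"):
        box = tuple(int(node.get(key)) for key in ("x", "y", "w", "h"))
        regions.append({"kind": node.get("kind"), "label": node.get("label"), "box": box})
    return regions
```

With this layout, an image named `road_0001.png` would be matched to `road_0001.xml`, and the parsed regions could then be split into target and environment examples for the marking learning.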

The simulated target marking learning unit (4350) uses deep learning or shallow learning to learn the ability to grasp the state of a target, using one or more of the simulated element learning data (D100) reflecting environmental changes and the road traffic image data set provided as open data (D111a to D111c) (for example, the data set in FIG. 4(b)). That is, referring to FIG. 4(a), target marking learning on roads is performed with the simulated elements (D100), in which simulated elements are marked with red rectangles or green circles (or ellipses), as input; target marking learning is performed with, as input, the snapshot photograph (D100b) of the simulated elements (D100) reflecting a specific place on the road and the object, or an image in which the object is marked with a polygon; and target marking learning information capable of distinguishing the object in road image data is acquired through this learning and is either accumulated internally as a trained neural network artificial intelligence or stored as a backup in the comprehensive DB (4200). Here, the target is described as one example of the information that the simulated target marking learning unit (4350) uses to determine whether an input image contains a target. The simulated target marking learning unit (4350) may also generate a composite element set by storing the region information of target elements and environment elements for each image of the open data (D111a) in a separate XML file (D111b).

The simulated target marking learning unit (4350) may carry out the simulated target marking learning using, for example, a supervised learning method such as an ANN (Artificial Neural Network), boosting, or a random forest, or an unsupervised learning method such as clustering, ICA (Independent Component Analysis), PCA (Principal Component Analysis), or SVD (Singular Value Decomposition).

Referring to FIG. 4(c), the target marking learning performed by the simulated target marking learning unit (4350) and the neural network artificial intelligence generation unit (4360), and the operation of the essence neural network artificial intelligence that serves as the search or processing engine, are described in more detail below.

The simulated target marking learning unit (4350) scans a filter (F100) composed of K × K pixels sequentially, from the upper left to the lower right, across the images (D100a to D100c, D111a to D111c) contained in the simulated elements (D100) or the open data (D110), and applies a convolution layer (D210a) that generates feature maps by taking the dot product of the image pixels under the filter with the K × K filter. To distinguish the various image features contained in the target images provided as simulated elements (D100), the convolution layer (D210a) applies not a single filter (F100) but multiple filters (F100) to generate the feature maps.
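The convolution step can be sketched in a few lines. This is a generic illustration rather than the disclosed implementation: it assumes a single-channel image stored as nested lists, a stride of 1, and no padding, none of which the specification states.

```python
def convolve2d(image: list, kernel: list) -> list:
    """Slide a K x K kernel over the image (stride 1, no padding) and take the
    dot product of the kernel with the pixels under it at each position."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j] for i in range(k) for j in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv_layer(image: list, kernels: list) -> list:
    """Apply several kernels to the same image, giving one feature map per kernel."""
    return [convolve2d(image, kernel) for kernel in kernels]
```

Each kernel responds to a different local pattern, which is why a layer with several kernels yields several feature maps from the same input image.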

Taking the feature maps generated through the filters (F100) as input, the ReLU (Rectified Linear Unit) layer generates an activation map by applying the ReLU function, an activation function that converts the quantitative values in the feature map into nonlinear values from which the degree to which features of target elements and environment elements are present can be judged.

Taking the activation map as input, the max pooling layer scans a max pooling filter (F200) composed of, for example, 2 pixels with a stride of 2, moving two cells at a time sequentially from the upper left to the lower right of the feature map, and generates a sampling map by extracting the maximum value within each 2-pixel window.
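The activation and pooling steps can be sketched as follows. The sketch assumes a 2 × 2 pooling window, which is one reading of the "2 pixels" in the text, and a feature map stored as nested lists; both are assumptions for illustration.

```python
def relu(feature_map: list) -> list:
    """Replace every negative value with zero, leaving positive values unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(activation_map: list, size: int = 2, stride: int = 2) -> list:
    """Take the maximum of each size x size window, moving stride cells at a time."""
    out_h = (len(activation_map) - size) // stride + 1
    out_w = (len(activation_map[0]) - size) // stride + 1
    return [
        [
            max(
                activation_map[y * stride + i][x * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]
```

With a 2 × 2 window and a stride of 2, a 4 × 4 activation map becomes a 2 × 2 sampling map, so each pooling stage reduces the number of values by a factor of four.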

The convolution layers (D210a, D210b), ReLU layers, and max pooling layers (D220a, D220b) may be combined repeatedly several times (corresponding to D200 in FIG. 2(c)) to generate a final sampling map in which the features of the target elements (or environment elements) contained in the simulated elements (D100) have been extracted.

The pixel values of the final sampling map are connected in a so-called fully connected layer (D300a), to which the Softmax function (D300c), an activation function, is applied. This forms an artificial neural network (D300) that yields, in the range 0 to 1.0, the probabilities for the target element (D400a) and the environment element (D400b). Where necessary, a so-called dropout layer (D300b) is used to select neurons of the network at random and deliberately interfere with learning, which prevents over-fitting, in which the network becomes too biased toward the simulated elements (D100). Applying the Softmax function (D300c) converts the results to values between 0 and 1 whose sum is 1, so that they can be used as probabilities. The convolution layer, ReLU layer, max pooling layer, and fully connected layer use matrix operations that consume a large amount of computing resources, so the GUI unit (3200) (see FIG. 1), which includes a graphics processing unit (GPU), may be used to improve the performance of these matrix operations. The number of repetitions, the positions, and the option values of the layers described above may differ according to the deep learning or shallow learning method applied, such as LeNet, AlexNet, ZFNet, GoogLeNet, VGGNet, or ResNet.
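The classification head described here can be sketched as follows. The sketch assumes a two-class output (target and environment), flat Python lists for vectors, and "inverted" dropout scaling, which is a common convention but is not stated in the specification; all function names are illustrative.

```python
import math
import random


def fully_connected(inputs: list, weights: list, bias: list) -> list:
    """One output per weight row: the weighted sum of all inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def softmax(logits: list) -> list:
    """Map raw scores to values in (0, 1) that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(inputs: list, rate: float, rng: random.Random) -> list:
    """During training, zero each value with probability `rate` and rescale the
    survivors so that the expected sum is unchanged (inverted dropout, assumed)."""
    keep = 1.0 - rate
    return [value / keep if rng.random() < keep else 0.0 for value in inputs]
```

For example, `softmax([2.0, 0.0])` gives a larger probability to the first class than to the second while the two values still sum to 1, which is what allows them to be read as the probabilities of "target" and "environment".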

Step S6: Once completed, the preprocessing neural network artificial intelligence and the essence neural network artificial intelligence can function as a fully automatic or semi-automatic preprocessing engine and search or processing engine requiring minimal manpower, as long as there are no new simulated elements to introduce. When simulated elements need to be added or changed (for example, a change in environment elements caused by a shift in abnormal weather, or a change in simulated environment elements or simulated preprocessing elements caused by technological progress), the introduction or change of the simulated elements, the machine learning on the correspondingly applied new open data, and the update of the neural network artificial intelligence may be carried out through the update unit (3500) of the control mechanism (3000). Accordingly, when an update is required, steps S1 to S5 may be repeated; if there is no update, the process proceeds to step S7.

Step S7: It is determined whether the CCTV system device (1000, 1100) is the general CCTV camera (1010) of the general CCTV system device (1000) or the IP CCTV camera (1110) of the IP CCTV system device (1100).

Step S8: If it is determined in step S7 that the CCTV system device (1000, 1100) is the general CCTV camera (1010) of the general CCTV system device (1000), at least part of the elementary image recognition information is obtained through the work of inspection personnel. If only part of the elementary image recognition information is obtained through inspection personnel, the remainder is obtained through the essence data providing mechanism (4500).

Step S9: If it is determined in step S7 that the CCTV system device (1000, 1100) is the IP CCTV camera (1110) of the IP CCTV system device (1100), at least part of the elementary image recognition information is obtained through the artificial intelligence of the IP CCTV camera (1110). If only part of the elementary image recognition information is obtained through the IP CCTV camera (1110), the remainder is obtained through the essence data providing mechanism (4500).

Step S10: The present invention may proceed in two modes. In the first mode, real-time CCTV video is preprocessed on the basis of the preprocessing neural network artificial intelligence and the cascading function of the cascading processing unit (3400) of the control mechanism (3000) (the function of allocating decoding, detection, and tracking tasks for individual video streams) to generate preprocessing information, after which essence metadata or essence video is generated from that preprocessing information using the essence neural network artificial intelligence and the cascading function. In the second mode, essence metadata and essence video are generated from real-time CCTV video directly using the essence neural network artificial intelligence. If the first mode is selected in step S10, the process proceeds to step S11. In both the first mode and the second mode, machine learning is used to record significant specific scenes, such as the movement of a specific object, out of the entire video subject to inspection, so that only a partial inspection is needed; this provides a technique for reducing the personnel required for a full inspection. In addition, because decoding, detection, and tracking tasks are divided and allocated to different video streams simultaneously during CCTV video processing, bottlenecks are reduced in comparison with the prior art, in which the other video streams simply waited while a detection, decoding, or tracking task was in progress, and both processing delay and the load on the host server or computer can be greatly reduced.
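The idea of keeping decoding, detection, and tracking busy on different video streams at the same time can be sketched as a three-stage pipeline. The specification does not disclose how the cascading processing unit (3400) is implemented, so the thread-and-queue design, the job dictionaries, and the placeholder stage functions below are assumptions for illustration only.

```python
import queue
import threading

SENTINEL = None  # marks the end of the job stream


def stage(func, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Run one task type in its own thread, passing each finished job downstream."""
    while True:
        job = inbox.get()
        if job is SENTINEL:
            outbox.put(SENTINEL)
            break
        outbox.put(func(job))


# Placeholder stage functions (hypothetical); real ones would process video data.
def decode(job: dict) -> dict:
    return {**job, "decoded": True}


def detect(job: dict) -> dict:
    return {**job, "objects": ["object-in-" + job["stream"]]}


def track(job: dict) -> dict:
    return {**job, "tracked": True}


def run_pipeline(jobs: list) -> list:
    """While one stream is being tracked, the next can be in detection and a
    third in decoding, instead of every stream waiting for the one before it."""
    queues = [queue.Queue() for _ in range(4)]
    workers = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate((decode, detect, track))
    ]
    for worker in workers:
        worker.start()
    for job in jobs:
        queues[0].put(job)
    queues[0].put(SENTINEL)
    results = []
    while True:
        done = queues[3].get()
        if done is SENTINEL:
            break
        results.append(done)
    for worker in workers:
        worker.join()
    return results
```

Because each stage has a single worker and the queues are first-in, first-out, the results come out in the order the streams were submitted; a production design would likely use several workers per stage and a scheduler, which this sketch omits.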

Step S11: The image preprocessing mechanism (4400) uses the preprocessing neural network artificial intelligence to perform at least one of a resolution reduction process, a color conversion process, a frame reduction process, and a required-region setting process, and thereby generates preprocessing information. In this case, a frame conversion process may also be performed.

FIG. 3 is a flowchart of the preprocessing; the process is as follows.

Substep PS1: The image preprocessing mechanism (4400) activates the preprocessing neural network artificial intelligence as the preprocessing engine, for example by downloading it from the neural network artificial intelligence generation unit (4360) or simply by operating the neural network artificial intelligence generation unit (4360).

Substep PS2: The image preprocessing mechanism (4400) receives real-time CCTV video from the real-time data input unit (4120) of the data input mechanism (4100).

Substep PS3: The resolution reduction unit (4410) uses the preprocessing neural network artificial intelligence to lower the resolution of the real-time CCTV video. For example, HD video is reduced to the CIF or QCIF level to generate a resolution-reduced image.

Substep PS4: The color conversion unit (4420) uses the preprocessing neural network artificial intelligence to convert color video to black-and-white video, generating a color-converted image.

Substep PS5: The required-region setting unit (4430) uses the preprocessing neural network artificial intelligence to exclude from the video the background around the road, such as roadside trees, and other non-target portions, and sets the region that requires inspection or image processing around the target and target-related portions, generating a required-region image.

Substep PS6: The frame reduction unit (4440) uses the preprocessing neural network artificial intelligence to perform frame reduction, generating a frame-reduced image.

During each of substeps PS2 to PS6, cascading processing may also be performed by the cascading processing unit (3400) of the control mechanism (3000).

Substep PS7: The preprocessing information generation unit (4450) appropriately combines the results of preprocessing substeps PS3 to PS6, namely the resolution-reduced image, the color-converted image, the required-region image, and the frame-reduced image, to generate the preprocessing information. The preprocessing information may be the result of performing the substeps in order, or, where necessary, the result of a selected subset of substeps PS3 to PS6.
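The optional selection of substeps PS3 to PS6 can be sketched as a small configurable pipeline. The sketch operates on clip metadata only, and the dictionary keys, the 5 frames-per-second cap, and the `"road"` region label are hypothetical values chosen for illustration; the specification does not define a data format for the preprocessing information.

```python
from typing import Any, Callable, Dict, Sequence

Clip = Dict[str, Any]  # assumed metadata record describing a video clip


def ps3_reduce_resolution(clip: Clip) -> Clip:
    return {**clip, "width": 352, "height": 240}  # CIF size given in the text


def ps4_to_monochrome(clip: Clip) -> Clip:
    return {**clip, "color": False}


def ps5_set_region(clip: Clip) -> Clip:
    return {**clip, "region": "road"}  # hypothetical region label


def ps6_reduce_frames(clip: Clip) -> Clip:
    return {**clip, "fps": min(clip["fps"], 5)}  # hypothetical 5 fps cap


SUBSTEPS: Dict[str, Callable[[Clip], Clip]] = {
    "PS3": ps3_reduce_resolution,
    "PS4": ps4_to_monochrome,
    "PS5": ps5_set_region,
    "PS6": ps6_reduce_frames,
}


def build_preprocessing_info(
    clip: Clip, selected: Sequence[str] = ("PS3", "PS4", "PS5", "PS6")
) -> Clip:
    """Apply the selected substeps in the given order and record which ones ran."""
    applied = []
    for name in selected:
        clip = SUBSTEPS[name](clip)
        applied.append(name)
    return {**clip, "applied": applied}
```

Calling `build_preprocessing_info(clip)` runs all four substeps in order, while `build_preprocessing_info(clip, ["PS3", "PS6"])` corresponds to the case in which only some substeps are selected.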

Step S12: After the preprocessing process, the essence data providing mechanism (4500) uses the essence neural network artificial intelligence to generate essence metadata and essence video from the preprocessing information. The generated essence metadata or essence video may be a final inspection result that requires no further inspection or processing by personnel, or it may be a semi-processed result or search output in which 50% or more of the entire video inspection process has been completed and the final inspection is completed through additional inspection by personnel covering the remaining 50% to 1%. In step S12, the control mechanism (3000) may also activate the cascading processing unit (3400) to execute cascading processing.

Step S13: If, on the other hand, the second mode rather than the first mode is selected in step S10, the process proceeds to step S13. The essence data providing mechanism (4500) receives real-time CCTV video directly from the real-time data input unit (4120) of the data input mechanism (4100), or receives previously stored non-real-time data from the comprehensive DB (4200), and generates essence metadata through the essence neural network artificial intelligence or generates essence video corresponding to that metadata.

Step S14: The essence data providing mechanism (4500) executes an image difference process that detects changes between preceding and following CCTV images, generates the difference between them as image difference information, and thereby reports the occurrence of an event or a change of situation in a specific area. The essence data providing mechanism (4500) may reflect the image difference information in the essence metadata or essence video (step S12), or it may generate the image difference information directly from the preprocessing information output by the preprocessing information generation unit (4450) of the image preprocessing mechanism (4400) and provide it to the input/output unit (3100) of the control mechanism (3000), for example for inspection by a visual inspector (step S13). When the image difference processing unit (4530) provides the image difference information, the input/output unit (3100) may also be provided with visual or audible alarm information from the alarm unit (3300) concerning the change in the image. In the case of visual inspection by inspection personnel, the essence data provided by the essence data providing mechanism (4500), which includes at least one of the essence metadata and the essence video, can reach a completeness of 80% to 99% through use of the artificial intelligence mechanism (4300) of the present invention, so the personnel required for the visual inspection that completes the final inspection can be kept to a minimum. In particular, when the IP CCTV camera (1110) is used, visual inspection of the elementary image recognition information is reduced correspondingly, which leaves more capacity for visual inspection after the intermediate and advanced image recognition information has been generated and, as a result, yields a large improvement in both efficiency and quality.
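The image difference process can be illustrated with a simple per-pixel comparison of two consecutive frames. This is a generic frame-differencing sketch, not the disclosed method: the grayscale nested-list frames, the function names, and both threshold values are assumptions for illustration.

```python
def frame_difference(previous: list, current: list) -> list:
    """Absolute per-pixel difference between two grayscale frames of equal size."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]


def changed_ratio(difference: list, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose change exceeds the (hypothetical) pixel threshold."""
    total = sum(len(row) for row in difference)
    changed = sum(1 for row in difference for value in row if value > pixel_threshold)
    return changed / total


def event_detected(
    previous: list, current: list, pixel_threshold: int = 25, area_threshold: float = 0.1
) -> bool:
    """Report an event when enough of the frame has changed (thresholds assumed)."""
    difference = frame_difference(previous, current)
    return changed_ratio(difference, pixel_threshold) >= area_threshold
```

A result of `True` from `event_detected` is the kind of signal that could accompany the image difference information and trigger the visual or audible alarm described in the text.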

If the essence metadata or essence video is semi-processed, additional visual inspection is performed; otherwise, the process returns to step S7 and the process up to step S14 is repeated.

Step S15: When visual inspection is performed in step S12, the control mechanism (3000) inputs recognition conditions to the input/output unit (3100), and when an alarm is requested, the alarm unit (3300) provides an alarm each time a target appears in the essence video to notify the inspector. While the essence metadata generation unit (4510), the essence image generation unit (4520), and the image difference processing unit (4530) are operating, the alarm function of the alarm unit (3300) may also operate according to the set conditions, so that inspectors are informed in real time of the appearance of a target image during automatic inspection.

According to the present invention, where inspection personnel are deployed to find significant scenes by analyzing all of the CCTV video subject to inspection, machine learning is used so that those personnel perform a partial inspection rather than a full inspection, as a way of obtaining the maximum effect with the minimum manpower. In the elementary image recognition function, significant specific scenes that include at least the movement of a specific object are recorded so that inspectors need only perform a partial visual inspection, which provides a technique for reducing the personnel required for a full inspection.

The intermediate image recognition function addresses the case in which the beginner-level image recognition function is already built into an intelligent IP CCTV camera. When the camera outputs meaningful scenes, such as motion detection and simple object recognition, together with the recognized metadata, the aim is to let the inspection personnel carry out the intermediate-level and advanced-level image recognition work they need. When beginner-level image recognition data are received from one or more CCTVs, the system combines them, records meaningful specific scenes such as more complex situation recognition and suspect tracking linked to the camera network, and provides them to the inspection personnel. Inspectors then need only perform a higher-level partial visual inspection, which provides a technique for reducing the manpower required for a full inspection.
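
As a non-limiting illustration of combining beginner-level recognition data from several cameras, the following Python sketch groups per-camera events into cross-camera tracks. It assumes that each event already carries an `object_id` that identifies the same object across cameras; that re-identification key, the `CameraEvent` fields, and the `max_gap` value are hypothetical and are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CameraEvent:
    """Beginner-level metadata emitted by one camera (hypothetical shape)."""
    camera_id: str
    timestamp: float   # seconds
    object_id: str     # assumed cross-camera re-identification key
    label: str


def link_across_cameras(events: List[CameraEvent],
                        max_gap: float = 60.0) -> Dict[str, List[CameraEvent]]:
    """Keep objects seen on at least two cameras with no sighting gap above max_gap."""
    by_object: Dict[str, List[CameraEvent]] = defaultdict(list)
    for event in events:
        by_object[event.object_id].append(event)

    tracks: Dict[str, List[CameraEvent]] = {}
    for object_id, sightings in by_object.items():
        sightings.sort(key=lambda e: e.timestamp)
        cameras = {e.camera_id for e in sightings}
        gaps_ok = all(b.timestamp - a.timestamp <= max_gap
                      for a, b in zip(sightings, sightings[1:]))
        if len(cameras) >= 2 and gaps_ok:
            tracks[object_id] = sightings   # time-ordered path through the camera network
    return tracks
```

Each returned track is a time-ordered list of sightings, which could be presented to an inspector as a single scene sequence rather than as separate per-camera footage.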

Consequently, the present invention can further strengthen the beginner-level and intermediate image recognition functions so that inspection manpower is greatly reduced while, in qualitative terms, the freed-up inspection personnel can be concentrated on advanced partial visual inspection. The present invention can of course also strengthen the advanced image recognition function, ultimately providing a base technology capable of drastically reducing inspection manpower in absolute terms.

Although preferred embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited thereto, and various changes and modifications can be made by those skilled in the art without departing from the scope of the following claims.

1000: General CCTV system device
1010: General CCTV camera
1020: VMS
1030: Video DB
1100: IP CCTV system device
1110: IP CCTV camera
1120: VMS
1130: Video DB
2000: External source
3000: Control mechanism
3100: Input/output unit
3200: GUI unit
3300: Alarm unit
3400: Cascading processing unit
3500: Update unit
4000: Image recognition support device
4100: Data input mechanism
4110: Open data input unit
4120: Real-time data input unit
4200: Comprehensive DB
4300: Artificial intelligence mechanism
4310: Simulation preprocessing element preparation unit
4320: Simulation preprocessing learning unit
4330: Simulation environment element preparation unit
4340: Simulated target element preparation unit
4350: Simulated target marking learning unit
4360: Neural network artificial intelligence generation unit
4400: Image preprocessing mechanism
4410: Resolution reduction unit
4420: Color conversion unit
4430: Required area setting unit
4440: Frame reduction unit
4450: Preprocessing information generation unit
4500: Essence data providing mechanism
4510: Essence metadata generation unit
4520: Essence image generation unit
4530: Image difference processing unit

Claims

1. A system for supporting partial inspection of suspicious objects in CCTV images using level-specific image recognition technology to reduce the load on visual-recognition inspectors, the system using machine learning to record meaningful specific scenes, including at least the movement of a specific object, from the whole of the images to be inspected, so that only a partial visual inspection is needed and the manpower required for a full inspection can be reduced, and the system providing a beginner-level image recognition function that detects at least motion in an abnormal state different from the normal state, an intermediate image recognition function that receives the beginner-level image recognition information detected by the beginner-level image recognition function and generates semantic-level information defining a specific event at a single CCTV camera, and an advanced image recognition function that collects the intermediate image recognition information from the intermediate image recognition function and generates linked information from a plurality of CCTV cameras, the system comprising:
an external source that provides, in a preset format, publicly known open data including at least CCTV image information of events, the events covering incidents, accidents, and predictable or unpredictable occurrences within the jurisdiction of a general administrative district;
a CCTV system device that inputs image information including the event in real time; and
an image recognition support device that receives the open data of the external source and real-time image data from the CCTV system device to support image recognition,
wherein the image recognition support device comprises:
a data input mechanism including an open data input unit that receives the open data from the external source and a real-time data input unit that receives real-time image data from the CCTV system device capturing the event in real time;
an artificial intelligence mechanism including a simulated target element preparation unit that, using the open data received from the open data input unit, extracts and prepares as simulated target elements the acts or actions related to the event and the results of those acts or actions, a simulation environment element preparation unit that extracts and prepares as simulation environment elements the temporal and natural-phenomenon changes expected to occur on the road together with the results of acts or actions related to objects other than the target element, a simulated target marking learning unit that applies the simulation environment elements and the simulated target elements to CCTV images containing the event among the open data and performs machine learning to distinguish the target element of the event from the environment elements, and a neural network artificial intelligence generation unit that accumulates the learning results as reference information for distinguishing the beginner-level image recognition information, the intermediate image recognition information, and the advanced image recognition information, and generates an essence neural network artificial intelligence able to recognize the target element by itself; and
an essence data providing mechanism that uses the essence neural network artificial intelligence to automatically recognize, in the real-time CCTV image information received from the real-time data input unit, the occurrence of an event and the target element related to that event, and provides, from the recognition result, essence data that is at least one of the beginner-level image recognition information, the intermediate image recognition information, and the advanced image recognition information.

2. The system according to claim 1,
wherein the image recognition support device further comprises an image preprocessing mechanism that, before the essence data providing mechanism recognizes the target in the real-time CCTV image information and provides the essence data, generates preprocessing information for reducing the amount of real-time CCTV image data to be recognized by removing or reducing the portions of the real-time CCTV image information that are not directly related to the target element,
wherein the artificial intelligence mechanism further includes a simulation preprocessing element preparation unit that prepares preprocessing elements, which are identification elements related to the preprocessing, and a simulation preprocessing learning unit that performs machine learning of the simulated preprocessing based on the simulation preprocessing elements by an image difference method or a deep learning method,
wherein the neural network artificial intelligence generation unit further generates a preprocessing neural network artificial intelligence capable of performing the preprocessing automatically based on the simulation preprocessing learning results accumulated in the simulation preprocessing learning unit, and
wherein the image preprocessing mechanism automatically generates the preprocessing information using the preprocessing neural network artificial intelligence, and the essence data is generated based on the preprocessing information.

3. The system of claim 2,
wherein the image preprocessing mechanism includes: a resolution reduction unit that lowers the resolution of the CCTV image; a color conversion unit that simplifies the color types of the CCTV image; a required area setting unit that excludes objects in the CCTV image that are not related to the target and sets the portion relevant to the target as the required area; a frame reduction unit that reduces the number of redundant frames among the frames per second of the CCTV image; and a preprocessing information generation unit that generates at least part of the operation results of the resolution reduction unit, the color conversion unit, the required area setting unit, and the frame reduction unit as preprocessing information for image recognition.
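
As a non-limiting illustration, which is not part of the claim, the four preprocessing operations recited above can be sketched in Python as follows. The function names, the fixed rectangular required area, the subsampling factor, the order in which the operations are applied, and the shape of the returned dictionary are all assumptions made for illustration. The grayscale conversion uses the common integer approximation of the ITU-R BT.601 luma weights.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
RGBFrame = List[List[Pixel]]
GrayFrame = List[List[int]]


def to_gray(frame: RGBFrame) -> GrayFrame:
    """Color conversion: collapse RGB to one luma channel (BT.601 weights)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row] for row in frame]


def crop(frame: GrayFrame, top: int, left: int, height: int, width: int) -> GrayFrame:
    """Required area setting: keep only the region relevant to the target."""
    return [row[left:left + width] for row in frame[top:top + height]]


def downscale(frame: GrayFrame, factor: int) -> GrayFrame:
    """Resolution reduction: keep every factor-th pixel in both directions."""
    return [row[::factor] for row in frame[::factor]]


def drop_redundant(frames: List[GrayFrame]) -> List[int]:
    """Frame reduction: indices of frames that differ from the last kept frame."""
    kept: List[int] = []
    last: Optional[GrayFrame] = None
    for index, frame in enumerate(frames):
        if frame != last:
            kept.append(index)
            last = frame
    return kept


def preprocess(frames: List[RGBFrame], roi: Tuple[int, int, int, int],
               factor: int = 2) -> Dict[str, object]:
    """Combine the four operations into preprocessing information."""
    top, left, height, width = roi
    reduced = [downscale(crop(to_gray(f), top, left, height, width), factor)
               for f in frames]
    kept = drop_redundant(reduced)
    return {"kept_indices": kept,
            "frames": [reduced[i] for i in kept],
            "roi": roi,
            "scale_factor": factor}
```

Returning the kept frame indices alongside the reduced frames lets a later stage map any recognition result back to its position in the original footage.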

4. The system according to any one of claims 1 to 3,
wherein the beginner-level image recognition information includes at least a result extracted by inspection personnel or an output produced directly by an IP CCTV, and
wherein the essence data providing mechanism provides at least one of the intermediate image recognition information and the advanced image recognition information as essence data at a completion rate of 80% to 99% or more, corresponding to the exclusion rate of inspection personnel.

5. The system of claim 4,
wherein the essence data is either metadata in which meaningful events in the real-time CCTV image information received from the real-time data input unit are automatically marked using the essence neural network artificial intelligence, together with the position information in the original image and the target elements related to those events, the metadata being configured so that, if necessary, only the meaningful scenes can be inspected, or data for which the inspection has been completed.

6. The system of claim 5,
wherein the essence data providing mechanism further includes an image difference processing unit that detects changes between preceding and following CCTV images and generates the difference between them as image difference information, in order to report the occurrence of an event or a change of situation in a specific area, and
wherein the essence data providing mechanism reflects the image difference information in the essence metadata or the essence image, or generates the image difference information directly based on the preprocessing information from the preprocessing information generation unit of the image preprocessing mechanism.

7. The system of claim 5,
further comprising a control mechanism that controls the entire system, the control mechanism being integrated with, or in direct or remote communication with, the image recognition support device,
wherein the control mechanism includes a cascading processing unit that performs cascading processing in which the decoding, detection, tracking, and other operations of the recognition work in the CCTV image processing of the image preprocessing mechanism and the essence data providing mechanism are separated and assigned alternately, one at a time, to different recognition-target video streams, and
wherein the artificial intelligence mechanism is periodically or intermittently updated according to additions or changes to the open data from the external source, the real-time data provided from the CCTV system device, and the environment elements and target elements.
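
As a non-limiting illustration, which is not part of the claim, one possible reading of the cascading processing is a rotating schedule in which, at each time step, the decoding, detection, and tracking operations are each applied to a different video stream. The Python sketch below shows that reading only; the stage names, the function `cascade_schedule`, and the rotation rule are assumptions, and the specification may intend a different assignment.

```python
from typing import Dict, List

# Stages separated out of the recognition work (names assumed for illustration).
STAGES = ("decode", "detect", "track")


def cascade_schedule(stream_ids: List[str], ticks: int) -> List[Dict[str, str]]:
    """For each tick, map every stage to the stream it processes.

    The stream decoded at tick t is detected at tick t + 1 and tracked at
    tick t + 2, so in steady state each stage works on a different stream
    (provided there are at least as many streams as stages).
    """
    if not stream_ids:
        raise ValueError("at least one video stream is required")
    count = len(stream_ids)
    schedule: List[Dict[str, str]] = []
    for tick in range(ticks):
        assignment = {stage: stream_ids[(tick - offset) % count]
                      for offset, stage in enumerate(STAGES)}
        schedule.append(assignment)
    return schedule
```

With three streams, each tick assigns the three stages to three different streams, so no stage waits on another stage of the same stream within a tick.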

8. The system of claim 7,
wherein the control mechanism includes at least:
an input/output unit that sets the operating conditions of the image preprocessing mechanism or the essence data providing mechanism, monitors the operation of the image recognition support device, and has a UI function;
a GUI unit that provides a function for manually or automatically generating those parts of the simulation preprocessing elements, the simulation environment elements, and the simulated target elements that can be prepared in advance as shapes; and
an alarm unit that reports the appearance of a target element in real time when essence data is provided by the essence data providing mechanism,
wherein the machine learning includes at least one of deep learning and shallow learning, and
wherein the image recognition support device further comprises a comprehensive database (DB) that stores the data generated throughout the operation of the device.

9. A method for supporting partial inspection of suspicious objects in CCTV images using level-specific image recognition technology to reduce the load on visual-recognition inspectors, the method using machine learning to record meaningful specific scenes, including at least the movement of a specific object, from the whole of the images to be inspected, so that only a partial visual inspection is needed and the manpower required for a full inspection can be reduced, and the method providing a beginner-level image recognition function that detects at least motion in an abnormal state different from the normal state, an intermediate image recognition function that receives the beginner-level image recognition information detected by the beginner-level image recognition function and generates semantic-level information defining a specific event at a single CCTV camera, and an advanced image recognition function that collects the intermediate image recognition information from the intermediate image recognition function and generates linked information from a plurality of CCTV cameras,
the method being used in a system, using moving object detection technology, for supporting partial visual-recognition inspection of suspicious objects in CCTV images, the system including: an external source that provides, in a preset format, publicly known open data including at least CCTV image information of events, the events covering incidents, accidents, and predictable or unpredictable occurrences within the jurisdiction of a general administrative district; a CCTV system device that inputs image information including the event in real time; and an image recognition support device that receives the open data of the external source and real-time image data from the CCTV system device, the method comprising:
a data input step in which the image recognition support device receives the open data from the external source and receives real-time image data from the CCTV system device capturing the event in real time;
a simulated target element preparation step in which the image recognition support device extracts from the open data, and prepares as simulated target elements, the acts or actions related to the event and the results of those acts or actions;
a simulation environment element preparation step in which the image recognition support device extracts and prepares as simulation environment elements the temporal and natural-phenomenon changes expected to occur on the road together with the results of acts or actions related to objects other than the target element;
a simulated target marking learning step in which the image recognition support device applies the simulation environment elements and the simulated target elements to CCTV images containing the event among the open data and performs machine learning to distinguish the target element of the event from the environment elements;
a neural network artificial intelligence generation step in which the neural network artificial intelligence generation unit of the image recognition support device accumulates the learning results of the simulated target marking learning step as reference information for distinguishing the beginner-level image recognition information, the intermediate image recognition information, and the advanced image recognition information, and generates an essence neural network artificial intelligence able to recognize the target element by itself; and
an essence data providing step in which the image recognition support device uses the essence neural network artificial intelligence to automatically recognize, in the real-time CCTV image information received from the real-time data input unit, the occurrence of an event and the target element related to that event, and provides essence data that is at least one of the beginner-level image recognition information, the intermediate image recognition information, and the advanced image recognition information.

10. The method of claim 9,
further comprising, before the essence data providing step, an image preprocessing step of generating preprocessing information for reducing the amount of real-time CCTV image information to be recognized by removing or reducing the portions of the real-time CCTV image information that are not directly related to the target element,
and further comprising, before the image preprocessing step:
a simulation preprocessing element preparation step of preparing preprocessing elements, which are identification elements related to the preprocessing;
a simulation preprocessing learning step of performing machine learning of the simulated preprocessing based on the simulation preprocessing elements by a deep learning method; and
a step in which the neural network artificial intelligence generation unit generates a preprocessing neural network artificial intelligence capable of performing the preprocessing automatically based on the simulation preprocessing learning results accumulated in the simulation preprocessing learning step,
wherein, in the image preprocessing step, the preprocessing information is automatically generated using the preprocessing neural network artificial intelligence, and the essence data is generated based on the preprocessing information.

11. The method of claim 10,
wherein the image preprocessing step comprises:
a resolution reduction sub-step of lowering the resolution of the CCTV image;
a color conversion sub-step of simplifying the color types of the CCTV image;
a required area setting sub-step of excluding objects in the CCTV image that are not related to the target and setting the portion relevant to the target as the required area;
a frame reduction sub-step of reducing the number of redundant frames among the frames per second of the CCTV image; and
a preprocessing information generation sub-step of generating at least part of the operation results of the resolution reduction sub-step, the color conversion sub-step, the required area setting sub-step, and the frame reduction sub-step as preprocessing information for image recognition.

12. The method according to any one of claims 9 to 11,
wherein the beginner-level image recognition information includes at least a result extracted by inspection personnel or an output produced directly by an IP CCTV, and
wherein the essence data providing mechanism provides at least one of the intermediate image recognition information and the advanced image recognition information as essence data at a completion rate of 80% to 99% or more, corresponding to the exclusion rate of inspection personnel.

13. The method of claim 12,
wherein the essence data is either metadata in which meaningful events in the real-time CCTV image information received from the real-time data input unit are automatically marked using the essence neural network artificial intelligence, together with the position information in the original image and the target elements related to those events, the metadata being configured so that, if necessary, only the meaningful scenes can be inspected, or semi-processed data for which the inspection is completed, or for which the final inspection is completed through an additional inspection by inspection personnel.

14. The method of claim 13,
further comprising, after the image preprocessing step, an image difference processing step in which the essence data providing mechanism detects changes between preceding and following CCTV images and generates the difference between them as image difference information, in order to report the occurrence of an event or a change of situation in a specific area,
wherein, in the image difference processing step, the essence data providing mechanism reflects the image difference information in the essence metadata or the essence image, or generates the image difference information directly based on the preprocessing information from the preprocessing information generation unit of the image preprocessing mechanism.

15. The method of claim 13,
wherein the system further comprises a control mechanism that controls the entire system, the control mechanism being integrated with, or in direct or remote communication with, the image recognition support device, and
wherein the method further comprises, during the image preprocessing step or the essence data providing step:
a cascading processing step in which the cascading processing unit of the control mechanism separates the decoding, detection, tracking, and other operations of the recognition work in the CCTV image processing of the image preprocessing mechanism and the essence data providing mechanism and assigns them alternately to different recognition-target video streams; and
a step of periodically or intermittently updating the neural network artificial intelligence according to additions or changes to the open data from the external source, the real-time data provided from the CCTV system device, and the environment elements and target elements.

16. The method of claim 13,
wherein the system further comprises a control mechanism that controls the entire system, the control mechanism being integrated with, or in direct or remote communication with, the image recognition support device,
wherein, in at least one of the simulated target element preparation step, the simulation environment element preparation step, and the simulation preprocessing element preparation step, the method includes at least:
an input/output step of setting the operating conditions of the image preprocessing step or the essence data providing step and monitoring the operation of the image recognition support device;
a GUI step of providing a function for manually or automatically generating those parts of the simulation preprocessing elements, the simulation environment elements, and the simulated target elements that can be prepared in advance as shapes; and
an alarm step of reporting the appearance of the target element in real time while the essence data providing mechanism performs the recognition work on the CCTV image,
wherein the machine learning includes at least one of deep learning and shallow learning, and
wherein the method further comprises a step in which a comprehensive database (DB) of the image recognition support device stores the data generated throughout the operation of the device.