KR102594256B1

KR102594256B1 - Method, program, and apparatus for monitoring behaviors based on artificial intelligence

Info

Publication number: KR102594256B1
Application number: KR1020220153012A
Authority: KR
Inventors: 원동일; 이지훈; 신호환
Original assignee: 주식회사 에딘트
Priority date: 2022-11-15
Filing date: 2022-11-15
Publication date: 2023-10-26
Also published as: WO2024106605A1

Abstract

According to an embodiment of the present disclosure, an artificial intelligence-based behavior monitoring method, program, and apparatus are provided. The method comprises the steps of: synchronizing first image data captured in a first direction relative to the face of a person subject to behavior monitoring with second image data captured in a second direction different from the first direction; generating, using a pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets based on each of the synchronized first image data and second image data; and estimating a behavior of the person by combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data. The method can therefore comprehensively determine, and accurately monitor, what behavior the person exhibits in a specific environment based on various detection results.

Description

Artificial intelligence-based behavior monitoring method, program, and device {METHOD, PROGRAM, AND APPARATUS FOR MONITORING BEHAVIORS BASED ON ARTIFICIAL INTELLIGENCE}

This disclosure relates to data analysis technology and, more specifically, to a method and device for estimating and monitoring a person's behavior based on composite judgment results produced using artificial intelligence.

Within an environment built for a specific purpose, situations arise in which it is necessary to check the actions a person takes and to analyze what consequences those actions bring about. For example, in an educational environment where exams are administered, there is a need to monitor what actions a test taker takes during the exam. In particular, unlike offline exams, online exams make it difficult to effectively check the test taker's behavior and surroundings. Therefore, in an online exam environment, it is all the more important for administrators to accurately analyze test takers' actions in real time to determine whether any cheating has occurred.

As the above example shows, it is not easy to effectively monitor human behavior and the surrounding environment in an environment built online. Conventional technologies exist that analyze specific human behavior using sensing devices such as cameras, but most of them analyze that behavior based only on fragmentary information obtained in a particular situation. When analysis relies only on such fragmentary information, it cannot be accurately determined whether a person is taking an action that requires judgment in the given environment. For example, when cheating is detected by analyzing only frontal images of a person taking an online exam, the limited information available from frontal images raises the probability that behavior suspected of being cheating goes undetected, or that behavior that is not cheating is misjudged as cheating.

Republic of Korea Patent Publication No. 10-2007-0050029 (May 14, 2007)

The present disclosure was devised in response to the background art described above, and seeks to provide a method and device that can comprehensively determine, and accurately monitor, what actions a person takes within a specific environment based on various detection results.

However, the problems to be solved by this disclosure are not limited to those mentioned above, and other problems not mentioned can be clearly understood from the description below.

According to an embodiment of the present disclosure for achieving the above object, an artificial intelligence-based behavior monitoring method performed by a computing device is disclosed. The method may include: synchronizing first image data captured in a first direction relative to the face of a person subject to behavior monitoring with second image data captured in a second direction different from the first direction; generating, using a pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets, based on each of the synchronized first image data and second image data; and estimating the person's behavior by combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data.
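The three steps of the method (synchronize, analyze each view, combine) can be illustrated with a minimal Python sketch of the data flow. It is not the disclosed implementation: the class `BehaviorMonitor`, its field names, and the toy lambdas standing in for the synchronizer, the two neural network models, and the combination step are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical analysis result: detection item name -> observed state.
AnalysisResult = Dict[str, str]


@dataclass
class BehaviorMonitor:
    """Illustrative wiring of the three claimed steps; every name is hypothetical."""

    synchronize: Callable[[Sequence, Sequence], List[Tuple[object, object]]]
    analyze_first: Callable[[object], AnalysisResult]   # first-direction model
    analyze_second: Callable[[object], AnalysisResult]  # second-direction model
    combine: Callable[[AnalysisResult, AnalysisResult], str]

    def run(self, first_frames: Sequence, second_frames: Sequence) -> List[str]:
        behaviors: List[str] = []
        # Step 1: pair frames captured from the two directions.
        for first, second in self.synchronize(first_frames, second_frames):
            # Step 2: analyze each synchronized frame separately.
            first_result = self.analyze_first(first)
            second_result = self.analyze_second(second)
            # Step 3: combine both results into one estimated behavior.
            behaviors.append(self.combine(first_result, second_result))
        return behaviors


# Toy stand-ins for real models, used only to show the data flow.
monitor = BehaviorMonitor(
    synchronize=lambda a, b: list(zip(a, b)),
    analyze_first=lambda frame: {"gaze": frame},
    analyze_second=lambda frame: {"hands": frame},
    combine=lambda r1, r2: "abnormal" if r1["gaze"] == "off_screen" else "normal",
)
print(monitor.run(["on_screen", "off_screen"], ["on_desk", "on_desk"]))
# ['normal', 'abnormal']
```

Injecting each step as a callable keeps the sketch agnostic about which synchronization method and which models are actually used.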

Alternatively, the detection item may be state information identified on the basis of a subclass of the detection target, and the state information may change according to the person's behavior.

Alternatively, in addition to body parts of the person, the plurality of detection targets may include at least one of an object other than the person, a sound of an object associated with the person's behavior, or a time of an object associated with the person's behavior.
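The relationship between detection targets, their subclasses, and detection items whose state changes with the person's behavior can be modeled with a small data structure. The sketch below assumes string-valued states and hypothetical names (`TargetKind`, `DetectionTarget`, the subclass `"eye"`, the item `"gaze"`); the four target kinds are the ones listed in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class TargetKind(Enum):
    # The four categories of detection target named in the text.
    BODY_PART = "body_part"
    OBJECT = "object"
    SOUND = "sound"
    TIME = "time"


@dataclass
class DetectionTarget:
    kind: TargetKind
    subclass: str  # e.g. "eye", "hand", "phone" (illustrative values)
    items: Dict[str, str] = field(default_factory=dict)  # detection item -> state

    def update(self, item: str, state: str) -> bool:
        """Record the latest state of a detection item; return True if it changed."""
        changed = self.items.get(item) != state
        self.items[item] = state
        return changed


eye = DetectionTarget(TargetKind.BODY_PART, "eye")
eye.update("gaze", "on_screen")
print(eye.update("gaze", "off_screen"))  # True: the state changed with the person's behavior
print(eye.update("gaze", "off_screen"))  # False: no change
```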

Alternatively, the neural network model may include: a first model that tracks a person's gaze based on image data captured in the first direction; and a second model that estimates a person's pose based on image data captured in the second direction.

Alternatively, the step of generating, using the pre-trained neural network model, an analysis result for at least one detection item included in each of the plurality of detection targets based on each of the synchronized first image data and second image data may include: when the feature points for the person's face and the feature points for the person's body, extracted by inputting the synchronized second image data into the second model, are misaligned by a predetermined angle or more, correcting the feature points extracted from the synchronized second image data based on the coordinate system of the feature points extracted by inputting the synchronized first image data into the first model.
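The correction condition can be sketched in 2D geometry. The text does not specify how the face and body directions are measured or how the first model's coordinate system is applied, so this sketch assumes the following: the first two side-view face keypoints define a face axis, two body keypoints (e.g. shoulders) define a body axis, the frontal model's result is reduced to a single expected face-versus-body angle (`frontal_offset_deg`), and the correction is a rigid rotation of the face keypoints about their centroid. All function names and the 30-degree default are hypothetical placeholders.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def direction_deg(a: Point, b: Point) -> float:
    """Direction of the vector a -> b, in degrees."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


def angle_gap(deg1: float, deg2: float) -> float:
    """Smallest absolute difference between two directions, in [0, 180]."""
    d = (deg1 - deg2) % 360.0
    return min(d, 360.0 - d)


def rotate_about(points: Sequence[Point], center: Point, deg: float) -> List[Point]:
    """Rotate points counter-clockwise by `deg` degrees around `center`."""
    rad = math.radians(deg)
    c, s = math.cos(rad), math.sin(rad)
    cx, cy = center
    return [(cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
            for x, y in points]


def correct_side_face(face_pts: Sequence[Point], body_axis: Tuple[Point, Point],
                      frontal_offset_deg: float, max_gap_deg: float = 30.0) -> List[Point]:
    """Hypothetical correction of side-view face keypoints.

    face_pts: side-view face keypoints; the first two define the face axis (assumed).
    body_axis: two side-view body keypoints (e.g. shoulders) defining the body axis.
    frontal_offset_deg: face-vs-body angle implied by the frontal model (assumed input).
    max_gap_deg: the "predetermined angle" (30 degrees is an arbitrary placeholder).
    """
    face_dir = direction_deg(face_pts[0], face_pts[1])
    body_dir = direction_deg(*body_axis)
    if angle_gap(face_dir, body_dir) < max_gap_deg:
        return list(face_pts)  # misalignment below the predetermined angle: keep as-is
    # Signed rotation in [-180, 180) that aligns the face axis with the frontal estimate.
    target = body_dir + frontal_offset_deg
    delta = (target - face_dir + 180.0) % 360.0 - 180.0
    cx = sum(x for x, _ in face_pts) / len(face_pts)
    cy = sum(y for _, y in face_pts) / len(face_pts)
    return rotate_about(face_pts, (cx, cy), delta)


side_face = [(0.0, 0.0), (0.0, 1.0)]   # face axis points straight up (90 degrees)
shoulders = ((0.0, 0.0), (1.0, 0.0))   # body axis points right (0 degrees)
fixed = correct_side_face(side_face, shoulders, frontal_offset_deg=0.0)
print(abs(direction_deg(fixed[0], fixed[1])) < 1e-9)  # True: face axis now at 0 degrees
```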

Alternatively, the step of determining a judgment condition for estimating the person's behavior by combining the first analysis result based on the synchronized first image data and the second analysis result based on the synchronized second image data may include: determining whether a first expected behavior according to the first analysis result matches a second expected behavior according to the second analysis result; estimating the reliability of at least one of the first analysis result or the second analysis result; and estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability.

Alternatively, the step of estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability may include, when the first expected behavior and the second expected behavior match, estimating the matched expected behavior as the person's behavior.

Alternatively, the step of estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability may include, when the first expected behavior and the second expected behavior do not match and the reliability of the second analysis result is less than a threshold, estimating the first expected behavior as the person's behavior.

Alternatively, the step of estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability may include, when the first expected behavior and the second expected behavior do not match and the reliability of the second analysis result is equal to or greater than the threshold, estimating the person's behavior based on a judgment condition derived by combining the first analysis result and the second analysis result.
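The three combination cases described above (views agree; views disagree and the second result is unreliable; views disagree and the second result is reliable) reduce to a short decision function. This is a sketch under assumptions: `ViewAnalysis`, `estimate_behavior`, the [0, 1] reliability scale, and the example joint condition are hypothetical, since the text does not define how reliability is scored or how the combined judgment condition is derived.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ViewAnalysis:
    expected_behavior: str  # behavior predicted from one viewing direction
    reliability: float      # confidence score; a [0, 1] scale is assumed


def estimate_behavior(first: ViewAnalysis, second: ViewAnalysis, threshold: float,
                      joint_condition: Callable[[ViewAnalysis, ViewAnalysis], str]) -> str:
    # Case 1: both views agree, so adopt the shared expected behavior.
    if first.expected_behavior == second.expected_behavior:
        return first.expected_behavior
    # Case 2: views disagree and the second result is below the threshold: trust the first view.
    if second.reliability < threshold:
        return first.expected_behavior
    # Case 3: views disagree and the second result is reliable: defer to a judgment
    # condition derived from both results.
    return joint_condition(first, second)


def flag_for_review(a: ViewAnalysis, b: ViewAnalysis) -> str:
    """Hypothetical joint condition: a confident disagreement is flagged as abnormal."""
    return "abnormal"


front = ViewAnalysis("looking_at_screen", 0.9)
print(estimate_behavior(front, ViewAnalysis("looking_at_screen", 0.4), 0.6, flag_for_review))
print(estimate_behavior(front, ViewAnalysis("looking_down", 0.4), 0.6, flag_for_review))
print(estimate_behavior(front, ViewAnalysis("looking_down", 0.8), 0.6, flag_for_review))
```

Only the second result's reliability is consulted, mirroring the claims; the first (frontal) view acts as the default whenever the side view is not trustworthy.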

대안적으로, 상기 제 1 방향은, 상기 얼굴의 정면 방향이고, 상기 제 2 방향은, 상기 얼굴의 측면 방향일 수 있다.Alternatively, the first direction may be a frontal direction of the face, and the second direction may be a side direction of the face.

According to an embodiment of the present disclosure for achieving the above object, a computer program stored in a computer-readable storage medium is disclosed. When executed on one or more processors, the computer program causes them to perform operations for monitoring behavior based on artificial intelligence. The operations may include: synchronizing first image data captured in a first direction relative to the face of a person subject to behavior monitoring with second image data captured in a second direction different from the first direction; generating, using a pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets, based on each of the synchronized first image data and second image data; and estimating the person's behavior by combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data.

According to an embodiment of the present disclosure for achieving the above object, a computing device for monitoring behavior based on artificial intelligence is disclosed. The device may include: a processor including at least one core; a memory containing program code executable on the processor; and a network unit for acquiring image data. The processor may synchronize first image data captured in a first direction relative to the face of a person subject to behavior monitoring with second image data captured in a second direction different from the first direction; generate, using a pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets, based on each of the synchronized first image data and second image data; and estimate the person's behavior by combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data.

The present disclosure can provide a method and device that can comprehensively determine, and accurately monitor, what actions a person takes within a specific environment based on various detection results.

FIG. 1 is a block diagram of a computing device according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating a process by which the computing device performs behavior monitoring according to an embodiment of the present disclosure.
FIG. 3 is a conceptual diagram detailing the per-behavior estimation process of the computing device according to an embodiment of the present disclosure.
FIG. 4 is a conceptual diagram illustrating the computation for feature correction in a neural network model according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating an artificial intelligence-based behavior monitoring method according to an embodiment of the present disclosure.
FIG. 6 is a flowchart illustrating a method for monitoring behavior in an online exam environment according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present disclosure pertains (hereinafter, "a person skilled in the art") can readily implement them. The embodiments presented in this disclosure are provided to enable a person skilled in the art to use or practice its subject matter. Accordingly, various modifications to these embodiments will be apparent to a person skilled in the art. That is, the present disclosure can be implemented in various different forms and is not limited to the following embodiments.

Throughout the specification of this disclosure, the same or similar reference numerals refer to the same or similar elements. In addition, to describe the present disclosure clearly, reference numerals for parts unrelated to the description may be omitted from the drawings.

As used in this disclosure, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified in the present disclosure or clear from context, "X uses A or B" should be understood to mean any of the natural inclusive permutations. For example, unless otherwise specified or clear from context, "X uses A or B" may be interpreted as any one of the following: X uses A; X uses B; or X uses both A and B.

The term "and/or" as used in this disclosure should be understood to refer to and include all possible combinations of one or more of the listed related items.

The terms "comprises" and/or "comprising" as used in this disclosure should be understood to mean that the stated features and/or components are present. However, these terms should not be understood to exclude the presence or addition of one or more other features, components, and/or combinations thereof.

Unless otherwise specified in this disclosure or unless the context clearly indicates a singular form, the singular should generally be construed to mean "one or more."

The term "Nth" (where N is a natural number) as used in this disclosure can be understood as an expression used to distinguish the components of the present disclosure from one another according to a given criterion, such as function, structure, or convenience of description. For example, components that perform different functional roles may be distinguished as a first component and a second component. Components that are substantially the same within the technical idea of the present disclosure but need to be distinguished for convenience of description may also be distinguished as a first component and a second component.

The term "acquisition" as used in this disclosure can be understood to refer not only to receiving data from an external device or system over a wireless communication network, but also to generating or receiving data on-device.

The term "module" or "unit" as used in this disclosure can be understood as referring to an independent functional unit that processes computing resources, such as a computer-related entity, firmware, software or a part thereof, hardware or a part thereof, or a combination of software and hardware. A "module" or "unit" may consist of a single element or may be a combination or set of multiple elements. For example, in a narrow sense, a "module" or "unit" may refer to a hardware element of a computing device or a set of such elements, an application program that performs a specific software function, a procedure implemented by executing software, or a set of instructions for executing a program. In a broad sense, a "module" or "unit" may refer to the computing device constituting the system itself, or to an application running on the computing device. Since the above is only an example, the concept of a "module" or "unit" may be defined in various ways within a scope understandable to a person skilled in the art based on the present disclosure.

The term "model" as used in this disclosure can be understood as a system implemented using mathematical concepts and language to solve a specific problem, a set of software units for solving a specific problem, or an abstract model of a process for solving a specific problem. For example, a deep learning "model" may refer to an entire system implemented as a neural network that acquires problem-solving ability through training. Such a neural network acquires this ability by optimizing, through training, the parameters connecting its nodes or neurons. A deep learning "model" may include a single neural network or a set of multiple neural networks combined together.

The term "image" as used in this disclosure may refer to multidimensional data composed of discrete image elements. In other words, an "image" can be understood as a digital representation of an object visible to the human eye. For example, an "image" may refer to multidimensional data composed of elements corresponding to pixels in a two-dimensional image, or to multidimensional data composed of elements corresponding to voxels in a three-dimensional image.

The explanations of the terms above are intended to aid understanding of the present disclosure. Accordingly, unless a term is explicitly stated as limiting the present disclosure, these explanations should not be read as limiting the technical idea of the present disclosure.

FIG. 1 is a block diagram of a computing device according to an embodiment of the present disclosure.

The computing device 100 according to an embodiment of the present disclosure may be a hardware device, or part of a hardware device, that performs comprehensive data processing and computation, or it may be a software-based computing environment connected to a communication network. For example, the computing device 100 may be a server that performs intensive data processing and shares resources, or a client that shares resources by interacting with a server. The computing device 100 may also be a cloud system in which multiple servers and clients interact to process data comprehensively. Since this is only one example of the type of computing device 100, the computing device 100 may be configured in various ways within a scope understandable to a person skilled in the art based on the present disclosure.

Referring to FIG. 1, the computing device 100 according to an embodiment of the present disclosure may include a processor 110, a memory 120, and a network unit 130. However, since FIG. 1 is only an example, the computing device 100 may include other components for implementing a computing environment. Conversely, only some of the components disclosed above may be included in the computing device 100.

The processor 110 according to an embodiment of the present disclosure can be understood as a component including hardware and/or software for performing computations. For example, the processor 110 may read a computer program and perform data processing for machine learning, such as processing input data for machine learning, extracting features for machine learning, and calculating errors based on backpropagation. The processor 110 that performs such data processing may include a central processing unit (CPU), a general-purpose graphics processing unit (GPGPU), a tensor processing unit (TPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). Since these types of processor 110 are only examples, the processor 110 may be configured in various ways within a scope understandable to a person skilled in the art based on the present disclosure.

The processor 110 may preprocess a plurality of image data acquired for behavior monitoring. Even when the same object or environment is captured, the information represented in the image data may differ depending on the capture time, capture direction, and so on. Accordingly, the processor 110, which uses a plurality of image data for behavior monitoring, may perform synchronization to ensure the accuracy of composite analysis and judgment across the plurality of image data. Here, synchronization can be understood as adjusting or aligning the capture timing of a plurality of image data in which the same environment is captured, with reference to the object that is the detection target. For example, unlike image data in which a person's face is captured from the front, image data in which the face is captured from the side may, depending on the person's head movement, contain no information about the face. The processor 110 may therefore perform synchronization preprocessing that aligns the capture timing of the two data streams so that the frontal image data and the side image data can be used to complement each other. Through this synchronization preprocessing, the processor 110 can stably combine the analysis results of the two image data and improve the accuracy of the behavior estimation results used for behavior monitoring.
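One simple way to align the capture timing of two streams is nearest-timestamp matching. This is a swapped-in, commonly used technique rather than the synchronization method of the disclosure, which aligns timing with reference to the detection target and whose details are not given here. The sketch assumes each stream provides sorted capture timestamps in seconds; the function name `synchronize` and the 50 ms tolerance `max_gap` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def synchronize(first_ts: Sequence[float], second_ts: Sequence[float],
                max_gap: float = 0.05) -> List[Tuple[int, int]]:
    """Pair each first-stream frame with the nearest-in-time second-stream frame.

    first_ts, second_ts: capture timestamps in seconds, each sorted ascending.
    max_gap: largest time difference (s) treated as "the same moment" (assumed value).
    Returns (first_index, second_index) pairs; unmatched frames are dropped.
    """
    pairs: List[Tuple[int, int]] = []
    for i, t in enumerate(first_ts):
        j = bisect_left(second_ts, t)
        best: Optional[int] = None
        # The nearest frame is one of the neighbours around the insertion point.
        for k in (j - 1, j):
            if 0 <= k < len(second_ts):
                if best is None or abs(second_ts[k] - t) < abs(second_ts[best] - t):
                    best = k
        if best is not None and abs(second_ts[best] - t) <= max_gap:
            pairs.append((i, best))
    return pairs


front_ts = [0.00, 0.033, 0.067, 0.100]   # frontal camera timestamps
side_ts = [0.010, 0.050, 0.090, 0.300]   # side camera timestamps
print(synchronize(front_ts, side_ts))    # [(0, 0), (1, 1), (2, 1), (3, 2)]
```

Dropping frames with no partner within `max_gap` avoids combining analyses of moments that were not actually observed together.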

The processor 110 may use a pre-trained neural network model to generate an analysis result for each of a plurality of detection targets based on image data in which the person subject to behavior monitoring is captured. Here, a detection target can be understood as a component of the image data that serves as a basis for estimating the person's behavior, and the analysis result for a detection target may be information indicating what action the person is taking with reference to that detection target in the image data. Specifically, a detection target may be any one of a body part of the person, an object other than the person, a sound of an object associated with the person's behavior, or a time of an object associated with the person's behavior. An object associated with the person's behavior may be a body part of the person, or a thing that can change under the influence of the person's behavior. The analysis result for a detection target may thus be information about the action the person performed, determined with reference to a body part, a thing, the sound of an object, or the time of an object present in the image data. That is, the processor 110 may input the image data into the pre-trained neural network model and generate the analysis result for each detection target present in the image data as base data for detecting a specific action that a person performs in a specific environment subject to behavior monitoring.

The processor 110 may estimate a specific behavior of the person for behavior monitoring based on the analysis results that the above-described neural network model produces for each of the plurality of image data. The processor 110 may combine the neural network model's analysis results for each of the synchronized image data to interpret what action the person subject to behavior monitoring has taken. Here, the processor 110 may use a ruleset to estimate the person's behavior from the combined analysis results. A ruleset may be a set of behavior classes that are detection candidates in a specific environment for behavior monitoring, together with judgment conditions for each behavior class. The ruleset can be created, changed, or modified by the administrator who set up the specific environment for behavior monitoring. In other words, based on a ruleset that can be customized to the specific environment for behavior monitoring, the processor 110 may comprehensively judge the analysis results for each detection target present in the image data and estimate the behavior of the person in the image data. For example, assuming the environment for behavior monitoring is an online test, the ruleset created by the test proctor's client may be a set of cheating behaviors and/or abnormal behaviors (behaviors that fall short of cheating but may be suspected as cheating), together with judgment conditions for each cheating and/or abnormal behavior. The processor 110 may combine the neural network model's analysis results for the frontal image data and for the side image data, and identify a combination result that matches a judgment condition included in the ruleset. Here, combining the analysis results can be understood as a computation that compares the behavior expected from the analysis result of the frontal image data with the behavior expected from the analysis result of the side image data and, based on that comparison, merges the analysis results into a single basis behavior. Based on the combination result that matches a judgment condition, the processor 110 may then determine whether, in the situation captured in the observation data, the test taker has performed a cheating or abnormal behavior included in the ruleset.
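A ruleset as described above can be modeled as a mapping from behavior class to judgment conditions. The sketch below is illustrative only: the `Ruleset` class, the detection-item keys (`gaze_in_display`, `speaking`), and the example conditions are hypothetical, chosen to mirror the online-test example rather than taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A combined analysis result: detection item name -> observed value.
Result = Dict[str, object]
# A judgment condition is a predicate over a combined analysis result.
Condition = Callable[[Result], bool]


@dataclass
class Ruleset:
    """Hypothetical ruleset: behavior class -> list of judgment conditions."""
    rules: Dict[str, List[Condition]] = field(default_factory=dict)

    def add_rule(self, behavior_class: str, condition: Condition) -> None:
        # An administrator customizes the ruleset per monitoring environment.
        self.rules.setdefault(behavior_class, []).append(condition)

    def match(self, result: Result) -> List[str]:
        """Return every behavior class with at least one satisfied condition."""
        return [cls for cls, conds in self.rules.items()
                if any(cond(result) for cond in conds)]


# Usage: a hypothetical online-test ruleset.
exam = Ruleset()
exam.add_rule("cheating", lambda r: not r.get("gaze_in_display", True)
              and bool(r.get("speaking", False)))
exam.add_rule("abnormal", lambda r: not r.get("gaze_in_display", True))
```

Keeping conditions as plain predicates lets a proctor add or replace rules without retraining either neural network model.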

In this way, the processor 110 derives information about a basis behavior, used to determine what action the person takes, from the behaviors expected from each of the plurality of image data, and can monitor specific behaviors of the person based on a ruleset created for the specific environment. Accordingly, the processor 110 can perform detection and analysis that is more precise, accurate, and stable than detection and analysis based on a single stream of image data.

The memory 120 according to an embodiment of the present disclosure may be understood as a component including hardware and/or software for storing and managing data processed in the computing device 100. That is, the memory 120 can store any type of data generated or determined by the processor 110 and any type of data received by the network unit 130. For example, the memory 120 may include at least one type of storage medium among flash memory, hard disk, multimedia card micro, card-type memory, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, and optical disk. The memory 120 may also include a database system that controls and manages data according to a predetermined scheme. The memory types listed above are only examples, and the memory 120 may be configured in various ways within a range understandable to those skilled in the art based on the present disclosure.

The memory 120 can structure, organize, and manage the data the processor 110 needs to perform operations, combinations of such data, and program code executable by the processor 110. For example, the memory 120 may store image data acquired through the network unit 130, described later. The memory 120 may store program code that causes the processor 110 to train a deep learning model, program code that causes the processor 110 to estimate human behavior using the trained deep learning model, and the various data produced as that program code is executed.

The network unit 130 according to an embodiment of the present disclosure may be understood as a component that transmits and receives data through any type of known wired or wireless communication system. For example, the network unit 130 may transmit and receive data using a wired or wireless communication system such as a local area network (LAN), wideband code division multiple access (WCDMA), long term evolution (LTE), wireless broadband internet (WiBro), 5th generation mobile communication (5G), ultra-wideband (UWB) wireless communication, ZigBee, radio frequency (RF) communication, wireless LAN, wireless fidelity (Wi-Fi), near field communication (NFC), or Bluetooth. These communication systems are only examples, and the wired or wireless communication systems the network unit 130 uses may vary beyond them.

The network unit 130 may receive the data the processor 110 needs to perform operations through wired or wireless communication with any system, server, client, or the like. The network unit 130 may also transmit data generated by the operations of the processor 110 through wired or wireless communication with any system, server, client, or the like. For example, the network unit 130 may receive image data of a person subject to behavior monitoring through wired or wireless communication with a sensing device such as a camera, or with a client equipped with a sensing device. The network unit 130 may also receive user input through a user interface implemented on the sensing device or on the client equipped with the sensing device. Through wired or wireless communication with the sensing device or the client, the network unit 130 may transmit the various data that the processor 110 generates from the image data.

FIG. 2 is a block diagram illustrating a process in which a computing device performs behavior monitoring according to an embodiment of the present disclosure. FIG. 3 is a conceptual diagram detailing the per-behavior estimation process of the computing device according to an embodiment of the present disclosure.

Referring to FIG. 2, the computing device 100 according to an embodiment of the present disclosure may perform preprocessing on a plurality of image data of a person subject to behavior monitoring. Here, the image data may include first image data 11 captured in a first direction relative to the face of the person subject to behavior monitoring, and second image data 13 captured in a second direction different from the first direction. That is, the computing device 100 may synchronize the first image data 11, captured in the first direction relative to the face, with the second image data 13, captured in the second direction.

For example, the first image data 11 and the second image data 13 may be images captured by cameras installed in a space used for an online test. A sensing device such as a camera may be a component of the client owned by the test taker. The sensing devices may be installed in the test space so as to capture the test taker's face separately from the front and from the side. When the online test starts, the sensing devices included in the test taker's client may generate first image data 11, in which the test taker's face is captured from the front, and second image data 13, in which the test taker's face is captured from the side. The computing device 100 may acquire the first image data 11 and the second image data 13 generated by the test taker's client through wired or wireless communication with that client. The computing device 100 may then synchronize the first image data 11 and the second image data 13 so that both can be used in the analysis by the neural network model 200, described later. Because the second image data 13 is captured from the person's side, the person's face may not be properly represented depending on the person's movement or the capture angle. Therefore, to analyze the face, one of the body parts of the person being detected, accurately and stably, the computing device 100 may synchronize the first image data 11 and the second image data 13.

The computing device 100 may input the synchronized first image data and the synchronized second image data into the neural network model 200. Here, the neural network model 200 may include a first model 210 that tracks the person's gaze based on image data captured in the first direction, and a second model 220 that estimates the person's pose based on image data captured in the second direction. That is, the computing device 100 may input the synchronized first image data into the first model 210, which is optimized for the gaze tracking task, and input the synchronized second image data into the second model 220, which is optimized for the pose estimation task.

For example, the first model 210 may receive image data in which the person's face is captured from the front and track the gaze of the person in the image data. Specifically, the first model 210 may extract the person's face region from the image data to generate a cropped image. The first model 210 may extract features from the cropped image to recognize the person's eyes, and may then track the person's gaze by analyzing the movements and changes of the pupils in the recognized eyes. For this gaze tracking, the first model 210 may include a neural network optimized for image processing. The first model 210 may be trained with supervised learning as well as with semi-supervised, unsupervised, or self-supervised learning.
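The first model 210 is described as a neural network, and the disclosure gives no formula for gaze. Purely to make the crop-then-pupil steps concrete, the sketch below swaps in a simple geometric heuristic: it crops a face box out of a frame and classifies horizontal gaze from the pupil's position between the two eye corners. The function names, the box format `(x0, y0, x1, y1)`, and the `margin` value are all assumptions.

```python
import numpy as np


def crop_face(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Crop an (H, W, C) frame to a face box (x0, y0, x1, y1), clamped to the frame."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    return frame[y0:y1, x0:x1]


def horizontal_gaze_ratio(pupil_x: float, left_corner_x: float,
                          right_corner_x: float) -> float:
    """Pupil position between the eye corners: 0.0 at the left, 1.0 at the right (image coords)."""
    span = right_corner_x - left_corner_x
    if span == 0:
        raise ValueError("eye corners coincide")
    return (pupil_x - left_corner_x) / span


def gaze_direction(ratio: float, margin: float = 0.15) -> str:
    """Coarse image-space gaze label; a ratio near 0.5 means a centred pupil."""
    if ratio < 0.5 - margin:
        return "left"
    if ratio > 0.5 + margin:
        return "right"
    return "center"
```

In a real system the eye-corner and pupil coordinates would come from the trained model's landmark output rather than being supplied by hand.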

The second model 220 may receive image data in which the person is captured from the side relative to the face, and detect the pose taken by the person in the image data. Specifically, the second model 220 may receive the image data and separate body parts from the background based on a plurality of feature points for identifying the person's pose, generating a mask for the body parts. The second model 220 may then estimate what pose the person is taking by analyzing the body-part mask. For this pose estimation, the second model 220 may include a neural network optimized for image processing. The second model 220 may be trained with supervised learning as well as with semi-supervised, unsupervised, or self-supervised learning.
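The second model 220 learns its body-part mask, and the disclosure does not describe that network. The sketch below is a hand-written stand-in, not the model: it builds a boolean mask from feature points by marking pixels within a hypothetical `radius` of each point, and shows one pose test that could be read off the points. The feature-point names (`right_wrist`, `right_shoulder`) are assumptions.

```python
import numpy as np


def keypoint_mask(shape: tuple, keypoints: dict, radius: float = 5.0) -> np.ndarray:
    """Boolean (H, W) mask that is True within `radius` pixels of any feature point.

    `keypoints` maps a body-part name to (x, y) pixel coordinates.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    mask = np.zeros((h, w), dtype=bool)
    for x, y in keypoints.values():
        mask |= (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    return mask


def wrist_above_shoulder(keypoints: dict, side: str = "right") -> bool:
    """True if the wrist is higher in the image than the shoulder (image y grows downward)."""
    return keypoints[f"{side}_wrist"][1] < keypoints[f"{side}_shoulder"][1]
```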

The computing device 100 may use the neural network model 200, which receives the synchronized image data, to generate an analysis result for at least one detection item included in each of the plurality of detection targets. The computing device 100 may generate per-detection-item analysis results through the first model 210, which receives the synchronized first image data, and through the second model 220, which receives the synchronized second image data. The detection items of the first model 210 and those of the second model 220 may be the same or different.

For example, if the detection target is a human body part, its subclasses may include the face, the arms, and so on. The face may be further divided into eyes, nose, mouth, and ears, and the arm into hand, palm, and fingers. A detection item is state information that can be detected for each such subclass of the detection target, such as gaze direction, whether the person is speaking, hand position, or palm direction. In other words, a detection item represents a specific state or appearance that arises when a subclass of the detection target moves or changes with the person's actions.
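The target, subclass, and detection-item hierarchy above can be written down as nested mappings. This is a hypothetical layout that mirrors the body-part example in the text; the key names are illustrative, and subclasses for which the text names no detection item are left with empty lists.

```python
# Hypothetical taxonomy following the example in the text:
# detection target -> subclass -> part -> detection items observable on it.
TAXONOMY = {
    "body_part": {
        "face": {"eyes": ["gaze_direction"], "nose": [], "mouth": ["speaking"], "ears": []},
        "arm": {"hand": ["hand_position"], "palm": ["palm_direction"], "fingers": []},
    },
}


def items_for(target: str) -> list:
    """Flatten every detection item reachable under one detection target."""
    items = []
    for subclass in TAXONOMY.get(target, {}).values():
        for part_items in subclass.values():
            items.extend(part_items)
    return items
```

Each model could then report one value per item returned by `items_for`, which is how the two analysis results end up keyed by the same detection items.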

In other words, the computing device 100 may input the synchronized first image data into the pre-trained first model 210 to generate a first analysis result 15 including analysis results for various detection items. The computing device 100 may also input the synchronized second image data into the pre-trained second model 220 to generate a second analysis result 17 including analysis results for various detection items. Each of the first analysis result 15 and the second analysis result 17 may include an analysis result for each detection item, such as gaze direction, whether the person is speaking, hand position, and palm direction.

For example, assuming the environment for behavior monitoring is an online test, the analysis result for gaze direction may indicate whether the test taker is detected as looking within the display on which the test paper is shown. The analysis result for speaking may indicate whether a change in the shape of the test taker's mouth has been detected. The analysis result for hand position may indicate whether the test taker's left or right hand is detected as moving within a reference space determined by the placement of the test taker's body and desk. Analysis results for these detection items may be included in both the first analysis result 15 and the second analysis result 17. In this way, the computing device 100 may use the first model 210 and the second model 220 to separately generate analysis results for at least one detection item included in each of the plurality of detection targets. Through this process, the computing device 100 can obtain, from image data captured at different angles, a variety of information to be used in the behavior estimation computation described below.
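The three online-test detection items can be expressed as simple checks on model outputs. A minimal sketch, assuming the models already supply a gaze point in display-normalized coordinates, a scalar mouth-openness value per frame, and a hand position; the box format, the `mouth_delta` threshold, and all names are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def inside(point: Point, box: Box) -> bool:
    """True if the point lies within the axis-aligned box (edges included)."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def analyze_frame(gaze_point: Point, display_box: Box,
                  mouth_open_prev: float, mouth_open_now: float,
                  hand_point: Point, reference_box: Box,
                  mouth_delta: float = 0.1) -> Dict[str, bool]:
    """Per-item results for one frame, keyed by detection item."""
    return {
        # Is the gaze point inside the display showing the test paper?
        "gaze_in_display": inside(gaze_point, display_box),
        # Did mouth openness change enough to suggest speaking?
        "speaking": abs(mouth_open_now - mouth_open_prev) > mouth_delta,
        # Is the hand inside the reference space around body and desk?
        "hand_in_reference": inside(hand_point, reference_box),
    }
```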

Referring to FIG. 2, the computing device 100 may combine the plurality of analysis results generated through the neural network model 200 to estimate the behavior of the person being monitored. To generate the behavior estimation result 19, the computing device 100 may combine the first analysis result 15 generated through the first model 210 with the second analysis result 17 generated through the second model 220. To do so, the computing device 100 may use a first expected behavior inferred from the first analysis result 15 and a second expected behavior inferred from the second analysis result 17. The computing device 100 may also use the reliability of each of the first analysis result 15 and the second analysis result 17 when combining them.

Specifically, the computing device 100 may estimate, based on a predetermined ruleset, the first expected behavior according to the first analysis result 15 and the second expected behavior according to the second analysis result 17. The computing device 100 may determine whether the estimated first expected behavior matches the estimated second expected behavior. The computing device 100 may also analyze at least one of the first model 210 or the second model 220 to estimate at least one of the reliability of the first analysis result 15 or the reliability of the second analysis result 17. Based on whether the first and second expected behaviors match, and on the reliability of at least one of the first analysis result 15 or the second analysis result 17, the computing device 100 may combine the first analysis result 15 and the second analysis result 17 to generate basis information corresponding to the judgment conditions for estimating the behavior.
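The comparison step above can be sketched as a small record per view plus a function that reports agreement and carries both reliabilities forward. The `ViewResult` structure and the output keys are hypothetical; how reliability is actually estimated from the models is not specified in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViewResult:
    """Hypothetical per-view summary of one analysis result."""
    expected_behavior: Optional[str]  # e.g. "cheating", "abnormal", or None
    reliability: float                # assumed confidence in [0, 1]


def basis_info(front: ViewResult, side: ViewResult) -> dict:
    """Summarize whether the two expected behaviors agree, plus both reliabilities."""
    behaviors = {b for b in (front.expected_behavior, side.expected_behavior)
                 if b is not None}
    return {
        "agree": front.expected_behavior == side.expected_behavior,
        "behaviors": sorted(behaviors),
        "front_reliability": front.reliability,
        "side_reliability": side.reliability,
    }
```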

For example, referring to FIG. 3, if the first analysis result 15 for the gaze direction detection item is "the test taker is looking outside the display on which the test paper is shown," the computing device 100 may estimate, based on the predetermined ruleset, the first expected behavior from the first analysis result 15 as cheating. That is, the computing device 100 may confirm that the first analysis result 15 matches a cheating judgment condition included in the predetermined ruleset and estimate the first expected behavior as cheating. If the second analysis result 17 for the body part detection item is "the lower part of the test taker's elbow has left the reference area," the computing device 100 may estimate, based on the predetermined ruleset, the second expected behavior from the second analysis result 17 as abnormal behavior. That is, the computing device 100 may confirm that the second analysis result 17 matches an abnormal behavior judgment condition included in the predetermined ruleset and estimate the second expected behavior as abnormal behavior. When the first expected behavior and the second expected behavior disagree in this way, the computing device 100 may combine the first analysis result 15 and the second analysis result 17 into "the lower part of the test taker's elbow has left the reference area, and the test taker is looking outside the display." The outcome of this combination may vary with the reliability of the first analysis result 15 and the second analysis result 17. For example, if the reliability of the second analysis result 17 is below a threshold, the computing device 100 may take the first analysis result alone, "the test taker is looking outside the display," as the combination result. If the reliability of the second analysis result 17 is at or above the threshold, the computing device 100 may derive the combination result described above, "the lower part of the test taker's elbow has left the reference area, and the test taker is looking outside the display."

However, so that the first analysis result 15 carries more weight in the final behavior estimate, the computing device 100 may also derive the accuracy of the combination result from a weighted sum of the accuracy of the first analysis result 15 and the accuracy of the second analysis result 17. Likewise, the computing device 100 may derive the correlation between the combination result and cheating or abnormal behavior from a weighted sum of the correlation between the first analysis result 15 and cheating or abnormal behavior and the correlation between the second analysis result 17 and cheating or abnormal behavior. Here, correlation can be understood as a quantitative indicator of the extent to which a given analysis result influences the judgment of cheating or abnormal behavior. In other words, while the content of the combination result remains "the lower part of the test taker's elbow has left the reference area, and the test taker is looking outside the display," the results of these weighted sums may be reflected in the accuracy or correlation of the combination result. The accuracy or correlation of the combination result may then be reflected in the computation by which the computing device 100 makes its final behavior estimate. In this way, the computing device 100 can increase the accuracy of behavior estimation by complementarily combining the analysis results of data captured from different directions (or angles) and using them for behavior monitoring.
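The threshold rule and the weighted sum described above can be sketched in two small functions. The `threshold` of 0.5 and the front-view weight of 0.7 are hypothetical values chosen only to favor the first analysis result, as the text requires; the same `weighted` function applies to per-view accuracy or to per-view correlation.

```python
def combine(front_desc: str, side_desc: str, side_reliability: float,
            threshold: float = 0.5) -> str:
    """Content of the combination result under the reliability-threshold rule."""
    # Below the threshold, the side-view finding is dropped entirely.
    if side_reliability < threshold:
        return front_desc
    return f"{side_desc}; {front_desc}"


def weighted(front_value: float, side_value: float, front_weight: float = 0.7) -> float:
    """Weighted sum of a per-view score (accuracy or correlation), favoring the front view."""
    if not 0.0 <= front_weight <= 1.0:
        raise ValueError("front_weight must be in [0, 1]")
    return front_weight * front_value + (1.0 - front_weight) * side_value
```

For example, front and side accuracies of 0.9 and 0.6 give a combined accuracy of 0.7 * 0.9 + 0.3 * 0.6 = 0.81.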

To derive the behavior estimation result 19, the computing device 100 may use a ruleset predetermined for the specific behavior-monitoring environment. Specifically, the computing device 100 may identify the per-behavior-class judgment condition in the predetermined ruleset that matches the combination of the first analysis result 15 and the second analysis result 17. When the first analysis result 15 and the second analysis result 17 have been combined through a weighted sum, the computing device 100 may filter the per-behavior-class judgment conditions in the predetermined ruleset according to the accuracy or correlation of the combination result, and derive the behavior estimation result 19 corresponding to a behavior class. The computing device 100 may then take the behavior class corresponding to the identified judgment condition as the estimated behavior of the person that is to be detected for behavior monitoring in the specific environment.

For example, assuming the environment for behavior monitoring is an online test, the computing device 100 may screen the predetermined ruleset to identify the judgment condition of a first behavior class related to cheating, or of a second behavior class related to abnormal behavior, that matches the combination of the first analysis result 15 and the second analysis result 17. The computing device 100 may derive the first behavior class (cheating) or the second behavior class (abnormal behavior) corresponding to the identified judgment condition as the behavior estimation result 19.
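Screening with accuracy-based filtering can be sketched as follows. The per-class minimum accuracies (`min_accuracy`) are an assumed way of "reflecting" the combined accuracy in the ruleset, and dictionary order is used as class priority; neither detail comes from the disclosure.

```python
from typing import Callable, Dict, Optional

Condition = Callable[[dict], bool]


def screen(ruleset: Dict[str, Condition], combined: dict, accuracy: float,
           min_accuracy: Dict[str, float]) -> Optional[str]:
    """Return the first behavior class (in ruleset order) whose accuracy
    requirement is met and whose judgment condition matches; None otherwise."""
    for behavior_class, condition in ruleset.items():
        if accuracy < min_accuracy.get(behavior_class, 0.0):
            continue  # filtered out: combined result not accurate enough for this class
        if condition(combined):
            return behavior_class
    return None


# Usage with a hypothetical online-test ruleset.
rules = {
    "cheating": lambda r: not r["gaze_in_display"] and r["speaking"],
    "abnormal": lambda r: not r["gaze_in_display"],
}
min_acc = {"cheating": 0.8, "abnormal": 0.5}
combined = {"gaze_in_display": False, "speaking": True}
```

With these values, a combined accuracy of 0.81 yields "cheating", 0.6 falls back to "abnormal", and 0.4 yields no behavior class.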

FIG. 4 is a conceptual diagram showing a computational process for correcting the features of the neural network model according to an embodiment of the present disclosure.

Referring to FIG. 4, the computing device 100 according to an embodiment of the present disclosure may input the synchronized first image data 21 into the first model to extract feature points for each of the person's face and body. The computing device 100 may likewise input the synchronized second image data 25 into the second model to extract feature points for each of the person's face and body. The face feature points and the body feature points extracted from the synchronized second image data 25 may deviate from each other by a predetermined angle or more. In that case, the computing device 100 may correct the feature points extracted from the synchronized second image data 25 with reference to the feature points extracted from the synchronized first image data 21.

For example, assume that the synchronized first image data 21 is captured from the front of the person and the synchronized second image data 25 is captured from the side. Face capture in the first image data 21 is little affected by the person's movement, whereas face capture in the second image data 25 is strongly affected by it. When the person moves, for example when the face rotates 45 degrees or more relative to the body, the second model cannot properly extract the face feature points from the second image data 25, and its analysis becomes unstable. Therefore, when the face feature points and the body feature points extracted from the second image data 25 deviate by 45 degrees or more, the computing device 100 may adjust the coordinates of the second-view feature points. The reference is the coordinate system 30 of the feature points extracted from the first image data 21, which is less affected by the person's movement. The computing device 100 may then generate the output of the second model using the coordinate-adjusted feature points of the second image data 25. That is, the second model can generate the analysis result for each detection item using feature points corrected on the basis of the feature points extracted by the first model. Through this correction, the computing device 100 can keep the data analysis stable.
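One way to picture this correction is the Python sketch below. The disclosure does not specify how the angle is measured or how the coordinates are adjusted, so every detail here is an assumption:

- The face/body offset is taken as the angle between the eye line and the shoulder line.
- The keypoint names are hypothetical.
- The adjustment is a least-squares 2D similarity transform (rotation, uniform scale, translation). It is fitted on body anchor points present in both views and then applied to all side-view points.

A 2D similarity cannot capture the true 3D relationship between a frontal camera and a lateral camera. The sketch only illustrates re-expressing side-view points in the front-view coordinate system.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Keypoints = Dict[str, Point]  # hypothetical keypoint name -> (x, y) in pixels


def _direction_deg(p: Point, q: Point) -> float:
    """Direction of the vector p -> q in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def face_body_offset_deg(kp: Keypoints) -> float:
    """Angle in [0, 180] between the eye line (face) and the shoulder line (body)."""
    face = _direction_deg(kp["left_eye"], kp["right_eye"])
    body = _direction_deg(kp["left_shoulder"], kp["right_shoulder"])
    diff = abs(face - body) % 360.0
    return min(diff, 360.0 - diff)


def _fit_similarity(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares 2D similarity dst ~ a * src + b, treating (x, y) as x + iy."""
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    zm, wm = sum(zs) / len(zs), sum(ws) / len(ws)
    den = sum(abs(z - zm) ** 2 for z in zs)
    if den == 0:
        raise ValueError("anchor points must not all coincide")
    a = sum((z - zm).conjugate() * (w - wm) for z, w in zip(zs, ws)) / den
    return a, wm - a * zm


def correct_side_keypoints(side: Keypoints, front: Keypoints,
                           anchors: Sequence[str] = ("left_shoulder", "right_shoulder"),
                           max_offset_deg: float = 45.0) -> Keypoints:
    """Re-express side-view keypoints in the front-view frame when face and body disagree."""
    if face_body_offset_deg(side) < max_offset_deg:
        return dict(side)  # face and body consistent: keep side-view points as-is
    a, b = _fit_similarity([side[k] for k in anchors], [front[k] for k in anchors])
    corrected: Keypoints = {}
    for name, (x, y) in side.items():
        w = a * complex(x, y) + b
        corrected[name] = (w.real, w.imag)
    return corrected
```

The 45-degree default follows the example in the text; `max_offset_deg` stands in for the "predetermined angle".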

FIG. 5 is a flowchart showing an artificial intelligence-based behavior monitoring method according to an embodiment of the present disclosure.

Referring to FIG. 5, the computing device 100 according to an embodiment of the present disclosure may synchronize two streams (S110). The first is first image data captured in a first direction with respect to the face of the person subject to behavior monitoring. The second is second image data captured in a second direction different from the first direction. For example, the computing device 100 may synchronize first image data generated by a camera installed in front of the person's face with second image data generated by a camera installed at the side of the person's face. Here, synchronizing the first and second image data can be understood as aligning them so that the times at which the person's face was captured coincide. Through synchronization, the computing device 100 can effectively use the first and second image data together as input for behavior estimation.
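Below is a minimal sketch of S110. It assumes that each camera delivers frames with timestamps from a shared clock; the disclosure does not say how capture times are obtained. Each front frame is paired with the side frame nearest in time. Pairs farther apart than a hypothetical `tolerance` are dropped.

```python
import bisect
from typing import List, Sequence, Tuple


def synchronize(front_ts: Sequence[float], side_ts: Sequence[float],
                tolerance: float) -> List[Tuple[int, int]]:
    """Pair front-frame indices with the nearest-in-time side-frame indices.

    side_ts must be sorted ascending; pairs further apart than `tolerance` are dropped.
    """
    pairs: List[Tuple[int, int]] = []
    for i, t in enumerate(front_ts):
        j = bisect.bisect_left(side_ts, t)
        # The nearest side frame is either just before t or at/after t.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(side_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(side_ts[k] - t))
        if abs(side_ts[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs


print(synchronize([0, 100, 200], [5, 98, 260], tolerance=20))  # [(0, 0), (1, 1)]
```

Nearest-neighbor pairing can reuse one side frame for several front frames. A stricter variant could enforce one-to-one matching.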

Using a pre-trained neural network model, the computing device 100 may generate analysis results from each of the first image data and the second image data synchronized in step S110 (S120). The analysis covers at least one detection item included in each of a plurality of detection targets. The plurality of detection targets may include a body part of the person together with at least one of the following: an object other than the person, a sound of an object associated with the person's behavior, or a time of an object associated with the person's behavior. A detection item is status information identified on the basis of a subclass of a detection target, and may be status information that can change with the person's behavior.

For example, the computing device 100 may input the first image data, synchronized with the second image data, into a first model that tracks gaze based on image data captured from the front of the face. The first model then generates an analysis result for at least one detection item included in at least one of the body part, the object, the object's sound, and the object's time. The computing device 100 may also input the second image data, synchronized with the first image data, into a second model that estimates the person's pose based on image data captured from the side of the face. The second model likewise generates an analysis result for at least one detection item included in at least one of the body part, the object, the object's sound, and the object's time. The detection targets and detection items on which the first model and the second model base their analyses may be the same or may differ.
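The detection-target and detection-item structure above, and the per-view model calls of S120, can be sketched as plain Python data types. The class and field names are hypothetical. So are the confidence score per item and the stand-in models. Real gaze-tracking and pose-estimation networks would take the place of the callables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Sequence, Tuple


class TargetKind(Enum):
    """Kinds of detection target named in the text."""
    BODY_PART = "body_part"
    OBJECT = "object"  # an object other than the person
    SOUND = "sound"    # sound of an object associated with the behavior
    TIME = "time"      # time of an object associated with the behavior


@dataclass(frozen=True)
class DetectionItem:
    kind: TargetKind
    target: str  # subclass of the detection target, e.g. "eyes" or "phone"
    state: str   # status information that can change with behavior


AnalysisResult = Dict[DetectionItem, float]  # detection item -> confidence (assumed)
Model = Callable[[Any], AnalysisResult]


def analyze(pairs: Sequence[Tuple[Any, Any]], first_model: Model,
            second_model: Model) -> List[Tuple[AnalysisResult, AnalysisResult]]:
    """S120: run each view's model on its own frame of every synchronized pair."""
    return [(first_model(front), second_model(side)) for front, side in pairs]


# Stand-ins for a gaze-tracking first model and a pose-estimating second model.
gaze_off = DetectionItem(TargetKind.BODY_PART, "eyes", "gaze_off_screen")
phone = DetectionItem(TargetKind.OBJECT, "phone", "visible")
results = analyze([("front_0", "side_0")],
                  first_model=lambda frame: {gaze_off: 0.9},
                  second_model=lambda frame: {phone: 0.4})
```

Making `DetectionItem` frozen makes it hashable, so it can serve directly as a dictionary key in an analysis result.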

The computing device 100 may estimate the person's behavior by combining the first analysis result and the second analysis result generated in step S120 (S130). Specifically, the computing device 100 may determine whether a first expected behavior according to the first analysis result matches a second expected behavior according to the second analysis result. The expected behaviors may correspond to estimated behaviors included in a rule set predetermined for the behavior monitoring environment. The computing device 100 may also estimate the reliability of at least one of the first analysis result or the second analysis result. It may then estimate the person's behavior by combining the two analysis results based on the match determination and the estimated reliability.

For example, if the first expected behavior and the second expected behavior match, the computing device 100 may estimate the matched expected behavior as the person's behavior. If the two expected behaviors do not match and the reliability of the second analysis result is less than a threshold, the computing device 100 may estimate the first expected behavior as the person's behavior. If the two expected behaviors do not match and the reliability of the second analysis result is greater than or equal to the threshold, the computing device 100 may estimate the person's behavior based on a judgment condition derived by combining the first analysis result and the second analysis result. In each case, the behavior estimated from an expected behavior or judgment condition may be determined based on the rule set predetermined for the behavior monitoring environment. The threshold may be a value set in advance by the administrator responsible for behavior monitoring.
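The three-way decision of S130 can be written as a small Python function. The sketch makes three assumptions:

- The expected behaviors are already available as labels.
- The reliability is a single number compared against the administrator's threshold.
- The rule-set lookup on the combined results is supplied by the caller as a hypothetical `rule_lookup` callable, for instance a weighted-sum screen.

```python
from typing import Callable, Mapping, Optional

Scores = Mapping[str, float]
# Hypothetical: maps the two analysis results to a behavior class via the rule set.
RuleLookup = Callable[[Scores, Scores], Optional[str]]


def estimate_behavior(first_expected: Optional[str], second_expected: Optional[str],
                      first_result: Scores, second_result: Scores,
                      second_reliability: float, threshold: float,
                      rule_lookup: RuleLookup) -> Optional[str]:
    """S130: decide the person's behavior from the two views' expected behaviors."""
    if first_expected == second_expected:
        return first_expected  # both views agree
    if second_reliability < threshold:
        return first_expected  # views disagree and the second view is not trusted
    return rule_lookup(first_result, second_result)  # disagree, second view trusted
```

Only the reliability of the second analysis result is compared against the threshold. This mirrors the text, which treats the first (frontal) view as the default.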

FIG. 6 is a flowchart showing a method for monitoring behavior in an online test environment according to an embodiment of the present disclosure.

Referring to FIG. 6, the computing device 100 according to an embodiment of the present disclosure may create an online test based on a user request entered through the online test organizer's client (S210). The environmental conditions for the online test and the rule set for monitoring test takers' behavior, among other settings, may be determined to reflect that user request. For example, based on the user request, the computing device 100 may determine the acquisition cycle of observation data. It may also determine a rule set that includes the definitions and judgment conditions for cheating or abnormal behavior. After being created from the user request, the rule set may be dynamically updated while the computing device 100 repeatedly performs behavior estimation.
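The test-creation step might be represented as follows. The sketch assumes that the organizer's request arrives as a mapping with an acquisition period and per-class judgment conditions. The request keys are hypothetical. So are the condition format (a minimum score per detection item) and the version counter used to track dynamic updates.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class OnlineTestConfig:
    """What the organizer's request determines in S210 (field names are hypothetical)."""
    acquisition_period_ms: int
    # behavior class -> {detection item: minimum score}; the condition format is assumed
    rules: Dict[str, Dict[str, float]] = field(default_factory=dict)
    rules_version: int = 0

    def update_rule(self, behavior_class: str, condition: Mapping[str, float]) -> None:
        """Add or replace one judgment condition, e.g. while estimation is running."""
        self.rules[behavior_class] = dict(condition)
        self.rules_version += 1


def create_online_test(request: Mapping[str, Any]) -> OnlineTestConfig:
    """Build the test configuration from an organizer request (hypothetical keys)."""
    config = OnlineTestConfig(acquisition_period_ms=int(request["acquisition_period_ms"]))
    for behavior_class, condition in request.get("rules", {}).items():
        config.update_rule(behavior_class, condition)
    return config
```

Routing both initial creation and later changes through `update_rule` keeps a single code path for the dynamic rule-set updates mentioned in the text.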

Once the online test is created (S210), the computing device 100 may acquire images of the test taker at a predetermined cycle (S220). For example, the computing device 100 may acquire front and side images of the test taker's face at intervals of 100 ms to 1 s. It receives them through wired or wireless communication with a sensing device installed in the test-taking space. The sensing device may be a component of the test taker's client or a component of the computing device 100. The acquisition cycle of the observation data may be predetermined in step S210 according to the environmental conditions of the online test.
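For a fixed acquisition cycle, the capture times of S220 are evenly spaced timestamps. The sketch below assumes millisecond timestamps. It also treats the 100 ms to 1 s range from the example as a validation bound. The function name and the bound check are illustrative assumptions, not requirements from the disclosure.

```python
from typing import List


def capture_schedule_ms(start_ms: int, end_ms: int, period_ms: int) -> List[int]:
    """Timestamps at which to request front and side images during [start_ms, end_ms)."""
    # The 100 ms .. 1 s bound mirrors the example interval in the text (an assumption).
    if not 100 <= period_ms <= 1000:
        raise ValueError(f"period_ms={period_ms} is outside 100..1000 ms")
    return list(range(start_ms, end_ms, period_ms))


print(capture_schedule_ms(0, 1000, 250))  # [0, 250, 500, 750]
```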

The computing device 100 may synchronize the front image and the side image (S230). The front and side images capture the same subject (i.e., the test taker). Even so, the computing device 100 may perform a preprocessing step that aligns the capture times of the two images with respect to the test taker's face, to ensure stable analysis.

The computing device 100 may input the synchronized front image 51 and the synchronized side image 52 into the neural network models to analyze each detection item (S240). The computing device 100 may input the synchronized front image 51 into the pre-trained first model 210 to generate a first analysis result 53 for the detection items assigned to the first model 210. It may input the synchronized side image 52 into the pre-trained second model 220 to generate a second analysis result 54 for the detection items assigned to the second model 220. The computing device 100 may estimate a first expected behavior 55 based on the first analysis result 53. The first expected behavior 55 can be understood as the behavior determined when cheating or abnormal behavior is predicted from the first analysis result 53. Likewise, the computing device 100 may estimate a second expected behavior 56 based on the second analysis result 54. The second expected behavior 56 can be understood as the behavior determined when cheating or abnormal behavior is predicted from the second analysis result 54.

The computing device 100 may determine whether the first expected behavior 55 and the second expected behavior 56 match (S251). If they match, the computing device 100 may determine the matched expected behavior as the test taker's behavior (S255). If they do not match, the computing device 100 may determine whether the reliability of the second analysis result 54 is greater than or equal to the threshold (S261). If the reliability of the second analysis result 54 is less than the threshold, the computing device 100 may determine the first expected behavior 55 as the test taker's behavior (S265). If the reliability of the second analysis result 54 is greater than or equal to the threshold, the computing device 100 may combine the first analysis result 53 and the second analysis result 54 (S270). It may then estimate the test taker's behavior based on the combined result (S280).

For example, if the first expected behavior 55 and the second expected behavior 56 are both cheating or both abnormal behavior, the computing device 100 may determine the matched behavior as the test taker's behavior. Suppose instead that they do not match, for example when the first expected behavior 55 is cheating and the second expected behavior 56 is abnormal behavior. The computing device 100 may then determine whether the reliability of the second analysis result 54 is greater than or equal to the threshold. If the reliability is less than the threshold, the computing device 100 may determine cheating, the first expected behavior 55, as the test taker's behavior. If the reliability is greater than or equal to the threshold, the computing device 100 may combine the first analysis result 53 and the second analysis result 54 and check whether the combined result exists in the predetermined rule set. If the combined result is found as a judgment condition included in that rule set, the computing device 100 may determine the behavior associated with the judgment condition, cheating or abnormal behavior, as the test taker's behavior.

The various embodiments of the present disclosure described above may be combined with additional embodiments. They may also be modified, within a scope understandable to those skilled in the art, in light of the above detailed description. The embodiments of the present disclosure should be understood in all respects as illustrative and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may be implemented in a combined form. Accordingly, all changes or modifications derived from the meaning and scope of the claims of the present disclosure and their equivalents should be construed as falling within the scope of the present disclosure.

Claims

An artificial intelligence-based behavior monitoring method performed by a computing device including at least one processor, comprising:
Synchronizing first image data captured in a first direction with respect to the face of a person subject to behavior monitoring, and second image data captured in a second direction different from the first direction;
Using a pre-trained neural network model, generating an analysis result for at least one detection item included in each of a plurality of detection targets based on each of the synchronized first and second image data; and
Combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data to estimate the person's behavior,
wherein the step of determining a judgment condition for estimating the person's behavior by combining the first analysis result based on the synchronized first image data and the second analysis result based on the synchronized second image data comprises:
determining whether a first expected behavior according to the first analysis result and a second expected behavior according to the second analysis result match;
estimating reliability of at least one of the first analysis result or the second analysis result; and
combining the first analysis result and the second analysis result based on the determined match and the estimated reliability to estimate the person's behavior;
Including,
method.

According to claim 1,
The detection item is
status information identified based on a subclass of the detection target,
wherein the status information
can change depending on the person's actions,
method.

According to claim 1,
The plurality of detection targets are,
In addition to the body part of the person, it includes at least one of an object other than the person, a sound of an object associated with the action of the person, or a time of an object associated with the action of the person.
method.

According to claim 1,
The neural network model is,
a first model that tracks a person's gaze based on image data captured in the first direction; and
a second model that estimates a pose of a person based on image data captured in the second direction;
Including,
method.

According to claim 4,
The step of generating, using the pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets based on each of the synchronized first and second image data comprises,
when the face feature points and the body feature points of the person, extracted by inputting the synchronized second image data into the second model, deviate from each other by a predetermined angle or more,
Inputting the synchronized first image data into the first model and correcting feature points extracted from the synchronized second image data based on a coordinate system of the extracted feature points;
Including,
method.

delete

According to claim 1,
The step of estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability includes,
If the first expected behavior and the second expected behavior match,
estimating the matched expected behavior as the person's behavior;
Including,
method.

According to claim 1,
The step of estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability includes,
When the first expected behavior and the second expected behavior are inconsistent, and the reliability of the second analysis result is less than a threshold value,
estimating the first expected behavior as the person's behavior;
Including,
method.

According to claim 1,
The step of estimating the person's behavior by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability includes,
When the first expected behavior and the second expected behavior are inconsistent, and the reliability of the second analysis result is greater than or equal to a threshold value,
estimating the person's behavior based on judgment conditions derived by combining the first analysis result and the second analysis result;
Including,
method.

According to claim 1,
The first direction is
a frontal direction of the face, and
the second direction is
a lateral direction of the face,
method.

A computer program stored in a computer-readable storage medium, wherein the computer program, when executed on one or more processors, performs operations for monitoring behavior based on artificial intelligence,
The above operations are:
An operation of synchronizing first image data captured in a first direction with respect to the face of a person subject to behavior monitoring, and second image data captured in a second direction different from the first direction;
An operation of generating, using a pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets based on each of the synchronized first image data and the second image data; and
An operation of estimating the person's behavior by combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data,
The operation of determining a judgment condition for estimating the person's behavior by combining a first analysis result based on the synchronized first image data and a second analysis result based on the synchronized second image data includes:
determining whether a first expected behavior according to the first analysis result matches a second expected behavior according to the second analysis result, estimating the reliability of at least one of the first analysis result or the second analysis result, and estimating the person's behavior based on a judgment condition derived by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability,
computer program.

A computing device for monitoring behavior based on artificial intelligence,
A processor including at least one core;
a memory containing program codes executable on the processor; and
A network unit for acquiring image data;
Including,
The processor,
synchronizes first image data captured in a first direction with respect to the face of a person subject to behavior monitoring and second image data captured in a second direction different from the first direction,
generates, using a pre-trained neural network model, an analysis result for at least one detection item included in each of a plurality of detection targets based on each of the synchronized first and second image data, and
determines whether a first expected behavior according to a first analysis result based on the synchronized first image data matches a second expected behavior according to a second analysis result based on the synchronized second image data, estimates the reliability of at least one of the first analysis result or the second analysis result, and estimates the person's behavior based on a judgment condition derived by combining the first analysis result and the second analysis result based on the determined match and the estimated reliability,
Device.