KR20230133230A

KR20230133230A - Method, program and apparatus for contrastive pre-training of neural network model based on electrocardiogram

Info

Publication number: KR20230133230A
Application number: KR1020230031269A
Authority: KR
Inventors: 권준명; 조용연; 홍동균
Original assignee: 주식회사 메디컬에이아이
Priority date: 2022-03-10
Filing date: 2023-03-09
Publication date: 2023-09-19

Abstract

The present invention relates to a method for contrastive pre-training of a neural network model based on an electrocardiogram, performed by a computing device including at least one processor. The method comprises: a local information generation step of generating a local representation context using a model based on self-supervised learning when an unlabeled raw electrocardiogram signal is input; a global information generation step of generating a global representation context by applying time average pooling to the local representation context; and a training step of pre-training a neural network model through contrastive learning based on training data including the local and global representation contexts. As a result, excellent performance can be achieved in diagnostic classification and patient identification.

Description

Method, program and apparatus for contrastive pre-training of neural network model based on electrocardiogram {METHOD, PROGRAM AND APPARATUS FOR CONTRASTIVE PRE-TRAINING OF NEURAL NETWORK MODEL BASED ON ELECTROCARDIOGRAM}

The present disclosure relates to a method for contrastive pre-training of a neural network model based on an electrocardiogram, and more specifically to a method that learns local and global representations from a large amount of unlabeled electrocardiogram signals and then performs fine-tuning with a small amount of labeled data.

An electrocardiogram (ECG) is a signal obtained by measuring the electrical signals generated in the heart. It makes it possible to check for abnormalities in the conduction system from the heart to the electrodes and thereby determine the presence or absence of disease.

The heartbeat, which gives rise to the electrocardiogram, begins with an impulse that originates in the sinus node located in the right atrium. The impulse first depolarizes the right atrium and the left atrium and, after a brief delay at the atrioventricular node, activates the ventricles.

The septum is activated fastest, and the thin-walled right ventricle is activated before the thick-walled left ventricle. The depolarization wave that reaches the Purkinje fibers spreads through the myocardium like a wavefront, from the endocardium to the epicardium, and causes ventricular contraction. Because electrical impulses are normally conducted through the heart, the heart contracts approximately 60 to 100 times per minute. Each contraction is counted as one heartbeat.

Such an electrocardiogram can be detected through a bipolar lead, which records the potential difference between two sites, and a unipolar lead, which records the potential at the site where an electrode is attached. Methods for measuring an electrocardiogram include the standard limb leads, which are bipolar leads, and the unipolar limb leads and precordial (chest) leads, which are unipolar leads.

The electrical activity of the heart is broadly divided into the phases of atrial depolarization, ventricular depolarization, and ventricular repolarization. As shown in FIG. 1, each of these phases is reflected in several waves called the P, Q, R, S, and T waves.

These waves must have a standard shape for the electrical activity of the heart to be considered normal. To determine whether a wave has the standard shape, it is necessary to check whether characteristics such as the duration of each wave, the interval between waves, the amplitude of each wave, and its kurtosis fall within the normal range.

Such electrocardiograms are measured with expensive equipment and used as an auxiliary tool for assessing a patient's health. In general, electrocardiogram equipment only displays the measurement results, and diagnosis has been left entirely to the physician.

Research is currently under way on diagnosing diseases quickly and accurately from electrocardiograms using artificial intelligence, in order to reduce dependence on physicians. In addition, with the development of wearable and lifestyle electrocardiogram devices, the possibility of diagnosing and monitoring not only heart disease but also various other diseases on the basis of the electrocardiogram is emerging.

Non-invasive assessment of cardiac function by measuring the electrocardiogram helps in diagnosing numerous heart diseases, including arrhythmia, myocardial infarction, and arterial disease. In every medical facility, the work-up of heart-related symptoms includes an electrocardiogram (ECG) measurement, so ECG recordings accumulate and are stored daily.

In prior-art deep-learning-based methods for analyzing electrocardiogram signals, supervised learning on ECG data was performed with labels that only clinical practitioners or specialist cardiologists can annotate. This limits the number of samples available for training, so it is difficult to train a deep-learning model on the small amount of training data, and unlabeled data cannot be used for training at all.

Korean Patent Application Publication No. 10-2021-0061769 (May 28, 2021)

The present disclosure was devised in response to the background art described above. An object of the method for contrastive pre-training of a neural network model based on an electrocardiogram according to an embodiment of the present disclosure is to learn both local and global representations from a large amount of unlabeled electrocardiogram signals through self-supervised learning, and then to perform fine-tuning with a small amount of labeled data.

However, the problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the description below.

According to an embodiment of the present disclosure for achieving the above object, there is provided a method for contrastive pre-training of a neural network model based on an electrocardiogram, performed by a computing device including at least one processor, the method comprising: a local information generation step of generating a local representation context using a model based on self-supervised learning when an unlabeled raw electrocardiogram signal is input; a global information generation step of generating a global representation context from the local representation context; and a training step of pre-training a neural network model through contrastive learning based on training data including the local representation context and the global representation context.

Alternatively, the local information generation step may include using a Wav2Vec model, and the global information generation step may include using a contrastive multi-segment coding (CMSC) technique.

Alternatively, the local information generation step may further include dividing the raw electrocardiogram signal for a preset number of leads into 2×N pieces of sample data (N being a natural number greater than 0) and applying random lead masking (RLM) to the divided sample data to generate a masked-lead electrocardiogram signal.
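The segmentation and random lead masking described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the function names `split_into_segments` and `random_lead_mask`, the choice of zeroing a whole lead, and the masking probability of 0.5 are all assumptions introduced here.

```python
import random


def split_into_segments(ecg, n):
    """Split a multi-lead ECG (list of leads, each a list of samples)
    into 2*n equal-length segments along the time axis."""
    if n <= 0:
        raise ValueError("n must be a natural number greater than 0")
    count = 2 * n
    seg_len = len(ecg[0]) // count
    if seg_len == 0:
        raise ValueError("signal is too short for 2*n segments")
    # Trailing samples that do not fill a whole segment are dropped.
    return [
        [lead[k * seg_len:(k + 1) * seg_len] for lead in ecg]
        for k in range(count)
    ]


def random_lead_mask(segment, mask_prob, rng):
    """Assumed masking rule: zero out each lead independently
    with probability mask_prob and leave the other leads unchanged."""
    masked = []
    for lead in segment:
        if rng.random() < mask_prob:
            masked.append([0.0] * len(lead))
        else:
            masked.append(list(lead))
    return masked


rng = random.Random(0)
# Toy 12-lead signal with 20 samples per lead (hypothetical data).
ecg = [[float(lead_idx + 1)] * 20 for lead_idx in range(12)]
segments = split_into_segments(ecg, n=2)
masked_segments = [random_lead_mask(seg, 0.5, rng) for seg in segments]
```

With `n=2` the toy signal yields four segments of five samples per lead, and each lead of each segment is either kept or replaced by zeros.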

Alternatively, the method may further include an evaluation step of fine-tuning and evaluating the pre-trained neural network model using labeled datasets for cardiac arrhythmia classification and patient identification.

Alternatively, the Wav2Vec model may include at least one convolutional neural network (CNN) and a transformer corresponding to the CNN. The local representation context is generated as the raw electrocardiogram signal passes through the CNN and the transformer, and the global representation context is generated by applying time average pooling as the local representation context passes through the transformer.
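As a minimal illustration of time average pooling, the sketch below averages a sequence of per-time-step vectors into a single vector. The name `time_average_pool` and the list-of-lists layout (T time steps, each a D-dimensional vector) are assumptions made for this example; the disclosure does not specify a data layout.

```python
def time_average_pool(local_context):
    """Average a (T x D) sequence of per-time-step vectors over the
    time axis, giving one D-dimensional vector."""
    if not local_context:
        raise ValueError("local_context must contain at least one time step")
    steps = len(local_context)
    dim = len(local_context[0])
    return [sum(step[d] for step in local_context) / steps for d in range(dim)]


# Three time steps of a 2-dimensional local representation (toy values).
local_context = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
global_context = time_average_pool(local_context)  # [3.0, 4.0]
```

The pooled vector has the same dimensionality as each local vector but no time axis, which is why it can serve as a summary of the whole segment.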

Alternatively, the local information generation step may include receiving the raw electrocardiogram signal, generating a latent vector (Z) in a latent space using the CNN, and encoding the latent vector (Z) to generate the local representation context (C). The training step may include, during pre-training, quantizing the latent vector (Z) at preset time steps to generate a quantized feature (Q) and masking (m) the latent vector before it is provided to the transformer.

Alternatively, the neural network model is trained so that the cosine similarity between the local representation context and the quantized feature is maximized at each masked time step.
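One common way to push this cosine similarity toward its maximum is a softmax-based contrastive loss over the true quantized feature and a set of distractors. The sketch below assumes that form; the disclosure only states that the cosine similarity is maximized, so the loss shape, the distractors, the `temperature` value, and the function names are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def masked_step_loss(context, positive, distractors, temperature=0.1):
    """Assumed loss for one masked time step: negative log-softmax of the
    similarity to the true quantized feature against the distractors."""
    logits = [cosine_similarity(context, positive) / temperature]
    logits += [cosine_similarity(context, q) / temperature for q in distractors]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    return log_sum - logits[0]


context = [1.0, 0.0]
# The loss is small when the context points at its own quantized feature...
aligned = masked_step_loss(context, [2.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# ...and large when a distractor is more similar than the true feature.
misaligned = masked_step_loss(context, [0.0, 1.0], [[2.0, 0.0], [-1.0, 0.0]])
```

Minimizing this quantity raises the similarity to the true quantized feature relative to the distractors, which is one way to realize the stated training objective.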

Alternatively, the global information generation step may include: for the i-th electrocardiogram signal in the electrocardiogram recording of a first patient, defining a duration (Si) and defining as a positive pair the time-segment representations of time regions that are adjacent to each other and do not overlap within the duration (Si); and defining as a negative pair the time-segment representation of the first patient's electrocardiogram signal and the time-segment representation of the electrocardiogram signal of a different, second patient.
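The pairing rule above can be sketched as a small helper that labels segment pairs. The tuple layout `(patient_id, recording_id, segment_index)`, the name `build_cmsc_pairs`, and the convention that consecutive indices denote adjacent non-overlapping windows are assumptions for this example. Pairs from the same patient that are not adjacent are simply left unlabeled here, which is also an assumption.

```python
from itertools import combinations


def build_cmsc_pairs(segments):
    """Label pairs of segments as positive or negative.

    segments: list of (patient_id, recording_id, segment_index) tuples,
    where consecutive segment_index values are assumed to denote
    adjacent, non-overlapping time windows of one recording.
    """
    positives, negatives = [], []
    for a, b in combinations(segments, 2):
        same_recording = a[0] == b[0] and a[1] == b[1]
        if same_recording and abs(a[2] - b[2]) == 1:
            # Adjacent windows of the same patient's recording.
            positives.append((a, b))
        elif a[0] != b[0]:
            # Windows that belong to different patients.
            negatives.append((a, b))
    return positives, negatives


segments = [
    ("patient_1", 0, 0), ("patient_1", 0, 1),
    ("patient_2", 0, 0), ("patient_2", 0, 1),
]
positives, negatives = build_cmsc_pairs(segments)
```

For the four toy segments this produces two positive pairs (one per patient) and four negative pairs (every cross-patient combination).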

Alternatively, the training step is characterized in that, during pre-training, the model is trained for each patient through self-supervised learning using the correlation between the electrocardiogram recordings of the first patient and the second patient.

According to another embodiment of the present disclosure, there is provided a computer program stored on a computer-readable storage medium, wherein the computer program, when executed on one or more processors, causes operations for contrastive pre-training of a neural network model based on an electrocardiogram to be performed, the operations comprising: generating a local representation context using a model based on self-supervised learning when an unlabeled raw electrocardiogram signal is input; generating a global representation context by applying time average pooling to the local representation context; and pre-training a neural network model through contrastive learning based on training data including the local representation context and the global representation context.

According to yet another embodiment of the present disclosure, there is provided a computing device for contrastive pre-training of a neural network model based on an electrocardiogram, comprising: a processor including at least one core; and a memory containing program code executable by the processor, wherein the processor, upon executing the program code, generates at least one local representation context using a model based on self-supervised learning when an unlabeled raw electrocardiogram signal is input, generates a global representation context by applying time average pooling to the local representation context, and pre-trains a neural network model through contrastive learning based on training data including the local representation context and the global representation context.

The method for contrastive pre-training of a neural network model based on an electrocardiogram according to an embodiment of the present disclosure uses a neural network model that integrates the Wav2Vec model, the CMSC technique, and the RLM technique to pre-train on both the global and local representations of unlabeled electrocardiogram signals. The pre-trained model is then fine-tuned on downstream tasks with an arbitrary set of leads. This has the effect of providing a neural network model that can show excellent performance in diagnostic classification and patient identification even with a small amount of data.

FIG. 1 is a diagram showing an electrocardiogram signal according to the present disclosure.
FIG. 2 is a block diagram of a computing device according to an embodiment of the present disclosure.
FIG. 3 is a flowchart showing a method for contrastive pre-training of a neural network model based on an electrocardiogram according to an embodiment of the present disclosure.
FIG. 4 is a diagram showing the structure of a neural network model according to an embodiment of the present disclosure.
FIG. 5 is a flowchart explaining how the training step is performed according to an embodiment of the present invention.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that a person having ordinary skill in the technical field of the present disclosure (hereinafter, "a person skilled in the art") can readily practice them. The embodiments presented in the present disclosure are provided to enable a person skilled in the art to use or practice its content. Accordingly, various modifications to the embodiments will be apparent to a person skilled in the art. That is, the present disclosure may be implemented in many different forms and is not limited to the embodiments below.

Throughout the specification of the present disclosure, identical or similar reference numerals refer to identical or similar elements. In addition, to describe the present disclosure clearly, reference numerals for parts of the drawings that are unrelated to the description may be omitted.

As used in the present disclosure, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless otherwise specified in the present disclosure or unclear from the context, "x uses a or b" should be understood to mean any of the natural inclusive permutations. For example, unless otherwise specified or unclear from the context, "x uses a or b" may be interpreted as any of the following cases: x uses a, x uses b, or x uses both a and b.

The term "and/or" as used in the present disclosure should be understood to refer to and include all possible combinations of one or more of the listed related concepts.

The terms "comprises" and/or "comprising" as used in the present disclosure should be understood to mean that the specified features and/or elements are present. However, these terms should be understood as not excluding the presence or addition of one or more other features, other elements, and/or combinations thereof.

Unless otherwise specified in the present disclosure or clear from the context that a singular form is intended, the singular should generally be construed to include "one or more".

The term "n-th (n being a natural number)" as used in the present disclosure can be understood as an expression used to distinguish the elements of the present disclosure from one another according to a given criterion, such as a functional perspective, a structural perspective, or convenience of explanation. For example, elements that perform different functional roles in the present disclosure may be distinguished as a first element and a second element. However, elements that are substantially the same within the technical idea of the present disclosure but need to be distinguished for convenience of explanation may also be distinguished as a first element and a second element.

The term "acquisition" as used in the present disclosure can be understood to mean not only receiving data from an external device or system over a wired or wireless communication network, but also generating data on-device.

Meanwhile, the term "module" or "unit" as used in the present disclosure can be understood as referring to an independent functional unit that processes computing resources, such as a computer-related entity, firmware, software or a part thereof, hardware or a part thereof, or a combination of software and hardware. A "module" or "unit" may be a unit composed of a single element, or a unit expressed as a combination or set of multiple elements. For example, in a narrow sense, a "module" or "unit" may refer to a hardware element of a computing device or a set of such elements, an application program that performs a specific software function, a procedure implemented through the execution of software, or a set of instructions for executing a program. In a broad sense, a "module" or "unit" may refer to the computing device itself that constitutes a system, or to an application running on the computing device. However, since the above concepts are only examples, the concept of a "module" or "unit" may be defined in various ways within a range understandable to a person skilled in the art based on the content of the present disclosure.

The term "model" as used in the present disclosure can be understood as a system implemented using mathematical concepts and language to solve a specific problem, a set of software units for solving a specific problem, or an abstract model of a process for solving a specific problem. For example, a neural network "model" may refer to an entire system implemented as a neural network that acquires problem-solving ability through training. The neural network acquires this ability by optimizing, through training, the parameters that connect its nodes or neurons. A neural network "model" may include a single neural network or a set of neural networks in which multiple neural networks are combined.

"Data" as used in the present disclosure may include "images", signals, and the like. The term "image" as used in the present disclosure may refer to multidimensional data composed of discrete image elements. In other words, an "image" can be understood as a digital representation of an object that can be seen by the human eye. For example, an "image" may refer to multidimensional data composed of elements corresponding to pixels in a two-dimensional image, or to multidimensional data composed of elements corresponding to voxels in a three-dimensional image.

The term "block" as used in the present disclosure can be understood as a set of components grouped on the basis of various criteria such as type and function. Accordingly, the components classified into one "block" may vary depending on the criterion. For example, a neural network "block" can be understood as a set of neural networks that includes at least one neural network. In this case, the neural networks included in a neural network "block" may be assumed to perform the same specific operation. The foregoing explanations of terms are intended to aid understanding of the present disclosure. Therefore, unless the above terms are explicitly described as limiting the content of the present disclosure, it should be noted that they are not used to limit its technical idea.

FIG. 2 is a block diagram of a computing device according to an embodiment of the present disclosure.

The computing device 100 according to an embodiment of the present disclosure may be a hardware device, or part of a hardware device, that performs comprehensive processing and computation of data, or it may be a software-based computing environment connected over a communication network. For example, the computing device 100 may be a server that performs intensive data processing and shares resources, or a client that shares resources through interaction with a server. The computing device 100 may also be a cloud system in which a plurality of servers and clients interact to process data comprehensively. Since the above description is only one example of the type of computing device 100, the type of computing device 100 may be configured in various ways within a range understandable to a person skilled in the art based on the content of the present disclosure.

Referring to FIG. 2, the computing device 100 according to an embodiment of the present disclosure may include a processor 110, a memory 120, and a network unit 130. However, since FIG. 2 is only an example, the computing device 100 may include other components for implementing a computing environment. Alternatively, only some of the components described above may be included in the computing device 100.

The processor 110 according to an embodiment of the present disclosure can be understood as a structural unit including hardware and/or software for performing computing operations. For example, the processor 110 may read a computer program and perform data processing for machine learning. The processor 110 may handle computations such as processing input data for machine learning, extracting features for machine learning, and calculating errors based on backpropagation. The processor 110 for performing such data processing may include a central processing unit (CPU), a general-purpose graphics processing unit (GPGPU), a tensor processing unit (TPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). Since the types of processor 110 described above are only examples, the type of processor 110 may be configured in various ways within a range understandable to a person skilled in the art based on the content of the present disclosure.

The processor 110 may train a neural network model that shows excellent performance in diagnostic identification and patient classification based on electrocardiogram signals. For example, the processor 110 may train the neural network model to estimate diagnostic identification, patient classification, and the like based on the electrocardiogram signal together with biological data including information such as sex, age, weight, and height. During training, the processor 110 may perform operations representing at least one neural network block included in the neural network model. By performing contrastive pre-training that considers both the local representation context and the global representation context derived from the electrocardiogram signal, the processor 110 can provide a neural network model that outperforms other electrocardiogram-related pre-training approaches on two downstream tasks: cardiac arrhythmia classification and patient identification. The processor 110 uses the neural network model produced by this training process to estimate diagnostic identification and patient classification from electrocardiogram signals.

Beyond the examples above, the types of medical data, including ECG signals, and the outputs of the neural network model may be configured in various ways within a range understandable to those skilled in the art based on the present disclosure.

The memory 120 according to an embodiment of the present disclosure may be understood as a component that includes hardware and/or software for storing and managing data processed by the computing device 100. That is, the memory 120 may store any form of data generated or determined by the processor 110 and any form of data received by the network unit 130. For example, the memory 120 may include at least one type of storage medium among flash memory, a hard disk, a multimedia card micro type, card-type memory, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. The memory 120 may also include a database system that controls and manages data under a predetermined scheme. The memory types listed above are only examples, and the type of the memory 120 may be configured in various ways within a range understandable to those skilled in the art based on the present disclosure.

The memory 120 may structure, organize, and manage the data and combinations of data that the processor 110 needs to perform operations, as well as program code executable by the processor 110. For example, the memory 120 may store ECG signals received through the network unit 130, which is described later. The memory 120 may store program code that causes the neural network model to receive ECG signals and perform training, program code that causes the neural network model to receive ECG signals and perform inference according to the intended use of the computing device 100, and processed data generated as the program code is executed.

The network unit 130 according to an embodiment of the present disclosure may be understood as a component that transmits and receives data through any known wired or wireless communication system. For example, the network unit 130 may transmit and receive data using a wired or wireless communication system such as a local area network (LAN), wideband code division multiple access (WCDMA), long term evolution (LTE), wireless broadband internet (WiBro), fifth-generation mobile communication (5G), ultra-wideband wireless communication, ZigBee, radio frequency (RF) communication, wireless LAN, Wi-Fi (wireless fidelity), near field communication (NFC), or Bluetooth. These communication systems are only examples, and wired and wireless communication systems other than those listed may be used for the data transmission and reception of the network unit 130.

The network unit 130 may receive the data that the processor 110 needs to perform operations through wired or wireless communication with any system or any client. The network unit 130 may also transmit data generated by the operations of the processor 110 through wired or wireless communication with any system or any client. For example, the network unit 130 may receive medical data through communication with a database in a hospital environment, a cloud server that performs tasks such as standardizing medical data including ECG signals, or a computing device. Through communication with such a database, server, or computing device, the network unit 130 may transmit the output data of the neural network model as well as intermediate data and processed data derived during the operations of the processor 110.

FIG. 3 is a flowchart showing a contrastive pre-training method for an ECG-based neural network model according to an embodiment of the present disclosure.

Referring to FIG. 3, in the contrastive pre-training method for an ECG-based neural network model, which is performed by a computing device including at least one processor, a step (S100) of acquiring a raw ECG signal may be performed first.

The ECG signal may be obtained directly from an ECG measurement device or from the device over network communication. Specifically, the ECG signal is acquired through several electrodes placed on specific surfaces of the body, and the condition of the heart is assessed by measuring the voltages between the electrodes.

In general, the standard ECG consists of 12 leads (channels), each measuring a specific potential difference. However, recording all 12 leads requires the patient to have at least 10 electrodes attached to the skin at a medical facility, which is difficult to sustain in practice, so many clinicians and cardiologists use reduced-channel systems.

Next, a step (S200) may be performed in which sample data sampled from the raw ECG signal is input to the neural network model and contrastive pre-training is carried out through self-supervised learning, considering both the global representation context and the local representation context. A neural network model pre-trained in this way can outperform other neural network models in diagnostic classification and patient identification on downstream tasks.

The learning step (S200) may further include inputting into the neural network model, together with the ECG signal, biological data including at least one of age, sex, weight, and height as variables that affect the characteristics of heart disease, so that the model is trained based on the correlation between the biological data and heart disease.

FIG. 4 is a diagram showing the structure of a neural network model according to an embodiment of the present disclosure.

As shown in FIG. 4, the neural network model uses Wav2Vec 2.0. In general, a Wav2Vec model has a structure in which two convolutional neural networks are stacked, where f : X ⇒ Z may be called the encoder network and h : Z ⇒ C the context network.

In the neural network model, the raw ECG signal is split into two segments by the CMSC (Contrastive Multi-Segment Coding) technique, and each segment passes sequentially through the Wav2Vec CNN and transformer. That is, the neural network model takes one segment as input and outputs n local representation contexts (C), and then produces a global representation context (G) by applying time average pooling to the n local representation contexts.

Specifically, the encoder network consists of several convolutional encoder blocks that encode the raw ECG signal. The context network consists of a plurality of transformer encoder blocks that derive contextualized local representations from the CNN output. In addition, the latent features derived from the convolutional encoder blocks are quantized and used in a contrastive task against the local representation context. Here, quantization may be the process of mapping continuous-valued feature vectors to a finite set of discrete-valued vectors (that is, codes) through a learnable codebook. The codebook contains vectors, and each latent feature is replaced by the closest code in the codebook, selected via Gumbel-softmax.
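
The codebook selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the temperature `tau`, and the idea that per-code logits are already available for each latent feature are all assumptions made for the example.

```python
import math
import random


def gumbel_softmax_select(logits, tau=1.0, rng=random):
    """Pick one codebook index from logits perturbed with Gumbel noise.

    Returns (index, probabilities). `tau` is a hypothetical temperature.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs


def quantize(logits, codebook, tau=1.0, rng=random):
    """Replace a latent feature by the code chosen for it (hard selection)."""
    index, _ = gumbel_softmax_select(logits, tau, rng)
    return codebook[index]
```

In a trainable model the hard choice would be paired with a gradient estimator so that the codebook can be learned; the sketch only shows the forward selection.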

In particular, given a raw ECG signal as the input X, the convolutional encoder f : X ⇒ Z of the encoder network produces the latent features Z, and the transformer encoder blocks of the context network, h : Z ⇒ C, produce the contextualized local representation context C. The quantization module q : Z → Q quantizes the latent features. During pre-training, the latent features at randomly chosen time steps are quantized to q and, as shown above the CNN layer, masked with m before being provided to the transformer.

The neural network model is then trained to maximize, at each masked time step, the cosine similarity between the local representation context and the corresponding quantized feature.

CMSC, meanwhile, is a patient-specific learning approach for effective self-supervised learning from unlabeled patient data. It treats temporal segments as positive pairs: when the i-th ECG recording has a duration of S_i seconds, a pair of temporal segments can be sampled as non-overlapping segments of S_i/2 seconds, S_i/4 seconds, and so on (see the durations shown below the raw ECG signal in FIG. 4). For simplicity, the present disclosure fixes the duration at S_i = 10 for every i-th ECG signal, so that adjacent segments are used as positive pairs and temporal segments from other ECG samples are used as negative pairs.
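
The pairing rule can be illustrated with a short sketch. The helper names and the use of plain Python lists for a single-lead recording are assumptions made for the example; with `parts=2` the two halves of a recording are adjacent, which matches the positive pairs described above.

```python
def split_segments(signal, parts=2):
    """Split one recording (a list of samples) into equal, non-overlapping segments."""
    seg_len = len(signal) // parts
    return [signal[k * seg_len:(k + 1) * seg_len] for k in range(parts)]


def build_pairs(recordings, parts=2):
    """Return (segments, positives, negatives); pairs are indices into `segments`."""
    segments, owner = [], []
    for rec_id, rec in enumerate(recordings):
        for seg in split_segments(rec, parts):
            segments.append(seg)
            owner.append(rec_id)
    positives, negatives = [], []
    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            # Same recording: positive pair. Different recordings: negative pair.
            (positives if owner[a] == owner[b] else negatives).append((a, b))
    return segments, positives, negatives
```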

The contrastive pre-training method based on CMSC focuses on the global representation of ECG recordings by comparing ECG recordings across different patients.

Assuming that an ECG recording reflects both temporal and spatial information, the neural network model should be able to exploit both temporal invariance and spatial invariance. Temporal invariance means that, because abrupt changes in an ECG are unlikely to occur within a few seconds, adjacent segments of shorter duration continue to share context. Spatial invariance means that different leads share context because they reflect the same cardiac function.

CMSC therefore exploits the temporal invariance of the ECG signal by defining the representations of adjacent, non-overlapping temporal segments as positive pairs. Representations of cardiac signals from entirely different patients form negative pairs that repel each other, and this framework makes it possible to learn representations that are invariant to temporal changes in the ECG signal.

In this way, the present disclosure provides a framework for building, from a pre-trained neural network model, a neural network model that performs well in diagnostic classification and patient identification. The framework can effectively capture the local information that matters for both the local representations and the quantized features of each ECG signal, as well as the global information that matters for the global representations used in patient identification.

As an example, the framework according to an embodiment of the present disclosure is an artificial intelligence framework implemented in Fairseq and includes a feature extractor consisting of four blocks, each of which may consist of a convolutional layer, layer normalization, and a GELU activation function. The convolutional layer of each block has 256 channels with a stride of 2 and a kernel length of 2. The transformer is configured after the BERT-Base model: the number of transformer block layers is 12, the model dimension is 768, the number of self-attention heads is 12, and the dimension of the feed-forward network is 3,072. During fine-tuning, an additional fully connected layer projects the representation vector onto the number of classes of each task; for the identification task, a class may represent a unique patient in the training dataset.
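
As a rough check of what these settings imply, the sketch below computes how many time steps the feature extractor would emit for one segment. It assumes unpadded convolutions and a 2,500-point input (5 seconds at 500 Hz); the padding choice is an assumption made for the example and the function names are illustrative.

```python
def conv_out_len(length, kernel=2, stride=2):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel) // stride + 1


def extractor_out_len(length, blocks=4, kernel=2, stride=2):
    """Apply the same convolution `blocks` times and return the final length."""
    for _ in range(blocks):
        length = conv_out_len(length, kernel, stride)
    return length


steps = extractor_out_len(2500)  # 2500 -> 1250 -> 625 -> 312 -> 156
head_dim = 768 // 12             # per-head width of the self-attention layers
print(steps, head_dim)           # prints "156 64"
```

Under these assumptions each local representation covers roughly 16 input points, or about 32 ms of signal.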

FIG. 5 is a flowchart illustrating how the learning step is performed according to an embodiment of the present invention.

Specifically, FIG. 5 shows an embodiment of step S200 of FIG. 3 in more detail. The learning step (S200) integrates the Wav2Vec 2.0 framework, the CMSC technique, and the RLM method so that both the local representation context and the global representation context can be learned through contrastive learning.

In the learning step, the 12-channel raw ECG signal is split into 2×N pieces of sample data (S210). Here, N denotes the number of samples, and the sample data is used as the input to the local contrastive task shown in FIG. 4. The sample data may include a number of segments that reflect the characteristics of the ECG signal, that is, various temporal and spatial feature values such as the QRS complex, P wave, T wave, and ST segment.

Although the 12-lead ECG signal is the standard for various ECG-related downstream tasks, there is growing demand for neural network models that perform robustly on reduced-lead ECG signals (recorded with reduced-channel systems), because access to standard 12-lead ECG recordings is limited. In addition, as wearable ECG devices that typically measure reduced-lead ECG signals (for example, smart watches) grow in popularity, a much larger volume of reduced-lead ECGs is likely to be generated. Given this, the obvious self-supervised approach would be to pre-train and then fine-tune a separate neural network model for each lead set, but 12 leads admit far too many possible combinations.

Accordingly, the present disclosure applies Random Lead Masking (RLM), an ECG-specific augmentation method that masks each lead individually and at random with probability p = 0.5, to the segmented sample data, producing a masked-lead ECG signal that is fed to the CNN (S220). By stochastically masking random leads at every iteration, RLM lets a single 12-lead neural network model emulate a pre-training setup that uses many different lead combinations. In other words, a neural network model that receives RLM-processed sample data is exposed to various lead combinations during pre-training, which reduces the dependence of its representations on having all 12 leads of the ECG signal. A neural network model pre-trained with RLM can therefore perform robustly when fine-tuned on a downstream task with an arbitrary lead set.
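
A minimal sketch of RLM follows. It assumes that a masked lead is replaced by zeros, which is an assumption for the example (the text above specifies only the per-lead probability p = 0.5), and it represents an ECG as a list of leads, each a list of samples.

```python
import random


def random_lead_masking(ecg, p=0.5, rng=random):
    """Mask each lead independently with probability p.

    ecg: list of leads, each a list of samples. A masked lead is zero-filled
    (assumption); the input is left unmodified.
    """
    masked = []
    for lead in ecg:
        if rng.random() < p:
            masked.append([0.0] * len(lead))
        else:
            masked.append(list(lead))
    return masked
```

Calling the function once per training iteration gives a different lead subset each time, which is how one model comes to see many lead combinations.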

The neural network model uses Wav2Vec 2.0 to carry out the local minimization, defining a contrastive loss between each local representation and its quantized feature (S230). Here, the local representation context is generated as the signal passes first through the CNN and then through the Wav2Vec 2.0 transformer.

Specifically, given the local representation vector c_t and the quantized latent feature q_t at a masked time step t, the local contrastive loss is defined as in Equation 1.

In Equation 1, sim(a, b) is the cosine similarity between two vectors a and b, Q is the set of candidate quantized latent features, consisting of q_t and the quantized features of other masked time steps, and M is the set of masked time steps.
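
A contrastive loss built from these ingredients can be sketched as below. The softmax (InfoNCE) form, the temperature `tau`, and averaging over the masked steps are assumptions made for the example; only sim, c_t, q_t, the candidate set, and M come from the definitions above. Vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity sim(a, b) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def local_contrastive_loss(contexts, quantized, masked_steps, tau=0.1):
    """Average over t in M of -log softmax(sim(c_t, q_t) / tau).

    The candidates for step t are q_t plus the quantized features of the
    other masked time steps. `tau` is a hypothetical temperature.
    """
    total = 0.0
    for t in masked_steps:
        logits = [cosine(contexts[t], quantized[u]) / tau for u in masked_steps]
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denom - cosine(contexts[t], quantized[t]) / tau
    return total / len(masked_steps)
```

The loss falls as each c_t aligns with its own q_t and moves away from the quantized features of the other masked steps.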

To learn the global representation context alongside the local representation context, the neural network model applies time average pooling to the local representation context (see the upper part of the transformer block in FIG. 4) to produce the global representation context (S240). A patient-specific noise contrastive estimation loss is then used to learn the global representation of the ECG signal, and this global contrastive loss is defined as in Equation 2.

In Equation 2, $g_i = \frac{1}{|T|}\sum_{t \in T} c_t$ denotes the global representation of the i-th sample, and P+ denotes the set of indices of the positive pairs in the batch. Finally, the overall objective function of the neural network model can be expressed as in Equation 3.
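
The pooling and a global contrastive loss of this kind can be sketched as follows. The softmax form, the temperature `tau`, and the averaging over positive pairs are assumptions made for the example, as is the way the local and global terms would be combined into one objective.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def time_average_pool(contexts):
    """g_i: mean of the local representations c_t over all time steps."""
    n, dim = len(contexts), len(contexts[0])
    return [sum(c[k] for c in contexts) / n for k in range(dim)]


def global_contrastive_loss(global_vecs, positive_pairs, tau=0.1):
    """For each positive pair (i, j), contrast g_j against every other g_k in the batch."""
    total = 0.0
    for i, j in positive_pairs:
        logits = {k: cosine(global_vecs[i], g) / tau
                  for k, g in enumerate(global_vecs) if k != i}
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += log_denom - logits[j]
    return total / len(positive_pairs)
```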

The local contrastive learning task focuses on the internal relationships within each ECG signal, whereas the global contrastive learning task exploits the relationships between the ECG recordings of different patients during training. Here, a local representation may correspond to a portion of the ECG, that is, to parameters spanning less than one beat, and a global representation may correspond to an entire segment of an ECG signal made up of multiple beats. A segment may or may not correspond to a single beat.

In this way, both the local representation context and the global representation context are learned for the neural network model through the local and global contrastive learning tasks (S250), and the pre-trained neural network model may then be evaluated by fine-tuning it on downstream tasks using labeled datasets (S260).

Here, the downstream tasks evaluate the neural network model by fine-tuning it on labeled data for cardiac arrhythmia classification and patient identification. During fine-tuning, time average pooling is applied to the transformer outputs (that is, the contextualized local representations c_t) to extract a representation vector for the entire ECG signal. A randomly initialized fully connected layer is then added after the temporal average pooling layer to perform the downstream task.

As an example, experiments may be run on five different combinations of available leads: 12-lead, 6-lead (I, II, III, aVF, aVL, aVR), 3-lead (I, II, V2), 2-lead (I, II), and 1-lead (I). Because lead III can be obtained from leads I and II by a simple equation (lead III = -lead I + lead II), a 4-lead combination provides no additional information over the 3-lead combination. Because many wearable ECG devices such as smart watches measure only a 1-lead ECG signal, it is desirable to include the 1-lead (I) experiment.
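
The lead relationship quoted above can be written directly in code. The function name and the list-of-samples representation are illustrative.

```python
def derive_lead_iii(lead_i, lead_ii):
    """Lead III = -Lead I + Lead II, computed sample by sample."""
    return [b - a for a, b in zip(lead_i, lead_ii)]
```

Since lead III is fully determined by leads I and II, adding it to a set that already contains both contributes no independent measurement.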

Cardiac arrhythmia classification is a task of predicting diagnoses: each ECG may be classified into 26 scored SNOMED-CT labels in a multi-label setting, and the CinC Score introduced in the PhysioNet/Computing in Cardiology Challenge may be used for evaluation. By giving partial credit to misdiagnoses whose symptoms resemble those of the true diagnosis and partially penalizing misdiagnoses whose symptoms differ greatly from it, this weighted score reflects the clinical reality that not all misdiagnoses are equal.

A neural network model that outputs the correct classes for every sample receives a score of 1, a neural network model that always outputs the normal class (that is, no abnormality) receives a score of 0, and the score range may be set from -1 to 1.
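
The two anchor points of the score can be reproduced with a simple normalization. This sketch assumes that raw weighted scores have already been computed for the model, for a perfect classifier, and for a classifier that always outputs the normal class; how those raw scores are obtained is outside the sketch, and the parameter names are illustrative.

```python
def normalize_score(observed, correct, inactive):
    """Map a raw weighted score so that a perfect model gets 1 and an
    always-normal model gets 0.

    observed: raw score of the evaluated model
    correct:  raw score of a model that outputs the true classes
    inactive: raw score of a model that always outputs the normal class
    """
    return (observed - inactive) / (correct - inactive)
```

A model whose raw score falls below that of the always-normal baseline receives a negative normalized score.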

Patient identification is a task that requires the neural network model to learn effective representations, so that ECG signals are represented in a way that makes the similarity between two different ECGs of the same patient relatively high. Here, the fully connected layer is treated as a collection of weight vectors, one per patient (class). That is, when a classification task is performed with a conventional softmax classifier, a feature vector is trained to have high similarity (a high inner product) with its own weight vector and low similarity (a low inner product) with the other weight vectors. However, because the softmax classifier optimizes inner-product similarity, it does not enforce high similarity among the ECGs of one patient or diversity among the ECGs of different patients. To overcome this problem, the ArcFace loss formulated as in Equation 4 is used.

In Equation 4, the angle term is the angle between the feature vector and the corresponding weight vector, m is the angular margin penalty applied to that angle, and s is the scaling factor applied before the softmax. By optimizing cosine similarity with an additional angular margin penalty, the ArcFace loss can maximize the similarity of intra-patient ECGs while minimizing the similarity of inter-patient ECGs.
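
For a single sample, an angular-margin loss of this kind can be sketched as below, following the commonly published ArcFace form. The default values of `s` and `m` are illustrative assumptions, and the input is assumed to be the cosine between the normalized feature vector and each normalized class weight vector.

```python
import math


def arcface_loss(cosines, target, s=30.0, m=0.5):
    """Softmax cross-entropy on s * cos(theta + m) for the target class
    and s * cos(theta) for every other class.

    cosines: cos(theta_j) for each class j; target: index of the true class.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            logits.append(s * math.cos(math.acos(c) + m))
        else:
            logits.append(s * c)
    peak = max(logits)
    log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_denom - logits[target]
```

The margin makes the target class harder to satisfy than a plain softmax would, which pushes same-patient representations closer to their class weight vector.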

For evaluation, the test set consists of two subsets, called the gallery set and the probe set. The subsets consist of distinct ECG samples but share the same patients. For example, if the gallery set contains an ECG sample from patient A, the probe set must also contain an ECG recording from the same patient A. During testing, the additional fully connected layer is discarded and only the representation vectors of the ECG samples in each subset are used.

The cosine similarity is then computed for every possible pair between the gallery set and the probe set, and the closest pair is defined as having the same identity.
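
This matching rule can be sketched as a nearest-neighbor search under cosine similarity. The function names and the `(patient_id, vector)` layout of the gallery are assumptions made for the example; vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(gallery, probes):
    """Assign each probe vector the patient ID of its most similar gallery vector.

    gallery: list of (patient_id, vector); probes: list of vectors.
    """
    predictions = []
    for vec in probes:
        best_id, _ = max(gallery, key=lambda item: cosine(item[1], vec))
        predictions.append(best_id)
    return predictions


def accuracy(predicted, actual):
    """Fraction of probes whose predicted ID matches the true ID."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```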

A validation study of the neural network model according to an embodiment of the present disclosure is described below.

In the first experiment, the proposed neural network model and the other baselines are pre-trained on a large-scale ECG dataset using all 12 leads, and the single 12-lead model is then fine-tuned on the five reduced-lead combinations for the two downstream tasks, with unavailable leads filled with zeros.

In the second experiment, the proposed neural network model and the other baselines are pre-trained and fine-tuned separately for each lead combination in order to show the theoretical upper bound on the performance of the first experiment. Because a model is pre-trained individually for each lead set, this experiment requires a very large amount of training time and computational resources.

In the third experiment, an ablation study of RLM is performed to observe the results of combining RLM with the other baseline models. The improved performance obtained with RLM demonstrates its effectiveness as an ECG-specific augmentation for self-supervised contrastive learning.

The second and third experiments are not performed with 3KG, because the main idea of 3KG is to convert an ECG into a VCG and apply stochastic 3D perturbations to generate positive pairs. If fewer ECG leads are used during the conversion to VCG, the 3D structure cannot be constructed and the perturbations carry no contextual meaning.

In the validation study of the neural network model, the pre-training experiments are performed on six datasets from PhysioNet 2021 (Reyna et al., 2021): CPSC, CPSC-Extra, PTB-XL, Georgia, Ningbo, and Chapman.

Each recording is sampled at 500 Hz and lasts between 5 and 144 seconds. Each i-th recording is repeatedly divided into time segments of Si = 10 seconds. To simplify data handling for the global contrastive task, each 10-second segment is further split into two segments of Si/2 = 5 seconds. This yields 189,051 samples of 12-lead ECG recordings, each 2,500 data points long. The CPSC and Georgia datasets are used for fine-tuning on cardiac arrhythmia classification, because these two datasets are used for evaluation in the PhysioNet/Computing in Cardiology Challenge. These samples are divided into training, validation, and test sets.
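
The segmentation described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `segment_recording` and the list-of-lists data layout are hypothetical, and dropping a trailing remainder shorter than 10 seconds is an assumption, since the text does not say how such remainders are handled.

```python
FS_HZ = 500  # sampling frequency stated in the text


def segment_recording(recording, fs=FS_HZ, segment_s=10):
    """Split a multi-lead recording into 10 s segments, each as two 5 s halves.

    `recording` is a list of leads; each lead is a list of samples, and all
    leads have the same length. Returns a list of (first_half, second_half)
    pairs, one per complete segment. A remainder shorter than one segment is
    dropped (an assumption made for this sketch).
    """
    seg_len = fs * segment_s          # 5000 data points for 10 s at 500 Hz
    half = seg_len // 2               # 2500 data points for 5 s
    n_samples = len(recording[0])
    pairs = []
    for start in range(0, n_samples - seg_len + 1, seg_len):
        first = [lead[start:start + half] for lead in recording]
        second = [lead[start + half:start + seg_len] for lead in recording]
        pairs.append((first, second))
    return pairs
```

With these settings each half holds 500 × 5 = 2,500 data points per lead, which matches the sample size given above.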

Table 1 below shows the test performance when a single 12-lead model is pre-trained and then fine-tuned on five lead combinations. During fine-tuning, unavailable leads are filled with zeros; this setting is referred to as P-N-lead (Padded-N-lead). W2V denotes Wav2Vec 2.0. The CinC score is reported for diagnostic classification (Dx.) and accuracy for patient identification (Id.). Means and 95% confidence intervals are computed over three seeds. For each lead combination and each of the two tasks, the best performance is shown in bold.

For the classification task, the samples are split at a ratio of 8:1:1 into training, validation, and test sets of 32,640, 4,079, and 4,079 samples, respectively. The validation and test sets are excluded from the pre-training dataset. For fine-tuning on the patient identification task, the pre-training dataset is used for training and validation. The training and validation sets consist of 147,444 and 17,670 samples, respectively, split from the pre-training dataset at a ratio of 8:2. Because most PhysioNet 2021 datasets do not include patient identity information, samples segmented from the same ECG recording are treated as having the same identity. After fine-tuning, however, model performance is tested on PTB-XL, which includes patient IDs, as described below.

To test the model properly on the patient identification task, the PTB-XL dataset (Wagner et al., 2020) is used; it is a subset of PhysioNet 2021 but provides a patient ID for each ECG sample. This makes it possible to identify ECG samples across different sessions, which is a more appropriate way to evaluate the model. Patients with at least two ECG sessions are selected, and their recordings are randomly segmented into 5-second samples. For each unique patient, two ECG sessions are then randomly selected and used as the gallery set and the probe set, respectively. This leaves 2,127 unique patient pairs of 12-lead ECG recordings, which are excluded from the pre-training dataset.
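
A gallery/probe evaluation of this kind can be sketched as below. This is an illustrative sketch under stated assumptions: the text does not specify how a probe is matched to the gallery, so nearest-neighbour matching by cosine similarity is assumed here, and the names `cosine_similarity` and `identification_accuracy` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identification_accuracy(gallery, probe):
    """Fraction of probe embeddings whose closest gallery entry is the same patient.

    `gallery` and `probe` map a patient ID to one embedding each, taken from
    two different ECG sessions of that patient.
    """
    correct = 0
    for patient_id, embedding in probe.items():
        # Assumed matching rule: pick the gallery patient with highest similarity.
        best = max(gallery, key=lambda g: cosine_similarity(embedding, gallery[g]))
        if best == patient_id:
            correct += 1
    return correct / len(probe)
```

For example, with `gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0]}` and `probe = {"a": [0.9, 0.1], "b": [0.8, 0.2]}`, only patient "a" is matched correctly, so the accuracy is 0.5.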

Table 2 shows the test performance of the various pre-training methods when the models are pre-trained and fine-tuned individually for each lead combination. For each model and lead combination, the CinC score is reported for diagnostic classification (Dx.) and accuracy for patient identification (Id.). The best performance is shown in bold. The 12-lead results from Table 1 are included for reference.

For local contrastive learning, the hyperparameter settings of the original Wav2Vec 2.0 are followed: each token is selected as the start of a masked span with probability 0.065, and the subsequent 10 time steps are masked. The quantization module contains two groups of 320 codes. For ArcFace in the identification task, a scale factor s = 192 and a margin m = 1.0 are used.
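
The span masking rule can be sketched as follows, using the probability 0.065 and span length 10 given above. The function name `sample_span_mask` is hypothetical, and the sketch assumes that the span begins at the selected token and that overlapping spans simply merge; it is not the code used in the experiments.

```python
import random


def sample_span_mask(n_steps, p_start=0.065, span=10, rng=None):
    """Return a list of booleans marking which time steps are masked.

    Each time step is chosen as a span start with probability `p_start`;
    the `span` steps beginning at a chosen start are masked. Spans may
    overlap and are truncated at the end of the sequence.
    """
    rng = rng or random.Random()
    mask = [False] * n_steps
    for t in range(n_steps):
        if rng.random() < p_start:
            for k in range(t, min(t + span, n_steps)):
                mask[k] = True
    return mask
```

A call such as `sample_span_mask(200, rng=random.Random(0))` gives a reproducible mask for a sequence of 200 latent time steps.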

The neural network model is optimized with Adam (Kingma and Ba, 2014), using a learning rate of 5 × 10⁻⁵ for pre-training and for fine-tuning on the classification task, and 3 × 10⁻⁵ for fine-tuning on the identification task.

For pre-training, the batch size is 512 ECG samples of 10 seconds, which are split into 1,024 ECG samples of 5 seconds, and training takes 24 hours on four RTX A6000 GPUs. For fine-tuning, the batch size is 128 ECG samples of 5 seconds on a single RTX 3090 GPU, and training takes 4 hours for cardiac arrhythmia classification and 18 hours for patient identification.

Table 1 shows the experimental results when a single 12-lead model is pre-trained and then fine-tuned on various lead combinations. That is, a single model pre-trained with 12 leads is fine-tuned on reduced leads, with the unavailable leads filled with zeros.
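
Zero-padding a reduced lead set back to a 12-lead input can be sketched as below. The lead ordering in `LEAD_ORDER`, the dictionary-based input, and the function name `pad_to_12_leads` are assumptions made for illustration; the text only states that unavailable leads are filled with zeros.

```python
# Assumed channel order for the 12-lead input (not specified in the text).
LEAD_ORDER = ("I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6")


def pad_to_12_leads(available, n_samples):
    """Build a 12-lead input in which missing leads are all zeros.

    `available` maps a lead name to its list of `n_samples` samples.
    Leads absent from `available` are replaced by zero-filled channels.
    """
    unknown = set(available) - set(LEAD_ORDER)
    if unknown:
        raise ValueError(f"unknown lead names: {sorted(unknown)}")
    return [
        list(available[name]) if name in available else [0.0] * n_samples
        for name in LEAD_ORDER
    ]
```

For a P-3-lead input, for instance, only leads I, II, and V2 would be passed in `available`, and the remaining nine channels would be zero.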

The experimental results show that the neural network model that combines the Wav2Vec model, the CMSC technique, and RLM, and that pre-trains local and global feature contexts separately, outperforms the other models. On the classification task, this model improves the CinC score by an average of 0.0298 and 0.1366 over the Wav2Vec model and the CMSC model, respectively, across all lead combinations. Likewise, on the identification task, its accuracy is higher by an average of 6.4%p and 6.67%p than that of the Wav2Vec model and the CMSC model, respectively, across all lead combinations.

Table 3 shows the test performance when random lead masking (RLM) is applied to the model that combines the Wav2Vec model and the CMSC technique. The CinC score is reported for diagnostic classification (Dx.) and accuracy for patient identification (Id.). Results that improve with RLM are shown in bold.

As shown in Table 3, the performance of P-3-lead is consistently higher than that of P-2-lead or P-6-lead for all methods. This is because two limb leads are measured in the same frontal (coronal) plane and can be used to compute the other four limb leads. The 2-lead and 6-lead sets therefore contain the same amount of ECG information. The 3-lead set, by contrast, includes the precordial lead (V2) in addition to the two limb leads (I, II), and so carries additional ECG information compared with the 2-lead or 6-lead set.
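
The claim that two limb leads determine the other four can be made concrete with the standard Einthoven and Goldberger relations. The sketch below is illustrative only; the function name `derive_limb_leads` is hypothetical and the text itself does not give these formulas.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Compute leads III, aVR, aVL, and aVF sample by sample from leads I and II.

    Uses the standard relations:
        III = II - I
        aVR = -(I + II) / 2
        aVL = I - II / 2
        aVF = II - I / 2
    """
    return {
        "III": [b - a for a, b in zip(lead_i, lead_ii)],
        "aVR": [-(a + b) / 2 for a, b in zip(lead_i, lead_ii)],
        "aVL": [a - b / 2 for a, b in zip(lead_i, lead_ii)],
        "aVF": [b - a / 2 for a, b in zip(lead_i, lead_ii)],
    }
```

Because these four leads are linear combinations of I and II, a 6-lead limb set adds no information beyond the 2-lead set, whereas V2 cannot be obtained this way.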

In addition, comparing the model that combines the Wav2Vec model, the CMSC technique, and the RLM technique with the model that combines only the Wav2Vec model and the CMSC technique shows a substantial difference for every lead combination. This indicates that using RLM together with the Wav2Vec model and the CMSC technique gives better performance when pre-training on 12-lead ECG signals.

Table 2 shows the fine-tuning results of the models pre-trained individually for each lead combination. On the classification task, the Wav2Vec model achieves the best score for every lead combination other than 12-lead. The classification task relies on local feature context for accurate diagnosis. The Wav2Vec model, a local contrastive method, can therefore outperform the neural network model that combines the Wav2Vec model, the CMSC technique, and the RLM technique, which uses both global and local contrastive methods.

On the identification task, the model combining the Wav2Vec model and the CMSC technique achieves the best performance for every lead combination other than 12-lead. In this case, the global and local features learned together for each lead combination can outperform local features alone or global features alone.

In this experimental setting, the model combining the Wav2Vec model, the CMSC technique, and the RLM technique consistently underperforms the model combining only the Wav2Vec model and the CMSC technique for all reduced lead combinations (6-lead, 3-lead, 2-lead). A plausible explanation is that pre-training with RLM used fewer leads, which impairs the representational power of the model compared with using all 12 leads. In other words, reducing the available leads also sharply reduces the number of possible lead combinations. Because the effectiveness of RLM comes from letting the model learn diverse combinations of leads during pre-training, reducing the available leads is ill-suited to a model that must learn robust representations for arbitrary lead sets.

As can be seen in Table 3, the model combining the Wav2Vec model and the RLM technique shows improved performance for all lead combinations on the classification task. On the identification task, however, applying RLM degrades performance for some lead combinations. A plausible main reason is that using RLM with the Wav2Vec model strengthens the ability of the Wav2Vec model to capture local feature context rather than global feature context. As a result, compared with the Wav2Vec model alone, the model combining the Wav2Vec model and the RLM technique performs better on the classification task, which requires local feature context, but worse on the identification task, which requires global feature context.

Table 4 shows the test performance when various augmentations are applied to the model that combines the Wav2Vec model and the CMSC technique. Physio(4) denotes power line noise, electromyographic (EMG) noise, baseline wander, and baseline shift. Physio(3) denotes power line noise, EMG noise, and baseline wander. The CinC score is reported for diagnostic classification (Dx.) and accuracy for patient identification (Id.).

On the other hand, the model combining the CMSC technique and the RLM technique shows substantially improved performance in all cases compared with the model using the CMSC technique alone. It is hypothesized that the contrastive task of the original CMSC technique, which maximizes the agreement between temporally adjacent ECG segments, is too simple for the model to learn valuable representations. The RLM technique helps the model learn diverse combinations of leads. In other words, the model can explore a very large number of augmented ECG samples for each patient during pre-training.

This allows the CMSC technique to learn useful patient characteristics, resulting in much better performance for all lead combinations on both downstream tasks.
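
The contrastive task of maximizing agreement between temporally adjacent segments can be sketched with an InfoNCE-style loss. This is a simplified, one-directional illustration under stated assumptions: the function name `adjacent_segment_loss`, the temperature value, and the use of other recordings in the batch as negatives are choices made here, not details taken from the text.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def adjacent_segment_loss(first, second, temperature=0.1):
    """Average contrastive loss over a batch of recordings.

    `first[i]` and `second[i]` are embeddings of two adjacent, non-overlapping
    segments of recording i (the positive pair). `second[j]` for j != i acts
    as a negative for `first[i]`.
    """
    n = len(first)
    total = 0.0
    for i in range(n):
        sims = [_cosine(first[i], second[j]) / temperature for j in range(n)]
        peak = max(sims)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims))
        total += log_denominator - sims[i]
    return total / n
```

The loss is small when each segment embedding is closest to the embedding of its own adjacent segment, and large when it is closer to a segment from another recording.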

As described above, the present disclosure provides a self-supervised learning method for global and local features by combining the Wav2Vec model and the CMSC technique, both of which are contrastive pre-training methods. The present disclosure also applies RLM, an augmentation technique that randomly masks each lead during training, so that the pre-trained neural network model can be fine-tuned successfully with an arbitrary lead combination when not all 12 leads are available in a downstream task. As a result, the neural network model provided in the present disclosure, which combines the Wav2Vec model, the CMSC technique, and the RLM technique, shows excellent performance in diagnostic classification and patient identification.
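
Random lead masking as summarized above can be sketched as follows. The masking probability `p_mask=0.5`, the choice of zeros as the masked value, and the function name `random_lead_mask` are assumptions for illustration; the text only states that each lead is randomly masked during training.

```python
import random


def random_lead_mask(recording, p_mask=0.5, rng=None):
    """Return a copy of `recording` in which each lead is independently zeroed.

    `recording` is a list of leads, each a list of samples. Each lead is
    replaced by zeros with probability `p_mask` (an assumed value).
    """
    rng = rng or random.Random()
    masked = []
    for lead in recording:
        if rng.random() < p_mask:
            masked.append([0.0] * len(lead))
        else:
            masked.append(list(lead))
    return masked
```

Applying a fresh random mask each time a sample is drawn exposes the model to many different lead subsets of the same recording, which mirrors the zero-padded reduced-lead inputs seen at fine-tuning time.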

The various embodiments of the present disclosure described above may be combined with additional embodiments and may be modified within a scope understandable to those skilled in the art in light of the above detailed description. The embodiments of the present disclosure should be understood in all respects as illustrative and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form. Accordingly, all changes or modifications derived from the meaning and scope of the claims of the present disclosure and their equivalents should be construed as falling within the scope of the present disclosure.

Claims

1. A method for contrastive pre-training of a neural network model based on an electrocardiogram, performed by a computing device comprising at least one processor, the method comprising:
a local information generation step of generating, when an unlabeled raw ECG signal is input, a local representation context using a model based on self-supervised learning;
a global information generation step of generating a global representation context from the local representation context; and
a learning step of pre-training a neural network model through contrastive learning based on training data including the local representation context and the global representation context.

2. The method of claim 1, wherein
the local information generation step includes using a Wav2Vec model, and
the global information generation step includes using a CMSC (Contrastive Multi-segment Coding) technique.

3. The method of claim 1, wherein
the local information generation step further comprises
dividing the raw ECG signal for a preset number of leads into two.

4. The method of claim 1,
further comprising an evaluation step of fine-tuning the pre-trained neural network model using labeled datasets for cardiac arrhythmia classification and patient identification.

5. The method of claim 1, wherein
the local information generation step includes
inputting the raw ECG signal to a CNN to generate a latent vector (Z) in a latent space, and encoding the latent vector (Z) to generate the local representation context (C), and
the learning step includes,
during pre-training, quantizing the latent vector (Z) at preset time steps to generate a quantized feature (Q), and masking (m) the latent vector (Z) before it is provided to a transformer.

6. The method of claim 5, wherein
the neural network model is
trained such that the cosine similarity between the local representation context and the quantized feature is maximized at each masked time step.

7. The method of claim 1, wherein
the global information generation step comprises:
defining, as a positive pair, time segment features of adjacent but non-overlapping time windows within a duration (Si) of the i-th ECG signal in an ECG record of a first patient; and
defining, as a negative pair, time segment features of an ECG signal of a second patient that differ from the time segment features of the ECG signal of the first patient.

8. The method of claim 7, wherein
the learning step includes,
during pre-training, learning each patient through self-supervised learning using the correlation between the ECG records of the first patient and the second patient.

9. A computer program stored in a computer-readable storage medium, wherein the computer program, when executed on one or more processors, performs operations for contrastive pre-training of a neural network model based on an electrocardiogram,
the operations comprising:
generating, when an unlabeled raw ECG signal is input, a local representation context using a model based on self-supervised learning;
generating a global representation context by applying time average pooling to the local representation context; and
pre-training a neural network model through contrastive learning based on training data including the local representation context and the global representation context.

10. A computing device for contrastive pre-training of a neural network model based on an electrocardiogram, comprising:
a processor including at least one core; and
a memory containing program code executable on the processor,
wherein the processor, upon execution of the program code,
generates, when an unlabeled raw ECG signal is input, at least one local representation context using a model based on self-supervised learning,
generates a global representation context by applying time average pooling to the local representation context, and
pre-trains a neural network model through contrastive learning based on training data including the local representation context and the global representation context.