KR102646428B1

KR102646428B1 - Method and apparatus for extracting similar letters using artificial intelligence learning model

Info

Publication number: KR102646428B1
Application number: KR1020230054871A
Authority: KR
Inventors: 오세원; 손진희; 송영환
Original assignee: 대한민국
Priority date: 2023-04-26
Filing date: 2023-04-26
Publication date: 2024-03-12

Abstract

A method for extracting similar characters using an artificial intelligence learning model is disclosed. The method according to an embodiment of the present invention may include: receiving, from a user, first character data that is the target to be checked; extracting third character data from among second character data corresponding to all characters included in a plurality of target documents, based on a preset similar-character determination condition that depends on the format of the first character data; and providing the third character data to a user terminal.

Description

Method and apparatus for extracting similar letters using artificial intelligence learning model

The present invention relates to a method and apparatus for extracting similar characters using an artificial intelligence learning model.

The content described below merely provides background information related to an embodiment of the present invention and does not constitute prior art.

Handwriting refers to the distinctive habits that appear when a person writes, and it therefore differs subtly from person to person. For this reason, handwriting analysis is used in criminal investigations, in disputes over wills, and in similar matters to determine whether a document was written by the person in question. To anyone other than a handwriting expert, samples look alike and are hard to tell apart. However, writing produced under coercion or while copying another person's text also leaves traces that differ from the writer's usual hand, so handwriting is regarded as highly credible evidence.

In conventional handwriting examination, an examiner visually inspects and compares documents to extract and classify identical characters. Extraction and classification therefore take an excessive amount of time, and omissions or errors can occur. Accordingly, there is a need for a character extraction method that reduces the examiner's workload and shortens the time required.

An object of the present invention is to extract similar characters using an artificial intelligence learning model.

Another object of the present invention is to extract and provide characters similar to a character input by a user.

Another object of the present invention is to display and provide the character that the user selects from among the extracted characters.

To achieve the above objects, a method for extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention may include: receiving, from a user, first character data that is the target to be checked; extracting third character data from among second character data corresponding to all characters included in a plurality of target documents, based on a preset similar-character determination condition that depends on the format of the first character data; and providing the third character data to a user terminal.

The second character data may include first fragment images extracted, using a first artificial intelligence learning model, for the region in which each of the characters included in the plurality of target documents is located, and may further include label values obtained by labeling, using a second artificial intelligence learning model, the character corresponding to each first fragment image. The first artificial intelligence learning model is a model that extracts fragment images for regions of the document images corresponding to the plurality of target documents in which a character is expected to exist, and the second artificial intelligence learning model may be a model that labels the character corresponding to each fragment image extracted by the first artificial intelligence learning model.

The receiving of the first character data may include receiving, as the first character data, a second fragment image corresponding to a region designated by the user in the document images corresponding to the plurality of target documents, or receiving, as the first character data, text input by the user through the user terminal.

When the first character data is the second fragment image, the extracting of the third character data may include: calculating, using a third artificial intelligence learning model, the similarity between the second fragment image and each of the first fragment images; and extracting, as the third character data, third fragment images whose similarity is greater than or equal to a preset similarity from among the first fragment images. The third artificial intelligence learning model may be a model that calculates the similarity between a first fragment image and the second fragment image.

When the first character data is the text, the extracting of the third character data may include: extracting, from among the second character data, fourth character data whose label value is identical to the text; and extracting, as the third character data, a fourth fragment image corresponding to the fourth character data.

The providing of the third character data may include: generating a grid-form character list using the fragment images corresponding to the third character data; and providing the first character data and the character list to the user terminal.

The method may further include, after the providing of the third character data: receiving, from the user terminal, the user's selection of a fifth fragment image, which is one of the items in the character list; and displaying a block of a preset size at the region corresponding to the fifth fragment image within the page, among the document images, that contains the fifth fragment image.

The receiving of the first character data may include: receiving handwriting entered directly by the user through the user terminal; and extracting the first character data from the handwriting.

The receiving of the first character data may include: receiving the user's selection of a sixth fragment image, which is one of the first fragment images corresponding to the second character data; and receiving, as the first character data, the text corresponding to the label of the sixth fragment image.

In addition, an apparatus for extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention includes a memory in which at least one program is recorded and a processor that executes the program, and the program may include instructions for performing: receiving, from a user, first character data that is the target to be checked; extracting third character data from among second character data corresponding to all characters included in a plurality of target documents, based on a preset similar-character determination condition that depends on the format of the first character data; and providing the third character data to a user terminal.

According to the present invention, similar characters can be extracted using an artificial intelligence learning model.

In addition, according to the present invention, characters similar to a character input by the user can be extracted and provided.

In addition, according to the present invention, the character selected by the user from among the extracted characters can be displayed and provided.

Figure 1 is a block diagram showing the entities involved in extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 2 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 3 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 4 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 5 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 6 is an operation flowchart showing a method of extracting similar letters using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 7 is a diagram showing a computer system according to an embodiment of the present invention.
Figure 8 is a diagram showing the structure of a handwriting recognition function according to an embodiment of the present invention.
Figure 9 is a diagram showing a method for automatically extracting identical characters and a similarity learning structure according to an embodiment of the present invention.
Figure 10 is a diagram showing an example of character region detection using the CRAFT (Character Region Awareness for Text Detection) model.
Figure 11 is a diagram showing the neural network structure of the CRAFT model.
Figure 12 is a diagram showing the process of making BOX into SCORE in the CRAFT model.
Figure 13 is a diagram showing the learning process of the CRAFT model.
Figure 14 is a diagram showing the results of extracting letters using the CRAFT model.
Figure 15 is a diagram showing examples of positive samples and negative samples for metric learning.
Figure 16 is a diagram showing the effect of the loss in the embedding space in metric learning.
Figure 17 is a diagram illustrating the metric learning method.
Figure 18 is a diagram showing an example of a method for augmenting data used in a metric learning method.
Figure 19 is a diagram showing a target document displaying an area designated by the user in a similar character extraction method using an artificial intelligence learning model according to an embodiment of the present invention.
Figure 20 is a diagram showing an example of the UI of the result screen provided to the user terminal.

The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to explain the invention more completely to a person of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

Figure 1 is a block diagram showing the entities involved in extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.

Referring to FIG. 1, the entities involved in extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention include a similar character extraction device 110 and a user terminal 120.

The similar character extraction device 110 may be a device that receives first character data that the user wishes to check.

The similar character extraction device 110 may be a device that extracts third character data from among second character data corresponding to all characters included in a plurality of target documents, using a preset similar-character determination condition that depends on the format of the first character data.

The similar character extraction device 110 may be a device that provides the third character data to the user terminal 120.

The user terminal 120 may be a device that receives the third character data from the similar character extraction device 110.

The similar character extraction device 110 and the user terminal 120 may be connected to each other through a communication network.

The communication network refers to a connection path through which data is transmitted and received between the above entities. For example, the communication network may encompass wired networks such as LANs (Local Area Networks), WANs (Wide Area Networks), MANs (Metropolitan Area Networks), and ISDNs (Integrated Services Digital Networks), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of communication networks applicable to the present invention is not limited to these.

Figure 2 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.

Referring to FIG. 2, in the method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention, first character data that is the target to be checked may first be received from the user (S210).

Next, third character data may be extracted from among second character data corresponding to all characters included in a plurality of target documents, based on a preset similar-character determination condition that depends on the format of the first character data (S220).

Here, the second character data may include first fragment images extracted, using a first artificial intelligence learning model, for the region in which each of the characters included in the plurality of target documents is located. It may further include label values obtained by labeling, using a second artificial intelligence learning model, the character corresponding to each first fragment image.

Here, the label value may be a value corresponding to the character contained in the first fragment image. For example, if the first fragment image contains the character '가', the first fragment image may be labeled with the label value '가'.

In this case, the first artificial intelligence learning model is a model that extracts fragment images for regions of the document images corresponding to the plurality of target documents in which a character is expected to exist, and the second artificial intelligence learning model may be a model that labels the character corresponding to each fragment image extracted by the first artificial intelligence learning model.
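As a rough illustration of the second character data described above, the following sketch chains a detector and a labeler to produce one record per character. The names `CharRecord`, `crop`, `build_second_char_data`, `detect_regions`, and `label_fragment` are hypothetical and do not appear in the specification; the two callables merely stand in for the first and second artificial intelligence learning models, whose actual interfaces are not specified.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Image = Sequence[Sequence[int]]   # grayscale page as rows of pixel values


@dataclass
class CharRecord:
    """One item of second character data: a fragment image plus its label."""
    page: int
    box: Box
    fragment: List[List[int]]
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the rectangular region `box` out of `image`."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def build_second_char_data(
    pages: Sequence[Image],
    detect_regions: Callable[[Image], List[Box]],      # stand-in for the first model
    label_fragment: Callable[[List[List[int]]], str],  # stand-in for the second model
) -> List[CharRecord]:
    """Detect character regions on every page, crop them, and label each crop."""
    records: List[CharRecord] = []
    for page_no, image in enumerate(pages):
        for box in detect_regions(image):
            fragment = crop(image, box)
            records.append(CharRecord(page_no, box, fragment, label_fragment(fragment)))
    return records
```

Keeping the page number and bounding box alongside each fragment is an assumption made here so that a selected fragment can later be located on its source page.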

For example, the first artificial intelligence learning model may include a CRAFT (Character Region Awareness for Text Detection) model. The CRAFT model can extract characters by locating the region occupied by each character, one character at a time.

Next, the third character data may be provided to the user terminal (S230).
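The dependence of step S220 on the format of the first character data can be sketched as a simple dispatch. This is only an illustration under assumptions: the function `extract_third_char_data` and the two strategy callables `by_label` and `by_similarity` are hypothetical names, and a text query is assumed to arrive as a Python `str` while an image query arrives as rows of pixel values.

```python
from typing import Callable, List, Sequence, Union

Fragment = Sequence[Sequence[int]]   # fragment image as rows of pixel values
Query = Union[str, Fragment]         # first character data: text or fragment image


def extract_third_char_data(
    query: Query,
    by_label: Callable[[str], List[str]],            # used when the query is text
    by_similarity: Callable[[Fragment], List[str]],  # used when the query is an image
) -> List[str]:
    """Pick the determination condition that matches the query's format."""
    if isinstance(query, str):
        return by_label(query)
    return by_similarity(query)
```

A text query is routed to label matching and an image query to similarity scoring, which mirrors the two cases the specification distinguishes.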

According to one embodiment, the receiving of the first character data may include receiving, as the first character data, a second fragment image corresponding to a region designated by the user in the document images corresponding to the plurality of target documents, or receiving, as the first character data, text input by the user through the user terminal.

For example, as shown in FIG. 19, the user may drag to select a region of the document images that contains the character to be input, and the second fragment image corresponding to the selected region may be received as the first character data. Alternatively, the user may type the character to be found in the document images, and the typed text may be received as the first character data.

According to one embodiment, the receiving of the first character data may include receiving handwriting entered directly by the user through the user terminal and extracting the first character data from the handwriting.

For example, the user may write directly with a pen-type input device, and a preset region containing the handwriting may be extracted and received as the first character data. In a criminal investigation, a suspect can be asked to write by hand so that the result can be used to examine the suspect's handwriting.

According to one embodiment, the receiving of the first character data may include receiving the user's selection of a sixth fragment image, which is one of the first fragment images corresponding to the second character data, and receiving, as the first character data, the text corresponding to the label of the sixth fragment image.

For example, the user may select, from among the fragment images contained in the document images, the fragment image that contains the character to be input, and the text corresponding to the label of the selected fragment image may be received as the first character data.

According to one embodiment, before step S210, the second character data may be generated by extracting a first fragment image for the region in which each of the characters included in the plurality of target documents is located and by labeling, using the second artificial intelligence learning model, the character corresponding to each first fragment image.

According to another embodiment, the second character data may instead be generated in the same manner before step S220, that is, by extracting a first fragment image for the region in which each of the characters included in the plurality of target documents is located and by labeling, using the second artificial intelligence learning model, the character corresponding to each first fragment image.

Figure 3 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.

Referring to FIG. 3, in the method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention, when the first character data is the second fragment image, the similarity between the second fragment image and each of the first fragment images may first be calculated using a third artificial intelligence learning model (S310).

In this case, the third artificial intelligence learning model may be a model that calculates the similarity between a first fragment image and the second fragment image.

For example, because the second fragment image is not labeled, the third artificial intelligence learning model may be used to compare the shape of the second fragment image with that of each first fragment image and calculate the similarity.

Here, the third artificial intelligence learning model may be trained by applying a metric learning method based on Proxy Anchor Loss. Identical characters are treated as positive samples and different characters as negative samples, and the model learns to produce similarities through training with Proxy Anchor Loss. Because the method is proxy-based, it can be fast and perform well. When data is scarce, data augmentation can be used to compensate. In that case, as shown in FIG. 18, the same handwriting sample may be augmented by varying its slant, color, background color, sharpness, and so on. Data may also be augmented by adding an environment resembling the ruled lines of a notebook.
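To make the loss concrete, the following is a minimal pure-Python sketch of how a Proxy Anchor Loss value could be computed for one batch, assuming cosine similarity between embeddings and one learnable proxy vector per character class. The function names, the scaling factor `alpha`, the margin `delta`, and their default values are illustrative assumptions rather than settings taken from the specification, and the gradient-based update of the embedding network and proxies is omitted.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def proxy_anchor_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],                 # class index of each embedding
    proxies: List[Sequence[float]],    # one proxy per class, indexed by class
    alpha: float = 32.0,               # assumed scaling factor
    delta: float = 0.1,                # assumed margin
) -> float:
    if not proxies:
        raise ValueError("at least one proxy is required")
    pos_total, neg_total, proxies_with_pos = 0.0, 0.0, 0
    for cls, proxy in enumerate(proxies):
        pos_sum, neg_sum, has_pos = 0.0, 0.0, False
        for emb, lab in zip(embeddings, labels):
            s = cosine(emb, proxy)
            if lab == cls:
                # Same character as the proxy: penalize low similarity.
                pos_sum += math.exp(-alpha * (s - delta))
                has_pos = True
            else:
                # Different character: penalize high similarity.
                neg_sum += math.exp(alpha * (s + delta))
        if has_pos:
            proxies_with_pos += 1
            pos_total += math.log1p(pos_sum)
        neg_total += math.log1p(neg_sum)
    pos_avg = pos_total / proxies_with_pos if proxies_with_pos else 0.0
    return pos_avg + neg_total / len(proxies)
```

With two orthogonal proxies, embeddings that coincide with the proxy of their own class yield a smaller loss than the same embeddings with swapped labels, which is the behavior the training relies on.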

Next, from among the first fragment images, third fragment images whose similarity is greater than or equal to a preset similarity may be extracted as the third character data (S320).
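The thresholding in step S320 can be sketched as follows. This is a minimal illustration that assumes each fragment image has already been mapped to an embedding vector by the third model and that similarity is measured as cosine similarity; the specification does not fix either choice. The function `extract_similar` and the fragment identifiers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def extract_similar(
    query_embedding: Sequence[float],
    fragment_embeddings: Dict[str, Sequence[float]],  # fragment id -> embedding
    threshold: float,                                 # the preset similarity
) -> List[str]:
    """Return ids of fragments whose similarity to the query meets the threshold,
    ordered from most to least similar."""
    scored = [
        (fid, cosine_similarity(query_embedding, emb))
        for fid, emb in fragment_embeddings.items()
    ]
    hits = [(fid, s) for fid, s in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [fid for fid, _ in hits]
```

Sorting the hits by similarity is a design choice made here for presentation; the specification only requires that the threshold be met.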

Figure 4 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.

Referring to FIG. 4, in the method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention, when the first character data is the text, fourth character data whose label value is identical to the text may first be extracted from among the second character data (S410).

Next, a fourth fragment image corresponding to the fourth character data may be extracted as the third character data (S420).
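Steps S410 and S420 amount to filtering the labeled fragments by exact label match. The sketch below assumes the second character data is held as a list of (fragment id, label) records; `LabeledFragment` and `extract_by_label` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class LabeledFragment(NamedTuple):
    fragment_id: str   # identifies the fragment image
    label: str         # label value assigned by the second model


def extract_by_label(
    text: str, second_char_data: List[LabeledFragment]
) -> List[LabeledFragment]:
    """Keep only the records whose label equals the user's text (S410);
    their fragment images form the third character data (S420)."""
    return [item for item in second_char_data if item.label == text]
```

For instance, querying '가' against records labeled '가', '나', '가' returns the first and third records in their original order.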

As an optional embodiment, the fourth character data and character data similar to the fourth character data may be extracted as the third character data.

Specifically, after the fourth character data is extracted, the similarity between the fragment image corresponding to the fourth character data and each of the first fragment images may be calculated using the third artificial intelligence learning model.

Next, based on the calculated similarities, seventh fragment images whose similarity is greater than or equal to a preset similarity from among the first fragment images, together with the fourth character data, may be extracted as the third character data.

As a result, even when a labeling error in the second character data prevents a fragment from being extracted as matching the user's text input, the fragment can still be extracted through the similarity of the fragment images.
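The optional embodiment above can be sketched as label matching followed by a similarity-based expansion. This is an illustration under assumptions: `extract_with_fallback` is a hypothetical name, `similarity` stands in for the third artificial intelligence learning model, and a fragment is added when it is similar enough to any of the exact matches, which is one possible reading of the text.

```python
from typing import Callable, Dict, List


def extract_with_fallback(
    text: str,
    labels: Dict[str, str],                    # fragment id -> label value
    similarity: Callable[[str, str], float],   # stand-in for the third model
    threshold: float,                          # the preset similarity
) -> List[str]:
    """Return exact label matches first, then fragments that look similar to
    at least one exact match even though their label differs."""
    exact = [fid for fid, lab in labels.items() if lab == text]
    result = list(exact)
    seen = set(exact)
    for fid in labels:
        if fid in seen:
            continue
        if any(similarity(fid, match) >= threshold for match in exact):
            result.append(fid)
            seen.add(fid)
    return result
```

In this sketch a fragment mislabeled as '기' is still returned for the query '가' as long as its image is sufficiently similar to a correctly labeled '가' fragment.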

Figure 5 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.

Referring to FIG. 5, in the method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention, a grid-form character list may first be generated using the fragment images corresponding to the third character data (S510).

Next, the first character data and the character list may be provided to the user terminal (S520).

Figure 20 shows an example of the UI (user interface) of a result screen provided to the user terminal, which includes the first character data, the character list, and the document page corresponding to the first character data. The user can view the first character data, the character list, and the document page on a single screen.
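Arranging the extracted fragments into a grid (S510) can be sketched as splitting a flat list into fixed-width rows. The function name `to_grid` and the `columns` parameter are hypothetical; the specification does not state how many columns the grid has.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def to_grid(items: Sequence[T], columns: int) -> List[List[T]]:
    """Split `items` into rows of at most `columns` entries, preserving order."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    return [list(items[i:i + columns]) for i in range(0, len(items), columns)]
```

The last row may be shorter than the others when the number of fragments is not a multiple of the column count.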

FIG. 6 is an operation flowchart showing a method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention.

Referring to FIG. 6, after the step of providing the third character data, the method of extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention may receive, from the user terminal, the user's selection of a fifth fragment image, which is one of the fragment images in the character list (S610).

Next, a block of a preset size may be displayed in the region corresponding to the fifth fragment image within the page, among the document images, that contains the fifth fragment image (S620).
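
One way to place the block in step S620 is sketched below. The description only says that a block of a preset size is displayed in the region of the selected fragment, so centering the block on the fragment and clamping it to the page bounds are assumptions, as are the name `highlight_block` and the `(left, top, right, bottom)` coordinate convention.

```python
def highlight_block(bbox, block_size, page_size):
    """Return a preset-size rectangle centered on a fragment and kept inside the page.

    bbox: (left, top, right, bottom) of the selected fragment image.
    block_size: (width, height) of the preset block.
    page_size: (width, height) of the document page image.
    """
    x0, y0, x1, y1 = bbox
    block_w, block_h = block_size
    page_w, page_h = page_size

    center_x = (x0 + x1) / 2
    center_y = (y0 + y1) / 2

    # Center the block on the fragment, then clamp so it does not leave the page.
    left = min(max(center_x - block_w / 2, 0), max(page_w - block_w, 0))
    top = min(max(center_y - block_h / 2, 0), max(page_h - block_h, 0))
    return (left, top, left + block_w, top + block_h)


print(highlight_block((100, 200, 140, 240), (60, 60), (1000, 1400)))
print(highlight_block((0, 0, 20, 20), (60, 60), (1000, 1400)))
```

The second call shows the clamping: a fragment in the page corner still receives a full-size block that stays within the page.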

As an optional embodiment, character data may be received in morpheme units, and identical or similar character data may be extracted and provided.

Specifically, the first character data may be received from the user either as a fragment image in morpheme units or as text in morpheme units.

In addition, when the second character data is extracted, the parameters of the first artificial intelligence learning model may be adjusted so that the first fragment images are extracted in morpheme units, and the second artificial intelligence learning model may be used to label the morpheme corresponding to each of the first fragment images, thereby extracting the second character data.

Next, when the first character data is a fragment image, its similarity to each of the first fragment images may be calculated, and third fragment images whose similarity is greater than or equal to a preset similarity may be extracted as the third character data. When the first character data is text, fourth character data whose label among the second character data is identical to the text may be extracted, and the corresponding fourth fragment image may be extracted as the third character data.

Next, a grid-type morpheme list may be generated using the morpheme-unit fragment images corresponding to the third character data and provided to the user terminal.

FIG. 7 is a diagram showing a computer system according to an embodiment of the present invention.

An apparatus for extracting similar characters using an artificial intelligence learning model according to an embodiment of the present invention may be implemented in a computer system 1000, such as a computer-readable recording medium.

Referring to FIG. 7, the computer system 1000 may include one or more processors 1010, a memory 1030, a user interface input device 1040, a user interface output device 1050, and a storage 1060, which communicate with one another through a bus 1020. The computer system 1000 may further include a network interface 1070 connected to a network 1080. The processor 1010 may be a central processing unit or a semiconductor device that executes processing instructions stored in the memory 1030 or the storage 1060. The memory 1030 and the storage 1060 may be various types of volatile or non-volatile storage media. For example, the memory may include a ROM 1031 or a RAM 1032.

FIG. 8 is a diagram showing the structure of a handwriting recognition function according to an embodiment of the present invention.

Referring to FIG. 8, the handwriting recognition function may recognize Korean handwriting from a file converted into a digital image and store the coordinates of each character's position.

In this case, after characters similar to a recognized character are matched, the position of each such character may be marked with a preset color and shape, and the marked region may be stored as an image.

FIG. 9 is a diagram showing a method for automatically extracting identical characters and a similarity learning structure according to an embodiment of the present invention.

Referring to FIG. 9, the method for automatically extracting identical characters according to an embodiment of the present invention may use an optical character recognition method suited to Hangul, which has a very large number of characters because each character is a combination of an initial consonant, a medial vowel, and a final consonant.
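
The combinatorial structure mentioned above can be made concrete with the Unicode encoding of precomposed Hangul syllables, which places 19 initial consonants, 21 medial vowels, and 28 final positions (including "no final") in a regular block starting at U+AC00, for 19 × 21 × 28 = 11,172 syllables. The sketch below is an illustration of that arithmetic only and is not part of the described method; the name `decompose` is hypothetical.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"       # 21 medial vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]          # 28, "" means no final

SYLLABLE_BASE = 0xAC00
SYLLABLE_COUNT = len(INITIALS) * len(MEDIALS) * len(FINALS)


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    if len(syllable) != 1:
        raise ValueError("expected a single character")
    offset = ord(syllable) - SYLLABLE_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, len(MEDIALS) * len(FINALS))
    medial, final = divmod(rest, len(FINALS))
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


print(SYLLABLE_COUNT)
print(decompose("한"))
print(decompose("글"))
```

A recognizer that treats every syllable as its own class therefore faces thousands of classes, which is one reason a similarity-based comparison of fragment images can be attractive for Hangul.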

Characters similar to the input character may be extracted from the target documents.

In this case, the similarity between characters of similar shape may be calculated using an artificial intelligence learning model trained by metric learning.

Here, metric learning may refer to a training method in which the similarity between instances of the same character is increased and the similarity between different characters is decreased.

FIG. 10 is a diagram showing an example of character region detection using the CRAFT (Character Region Awareness for Text Detection) model. FIG. 11 is a diagram showing the neural network structure of the CRAFT model. FIG. 12 is a diagram showing the process of converting a box into a score in the CRAFT model. FIG. 13 is a diagram showing the training process of the CRAFT model. FIG. 14 is a diagram showing the result of extracting characters using the CRAFT model.

When characters are detected using the YOLO v5 model, an existing character detection model, printed text can be detected in word units, but detection in character units is difficult, and detection is also difficult when printed and handwritten text are mixed. Accordingly, a method of detecting handwriting in character units using an artificial intelligence learning model based on the CRAFT model may be applied.

Referring to FIG. 10, the CRAFT model can detect characters on signboards and similar objects seen in everyday life. Compared with documents, which are generally written in continuous horizontal lines, the arrangement of such characters is often more complex, but they are highly legible so that information is conveyed even at a glance, and each character can be clearly distinguished.

Referring to FIG. 11, the CRAFT model may have a structure similar to a simple segmentation neural network, in which convolutional layers are stacked on a VGG16-BN backbone and upsampling raises the resolution of the output to a size comparable to that of the input image.

Here, two masks may be obtained as output: a Region Score, which represents the probability that a region corresponds to a character, and an Affinity Score, which represents the probability that a character and the character following it are connected.

Referring to FIG. 12, an annotation box may be converted into a Region Score and an Affinity Score in the form of a 2D Gaussian.
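
The idea of turning a box into a 2D Gaussian score map can be sketched as follows. This is a simplified illustration and not the CRAFT procedure itself: it assumes an axis-aligned box and evaluates the Gaussian directly on the pixel grid, whereas arbitrary quadrilateral boxes would require warping. The name `gaussian_region_score` and the `sigma_scale` parameter are hypothetical.

```python
import math


def gaussian_region_score(width, height, box, sigma_scale=0.25):
    """Build a height x width score map with a 2D Gaussian peak inside one box.

    box: (x0, y0, x1, y1) with inclusive integer pixel coordinates.
    sigma_scale: assumed ratio of the Gaussian sigma to the box side length.
    """
    x0, y0, x1, y1 = box
    center_x, center_y = (x0 + x1) / 2, (y0 + y1) / 2
    sigma_x = max((x1 - x0) * sigma_scale, 1e-6)
    sigma_y = max((y1 - y0) * sigma_scale, 1e-6)

    heatmap = [[0.0] * width for _ in range(height)]
    # Only pixels inside the box (and inside the map) receive a score.
    for y in range(max(y0, 0), min(y1, height - 1) + 1):
        for x in range(max(x0, 0), min(x1, width - 1) + 1):
            dx = (x - center_x) / sigma_x
            dy = (y - center_y) / sigma_y
            heatmap[y][x] = math.exp(-0.5 * (dx * dx + dy * dy))
    return heatmap


heatmap = gaussian_region_score(9, 9, (2, 2, 6, 6))
for row in heatmap:
    print(" ".join(f"{value:.2f}" for value in row))
```

The score peaks at the box center and falls off toward the box edges, so a model trained on such targets is pushed to respond most strongly at the middle of each character.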

Referring to FIG. 13, the CRAFT model may be trained on a dataset in which annotation boxes are given in word units. On the assumption that the Region Score and Affinity Score obtained from an image cropped to a single word are more accurate than the scores obtained from the original image, those scores may be used as pseudo ground truth for training. For more accurate correction, synthetic images generated by compositing arbitrary characters onto arbitrary images may additionally be used for training. Because exact scores can be produced for such images, they can help in the early stage of training.

FIG. 15 is a diagram showing examples of positive and negative samples for metric learning. FIG. 16 is a diagram showing the effect of the loss in the embedding space in metric learning. FIG. 17 is a diagram depicting the metric learning method.

Referring to FIGS. 15 to 17, metric learning may be a training method that trains a network to produce similar embedding vectors for images of the same class and dissimilar embedding vectors for images of different classes. A reference sample, the anchor, is chosen, and the network may be trained to embed images of the same class (positive pairs) close to it and images of different classes (negative pairs) far from it. That is, for the positive and negative samples shown in the example of FIG. 15, the network may be trained so that, as shown in FIG. 16, the distance to the positive sample is small and the distance to the negative sample is large.
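
The anchor, positive, and negative relationship described above is commonly expressed as a triplet margin loss. The description does not name a specific loss function, so the following is only a sketch of one standard choice, using Euclidean distance and a hypothetical margin of 1.0:

$$L = \max\bigl(0,\; d(a, p) - d(a, n) + m\bigr)$$

where $a$, $p$, and $n$ are the embeddings of the anchor, positive, and negative samples, $d$ is the distance, and $m$ is the margin.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: zero once the negative is farther than the positive by the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]

# Negative already far away: the constraint is satisfied and the loss is zero.
print(triplet_loss(anchor, positive, [3.0, 4.0]))
# Negative too close to the anchor: a positive loss pushes it away during training.
print(triplet_loss(anchor, positive, [0.0, 1.5]))
```

Minimizing such a loss over many triplets pulls same-character fragments together and pushes different-character fragments apart, which matches the behavior depicted for the embedding space.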

The specific implementations described in the present invention are embodiments and do not limit the scope of the present invention in any way. For brevity of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example, and in an actual device they may be represented as various replaceable or additional functional, physical, or circuit connections. Furthermore, unless a component is specifically described with terms such as "essential" or "important", it may not be a component that is necessarily required for the application of the present invention.

Therefore, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set forth below but also everything equivalent to the claims, or modified equivalently from them, falls within the scope of the spirit of the present invention.

110: Similar character extraction device 120: User terminal
1000: Computer system 1010: Processor
1020: Bus 1030: Memory
1031: ROM 1032: RAM
1040: User interface input device
1050: User interface output device
1060: Storage 1070: Network interface
1080: Network

Claims

A method of extracting similar characters using an artificial intelligence learning model, the method comprising:
receiving, from a user, first character data to be checked;
extracting third character data from among second character data corresponding to all characters included in a plurality of target documents, based on a preset similar character determination condition according to the format of the first character data; and
providing the third character data to a user terminal,
wherein the second character data includes first fragment images extracted, using a first artificial intelligence learning model, for the regions in which the respective characters included in the plurality of target documents are located,
and further includes label values obtained by labeling the character corresponding to each of the first fragment images using a second artificial intelligence learning model,
wherein the first artificial intelligence learning model is an artificial intelligence learning model that extracts a fragment image for a region in which a character is expected to exist within the document images corresponding to the plurality of target documents, and
wherein the second artificial intelligence learning model is an artificial intelligence learning model that labels the character corresponding to the fragment image extracted by the first artificial intelligence learning model.

(Deleted)

The method of extracting similar characters using an artificial intelligence learning model according to claim 1,
wherein the step of receiving the first character data comprises:
receiving an input including a second fragment image corresponding to a region designated by the user within the document images corresponding to the plurality of target documents; or
receiving, from the user terminal, text entered by the user as the first character data.

The method of extracting similar characters using an artificial intelligence learning model according to claim 3,
wherein the step of extracting the third character data comprises:
when the first character data is the second fragment image, calculating the similarity between the second fragment image and the first fragment images using a third artificial intelligence learning model; and
extracting, as the third character data, third fragment images among the first fragment images whose similarity is greater than or equal to a preset similarity,
wherein the third artificial intelligence learning model is an artificial intelligence learning model that calculates the similarity between the first fragment image and the second fragment image.

The method of extracting similar characters using an artificial intelligence learning model according to claim 3,
wherein the step of extracting the third character data comprises:
when the first character data is the text, extracting, from among the second character data, fourth character data whose label value is identical to the text; and
extracting a fourth fragment image corresponding to the fourth character data as the third character data.

The method of extracting similar characters using an artificial intelligence learning model according to claim 4 or 5,
wherein the step of providing the third character data comprises:
generating a grid-type character list using the fragment images corresponding to the third character data; and
providing the first character data and the character list to the user terminal.

The method of extracting similar characters using an artificial intelligence learning model according to claim 6, further comprising,
after the step of providing the third character data:
receiving, from the user terminal, the user's selection of a fifth fragment image that is one of the fragment images in the character list; and
displaying a block of a preset size in the region corresponding to the fifth fragment image within the page, among the document images, that contains the fifth fragment image.

The method of extracting similar characters using an artificial intelligence learning model according to claim 7,
wherein the step of receiving the first character data comprises:
receiving handwriting entered directly by the user through the user terminal; and
extracting the first character data from the handwriting.

The method of extracting similar characters using an artificial intelligence learning model according to claim 8,
wherein the step of receiving the first character data comprises:
receiving the user's selection of a sixth fragment image that is one of the first fragment images corresponding to the second character data; and
receiving text corresponding to the label of the sixth fragment image as the first character data.

An apparatus for extracting similar characters using an artificial intelligence learning model, the apparatus comprising:
a memory in which at least one program is recorded; and
a processor that executes the program,
wherein the program includes instructions for performing:
receiving, from a user, first character data to be checked;
extracting third character data from among second character data corresponding to all characters included in a plurality of target documents, based on a preset similar character determination condition according to the format of the first character data; and
providing the third character data to a user terminal,
wherein the second character data includes first fragment images extracted, using a first artificial intelligence learning model, for the regions in which the respective characters included in the plurality of target documents are located,
and further includes label values obtained by labeling the character corresponding to each of the first fragment images using a second artificial intelligence learning model,
wherein the first artificial intelligence learning model is an artificial intelligence learning model that extracts a fragment image for a region in which a character is expected to exist within the document images corresponding to the plurality of target documents, and
wherein the second artificial intelligence learning model is an artificial intelligence learning model that labels the character corresponding to the fragment image extracted by the first artificial intelligence learning model.