KR20180122829A

KR20180122829A - Method, apparatus, and computer program for selecting music based on image

Info

Publication number: KR20180122829A
Application number: KR1020170056908A
Authority: KR
Inventors: 하정우; 김정명; 김정희; 조재윤; 정채원
Original assignee: 네이버 주식회사
Priority date: 2017-05-04
Filing date: 2017-05-04
Publication date: 2018-11-14
Also published as: KR102011099B1

Abstract

The present invention discloses a method, an apparatus, and a computer program that recommend to a user a sound source corresponding to an image, based on the result of learning the relationship between images and sound sources by machine learning. According to an embodiment of the present invention, a method for selecting a sound source based on an image comprises: receiving an image from a user terminal and extracting a feature vector of the image; and, with reference to a map in which a sound source vector is mapped for each of a plurality of sound sources, calculating a similarity between the image feature vector and at least some of the sound source vectors, and selecting a sound source corresponding to the image feature vector based on the similarity.

Description

TECHNICAL FIELD [0001] The present invention relates to an image-based sound source selection method, apparatus, and computer program.

Embodiments of the present invention relate to an image-based sound source selection method, apparatus, and computer program.

With the development of media and the spread of the Internet, we can easily access music anytime, anywhere. We tend to choose and listen to music according to personal taste, the category of the music, or the weather or situation. However, when a user selects music directly in this way, the range of selection is limited to the music the user already knows.

One embodiment of the present invention provides an image-based sound source selection method, apparatus, and computer program. One embodiment provides a method, apparatus, and computer program that select a sound source corresponding to an image selected by a user and recommend it to the user. One embodiment provides a method, apparatus, and computer program that recommend to a user a sound source corresponding to an image, based on the result of learning the relationship between images and sound sources by machine learning.

An embodiment of the present invention discloses an image-based sound source selection method comprising: receiving an image from a user terminal and extracting a feature vector of the image; and a sound source selection step of calculating, with reference to a map in which a sound source vector is mapped for each of a plurality of sound sources, a similarity between the image feature vector and at least some of the sound source vectors, and selecting a sound source corresponding to the image feature vector based on the similarity.

In this embodiment, the map may be a second map in which text vectors and sound source vectors are mapped, and the sound source selection step may include: a first selection step of selecting one or more texts based on a first similarity between the feature vector of the image and a text vector preset for each of a plurality of texts; and a second selection step of selecting, with reference to the second map, a sound source corresponding to the selected text.

In this embodiment, the second selection step may select the sound source based on a second similarity between the text vector corresponding to the selected text and at least some of the sound source vectors.

In this embodiment, the first selection step may select one or more texts based on the first similarity, the second selection step may select one or more sound sources based on a second similarity between the selected text and at least some of the sound source vectors, and the sound source selection step may select the sound source according to a ranking of the sound sources determined from the first similarity and the second similarity.
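The two-stage selection described above can be sketched as follows. This is a minimal illustration, not the method of the patent: the use of cosine similarity, the product of the two similarities as the ranking key, the shared vector space for text vectors, and the names `cosine`, `rank_sound_sources`, and `top_texts` are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_sound_sources(image_vec, text_vecs, sound_vecs, top_texts=2):
    """Rank sound sources for an image via intermediate texts.

    text_vecs:  {text: vector}            (hypothetical preset text vectors)
    sound_vecs: {sound_source_id: vector} (hypothetical second map)
    """
    # First selection: keep the texts most similar to the image.
    first = sorted(
        ((cosine(image_vec, vec), text) for text, vec in text_vecs.items()),
        reverse=True,
    )[:top_texts]

    # Second selection: compare each selected text with every sound source.
    best = {}
    for first_sim, text in first:
        for sound_id, sound_vec in sound_vecs.items():
            second_sim = cosine(text_vecs[text], sound_vec)
            combined = first_sim * second_sim  # assumed way to combine both
            best[sound_id] = max(best.get(sound_id, float("-inf")), combined)

    # Ranking determined from the first and second similarities.
    return sorted(best, key=best.get, reverse=True)

ranking = rank_sound_sources(
    [1.0, 0.0],
    {"beach": [0.9, 0.1], "night": [0.0, 1.0], "sea": [0.8, 0.2]},
    {"song_a": [1.0, 0.1], "song_b": [0.1, 1.0]},
)
```

Other combinations, such as a weighted sum of the two similarities, would fit the same description equally well.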

In this embodiment, the map may include a first map, in which a plurality of image feature vectors and a plurality of text vectors are mapped, and the second map. The text corresponding to each text vector mapped on the first map includes a sequence of a plurality of words, and the text corresponding to each text vector mapped on the second map includes a single word. The first selection step may select the one or more texts based on a first similarity, on the first map, between the feature vector of the image and at least some of the plurality of text vectors. The second selection step may select the sound source based on a second similarity between the text vector on the second map corresponding to each word included in the one or more texts selected in the first selection step and at least some of the sound source vectors.

In this embodiment, the sound source selection step may select the sound source in consideration of the first similarity, the second similarity, and a score corresponding to each word included in the one or more texts selected in the first selection step.

In this embodiment, the score may be calculated based on at least one of: a base score assigned in advance according to the frequency of each word across the plurality of texts corresponding to the plurality of text vectors mapped on the first map; and an additional score calculated according to the frequency of each word across the selected one or more texts.
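A per-word score of this kind might be computed as in the sketch below. The text only states that both scores depend on word frequency; the choice of relative frequency, the linear combination, whitespace tokenization, and the names `word_scores`, `base_weight`, and `extra_weight` are assumptions for illustration. An inverse-frequency weighting (down-weighting common words) would be an equally plausible reading.

```python
from collections import Counter

def word_scores(all_texts, selected_texts, base_weight=1.0, extra_weight=1.0):
    """Score each word that appears in the selected texts.

    all_texts:      every text on the (hypothetical) first map
    selected_texts: the texts chosen in the first selection step
    """
    base_counts = Counter(w for text in all_texts for w in text.split())
    extra_counts = Counter(w for text in selected_texts for w in text.split())
    total_base = sum(base_counts.values())
    total_extra = sum(extra_counts.values())

    scores = {}
    for word, count in extra_counts.items():
        # Base score: relative frequency over all texts (assumed form).
        base = base_counts[word] / total_base if total_base else 0.0
        # Additional score: relative frequency over the selected texts.
        extra = count / total_extra
        scores[word] = base_weight * base + extra_weight * extra
    return scores

scores = word_scores(
    ["a dog on the beach", "a cat on the sofa"],
    ["a dog on the beach"],
)
```

In practice the base scores would be computed once when the first map is built, since they do not depend on the input image.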

In this embodiment, the method may further include generating a second map that includes a sound source vector corresponding to each of the plurality of sound sources and text vectors corresponding to words, with reference to words included in at least one of a hashtag designated for each of the plurality of sound sources, its lyrics, and text entered by users; and the sound source selection step may select the sound source with reference to the second map and a first map in which a plurality of image feature vectors and a plurality of text vectors are mapped.

In this embodiment, the map may be a third map in which a plurality of sound source vectors are mapped, and the sound source selection step may select the sound source corresponding to the extracted image feature vector based on a similarity, on the third map, between the extracted image feature vector and at least some of the plurality of sound source vectors.

In this embodiment, the method may further include generating a third map that includes a feature vector for each of a plurality of images and a sound source vector for each of the plurality of sound sources, with reference to (i) image-text data including the plurality of images and text designated for each of them and (ii) words included in at least one of a hashtag designated for each of the plurality of sound sources, its lyrics, and text entered by users; and the sound source selection step may select the sound source with reference to the generated third map.

In this embodiment, the method may further include providing a playlist that lists the selected sound sources based on the similarity.

Another embodiment of the present invention discloses an image-based sound source selection apparatus including: an image feature value extraction unit that receives an image from a user terminal and extracts a feature vector of the image; and a sound source selection unit that, with reference to a map in which a sound source vector is mapped for each of a plurality of sound sources, calculates a similarity between the image feature vector and at least some of the sound source vectors and selects a sound source corresponding to the image feature vector based on the similarity.

In this embodiment, the map may be a second map in which text vectors and sound source vectors are mapped, and the sound source selection unit may include: a first similarity calculation unit that selects one or more texts based on a first similarity between the feature vector of the image and a text vector preset for each of a plurality of texts; and a second similarity calculation unit that selects, with reference to the second map, a sound source corresponding to the selected text.

In this embodiment, the second similarity calculation unit may select the sound source based on a second similarity between the text vector corresponding to the selected text and at least some of the sound source vectors.

In this embodiment, the map may include a first map, in which a plurality of image feature vectors and a plurality of text vectors are mapped, and the second map. The text corresponding to each text vector mapped on the first map includes a sequence of a plurality of words, and the text corresponding to each text vector mapped on the second map includes a single word. The first similarity calculation unit may select the one or more texts based on a first similarity, on the first map, between the feature vector of the image and at least some of the plurality of text vectors. The second similarity calculation unit may select the sound source based on a second similarity between the text vector on the second map corresponding to each word included in the one or more texts selected in the first selection step and at least some of the sound source vectors.

In this embodiment, the apparatus may further include a map construction unit that generates a second map including a sound source vector corresponding to each of the plurality of sound sources and text vectors corresponding to words, with reference to words included in at least one of a hashtag designated for each of the plurality of sound sources, its lyrics, and text entered by users; and the sound source selection unit may select the sound source with reference to the second map and a first map in which a plurality of image feature vectors and a plurality of word vectors are mapped.

In this embodiment, the map may be a third map in which a plurality of sound source vectors are mapped, and the sound source selection unit may further include a third similarity calculation unit that selects the sound source corresponding to the extracted image feature vector based on a similarity, on the third map, between the extracted image feature vector and at least some of the plurality of sound source vectors.

In this embodiment, the apparatus may further include a third map construction unit that generates a third map including a feature vector for each of a plurality of images and a sound source vector for each of the plurality of sound sources, with reference to (i) image-text data including the plurality of images and text designated for each of them and (ii) words included in at least one of a hashtag designated for each of the plurality of sound sources, its lyrics, and text entered by users; and the sound source selection unit may select the sound source with reference to the generated third map.

In this embodiment, the sound source selection unit may provide a playlist that lists the selected sound sources based on the similarity.

Another embodiment of the present invention discloses a computer program stored on a medium for executing the above-described method.

Other aspects, features, and advantages besides those described above will become apparent from the following drawings, claims, and detailed description of the invention.

These general and specific aspects may be implemented using a system, a method, a computer program, or any combination of systems, methods, and computer programs.

The image-based sound source selection method, apparatus, and computer program according to embodiments of the present invention contribute to user convenience by selecting a sound source corresponding to an image selected by the user and recommending it to the user.

The image-based sound source selection method, apparatus, and computer program according to embodiments of the present invention contribute to services that link images and sound sources by selecting a sound source corresponding to an image.

FIG. 1 schematically illustrates a sound source recommendation system according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically illustrating a sound source recommendation apparatus provided in the server shown in FIG. 1.
FIG. 3 is a flowchart illustrating a sound source recommendation method according to an embodiment of the present invention, performed by the sound source recommendation apparatus shown in FIG. 2.
FIG. 4 is a block diagram illustrating another example of the sound source recommendation apparatus shown in FIG. 2.
FIG. 5 is a flowchart illustrating a sound source recommendation method according to an embodiment of the present invention, performed by the sound source recommendation apparatus shown in FIG. 4.
FIG. 6 is an example of a block diagram schematically illustrating the image feature value extraction unit shown in FIG. 2.
FIG. 7 is a flowchart illustrating an image feature value extraction method according to an embodiment of the present invention, performed by the image feature value extraction unit shown in FIG. 6.
FIG. 8 is an example of a block diagram schematically illustrating a first embodiment of the map construction unit shown in FIG. 4.
FIG. 9 is a flowchart illustrating a map construction method according to an embodiment of the present invention, performed by the map construction unit shown in FIG. 8, and is a detailed flowchart of step S30 shown in FIG. 5.
FIG. 10 is an example of a block diagram schematically illustrating the second map construction unit shown in FIG. 8.
FIG. 11 is a flowchart illustrating a second map construction method according to an embodiment of the present invention, performed by the second map construction unit shown in FIG. 10.
FIG. 12 is an example of a block diagram schematically illustrating a second embodiment of the map construction unit shown in FIG. 4.
FIG. 13 is a flowchart illustrating a map construction method according to an embodiment of the present invention, performed by the map construction unit shown in FIG. 12, and is a detailed flowchart of step S30 shown in FIG. 5.
FIG. 14 is a block diagram schematically illustrating the first map construction unit shown in FIGS. 8 and 12.
FIG. 15 is a flowchart illustrating a first map construction method according to an embodiment of the present invention, performed by the first map construction unit shown in FIG. 14.
FIG. 16 is a block diagram schematically illustrating the configuration of the sound source selection unit of FIGS. 2 and 4 according to a first embodiment of the present invention.
FIG. 17 is a flowchart illustrating a sound source selection method according to an embodiment of the present invention, performed by the sound source selection unit shown in FIG. 16.
FIG. 18 is a block diagram schematically illustrating the configuration of the sound source selection unit of FIGS. 2 and 4 according to a second embodiment of the present invention.
FIG. 19 is a flowchart illustrating a sound source selection method according to an embodiment of the present invention, performed by the sound source selection unit shown in FIG. 18.
FIG. 20 shows an example of the first map.
FIG. 21 shows an example of the second map.
FIG. 22 shows an example of the third map.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

In the following embodiments, terms such as first and second are used not in a limiting sense but to distinguish one component from another. In the following embodiments, a singular expression includes the plural unless the context clearly indicates otherwise. In the following embodiments, terms such as "include" or "have" mean that the features or components described in the specification are present, and do not preclude the possibility that one or more other features or components may be added. In the drawings, the sizes of components may be exaggerated or reduced for convenience of explanation. For example, the size and thickness of each component shown in the drawings are shown arbitrarily for convenience of explanation, so the present invention is not necessarily limited to what is shown.

FIG. 1 schematically illustrates a sound source recommendation system according to an embodiment of the present invention.

Referring to FIG. 1, a sound source recommendation system according to an embodiment of the present invention includes a server 100, a user terminal 200, and a network 300 connecting them.

The sound source recommendation system according to an embodiment of the present invention provides a sound source recommendation service. Specifically, the sound source recommendation service provided according to an embodiment is a service that selects, based on an image, a sound source that suits the image. The system builds mapping information between images and sound sources using a machine learning technique and uses the resulting database to recommend a sound source mapped to an image. The service may select a sound source that suits an image selected by the user terminal 200 and recommend it to the user terminal 200, or it may select a sound source that suits a specific image and provide the specific image together with the matching sound source.

Meanwhile, according to a variation of the sound source recommendation system, the system provides an image recommendation service that selects, based on a sound source, an image that suits the sound source. This variation builds mapping information between images and sound sources using a machine learning technique and uses the resulting database to recommend an image mapped to a sound source. The variation may select an image that suits a sound source selected by the user terminal 200, or it may select an image that suits a specific sound source and provide the specific sound source together with the matching image.

The following describes various embodiments in which the sound source recommendation system selects, based on an image, a sound source mapped to the image; however, the embodiments described below apply equally to the variation in which an image mapped to a sound source is selected based on the sound source.

The sound source recommendation system according to an embodiment of the present invention may build first mapping information between images and text using a machine learning technique; build second mapping information between sound sources and text based on information such as hashtags that users assign to a sound source, the lyrics of the sound source, or text that users enter when sharing the sound source on a social networking service (SNS); and select and recommend a sound source corresponding to an image according to the first mapping information and the second mapping information.

The sound source recommendation system according to an embodiment of the present invention may merge the first mapping information and the second mapping information described above into a single set of third mapping information, and select and recommend a sound source corresponding to an image according to the third mapping information.

The server 100 according to an embodiment of the present invention provides the user terminal 200 with a playlist containing the selected sound sources. The server 100 may provide the sound sources in the playlist sorted by the similarity between the image and each sound source.
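Ordering a playlist by similarity can be sketched in a few lines. The function name `build_playlist`, the dictionary input, and the optional `limit` parameter are hypothetical; the text only states that the sound sources may be sorted by their similarity to the image.

```python
def build_playlist(similarities, limit=None):
    """Return sound source IDs ordered from most to least similar.

    similarities: {sound_source_id: similarity to the input image}
    limit:        optional maximum playlist length (assumed parameter)
    """
    ordered = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if limit is not None:
        ordered = ordered[:limit]
    return [sound_id for sound_id, _ in ordered]

playlist = build_playlist({"song_a": 0.2, "song_b": 0.9, "song_c": 0.5}, limit=2)
```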

Referring to FIG. 1, the user terminal 200 is a communication terminal capable of using a web service in a wired or wireless communication environment. The user terminal 200 may include a camera unit and may have a touch screen. The user terminal 200 may be the user's personal computer 201 or the user's portable terminal 202. Although the portable terminal 202 is illustrated as a smartphone in FIG. 1, the present invention is not limited thereto; any terminal capable of running an application that can access the sound source recommendation service provided by an embodiment of the present invention may be used without limitation.

The user terminal 200 further includes a display unit that displays a screen and an input device that receives data from the user. The input device may include, for example, a keyboard, a mouse, a trackball, a microphone, buttons, or a touch panel, but is not limited thereto.

The network 300 connects the user terminal 200 and the server 100. For example, the network 300 provides a connection path through which the user terminal 200 can transmit and receive packet data after connecting to the server 100.

Although not shown in the drawings, the server 100 according to an embodiment of the present invention may include a memory, an input/output unit, a program storage unit, a control unit, and the like.

FIG. 2 is a block diagram schematically illustrating the sound source recommendation apparatus 110 provided in the server 100 shown in FIG. 1.

The sound source recommendation apparatus 110 according to an embodiment of the present invention may correspond to at least one processor or may include at least one processor. Accordingly, the sound source recommendation apparatus 110 may operate as part of another hardware device such as a microprocessor or a general-purpose computer system. The sound source recommendation apparatus 110 may be installed in the server 100 shown in FIG. 1.

To avoid obscuring the features of this embodiment, FIG. 2 shows only the components of the sound source recommendation apparatus 110 that are relevant to this embodiment. Those of ordinary skill in the art related to this embodiment will therefore understand that other general-purpose components may be included in addition to those shown in FIG. 2.

FIG. 3 is a flowchart illustrating a sound source recommendation method according to an embodiment of the present invention, as performed by the sound source recommendation apparatus 110 shown in FIG. 2. Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 2 and 3 together.

Referring to FIG. 2, the sound source recommendation apparatus 110 according to an embodiment of the present invention includes an image feature value extraction unit 111, a sound source selection unit 112, and a storage 113.

In step S31, the image feature value extraction unit 111 according to an embodiment receives an image and extracts a feature vector of the received image. The image may be input from the user terminal 200, but is not limited thereto, and may instead be selected and input by the server 100 itself.

In step S32, the sound source selection unit 112 according to an embodiment selects a sound source corresponding to the feature vector of the image extracted in step S31. The sound source selection unit 112 refers to a map on which a sound source vector for each of a plurality of sound sources is mapped, calculates a similarity between the image feature vector and at least some of the plurality of sound source vectors, and selects the sound source corresponding to the image feature vector based on the calculated similarity. The map on which the sound source vectors are mapped may be constructed based on at least one of hashtags designated for each of the plurality of sound sources, lyrics, and text input by a user. The storage 113 according to an embodiment stores the map.

Although not shown in FIG. 3, the sound source recommendation method according to an embodiment of the present invention may further include a step in which the sound source selection unit 112 of FIG. 2 provides a playlist including the sound source selected in step S32. The playlist may list a plurality of sound sources selected in step S32, and may order them based on the similarity between the image and each sound source calculated in step S32. For example, the playlist may list the sound sources in descending order of similarity to the image.
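The selection in step S32 and the similarity-ordered playlist can be sketched as follows. This is a minimal illustration, not the patented implementation: the description does not name a similarity measure, so cosine similarity is an assumption here, and the function names, the `top_k` parameter, and the toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_playlist(image_vector, sound_source_map, top_k=3):
    """Score every sound source vector against the image vector and
    return the top_k (id, similarity) pairs, most similar first."""
    scored = [(sid, cosine_similarity(image_vector, vec))
              for sid, vec in sound_source_map.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy map: sound source ID -> sound source vector.
sound_source_map = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.7, 0.7, 0.0],
    "track_c": [0.0, 0.0, 1.0],
}
playlist = build_playlist([1.0, 0.1, 0.0], sound_source_map, top_k=2)
print([sid for sid, _ in playlist])
```

Sorting by the score computed during selection means the playlist order needs no second pass over the map.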

FIG. 4 is a block diagram showing another example of the sound source recommendation apparatus 110 shown in FIG. 2. FIG. 5 is a flowchart illustrating a sound source recommendation method according to an embodiment of the present invention, as performed by the sound source recommendation apparatus 110 shown in FIG. 4. Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 4 and 5 together.

Referring to FIG. 4, the sound source recommendation apparatus 110 according to an embodiment of the present invention further includes a map construction unit 114.

Referring to FIG. 5, the sound source recommendation method according to an embodiment of the present invention further includes step S30 before step S31 shown in FIG. 3.

In step S30, the map construction unit 114 constructs a map that serves as the basis of the sound source recommendation service of the present invention, and stores the map in the storage 113. The map generated in step S30 includes association information between image feature vectors and sound source vectors. The map construction unit 114 may construct the map using an artificial neural network algorithm.

According to an embodiment of the present invention (hereinafter referred to as the "first embodiment"), in step S30 the map construction unit 114 constructs a first map on which image feature vectors and text vectors are mapped and a second map on which text vectors and sound source vectors are mapped, and stores each in the storage 113. The first map may be constructed using publicly available open source. That is, the map construction unit 114 may obtain an open-source map in which a vector is defined for each of a plurality of images and a plurality of texts according to the result of learning the relationship between the images and the texts, with each vector mapped onto an N-dimensional map, and may store it in the storage 113 either as is or after modification. The first map may also store the actual text corresponding to each text vector.

According to an embodiment, the map construction unit 114 refers to text included in at least one of the hashtags designated for each of a plurality of sound sources, lyrics, and text input by a user, and stores in the storage 113 a second map on which a plurality of sound source vectors and text vectors are mapped. The second map may also store the actual text corresponding to each text vector.

According to another embodiment of the present invention (hereinafter referred to as the "second embodiment"), in step S30 the map construction unit 114 stores in the storage 113 a third map on which image feature vectors and sound source vectors are mapped. The map construction unit 114 may generate the third map by merging the first map and the second map. The map construction unit 114 may also generate the third map by referring to the first map and separate information. The separate information may include information from which the association between sound sources and text can be extracted, such as hashtags designated for each sound source, lyrics, and text input by a user. The map construction unit 114 may also generate the third map using image-text data, which is the training data of the first map, and sound source-text data, which is the training data of the second map. The image-text data includes a plurality of images and text designated for each of the images, and the sound source-text data includes a plurality of sound sources and text designated for each of the sound sources.

Here, the text may include one or more words. For example, the text may be a single word. As another example, the text may be a sentence, that is, a sequence of a plurality of words.

In step S32, the sound source selection unit 112 selects a sound source using the map constructed in step S30 and stored in the storage 113. According to the first embodiment, in step S32 the sound source selection unit 112 selects a sound source in a two-stage manner: it refers to the first map to select text corresponding to the image feature vector, and then refers to the second map to select a sound source corresponding to the selected text. According to the second embodiment, in step S32 the sound source selection unit 112 selects a sound source in a one-stage manner, using the third map to select a sound source vector corresponding to the image feature vector.
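The difference between the two-stage selection of the first embodiment and the one-stage selection of the second embodiment can be sketched as below. The sketch assumes that "corresponding to" means "nearest by cosine similarity", which the description does not state; the dictionaries standing in for the first, second, and third maps, and all names and vectors, are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(query, candidates):
    """Key of the candidate vector most similar to the query vector."""
    return max(candidates, key=lambda k: cosine(query, candidates[k]))

def select_two_stage(image_vec, first_map_text, second_map_text, second_map_sound):
    # Stage 1 (first map): image feature vector -> closest text.
    word = nearest(image_vec, first_map_text)
    # Stage 2 (second map): that text's vector -> closest sound source.
    return word, nearest(second_map_text[word], second_map_sound)

def select_one_stage(image_vec, third_map_sound):
    # Third map: image feature vector -> closest sound source directly.
    return nearest(image_vec, third_map_sound)

# Hypothetical toy maps. The first and second maps use different spaces.
first_map_text = {"sea": [0.9, 0.1], "city": [0.1, 0.9]}
second_map_text = {"sea": [1.0, 0.0, 0.0], "city": [0.0, 1.0, 0.0]}
second_map_sound = {"song_1": [0.8, 0.2, 0.1], "song_2": [0.1, 0.9, 0.2]}
third_map_sound = {"song_1": [0.8, 0.3], "song_2": [0.2, 0.9]}

image_vec = [1.0, 0.2]
word, two_stage_pick = select_two_stage(
    image_vec, first_map_text, second_map_text, second_map_sound)
one_stage_pick = select_one_stage(image_vec, third_map_sound)
print(word, two_stage_pick, one_stage_pick)
```

The two-stage path exposes the intermediate text, which can be useful for explaining a recommendation, while the one-stage path needs only a single nearest-neighbor search.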

The map stored in the storage 113 according to an embodiment of the present invention is an N-dimensional map onto which N-dimensional vectors may be mapped. The sound source recommendation apparatus 110 according to the first embodiment may include a first map in which a plurality of image feature vectors and a plurality of text vectors are mapped onto a single map, and a second map in which a plurality of text vectors and a plurality of sound source vectors are mapped onto a single map. The sound source recommendation apparatus 110 according to the second embodiment may include a third map in which a plurality of image feature vectors and a plurality of sound source vectors are mapped onto a single map. The first map, the second map, and the third map may be N-dimensional maps onto which a plurality of parameters are mapped together, rather than lookup tables that match parameters one-to-one. With maps of this kind, the sound source recommendation apparatus 110 according to an embodiment of the present invention can readily handle similar words, and can obtain a desired number of outputs by setting a reference value.
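One way to read "a desired number of outputs by setting a reference value" is as a similarity threshold over the N-dimensional map: unlike a one-to-one lookup table, a vector query returns every entry that is close enough. The sketch below illustrates that reading only; the threshold semantics, cosine similarity, and all names and vectors are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_above_threshold(query, vectors, threshold):
    """Return (name, similarity) for every mapped vector whose similarity
    to the query is at least `threshold`, most similar first."""
    hits = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    kept = [h for h in hits if h[1] >= threshold]
    return sorted(kept, key=lambda h: h[1], reverse=True)

# Hypothetical toy map of sound source vectors.
vectors = {"song_1": [1.0, 0.0], "song_2": [0.8, 0.6], "song_3": [0.0, 1.0]}
query = [1.0, 0.0]

loose = select_above_threshold(query, vectors, 0.5)   # lower reference value
strict = select_above_threshold(query, vectors, 0.9)  # higher reference value
print([n for n, _ in loose], [n for n, _ in strict])
```

Raising the reference value narrows the result set, and lowering it widens the set without any change to the map itself.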

FIG. 6 is an example of a block diagram schematically showing the image feature value extraction unit 111 shown in FIG. 2. FIG. 7 is a flowchart illustrating an image feature value extraction method according to an embodiment of the present invention, as performed by the image feature value extraction unit 111 shown in FIG. 6. Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 6 and 7 together.

Referring to FIG. 6, the image feature value extraction unit 111 includes a filter determination unit 11, a region division unit 12, a region feature value extraction unit 13, and a representative feature value extraction unit 14.

In step S71, the filter determination unit 11 obtains an image. To receive a recommendation of a sound source that suits a specific image, a user may transmit the image file directly to the server 100 or transmit identification information of the image to the server 100. Alternatively, the server 100 may input a predetermined image to the sound source recommendation apparatus 110 in order to provide a service that links images and sound sources.

In step S72, the filter determination unit 11 according to an example applies a convolutional neural network (CNN) to determine a filter to be applied to the image in order to extract feature values from the image. The specific algorithm is not limited to the convolutional neural network described above.

In step S73, the region division unit 12 divides the image into a plurality of regions.

In step S74, the region feature value extraction unit 13 applies the filter determined in step S72 to each region obtained in step S73, and extracts a feature value for each region. According to an embodiment, the feature value extracted in step S74 is expressed as an n×m vector that can be mapped onto an n×m-dimensional map.

In step S75, the representative feature value extraction unit 14 extracts a representative feature value for the image obtained in step S71 based on the region feature values extracted in step S74. The representative feature value extraction unit 14 may calculate the representative feature value by applying a weight to each region feature value extracted in step S74. However, a weighted sum is only one example of how the representative feature value may be calculated, and the present invention is not limited thereto. The representative feature value calculated in step S75 may also be expressed as an n×m matrix, and may be an n×m vector that can be mapped onto an n×m-dimensional map. The weight for each region may be determined based on the feature value of that region, but is not limited thereto. The weight for each region may be determined by machine learning using an attention model.
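The weighted sum in step S75 can be illustrated with a small attention-style computation. This is a sketch under assumptions: region features are flattened to plain lists, the weights come from a softmax over dot products with a query vector, and `attention_query` stands in for parameters that an attention model would learn. None of these specifics are given in the description.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def representative_feature(region_features, attention_query):
    """Weighted sum of region feature vectors.

    Each region's weight is derived from its own feature value via a dot
    product with `attention_query` (a hypothetical learned vector)."""
    scores = [sum(f * q for f, q in zip(feat, attention_query))
              for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    rep = [sum(w * feat[i] for w, feat in zip(weights, region_features))
           for i in range(dim)]
    return rep, weights

# Hypothetical features for three image regions.
region_features = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attention_query = [1.0, 0.0]

rep, weights = representative_feature(region_features, attention_query)
print(rep, weights)
```

Because the weights sum to one, the representative value stays on the same scale as the region features, and the region scoring highest against the query dominates the result.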

When the image feature value extraction unit 111 according to an embodiment of the present invention extracts the feature value of an image by dividing the image into a plurality of regions as shown in FIG. 7, calculating a feature value for each region, and calculating the representative feature value from these, the feature values of the main regions of the image can be reflected in the representative feature value with a high weight. As a result, the substantive features of the image are reflected in its representative feature value, and the accuracy of the representative feature value improves.

However, the present invention is not limited thereto. Depending on the implementation environment, the image feature value extraction unit 111 according to an embodiment of the present invention may calculate the feature value of the entire image as the representative feature value without dividing the image, thereby reducing the processing steps and processing time required to calculate the representative feature value.

FIG. 8 is an example of a block diagram schematically showing the first embodiment of the map construction unit 114 shown in FIG. 4. FIG. 9 is a flowchart illustrating a map construction method according to an embodiment of the present invention, as performed by the map construction unit 114 shown in FIG. 8, and is a detailed flowchart of step S30 shown in FIG. 5. Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 8 and 9 together.

Referring to FIG. 8, the map construction unit 114 according to the first embodiment of the present invention includes a first map construction unit 41 and a second map construction unit 42.

In step S301, the first map construction unit 41 according to an embodiment of the present invention constructs a first map on which image feature vectors and text vectors are mapped. The first map construction unit 41 may obtain a published first map and store it in the storage 113, modify a published first map and store it in the storage 113, or generate the first map itself and store it in the storage 113.

For example, the first map construction unit 41 obtains a published first map that maps image feature vectors and text vectors, and stores the obtained first map in the storage 113.

As another example, the first map construction unit 41 obtains a database of images paired with one or more corresponding texts, generates a first map that maps image feature vectors and text vectors by applying a machine learning method to the obtained database, and stores the first map in the storage 113. When the text is a sentence, the first map construction unit 41 may obtain the vector of the text by applying a recurrent neural network (RNN) to the sequence of words.

In step S302, the second map construction unit 42 according to an embodiment of the present invention constructs a second map on which text vectors and sound source vectors are mapped, and stores it in the storage 113. The second map construction unit 42 obtains a database of sound sources and one or more texts corresponding to each, generates a second map that maps sound source vectors and text vectors by applying a machine learning method to the obtained database, and stores the second map in the storage 113.

FIG. 10 is an example of a block diagram schematically showing the second map construction unit 42 shown in FIG. 8. FIG. 11 is a flowchart illustrating a second map construction method according to an embodiment of the present invention, as performed by the second map construction unit 42 shown in FIG. 10. Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 10 and 11 together.

Referring to FIG. 10, the second map construction unit 42 according to an embodiment of the present invention includes a sound source and text collection unit 421, a sound source-text combination unit 422, a neural network embedding-based text matching unit 423, and a second map storage unit 424.

In step S111, the sound source and text collection unit 421 according to an embodiment collects sound sources and the text information corresponding to them. The text may include at least one of hashtags designated for a sound source, the lyrics of the sound source, and text directly input by a user for the sound source.

In step S112, the sound source-text combination unit 422 according to an embodiment matches the sound sources and text collected in step S111. For example, when hashtags are designated for a sound source, the sound source-text combination unit 422 matches the sound source with each hashtag. Hashtags may be designated by a user. For example, a user may designate three hashtags for a specific sound source: #love, #warmth, and #instrumental. In this case, the sound source-text combination unit 422 matches the sound source with each of the three hashtags designated by the user.

When lyrics information exists for a sound source, the sound source-text combination unit 422 according to an embodiment extracts the main words of the lyrics and matches the extracted main words with the sound source. The sound source-text combination unit 422 according to an embodiment also matches text directly input by a user for a sound source with that sound source. In doing so, the sound source-text combination unit 422 may modify the text directly input by the user according to the importance of words as learned by machine learning. For example, when a user creates a playlist containing a plurality of sound sources and enters "jazz that is good to listen to by the sea" as its title, the sound source-text combination unit 422 reduces the title to "sea, jazz" and matches the text "sea, jazz" to the plurality of sound sources included in the playlist. However, the present invention is not limited thereto, and the sound source-text combination unit 422 may match the text without modification.

Meanwhile, the text in each sound source-text combination produced in step S112 may consist of a single word. For example, when the text "sea, jazz" is matched to the plurality of sound sources (a first sound source and a second sound source) included in the playlist, the sound source-text combination unit 422 may generate the combinations "first sound source-sea", "second sound source-sea", "first sound source-jazz", and "second sound source-jazz".
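The playlist-title example in step S112 can be sketched as follows. The per-word importance scores stand in for whatever a machine-learned model would produce; the scores, the 0.5 cutoff, and the function names are all hypothetical.

```python
from itertools import product

def extract_keywords(title_words, importance, min_importance=0.5):
    """Keep only the words whose (hypothetical, learned) importance is high."""
    return [w for w in title_words if importance.get(w, 0.0) >= min_importance]

def combine(sound_sources, keywords):
    """Pair every sound source with every single keyword, grouped by keyword."""
    return [(source, word) for word, source in product(keywords, sound_sources)]

# Words of the playlist title and made-up importance scores.
title_words = ["sea", "listening", "good", "jazz"]
importance = {"sea": 0.9, "jazz": 0.8, "listening": 0.3, "good": 0.1}

keywords = extract_keywords(title_words, importance)
pairs = combine(["first sound source", "second sound source"], keywords)
for source, word in pairs:
    print(f"{source}-{word}")
```

Each resulting pair holds exactly one word, which matches the single-word combinations described for step S112.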

As a specific example, to improve actual embedding performance, suppose the playlist "jazz that is good to listen to by the sea" contains sound sources 1, 2, 3, 4, 5, 6 (the numbers are assumed to be sound source IDs). When building the data, artificial sentences are generated in several forms, as follows.

- jazz, 1, 2, 3, jazz, 4, 5, 6

- sea, 1, 2, 3, sea, 4, 5, 6

- sea, listening, jazz, 1, 2, 3, sea, listening, jazz, 4, 5, 6 (two words together)

- or the above sequences shuffled at random and then used as sentences

According to an embodiment of the present invention, a model that learns the word-sound source mapping relationship can be built by applying word embedding to the data generated through the above process.
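The artificial sentences listed above can be generated mechanically. The sketch below reproduces the listed forms for the example playlist; the chunk size of three track IDs, the fixed random seed, and the function and parameter names are assumptions made for illustration.

```python
import random

def make_sentences(single_words, joint_words, track_ids, chunk_size=3, seed=0):
    """Build artificial sentences that interleave tag words with track IDs."""
    chunks = [track_ids[i:i + chunk_size]
              for i in range(0, len(track_ids), chunk_size)]
    sentences = []
    # One sentence per single word: word, chunk, word, chunk, ...
    for word in single_words:
        sentences.append([tok for chunk in chunks for tok in [word, *chunk]])
    # One sentence with several words placed together before each chunk.
    sentences.append([tok for chunk in chunks for tok in [*joint_words, *chunk]])
    # Randomly shuffled copies of every sentence built so far.
    rng = random.Random(seed)
    shuffled = [rng.sample(s, len(s)) for s in sentences]
    return sentences + shuffled

track_ids = ["1", "2", "3", "4", "5", "6"]  # sound source IDs as tokens
sentences = make_sentences(["jazz", "sea"], ["sea", "listening", "jazz"], track_ids)
for sentence in sentences[:3]:
    print(", ".join(sentence))
```

Token lists of this shape are the usual input format for word-embedding trainers, so words and track IDs that co-occur in a sentence can end up with nearby vectors.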

In step S113, the text matching unit 423 matches the text matched to each sound source in step S112 with the text mapped on the first map. According to an embodiment, the text mapped on the first map may include English words, and the text matched to each sound source in step S112 may include Korean words. In this case, in step S113 the text matching unit 423 may translate the English words included in the text mapped on the first map and link them to the Korean words included in the text matched to each sound source. Through step S113, the words in the text matched to each sound source can be expressed by words in the text mapped on the first map.
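A minimal sketch of the vocabulary linking in step S113 follows. A small hand-written dictionary stands in for the translation step, since the description does not say how translation is performed; the dictionary, the word lists, and the function name are hypothetical.

```python
def link_vocabularies(first_map_words_en, sound_words_ko, en_to_ko):
    """Link each Korean word attached to sound sources to the English
    first-map word that translates to it.

    `en_to_ko` is a hypothetical stand-in for a translation step."""
    ko_to_en = {}
    for en_word in first_map_words_en:
        ko_word = en_to_ko.get(en_word)
        if ko_word in sound_words_ko:
            ko_to_en[ko_word] = en_word
    return ko_to_en

first_map_words_en = ["sea", "jazz", "city"]       # text on the first map
sound_words_ko = {"바다", "재즈"}                   # text matched to sound sources
en_to_ko = {"sea": "바다", "jazz": "재즈", "city": "도시"}

links = link_vocabularies(first_map_words_en, sound_words_ko, en_to_ko)
print(links)
```

After linking, a Korean tag on a sound source can be looked up as its English counterpart on the first map, which is what lets the two maps share a vocabulary.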

In step S114, the second map storage unit 424 learns the relationship between sound sources and text through machine learning that uses the sound source-text combinations produced in step S112 as training data, and stores the result on the second map. Through the learning in step S114, the relationship between each sound source and each text is learned, and according to the learned result a vector for each sound source and a vector for each text are mapped onto the same map. In step S115, the second map storage unit 424 maps the sound source vectors and text vectors onto the second map and stores it in the storage 113. That is, in step S115 the second map storage unit 424 generates a second map including the sound source vector corresponding to each of the sound sources learned in step S114 and the text vector corresponding to each text, and stores it in the storage 113.

FIG. 12 is an example of a block diagram schematically showing the second embodiment of the map construction unit 114 shown in FIG. 4. FIG. 13 is a flowchart illustrating a map construction method according to an embodiment of the present invention, as performed by the map construction unit 114 shown in FIG. 12, and is a detailed flowchart of step S30 shown in FIG. 5. Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 12 and 13 together.

Referring to FIG. 12, the map construction unit 114 according to an embodiment of the present invention includes a first map construction unit 41, a second map construction unit 42, and a third map construction unit 43.

In step S301', the first map construction unit 41 according to an embodiment of the present invention constructs a first map on which image feature vectors and text vectors are mapped. In step S302', the second map construction unit 42 according to an embodiment of the present invention constructs a second map on which text vectors and sound source vectors are mapped.

In step S303, the third map construction unit 43 according to an embodiment of the present invention constructs a third map on which sound source vectors and image feature vectors are mapped, and stores it in the storage 113. The third map construction unit 43 according to an embodiment matches the text vectors mapped on the first map with the text vectors mapped on the second map, constructs a third map that merges the first map and the second map, and stores the third map in the storage 113. By matching the text vectors of the two maps in this way, the third map construction unit 43 stores a third map that represents the relationship between the image vectors mapped on the first map and the sound source vectors mapped on the second map. For example, based on the relationship between a first image and a first text vector on the first map, and the relationship between a first sound source and a second text vector on the second map that matches the first text vector, the third map construction unit 43 may obtain the relationship between the first image and the first sound source, and may map the first image and the first sound source onto the third map so as to reflect the obtained relationship.

The third map construction unit 43 according to another embodiment may generate the third map using image-text data, which is the training data of the first map, and sound source-text data, which is the training data of the second map. For example, the third map construction unit 43 may learn the relationship between images and each word included in the text using the image-text data, learn the relationship between sound sources and each word included in the text using the sound source-text data, and merge the learned results to generate a third map representing the relationship between images and sound sources.

The third map construction unit 43 according to yet another embodiment may construct image-sound source data by replacing the words in the text of the image-text data, which is the training data of the first map, with sound sources using the sound source-text data, which is the training data of the second map, and may generate a third map representing the relationship between images and sound sources using the constructed image-sound source data.
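The word-replacement approach can be sketched on toy data: each word attached to an image is replaced by the sound sources that share that word, yielding image-sound source pairs that could then serve as training data for the third map. The data layout (lists of `(item, words)` tuples), the function name, and the sample items are assumptions.

```python
def build_image_sound_pairs(image_text_data, sound_text_data):
    """Replace each word in the image-text data with the sound sources
    that carry the same word, producing unique (image, sound source) pairs."""
    # Invert the sound source-text data into word -> sound sources.
    word_to_sounds = {}
    for sound, words in sound_text_data:
        for word in words:
            word_to_sounds.setdefault(word, []).append(sound)

    pairs = []
    for image, words in image_text_data:
        for word in words:
            # Words with no associated sound source contribute nothing.
            for sound in word_to_sounds.get(word, []):
                if (image, sound) not in pairs:
                    pairs.append((image, sound))
    return pairs

# Hypothetical training data for the first and second maps.
image_text_data = [("img_beach", ["sea", "sunset"]), ("img_bar", ["jazz", "night"])]
sound_text_data = [("song_1", ["sea", "jazz"]), ("song_2", ["jazz"])]

image_sound_pairs = build_image_sound_pairs(image_text_data, sound_text_data)
print(image_sound_pairs)
```

Words that appear only on the image side, such as "sunset" here, drop out, so coverage of the resulting data depends on how much vocabulary the two data sets share.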

According to the map construction unit 114 shown in FIG. 12, the first map construction unit 41 outputs the first map to the third map construction unit 43, the second map construction unit 42 outputs the second map to the third map construction unit 43, and the third map construction unit 43 constructs a third map that merges the first map and the second map and stores it in the storage 113.

FIG. 14 is a block diagram schematically showing the first map construction unit 41 shown in FIGS. 8 and 12. FIG. 15 is a flowchart illustrating a first map construction method according to an embodiment of the present invention, performed by the first map construction unit 41 shown in FIG. 14. Hereinafter, an embodiment of the present invention is described with reference to FIGS. 14 and 15 together.

Referring to FIG. 14, the first map construction unit 41 includes an image-text DB acquisition unit 411, a filter determination unit 412, a region division unit 413, a region feature value extraction unit 414, a representative feature value extraction unit 415, a recurrent neural network learning unit 416, and a first map storage unit 417.

In step S151, the image-text DB acquisition unit 411 acquires a DB containing a plurality of images and the text data corresponding to each image.

In step S152, the filter determination unit 412 according to an embodiment applies a convolutional neural network (CNN) to determine the filter to be applied to an image in order to extract feature values from the image acquired in step S151. The specific algorithm is not limited to the convolutional neural network described above.

In step S153, the region division unit 413 divides the image into a plurality of regions.
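
The description does not specify how the regions are formed. As one possible reading, the sketch below splits an image into a regular grid; the grid layout, the function name `split_into_regions`, and the nested-list image representation are assumptions made only for illustration.

```python
def split_into_regions(image, rows, cols):
    """Split a 2D list of pixel values into rows x cols rectangular regions.

    A regular grid is an assumption; the description only says that the
    image is divided into a plurality of regions.
    """
    height, width = len(image), len(image[0])
    regions = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            regions.append([row[left:right] for row in image[top:bottom]])
    return regions

# A 4x4 toy image whose pixel values are 0..15.
image = [[4 * y + x for x in range(4)] for y in range(4)]
for region in split_into_regions(image, 2, 2):
    print(region)
```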

In step S154, the region feature value extraction unit 414 applies the filter determined in step S152 to each region obtained in step S153 and extracts a feature value for each region. According to an embodiment, the feature value extracted in step S154 is expressed as an n×m vector that can be mapped onto an n×m-dimensional map.

In step S155, the representative feature value extraction unit 415 extracts a representative feature value for the image acquired in step S151 based on the region feature values extracted in step S154. The representative feature value extraction unit 415 may calculate the representative feature value by applying a weight to each region feature value extracted in step S154. However, a weighted sum is only one example of how the representative feature value may be calculated, and the present invention is not limited thereto. The weight for each region may be determined by machine learning using an attention model.
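
A weighted sum of region feature values can be sketched as below. This is an illustration under assumptions: the attention scores are taken as given inputs (how the attention model learns them is not shown), normalizing them with a softmax is a common choice rather than something the description states, and the names `softmax` and `representative_feature` are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def representative_feature(region_features, attention_scores):
    """Weighted sum of per-region feature vectors.

    region_features: list of equal-length feature vectors, one per region.
    attention_scores: one raw score per region (assumed to come from a
    trained attention model, which is outside this sketch).
    """
    weights = softmax(attention_scores)
    dim = len(region_features[0])
    return [
        sum(w * feature[d] for w, feature in zip(weights, region_features))
        for d in range(dim)
    ]

features = [[1.0, 0.0], [0.0, 1.0]]
print(representative_feature(features, [0.0, 0.0]))   # equal weights
print(representative_feature(features, [10.0, 0.0]))  # first region dominates
```

With a much larger score for one region, the representative feature value moves toward that region's feature value, which is the effect the description attributes to weighting the main regions.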

The representative feature value calculated in step S155 may likewise be expressed as an n×m matrix, and may be an n×m vector that can be mapped onto an n×m-dimensional map.

In step S156, the first map storage unit 417 stores the representative feature value of the image obtained in step S155 and the vector of the text corresponding to each image on the first map, thereby constructing a first map that includes image feature vectors and text vectors, and stores the first map in the storage 113.

In step S156, the recurrent neural network learning unit 416 may obtain a vector for the text corresponding to each image. For example, when the text is a sequence of a plurality of words, the recurrent neural network learning unit 416 learns the relationship between the text and the image with a recurrent neural network, taking the word sequence into account. The text vector may be obtained from the learning result. When the text is a single word, however, the recurrent neural network learning unit 416 may simply learn the relationship between the text and the image by machine learning, without using a recurrent neural network algorithm that considers word sequences.

When the image feature value extraction unit 111 according to an embodiment of the present invention extracts the feature value of an image by dividing the image into a plurality of regions as shown in FIG. 15, calculating a feature value for each region, and calculating a representative feature value from them, the feature values of the main regions of the image can be reflected in the representative feature value with a high weight. As a result, the substantive characteristics of the image are reflected in its representative feature value, which improves the accuracy of the representative feature value.

FIG. 16 is a block diagram schematically showing the configuration of the sound source selection unit 112 of FIGS. 2 and 4 according to the first embodiment of the present invention. FIG. 17 is a flowchart illustrating a sound source selection method according to an embodiment of the present invention, performed by the sound source selection unit 112 shown in FIG. 16. Hereinafter, with reference to FIGS. 16 and 17 together, an embodiment is described in which the text corresponding to a text vector of the first map is a sequence of a plurality of words and the text corresponding to a text vector of the second map is a single word.

Referring to FIG. 16, the sound source selection unit 112 includes a first similarity calculation unit 21, a second similarity calculation unit 22, and a ranking unit 23.

In step S171, the first similarity calculation unit 21 calculates a first similarity between the image feature vector extracted by the image feature value extraction unit 111 and at least some of the plurality of text vectors on the first map stored in the storage 113. The similarity between vectors may correspond to the distance between them. The first similarity calculation unit 21 may select one or more texts according to the first similarity. For example, the first similarity calculation unit 21 may select a predetermined number of texts in descending order of the first similarity. As another example, the first similarity calculation unit 21 may select one or more texts whose first similarity satisfies a predetermined criterion (for example, is equal to or greater than a reference similarity).
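
The two selection rules of step S171 (a fixed number of texts, or all texts above a reference similarity) can be sketched as follows. The description only says that similarity may correspond to distance; the formula 1 / (1 + Euclidean distance) used here is an assumption, as are the function names and the toy vectors.

```python
import math

def similarity(u, v):
    """Turn a Euclidean distance into a similarity in (0, 1] (an assumed formula)."""
    return 1.0 / (1.0 + math.dist(u, v))

def select_top_k(image_vector, text_vectors, k):
    """Return the k (text_id, similarity) pairs with the highest similarity."""
    scored = [(text_id, similarity(image_vector, vec)) for text_id, vec in text_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

def select_above(image_vector, text_vectors, threshold):
    """Return every (text_id, similarity) pair whose similarity meets the threshold."""
    scored = [(text_id, similarity(image_vector, vec)) for text_id, vec in text_vectors.items()]
    return [item for item in scored if item[1] >= threshold]

# Toy 2-D vectors; real map vectors would have far more dimensions.
text_vectors = {"T1": (0.0, 0.0), "T2": (3.0, 4.0), "T3": (1.0, 0.0)}
image_vector = (0.0, 0.0)
print(select_top_k(image_vector, text_vectors, 2))
print(select_above(image_vector, text_vectors, 0.4))
```

Changing `k` or `threshold` changes how many texts proceed to step S172, which is the kind of tunable criterion a fixed lookup table would not offer.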

In step S172, for each text corresponding to the one or more text vectors selected in step S171, the second similarity calculation unit 22 calculates a second similarity with at least some of the sound source vectors on the second map stored in the storage 113. The second similarity calculation unit 22 may select one or more sound sources according to the second similarity.

In step S173, the ranking unit 23 determines the rank of the sound sources selected in step S172 based on the first similarity and the second similarity, and finally selects the sound sources to recommend to the user. For example, the ranking unit 23 determines the rank of a sound source based on the sum of the first similarity and the second similarity, their product, or a combination of these: the higher the sum, product, or combination, the higher the rank of the sound source. The ranking unit 23 may select sound sources for which the sum, product, or combination is equal to or greater than a predetermined reference. According to another example, the ranking unit 23 may select a predetermined number of sound sources in descending order of the rank determined in this way. The ranking unit 23 may provide the user terminal 200 with a playlist in which the selected sound sources are arranged in descending order of rank.
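
The ranking of step S173 can be sketched as below, assuming each candidate arrives as a (sound source, first similarity, second similarity) triple. The function name `rank_sound_sources`, its parameters, and the sample values are hypothetical; the sketch covers only the plain sum and product, not mixed combinations.

```python
def rank_sound_sources(candidates, combine="product", top_n=None, min_score=None):
    """Rank sound sources by combining the first and second similarity.

    candidates: list of (sound_id, first_similarity, second_similarity).
    combine: "product" or "sum".
    top_n: keep only this many highest-ranked sound sources, if given.
    min_score: drop sound sources whose combined score is below this value, if given.
    """
    ranked = []
    for sound_id, first_sim, second_sim in candidates:
        if combine == "product":
            score = first_sim * second_sim
        elif combine == "sum":
            score = first_sim + second_sim
        else:
            raise ValueError("combine must be 'product' or 'sum'")
        if min_score is None or score >= min_score:
            ranked.append((sound_id, score))
    # Highest combined score first, i.e. playlist order.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]

candidates = [("S1", 0.9, 0.5), ("S2", 0.9, 0.8), ("S3", 0.4, 0.9)]
print(rank_sound_sources(candidates, combine="product", top_n=2))
print(rank_sound_sources(candidates, combine="sum"))
```

This sketch treats each triple independently; if the same sound source were reached through several texts, some aggregation rule would be needed, and the description leaves that choice open.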

When a vector mapping method that selects text vectors according to the similarity between the image feature vector and the text vectors is used instead of a hard mapping method based on a lookup table, various design changes become possible by adjusting the text selection criterion, and meaningful numerical values can be calculated. For example, various design changes related to text selection are possible by changing the predetermined reference value of the similarity or the predetermined number of sound sources described above.

Meanwhile, according to an embodiment of the present invention, in step S171 the first similarity calculation unit 21 may assign a score to each word included in the one or more texts selected according to the first similarity. In step S173, the ranking unit 23 may determine the rank of the sound sources by additionally considering the scores assigned to the words included in the texts selected in step S171.

For example, in step S171 the first similarity calculation unit 21 may assign a score to each word according to the frequency of the word in the one or more texts selected according to the first similarity. The score may be proportional to that frequency; that is, the more frequently a word appears in the selected texts, the higher the score the first similarity calculation unit 21 may assign to it.

When the first similarity calculation unit 21 has selected a total of five texts according to the first similarity and each text includes a plurality of words, the first similarity calculation unit 21 may assign scores to the words included in the five texts. For example, when the word "sea" appears five times in the five texts, the first similarity calculation unit 21 may recognize "sea" as an important word that can describe the image and assign it a score of 5.
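
The frequency-based scoring in this example can be sketched with a simple word count. The function name `word_scores`, the sample texts, and whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

def word_scores(selected_texts):
    """Score each word by how many times it appears in the selected texts."""
    counts = Counter()
    for text in selected_texts:
        counts.update(text.split())  # whitespace tokenization is an assumption
    return dict(counts)

# Five hypothetical texts selected by the first similarity.
selected = ["calm sea", "blue sea", "sea waves", "sea breeze", "sunset sea"]
scores = word_scores(selected)
print(scores["sea"], scores["calm"])
```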

However, the method of calculating the score of each word is not limited to the above example. For example, the frequency of each word may be its frequency in the texts corresponding to all text vectors included in the first map, rather than its frequency in the one or more texts selected according to the first similarity. In that case, the first map may store in advance a score value for each word included in each text vector of the first map. The score of each word may be updated and stored each time learning with training data takes place during the learning process of the first map. The method of calculating the score of each word is not necessarily limited to one that considers frequency.

The score of each word may be stored so as to be inversely proportional to the word's frequency in the texts corresponding to all text vectors included in the first map. For example, the word "this" carries no particular meaning but appears in many sentences and may therefore have a high frequency; the more frequent a word is across all texts, the lower the score it may be given.

According to one example, a base score for each word, inversely proportional to the word's frequency in the texts corresponding to all text vectors included in the first map, may be stored together with the first map. The first similarity calculation unit 21 may then assign each word an additional score proportional to the word's frequency in the one or more texts selected according to the first similarity, and calculate a final score from the base score and the additional score. Accordingly, a word that is infrequent across the texts corresponding to all text vectors of the first map but frequent in the texts selected according to the first similarity may receive a high final score.
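
The combination of a base score and an additional score can be sketched as follows. The description does not say how the two are combined; multiplying them is an assumption here (it resembles a TF-IDF style weighting, which the description does not name), and the function names and sample texts are hypothetical.

```python
from collections import Counter

def base_scores(all_texts):
    """Base score per word: inversely proportional to its frequency over all map texts."""
    counts = Counter(word for text in all_texts for word in text.split())
    return {word: 1.0 / count for word, count in counts.items()}

def final_scores(selected_texts, base):
    """Final score per word: base score times frequency in the selected texts.

    Multiplication is an assumed way of combining the two scores.
    Words missing from the base table get a base score of 0.0.
    """
    local = Counter(word for text in selected_texts for word in text.split())
    return {word: base.get(word, 0.0) * count for word, count in local.items()}

all_texts = ["this is the sea", "this is a city", "this is a forest", "the sea at night"]
selected = ["this is the sea", "the sea at night"]
scores = final_scores(selected, base_scores(all_texts))
print(scores["sea"], scores["this"])
```

In this toy data, "this" appears in three of the four map texts and ends up with a lower final score than "sea", which is rarer overall but appears in both selected texts.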

In step S172, for each word included in the texts selected by the first similarity in step S171, the second similarity calculation unit 22 calculates a second similarity with at least some of the sound source vectors on the second map stored in the storage 113. In this example, each text vector stored on the second map corresponds to a single word, so a vector on the second map can be determined for each word included in the texts selected by the first similarity in step S171. The second similarity calculation unit 22 may select one or more sound sources according to the second similarity.

In step S173, the ranking unit 23 may assign a rank to each of the one or more sound sources selected in step S172 by considering all of the following: the first similarity of each of the one or more texts selected in step S171, the score assigned to each word included in those texts, and the second similarity between each word and the one or more sound sources selected in step S172. The ranking unit 23 may then select one or more sound sources in order of the assigned rank.

FIG. 18 is a block diagram schematically showing the configuration of the sound source selection unit 112 of FIGS. 2 and 4 according to the second embodiment of the present invention. FIG. 19 is a flowchart illustrating a sound source selection method according to an embodiment of the present invention, performed by the sound source selection unit 112 shown in FIG. 18. Hereinafter, an embodiment of the present invention is described with reference to FIGS. 18 and 19 together. According to the second embodiment of the present invention, the third map generated by the map construction unit 114 is stored in the storage 113.

Referring to FIG. 18, the sound source selection unit 112 includes a third similarity calculation unit 24 and a ranking unit 23'.

In step S191, the third similarity calculation unit 24 calculates a third similarity between the image feature vector extracted by the image feature value extraction unit 111 and at least some of the plurality of sound source vectors on the third map stored in the storage 113.

In step S192, the ranking unit 23' determines the rank of the sound sources based on the third similarity and selects the sound sources to recommend to the user. For example, the higher the third similarity, the higher the rank the ranking unit 23' assigns to a sound source. The ranking unit 23' may select sound sources whose third similarity is equal to or greater than a predetermined reference. According to another example, the ranking unit 23' may select a predetermined number of sound sources in descending order of rank. The ranking unit 23' may provide the user terminal 200 with a playlist in which the selected sound sources are arranged in descending order of rank.

FIG. 20 shows an example of the first map. FIG. 21 shows an example of the second map. Hereinafter, the first embodiment of the present invention is described with reference to FIGS. 20 and 21.

Referring to FIG. 20, a plurality of text vectors (T1, T2, T3) are mapped on the first map (M1), and the image feature vector (i1) extracted by the image feature value extraction unit 111 has also been mapped onto the first map (M1). The first similarity calculation unit 21 of FIG. 16 calculates the first similarities (r1, r2, r3) between the image feature vector (i1) and the respective text vectors (T1, T2, T3) based on the distance between the image feature vector (i1) and each text vector.

Referring to FIG. 21, a plurality of word vectors (W1, W2, W3) and a plurality of sound source vectors (S1, S2, S3, S4) are mapped on the second map (M2). The word vectors (W1, W2, W3) may be vectors corresponding to the words included in the texts that correspond to the text vectors (T1, T2, T3) shown in FIG. 20. For example, the word vector (W1) may correspond to a word included in the text corresponding to the text vector (T1), the word vector (W2) to a word included in the text corresponding to the text vector (T2), and the word vector (W3) to a word included in the text corresponding to the text vector (T3). The second similarity calculation unit 22 of FIG. 16 calculates the second similarities (r4, r5) between the word vector (W1) and the sound source vectors (S1, S2) based on the distances between them, the second similarity (r6) between the word vector (W2) and the sound source vector (S3) based on the distance between them, and the second similarity (r7) between the word vector (W3) and the sound source vector (S4) based on the distance between them.

The ranking unit 23 of FIG. 16 multiplies the first similarity by the second similarity to calculate the similarity between the image feature vector and each sound source vector. For example, the similarity between the image feature vector (i1) and the sound source vector (S1) is the product of the first similarity (r1) and the second similarity (r4). The similarity between (i1) and (S2) is the product of (r1) and (r5). The similarity between (i1) and (S3) is the product of (r2) and (r6). The similarity between (i1) and (S4) is the product of (r3) and (r7). The ranking unit 23 of FIG. 16 ranks the sound sources in descending order of this product, and selects either the sound sources whose product is equal to or greater than a reference or a predetermined number of the highest-ranked sound sources.
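
The chain from FIG. 20 to FIG. 21 can be traced with made-up numbers. In the sketch below, the labels T1 to T3, W1 to W3, and S1 to S4 follow the figures, but every similarity value is invented for illustration, and the function name `image_to_sound_similarity` is hypothetical.

```python
# First similarities r1, r2, r3 between the image vector i1 and each text vector
# (values are invented for this example).
first_similarity = {"T1": 0.9, "T2": 0.6, "T3": 0.3}

# Each word vector belongs to the text of one text vector, as in FIG. 21.
word_of_text = {"T1": "W1", "T2": "W2", "T3": "W3"}

# Second similarities r4, r5, r6, r7 between word vectors and sound source vectors
# (values are invented for this example).
second_similarity = {
    ("W1", "S1"): 0.5,
    ("W1", "S2"): 0.8,
    ("W2", "S3"): 0.7,
    ("W3", "S4"): 0.9,
}

def image_to_sound_similarity(first_similarity, word_of_text, second_similarity):
    """Multiply the first and second similarity along each text -> word -> sound path."""
    text_of_word = {word: text for text, word in word_of_text.items()}
    return {
        sound: first_similarity[text_of_word[word]] * sim
        for (word, sound), sim in second_similarity.items()
    }

combined = image_to_sound_similarity(first_similarity, word_of_text, second_similarity)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

With these invented values, S2 ranks first because it combines the highest first similarity (r1) with a high second similarity (r5).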

As shown in FIGS. 20 and 21, the first embodiment of the present invention selects the sound sources to recommend to the user, in response to the image selected by the user, by way of the first map (M1) and the second map (M2).

Although the example of FIGS. 20 and 21 describes the ranking unit 23 as considering the first similarity and the second similarity, the ranking unit 23 may additionally consider the score assigned to each word, as described in the foregoing embodiments of the present invention.

FIG. 22 shows an example of the third map. Hereinafter, the second embodiment of the present invention is described with reference to FIG. 22.

Referring to FIG. 22, a plurality of sound source vectors (S1, S2, S3, S4) are mapped on the third map (M3), and the image feature vector (i1) extracted by the image feature value extraction unit 111 has also been mapped onto the third map (M3). The third similarity calculation unit 24 of FIG. 18 calculates the third similarities (r8, r9, r10, r11) between the image feature vector (i1) and the respective sound source vectors (S1, S2, S3, S4) based on the distance between the image feature vector (i1) and each sound source vector.

The ranking unit 23' of FIG. 18 ranks the sound sources in descending order of the third similarity, and selects either the sound sources whose third similarity is equal to or greater than a reference or a predetermined number of the highest-ranked sound sources.

As shown in FIG. 22, the second embodiment of the present invention selects the sound sources to recommend to the user, in response to the image selected by the user, using the third map (M3) obtained by merging the first map and the second map.

Although the foregoing embodiments of the present invention have been described in terms of selecting a sound source based on an image, all of them apply equally in the opposite direction, that is, to selecting an image based on a sound source. Specifically, the first embodiment applies equally to a method of selecting text based on a sound source on the second map and then selecting an image based on the text on the first map. The second embodiment applies equally to a method of selecting an image based on a sound source on the third map. When the embodiments are applied in this reverse direction, however, the actual image corresponding to each image feature vector on the map stored in the storage 113, or identification information of that image, must be stored in the storage 113 as well. The server 100 can then select one or more images corresponding to an input sound source and provide the selected images, or identification information of the selected images, to the user terminal 200.
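
The reverse lookup on the third map can be sketched as below. It illustrates why image identifiers must be stored alongside the image feature vectors: the function returns identifiers, not vectors. The entry layout, the identifier strings, the use of Euclidean distance, and the function name `images_for_sound` are all assumptions made for this sketch.

```python
import math

def images_for_sound(sound_vector, image_entries, top_n=1):
    """Return the identifiers of the images closest to a sound source vector.

    image_entries: list of (image_id, image_feature_vector) pairs, i.e. each
    vector on the map is stored together with an identifier of its image.
    """
    ranked = sorted(image_entries, key=lambda entry: math.dist(sound_vector, entry[1]))
    return [image_id for image_id, _ in ranked[:top_n]]

# Hypothetical map entries with toy 2-D vectors.
entries = [
    ("img-001", (0.0, 1.0)),
    ("img-002", (1.0, 1.0)),
    ("img-003", (5.0, 5.0)),
]
print(images_for_sound((1.0, 0.9), entries, top_n=2))
```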

According to an embodiment of the present invention, each block shown in the drawings may correspond to any kind of device capable of processing data, such as a processor. For example, a block may correspond to at least one processor or may include at least one processor. Here, a "processor" may refer to a data processing device embedded in hardware that has circuitry physically structured to perform functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto. Accordingly, a block may operate in a form included in another hardware device such as a microprocessor or a general-purpose computer system.

The blocks may all be implemented on a single processor and distinguished by function, but the invention is not limited thereto, and each block may of course be implemented on a separate processor. Likewise, the functions of the blocks may be implemented together in a single body of program code, or the function of each block may be written as separate program code, with the pieces of program code linked together to provide the sound source recommendation service offered by the sound source recommendation device 110.

The embodiments of the present invention described above may be implemented in the form of a computer program executable through various components on a computer, and such a computer program may be recorded on a computer-readable medium. The medium may store the computer-executable program permanently, or store it temporarily for execution or download. The medium may also be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

The present invention has so far been described with a focus on its preferred embodiments. Although the invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art to which the invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics and that other equivalent embodiments are possible. Therefore, the disclosed embodiments should be considered from an illustrative rather than a restrictive point of view. The scope of the present invention is indicated by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: Server
200: User terminal
300: Network
110: Sound source recommendation device
111: Image feature value extraction unit
112: Sound source selection unit
113: Storage
114: Map construction unit

Claims

1. An image-based sound source selection method comprising:
a step of receiving an image from a user terminal and extracting a feature vector of the image; and
a sound source selecting step of referring to a map on which a sound source vector is mapped for each of a plurality of sound sources, calculating a similarity between the image feature vector and at least some of the sound source vectors, and selecting a sound source corresponding to the image feature vector based on the similarity.
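
As a rough illustration of the selecting step in claim 1, the sketch below compares an image feature vector with every sound source vector on a map and returns the closest one. The claim does not name a similarity measure, so cosine similarity is an assumption here, and the names `cosine_similarity`, `select_sound_source`, and the toy map entries are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def select_sound_source(image_vector, sound_source_map):
    """Return (source_id, similarity) of the most similar sound source."""
    best_id, best_sim = None, float("-inf")
    for source_id, source_vector in sound_source_map.items():
        sim = cosine_similarity(image_vector, source_vector)
        if sim > best_sim:
            best_id, best_sim = source_id, sim
    return best_id, best_sim


# Hypothetical map: sound source id -> sound source vector.
sound_source_map = {
    "track_a": [1.0, 0.0],
    "track_b": [0.0, 1.0],
    "track_c": [0.7, 0.7],
}
image_vector = [0.9, 0.1]  # stand-in for an extracted image feature vector
best_id, best_sim = select_sound_source(image_vector, sound_source_map)
print(best_id, round(best_sim, 3))
```

In practice the image feature vector would come from a trained feature extractor and the map could hold only a subset of the catalogue, which is one reading of "at least some of the sound source vectors".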

2. The method according to claim 1,
wherein the map is a second map on which text vectors and sound source vectors are mapped, and
wherein the sound source selecting step comprises:
a first selecting step of selecting one or more texts based on a first similarity between the feature vector of the image and a text vector predetermined for each of a plurality of texts; and
a second selecting step of selecting a sound source corresponding to the selected text with reference to the second map.

3. The method according to claim 2,
wherein the second selecting step selects the sound source based on a second similarity between the text vector corresponding to the selected text and at least some of the sound source vectors.

4. The method according to claim 3,
wherein the first selecting step selects one or more texts based on the first similarity,
wherein the second selecting step selects one or more sound sources based on the second similarity between the selected texts and at least some of the sound source vectors, and
wherein the sound source selecting step selects the sound source according to a ranking of the sound sources determined based on the first similarity and the second similarity.
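
The two-stage selection of claims 2 to 4 can be sketched as follows: texts are first chosen by their similarity to the image, then sound sources are ranked by combining that first similarity with a second similarity between each chosen text and each sound source. The claims do not say how the two similarities are combined, so multiplying them and keeping the best product per sound source is an assumption, as are cosine similarity, all function and parameter names, and the toy vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_sound_sources(image_vec, text_vecs_for_images, second_map_texts,
                       second_map_sources, top_k_texts=2):
    """Rank sound sources using a first and a second similarity."""
    # First selecting step: keep the texts most similar to the image.
    first = {t: cosine(image_vec, v) for t, v in text_vecs_for_images.items()}
    selected = sorted(first, key=first.get, reverse=True)[:top_k_texts]

    # Second selecting step: compare each selected text with every sound
    # source on the second map and combine both similarities.
    combined = {}
    for text in selected:
        for source_id, source_vec in second_map_sources.items():
            second = cosine(second_map_texts[text], source_vec)
            score = first[text] * second  # assumed combination rule
            combined[source_id] = max(combined.get(source_id, float("-inf")), score)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical text vectors comparable with image feature vectors.
text_vecs_for_images = {"beach": [1.0, 0.0], "night": [0.0, 1.0], "rain": [-1.0, 0.0]}
# Hypothetical second map: text vectors and sound source vectors in one space.
second_map_texts = {"beach": [1.0, 0.0, 0.0], "night": [0.0, 1.0, 0.0], "rain": [0.0, 0.0, 1.0]}
second_map_sources = {
    "summer_song": [0.9, 0.1, 0.0],
    "lullaby": [0.1, 0.9, 0.0],
    "storm": [0.0, 0.1, 0.9],
}

ranking = rank_sound_sources([0.8, 0.6], text_vecs_for_images,
                             second_map_texts, second_map_sources)
print([source_id for source_id, _ in ranking])
```

Other combination rules, such as a weighted sum of the two similarities, would fit the claim wording equally well.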

5. The method according to claim 2,
wherein the map includes the second map and a first map on which a plurality of image feature vectors and a plurality of text vectors are mapped,
wherein the text corresponding to each text vector mapped on the first map includes a sequence of a plurality of words, and the text corresponding to each text vector mapped on the second map includes a single word,
wherein the first selecting step selects the one or more texts based on a first similarity, on the first map, between the feature vector of the image and at least some of the plurality of text vectors, and
wherein the second selecting step selects the sound source based on a second similarity between at least some of the sound source vectors and the text vectors on the second map that correspond to each word included in the one or more texts selected in the first selecting step.

6. The method according to claim 5,
wherein the sound source selecting step selects the sound source in consideration of the first similarity, the second similarity, and a score corresponding to each word included in the one or more texts selected in the first selecting step.

7. The method according to claim 6,
wherein the score is calculated based on at least one of a basic score, assigned according to the frequency of each word across the plurality of texts corresponding to the plurality of text vectors mapped on the first map, and an additional score calculated according to a frequency.
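
One possible reading of the per-word score in claims 6 and 7 is sketched below. The claim text does not state whether frequent words score higher or lower, nor which frequency drives the additional score, so both choices here are assumptions: the basic score is a log-scaled inverse frequency over all first-map texts, and the additional score is the word's share of occurrences in the texts selected in the first selecting step. All names and sample texts are hypothetical.

```python
import math
from collections import Counter


def basic_scores(corpus_texts):
    """Basic score per word: rarer words across all texts score higher (assumed)."""
    n = len(corpus_texts)
    doc_freq = Counter()
    for text in corpus_texts:
        doc_freq.update(set(text.split()))  # count each word once per text
    return {word: math.log(1 + n / df) for word, df in doc_freq.items()}


def additional_scores(selected_texts):
    """Additional score per word: share of occurrences in the selected texts (assumed)."""
    counts = Counter(word for text in selected_texts for word in text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def word_scores(corpus_texts, selected_texts):
    """Sum of basic and additional score for each word in the selected texts."""
    basic = basic_scores(corpus_texts)
    extra = additional_scores(selected_texts)
    return {word: basic.get(word, 0.0) + extra[word] for word in extra}


# Hypothetical texts on the first map and the subset picked for one image.
corpus = ["a dog on the beach", "a cat on the sofa", "sunset on the beach"]
selected = ["a dog on the beach", "sunset on the beach"]
scores = word_scores(corpus, selected)
print(sorted(scores, key=scores.get, reverse=True))
```

Under these assumptions, distinctive words such as "dog" outrank common ones such as "on", so they would weigh more when the second similarity is computed per word.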

8. The method according to claim 1,
further comprising generating a second map that includes a sound source vector corresponding to each of the plurality of sound sources and text vectors corresponding to the sound sources, with reference to words included in at least one of a hashtag, lyrics, and user-entered text designated for each of the plurality of sound sources,
wherein the sound source selecting step selects a sound source with reference to the second map and a first map on which a plurality of image feature vectors and a plurality of text vectors are mapped.

9. The method according to claim 1,
wherein the map is a third map on which a plurality of sound source vectors are mapped, and
wherein the sound source selecting step selects a sound source corresponding to the extracted image feature vector based on a similarity between the extracted image feature vector and at least some of the plurality of sound source vectors on the third map.

10. The method according to claim 1,
further comprising generating a third map that includes a feature vector for each of a plurality of images and a sound source vector for each of the plurality of sound sources, with reference to image-text data, which includes the plurality of images and a text designated for each of the plurality of images, and to words included in at least one of a hashtag, lyrics, and user-entered text designated for each of the plurality of sound sources,
wherein the sound source selecting step selects a sound source with reference to the generated third map.

11. The method according to claim 1,
further comprising providing a playlist that lists the selected sound sources based on the similarity.
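
The playlist of claim 11 can be pictured as the selected sound sources ordered by their similarity to the image. The sketch below assumes descending order and a cap on the list length; the function name `build_playlist`, the `max_tracks` parameter, and the sample values are hypothetical.

```python
def build_playlist(similarities, max_tracks=10):
    """Return sound source ids ordered from most to least similar."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [source_id for source_id, _ in ranked[:max_tracks]]


# Hypothetical similarities between one image and three selected sound sources.
playlist = build_playlist({"track_a": 0.2, "track_b": 0.9, "track_c": 0.5}, max_tracks=2)
print(playlist)
```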

12. An image-based sound source selection device comprising:
an image feature value extraction unit that receives an image from a user terminal and extracts a feature vector of the image; and
a sound source selection unit that refers to a map on which a sound source vector is mapped for each of a plurality of sound sources, calculates a similarity between the image feature vector and at least some of the sound source vectors, and selects a sound source corresponding to the image feature vector based on the similarity.

13. The device according to claim 12,
wherein the map is a second map on which text vectors and sound source vectors are mapped, and
wherein the sound source selection unit comprises:
a first similarity calculation unit that selects one or more texts based on a first similarity between the feature vector of the image and a text vector predetermined for each of a plurality of texts; and
a second similarity calculation unit that selects a sound source corresponding to the selected text with reference to the second map.

14. The device according to claim 13,
wherein the second similarity calculation unit selects the sound source based on a second similarity between the text vector corresponding to the selected text and at least some of the sound source vectors.

15. The device according to claim 13,
wherein the map includes the second map and a first map on which a plurality of image feature vectors and a plurality of text vectors are mapped,
wherein the text corresponding to each text vector mapped on the first map includes a sequence of a plurality of words, and the text corresponding to each text vector mapped on the second map includes a single word,
wherein the first similarity calculation unit selects the one or more texts based on a first similarity, on the first map, between the feature vector of the image and at least some of the plurality of text vectors, and
wherein the second similarity calculation unit selects the sound source based on a second similarity between at least some of the sound source vectors and the text vectors on the second map that correspond to each word included in the one or more texts selected by the first similarity calculation unit.

16. The device according to claim 12,
further comprising a map construction unit that generates a second map including a sound source vector corresponding to each of the plurality of sound sources and text vectors corresponding to the sound sources, with reference to words included in at least one of a hashtag, lyrics, and user-entered text designated for each of the plurality of sound sources,
wherein the sound source selection unit selects a sound source with reference to the second map and a first map on which a plurality of image feature vectors and a plurality of text vectors are mapped.

17. The device according to claim 12,
wherein the map is a third map on which a plurality of sound source vectors are mapped, and
wherein the sound source selection unit comprises a third similarity calculation unit that selects a sound source corresponding to the extracted image feature vector based on a similarity between the extracted image feature vector and at least some of the plurality of sound source vectors on the third map.

18. The device according to claim 12,
further comprising a third map construction unit that generates a third map including a feature vector for each of a plurality of images and a sound source vector for each of the plurality of sound sources, with reference to image-text data, which includes the plurality of images and a text designated for each of the plurality of images, and to words included in at least one of a hashtag, lyrics, and user-entered text designated for each of the plurality of sound sources,
wherein the sound source selection unit selects a sound source with reference to the generated third map.

19. The device according to claim 12,
wherein the sound source selection unit provides a playlist that lists the selected sound sources based on the similarity.

20. A computer program stored on a medium for executing, using a computer, the method of any one of claims 1 to 11.