KR20190009821A

KR20190009821A - Method and system for generating playlist using sound source content and meta information

Info

Publication number: KR20190009821A
Application number: KR1020190007510A
Authority: KR
Inventors: 하정우; 김정명; 김정희
Original assignee: 네이버 주식회사
Priority date: 2019-01-21
Filing date: 2019-01-21
Publication date: 2019-01-29
Also published as: KR102031282B1

Abstract

A method for automatically generating playlists using sound source content and meta information and a system thereof are disclosed. The method of generating playlists includes: a step of generating acoustic features unique to a sound source from the sound source data of a sound source content for each sound source content; and a step of generating a playlist based on similarity in the sound source content by calculating the similarity in the sound source content using the acoustic features.

Description

[0001] METHOD AND SYSTEM FOR GENERATING PLAYLIST USING SOUND SOURCE CONTENT AND META INFORMATION [

아래의 설명은 음원에 대한 플레이리스트를 자동으로 생성하는 기술에 관한 것이다.The following description relates to a technique for automatically generating a playlist for a sound source.

최근 사람들은 다양한 형태의 미디어 상의 음악 조각들(music pieces), 비디오 또는 오디오 트랙들과 같은 미디어 컨텐츠를 일상적으로 이용하고 있다. 플레이리스트는 선호들, 테마들, 장르들, 아티스트들 또는 행사들(occasions)에 기초하여, 음악 또는 다른 컨텐츠를 구성하는 능력을 사용자들에게 제공한다.Recently, people have routinely used media content such as music pieces, video or audio tracks on various types of media. A playlist provides users with the ability to organize music or other content based on preferences, themes, genres, artists or occasions.

플레이리스트 생성 기술의 일례로, 한국 등록특허 제10-0810276호(등록일 2008년 02월 27일)에는 음원 데이터 재생 장치에 저장된 다수의 음원 데이터들에 대한 사용자 플레이리스트를 작성하는 기술이 개시되어 있다.Korean Patent No. 10-0810276 (Feb. 27, 2008) discloses a technique for creating a user play list for a plurality of sound source data stored in a sound source data reproducing apparatus as an example of a play list generating technique .

기존에는 재생 이력이나 미리 듣기 이력 등이 있는 음원으로 플레이리스트를 생성하거나, 협업적 여과 특징을 이용한 유사도 측정 방식으로 플레이리스트를 생성하게 된다. 이러한 기존 플레이리스트 생성 방법은 음원 자체의 특징을 인자(factor)로 사용하지 않기 때문에 자동 재생 중에 시작 곡(seed 곡)과 특징이 다른 곡이 나올 확률이 매우 높다. 또한, 음원의 메타 데이터를 강한 필터 조건으로 사용하여 플레이리스트를 생성할 경우에는 플레이리스트의 다양성이 저하되는 문제가 있다.In the past, a playlist is generated from a sound source having a playback history or a preview history, or a playlist is generated by a similarity measurement method using a collaborative filtering feature. Since the existing playlist generation method does not use the characteristics of the sound source itself as a factor, there is a high probability that a song having a characteristic different from that of the start song (automatic play) during automatic playback is obtained. In addition, when the playlist is generated by using the metadata of the sound source as a strong filter condition, there is a problem that the diversity of the playlist is deteriorated.

음원 및 음원의 메타 정보를 학습한 모델을 이용하여 사용자가 선호하는 곡이나 재생 이력이 있는 곡과 유사한 곡들을 선별하여 플레이리스트를 자동으로 생성할 수 있는 방법 및 시스템을 제공한다.There is provided a method and system for automatically generating a playlist by selecting a song that is similar to a song having a user's favorite song or a playback history using a model in which meta information of a sound source and a sound source is learned.

컴퓨터 시스템에서 수행되는 플레이리스트 생성 방법에 있어서, 음원 컨텐츠 각각에 대하여 상기 음원 컨텐츠의 음원 데이터로부터 해당 음원의 고유한 음향 특징(acoustic features)을 생성하는 단계; 및 상기 음향 특징을 이용하여 상기 음원 컨텐츠 간의 유사도를 산출하여 상기 음원 컨텐츠 간의 유사도를 바탕으로 플레이리스트(playlist)를 생성하는 단계를 포함하고, 상기 플레이리스트를 생성하는 단계는, 상기 음원 컨텐츠 간의 유사도를 바탕으로 현재 재생 곡과 유사한 다음 재생 곡을 선정하여 플레이리스트를 생성하되 다음 재생 곡의 선정 시 시드(seed) 곡과의 유사도를 체크함으로써 상기 시드 곡과 유사한 음원 컨텐츠로 체인 형태의 플레이리스트를 생성하는 것을 특징으로 하는 플레이리스트 생성 방법을 제공한다.A playlist generation method performed in a computer system, the method comprising: generating acoustic features unique to a sound source from sound source data of the sound source content for each sound source content; And generating a playlist based on the degree of similarity between the sound source contents by calculating a degree of similarity between the sound source contents using the acoustic feature, A play list similar to the current play song is selected to generate a play list. When a next play song is selected, the similarity with the seed song is checked, And generating a play list based on the generated play list.

상기 플레이리스트 생성 방법을 컴퓨터를 통해 실행시키기 위한 프로그램이 기록되어 있는 것을 특징으로 하는 컴퓨터에서 판독 가능한 기록 매체를 제공한다.There is provided a computer-readable recording medium having recorded thereon a program for causing the computer to execute the playlist generating method.

컴퓨터로 구현되는 플레이리스트 생성 시스템에 있어서, 음원 컨텐츠 각각에 대하여 상기 음원 컨텐츠의 음원 데이터로부터 해당 음원의 고유한 음향 특징을 생성하는 고유 특징 처리부; 및 상기 음향 특징을 이용하여 상기 음원 컨텐츠 간의 유사도를 산출하여 상기 음원 컨텐츠 간의 유사도를 바탕으로 플레이리스트를 생성하는 플레이리스트 생성부를 포함하고, 상기 플레이리스트 생성부는, 상기 음원 컨텐츠 간의 유사도를 바탕으로 현재 재생 곡과 유사한 다음 재생 곡을 선정하여 플레이리스트를 생성하되 다음 재생 곡의 선정 시 시드 곡과의 유사도를 체크함으로써 상기 시드 곡과 유사한 음원 컨텐츠로 체인 형태의 플레이리스트를 생성하는 것을 특징으로 하는 플레이리스트 생성 시스템을 제공한다.A computer-implemented playlist generating system, comprising: a unique feature processor for generating a unique acoustic feature of a corresponding sound source from sound source data of the sound source content for each sound source content; And a playlist generating unit for calculating a similarity between the sound source contents using the acoustic feature and generating a play list based on the similarity between the sound source contents, wherein the play list generating unit generates a play list based on the similarity between the sound source contents, And generating a play list in a chain form from sound source contents similar to the seed song by checking the similarity between the next play song and a seed song when a next play song is selected, Provides a list generation system.

본 발명의 실시예에 따르면, 딥 러닝(deep learning)을 이용하여 음원 자체의 고유 값(eigen-value)을 학습을 위한 인자(factor)로 사용함으로써 곡 간의 유사성을 더욱 정확히 산출할 수 있다.According to the embodiment of the present invention, by using deep learning, eigen-value of the sound source itself is used as a factor for learning, so that the similarity between songs can be more accurately calculated.

본 발명의 실시예에 따르면, 음원의 특징뿐 아니라 음원의 메타 정보를 부가적 정보로 사용하여 곡 간의 유사성을 산출할 수 있고, 또한 시드(seed) 곡과의 유사성을 지속적으로 체크하는 로직을 적용함으로써 플레이리스트의 다양성을 유지하면서 이와 동시에 시드 곡과 너무 다른 곡이 포함되는 것을 방지할 수 있다.According to the embodiment of the present invention, it is possible to calculate the similarity between songs by using the meta information of the sound source as additional information as well as the characteristics of the sound source, and to apply the logic to continuously check the similarity with the seed song So that it is possible to prevent the playlist from being included in a song that is too different from the seed song while maintaining the diversity of the play list.

도 1은 본 발명의 일 실시예에 있어서 컴퓨터 시스템의 내부 구성의 일례를 설명하기 위한 블록도이다.
도 2는 본 발명의 일 실시예에 따른 컴퓨터 시스템의 프로세서가 포함할 수 있는 구성요소의 예를 도시한 도면이다.
도 3은 본 발명의 일 실시예에 따른 컴퓨터 시스템이 수행할 수 있는 플레이리스트 생성 방법의 예를 도시한 순서도이다.
도 4는 본 발명의 일 실시예에 따른 플레이리스트 자동 생성 과정의 예를 도시한 도면이다.1 is a block diagram for explaining an example of the internal configuration of a computer system according to an embodiment of the present invention.
2 is a diagram illustrating an example of components that a processor of a computer system according to an embodiment of the present invention may include.
3 is a flowchart illustrating an example of a play list generation method that can be performed by a computer system according to an embodiment of the present invention.
4 is a diagram illustrating an example of a process of automatically generating a playlist according to an embodiment of the present invention.

이하, 본 발명의 실시예를 첨부된 도면을 참조하여 상세하게 설명한다.DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

본 실시예들은 플레이리스트를 생성하기 위한 방법들과 시스템들에 관련된 것이며, 더욱 상세하게는 음원 및 음원의 메타 정보를 학습한 모델을 이용하여 플레이리스트를 자동으로 생성할 수 있는 플레이리스트 생성 기술에 관한 것이다.The present embodiments relate to methods and systems for generating playlists, and more particularly to a playlist generating technique capable of automatically generating playlists using a model in which meta information of sound sources and sound sources is learned .

본 명세서에서 구체적으로 개시되는 것들을 포함하는 실시예들은 음원 컨텐츠에 대한 플레이리스트의 생성을 달성하고 이를 통해 비용 절감, 효율성과 정확성 등의 측면에 있어서 상당한 장점들을 달성한다.Embodiments, including those specifically disclosed herein, achieve the creation of playlists for source content and thereby achieve significant advantages in terms of cost savings, efficiency and accuracy, and the like.

음원 컨텐츠에 대해 생성된 플레이리스트는 음원 컨텐츠의 추천, 검색, 분류, 관리 등을 위해 활용될 수 있다. 일례로, 사용자 맞춤형 음원 추천을 위해서는 사용자가 선호하는 곡과 유사한 곡들에 대한 선별 작업이 필요하며, 이러한 선별 작업으로서 음원 및 음원의 메타 정보를 이용하여 플레이리스트를 자동으로 생성할 수 있다.The play list generated for the sound source content can be utilized for recommending, searching, classifying, and managing the sound source content. For example, in order to recommend a user-customized sound source, a sorting operation for songs similar to the user's favorite music is required. As a sorting operation, a playlist can be automatically generated using meta information of a sound source and a sound source.

도 1은 본 발명의 일 실시예에 있어서 컴퓨터 시스템의 내부 구성의 일례를 설명하기 위한 블록도이다. 예를 들어, 본 발명의 실시예들에 따른 플레이리스트 생성 시스템이 도 1의 컴퓨터 시스템(100)을 통해 구현될 수 있다. 도 1에 도시한 바와 같이, 컴퓨터 시스템(100)은 플레이리스트 생성 방법을 실행하기 위한 구성요소로서 프로세서(110), 메모리(120), 영구 저장 장치(130), 버스(140), 입출력 인터페이스(150) 및 네트워크 인터페이스(160)를 포함할 수 있다.1 is a block diagram for explaining an example of the internal configuration of a computer system according to an embodiment of the present invention. For example, a playlist generation system in accordance with embodiments of the present invention may be implemented through the computer system 100 of FIG. 1, the computer system 100 includes a processor 110, a memory 120, a persistent storage 130, a bus 140, an input / output interface 150 and a network interface 160.

프로세서(110)는 명령어들의 임의의 시퀀스를 처리할 수 있는 임의의 장치를 포함하거나 그의 일부일 수 있다. 프로세서(110)는 예를 들어 컴퓨터 프로세서, 이동 장치 또는 다른 전자 장치 내의 프로세서 및/또는 디지털 프로세서를 포함할 수 있다. 프로세서(110)는 예를 들어, 서버 컴퓨팅 디바이스, 서버 컴퓨터, 일련의 서버 컴퓨터들, 서버 팜, 클라우드 컴퓨터, 컨텐츠 플랫폼, 이동 컴퓨팅 장치, 스마트폰, 태블릿, 셋톱 박스, 미디어 플레이어 등에 포함될 수 있다. 프로세서(110)는 버스(140)를 통해 메모리(120)에 접속될 수 있다.Processor 110 may include or be part of any device capable of processing any sequence of instructions. The processor 110 may comprise, for example, a processor and / or a digital processor within a computer processor, a mobile device, or other electronic device. The processor 110 may be, for example, a server computing device, a server computer, a series of server computers, a server farm, a cloud computer, a content platform, a mobile computing device, a smart phone, a tablet, a set top box, The processor 110 may be connected to the memory 120 via a bus 140.

메모리(120)는 컴퓨터 시스템(100)에 의해 사용되거나 그에 의해 출력되는 정보를 저장하기 위한 휘발성 메모리, 영구, 가상 또는 기타 메모리를 포함할 수 있다. 메모리(120)는 예를 들어 랜덤 액세스 메모리(RAM: random access memory) 및/또는 동적 RAM(DRAM: dynamic RAM)을 포함할 수 있다. 메모리(120)는 컴퓨터 시스템(100)의 상태 정보와 같은 임의의 정보를 저장하는 데 사용될 수 있다. 메모리(120)는 예를 들어 음원 컨텐츠에 대한 플레이리스트 생성을 위한 명령어들을 포함하는 컴퓨터 시스템(100)의 명령어들을 저장하는 데에도 사용될 수 있다. 컴퓨터 시스템(100)은 필요에 따라 또는 적절한 경우에 하나 이상의 프로세서(110)를 포함할 수 있다.The memory 120 may include volatile memory, permanent, virtual or other memory for storing information used by or output by the computer system 100. Memory 120 may include, for example, random access memory (RAM) and / or dynamic RAM (DRAM). The memory 120 may be used to store any information, such as the state information of the computer system 100. Memory 120 may also be used to store instructions of computer system 100 including, for example, instructions for generating playlists for tone generator content. Computer system 100 may include one or more processors 110 as needed or where appropriate.

버스(140)는 컴퓨터 시스템(100)의 다양한 컴포넌트들 사이의 상호작용을 가능하게 하는 통신 기반 구조를 포함할 수 있다. 버스(140)는 예를 들어 컴퓨터 시스템(100)의 컴포넌트들 사이에, 예를 들어 프로세서(110)와 메모리(120) 사이에 데이터를 운반할 수 있다. 버스(140)는 컴퓨터 시스템(100)의 컴포넌트들 간의 무선 및/또는 유선 통신 매체를 포함할 수 있으며, 병렬, 직렬 또는 다른 토폴로지 배열들을 포함할 수 있다.The bus 140 may comprise a communication infrastructure that enables interaction between the various components of the computer system 100. The bus 140 may, for example, carry data between components of the computer system 100, for example, between the processor 110 and the memory 120. The bus 140 may comprise a wireless and / or wired communication medium between the components of the computer system 100 and may include parallel, serial, or other topology arrangements.

영구 저장 장치(130)는 (예를 들어, 메모리(120)에 비해) 소정의 연장된 기간 동안 데이터를 저장하기 위해 컴퓨터 시스템(100)에 의해 사용되는 바와 같은 메모리 또는 다른 영구 저장 장치와 같은 컴포넌트들을 포함할 수 있다. 영구 저장 장치(130)는 컴퓨터 시스템(100) 내의 프로세서(110)에 의해 사용되는 바와 같은 비휘발성 메인 메모리를 포함할 수 있다. 영구 저장 장치(130)는 예를 들어 플래시 메모리, 하드 디스크, 광 디스크 또는 다른 컴퓨터 판독 가능 매체를 포함할 수 있다.The persistent storage device 130 may be a component such as a memory or other persistent storage device as used by the computer system 100 to store data for a predetermined extended period of time (e.g., as compared to the memory 120) Lt; / RTI > The persistent storage device 130 may include non-volatile main memory as used by the processor 110 in the computer system 100. The persistent storage device 130 may include, for example, flash memory, hard disk, optical disk, or other computer readable medium.

입출력 인터페이스(150)는 키보드, 마우스, 음성 명령 입력, 디스플레이 또는 다른 입력 또는 출력 장치에 대한 인터페이스들을 포함할 수 있다. 구성 명령들 및/또는 플레이리스트 생성을 위한 입력, 예컨대 음원 데이터와 텍스트 정보가 입출력 인터페이스(150)를 통해 수신될 수 있다.The input / output interface 150 may include a keyboard, a mouse, voice command inputs, displays, or interfaces to other input or output devices. Configuration commands and / or inputs for generating a playlist, such as sound source data and text information, may be received via the input / output interface 150.

네트워크 인터페이스(160)는 근거리 네트워크 또는 인터넷과 같은 네트워크들에 대한 하나 이상의 인터페이스를 포함할 수 있다. 네트워크 인터페이스(160)는 유선 또는 무선 접속들에 대한 인터페이스들을 포함할 수 있다. 구성 명령들 및/또는 플레이리스트 생성을 위한 음원 컨텐츠의 음원 데이터와 텍스트 정보는 네트워크 인터페이스(160)를 통해 수신될 수 있다.The network interface 160 may include one or more interfaces to networks such as a local area network or the Internet. The network interface 160 may include interfaces for wired or wireless connections. The sound source data and the text information of the sound source contents for constitutional commands and / or play list generation can be received via the network interface 160.

또한, 다른 실시예들에서 컴퓨터 시스템(100)은 도 1의 구성요소들보다 더 많은 구성요소들을 포함할 수도 있다. 그러나, 대부분의 종래기술적 구성요소들을 명확하게 도시할 필요성은 없다. 예를 들어, 컴퓨터 시스템(100)은 상술한 입출력 인터페이스(150)와 연결되는 입출력 장치들 중 적어도 일부를 포함하도록 구현되거나 또는 트랜시버(transceiver), GPS(Global Positioning System) 모듈, 카메라, 각종 센서, 데이터베이스 등과 같은 다른 구성요소들을 더 포함할 수도 있다. 보다 구체적인 예로, 컴퓨터 시스템(100)이 스마트폰과 같은 모바일 기기의 형태로 구현되는 경우, 일반적으로 스마트폰이 포함하고 있는 가속도 센서나 자이로 센서, 카메라, 각종 물리적인 버튼, 터치패널을 이용한 버튼, 입출력 포트, 진동을 위한 진동기 등의 다양한 구성요소들이 컴퓨터 시스템(100)에 더 포함되도록 구현될 수 있다.Also, in other embodiments, the computer system 100 may include more components than the components of FIG. However, there is no need to clearly illustrate most prior art components. For example, the computer system 100 may be implemented to include at least some of the input / output devices connected to the input / output interface 150 described above, or may include a transceiver, a Global Positioning System (GPS) module, Databases, and the like. More specifically, when the computer system 100 is implemented in the form of a mobile device such as a smart phone, an acceleration sensor, a gyro sensor, a camera, various physical buttons, buttons using a touch panel, An input / output port, a vibrator for vibration, and the like may be further included in the computer system 100.

도 2는 본 발명의 일 실시예에 따른 컴퓨터 시스템의 프로세서가 포함할 수 있는 구성요소의 예를 도시한 도면이고, 도 3은 본 발명의 일 실시예에 따른 컴퓨터 시스템이 수행할 수 있는 플레이리스트 생성 방법의 예를 도시한 순서도이다.FIG. 2 is a diagram illustrating an example of a component that a processor of a computer system according to an exemplary embodiment of the present invention may include; FIG. 3 is a block diagram of a playlist Fig. 8 is a flowchart showing an example of a generation method. Fig.

도 2에 도시된 바와 같이, 프로세서(110)는 음원 입력 제어부(210), 음원 전처리부(220), 고유 특징 처리부(230), 및 플레이리스트 생성부(240)를 포함할 수 있다. 이러한 프로세서(110)의 구성요소들은 적어도 하나의 프로그램 코드에 의해 제공되는 제어 명령에 따라 프로세서(110)에 의해 수행되는 서로 다른 기능들(different functions)의 표현들일 수 있다. 예를 들어, 프로세서(110)가 음원 데이터를 입력 받도록 컴퓨터 시스템(100)을 제어하기 위해 동작하는 기능적 표현으로서 음원 입력 제어부(210)가 사용될 수 있다. 프로세서(110) 및 프로세서(110)의 구성요소들은 도 3의 플레이리스트 생성 방법이 포함하는 단계들(S310 내지 S350)을 수행할 수 있다. 예를 들어, 프로세서(110) 및 프로세서(110)의 구성요소들은 메모리(120)가 포함하는 운영체제의 코드와 상술한 적어도 하나의 프로그램 코드에 따른 명령(instruction)을 실행하도록 구현될 수 있다. 여기서 적어도 하나의 프로그램 코드는 상기 플레이리스트 생성 방법을 처리하기 위해 구현된 프로그램의 코드에 대응될 수 있다.2, the processor 110 may include a sound source input control unit 210, a sound source preprocessing unit 220, an intrinsic feature processing unit 230, and a play list generation unit 240. The components of such a processor 110 may be representations of different functions performed by the processor 110 in accordance with control commands provided by at least one program code. For example, the sound source input control 210 may be used as a functional representation in which the processor 110 operates to control the computer system 100 to receive sound source data. The components of the processor 110 and the processor 110 may perform the steps S310 to S350 included in the method of generating a playlist of FIG. For example, the components of processor 110 and processor 110 may be implemented to execute instructions in accordance with the at least one program code described above and the code of the operating system that memory 120 contains. Wherein at least one program code may correspond to a code of a program implemented to process the playlist generation method.

플레이리스트 생성 방법은 도시된 순서대로 발생하지 않을 수 있으며, 단계들 중 일부가 생략되거나 추가의 과정이 더 포함될 수 있다.The method of generating a playlist may not occur in the order shown, and some of the steps may be omitted or an additional process may be further included.

단계(S310)에서 프로세서(110)는 플레이리스트 생성 방법을 위한 프로그램 파일에 저장된 프로그램 코드를 메모리(120)에 로딩할 수 있다. 예를 들어, 플레이리스트 생성 방법을 위한 프로그램 파일은 도 1을 통해 설명한 영구 저장 장치(130)에 저장되어 있을 수 있고, 프로세서(110)는 버스를 통해 영구 저장 장치(130)에 저장된 프로그램 파일로부터 프로그램 코드가 메모리(120)에 로딩되도록 컴퓨터 시스템(110)을 제어할 수 있다.In step S310, the processor 110 may load the program code stored in the program file for the method of generating a play list into the memory 120. [ For example, a program file for the method of generating a playlist may be stored in the persistent storage 130 described above with reference to FIG. 1, and the processor 110 may store the program files stored in the persistent storage 130, The computer system 110 may be controlled such that the program code is loaded into the memory 120. [

이때, 프로세서(110) 및 프로세서(110)가 포함하는 음원 입력 제어부(210), 음원 전처리부(220), 고유 특징 처리부(230), 및 플레이리스트 생성부(240) 각각은 메모리(120)에 로딩된 프로그램 코드 중 대응하는 부분의 명령을 실행하여 이후 단계들(S320 내지 S350)을 실행하기 위한 프로세서(110)의 서로 다른 기능적 표현들일 수 있다. 단계들(S320 내지 S350)의 실행을 위해, 프로세서(110) 및 프로세서(110)의 구성요소들은 직접 제어 명령에 따른 연산을 처리하거나 또는 컴퓨터 시스템(100)을 제어할 수 있다.At this time, each of the sound source input control unit 210, the sound source preprocessing unit 220, the inherent feature processing unit 230, and the play list generation unit 240 included in the processor 110 and the processor 110 is stored in the memory 120 And may be different functional representations of the processor 110 for executing instructions of a corresponding one of the loaded program codes to execute subsequent steps S320 through S350. For the execution of steps S320 through S350, the processor 110 and the components of the processor 110 may process an operation according to a direct control command or control the computer system 100. [

단계(S320)에서 음원 입력 제어부(210)는 음원 컨텐츠에 대하여 음원 컨텐츠의 음원 데이터를 입력받도록 컴퓨터 시스템(100)을 제어할 수 있다. 이때, 음원 컨텐츠는 오디오 파일 포맷을 가진 모든 디지털 데이터를 의미할 수 있으며, 예를 들어 MP3(MPEG Audio Layer-3), WAVE(Waveform Audio Format), FLAC(Free Lossless Audio Codec) 등을 포함할 수 있다. 그리고, 음원 입력 제어부(210)는 음원 컨텐츠와 관련된 텍스트 정보를 입력받도록 컴퓨터 시스템(100)을 제어할 수 있다. 음원 컨텐츠와 관련된 텍스트 정보는 가사를 포함하거나, 혹은 가수, 장르, 제목, 앨범명 등과 같은 메타 정보, 그리고 음원 컨텐츠의 분류나 검색 등과 관련하여 입력된 해시 태그(hashtag), 쿼리(query) 등의 정보를 포함할 수 있다. 음원 데이터와 텍스트 정보는 입출력 인터페이스(150) 또는 네트워크 인터페이스(160)를 통해 컴퓨터 시스템(100)으로 입력 또는 수신되어 메모리(120)나 영구 저장 장치(130)에 저장 및 관리될 수 있다.In step S320, the sound source input control unit 210 may control the computer system 100 to receive the sound source data of the sound source content with respect to the sound source content. In this case, the sound source content may refer to all digital data having an audio file format, and may include, for example, MPEG Audio Layer-3 (MP3), Waveform Audio Format (WAVE), Free Lossless Audio Codec have. The sound source input control unit 210 may control the computer system 100 to receive text information related to the sound source content. The text information related to the sound source content may include lyrics, meta information such as a singer, a genre, a title, an album name, etc., and a hash tag (hashtag), a query Information. The sound source data and text information may be input to or received from the computer system 100 via the input / output interface 150 or the network interface 160 and stored and managed in the memory 120 or the permanent storage 130.

단계(S330)에서 음원 전처리부(220)는 단계(S320)에서 입력받은 음원 데이터를 학습 데이터 형태로 처리할 수 있다. 이때, 음원 전처리부(220)는 전처리를 통해 음원 데이터를 시간-주파수로 표현할 수 있다. 예를 들어 음원 전처리부(220)는 음원 데이터를 멜-스펙트로그램(Mel-spectrogram)이나 MFCC(Mel Frequency Cepstral Coefficient)와 같은 시간-주파수-크기 형태의 데이터로 변환할 수 있다. 그리고, 음원 전처리부(220)는 음원 컨텐츠에 대해 텍스트 정보를 함께 입력 받은 경우 텍스트 정보를 전처리 할 수 있다. 일례로, 음원 전처리부(220)는 형태소 분석기, 색인어 추출기 등 언어 전처리기를 이용하여 입력된 텍스트 정보로부터 무의미한 텍스트들을 필터링할 수 있다. 다시 말해, 음원 전처리부(220)는 텍스트 정보에 포함된 조사, 조용사 등 불필요한 품사의 단어나 특수 기호(!, ?, / 등, 예컨대) 등을 제거하고 체언이나 어근에 해당되는 단어를 추출할 수 있다.In step S330, the sound source preprocessing unit 220 may process the sound source data received in step S320 in the form of learning data. At this time, the sound source preprocessing unit 220 can express the sound source data in a time-frequency manner through the preprocessing. For example, the sound source preprocessing unit 220 may convert the sound source data into data of a time-frequency-size type such as a Mel-spectrogram or a Mel Frequency Cepstral Coefficient (MFCC). The sound source preprocessing unit 220 can preprocess the text information when the text information is inputted together with the sound source content. For example, the sound source preprocessing unit 220 can filter meaningless texts from input text information using a language preprocessor such as a morpheme analyzer and an indexer extractor. In other words, the sound source preprocessing unit 220 removes unnecessary parts of speech words and special symbols (!,?, /, Etc.) included in the text information and extracts words corresponding to the cognition or the root can do.

단계(S340)에서 고유 특징 처리부(230)는 음원 데이터에 대해 전처리된 학습 데이터를 학습 모델을 통해 학습하여 고유 특징을 생성한 후 생성된 고유 특징을 데이터베이스(미도시)에 저장할 수 있다. 고유 특징 처리부(230)는 딥 러닝을 이용하여 음원 데이터 자체의 고유한 음향 특징(acoustic features)을 생성할 수 있다. 일례로, 고유 특징 처리부(230)는 CNN(Convolutional Neural Network) 기반의 학습 모델을 이용할 수 있다. 고유 특징 처리부(230)는 CNN 학습 모델을 이용하여 음원 데이터를 다차원 실수 벡터로 표현할 수 있다. CNN 학습 모델은 음원 데이터 학습 계층을 포함할 수 있으며, 아래 과정 1 내지 3은 음원 데이터 학습 계층에서 음원 데이터에 대응하는 실수 벡터를 생성하는 과정의 예일 수 있다.In step S340, the intrinsic feature processing unit 230 may learn the preprocessed training data on the sound source data through the learning model to generate the intrinsic feature, and store the intrinsic feature in the database (not shown). The intrinsic feature processing unit 230 can use the deep run to generate unique acoustic features of the sound source data itself. For example, the intrinsic feature processor 230 may use a CNN (Convolutional Neural Network) based learning model. The intrinsic feature processor 230 can express the sound source data as a multidimensional real number vector using the CNN learning model. The CNN learning model may include a sound source data learning layer. Steps 1 to 3 below may be an example of a process of generating a real vector corresponding to sound source data in the sound source data learning layer.

과정 1에서 음원 컨텐츠가 포함하는 음원 데이터(일례로, mp3 파일)는 전처리를 통해 멜-스펙트로그램이나 MFCC와 같은 시간-주파수-크기 형태의 데이터로 변환될 수 있다. 예를 들어, 음원 전처리부(220)는 음원 입력 제어부(210)에 의해 입력되는 음원 데이터를 시간-주파수로 표현할 수 있다.In step 1, the sound source data (for example, an mp3 file) included in the sound source content can be converted into data of a time-frequency-size type such as a mel-spectrogram or MFCC through preprocessing. For example, the sound source preprocessing unit 220 may represent the sound source data input by the sound source input control unit 210 in time-frequency.

과정 2에서 변환된 음원 데이터로부터 한 개 이상의 짧은 시간 구간 동안(1초 내지 10초)의 복수 개의 주파수 프레임들이 샘플링 되어 음원 데이터의 학습 모델에 대한 입력 데이터로서 사용될 수 있다. 예를 들어, 고유 특징 처리부(230)는 복수 개의 프레임들을 샘플링 하여 음원 데이터 학습 계층에서 음원 모델의 예시로 제시되는 CNN 모델의 입력으로서 활용할 수 있다. 그러므로, 음원 데이터의 학습을 위한 CNN 모델은 샘플링 된 프레임들의 수와 동일한 수의 채널을 갖는 모델이 될 수 있다. 혹은, 생성된 프레임 각각을 단일 채널로 사용하고 각 프레임에 대하여 고유의 콘벌루션/풀링 과정을 거친 후 생성된 프레임 별 특징벡터들을 접합하여 완전 연결층의 입력으로 사용 가능하다.A plurality of frequency frames for one or more short time intervals (1 second to 10 seconds) may be sampled from the sound source data converted in step 2 and used as input data for the learning model of the sound source data. For example, the intrinsic feature processor 230 may sample a plurality of frames and utilize the sampled frames as an input to a CNN model presented as an example of a sound source model in a sound source data learning layer. Therefore, the CNN model for learning sound source data can be a model having the same number of channels as the number of sampled frames. Alternatively, each generated frame may be used as a single channel, each frame may be subjected to a specific convolution / pooling process, and then the generated feature vectors may be jointly used as an input of a complete connection layer.

과정 3에서는 음원 데이터 학습 계층이 포함할 수 있는 복수 개의 콘벌루션(convolution) 및 풀링(pooling) 계층을 반복적으로 구성함으로써 음원 프레임으로부터 추상화된 특징(feature)을 생성할 수 있다. 콘벌루션에서 패치의 크기는 다양하게 구성될 수 있으며 풀링도 최대값(max)을 이용한 풀링 기법, 평균값(average)을 이용한 풀링 기법 및 상기 두 풀링 기법을 접합한 하이브리드 풀링 기법 등 여러 풀링 기법들 중 적어도 하나가 사용될 수 있다.In step 3, a plurality of convolution and pooling layers, which may be included in the sound source data learning layer, are repeatedly generated, thereby generating abstract features from sound source frames. The size of the patches in the convolution can be variously configured, and the pooling method using the pooling maximum value (max), the pooling method using the average value (average), and the hybrid pooling method combining the two pooling methods At least one can be used.

복수 개의 콘벌루션 및 풀링 계층 위에는 음향 특징을 생성하기 위한 완전 연결(fully-connected) 계층이 있으며 각 계층별 함수는 시그모이드(sigmoid) 함수, 하이퍼볼릭 탄젠트(Hyperbolic Tangent: tanh) 함수, ReLU(Rectified Linear Unit) 함수 등 다양한 함수가 사용될 수 있다. 결국, 음원 데이터에 대해 하나의 다차원 실수 벡터가 생성될 수 있다. 예를 들어, 주어진 첫 번째 음원 데이터 m0은 음원 학습 모델의 출력 계층에서 x0={0.2, -0.1, 0.3, …}의 형태와 같이 하나의 다차원 실수 벡터로 표현될 수 있다. 차원의 수는 0보다 큰 정수이며 경험적으로 통상 50 이상 500 이하의 값을 가질 수 있다. 이러한 과정 3은 앞서 설명한 고유 특징 처리부(230)에 의해 처리될 수 있다.There are fully-connected layers for creating acoustic features on the plurality of convolution and pooling layers, and each layer function includes a sigmoid function, a hyperbolic tangent (tanh) function, a ReLU Rectified Linear Unit) function can be used. As a result, one multi-dimensional real vector can be generated for the sound source data. For example, given the first source data m0, x0 = {0.2, -0.1, 0.3, ... } Can be represented by one multidimensional real number vector. The number of dimensions is an integer greater than 0 and can have a value of typically not less than 50 and not more than 500 empirically. This process 3 can be processed by the inherent feature processor 230 described above.

그리고, 고유 특징 처리부(230)는 음원 컨텐츠에 대해 텍스트 정보를 함께 입력 받은 경우 텍스트 정보 또한 다차원 실수 벡터로 표현할 수 있다. 일례로, 고유 특징 처리부(230)는 텍스트 정보에 대한 전처리를 통해 필터링 된 텍스트들을 사전에 학습된 학습 모델을 이용하여 워드 벡터(word vector)로 생성할 수 있다. 예를 들어, 워드 벡터는 수치형(numerical) 다차원 벡터 형태로 표현될 수 있다. 워드 벡터 생성을 위하여 단어 출현 빈도 히스토그램, TF(term frequency)/IDF(inverse document frequency), 언어 학습 모델(예컨대, word2vec, phrase2vec, document2vec 등) 등이 사용될 수 있다. 예를 들어, "이승환 좋은날 발라드 1992" 등과 같이 가수/장르/제목/년도는 하나의 n차원 실수 벡터 v={0.3, -1.2, 1.2, …}와 같이 표현 가능하다. 또한, 언어 학습 모델을 위한 텍스트 정보 필드(가수, 장르, 제목, 년도 등)의 순서는 고정되지 않고 목적에 맞게 변경이 가능하다.If the intrinsic feature processor 230 receives the text information together with the sound source content, the intrinsic feature processor 230 may also express the text information as a multidimensional real number vector. For example, the intrinsic feature processing unit 230 may generate a word vector by using a pre-learned learning model through a preprocessing of text information. For example, a word vector may be represented as a numerical multidimensional vector form. Word frequency histogram, TF (term frequency) / IDF (inverse document frequency), language learning model (e.g., word2vec, phrase2vec, document2vec, etc.) can be used for word vector generation. For example, Singer / Genre / Title / Year is one n-dimensional real number vector v = {0.3, -1.2, 1.2, ... }. In addition, the order of the text information field (singer, genre, title, year, etc.) for the language learning model is not fixed, but can be changed according to the purpose.

상기한 고유 특징 처리부(230)는 음원 컨텐츠에 대해 음원 데이터에 대한 특징 벡터와 텍스트 정보에 대한 워드 벡터를 데이터베이스에 저장 및 유지할 수 있다. 음원 데이터에 대한 특징 벡터와 텍스트 정보에 대한 워드 벡터는 각각 개별 데이터베이스로 구축되거나, 혹은 하나의 데이터베이스로 구축될 수 있다. 이러한 데이터베이스는 컴퓨터 시스템(100)에 포함된 구성 요소로 구현되거나, 혹은 컴퓨터 시스템(100)과 연동 가능한 별개의 시스템 상에 구축된 외부 데이터베이스로서 존재하는 것 또한 가능하다.The intrinsic feature processor 230 may store and maintain the feature vectors of the sound source data and the word vectors of the text information for the sound source content in the database. The feature vectors for the sound source data and the word vectors for the text information may be constructed as separate databases, respectively, or as a single database. It is also possible that such a database is implemented as an element contained in the computer system 100 or as an external database built on a separate system capable of interfacing with the computer system 100. [

단계(S350)에서 플레이리스트 생성부(240)는 음원 컨텐츠에 대하여 데이터베이스에 저장된 음원 데이터에 대한 고유 특징을 이용하여 음원 간의 유사도를 산출할 수 있고, 음원 간의 유사도를 바탕으로 음원 컨텐츠에 대한 플레이리스트를 자동으로 생성할 수 있다. 일례로, 플레이리스트 생성부(240)는 음원 데이터에 대한 특징 벡터를 이용하여 음원 간 유사도를 산출할 수 있고, 다른 예로 음원 데이터에 대한 특징 벡터와 함께 텍스트 정보에 대한 워드 벡터를 복합적으로 이용하여 음원 간 유사도를 산출할 수 있다. 이때, 플레이리스트 생성부(240)는 사용자가 특정 곡을 선택하면 해당 곡을 시드 곡으로 하여 시드 곡과 유사한 곡들의 체인(chain) 형태로 플레이리스트를 무한히 생성할 수 있다. 그리고, 플레이리스트 생성부(240)는 음원 컨텐츠에 대한 사용자 최근 로그 이력을 기반으로 사용자가 선호하는 곡들로 플레이리스트를 생성할 수 있고, 특정 가수가 입력으로 주어지면 해당 가수와 유사한 가수들의 곡들로 플레이리스트를 생성하는 것 또한 가능하다.In step S350, the play list generation unit 240 may calculate the similarity between the sound sources using the unique characteristics of the sound source data stored in the database with respect to the sound source contents. Based on the similarity between the sound sources, Can be automatically generated. For example, the playlist generation unit 240 may calculate similarities between sound sources using feature vectors of sound source data. In another example, the playlist generation unit 240 may use a word vector for text information in combination with a feature vector for sound source data The similarity between sound sources can be calculated. At this time, the play list generating unit 240 can infinitely generate a play list in the form of a chain of songs similar to the seed song, using the song as a seed song when the user selects a specific song. The play list generating unit 240 may generate a play list based on the user's recent log history of the sound source content, and if a specific singer is given as an input, It is also possible to create playlists.

따라서, 본 발명은 음원 및 음원의 메타 정보와 같은 텍스트 정보를 학습한 모델을 이용하여 곡 간의 유사성을 기반으로 유사한 곡들로 플레이리스트를 자동 생성할 수 있다.Accordingly, the present invention can automatically generate playlists of similar songs based on the similarity between songs using a model in which text information such as sound sources and meta information of sound sources are learned.

도 4는 본 발명의 일 실시예에 따른 플레이리스트 자동 생성 과정의 예를 도시한 도면이다. 도 4의 단계들(S401 내지 S407)은 도 3의 단계(S350)에 포함되어 수행될 수 있다.4 is a diagram illustrating an example of a process of automatically generating a playlist according to an embodiment of the present invention. Steps S401 to S407 of FIG. 4 may be performed in step S350 of FIG.

단계(S401)에서 플레이리스트 생성부(240)는 시드 곡이 선정되면 데이터베이스에 저장된 음원 별 고유 특징을 이용하여 시드 곡과 다른 곡 간의 유사도를 산출할 수 있다. 일례로, 사용자가 특정 곡을 선택하여 재생하는 경우 해당 곡을 시드 곡으로 선정할 수 있다. 다른 예로, 음원 컨텐츠에 대한 사용자 최근 로그 이력을 기반으로 사용자가 선호하는 곡을 시드 곡으로 선정할 수 있다. 예컨대, 시드 곡을 m0이라고 할 때, 프로세서(110)는 시드 곡 m0이 데이터베이스 상에 없는 신곡인 경우 CNN 모델을 이용하여 시드 곡 m0을 음원 자체의 고유 특징을 나타내는 d차원 실수 벡터인 x0으로 변환하고, 시드 곡 m0이 신곡이 아닌 경우 데이터베이스로부터 시드 곡 m0에 해당하는 d차원 실수 벡터를 조회하여 이를 x0으로 정의한다. 음원 특징 벡터는 곡 간의 유사도를 산출하는데 이용될 수 있으며, 다른 예로 각 곡의 메타 정보와 같은 텍스트 정보를 벡터 형태로 표현한 텍스트 특징 벡터 v도 음원 특징 벡터와 함께 곡 간의 유사도를 산출하기 위해 사용될 수 있다. 플레이리스트 생성부(240)는 데이터베이스에 저장된 곡들의 음원 특징 벡터와 시드 곡 m0의 음원 특징 벡터 x0 간의 유사도를 계산할 수 있다. 일례로, 유사도는 벡터 간 거리를 산출하는 코사인 거리(cosine distance)가 이용될 수 있다. 두 벡터 a, b 사이의 코사인 거리(CD)는 수학식 1과 같이 정의될 수 있다. 즉, 벡터 간 거리가 작을수록 곡의 유사도가 높고 벡터 간 거리가 클수록 곡의 유사도가 낮은 것으로 판단할 수 있다.In step S401, when the seed song is selected, the play list generating unit 240 may calculate the similarity between the seed song and the other songs using the unique characteristics of each sound source stored in the database. For example, when a user selects and reproduces a specific song, the song can be selected as a seed song. As another example, a user's favorite song can be selected as a seed song based on the user's recent log history of the sound source content. For example, when the seed song is m0, the processor 110 uses the CNN model to convert the seed song m0 to a d-dimensional real number vector x0 representing the unique characteristics of the sound source itself, when the seed song m0 is a new song not on the database If the seed song m0 is not a new song, a d-dimensional real number vector corresponding to the seed song m0 is retrieved from the database and defined as x0. The sound source feature vector can be used to calculate the similarity between songs. In another example, a text feature vector v in which text information such as meta information of each song is expressed in a vector form can be used to calculate the similarity between songs together with a sound source feature vector have. The play list generating unit 240 may calculate the similarity between the sound source feature vector of the songs stored in the database and the sound source feature vector x0 of the seed song m0. For example, a cosine distance for calculating the distance between vectors may be used as the degree of similarity. The cosine distance (CD) between the two vectors a and b can be defined as: " (1) " That is, the smaller the distance between vectors is, the higher the similarity of the songs is, and the larger the distance between vectors is, the lower the similarity of the songs can be judged to be.

코사인 거리 이외에도 유클리드 거리, 해밍 거리 등 다양한 방법을 사용하여 곡 간 유사도를 산출할 수 있다. 이때, 음원 특징 벡터와 텍스트 특징 벡터는 접합되어 계산될 수도 있고, 각각 벡터 간 거리를 산출한 후 가중치 합으로 표현될 수도 있다.In addition to the cosine distance, the similarity between songs can be calculated using various methods such as Euclidean distance and Hamming distance. At this time, the sound source feature vector and the text feature vector may be jointly calculated or may be expressed as a sum of weights after calculating distances between vectors.

단계(S402)에서 플레이리스트 생성부(240)는 데이터베이스 상에 고유 특징이 저장된 곡 중 시드 곡 m0과의 유사도가 높은 복수 개의 곡을 선택하여 별도의 후보 집합 M*을 생성할 수 있다. 일례로, 플레이리스트 생성부(240)는 시드 곡 m0의 가수가 부른 다른 곡 중 유사도가 일정 값 이상인 곡을 선별하거나, 혹은 유사도가 높은 순으로 일정 개수의 곡을 선별하여 후보 집합 M*을 생성할 수 있다. 다른 예로, 플레이리스트 생성부(240)는 곡 간의 텍스트 특징 벡터 거리가 일정 값 이하인 곡들에 대해서만 데이터베이스를 조회하여 후보 집합 M*을 생성할 수도 있다.In step S402, the play list generating unit 240 may generate a separate candidate set M * by selecting a plurality of songs having similarities with the seed song m0 among the songs having unique characteristics stored in the database. For example, the play list generating unit 240 may select a song having similarity of a certain value or more among other songs called by the song of the seed song m0, or select a certain number of songs in descending order of similarity to generate a candidate set M * can do. As another example, the play list generator 240 may generate a candidate set M * by querying the database only for songs whose text feature vector distance between songs is less than a predetermined value.

단계(S403)에서 플레이리스트 생성부(240)는 데이터베이스 상에 고유 특징이 저장된 곡 간의 유사도를 바탕으로 현재 재생 곡과의 유사도가 높은 곡을 다음 재생 후보 곡으로 선택할 수 있다. 현재 재생되는 곡을 mt라 할 때, 데이터베이스 상의 곡들에 대해 유사도가 높은 곡을 다음 차례 재생할 곡인 mt+1로 선택할 수 있다. 이때, 플레이리스트 생성부(240)는 현재 재생 곡 mt와 유사도가 가장 큰 곡, 혹은 유사도를 기준으로 선정된 복수 개의 곡 중 무작위로 선정된 곡을 다음 재생 후보 곡 mt+1로 결정할 수 있다. 그리고, 이미 재생된 곡은 다음 재생 후보 곡의 선정 대상에서 제외될 수 있다.In step S403, the play list generating unit 240 may select a song having a high degree of similarity with the currently reproduced song as a next reproduced candidate song based on the similarity between the songs whose unique characteristics are stored on the database. If the currently played song is mt, you can select the song with the highest similarity to the songs in the database as mt + 1, which will be played next time. At this time, the play list generating unit 240 may determine a song that has the greatest similarity to the currently reproduced song mt, or a randomly selected one of a plurality of songs selected on the basis of the similarity, as the next reproduction candidate song mt + 1. The already reproduced tune can be excluded from the selection target of the next reproduction candidate tune.

단계(S404)에서 플레이리스트 생성부(240)는 단계(S403)에서 선택된 후보 곡에 대해 시드 곡 또는 이전 재생 곡 중 적어도 하나의 기준 곡과의 유사도를 산출하여 사전에 정의된 임계값(threshold)과 비교할 수 있다. 이때, 임계값은 플레이리스트의 다양성을 유지하면서 시드 곡이나 기준 곡과 너무 다른 곡이 재생되는 것을 방지하기 위한 역할을 한다. 다시 말해, 본 발명은 다음 재생 후보 곡 선정에 있어 임계값을 통해 시드 곡이나 기준 곡과의 유사성을 지속적으로 체크하는 로직을 포함할 수 있다. 예를 들어, 임계값은 전체 곡들 간의 임계값 분포에서 임의로 선택 가능하며 통상 0.2~0.3 이내로 설정될 수 있다. 임계값을 이용하여 플레이리스트의 다양성과 유사성 간의 균형을 유지하거나 비중을 조절할 수 있다. 그리고, 기준 곡은 최초로 재생된 곡이거나 최초 재생 곡 이후 일정 범위 내에서(예컨대, 5번째 이내로, 1시간 이내로) 재생된 이력이 있는 곡 등일 수 있다.In step S404, the play list generator 240 calculates the similarity of at least one of the seed songs or the previous play songs to the candidate song selected in step S403, . At this time, the threshold value serves to prevent playback of a song that is too different from the seed song or the reference song while maintaining the diversity of the play list. In other words, the present invention may include logic for continuously checking the similarity between the seed song and the reference song through the threshold value in the next replay candidate song selection. For example, the threshold value can be arbitrarily selected from the distribution of the threshold values between all songs and can be set usually within 0.2 to 0.3. Thresholds can be used to balance or control the diversity and similarity of playlists. The reference song may be a song reproduced first, or a song having a history reproduced within a certain range (for example, within 5th, within 1 hour) after the first reproduction music.

단계(S405, S406)에서 플레이리스트 생성부(240)는 단계(S403)에서 선택된 후보 곡이 시드 곡 또는 기준 곡과의 유사도가 임계값 이상으로 높은 경우(즉, 벡터 간 거리가 임계 거리보다 작은 경우) 단계(S403)에서 선택된 후보 곡을 최종 다음 재생 곡으로 결정하여 현재 재생 곡 이후 재생할 수 있다.In steps S405 and S406, if the similarity degree between the candidate song selected in step S403 and the seed song or the reference song is higher than the threshold value (i.e., the inter-vector distance is smaller than the critical distance in steps S405 and S406) , The candidate song selected in step S403 may be determined as the final next playable song and played back after the current playable song.

단계(S405, S407)에서 플레이리스트 생성부(240)는 단계(S403)에서 선택된 후보 곡이 시드 곡 또는 기준 곡과의 유사도가 임계값 미만으로 낮은 경우(즉, 벡터 간 거리가 임계 거리보다 큰 경우) 단계(S402)에서 생성된 후보 집합 M*에 포함된 곡 중 어느 한 곡을 최종 다음 재생 곡으로 선택하여 현재 재생 곡 이후 재생할 수 있다. 예를 들어, 플레이리스트 생성부(240)는 단계(S403)에서 선택된 후보 곡이 시드 곡이나 기준 곡과 유사도가 임계값 미만으로 떨어지면 시드 곡과 유사한 곡의 집합인 후보 집합 M*에서 임의의 한 곡을 선택하여 재생할 수 있다.In steps S405 and S407, if the similarity degree between the candidate song selected in step S403 and the seed song or the reference song is lower than the threshold value (i.e., the inter-vector distance is larger than the critical distance in steps S405 and S407) It is possible to select any one piece of the songs included in the candidate set M * generated in the step S402 as the final next piece of music to reproduce after the current piece of music. For example, if the candidate song selected in step S403 is less than the threshold value of the seed song or the reference song, the play list generating unit 240 generates a play list A song can be selected and played back.

플레이리스트 생성부(240)는 상기한 과정에서 단계(S403~S407)를 반복 수행하고, 이를 통해 곡 간 유사성을 기반으로 체인 형태의 플레이리스트를 무한히 생성할 수 있다. 따라서, 본 발명은 곡 간 유사성을 기반으로 확률적으로 다음 연속 재생될 곡을 선택함으로써 플레이리스트의 다양성을 확보할 수 있고, 이와 동시에 다음 재생 곡 선택 시 시드 곡(또는 기준 곡)과의 유사도가 일정 값 이하로 낮아질 때는 시드 곡(또는 기준 곡)과의 유사도 순위가 높은 곡으로 대체하여 플레이리스트의 유사성을 지속시킬 수 있다.The playlist generating unit 240 repeats steps S403 to S407 in the above process, and can thereby infinitely generate a chain-shaped playlist based on the similarity between songs. Therefore, according to the present invention, diversity of the play list can be ensured by selecting the songs to be successively reproduced successively based on the similarity between songs, and at the same time, the similarity with the seed songs (or the reference songs) When the value is lower than a predetermined value, the similarity of the playlist can be maintained by replacing the song with a song having a high degree of similarity with the seed song (or the reference song).

이처럼 본 발명의 실시예들에 따르면, 딥 러닝(deep learning)을 이용하여 음원 자체의 고유 값을 학습을 위한 인자(factor)로 사용함으로써 곡 간의 유사성을 더욱 정확히 산출할 수 있다. 더욱이, 본 발명의 실시예들에 따르면, 음원의 특징뿐 아니라 음원의 메타 정보를 부가적 정보로 사용하여 곡 간의 유사성을 산출할 수 있고 메타 정보를 필터 형태로 사용함으로써 곡 간 유사성 계산속도를 획기적으로 개선 가능하며, 또한 시드(seed) 곡과의 유사성을 지속적으로 체크하는 로직을 적용함으로써 플레이리스트의 다양성을 유지하면서 이와 동시에 시드 곡과 너무 다른 곡이 포함되는 것을 방지할 수 있다.As described above, according to the embodiments of the present invention, by using deep learning, eigenvalues of the sound source itself can be used as a factor for learning, so that the similarity between songs can be more accurately calculated. Furthermore, according to the embodiments of the present invention, it is possible to calculate the similarity between songs by using the meta information of the sound source as additional information as well as the characteristics of the sound source, and by using the meta information in the form of a filter, And by applying the logic to continuously check the similarity with the seed song, it is possible to maintain the diversity of the play list while at the same time preventing the inclusion of the seed song and other songs.

이상에서 설명된 장치는 하드웨어 구성요소, 소프트웨어 구성요소, 및/또는 하드웨어 구성요소 및 소프트웨어 구성요소의 조합으로 구현될 수 있다. 예를 들어, 실시예들에서 설명된 장치 및 구성요소는, 프로세서, 콘트롤러, ALU(arithmetic logic unit), 디지털 신호 프로세서(digital signal processor), 마이크로컴퓨터, FPGA(field programmable gate array), PLU(programmable logic unit), 마이크로프로세서, 또는 명령(instruction)을 실행하고 응답할 수 있는 다른 어떠한 장치와 같이, 하나 이상의 범용 컴퓨터 또는 특수 목적 컴퓨터를 이용하여 구현될 수 있다. 처리 장치는 운영 체제(OS) 및 상기 운영 체제 상에서 수행되는 하나 이상의 소프트웨어 어플리케이션을 수행할 수 있다. 또한, 처리 장치는 소프트웨어의 실행에 응답하여, 데이터를 접근, 저장, 조작, 처리 및 생성할 수도 있다. 이해의 편의를 위하여, 처리 장치는 하나가 사용되는 것으로 설명된 경우도 있지만, 해당 기술분야에서 통상의 지식을 가진 자는, 처리 장치가 복수 개의 처리 요소(processing element) 및/또는 복수 유형의 처리 요소를 포함할 수 있음을 알 수 있다. 예를 들어, 처리 장치는 복수 개의 프로세서 또는 하나의 프로세서 및 하나의 콘트롤러를 포함할 수 있다. 또한, 병렬 프로세서(parallel processor)와 같은, 다른 처리 구성(processing configuration)도 가능하다.The apparatus described above may be implemented as a hardware component, a software component, and / or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit, a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing apparatus may be described as being used singly, but those skilled in the art will recognize that the processing apparatus may have a plurality of processing elements and / As shown in FIG. For example, the processing unit may comprise a plurality of processors or one processor and one controller. Other processing configurations are also possible, such as a parallel processor.

소프트웨어는 컴퓨터 프로그램(computer program), 코드(code), 명령(instruction), 또는 이들 중 하나 이상의 조합을 포함할 수 있으며, 원하는 대로 동작하도록 처리 장치를 구성하거나 독립적으로 또는 결합적으로(collectively) 처리 장치를 명령할 수 있다. 소프트웨어 및/또는 데이터는, 처리 장치에 의하여 해석되거나 처리 장치에 명령 또는 데이터를 제공하기 위하여, 어떤 유형의 기계, 구성요소(component), 물리적 장치, 가상 장치(virtual equipment), 컴퓨터 저장 매체 또는 장치, 또는 전송되는 신호 파(signal wave)에 영구적으로, 또는 일시적으로 구체화(embody)될 수 있다. 소프트웨어는 네트워크로 연결된 컴퓨터 시스템 상에 분산되어서, 분산된 방법으로 저장되거나 실행될 수도 있다. 소프트웨어 및 데이터는 하나 이상의 컴퓨터 판독 가능 기록 매체에 저장될 수 있다.The software may include a computer program, code, instructions, or a combination of one or more of the foregoing, and may be configured to configure the processing device to operate as desired or to process it collectively or collectively Device can be commanded. The software and / or data may be in the form of any type of machine, component, physical device, virtual equipment, computer storage media, or device , Or may be permanently or temporarily embodied in a transmitted signal wave. The software may be distributed over a networked computer system and stored or executed in a distributed manner. The software and data may be stored on one or more computer readable recording media.

실시예에 따른 방법은 다양한 컴퓨터 수단을 통하여 수행될 수 있는 프로그램 명령 형태로 구현되어 컴퓨터 판독 가능 매체에 기록될 수 있다. 상기 컴퓨터 판독 가능 매체는 프로그램 명령, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. 상기 매체에 기록되는 프로그램 명령은 실시예를 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 당업자에게 공지되어 사용 가능한 것일 수도 있다. 컴퓨터 판독 가능 기록 매체의 예에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체(magnetic media), CD-ROM, DVD와 같은 광기록 매체(optical media), 플롭티컬 디스크(floptical disk)와 같은 자기-광 매체(magneto-optical media), 및 롬(ROM), 램(RAM), 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. 프로그램 명령의 예에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함한다. 상기된 하드웨어 장치는 실시예의 동작을 수행하기 위해 하나 이상의 소프트웨어 모듈로서 작동하도록 구성될 수 있으며, 그 역도 마찬가지이다.The method according to an embodiment may be implemented in the form of a program command that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions to be recorded on the medium may be those specially designed and configured for the embodiments or may be available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks and magnetic tape; optical media such as CD-ROMs and DVDs; magnetic media such as floppy disks; Magneto-optical media, and hardware devices specifically configured to store and execute program instructions such as ROM, RAM, flash memory, and the like. Examples of program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

이상과 같이 실시예들이 비록 한정된 실시예와 도면에 의해 설명되었으나, 해당 기술분야에서 통상의 지식을 가진 자라면 상기의 기재로부터 다양한 수정 및 변형이 가능하다. 예를 들어, 설명된 기술들이 설명된 방법과 다른 순서로 수행되거나, 및/또는 설명된 시스템, 구조, 장치, 회로 등의 구성요소들이 설명된 방법과 다른 형태로 결합 또는 조합되거나, 다른 구성요소 또는 균등물에 의하여 대치되거나 치환되더라도 적절한 결과가 달성될 수 있다.While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. For example, it is to be understood that the techniques described may be performed in a different order than the described methods, and / or that components of the described systems, structures, devices, circuits, Lt; / RTI > or equivalents, even if it is replaced or replaced.

그러므로, 다른 구현들, 다른 실시예들 및 특허청구범위와 균등한 것들도 후술하는 특허청구범위의 범위에 속한다.Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims

A method of generating playlists performed in a computer system,
Generating unique acoustic features of the sound source from the sound source data of the sound source content for each sound source content; And
Generating a playlist based on the similarity between the sound source contents by calculating a similarity degree between the sound source contents using the acoustic feature;
Lt; / RTI >
Wherein the generating the playlist comprises:
A play list similar to a current play song is selected based on the similarity between the sound source contents to generate a play list, and a similarity with a seed song is checked when a next play song is selected, Creating a playlist of the form
And generating a play list.

A computer-readable recording medium having recorded thereon a program for causing the computer to execute the method of claim 1.

A computer-implemented playlist generation system comprising:
An intrinsic feature processor for generating an intrinsic acoustic feature of the corresponding sound source from the sound source data of the sound source content for each sound source content; And
And generating a play list based on the similarity between the sound source contents by calculating a similarity degree between the sound source contents using the acoustic feature,
Lt; / RTI >
Wherein the playlist generating unit comprises:
A play list similar to the current play song is selected based on the similarity between the sound source contents to generate a play list, and a similarity to the seed song is checked when a next play song is selected, Creating lists
Wherein the playlist generation system comprises: