KR102086784B1

KR102086784B1 - Apparatus and method for recognizing speech

Info

Publication number: KR102086784B1
Application number: KR1020130082729A
Authority: KR
Inventors: 김경태; 김현수; 송가진
Original assignee: 삼성전자주식회사
Priority date: 2013-07-15
Filing date: 2013-07-15
Publication date: 2020-03-09
Also published as: KR20150008598A

Abstract

Various embodiments of the present disclosure relate to an apparatus and a method for speech recognition in an electronic device. The speech recognition method may include: outputting a voice signal or an audio signal that includes a plurality of consecutive components; receiving a voice signal; determining one or more of the plurality of components using the time at which the voice signal was received; and generating response information for the voice signal based on the one or more components or at least part of the information about those components.

Description

Apparatus and method for speech command recognition {APPARATUS AND METHOD FOR RECOGNIZING SPEECH}

Various embodiments of the present disclosure relate to voice command recognition and, more particularly, to an apparatus and a method for recognizing a voice command in consideration of the time at which a user speaks.

With advances in semiconductor and communication technology, electronic devices are evolving into multimedia devices that provide multimedia services using voice calls and data communication. For example, an electronic device may provide various multimedia services such as data search and speech recognition services.

Furthermore, an electronic device may provide a speech recognition service based on natural language input, which a user can use intuitively without separate learning.

Accordingly, various embodiments of the present disclosure aim to provide an apparatus and a method for recognizing a voice command in an electronic device in consideration of the time at which the user speaks.

Various embodiments of the present disclosure aim to provide an apparatus and a method for recognizing a voice command in an electronic device in consideration of content information corresponding to the time at which a voice signal is received.

Various embodiments of the present disclosure aim to provide an apparatus and a method for transmitting, from an electronic device to a server for voice command recognition, content information corresponding to the time at which a voice signal is received.

Various embodiments of the present disclosure aim to provide an apparatus and a method for recognizing a voice command in a server in consideration of the content information and the voice signal provided by an electronic device.

According to various embodiments of the present disclosure, a method of operating an electronic system may include: providing a voice signal or an audio signal that includes a plurality of components; receiving a voice signal; determining one or more of the plurality of components using the time at which the voice signal was received; and generating response information for the voice signal based on the one or more components or at least part of the information about those components.

According to various embodiments of the present disclosure, a method of operating an electronic device may include: outputting a voice signal or an audio signal that includes a plurality of consecutive components; receiving a voice signal; determining one or more of the plurality of components using the time at which the voice signal was received; and generating response information for the voice signal based on the one or more components or at least part of the information about those components.

According to various embodiments of the present disclosure, a method of operating an electronic device may include: outputting a voice signal or an audio signal that includes a plurality of consecutive components; receiving a voice signal; determining one or more of the plurality of components using the time at which the voice signal was received; and transmitting, to a server, the voice signal together with the one or more components or at least part of the information about those components.

According to various embodiments of the present disclosure, a method of operating a server may include: receiving a voice signal from an electronic device; identifying, among a plurality of components included in a voice signal or an audio signal output by the electronic device, one or more components corresponding to the received voice signal; generating response information for the voice signal based on the one or more components or at least part of the information about those components; and transmitting the response information for the voice signal to the electronic device.

According to various embodiments of the present disclosure, a method of operating an electronic device may include: outputting a voice signal or an audio signal that includes a plurality of consecutive components; transmitting information about the output voice signal or audio signal to a server; receiving a voice signal; and transmitting the received voice signal to the server.

According to various embodiments of the present disclosure, a method of operating a server may include: receiving, from an electronic device, information about a voice signal or an audio signal that includes a plurality of components and is being output by the electronic device; receiving a voice signal from the electronic device; determining, using the voice signal, the time at which the electronic device received the voice signal; determining one or more components that the electronic device was outputting at the time the voice signal was received, using the information about the voice signal or audio signal and the time at which the electronic device received the voice signal; generating response information for the voice signal based on the one or more components or at least part of the information about those components; and transmitting the response information for the voice signal to the electronic device.

According to various embodiments of the present disclosure, an electronic device may include: an output unit that outputs a voice signal or an audio signal including a plurality of consecutive components; a receiver that receives a voice signal; a controller that determines one or more of the plurality of components using the time at which the voice signal was received; and an operation determiner that generates response information for the voice signal based on the one or more components or at least part of the information about those components.

According to various embodiments of the present disclosure, an electronic device may include: an output unit that outputs a voice signal or an audio signal including a plurality of consecutive components; a receiver that receives a voice signal; and a controller that determines one or more of the plurality of components using the time at which the voice signal was received. The electronic device may transmit, to a server, the voice signal together with the one or more components or at least part of the information about those components.

According to various embodiments of the present disclosure, a server may include: a language recognizer that receives a voice signal from an electronic device; a natural language processor that identifies, among a plurality of components included in a voice signal or an audio signal output by the electronic device, one or more components corresponding to the received voice signal; and an operation determiner that generates response information for the voice signal based on the one or more components or at least part of the information about those components, and transmits the response information to the electronic device.

According to various embodiments of the present disclosure, an electronic device may include: an output unit that outputs a voice signal or an audio signal including a plurality of consecutive components; a controller that generates information about the voice signal or audio signal output through the output unit; and a receiver that receives a voice signal. The electronic device may transmit, to a server, the information about the output voice signal or audio signal and the received voice signal.

According to various embodiments of the present disclosure, a server may include: a language recognizer that receives a voice signal from an electronic device and determines, using the voice signal, the time at which the electronic device received the voice signal; a content determiner that receives, from the electronic device, information about a voice signal or an audio signal that includes a plurality of components and is being output, and that determines one or more components the electronic device was outputting at the time the voice signal was received, using that information and the reception time determined by the language recognizer; and an operation determiner that generates response information for the voice signal based on the one or more components or at least part of the information about those components, and transmits the response information to the electronic device.

As described above, by recognizing a voice command in consideration of information about the content that the electronic device is playing at the time it receives a voice signal, the voice command carried by the voice signal can be recognized clearly.

FIG. 1 is a block diagram of an electronic device for recognizing a voice command according to various embodiments of the present disclosure.
FIG. 2 illustrates a procedure for recognizing a voice command in an electronic device according to various embodiments of the present disclosure.
FIG. 3 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 4 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 5 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 6 illustrates a procedure for transmitting content information from an electronic device to a server according to various embodiments of the present disclosure.
FIG. 7 illustrates a procedure for recognizing a voice command in a server in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 8 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 9 illustrates a procedure for transmitting content information from an electronic device to a server according to various embodiments of the present disclosure.
FIG. 10 illustrates a procedure for recognizing a voice command in a server in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 11 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 12 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 13 illustrates a procedure for transmitting content information from an electronic device to a server according to various embodiments of the present disclosure.
FIG. 14 illustrates a procedure for recognizing a voice command in a server in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 15 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 16 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 17 illustrates a procedure for transmitting content information from an electronic device to a server according to various embodiments of the present disclosure.
FIG. 18 illustrates a procedure for recognizing a voice command in a server in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 19 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.
FIG. 20 illustrates a screen configuration for recognizing a voice command according to various embodiments of the present disclosure.
FIG. 21 illustrates a screen configuration for recognizing a voice command according to various embodiments of the present disclosure.

Hereinafter, various embodiments of the present disclosure are described in detail with reference to the accompanying drawings. In describing these embodiments, detailed descriptions of related well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter. The terms used below are defined in consideration of their functions in the various embodiments and may vary according to the intention or practice of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

The various embodiments below describe a technique for recognizing a voice command in an electronic device in consideration of content information associated with the time at which a voice signal is received.

In the following description, the electronic device may be a portable electronic device, a portable terminal, a mobile terminal, a mobile pad, a media player, a personal digital assistant (PDA), a desktop computer, a laptop computer, a smartphone, a netbook, a television, a mobile internet device (MID), an ultra mobile PC (UMPC), a tablet PC, a navigation device, an MP3 player, or a similar device. The electronic device may also be any electronic device that combines the functions of two or more of the devices listed above.

FIG. 1 is a block diagram of an electronic device for recognizing a voice command according to various embodiments of the present disclosure.

Referring to FIG. 1, the electronic device 100 may include a controller 101, a data storage 103, a voice detector 105, a language recognizer 107, and a natural language processor 109.

The controller 101 may control the overall operation of the electronic device 100. The controller 101 may output, through a speaker, the content that corresponds to a control command provided by the natural language processor 109. Here, the content may include a voice or audio signal that contains a sequence of a plurality of components. For example, the controller 101 may include a text-to-speech (TTS) module. If the natural language processor 109 provides a control command to play "weather", the controller 101 may retrieve weather data from the data storage 103 or an external server. The TTS module may convert the retrieved weather data into a voice signal or an audio signal that contains a plurality of components in sequence, such as "On July 1, 2013, the weather in the Seoul area is currently hot and humid, at 34 degrees Celsius and 60% humidity" and "This week will be hot and humid overall, and heavy rain is expected in the latter half of the week due to the seasonal rain front", and output it through the speaker.

The controller 101 may transmit, to the natural language processor 109, information about the content being output through the speaker at the time the voice detector 105 extracts a voice signal. The controller 101 can identify that time from the voice signal extraction information received from the voice detector 105. For example, when a daily briefing service is provided as in FIG. 20A, the controller 101 may retrieve a sequence of components such as weather information 2001, stock information 2003, and main news 2005 according to the settings of the daily briefing service and output them through the speaker. If the voice detector 105 extracts a voice signal while the main news 2005 is playing, the controller 101 may transmit content information about the main news 2005 to the natural language processor 109. As another example, when a music playback service is provided as in FIG. 21A, the controller 101 may play one or more music files in a playlist and output them through the speaker. If the voice detector 105 extracts a voice signal while "Song 1" is playing, the controller 101 may transmit content information about "Song 1" to the natural language processor 109. As yet another example, the controller 101 may transmit information about the content that was playing a reference time before the voice detector 105 extracted the voice signal. However, if no content is being output through the speaker when the voice detector 105 extracts the voice signal, the controller 101 may not transmit any content information to the natural language processor 109.
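The selection described above can be sketched as a lookup on a playback timeline. This is only an illustration under stated assumptions: the `Component` record, the `component_at` function, the `lookback` parameter (standing in for the reference time), and the example durations are all hypothetical and do not come from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    name: str     # e.g. "weather", "stocks", "main news" (hypothetical labels)
    start: float  # playback start, in seconds from the start of output
    end: float    # playback end, in seconds from the start of output


def component_at(timeline: List[Component], t: float,
                 lookback: float = 0.0) -> Optional[Component]:
    """Return the component playing at time t - lookback, or None.

    `timeline` must be sorted by start time and must not overlap.
    `lookback` models the reference time: a positive value selects
    what was playing slightly before the voice signal was detected.
    """
    probe = t - lookback
    starts = [c.start for c in timeline]
    i = bisect_right(starts, probe) - 1  # last component starting at or before probe
    if i < 0:
        return None                      # probe is before any output
    c = timeline[i]
    return c if probe < c.end else None  # None when nothing is playing


# Hypothetical daily briefing: weather, then stocks, then main news.
briefing = [
    Component("weather", 0.0, 12.0),
    Component("stocks", 12.0, 20.0),
    Component("main news", 20.0, 45.0),
]
```

With this timeline, a voice signal detected at 30 s maps to the main news component, a signal detected at 21 s with a 2 s lookback maps to the stock component, and a signal detected after output has ended maps to no component, in which case no content information would be sent.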

The data storage 103 may store at least one program for controlling the operation of the electronic device 100, data for executing the program, and data generated while the program runs. For example, the data storage 103 may store various content information for voice commands.

The voice detector 105 may extract a voice signal from the audio signal collected through a microphone and provide it to the language recognizer 107. For example, the voice detector 105 may include an adaptive echo canceller (AEC) that can remove echo components from the audio signal collected through the microphone, and a noise suppressor (NS) that can remove background noise from the audio signal provided by the echo canceller. The voice detector 105 can then extract the voice signal from the audio signal after the echo canceller and the noise suppressor have removed the echo components and background noise. Here, echo refers to the phenomenon in which the audio signal output through the speaker is picked up by the microphone.

When the voice detector 105 extracts a voice signal from the audio signal collected through the microphone as described above, it may provide voice signal extraction information to the controller 101 at the time of extraction. The voice signal extraction information may include the time at which the voice detector 105 extracted the voice signal.
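The patent does not say how the voice detector decides that speech has started. As one possible illustration, the sketch below uses a simple frame-energy threshold, a common technique swapped in here for the sake of example, to produce the extraction time. The function names, frame length, threshold, and minimum run length are assumptions, and the input is assumed to have already passed through echo cancellation and noise suppression.

```python
from typing import Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def voice_onset_time(samples: Sequence[float], sample_rate: int,
                     frame_len: int = 160, threshold: float = 0.01,
                     min_frames: int = 3) -> Optional[float]:
    """Return the onset time in seconds of the first voiced run, or None.

    A run counts as voiced when `min_frames` consecutive frames have
    energy above `threshold`. All parameter values are illustrative.
    """
    run = 0
    n_frames = len(samples) // frame_len
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        if frame_energy(frame) > threshold:
            run += 1
            if run == min_frames:
                first = k - min_frames + 1  # index of the first frame in the run
                return first * frame_len / sample_rate
        else:
            run = 0                         # a quiet frame resets the run
    return None


# Hypothetical input: 0.1 s of silence followed by 0.1 s of a loud signal at 16 kHz.
silence = [0.0] * 1600
speech = [0.5, -0.5] * 800
onset = voice_onset_time(silence + speech, sample_rate=16000)
```

The returned onset time plays the role of the time information carried in the voice signal extraction information; a real detector would be considerably more robust than a fixed energy threshold.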

The language recognizer 107 may convert the voice signal provided by the voice detector 105 into text data.

The natural language processor 109 may analyze the text data provided by the language recognizer 107 to extract the user's intent and keywords contained in the text data. In other words, the natural language processor 109 may extract the voice command contained in the voice signal by analyzing the text data provided by the language recognizer 107.

The natural language processor 109 may include an operation determiner. The operation determiner may generate a control command that directs the controller 101 to act on the voice command extracted by the natural language processor 109.

The natural language processor 109 may extract the voice command contained in the voice signal by analyzing the text data from the language recognizer 107 together with the content information provided by the controller 101. For example, when the language recognizer 107 provides the text "now news details", the natural language processor 109 may analyze the text and recognize that the voice signal requests detailed information about the news currently playing. Using the content information provided by the controller 101, the natural language processor 109 can then determine exactly which news item is currently playing.
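The role of the content information in resolving a phrase such as "now news details" can be illustrated with a toy keyword matcher. This is a deliberately simplified sketch, not the patent's natural language processing: the `interpret` function, the intent label, the keyword lists, and the `title` field are hypothetical.

```python
from typing import Dict, Optional


def interpret(text: str,
              content_info: Optional[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Toy interpreter: extract an intent and resolve "now" against content info.

    `content_info` is whatever the controller reported as playing when
    the voice signal was detected, or None if nothing was playing.
    """
    words = set(text.lower().split())
    # Intent: a request for details if any word starts with "detail".
    intent = "show_details" if any(w.startswith("detail") for w in words) else "unknown"
    # A deictic word means the user refers to whatever is playing right now.
    refers_to_current = bool(words & {"now", "current", "this"})
    target = None
    if refers_to_current and content_info is not None:
        target = content_info.get("title")
    return {"intent": intent, "target": target}


# Hypothetical content information for the news item that was playing.
playing = {"type": "main news", "title": "Headline A"}
result = interpret("Now news details", playing)
```

Without content information the same text yields the intent but no target, which is the ambiguity the time-based content information is meant to remove.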

FIG. 2 illustrates a procedure for recognizing a voice command in an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 2, the electronic device may provide content in operation 201. For example, the electronic device may retrieve, from the data storage 103 or an external server, the content that corresponds to the control command extracted by the natural language processor 109, and play it. The electronic device may use the TTS module to convert the retrieved content into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or audio signal may include a sequence of a plurality of components.

While providing content, the electronic device may receive a voice signal in operation 203. For example, the electronic device may extract a voice signal from the audio signal received through the microphone.

When the voice signal is received, the electronic device may, in operation 205, generate information about the content being played at the time the voice signal was received. While playing a voice or audio signal that includes a sequence of a plurality of components, the electronic device may select one or more components according to the time at which the voice signal is received. For example, referring to FIG. 20A, when a voice signal is received while the main news 2005 is being played as part of the daily briefing service, the electronic device may generate content information about the main news 2005. As another example, referring to FIG. 21A, when a voice signal is received while a music file included in a playlist is being played, the electronic device may generate content information about "song 1", which is being played. As yet another example, the electronic device may generate content information about content that was played a reference time before the time at which the voice signal was received. However, when no content is being output through the speaker at the time the voice signal is received, the electronic device may not generate content information. Here, the content information may include information about the one or more components that, among the plurality of components included in the content being played, are being played at the time the voice signal is received. The information about a component may include one or more of component session information and music file information.
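The selection in operation 205 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the device keeps a timeline of components with start and end times in seconds; the `Component` type, the `select_component` function, and the timings below are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str      # e.g. "main news" or "song 1"
    start: float   # playback start in seconds (hypothetical timeline)
    end: float     # playback end in seconds


def select_component(timeline: List[Component],
                     t_voice: float,
                     reference_time: float = 0.0) -> Optional[Component]:
    """Return the component playing at t_voice - reference_time, or None."""
    t = t_voice - reference_time
    for component in timeline:
        if component.start <= t < component.end:
            return component
    # Nothing was being output at that time: no content information.
    return None


# Hypothetical daily briefing timeline (the durations are assumptions).
briefing = [
    Component("weather information", 0.0, 20.0),
    Component("stock information", 20.0, 45.0),
    Component("main news", 45.0, 90.0),
]

print(select_component(briefing, 50.0))
print(select_component(briefing, 46.0, reference_time=2.0))
print(select_component(briefing, 120.0))
```

A non-zero `reference_time` models the variant in which the device looks back by a reference time, for instance to allow for the delay between the user hearing a component and starting to speak.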

In operation 207, the electronic device may generate response information for the voice signal received in operation 203 based on the information about the content being played at the time the voice signal was received. For example, the electronic device may generate a control command according to the information about the content being played at the time the voice signal was received and the voice signal received in operation 203. For instance, when the voice signal is converted into the text data "now news detailed information", the natural language processor 109 of the electronic device may analyze the text data and recognize that the voice signal "requests detailed information about the news currently being played". In this case, the natural language processor 109 may recognize, according to the content information for the content being played at the time the voice signal was received, that the request is for detailed information about "surprise mobile phone unveiling". The electronic device may generate a control command for playing detailed information about "surprise mobile phone unveiling". The electronic device may generate content for the voice signal in consideration of the information about the content being played at the time the voice signal was received and the control command according to the voice signal received in operation 203. For example, referring to FIG. 20A, when the voice signal "now news detailed information" is received while the daily briefing service is being provided, the electronic device may play detailed news information about "surprise mobile phone unveiling" as shown in FIG. 20B. In this case, the electronic device may convert the detailed news about "surprise mobile phone unveiling" into a voice signal through the TTS module and output it through the speaker. As another example, referring to FIG. 21A, when the voice signal "now song singer information" is received during music playback, the electronic device may play singer information for "song 1" as shown in FIG. 21B. In this case, the electronic device may convert the singer information for "song 1" into a voice signal through the TTS module and output it through the speaker.
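As a rough illustration of how operation 207 might combine the recognized text with the content information, the following Python sketch resolves the word "now" against whatever was playing. The keyword matching, the `resolve_command` function, the dictionary keys, and the action names are all assumptions made for this example; the disclosure does not specify how the natural language processor is implemented.

```python
from typing import Optional


def resolve_command(text: str, content_info: Optional[dict]) -> Optional[dict]:
    """Turn recognized text plus content information into a control command.

    Illustrative keyword matching only; not the actual analysis method.
    """
    words = text.lower().split()
    if "now" not in words or content_info is None:
        # Without content information, "now" cannot be tied to a component.
        return None
    if "news" in words and content_info.get("type") == "news":
        action = "play_news_details"
    elif "singer" in words and content_info.get("type") == "music":
        action = "play_singer_info"
    else:
        return None
    return {"action": action, "target": content_info["title"]}


# "headline A" stands in for the headline of the news item being played.
print(resolve_command("now news detailed information",
                      {"type": "news", "title": "headline A"}))
print(resolve_command("now song singer information",
                      {"type": "music", "title": "song 1"}))
```

The point of the sketch is that the same utterance yields a different control command depending on the content information, which is what lets the device answer "now" correctly.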

In the above-described embodiment, the electronic device may include the controller 101, the data storage 103, the voice detector 105, the language recognizer 107, and the natural language processor 109, and may use them to extract the voice command from the voice signal.

In another embodiment, the electronic device may be configured to extract the voice command from the voice signal using a server.

FIG. 3 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 3, the voice recognition system may include an electronic device 300 and a server 310.

The electronic device 300 may receive a voice signal through a microphone and play content provided from the server 310. For example, the electronic device 300 may include a controller 301, a TTS module 303, and a voice detector 305.

The controller 301 may control the overall operation of the electronic device 300. The controller 301 may control playback of the content provided from the server 310. For example, the controller 301 may control the TTS module 303 to convert the content provided from the server 310 into a voice signal or an audio signal and output it through a speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

The controller 301 may transmit, to the server 310, content information about the content being output through the speaker at the time the voice detector 305 extracts a voice signal. For example, referring to FIG. 20A, when a daily briefing service is provided, the controller 301 may, according to the setting information of the daily briefing service, extract a sequence of a plurality of components such as weather information 2001, stock information 2003, and main news 2005 and control them to be output through the speaker. When the voice detector 305 extracts a voice signal while the main news 2005 is being played, the controller 301 may transmit content information about the main news 2005 to the server 310. As another example, referring to FIG. 21A, when a music playback service is provided, the controller 301 may control one or more music files included in a playlist to be played and output through the speaker. When the voice detector 305 extracts a voice signal while "song 1" is being played, the controller 301 may transmit content information about "song 1" to the server 310. As yet another example, the controller 301 may transmit, to the server 310, content information about content that was played a reference time before the time at which the voice signal extraction information was received. However, when no content is being output through the speaker at the time the voice detector 305 extracts the voice signal, the controller 301 may not transmit content information to the server 310.

The TTS module 303 may convert the content provided from the controller 301 into a voice signal or an audio signal and output it through the speaker.

The voice detector 305 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 310. For example, the voice detector 305 may include an echo canceller that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided from the echo canceller. Accordingly, the voice detector 305 may extract the voice signal from the audio signal from which the echo components and the background noise have been removed by the echo canceller and the noise remover. Here, echo refers to the phenomenon in which the audio signal output through the speaker flows back into the microphone.

As described above, when the electronic device 300 transmits the content information and the voice signal to the server 310, the electronic device 300 may transmit the content information and the voice signal to the server 310 independently, or may add the content information to the voice signal and transmit them together.
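The two transmission options can be sketched as follows. The JSON message layout, the field names, and the base64 encoding of the audio are assumptions chosen for this example; the disclosure does not define a wire format.

```python
import base64
import json
from typing import List, Optional


def _encode_audio(voice: bytes) -> str:
    # JSON cannot carry raw bytes, so the audio is base64-encoded here.
    return base64.b64encode(voice).decode("ascii")


def build_separate(voice: bytes, content_info: Optional[dict]) -> List[str]:
    """Option 1: voice signal and content information as independent messages."""
    messages = [json.dumps({"type": "voice", "audio": _encode_audio(voice)})]
    if content_info is not None:
        messages.append(json.dumps({"type": "content_info", "info": content_info}))
    return messages


def build_combined(voice: bytes, content_info: Optional[dict]) -> str:
    """Option 2: content information added to the voice signal message."""
    payload = {"type": "voice", "audio": _encode_audio(voice)}
    if content_info is not None:
        payload["content_info"] = content_info
    return json.dumps(payload)


info = {"component": "main news"}
print(build_separate(b"\x01\x02", info))
print(build_combined(b"\x01\x02", info))
```

In both helpers the content information is simply omitted when nothing was playing, mirroring the case in which the device does not transmit content information.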

The server 310 may extract a voice command using the content information and the voice signal provided from the electronic device 300, extract content corresponding to the voice command from the content providing servers 320-1 to 320-n, and transmit the content to the electronic device 300. For example, the server 310 may include a language recognizer 311, a natural language processor 313, an operation determiner 315, and a content collector 317.

The language recognizer 311 may convert the voice signal provided from the voice detector 305 of the electronic device 300 into text data.

The natural language processor 313 may analyze the text data provided from the language recognizer 311 to extract the user's intention and key information contained in the text data. The natural language processor 313 may analyze the text data provided from the language recognizer 311 to extract the voice command included in the voice signal. In this case, the natural language processor 313 may extract the voice command included in the voice signal by analyzing the text data provided from the language recognizer 311 using the content information provided from the controller 301 of the electronic device 300. For example, when the text data "now news detailed information" is provided from the language recognizer 311, the natural language processor 313 may analyze the text data and recognize that the voice signal requests detailed information about the news currently being played. In this case, the natural language processor 313 may identify exactly which news item is being played by considering the content information provided from the controller 301.

The operation determiner 315 may generate a control command for the operation of the controller 301 according to the voice command extracted by the natural language processor 313. For example, when the natural language processor 313 recognizes that the voice signal "requests detailed information about the news currently being played (e.g., surprise mobile phone unveiling)", the operation determiner 315 may generate a control command for playing detailed information about "surprise mobile phone unveiling".

The content collector 317 may, according to the control command provided from the operation determiner 315, collect content to be provided to the electronic device 300 from the content providing servers 320-1 to 320-n and transmit it to the electronic device 300. For example, when a control command for playing detailed information about "surprise mobile phone unveiling" is provided from the operation determiner 315, the content collector 317 may collect one or more pieces of content related to "surprise mobile phone unveiling" from the content providing servers 320-1 to 320-n and transmit them to the electronic device 300.
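The order in which the four server blocks hand data to one another can be sketched as a simple chain. Every stage below is a stub written for this example (the function names, the dictionary layouts, and the pretence that the audio is UTF-8 text are all assumptions); only the ordering of the stages reflects the description of the server 310.

```python
from typing import Callable, List, Optional


def run_server_pipeline(voice_signal: bytes,
                        content_info: Optional[dict],
                        recognize: Callable[[bytes], str],
                        understand: Callable[[str, Optional[dict]], Optional[dict]],
                        decide: Callable[[dict], dict],
                        collect: Callable[[dict], List[str]]) -> Optional[List[str]]:
    text = recognize(voice_signal)               # language recognizer 311
    command = understand(text, content_info)     # natural language processor 313
    if command is None:
        return None                              # no voice command extracted
    control = decide(command)                    # operation determiner 315
    return collect(control)                      # content collector 317


# Stub stages, for illustration only.
def recognize(voice: bytes) -> str:
    return voice.decode("utf-8")  # pretend the audio is already text


def understand(text: str, info: Optional[dict]) -> Optional[dict]:
    if info is None or "now" not in text.split():
        return None
    return {"intent": "details", "target": info["component"]}


def decide(command: dict) -> dict:
    return {"action": "play_details", "target": command["target"]}


def collect(control: dict) -> List[str]:
    return ["article about " + control["target"]]


print(run_server_pipeline(b"now news detailed information",
                          {"component": "main news"},
                          recognize, understand, decide, collect))
```

Passing the stages in as callables keeps the sketch independent of any particular speech recognizer or content provider.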

As described above, the controller 301 of the electronic device 300 may transmit, to the server 310, content information about the content being output through the speaker at the time the voice detector 305 detects a voice signal. In this case, the electronic device 300 may identify the content being played at the time the voice detector 305 detects the voice signal by using the content estimator 407 or 507 described below with reference to FIG. 4 or FIG. 5.

FIG. 4 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 4, the voice recognition system may include an electronic device 400 and a server 410. In the following description, the server 410 has the same configuration and operation as the server 310 illustrated in FIG. 3, so a detailed description thereof is omitted.

The electronic device 400 may receive a voice signal through a microphone and play content provided from the server 410. For example, the electronic device 400 may include a controller 401, a TTS module 403, a voice detector 405, and a content estimator 407.

The controller 401 may control the overall operation of the electronic device 400. The controller 401 may control playback of the content provided from the server 410. For example, the controller 401 may control the content provided from the server 410 to be converted into a voice signal or an audio signal through the TTS module 403 and output through a speaker.

The TTS module 403 may convert the content provided from the controller 401 into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

The voice detector 405 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 410. For example, the voice detector 405 may include an echo canceller that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided from the echo canceller. Accordingly, the voice detector 405 may extract the voice signal from the audio signal from which the echo components and the background noise have been removed by the echo canceller and the noise remover. Here, echo refers to the phenomenon in which the audio signal output through the speaker flows back into the microphone.

When a voice signal is extracted from the audio signal collected through the microphone, the voice detector 405 may generate voice signal extraction information at the time the voice signal is extracted and transmit it to the content estimator 407. Here, the voice signal extraction information may include information on the time at which the voice detector 405 extracted the voice signal.

The content estimator 407 may monitor the content that the controller 401 transmits to the TTS module 403. Accordingly, the content estimator 407 may identify information about the content that the controller 401 is transmitting to the TTS module 403 at the time the voice detector 405 extracts the voice signal, and transmit that information to the server 410. In this case, the content estimator 407 may determine the time at which the voice detector 405 extracted the voice signal from the voice signal extraction information provided from the voice detector 405. For example, referring to FIG. 20A, when a daily briefing service is provided, the controller 401 may, according to the setting information of the daily briefing service, transmit a sequence of a plurality of components such as weather information 2001, stock information 2003, and main news 2005 to the TTS module 403. When the voice detector 405 extracts a voice signal while the main news 2005 is being transmitted to the TTS module 403, the content estimator 407 may transmit content information about the main news 2005 to the server 410. In this case, the content estimator 407 may instead transmit, to the server 410, information about the content that the controller 401 transmitted to the TTS module 403 a reference time before the time at which the voice detector 405 extracted the voice signal. However, when the controller 401 is not transmitting any content to the TTS module 403 at the time the voice detector 405 extracts the voice signal, the content estimator 407 may not transmit content information to the server 410.
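The monitoring behavior of the content estimator 407 can be sketched as an event-driven log of what the controller hands to the TTS module. The class name, its method names, and the use of seconds as timestamps are assumptions for this example; the disclosure describes the behavior but not an interface.

```python
from typing import List, Optional


class ContentEstimator:
    """Logs which component is being sent to the TTS module, and when."""

    def __init__(self, reference_time: float = 0.0) -> None:
        self.reference_time = reference_time
        self._log: List[list] = []  # entries: [name, start, end or None]

    def on_transfer_start(self, name: str, t: float) -> None:
        self.on_transfer_end(t)  # a new component closes the previous one
        self._log.append([name, t, None])

    def on_transfer_end(self, t: float) -> None:
        if self._log and self._log[-1][2] is None:
            self._log[-1][2] = t

    def on_voice_extracted(self, t_extracted: float) -> Optional[dict]:
        """Return content information for the extraction time, or None."""
        t = t_extracted - self.reference_time
        for name, start, end in reversed(self._log):
            if start <= t and (end is None or t < end):
                return {"component": name}
        return None  # nothing was being transferred: send no content info


estimator = ContentEstimator()
estimator.on_transfer_start("weather information", 0.0)
estimator.on_transfer_start("stock information", 20.0)
estimator.on_transfer_start("main news", 45.0)
estimator.on_transfer_end(90.0)
print(estimator.on_voice_extracted(50.0))
print(estimator.on_voice_extracted(95.0))
```

The timestamp passed to `on_voice_extracted` plays the role of the time information carried in the voice signal extraction information.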

FIG. 5 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 5, the voice recognition system may include an electronic device 500 and a server 510. In the following description, the server 510 has the same configuration and operation as the server 310 illustrated in FIG. 3, so a detailed description thereof is omitted.

The electronic device 500 may receive a voice signal through a microphone and play content provided from the server 510. For example, the electronic device 500 may include a controller 501, a TTS module 503, a voice detector 505, and a content estimator 507.

The controller 501 may control the overall operation of the electronic device 500. The controller 501 may control playback of the content provided from the server 510. For example, the controller 501 may control the content provided from the server 510 to be converted into a voice signal or an audio signal through the TTS module 503 and output through a speaker.

The TTS module 503 may convert the content provided from the controller 501 into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

The voice detector 505 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 510. For example, the voice detector 505 may include an echo canceller that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided from the echo canceller. Accordingly, the voice detector 505 may extract the voice signal from the audio signal from which the echo components and the background noise have been removed by the echo canceller and the noise remover. Here, echo refers to the phenomenon in which the audio signal output through the speaker flows back into the microphone.

When a voice signal is extracted from the audio signal collected through the microphone, the voice detector 505 may generate voice signal extraction information at the time the voice signal is extracted and transmit it to the content estimator 507. Here, the voice signal extraction information may include information on the time at which the voice detector 505 extracted the voice signal.

The content estimator 507 may monitor the content output from the TTS module 503. Accordingly, the content estimator 507 may identify information about the content that the TTS module 503 is outputting at the time the voice detector 505 extracts the voice signal, and transmit that information to the server 510. In this case, the content estimator 507 may determine the time at which the voice detector 505 extracted the voice signal from the voice signal extraction information provided from the voice detector 505. For example, referring to FIG. 20A, when a daily briefing service is provided, the TTS module 503 may, according to the setting information of the daily briefing service, convert weather information 2001, stock information 2003, and main news 2005 into voice signals and output them through the speaker. When the voice detector 505 extracts a voice signal while the TTS module 503 is outputting the voice signal for the main news 2005 through the speaker, the content estimator 507 may transmit content information about the main news 2005 to the server 510. In this case, the content estimator 507 may instead transmit, to the server 510, content information about the content that the TTS module 503 output through the speaker a reference time before the time at which the voice detector 505 extracted the voice signal. However, when the TTS module 503 is not outputting any content at the time the voice detector 505 extracts the voice signal, the content estimator 507 may not transmit content information to the server 510.

FIG. 6 illustrates a procedure for transmitting content information to a server in an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 6, the electronic device may play content in operation 601. For example, the electronic device may convert the content provided from the server into a voice signal or an audio signal using the TTS module and output it through a speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

During content playback, the electronic device may receive a voice signal in operation 603. For example, the electronic device may extract a voice signal from the audio signal received through the microphone.

When the voice signal is received, the electronic device may, in operation 605, generate content information about the content being played at the time the voice signal was received. While playing a voice or audio signal that includes a sequence of a plurality of components, the electronic device may select one or more components according to the time at which the voice signal is received. For example, referring to FIG. 4, the electronic device may use the content estimator 407 to identify the content that the controller 401 is transmitting to the TTS module 403 at the time the voice detector 405 extracts the voice signal, and generate content information accordingly. In this case, the electronic device may instead identify the content that the controller 401 transmitted to the TTS module 403 a reference time before the time at which the voice detector 405 extracted the voice signal, and generate content information accordingly. However, when the controller 401 is not transmitting any content to the TTS module 403 at the time the voice signal is received, the electronic device may not generate content information. As another example, referring to FIG. 5, the electronic device may use the content estimator 507 to identify the content being output from the TTS module 503 at the time the voice detector 505 extracts the voice signal, and generate content information accordingly. In this case, the electronic device may instead identify the content that was output from the TTS module 503 a reference time before the time at which the voice detector 505 extracted the voice signal, and generate content information accordingly. However, when no content is being output from the TTS module 503 at the time the voice signal is received, the electronic device may not generate content information. Here, the content information may include information about the one or more components that, among the plurality of components included in the content being played, are being played at the time the voice signal is received. The information about a component may include one or more of component session information and music file information.

Then, in operation 607, the electronic device may transmit the content information and the voice signal to the server. In this case, the electronic device may transmit the content information and the voice signal to the server independently, or may add the content information to the voice signal and transmit them together.
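The two transmission options in operation 607 can be illustrated with a small sketch. The JSON message layout, the field names, and both function names are assumptions made for this example only; the disclosure does not define a wire format.

```python
import base64
import json
from typing import Dict, Tuple


def build_separate_messages(voice: bytes, content_info: Dict) -> Tuple[str, str]:
    """Option 1: the voice signal and the content information travel independently."""
    voice_msg = json.dumps(
        {"type": "voice", "audio": base64.b64encode(voice).decode("ascii")}
    )
    info_msg = json.dumps({"type": "content_info", "info": content_info})
    return voice_msg, info_msg


def build_combined_message(voice: bytes, content_info: Dict) -> str:
    """Option 2: the content information is added to the voice signal message."""
    return json.dumps(
        {
            "type": "voice",
            "audio": base64.b64encode(voice).decode("ascii"),
            "content_info": content_info,
        }
    )


# Hypothetical payload: two bytes of audio and a session label.
sample_voice = b"\x00\x01"
sample_info = {"session": "main news"}
separate = build_separate_messages(sample_voice, sample_info)
combined = build_combined_message(sample_voice, sample_info)
```

In the combined form the server can read the content information from the same message that carries the voice signal, which corresponds to the case in which the server checks the content information included in the received voice signal.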

Then, in operation 609, the electronic device may check whether content is received from the server. That is, the electronic device may check whether a response to the voice signal transmitted to the server in operation 607 is received.

When content is received from the server, the electronic device may play the content provided by the server in operation 611. In this case, the electronic device may convert the content provided by the server into a voice signal through the TTS module and output it through the speaker.

FIG. 7 illustrates a procedure in which a server recognizes a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 7, in operation 701 the server may check whether a voice signal is received from the electronic device.

When a voice signal is received from the electronic device, the server may convert the voice signal provided by the electronic device into text data in operation 703.

In operation 705, the server may identify information about the content that was being played when the electronic device received the voice signal. For example, the server may receive the content information from the electronic device. As another example, the server may identify the content information included in the voice signal received from the electronic device in operation 701.

In operation 707, the server may generate a control command in consideration of the content information and the voice signal. For example, when the voice signal is converted into the text data "now news details", the server may analyze the text data through a natural language processor and recognize that the voice signal requests detailed information about the news currently being played. In this case, the natural language processor may recognize, according to the content information provided by the electronic device, that detailed information on "surprise mobile phone unveiling" is being requested. Accordingly, the server may generate a control command for playing detailed information on "surprise mobile phone unveiling".

In operation 709, the server may extract content according to the control command and transmit it to the electronic device. For example, referring to FIG. 3, the server may extract content according to the control command from the content providing servers 320-1 to 320-n and transmit it to the electronic device 300.

In the embodiment described above, the electronic device may transmit to the server content information on the content being output through the speaker at the time the voice signal is received.

In another embodiment, as described with reference to FIG. 8, the electronic device may transmit to the server the content it plays and the playback time information of that content.

FIG. 8 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 8, the voice recognition system may include an electronic device 800 and a server 810.

The electronic device 800 may receive a voice signal through a microphone and output content provided by the server 810 through a speaker. For example, the electronic device 800 may include a controller 801, a TTS module 803, and a voice detector 805.

The controller 801 may control the overall operation of the electronic device 800. In this case, the controller 801 may control the electronic device 800 so that content provided by the server 810 is output through the speaker. Here, the content may include a voice or audio signal that includes a sequence of a plurality of components.

The controller 801 may transmit to the server 810 content reproduction information on the content output through the speaker. Here, the content reproduction information may include the content played by the electronic device 800 under the control of the controller 801 and the playback time information of that content. For example, when a daily briefing service is provided as in FIG. 20A, the controller 801 may extract, according to the setting information of the daily briefing service, a sequence of a plurality of components such as weather information 2001, stock information 2003, and main news 2005, and control them to be output through the speaker. In this case, the controller 801 may transmit to the server 810 information on the weather information 2001, the stock information 2003, and the main news 2005 output through the speaker, together with the playback time information of each. As another example, when a music playback service is provided as in FIG. 21A, the controller 801 may play the music files included in a playlist and control them to be output through the speaker. In this case, the controller 801 may transmit to the server 810 information on the music files being played and the playback time information of each music file. In this case, each time a piece of content is played, the controller 801 may transmit the corresponding content information and playback time information to the server 810.
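The per-content reporting described for the controller 801 might look like the sketch below. The `PlaybackReporter` class, the record fields, and the injected `send` and `clock` callables are hypothetical; they stand in for whatever transport and time source an actual device would use.

```python
from typing import Callable, Dict, List


class PlaybackReporter:
    """Sends one content reproduction record each time a component starts playing."""

    def __init__(self, send: Callable[[Dict], None], clock: Callable[[], float]):
        self._send = send    # assumed transport to the server
        self._clock = clock  # assumed time source on the device

    def on_component_started(self, content_id: str, title: str) -> None:
        # The record carries the content and its playback start time.
        self._send(
            {"content_id": content_id, "title": title, "start_time": self._clock()}
        )


# Usage with an in-memory list standing in for the server and a fake clock.
sent: List[Dict] = []
ticks = iter([0.0, 20.0, 45.0])
reporter = PlaybackReporter(sent.append, lambda: next(ticks))
for content_id, title in [("2001", "weather"), ("2003", "stocks"), ("2005", "main news")]:
    reporter.on_component_started(content_id, title)
```

Injecting the clock keeps the example deterministic; the content identifiers reuse the reference numerals of FIG. 20A purely as labels.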

The TTS module 803 may convert content provided by the controller 801 into a voice signal or an audio signal and output it through the speaker.

The voice detector 805 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 810. In this case, the voice detector 805 may transmit information on the time at which the voice signal was extracted to the server 810 together with the voice signal. For example, the voice detector 805 may include an echo remover that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided by the echo remover. Accordingly, the voice detector 805 may extract the voice signal from an audio signal from which the echo components and the background noise have been removed by the echo remover and the noise remover. Here, echo refers to the phenomenon in which the audio signal output through the speaker flows back into the microphone.

The server 810 may extract a voice command using the content reproduction information and the voice signal provided by the electronic device 800, extract content corresponding to the voice command from the content providing servers 820-1 to 820-n, and transmit that content to the electronic device 800. For example, the server 810 may include a language recognizer 811, a content determiner 813, a natural language processor 815, an operation determiner 817, and a content collector 819.

The language recognizer 811 may convert the voice signal provided by the voice detector 805 of the electronic device 800 into text data. In this case, the language recognizer 811 may transmit the extraction time information of the voice signal to the content determiner 813.

The content determiner 813 may identify the content being played by the electronic device 800 at the time the electronic device 800 receives the voice signal, using the content reproduction information provided by the electronic device 800 and the voice signal extraction time information provided by the language recognizer 811. For example, the content determiner 813 may include a reception time detector and a session selector. The reception time detector may detect the time at which the electronic device 800 received the voice signal, using the voice signal extraction time information provided by the language recognizer 811. The session selector may compare the content reproduction information provided by the electronic device 800 with the reception time detected by the reception time detector, and thereby identify the content being played by the electronic device 800 at the time the electronic device 800 receives the voice signal. Here, the content reproduction information may include the content that the electronic device 800 has played or is playing and the playback time of that content.
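One way the session selector's comparison could work is a sorted lookup over the playback start times. The sketch below is an assumption-laden illustration: the `select_session` function and the log layout are hypothetical, and it assumes each item keeps playing until the next one starts, which the disclosure does not state.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def select_session(
    playback_log: List[Tuple[float, str]],
    voice_time: float,
) -> Optional[str]:
    """Return the content whose playback had most recently started at voice_time.

    playback_log holds (start_time, content_name) pairs sorted by start_time.
    Returns None when the voice signal precedes every logged start time.
    """
    starts = [start for start, _ in playback_log]
    index = bisect_right(starts, voice_time) - 1
    if index < 0:
        return None
    return playback_log[index][1]


# Assumed content reproduction information reported by the device.
log = [(0.0, "weather"), (20.0, "stocks"), (45.0, "main news")]
```

With this log, a voice signal extracted at 50 seconds resolves to the main news, and one extracted exactly at 20 seconds resolves to the stock information because `bisect_right` treats a start time equal to the voice time as already begun.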

The natural language processor 815 may analyze the text data provided by the language recognizer 811 and extract the user's intent and key information contained in the text data. The natural language processor 815 may analyze the text data provided by the language recognizer 811 to extract the voice command contained in the voice signal. In doing so, the natural language processor 815 may analyze the text data using the information, identified through the content determiner 813, about the content being played by the electronic device 800 at the time the electronic device 800 receives the voice signal, and thereby extract the voice command contained in the voice signal. For example, when the text data "now news details" is provided by the language recognizer 811, the natural language processor 815 may analyze the text data and recognize that the voice signal requests detailed information about the news currently being played. In this case, the natural language processor 815 may take into account the content information provided by the content determiner 813 to recognize accurate information about the news currently being played.

The operation determiner 817 may generate a control command for the operation of the controller 801 according to the voice command extracted by the natural language processor 815. For example, when the natural language processor 815 recognizes that detailed information is requested about the news currently being played (e.g., "surprise mobile phone unveiling"), the operation determiner 817 may generate a control command for playing detailed information on "surprise mobile phone unveiling".
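The way content information turns an ambiguous utterance into a concrete control command can be shown with a toy example. The keyword matching, the `build_control_command` function, and the command fields are all hypothetical simplifications; a real natural language processor would be far more elaborate, and the disclosure does not describe its internals.

```python
from typing import Dict, Optional


def build_control_command(
    text: str,
    current_content: Optional[Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Resolve a request such as "now news details" against the playing content.

    Returns None when the request cannot be resolved, for example when no
    content was playing at the time the voice signal was received.
    """
    words = text.lower().split()
    wants_details = "details" in words
    refers_to_current = "now" in words
    if wants_details and refers_to_current and current_content is not None:
        # The topic comes from the content information, not from the utterance.
        return {"action": "play_details", "topic": current_content["title"]}
    return None


playing = {"type": "news", "title": "surprise mobile phone unveiling"}
command = build_control_command("now news details", playing)
```

The utterance never names the headline; the topic is filled in only because the content that was playing at reception time is known.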

The content collector 819 may, according to the control command provided by the operation determiner 817, collect content to be provided to the electronic device 800 from the content providing servers 820-1 to 820-n and transmit it to the electronic device 800. For example, when a control command for playing detailed information on "surprise mobile phone unveiling" is provided by the operation determiner 817, the content collector 819 may collect one or more pieces of content related to "surprise mobile phone unveiling" from the content providing servers 820-1 to 820-n and transmit them to the electronic device 800.

FIG. 9 illustrates a procedure in which an electronic device transmits content information to a server, according to various embodiments of the present disclosure.

Referring to FIG. 9, the electronic device may play content in operation 901. For example, the electronic device may convert content provided by the server into a voice signal or an audio signal using the TTS module and output it through the speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

When content is played, the electronic device may, in operation 903, generate content reproduction information that includes the content being played and its playback time information.

In operation 905, the electronic device may transmit the content reproduction information to the server. For example, referring to FIG. 8, the controller 801 of the electronic device 800 may transmit the content reproduction information to the content determiner 813 of the server 810.

In operation 907, the electronic device may receive a voice signal. For example, the electronic device may extract the voice signal from an audio signal received through the microphone.

When a voice signal is received, the electronic device may transmit the voice signal to the server in operation 909. In this case, the electronic device may transmit to the server the voice signal and information on the time at which the voice signal was extracted.

In operation 911, the electronic device may check whether content is received from the server.

When content is received from the server, the electronic device may play the content provided by the server in operation 913. In this case, the electronic device may convert the content provided by the server into a voice signal through the TTS module and output it through the speaker.

FIG. 10 illustrates a procedure in which a server recognizes a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 10, in operation 1001 the server may check the content reproduction information of the electronic device. For example, the server may identify, from the content reproduction information provided by the electronic device, the content played by the electronic device and the playback time information of that content.

In operation 1003, the server may check whether a voice signal is received from the electronic device.

When a voice signal is received from the electronic device, the server may convert the voice signal provided by the electronic device into text data in operation 1005.

In operation 1007, the server may identify information about the content that was being played when the electronic device received the voice signal, using the content reproduction information of the electronic device and the time at which the electronic device extracted the voice signal. In this case, the server may identify the extraction time information of the voice signal at the electronic device, which is included in the voice signal.

In operation 1009, the server may generate a control command in consideration of the content information and the voice signal. For example, when the voice signal is converted into the text data "now news details", the server may analyze the text data through a natural language processor and recognize that the voice signal requests detailed information about the news currently being played. In this case, the natural language processor may recognize, according to the content information provided by the electronic device, that detailed information on "surprise mobile phone unveiling" is being requested. Accordingly, the server may generate a control command for playing detailed information on "surprise mobile phone unveiling".

In operation 1011, the server may extract content according to the control command and transmit it to the electronic device. For example, referring to FIG. 8, the server may extract content according to the control command from the content providing servers 820-1 to 820-n and transmit it to the electronic device 800.
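The server-side sequence of operations 1001 to 1011 can be summarized in a compact sketch. Everything named here is an assumption for illustration: `handle_voice_request`, the log layout, the keyword test, and the injected `speech_to_text` and `fetch_content` callables, which stand in for the language recognizer and the content providing servers.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (playback start time, content title), assumed sorted by start time.
PlaybackLog = List[Tuple[float, str]]


def handle_voice_request(
    playback_log: PlaybackLog,
    audio: bytes,
    extracted_at: float,
    speech_to_text: Callable[[bytes], str],
    fetch_content: Callable[[Dict[str, str]], str],
) -> Optional[str]:
    # Operation 1005: convert the voice signal into text data.
    text = speech_to_text(audio)

    # Operation 1007: find the content that had started most recently
    # when the device extracted the voice signal.
    current = None
    for start, title in playback_log:
        if start <= extracted_at:
            current = title

    # Operation 1009: generate a control command from the text and the content.
    if current is None or "details" not in text.lower().split():
        return None
    command = {"action": "play_details", "topic": current}

    # Operation 1011: extract the matching content for transmission to the device.
    return fetch_content(command)


log = [(0.0, "weather"), (20.0, "stocks"), (45.0, "surprise mobile phone unveiling")]
result = handle_voice_request(
    log,
    b"raw-audio",
    50.0,
    speech_to_text=lambda audio: "now news details",
    fetch_content=lambda cmd: "details about " + cmd["topic"],
)
```

Passing the recognizer and the content source in as callables keeps the control flow visible without committing to any particular speech or content service.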

FIG. 11 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 11, the voice recognition system may include an electronic device 1100 and a server 1110.

The electronic device 1100 may receive a voice signal through a microphone, and may extract and play content according to a control command provided by the server 1110. For example, the electronic device 1100 may include a controller 1101, a TTS module 1103, and a voice detector 1105.

The controller 1101 may control the overall operation of the electronic device 1100. The controller 1101 may control the electronic device 1100 to extract, from the content providing servers 1120-1 to 1120-n, content according to the control command provided by the server 1110, and to play that content. For example, the controller 1101 may control the TTS module 1103 to convert the content according to the control command provided by the server 1110 into a voice signal or an audio signal and output it through the speaker.

The controller 1101 may transmit to the server 1110 content information on the content being output through the speaker at the time the voice detector 1105 extracts a voice signal. For example, referring to FIG. 20A, when the voice detector 1105 extracts a voice signal during playback of the main news 2005, the controller 1101 may transmit content information on the main news 2005 to the server 1110. As another example, referring to FIG. 21A, when the voice detector 1105 extracts a voice signal during playback of "Song 1", the controller 1101 may transmit content information on "Song 1" to the server 1110. As yet another example, the controller 1101 may transmit to the server 1110 content information on the content that was played a reference time before the time at which the voice signal extraction information was received. However, when no content is being output through the speaker at the time the voice detector 1105 extracts the voice signal, the controller 1101 may not transmit content information to the server 1110.

The TTS module 1103 may convert content provided by the controller 1101 into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

The voice detector 1105 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 1110. For example, the voice detector 1105 may include an echo remover that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided by the echo remover. Accordingly, the voice detector 1105 may extract the voice signal from an audio signal from which the echo components and the background noise have been removed by the echo remover and the noise remover. Here, echo refers to the phenomenon in which the audio signal output through the speaker flows back into the microphone.

As described above, when the electronic device 1100 transmits the content information and the voice signal to the server 1110, the electronic device 1100 may transmit the content information and the voice signal to the server 1110 independently, or may add the content information to the voice signal and transmit them together.

The server 1110 may extract a voice command using the content information and the voice signal provided by the electronic device 1100, generate a control command according to the voice command, and transmit the control command to the electronic device 1100. For example, the server 1110 may include a language recognizer 1111, a natural language processor 1113, and an operation determiner 1115.

The language recognizer 1111 may convert the voice signal provided by the voice detector 1105 of the electronic device 1100 into text data.

The natural language processor 1113 may analyze the text data provided by the language recognizer 1111 and extract the user's intent and key information contained in the text data. The natural language processor 1113 may analyze the text data provided by the language recognizer 1111 to extract the voice command contained in the voice signal. In doing so, the natural language processor 1113 may analyze the text data using the content information provided by the controller 1101 of the electronic device 1100, and thereby extract the voice command contained in the voice signal. For example, when the text data "now news details" is provided by the language recognizer 1111, the natural language processor 1113 may analyze the text data and recognize that the voice signal requests detailed information about the news currently being played. In this case, the natural language processor 1113 may take into account the content information provided by the controller 1101 to recognize accurate information about the news currently being played.

The operation determiner 1115 may generate a control command for the operation of the controller 1101 according to the voice command extracted by the natural language processor 1113, and transmit it to the electronic device 1100. For example, when the natural language processor 1113 recognizes that detailed information is requested about the news currently being played (e.g., "surprise mobile phone unveiling"), the operation determiner 1115 may generate a control command for playing detailed information on "surprise mobile phone unveiling" and transmit it to the electronic device 1100.

As described above, the controller 1101 of the electronic device 1100 may transmit to the server 1110 content information on the content being output through the speaker at the time the voice detector 1105 detects a voice signal. In this case, as shown in FIG. 12 below, the electronic device may use a content estimator 1207 to identify the content being played at the time the voice detector 1205 detects the voice signal.

FIG. 12 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 12, the voice recognition system may include an electronic device 1200 and a server 1210. In the following description, the server 1210 has the same configuration and operation as the server 1110 shown in FIG. 11, so a detailed description of it is omitted.

The electronic device 1200 may receive a voice signal through a microphone and play content according to a control command provided by the server 1210. For example, the electronic device 1200 may include a controller 1201, a TTS module 1203, a voice detector 1205, and a content estimator 1207.

The controller 1201 may control the overall operation of the electronic device 1200. The controller 1201 may control the electronic device 1200 to retrieve, from the content providing servers 1220-1 to 1220-n, the content specified by the control command received from the server 1210, and to play that content. For example, the controller 1201 may control the TTS module 1203 to convert the content specified by the control command received from the server 1210 into a voice signal or an audio signal and output it through the speaker.

The TTS module 1203 may convert the content received from the controller 1201 into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or audio signal may include a sequence of a plurality of components.

The voice detector 1205 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 1210. For example, the voice detector 1205 may include an echo canceller that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided by the echo canceller. Accordingly, the voice detector 1205 may extract the voice signal from the audio signal from which the echo components and the background noise have been removed by the echo canceller and the noise remover. Here, echo may refer to the phenomenon in which the audio signal output through the speaker is picked up by the microphone.

When a voice signal is extracted from the audio signal collected through the microphone, the voice detector 1205 may generate voice signal extraction information at the time of extraction and transmit it to the content estimator 1207. Here, the voice signal extraction information may include information on the time at which the voice detector 1205 extracted the voice signal.

The content estimator 1207 may monitor the content that the controller 1201 transmits to the TTS module 1203. Accordingly, the content estimator 1207 may identify information on the content that the controller 1201 is transmitting to the TTS module 1203 at the time the voice detector 1205 extracts the received voice signal, and may transmit that information to the server 1210. In doing so, the content estimator 1207 may determine the time at which the voice detector 1205 extracted the received voice signal from the voice signal extraction information provided by the voice detector 1205.

In the embodiment described above, the content estimator 1207 monitors the content that the controller 1201 transmits to the TTS module 1203, and can thereby identify information on the content that the controller 1201 is transmitting to the TTS module 1203 at the time the voice detector 1205 extracts the received voice signal.

In another embodiment, the content estimator 1207 may monitor the content output from the TTS module 1203. Accordingly, the content estimator 1207 may identify information on the content being output from the TTS module 1203 at the time the voice detector 1205 extracts the received voice signal, and may transmit that information to the server 1210.

FIG. 13 illustrates a procedure by which an electronic device transmits content information to a server, according to various embodiments of the present disclosure.

Referring to FIG. 13, the electronic device may play content in operation 1301. For example, the electronic device may use the TTS module to convert content received from the server into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or audio signal may include a sequence of a plurality of components.

During content playback, the electronic device may receive a voice signal in operation 1303. For example, the electronic device may extract a voice signal from the audio signal received through the microphone.

When a voice signal is received, the electronic device may, in operation 1305, generate information on the content being played at the time the voice signal was received. For example, referring to FIG. 12, the electronic device may use the content estimator 1207 to identify the content that the controller 1201 is transmitting to the TTS module 1203 at the time the voice detector 1205 extracts the received voice signal, and may generate content information from it. In this case, the electronic device may also generate the content information by identifying the content that the controller 1201 transmitted to the TTS module 1203 a reference time before the time at which the voice detector 1205 extracted the received voice signal. However, if the controller 1201 is not transmitting any content to the TTS module 1203 at the time the voice signal is received, the electronic device may not generate content information. As another example, referring to FIG. 12, the electronic device may use the content estimator 1207 to identify the content being output from the TTS module 1203 at the time the voice detector 1205 extracts the received voice signal, and may generate content information from it. In this case, the electronic device may also generate the content information by identifying the content that was output from the TTS module 1203 a reference time before the time at which the voice detector 1205 extracted the received voice signal. However, if no content is being output from the TTS module 1203 at the time the voice signal is received, the electronic device may not generate content information.
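The lookup in operation 1305 can be illustrated with a minimal sketch. The record type `TtsSegment`, the function `content_info_at`, and the sample timings below are hypothetical names and values chosen for illustration; the disclosure does not specify a data structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TtsSegment:
    """One item handed to the TTS module (hypothetical record)."""
    title: str
    start: float  # seconds since playback began
    end: float    # seconds since playback began


def content_info_at(segments: List[TtsSegment], detected_at: float,
                    reference_time: float = 0.0) -> Optional[str]:
    """Return the title playing at (detected_at - reference_time), or None."""
    t = detected_at - reference_time
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.title
    # Nothing was playing at that time, so no content information is generated.
    return None


log = [TtsSegment("weather", 0.0, 12.0),
       TtsSegment("stocks", 12.0, 20.0),
       TtsSegment("headline news", 20.0, 35.0)]

print(content_info_at(log, 21.0))                      # headline news
print(content_info_at(log, 21.0, reference_time=2.0))  # stocks
print(content_info_at(log, 40.0))                      # None
```

Looking back by a reference time can matter when the user starts speaking shortly after an item has ended: in the second call, the utterance detected at 21.0 s is attributed to the item that was playing at 19.0 s.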

In operation 1307, the electronic device may transmit the content information and the voice signal to the server. The electronic device may transmit the content information and the voice signal to the server separately, or may add the content information to the voice signal and transmit them together.
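The two transmission options in operation 1307 might be sketched as follows. The JSON message layout, the field names, and the functions `send_separately` and `send_combined` are assumptions made for illustration only; the disclosure does not define a wire format.

```python
import base64
import json
from typing import List


def _encode_audio(voice_pcm: bytes) -> str:
    """Encode raw audio bytes as ASCII text so they fit in a JSON message."""
    return base64.b64encode(voice_pcm).decode("ascii")


def send_separately(voice_pcm: bytes, content_info: dict) -> List[str]:
    """Option 1: content information and voice signal as two independent messages."""
    info_msg = json.dumps({"type": "content_info", **content_info})
    voice_msg = json.dumps({"type": "voice", "audio": _encode_audio(voice_pcm)})
    return [info_msg, voice_msg]


def send_combined(voice_pcm: bytes, content_info: dict) -> str:
    """Option 2: content information attached to the voice signal message."""
    return json.dumps({"type": "voice",
                       "audio": _encode_audio(voice_pcm),
                       "content_info": content_info})


messages = send_separately(b"\x00\x01", {"title": "headline news"})
combined = send_combined(b"\x00\x01", {"title": "headline news"})
```

With separate messages the server has to pair each voice signal with its content information; the combined form avoids that pairing step at the cost of a larger single message.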

In operation 1309, the electronic device may determine whether a control command is received from the server.

When a control command is received from the server, the electronic device may, in operation 1311, retrieve and play the content specified by the control command. For example, the electronic device may retrieve the content specified by the control command from a data storage unit or from the content providing servers. The electronic device may then convert that content into a voice signal through the TTS module and output it through the speaker.

FIG. 14 illustrates a procedure by which a server recognizes a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 14, in operation 1401, the server may determine whether a voice signal is received from the electronic device.

When a voice signal is received from the electronic device, the server may, in operation 1403, convert the received voice signal into text data.

In operation 1405, the server may identify information on the content that was being played when the electronic device received the voice signal. For example, the server may receive content information from the electronic device. As another example, the server may identify content information included in the voice signal received from the electronic device in operation 1401.

In operation 1407, the server may generate a control command in consideration of the content information and the voice signal. For example, when the voice signal is converted into the text data "now news details", the server may analyze the text data through the natural language processor and recognize that the voice signal requests detailed information on the news currently being played. The natural language processor may then recognize, based on the content information provided by the electronic device, that detailed information on "Surprise mobile phone unveiling" is being requested. Accordingly, the server may generate a control command for playing detailed information on "Surprise mobile phone unveiling".

In operation 1409, the server may transmit the control command to the electronic device.
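As a rough illustration of operations 1405 to 1409, the sketch below combines recognized text with content information to form a control command. The keyword matching stands in for the natural language processor and is an assumption, as are the function name `build_control_command` and the command fields; a real system would use a proper language understanding component.

```python
from typing import Optional


def build_control_command(text: str, content_info: Optional[str]) -> Optional[dict]:
    """Toy stand-in for the natural language processor and operation determiner."""
    lowered = text.lower()
    wants_detail = "detail" in lowered      # user asks for more information
    refers_to_current = "now" in lowered    # user refers to what is playing
    if wants_detail and refers_to_current and content_info:
        # The ambiguous phrase "now news" is resolved by the content information.
        return {"action": "play_detail", "topic": content_info}
    return None


command = build_control_command("now news details",
                                "Surprise mobile phone unveiling")
print(command)
```

Without content information the same text cannot be resolved to a topic, which is why the function returns `None` in that case.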

In another embodiment, as described with reference to FIG. 15 or FIG. 16, the electronic device may transmit to the server the content it plays and the playback time information of that content.

FIG. 15 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 15, the speech recognition system may include an electronic device 1500 and a server 1510.

The electronic device 1500 may receive a voice signal through a microphone, and may retrieve and play content according to a control command received from the server 1510. For example, the electronic device 1500 may include a controller 1501, a TTS module 1503, and a voice detector 1505.

The controller 1501 may control the overall operation of the electronic device 1500. The controller 1501 may control the electronic device 1500 to retrieve, from the content providing servers 1520-1 to 1520-n, the content specified by the control command received from the server 1510, and to play that content. For example, the controller 1501 may control the TTS module 1503 to convert the content specified by the control command received from the server 1510 into a voice signal or an audio signal and output it through the speaker.

The controller 1501 may transmit, to the server 1510, content reproduction information for the content it has controlled to be output through the speaker. Here, the content reproduction information may include the content played by the electronic device 1500 under the control of the controller 1501 and the playback time information of that content. For example, when a daily briefing service is provided, referring to FIG. 20A, the controller 1501 may control the electronic device 1500 to sequentially retrieve weather information 2001, stock information 2003, and major news 2005 according to the setting information of the daily briefing service, and to output them through the speaker. In this case, the controller 1501 may transmit to the server 1510 information on the weather information 2001, the stock information 2003, and the major news 2005 output through the speaker, together with the playback time information of each. As another example, when a music playback service is provided, referring to FIG. 21A, the controller 1501 may control the electronic device 1500 to play the music files included in a playlist and output them through the speaker. In this case, the controller 1501 may transmit to the server 1510 information on the music files being played and the playback time information of each music file. The controller 1501 may transmit the corresponding content information and playback time information to the server 1510 each time content is played.
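The per-item reporting described above can be sketched as follows for the daily briefing example. The record `PlaybackInfo`, the function `play_briefing`, the `send` callback, and the durations are hypothetical; they only illustrate sending content information and a playback time each time an item starts.

```python
from dataclasses import asdict, dataclass
from typing import Callable, List, Sequence


@dataclass
class PlaybackInfo:
    """Content reproduction information for one item (hypothetical layout)."""
    content: str
    started_at: float  # playback start time in seconds


def play_briefing(items: Sequence[str], durations: Sequence[float],
                  send: Callable[[dict], None],
                  start: float = 0.0) -> List[PlaybackInfo]:
    """Play items in order and report each one to the server as it starts."""
    t = start
    reported = []
    for item, duration in zip(items, durations):
        info = PlaybackInfo(content=item, started_at=t)
        send(asdict(info))  # one message per played item
        reported.append(info)
        t += duration       # the next item starts when this one ends
    return reported


outbox: List[dict] = []
play_briefing(["weather information", "stock information", "major news"],
              [12.0, 8.0, 15.0], outbox.append)
print(outbox)
```

Here `outbox.append` stands in for the network call, so the list ends up holding the three messages the server would receive.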

The TTS module 1503 may convert the content received from the controller 1501 into a voice signal or an audio signal and output it through the speaker. Here, the voice signal or audio signal may include a sequence of a plurality of components.

The voice detector 1505 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 1510. In doing so, the voice detector 1505 may transmit information on the time at which the voice signal was extracted to the server 1510 together with the voice signal. For example, the voice detector 1505 may include an echo canceller that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided by the echo canceller. Accordingly, the voice detector 1505 may extract the voice signal from the audio signal from which the echo components and the background noise have been removed by the echo canceller and the noise remover. Here, echo may refer to the phenomenon in which the audio signal output through the speaker is picked up by the microphone.

The server 1510 may extract a voice command using the content reproduction information and the voice signal received from the electronic device 1500, generate a control command according to the voice command, and transmit the control command to the electronic device 1500. For example, the server 1510 may include a language recognizer 1511, a content determiner 1513, a natural language processor 1515, and an operation determiner 1517.

The language recognizer 1511 may convert the voice signal received from the voice detector 1505 of the electronic device 1500 into text data. In doing so, the language recognizer 1511 may transmit the extraction time information of the voice signal to the content determiner 1513.

The content determiner 1513 may use the content reproduction information received from the electronic device 1500 and the voice signal extraction time information received from the language recognizer 1511 to identify the content being played on the electronic device 1500 at the time the electronic device 1500 receives the voice signal. For example, the content determiner 1513 may include a reception time detector and a session selector. The reception time detector may use the voice signal extraction time information received from the language recognizer 1511 to detect the time at which the electronic device 1500 received the voice signal. The session selector may compare the content reproduction information received from the electronic device 1500 with the reception time detected by the reception time detector to identify the content being played on the electronic device 1500 at the time the electronic device 1500 receives the voice signal. Here, the content reproduction information may include the content that the electronic device 1500 has played or is playing and the playback time of that content.
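The comparison performed by the session selector can be sketched as a search over playback start times. This is a minimal sketch under two assumptions not stated in the disclosure: the content reproduction information is a list of (start time, content) pairs sorted by start time, and each item plays until the next one starts. The function name `select_session` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def select_session(playback: List[Tuple[float, str]],
                   received_at: float) -> Optional[str]:
    """Return the content whose playback had most recently started at received_at."""
    starts = [start for start, _ in playback]
    # Index of the last item whose start time is <= received_at.
    i = bisect_right(starts, received_at) - 1
    return playback[i][1] if i >= 0 else None


playback_info = [(0.0, "weather information"),
                 (12.0, "stock information"),
                 (20.0, "major news")]

print(select_session(playback_info, 25.0))  # major news
print(select_session(playback_info, 12.0))  # stock information
print(select_session(playback_info, -1.0))  # None
```

If the content reproduction information also carried end times, the same lookup could additionally report that nothing was playing during a gap between items.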

The natural language processor 1515 may analyze the text data received from the language recognizer 1511 to extract the user's intention and key information contained in the text data. The natural language processor 1515 may analyze the text data received from the language recognizer 1511 to extract the voice command contained in the voice signal. In doing so, the natural language processor 1515 may analyze the text data using the information, identified through the content determiner 1513, on the content being played on the electronic device 1500 at the time the electronic device 1500 receives the voice signal, and thereby extract the voice command contained in the voice signal. For example, when the text data "now news details" is received from the language recognizer 1511, the natural language processor 1515 may analyze the text data and recognize that the voice signal requests detailed information on the news currently being played. The natural language processor 1515 may then identify exactly which news is currently being played by taking into account the content information received from the content determiner 1513.

The operation determiner 1517 may generate a control command for the operation of the controller 1501 according to the voice command extracted by the natural language processor 1515, and may transmit the control command to the electronic device 1500. For example, when the natural language processor 1515 recognizes that the user is requesting detailed information on the news currently being played (e.g., "Surprise mobile phone unveiling"), the operation determiner 1517 may generate a control command for playing detailed information on "Surprise mobile phone unveiling" and transmit it to the electronic device 1500.

FIG. 16 is a block diagram of a speech recognition system for recognizing a voice command in consideration of content information of an electronic device, according to various embodiments of the present disclosure.

Referring to FIG. 16, the speech recognition system may include an electronic device 1600 and a server 1610. In the following description, the electronic device 1600 has the same configuration and operation as the electronic device 1500 shown in FIG. 15, so a detailed description of it is omitted.

The server 1610 may extract a voice command using the content reproduction information and the voice signal received from the electronic device 1600, generate a control command according to the voice command, and transmit the control command to the electronic device 1600. For example, the server 1610 may include a language recognizer 1611, a content determiner 1613, a natural language processor 1615, and an operation determiner 1617.

The language recognizer 1611 may convert the voice signal received from the voice detector 1605 of the electronic device 1600 into text data. In doing so, the language recognizer 1611 may transmit the extraction time information of the voice signal to the content determiner 1613.

The natural language processor 1615 may analyze the text data received from the language recognizer 1611 to extract the user's intention and key information contained in the text data. The natural language processor 1615 may analyze the text data received from the language recognizer 1611 to extract the voice command contained in the voice signal. In doing so, to extract the clear user intention and key information contained in the voice signal, the natural language processor 1615 may transmit the voice command extracted from the text data to the content determiner 1613. For example, when the text data "Um... tell me the details of the news just now" is received from the language recognizer 1611, the natural language processor 1615 may recognize "just now", rather than the filler "Um...", as the start of the voice command contained in the voice signal. Accordingly, the natural language processor 1615 may transmit the voice command "details of the news just now" to the content determiner 1613. The natural language processor 1615 may analyze the text data received from the language recognizer 1611 using the information, identified through the content determiner 1613, on the content being played on the electronic device 1600 at the time the electronic device 1600 receives the voice signal, and thereby extract the voice command contained in the voice signal. For example, when the electronic device 1600 receives the voice signal "Um... tell me the details of the news just now", the natural language processor 1615 can clearly identify the news being played on the electronic device 1600 at the time "just now" was received, rather than at the time "Um..." was received.

The content determiner 1613 may use the content reproduction information received from the electronic device 1600, the voice signal extraction time information received from the language recognizer 1611, and the voice command received from the natural language processor 1615 to identify the content being played on the electronic device 1600 at the time the electronic device 1600 receives the voice signal. For example, the content determiner 1613 may include a voice command detector, a reception time detector, and a session selector.

The voice command detector may use the voice command information received from the natural language processor 1615 to detect key information for generating a control command. For example, when the voice command information "details of the news just now" is received from the natural language processor 1615, the voice command detector may detect "the news just now" as the key information for generating a control command.

The reception time detector may use the voice signal extraction time information received from the language recognizer 1611 and the key information received from the voice command detector to detect the time at which the electronic device 1600 received the voice signal. For example, when the electronic device 1600 receives the voice signal "Um... tell me the details of the news just now", the reception time detector may receive, from the language recognizer 1611, information on the time at which the electronic device 1600 received "Um...". However, based on the key information received from the voice command detector, the reception time detector may determine that the content to be identified is the content being played on the electronic device 1600 at the time "the news just now" was received, not at the time "Um..." was received.
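One way to picture the reception time detector is to skip leading filler words and take the timestamp of the first word that belongs to the command. The sketch below assumes the recognizer supplies a timestamp per word, which the disclosure does not state; the filler list, the function `command_start_time`, and the sample timings are likewise hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical set of filler words that do not belong to a voice command.
FILLERS = {"um", "uh", "well", "hmm"}


def command_start_time(words: List[Tuple[str, float]]) -> Optional[float]:
    """Return the timestamp of the first non-filler word, or None if there is none."""
    for word, timestamp in words:
        # Normalize "Um..." or "Um~" to "um" before checking the filler list.
        if word.lower().strip("~,.") not in FILLERS:
            return timestamp
    return None


utterance = [("Um...", 19.5), ("just", 20.4), ("now", 20.6),
             ("news", 20.9), ("details", 21.3)]

print(command_start_time(utterance))  # 20.4
```

With a news item that starts at 20.0 s, using 19.5 s (the filler) would select the previous item, while 20.4 s selects the news the user is actually asking about.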

The session selector may compare the content reproduction information received from the electronic device 1600 with the reception time detected by the reception time detector to identify the content being played on the electronic device 1600 at the time the electronic device 1600 receives the voice signal. Here, the content reproduction information may include the content that the electronic device 1600 has played or is playing and the playback time of that content.

The operation determiner 1617 may generate a control command for the operation of the controller 1601 according to the voice command extracted by the natural language processor 1615, and may transmit the control command to the electronic device 1600. For example, when the natural language processor 1615 recognizes that the user is requesting detailed information on the news played just now (e.g., "Surprise mobile phone unveiling"), the operation determiner 1617 may generate a control command for playing detailed information on "Surprise mobile phone unveiling" and transmit it to the electronic device 1600.

FIG. 17 illustrates a procedure by which an electronic device transmits content information to a server, according to various embodiments of the present disclosure.

Referring to FIG. 17, the electronic device may play content in operation 1701. For example, the electronic device may use a TTS module to convert content provided by a server into a voice signal or an audio signal and output it through a speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

When playing content, the electronic device may, in operation 1703, generate content playback information that includes the content being played and its playback time information.

In operation 1705, the electronic device may transmit the content playback information to the server. For example, the controller 1501 of the electronic device 1500 illustrated in FIG. 15 may transmit the content playback information to the content determiner 1513 of the server 1510.

In operation 1707, the electronic device may receive a voice signal. For example, the electronic device may extract the voice signal from an audio signal received through a microphone.

Upon receiving the voice signal, the electronic device may transmit the voice signal to the server in operation 1709. In doing so, the electronic device may transmit, together with the voice signal, information on the time point at which the voice signal was extracted.
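As an illustration of sending the voice signal together with its extraction time, the sketch below packs both into one JSON message. The message layout, the field names `voice_signal` and `extracted_at`, and both function names are hypothetical; the disclosure does not specify a wire format.

```python
import base64
import json
from typing import Tuple


def build_voice_message(voice_pcm: bytes, extracted_at: float) -> str:
    """Device side: bundle raw voice bytes with the extraction time point."""
    return json.dumps({
        # Base64 keeps the binary audio safe inside a JSON string.
        "voice_signal": base64.b64encode(voice_pcm).decode("ascii"),
        "extracted_at": extracted_at,
    })


def parse_voice_message(message: str) -> Tuple[bytes, float]:
    """Server side: recover the voice bytes and the extraction time point."""
    payload = json.loads(message)
    return base64.b64decode(payload["voice_signal"]), payload["extracted_at"]
```

Keeping the timestamp in the same message as the audio lets the server match the utterance against the playback information without relying on network arrival time.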

In operation 1711, the electronic device may check whether a control command is received from the server.

When a control command is received from the server, the electronic device may, in operation 1713, extract and play the content indicated by the control command. For example, the electronic device may extract that content from a data storage unit or from content providing servers. The electronic device may then convert the content into a voice signal through the TTS module and output it through the speaker.

FIG. 18 illustrates a procedure in which a server recognizes a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 18, in operation 1801, the server may check the content playback information of the electronic device. For example, from the content playback information provided by the electronic device, the server may identify the content played by the electronic device and the playback time information of that content.

In operation 1803, the server may check whether a voice signal is received from the electronic device.

When a voice signal is received from the electronic device, the server may, in operation 1805, convert the voice signal into text data.

In operation 1807, the server may identify information on the content that was being played when the electronic device received the voice signal, using the content playback information of the electronic device and the time point at which the electronic device extracted the voice signal. In doing so, the server may check the extraction time information that is included with the voice signal.

In operation 1809, the server may generate a control command in consideration of the content information and the voice signal. For example, when the voice signal is converted into the text data "details of the current news", the server may analyze the text data through a natural language processor and recognize that the voice signal requests detailed information on the news currently being played. Here, based on the content information provided by the electronic device, the natural language processor may recognize that detailed information on "Surprise mobile phone unveiling" is requested. Accordingly, the server may generate a control command for playing detailed information on "Surprise mobile phone unveiling".
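The step of combining the recognized text with the content information can be sketched as below. This is a toy keyword matcher standing in for the natural language processor, not the disclosed method; the function name `build_control_command` and the command fields `action` and `target` are hypothetical.

```python
from typing import Dict, Optional


def build_control_command(text: str, current_content: Optional[str]) -> Optional[Dict[str, str]]:
    """Return a control command when the text asks for details on the current content.

    Intent detection here is a plain substring check (an assumption made for
    illustration); a real natural language processor would be far richer.
    """
    wants_details = "detail" in text.lower()
    if wants_details and current_content is not None:
        return {"action": "play_details", "target": current_content}
    return None
```

For the example in the text, `build_control_command("details of the current news", "Surprise mobile phone unveiling")` resolves the vague phrase "the current news" to the specific item that was playing.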

In operation 1811, the server may transmit the control command to the electronic device.

In the embodiment described above, the server may identify information on the content that was being played when the electronic device received the voice signal, using the content playback information of the electronic device and the time point at which the electronic device extracted the voice signal.

In another embodiment, the server may identify information on the content that was being played when the electronic device received the voice signal, using the content playback information of the electronic device, the time point at which the electronic device extracted the voice signal, and the voice command contained in the voice signal.

FIG. 19 is a block diagram of a voice recognition system for recognizing a voice command in consideration of content information of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 19, the voice recognition system may include an electronic device 1900 and a server 1910.

The electronic device 1900 may receive a voice signal through a microphone, and may extract and play the content indicated by a control command provided by the server 1910. For example, the electronic device 1900 may include a controller 1901, a TTS module 1903, a voice detector 1905, a first language recognizer 1907, a first natural language processor 1909, and a content determiner 1911.

The controller 1901 may control the overall operation of the electronic device 1900. The controller 1901 may control the electronic device to extract, from content providing servers 1930-1 to 1930-n, the content indicated by a control command provided by the server 1920, and to play it. For example, the controller 1901 may cause the TTS module 1903 to convert the content indicated by the control command into a voice signal or an audio signal and output it through a speaker. Here, the voice signal or the audio signal may include a sequence of a plurality of components.

The controller 1901 may transmit, to the content determiner 1911, content playback information on the content that it has caused to be output through the speaker. Here, the content playback information may include the content played by the electronic device 1900 under the control of the controller 1901 and the playback time information of that content. For example, when a daily briefing service is provided as illustrated in FIG. 20A, the controller 1901 may, according to the settings of the daily briefing service, sequentially extract weather information 2001, stock information 2003, and headline news 2005 and output them through the speaker. In this case, the controller 1901 may transmit to the content determiner 1911 the information on the weather information 2001, the stock information 2003, and the headline news 2005 output through the speaker, together with the playback time information of each. As another example, when a music playback service is provided as illustrated in FIG. 21A, the controller 1901 may play the music files in a playlist and output them through the speaker. In this case, the controller 1901 may transmit to the content determiner 1911 information on the music files being played and the playback time information of each music file. The controller 1901 may transmit the corresponding content information and playback time information to the content determiner 1911 each time content is played.
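A minimal sketch of reporting playback information each time an item starts playing, using the daily briefing example. The class name `PlaybackLog`, its method `on_play`, and the entry fields are hypothetical; an in-memory list stands in for the hand-off to the content determiner.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, float]]


class PlaybackLog:
    """Collects one entry per played item (stand-in for the content determiner's input)."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def on_play(self, content: str, started_at: float) -> Entry:
        """Record which content started playing and when (seconds, assumed unit)."""
        entry: Entry = {"content": content, "started_at": started_at}
        self.entries.append(entry)
        return entry


# Daily briefing: three items played back to back.
log = PlaybackLog()
log.on_play("weather information", 0.0)
log.on_play("stock information", 20.0)
log.on_play("headline news", 35.0)
```

Because every item is logged at the moment it starts, the resulting list can later be matched against the time point at which a voice signal was received.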

The TTS module 1903 may convert content provided by the controller 1901 into a voice signal or an audio signal and output it through the speaker.

The voice detector 1905 may extract a voice signal from the audio signal collected through the microphone and provide it to the server 1920 and the first language recognizer 1907. In doing so, the voice detector 1905 may provide the first language recognizer 1907 with the extraction time information of the voice signal together with the voice signal. For example, the voice detector 1905 may include an echo canceller that removes echo components from the audio signal collected through the microphone, and a noise remover that removes background noise from the audio signal provided by the echo canceller. Accordingly, the voice detector 1905 may extract the voice signal from an audio signal from which the echo component and the background noise have been removed by the echo canceller and the noise remover. Here, echo refers to the phenomenon in which an audio signal output through the speaker flows back into the microphone.
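The echo removal, noise removal, and extraction-time steps can be pictured with the toy pipeline below. It is deliberately simplistic and is not the disclosed design: it assumes the echo is the speaker signal scaled by a known `gain` with no delay, and it uses a fixed amplitude threshold as the noise remover. Practical echo cancellers use adaptive filtering instead. All function names are hypothetical.

```python
from typing import List, Optional


def remove_echo(mic: List[float], speaker_ref: List[float], gain: float) -> List[float]:
    """Subtract the (assumed) scaled, undelayed speaker signal from the microphone signal."""
    return [m - gain * s for m, s in zip(mic, speaker_ref)]


def remove_noise(samples: List[float], threshold: float) -> List[float]:
    """Zero out samples whose magnitude is below the threshold (simple noise gate)."""
    return [x if abs(x) >= threshold else 0.0 for x in samples]


def extraction_time(samples: List[float], sample_rate: int) -> Optional[float]:
    """Return the time in seconds of the first non-silent sample, or None."""
    for index, value in enumerate(samples):
        if value != 0.0:
            return index / sample_rate
    return None
```

Running `remove_echo`, then `remove_noise`, then `extraction_time` mirrors the order described in the text: echo first, background noise second, and only then the voice signal and its extraction time point.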

The first language recognizer 1907 may convert the voice signal provided by the voice detector 1905 into text data. In doing so, the first language recognizer 1907 may transmit the extraction time information of the voice signal to the content determiner 1911.

The first natural language processor 1909 may analyze the text data provided by the first language recognizer 1907 to extract the user's intention and the key information contained in the text data. The first natural language processor 1909 may also analyze the text data to extract the voice command contained in the voice signal. For example, when the first language recognizer 1907 provides the text data "Um~ the news just now, tell me the details", the first natural language processor 1909 may recognize the point at which "the news just now" begins, excluding "Um~", as the start of the voice command contained in the voice signal. Accordingly, the first natural language processor 1909 may transmit the voice command "details of the news just now" to the content determiner 1911.

The content determiner 1911 may check the content playback information of the electronic device 1900 using the content playback information provided by the controller 1901. Here, the content playback information may include the content that the electronic device 1900 has played or is playing and the playback time point of that content. Accordingly, the content determiner 1911 may identify the content being played on the electronic device 1900 at the time point at which the electronic device 1900 receives the voice signal, using the content playback information, the voice signal extraction time information provided by the first language recognizer 1907, and the voice command information provided by the first natural language processor 1909. For example, when the electronic device 1900 receives the voice signal "Um~ the news just now, tell me the details", the content determiner 1911 may receive, from the first language recognizer 1907, information on the time point at which the electronic device 1900 extracted "Um~". Then, when the first natural language processor 1909 provides the voice command "details of the news just now", the content determiner 1911 may identify the content for the time point at which the electronic device 1900 extracted "the news just now", rather than the time point at which it extracted "Um~", and provide that content to the server 1920.

The content determiner 1911 may identify the content being played on the electronic device 1900 at the time point at which the electronic device 1900 receives the voice signal, using the content playback information provided by the controller 1901, the voice signal extraction time information provided by the first language recognizer 1907, and the voice command provided by the first natural language processor 1909. For example, the content determiner 1911 may include a voice command detector, a reception time detector, and a session selector.

The voice command detector may detect key information for generating a control command, using the voice command information provided by the first natural language processor 1909. For example, when the first natural language processor 1909 provides the voice command information "details of the news just now", the voice command detector may detect "the news just now" as the key information for generating the control command.

The reception time detector may detect the time point at which the electronic device 1900 received the voice signal, using the voice signal extraction time information provided by the first language recognizer 1907 and the key information provided by the voice command detector. For example, when the electronic device 1900 receives the voice signal "Um~ the news just now, tell me the details", the reception time detector may receive, from the first language recognizer 1907, information on the time point at which the electronic device 1900 received "Um~". However, based on the key information provided by the voice command detector, the reception time detector may determine that the content to be identified is the content being played on the electronic device 1900 at the time point at which "the news just now" was received, not the time point at which "Um~" was received.

The session selector may identify the content being played on the electronic device 1900 at the time point at which the electronic device 1900 receives the voice signal, by comparing the content playback information provided by the controller 1901 with the reception time point detected by the reception time detector. Here, the content playback information may include the content that the electronic device 1900 has played or is playing and the playback time point of that content.

The server 1920 may extract a voice command using the content information and the voice signal provided by the electronic device 1900, generate a control command for the voice command, and transmit the control command to the electronic device 1900. For example, the server 1920 may include a second language recognizer 1921, a second natural language processor 1923, and an operation determiner 1925.

The second language recognizer 1921 may convert the voice signal provided by the voice detector 1905 of the electronic device 1900 into text data.

The second natural language processor 1923 may analyze the text data provided by the second language recognizer 1921 to extract the user's intention and the key information contained in the text data. The second natural language processor 1923 may also analyze the text data to extract the voice command contained in the voice signal. In doing so, the second natural language processor 1923 may use the content information provided by the controller 1901 of the electronic device 1900 when analyzing the text data. For example, when the second language recognizer 1921 provides the text data "details of the current news", the second natural language processor 1923 may analyze the text data and recognize that the voice signal requests detailed information on the news currently being played. Here, the second natural language processor 1923 may take the content information provided by the controller 1901 into account to identify exactly which news item is currently being played.

The operation determiner 1925 may generate a control command for operating the controller 1901 according to the voice command extracted by the second natural language processor 1923. For example, when the second natural language processor 1923 recognizes that the user is requesting detailed information on the news currently being played (e.g., "Surprise mobile phone unveiling"), the operation determiner 1925 may generate a control command for playing detailed information on "Surprise mobile phone unveiling" and transmit it to the electronic device 1900.

In the embodiment described above, the electronic device may generate information on the content being played at the time point at which the voice signal is received.

In another embodiment, the electronic device may generate information on the content being played at one or more of the following time points: the time point at which the user begins speaking, the time point at which the command contained in the voice signal is input, and the time point at which the audio signal containing the voice signal is received.
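Looking up the content for several candidate time points can be sketched as follows. The timeline layout (a list of `(start_time, content)` pairs sorted by start time, each item assumed to play until the next one starts) and both function names are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Timeline = List[Tuple[float, str]]  # (start time in seconds, content), sorted by start


def content_playing_at(timeline: Timeline, t: float) -> Optional[str]:
    """Return the content whose start time is the latest one not after t."""
    starts = [start for start, _ in timeline]
    index = bisect_right(starts, t) - 1
    # Assumption: the last item is treated as still playing for any later t.
    return timeline[index][1] if index >= 0 else None


def contents_for_time_points(timeline: Timeline, time_points: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Map each labelled time point to the content playing at that moment."""
    return {label: content_playing_at(timeline, t) for label, t in time_points.items()}
```

When the user starts speaking just before one item ends and finishes the command during the next, the time points resolve to different contents, which is why the choice of time point matters.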

Although specific embodiments have been described in the detailed description of the present invention, various modifications are possible without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

Claims

A method of operating an electronic system, the method comprising:
continuously outputting, through at least one speaker, a plurality of audio signals respectively corresponding to a plurality of contents;
receiving at least one voice signal through at least one microphone while continuously outputting the plurality of audio signals;
in response to receiving the at least one voice signal, determining, from among the plurality of contents and based on a time point at which the at least one voice signal is received, first content corresponding to an audio signal output at the time point or a reference time before the time point; and
providing, through at least one of a display and the at least one speaker, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

delete

A method of operating an electronic device, the method comprising:
continuously outputting, through at least one speaker, a plurality of audio signals respectively corresponding to a plurality of contents;
receiving at least one voice signal through at least one microphone while continuously outputting the plurality of audio signals;
in response to receiving the at least one voice signal, determining, from among the plurality of contents and based on a time point at which the at least one voice signal is received, first content corresponding to an audio signal output at the time point or a reference time before the time point; and
providing, through at least one of a display and the at least one speaker, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

delete

The method of claim 6, wherein receiving the at least one voice signal comprises:
receiving an audio signal through the at least one microphone; and
extracting the voice signal by removing an echo component and background noise from the received audio signal.

The method of claim 6, wherein providing the second content comprises:
converting the at least one voice signal into text data;
extracting the user's intention by analyzing the text data; and
determining the second content based on at least a portion of the information about the first content and the extracted user's intention.

A method of operating an electronic device, the method comprising:
continuously outputting, through at least one speaker, a plurality of audio signals respectively corresponding to a plurality of contents;
receiving at least one voice signal through at least one microphone while continuously outputting the plurality of audio signals;
in response to receiving the at least one voice signal, determining, from among the plurality of contents and based on a time point at which the at least one voice signal is received, first content corresponding to an audio signal output at the time point or a reference time before the time point;
transmitting information on the first content and the at least one voice signal to a server; and
receiving from the server, and providing, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

delete

The method of claim 10, wherein continuously outputting the plurality of audio signals comprises:
converting the plurality of contents into the plurality of audio signals using a text-to-speech (TTS) module; and
continuously outputting the converted plurality of audio signals through the at least one speaker.

The method of claim 13, wherein determining the first content comprises:
determining, as the first content, content corresponding to an audio signal, among the converted plurality of audio signals, that is input to the TTS module or output from the TTS module at the time point.

delete

A method of operating a server, the method comprising:
receiving at least one voice signal from an electronic device;
identifying, from among a plurality of contents continuously output by the electronic device, first content corresponding to a time point at which the at least one voice signal is received, or to an audio signal output a reference time before the time point; and
transmitting, to the electronic device, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

The method of claim 18, wherein transmitting the second content to the electronic device comprises:
extracting the user's intention by analyzing text data obtained by converting the at least one voice signal; and
determining the second content based on at least a portion of the information about the first content and the extracted user's intention.

delete

A method of operating an electronic device, the method comprising:
continuously outputting, through at least one speaker, a plurality of audio signals respectively corresponding to a plurality of contents;
transmitting information about the plurality of audio signals to a server;
receiving at least one voice signal through at least one microphone while continuously outputting the plurality of audio signals;
transmitting the at least one voice signal to the server; and
receiving from the server, and providing, second content that is related to first content and is determined based on the first content and a user's intention extracted from the at least one voice signal, the first content corresponding, among the plurality of contents, to a time point at which the at least one voice signal is received or to an audio signal output a reference time before the time point.

The method of claim 21, wherein continuously outputting the plurality of audio signals comprises:
converting the plurality of contents into the plurality of audio signals using a text-to-speech (TTS) module; and
continuously outputting the converted plurality of audio signals through the at least one speaker.

delete

A method of operating a server, the method comprising:
receiving information on a plurality of audio signals respectively corresponding to a plurality of contents that are continuously output by an electronic device;
receiving at least one voice signal from the electronic device;
determining, using the at least one voice signal, a time point at which the electronic device received the at least one voice signal;
determining, from among the plurality of contents and based on the information on the plurality of audio signals and the determined time point, first content corresponding to an audio signal output at the determined time point or a reference time before the determined time point; and
transmitting, to the electronic device, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

The method of claim 26, wherein transmitting the second content to the electronic device comprises:
extracting the user's intention by analyzing text data obtained by converting the at least one voice signal; and
determining the second content based on at least a portion of the information about the first content and the extracted user's intention.

delete

An electronic device comprising:
an output unit configured to continuously output a plurality of audio signals respectively corresponding to a plurality of contents;
a receiver configured to receive at least one voice signal while the plurality of audio signals are continuously output;
a controller configured to, in response to receiving the at least one voice signal, determine, from among the plurality of contents and based on a time point at which the at least one voice signal is received, first content corresponding to an audio signal output at the time point or a reference time before the time point; and
an operation determiner configured to provide second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

The electronic device of claim 29,
Further comprising at least one microphone,
Wherein the receiver extracts the at least one voice signal from an audio signal received through the at least one microphone.

The electronic device of claim 29, further comprising:
A speech recognition unit configured to convert the at least one voice signal received by the receiver into text data; and
A natural language processing unit configured to extract the user's intention by analyzing the text data,
Wherein the operation determiner determines the second content based on the extracted user's intention and on the one or more components or at least a portion of information about the components.

An electronic device comprising:
An output unit configured to continuously output a plurality of audio signals respectively corresponding to a plurality of contents;
A receiver configured to receive at least one voice signal through at least one microphone while the plurality of audio signals are continuously output; and
A control unit configured to determine, in response to receiving the at least one voice signal and based on a time point at which the at least one voice signal is received, first content from among the plurality of contents, the first content corresponding to an audio signal output at the time point or a reference time before the time point,
Wherein the electronic device transmits information about the first content and the at least one voice signal to a server, and receives from the server and provides second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

The electronic device of claim 32,
Further comprising at least one microphone,
Wherein the receiver extracts the at least one voice signal from an audio signal received through the at least one microphone.

The electronic device of claim 32,
Wherein the output unit comprises:
A text-to-speech (TTS) module configured to convert the plurality of contents into the plurality of audio signals; and
At least one speaker configured to continuously output the converted plurality of audio signals to the outside.

The electronic device of claim 34,
Wherein the controller is configured to determine, as the first content, content corresponding to an audio signal input to the TTS module or output from the TTS module, from among the plurality of converted audio signals.
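
One way to picture this determination is a thin wrapper that records which content item is currently being fed to the TTS module, so that the controller can read it when a voice signal arrives. This is a sketch under assumptions: `TrackedTts`, its `stream` method, and the `synthesize` callable are hypothetical and do not correspond to any particular TTS library.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


class TrackedTts:
    """Hypothetical wrapper that remembers the content being synthesized."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        self.current_id: Optional[str] = None

    def stream(self, contents: Iterable[Tuple[str, str]]) -> Iterator[bytes]:
        """Yield audio for each (content_id, text) pair in order."""
        for content_id, text in contents:
            # Record the item now being input to the TTS module.
            self.current_id = content_id
            yield self._synthesize(text)


# Stand-in synthesizer: encodes the text instead of producing real audio.
tts = TrackedTts(lambda text: text.encode("utf-8"))
audio = tts.stream([("mail-1", "First message"), ("mail-2", "Second message")])

first_chunk = next(audio)
id_during_first = tts.current_id   # what the controller would read here
second_chunk = next(audio)
id_during_second = tts.current_id
```

Because the generator advances only when the speaker requests the next chunk, `current_id` tracks the item at the TTS input; tracking the item at the TTS output would need a separate marker updated when playback of each chunk begins.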

delete

A server comprising:
A language recognizer configured to receive at least one voice signal from an electronic device;
A natural language processor configured to identify, from among a plurality of contents continuously output from the electronic device, first content corresponding to an audio signal output at a time point at which the at least one voice signal is received or a reference time before the time point; and
An operation determiner configured to transmit, to the electronic device, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

The server of claim 38,
Wherein the natural language processor extracts the user's intention by analyzing text data obtained by converting the at least one voice signal.

The server of claim 38,
Wherein the operation determiner determines the second content based on the extracted user's intention and at least a portion of information about the first content.

delete

An electronic device comprising:
An output unit configured to continuously output a plurality of audio signals respectively corresponding to a plurality of contents;
A controller configured to generate information on the plurality of audio signals; and
A receiver configured to receive at least one voice signal while the plurality of audio signals are continuously output,
Wherein the electronic device
Transmits the information on the plurality of audio signals and the at least one voice signal to a server, and
Receives, from the server, second content that is related to first content and is determined based on the first content and a user's intention extracted from the at least one voice signal, the first content being, from among the plurality of contents, content corresponding to an audio signal output at a time point at which the at least one voice signal is received or a reference time before the time point.
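
The claim leaves open what the information on the plurality of audio signals contains. A minimal sketch, assuming it is a timeline of output intervals derived from each item's audio duration, is shown below; `build_audio_info`, `build_request`, and the JSON field names are hypothetical.

```python
import json
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def build_audio_info(
    durations: Sequence[Tuple[str, float]], output_start: float = 0.0
) -> List[Dict[str, object]]:
    """Turn (content_id, duration) pairs into back-to-back output intervals."""
    # Running total of durations gives each interval boundary in seconds.
    bounds = list(
        accumulate((duration for _, duration in durations), initial=output_start)
    )
    return [
        {"content_id": content_id, "start": bounds[i], "end": bounds[i + 1]}
        for i, (content_id, _) in enumerate(durations)
    ]


def build_request(audio_info: List[Dict[str, object]], voice_b64: str) -> str:
    """Bundle the timeline and the encoded voice signal for the server."""
    return json.dumps({"audio_info": audio_info, "voice_signal": voice_b64})


audio_info = build_audio_info([("mail-1", 4.0), ("mail-2", 5.0)])
request_body = build_request(audio_info, "AAAA")
```

Sending the timeline rather than a single content identifier lets the server, not the device, decide which item the utterance refers to.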

The electronic device of claim 42,
Wherein the output unit comprises:
A text-to-speech (TTS) module configured to convert the plurality of contents into the plurality of audio signals, respectively; and
At least one speaker configured to continuously output the converted plurality of audio signals to the outside.

The electronic device of claim 42,
Further comprising at least one microphone,
Wherein the receiver extracts the at least one voice signal from an audio signal received through the at least one microphone.

delete

A server comprising:
A language recognizer configured to receive at least one voice signal from an electronic device and to determine, using the at least one voice signal, a time point at which the electronic device received the at least one voice signal;
A content determination unit configured to receive information on a plurality of audio signals respectively corresponding to a plurality of contents continuously output from the electronic device, and to determine, from among the plurality of contents and based on the information on the plurality of audio signals and the determined time point, first content corresponding to an audio signal output at the determined time point or a reference time before the determined time point; and
An operation determiner configured to transmit, to the electronic device, second content that is related to the first content and is determined based on the first content and a user's intention extracted from the at least one voice signal.

The server of claim 47,
Further comprising a natural language processor configured to extract the user's intention by analyzing text data converted from the at least one voice signal.

The server of claim 48,
Wherein the operation determiner determines the second content based on the extracted user's intention and at least a portion of information about the first content.

delete