
CN108510981B - Method and system for acquiring voice data

Method and system for acquiring voice data

Info

Publication number
CN108510981B
CN108510981B
Authority
CN
China
Prior art keywords
voice data
voice
recognition model
application object
user
Prior art date
Legal status
Active
Application number
CN201810324045.2A
Other languages
Chinese (zh)
Other versions
CN108510981A (en)
Inventor
谢晖
Current Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201810324045.2A
Publication of CN108510981A
Priority to KR1020190035388A (KR102714096B1)
Priority to US16/382,712 (US10984795B2)
Application granted
Publication of CN108510981B
Legal status: Active

Classifications

    • G10L15/063: Training of speech recognition systems (creation of reference templates, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L17/04: Speaker identification or verification; training, enrolment or model building
    • G06F21/32: User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G06F3/16: Sound input; sound output
    • G06V40/12: Fingerprints or palmprints
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G10L15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L15/16: Speech classification or search using artificial neural networks
    • G10L17/24: Interactive procedures; man-machine interfaces; the user being prompted to utter a password or a predefined phrase
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M2201/34: Microprocessors
    • H04M2201/36: Memories
    • H04M2201/38: Displays
    • H04M2250/12: Telephonic subscriber devices including a sensor for measuring a physical value, e.g. temperature or motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a method and system for acquiring voice data. The method includes: when a user makes a voice call, saving the voice data streams transmitted in real time inside the smart terminal system, storing the microphone input stream as first voice data and the earpiece output stream as second voice data; checking whether the first and second voice data meet the training requirements of a speech recognition model; if so, further judging whether the first voice data comes from the application object of the speech recognition model; if it does, marking the first voice data as application-object voice data and the second voice data as non-application-object voice data; if not, marking both as non-application-object voice data. By improving the way voice data is acquired, the method relieves the user of the burden of training the speech recognition model and improves the user experience.

Description

Method and system for acquiring voice data

Technical Field

The invention relates to the field of artificial intelligence, and in particular to a method and system for acquiring voice data.

Background

Speech recognition on mobile terminals falls into two broad categories: semantic recognition and speaker recognition.

Speaker recognition is often referred to as voiceprint recognition. It is generally divided into text-dependent and text-independent approaches.

Text-dependent speech recognition usually requires the user to repeat a fixed phrase two or three times; the extracted feature information is recorded as the enrollment (Enroll). At recognition time, the user must read the same fixed phrase so the system can perform the prediction (Predict).

Text-independent speech recognition does not require the user to read fixed sentences. The user supplies a large amount of voice data as training input (Train) for machine learning, and the user's characteristic features are progressively refined as more data is seen. The training data must contain both this user's voice (the application object of the speech recognition model) and other people's voices. No fixed phrase is needed at prediction time either; ordinary speech can be used for speaker discrimination.

In the prior art, mobile smart terminals cannot distinguish user identity during speech recognition: the voice features of different users are not told apart, so the same terminal serves the voice commands of any user, giving poor confidentiality and exclusivity.

Take the voice assistant as an example. Existing mobile smart terminals require a fixed wake-up procedure before the voice assistant service can be used. This is a limitation of text-dependent recognition: it cannot escape the fixed text and cannot respond quickly to arbitrary voice commands from the intended user (the application object). All voice commands become available only after the assistant has been woken up. Any user can wake the assistant with the fixed phrase and issue commands; because the assistant cannot recognize the speaker's identity, every command is executed.

Text-independent speech recognition uses machine learning: a complete learning model is built and trained on a large amount of voice data to obtain highly refined user features and model parameters. With a trained model, the speaker can be recognized with high accuracy from arbitrary speech input, free of the restriction of fixed text.

However, implementing text-independent speech recognition on a mobile smart terminal requires a large amount of voice data from both the enrolled person and non-enrolled people. The training process is long and tedious, which is a serious challenge to the user experience: users do not want to spend time and effort entering voice data. Moreover, obtaining voice data from people who are not the application object of the model is an awkward problem for the end user. Without sufficient training data, high recognition accuracy cannot be achieved. For these reasons, no text-independent speech recognition system has appeared on existing mobile smart terminals.

For the above problems, in particular the acquisition of voice data for a text-independent speech recognition model applied on a terminal, no effective solution has been proposed so far.

Summary of the Invention

The present invention provides a method and system for acquiring voice data that reduce the user's burden by improving the acquisition process.

The present invention provides a method for acquiring voice data, where the voice data is used to train a speech recognition model. The method includes the following steps:

Step A-1: when the user makes a voice call, save the voice data streams transmitted in real time inside the smart terminal system; save the microphone input stream as first voice data and the earpiece output stream as second voice data.

Step A-2: check whether the first and second voice data meet the training requirements of the speech recognition model; if so, go to step A-3.

Step A-3: judge whether the first voice data comes from the application object of the speech recognition model; if so, go to step A-4, otherwise go to step A-5.

Step A-4: mark the first voice data as application-object voice data and the second voice data as non-application-object voice data; application-object voice data is used to learn the application object's voice features in the speech recognition model, and non-application-object voice data is used to learn the voice features of non-application objects.

Step A-5: mark both the first and second voice data as non-application-object voice data.

The present invention also provides a system for acquiring voice data, where the voice data is used to train a speech recognition model. The system includes:

Saving module: when the user makes a voice call, save the voice data streams transmitted in real time inside the smart terminal system; save the microphone input stream as first voice data and the earpiece output stream as second voice data.

Detection module: check whether the first and second voice data meet the training requirements of the speech recognition model; if so, execute the user judgment module.

User judgment module: judge whether the first voice data comes from the application object of the speech recognition model; if so, execute voice object marking module 1, otherwise execute voice object marking module 2.

Voice object marking module 1: mark the first voice data as application-object voice data and the second voice data as non-application-object voice data; application-object voice data is used to learn the application object's voice features in the speech recognition model, and non-application-object voice data is used to learn the voice features of non-application objects.

Voice object marking module 2: mark both the first and second voice data as non-application-object voice data.

By saving the voice data of the user's calls, the invention uses the microphone input (the first voice data) to learn the application object's voice features and the earpiece output (the second voice data) to learn the voice features of non-application objects, passing the training data to the speech recognition model "silently" in the background of the mobile smart terminal. The user no longer has to do tedious input work, which lightens the training burden and improves the user experience. The method and system can be applied to any neural-network-based speech recognition model, so their range of application is wide. Based on this acquisition method and system, text-independent speech recognition becomes feasible on mobile smart terminals, breaking through the limits of existing text-dependent recognition and letting the terminal understand each user's characteristics and habits more intelligently, with better exclusivity and security.

Brief Description of the Drawings

Figure 1 is a flow chart of the voice data acquisition method of the present invention;

Figure 2 is an embodiment of Figure 1;

Figure 3 is a structural diagram of the voice data acquisition system of the present invention;

Figure 4 is an embodiment of Figure 3.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Figure 1 is a flow chart of the voice data acquisition method of the present invention, which includes the following steps:

Step A-1 (S101): when the user makes a voice call, save the voice data streams transmitted in real time inside the smart terminal system; save the microphone input stream as first voice data and the earpiece output stream as second voice data.

Step A-2 (S102): check whether the first and second voice data meet the training requirements of the speech recognition model; if so, go to step A-3.

Step A-3 (S103): judge whether the first voice data comes from the application object of the speech recognition model; if so, go to step A-4, otherwise go to step A-5.

Step A-4 (S104): mark the first voice data as application-object voice data and the second voice data as non-application-object voice data; application-object voice data is used to learn the application object's voice features in the speech recognition model, and non-application-object voice data is used to learn the voice features of non-application objects.

Step A-5 (S105): mark both the first and second voice data as non-application-object voice data.

In step A-1, "voice call" covers not only ordinary audio calls but also VoIP and VoLTE video calls, as well as the real-time audio and video calls of instant messaging apps, such as WeChat's "video chat" or "voice chat".

When the user starts a voice call, the method of Figure 1 is triggered. When applied to WeChat or QQ, the method is triggered when the corresponding action is detected, for example when the "video chat" or "voice chat" button is pressed or takes effect.

The stream-saving work of step A-1 can be placed in the hardware device operation layer of the mobile smart terminal's operating system. When the user starts a voice call, the hardware operation layer backs up and saves the microphone input voice data and the earpiece output voice data in real time. The microphone input represents the local user's voice, and the earpiece output represents the voice transmitted in real time from the remote party to the local user.

Taking Android as an example, the hardware device operation layer is the Android HAL. The call state can be judged from the call_connected attribute of tiny_audio_device in the audio HAL: when adev->call_connected is true, the device is in a voice call.

In the audio HAL, when audio_hw_device is AUDIO_DEVICE_IN_BUILTIN_MIC, the built-in microphone is active; when it is AUDIO_DEVICE_OUT_EARPIECE, the earpiece is active.
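For illustration, a minimal C sketch of these state checks follows. The tiny_audio_device layout varies between vendor HALs, so the struct fields shown here are assumptions; only call_connected and the AUDIO_DEVICE_* constants come from the description above.

```c
/* Sketch of the call-state and device checks described above.  The struct
 * layout is an assumption modeled on vendor audio HALs; only call_connected
 * and the AUDIO_DEVICE_* constants are taken from the text. */
#include <stdbool.h>
#include <system/audio.h>   /* audio_devices_t, AUDIO_DEVICE_* constants */

struct tiny_audio_device {
    bool call_connected;        /* true while a voice call is active */
    audio_devices_t in_device;  /* currently selected input device   */
    audio_devices_t out_device; /* currently selected output device  */
    /* vendor-specific members omitted */
};

/* Capture only while a call is up and both endpoints of interest are active. */
static bool should_capture(const struct tiny_audio_device *adev)
{
    return adev->call_connected &&
           (adev->in_device & AUDIO_DEVICE_IN_BUILTIN_MIC) &&
           (adev->out_device & AUDIO_DEVICE_OUT_EARPIECE);
}
```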

Further, the voice data can be backed up and saved just before earpiece output in the audio HAL's out_write() function (write corresponds to playback), and the microphone input can likewise be backed up and saved in the in_read() function (read corresponds to recording).
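A sketch of such a tap is given below; the duplicated PCM buffers are simply appended to two capture files. The save_pcm() helper and the /data/voice/*.pcm paths are illustrative assumptions, and the two hooks would be called from in_read() and out_write() respectively.

```c
/* Hedged sketch: the tap that in_read()/out_write() would call.  save_pcm()
 * and the capture paths are illustrative assumptions, not a real HAL API. */
#include <stdio.h>
#include <stddef.h>

/* Append one buffer of raw PCM frames to a capture file. */
static void save_pcm(const char *path, const void *buf, size_t bytes)
{
    FILE *f = fopen(path, "ab");
    if (f != NULL) {
        fwrite(buf, 1, bytes, f);
        fclose(f);
    }
}

/* Called from in_read() after the mic buffer has been filled. */
static void tap_mic(const void *buf, size_t bytes)
{
    save_pcm("/data/voice/first.pcm", buf, bytes);   /* first voice data  */
}

/* Called from out_write() before the buffer reaches the earpiece. */
static void tap_earpiece(const void *buf, size_t bytes)
{
    save_pcm("/data/voice/second.pcm", buf, bytes);  /* second voice data */
}
```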

In addition, in step A-1 the first and second voice data may be stored in the ROM or RAM of the mobile smart terminal.

The voice-feature extraction of a neural-network speech recognition model needs a large amount of personal speech in advance to learn the speaker's voiceprint. In existing methods, a dedicated program makes the user record voice data sentence by sentence, which costs the user extra time devoted solely to voice-feature training, and the process is complicated and tedious.

The training method of Figure 1 instead collects, through the smart terminal, the instant-messaging voice data the user produces in daily work and life, and uses the collected data to train the speech recognition model, steadily improving its recognition accuracy. Compared with the prior art, the user does not have to do any tedious input work, which lightens the training burden and improves the user experience.

At the same time, with the training method of Figure 1 applied day after day, text-independent speech recognition becomes feasible on mobile smart terminals, breaking through the limits of existing text-dependent recognition and letting the terminal understand each user's characteristics and habits more intelligently, with better exclusivity and security.

Taking the voice assistant as an example, once text-independent recognition is added, the user's identity can be recognized and only the voice commands of the model's application object are processed, improving security and exclusivity.

In step A-2 of Figure 1, checking whether the first and second voice data meet the model training requirements can work as follows: first check whether the data contains non-silent features; if not, no human voice was recorded and the data does not meet the training requirements. If it does, continue to check whether the speech in the first and second voice data is clear; if it is not, the data is useless for training and again does not meet the requirements.
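As an illustration, the sketch below implements the two checks for 16-bit mono PCM. The RMS-based silence test, the crude peak-to-RMS clarity proxy, and both thresholds are assumptions; the invention does not prescribe a particular detector, and a real implementation would likely use a proper VAD and SNR estimate.

```c
/* Hedged sketch of the two checks in step A-2, for 16-bit mono PCM.
 * The detector and thresholds are illustrative assumptions. */
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Root-mean-square level of a PCM buffer. */
static double rms(const int16_t *pcm, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)pcm[i] * (double)pcm[i];
    return n > 0 ? sqrt(acc / (double)n) : 0.0;
}

/* Step A-2: reject silence first, then reject unclear speech. */
static bool meets_training_requirements(const int16_t *pcm, size_t n)
{
    const double SILENCE_RMS    = 200.0; /* below this: no voice recorded  */
    const double MIN_PEAK_RATIO = 4.0;   /* clear speech peaks well above RMS */

    double level = rms(pcm, n);
    if (level < SILENCE_RMS)
        return false;                    /* fails the non-silence check */

    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int v = pcm[i] < 0 ? -(int)pcm[i] : (int)pcm[i];
        if (v > peak)
            peak = v;
    }
    /* Crude clarity proxy: a flat, noise-dominated signal has a low ratio. */
    return (double)peak / level >= MIN_PEAK_RATIO;
}
```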

Optionally, in step A-2 of Figure 1, the first and second voice data can be further cleaned before step A-3 is executed, with step A-3 then running on the cleaned data. Voice cleaning includes denoising and noise reduction, giving the first and second voice data better quality and thus a better training result.

In step A-3 of Figure 1, whether the first voice data comes from the application object of the speech recognition model can be judged by face recognition, by fingerprint verification, or by a dialog asking the user to confirm their identity. With face recognition, the camera actively collects the user's face and a comparison decides whether this is the model's application object; if collection fails, the user is prompted for input. Fingerprint verification generally prompts the user for the fingerprint of a designated finger and compares it to decide whether this is the application object.

In Figure 1, to save training time, the generic feature modules of the speech recognition model, in particular the non-application-object voice feature module, can be trained in advance.

On the other hand, to avoid occupying terminal system resources and leaking user privacy, the first and second voice data are used to train the speech recognition model immediately after step A-4 or step A-5; after training, if the first and second voice data have not been updated, they are cleared and the flow of Figure 1 exits. In other words, once training finishes, the related voice data is cleared and the flow of Figure 1 ends.
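A minimal sketch of this post-training cleanup follows, assuming the capture files from the earlier sketch; the paths are illustrative assumptions.

```c
/* Hedged sketch of the cleanup described above: delete the capture files
 * once training has consumed them.  The paths match the earlier sketch
 * and are assumptions. */
#include <stdio.h>

static void clear_training_data(void)
{
    /* remove() unlinks the file; error handling is elided for brevity. */
    remove("/data/voice/first.pcm");
    remove("/data/voice/second.pcm");
}
```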

Figure 2 extends the method of Figure 1 and gives a concrete application embodiment with the following steps:

Step A-11 (S201): when the user makes a voice call, save the microphone input stream as third voice data and the earpiece output stream as fourth voice data.

Step A-12 (S202): when the third voice data reaches a preset duration, set the first voice data equal to the third voice data and empty the third voice data; execute step A-2 and at the same time return to step A-11.

Step A-13 (S203): when the speech in the fourth voice data reaches the preset duration, set the second voice data equal to the fourth voice data and empty the fourth voice data; execute step A-2 and at the same time return to step A-11.
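The rotation in steps A-11 to A-13 amounts to a chunked double buffer: accumulate into the "third/fourth" buffer and, once the preset duration is reached, hand the chunk off as "first/second" data and start over. A sketch follows; chunk_t, PRESET_SECONDS, the sample rate, and the process_chunk() hook (standing for steps A-2 to A-5) are illustrative assumptions.

```c
/* Hedged sketch of the buffer rotation in steps A-11 to A-13.  The types,
 * constants, and process_chunk() hook are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_RATE     8000        /* narrowband call audio, for example */
#define PRESET_SECONDS  15          /* the text suggests more than 10 s   */
#define CHUNK_SAMPLES   (SAMPLE_RATE * PRESET_SECONDS)

typedef struct {
    int16_t samples[CHUNK_SAMPLES]; /* the "third" or "fourth" voice data */
    size_t  used;
} chunk_t;

/* Steps A-2..A-5 run on each completed chunk (detection, judging, marking). */
void process_chunk(const int16_t *samples, size_t n, int is_mic_side);

/* Called from the HAL tap for every captured buffer (step A-11). */
static void accumulate(chunk_t *c, const int16_t *pcm, size_t n, int is_mic)
{
    while (n > 0) {
        size_t room = CHUNK_SAMPLES - c->used;
        size_t take = n < room ? n : room;
        memcpy(c->samples + c->used, pcm, take * sizeof(int16_t));
        c->used += take;
        pcm     += take;
        n       -= take;
        if (c->used == CHUNK_SAMPLES) {                  /* preset duration reached */
            process_chunk(c->samples, c->used, is_mic);  /* steps A-12 / A-13 */
            c->used = 0;                                 /* empty and keep recording */
        }
    }
}
```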

Step A-2 (S204): check whether the first and second voice data meet the training requirements of the speech recognition model; if so, execute step A-3 (refined here as steps A-31 and A-32).

Step A-31 (S205): use the speech recognition model itself to judge whether the first voice data comes from its application object, and output the confidence of the judgment result. If the confidence is below a threshold, go to step A-32. If the judgment is that the speaker is the application object and the confidence is at or above the threshold, go to step A-4. If the judgment is that the speaker is not the application object and the confidence is at or above the threshold, go to step A-5.

Step A-32 (S206): check whether the user has already confirmed their identity during this call; if not, ask the user to confirm whether they are the application object of the speech recognition model and record the answer. If the user is the application object, go to step A-4; otherwise go to step A-5.
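A sketch of this decision logic follows. The model_predict() and confirm_identity_once() hooks and the 0.8 threshold are illustrative assumptions standing in for the speech recognition model and the confirmation dialog.

```c
/* Hedged sketch of the decision flow in steps A-31/A-32; the hooks and the
 * threshold are illustrative assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MARK_APPLICATION_OBJECT, MARK_NON_APPLICATION_OBJECT } marking_t;

/* Returns true if the model judges "application object"; writes the confidence. */
bool model_predict(const int16_t *pcm, size_t n, double *confidence);
/* Asks the user at most once per call; returns true if they confirm. */
bool confirm_identity_once(void);

static marking_t judge_first_voice(const int16_t *pcm, size_t n)
{
    const double THRESHOLD = 0.8;       /* illustrative assumption */
    double conf = 0.0;
    bool is_owner = model_predict(pcm, n, &conf);   /* step A-31 */

    if (conf < THRESHOLD)                           /* low confidence: ask */
        is_owner = confirm_identity_once();         /* step A-32 */

    return is_owner ? MARK_APPLICATION_OBJECT       /* leads to step A-4 */
                    : MARK_NON_APPLICATION_OBJECT;  /* leads to step A-5 */
}
```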

Step A-4 (S207): mark the first voice data as application-object voice data and the second voice data as non-application-object voice data; application-object voice data is used to learn the application object's voice features in the speech recognition model, and non-application-object voice data is used to learn the voice features of non-application objects.

Step A-5 (S208): mark both the first and second voice data as non-application-object voice data.

The method of Figure 1 can save all the data of a voice call as the first and second voice data and then train the speech recognition model on it, or, as in Figure 2, it can be configured to train while saving; likewise, steps A-11 to A-13 of Figure 2 can be replaced with step A-1 of Figure 1, chosen according to actual needs.

In steps A-12 and A-13 of Figure 2, the preset duration is longer than 10 seconds, or longer than the time it takes to execute steps A-2 through A-5 of Figure 2.

Step A-31 of Figure 2 does not use face or fingerprint recognition for identity authentication; the speech recognition model itself authenticates the user. Early in training, the model's judgments carry large errors, so the user assists by confirming identity manually; as recognition accuracy improves, manual participation becomes unnecessary. The method of Figure 2 can therefore run "silently" in the background, continuously training the speech recognition model without the user noticing.

The present invention also includes a system for acquiring voice data, shown in Figure 3, which comprises:

Saving module: when the user makes a voice call, save the voice data streams transmitted in real time inside the smart terminal system; save the microphone input stream as first voice data and the earpiece output stream as second voice data.

Detection module: check whether the first and second voice data meet the training requirements of the speech recognition model; if so, execute the user judgment module.

User judgment module: judge whether the first voice data comes from the application object of the speech recognition model; if so, execute voice object marking module 1, otherwise execute voice object marking module 2.

Voice object marking module 1: mark the first voice data as application-object voice data and the second voice data as non-application-object voice data; application-object voice data is used to learn the application object's voice features in the speech recognition model, and non-application-object voice data is used to learn the voice features of non-application objects.

Voice object marking module 2: mark both the first and second voice data as non-application-object voice data.

Optionally, as shown in Figure 4, the saving module may instead comprise a loop recording module, a transfer module 1, and a transfer module 2.

Loop recording module: save the microphone input stream as third voice data and the earpiece output stream as fourth voice data.

Transfer module 1: when the third voice data reaches the preset duration, set the first voice data equal to the third voice data and empty the third voice data; execute the detection module and return to the loop recording module.

Transfer module 2: when the speech in the fourth voice data reaches the preset duration, set the second voice data equal to the fourth voice data and empty the fourth voice data; execute the detection module and return to the loop recording module.

Optionally, as shown in Figure 4, the user judgment module may instead comprise a speech-recognition-model user judgment module and a user confirmation module.

Speech-recognition-model user judgment module: use the speech recognition model to judge whether the first voice data comes from its application object and output the confidence of the result. If the confidence is below the threshold, execute the user confirmation module; if the judgment is that the speaker is the application object and the confidence is at or above the threshold, execute voice object marking module 1; if the judgment is that the speaker is not the application object and the confidence is at or above the threshold, execute voice object marking module 2.

User confirmation module: check whether the user has already confirmed their identity during this call; if not, ask the user to confirm whether they are the application object of the speech recognition model and record the answer. If the user is the application object, execute voice object marking module 1; otherwise execute voice object marking module 2.

Optionally, in the detection module, checking whether the first and second voice data meet the training requirements of the speech recognition model includes: checking whether the data contains non-silent features; if not, the data does not meet the training requirements; if it does, continuing to check whether the speech in the first and second voice data is clear, and if it is not clear, the data again does not meet the requirements.

Optionally, in the detection module, "if so, execute the user judgment module" includes: if so, performing voice cleaning on the first and second voice data and then executing the user judgment module.

It should be noted that the embodiments of the voice data acquisition system follow the same principles as the embodiments of the acquisition method, and the two can be cross-referenced where relevant.

The above are only preferred embodiments of the present invention and are not intended to limit its scope; any modification, equivalent replacement, or improvement made within the spirit and principles of the technical solution of the present invention shall fall within its protection scope.

Claims (10)

1. A method for acquiring voice data, wherein the voice data is used for training a speech recognition model, the method comprising the steps of:
step A-1: when a user makes a voice call, storing the voice data streams transmitted in real time in the smart terminal system, storing the input voice data stream of a microphone as first voice data, and storing the output voice data stream of an earpiece as second voice data;
step A-2: detecting whether the first voice data and the second voice data meet the training requirements of the speech recognition model, and if so, executing step A-3;
step A-3: judging whether the first voice data comes from an application object of the speech recognition model, executing step A-4 if so, and executing step A-5 if not;
step A-4: marking the first voice data as application-object voice data and the second voice data as non-application-object voice data, wherein the application-object voice data is used for voice feature learning of the application object in the speech recognition model, and the non-application-object voice data is used for voice feature learning of non-application objects in the speech recognition model;
step A-5: marking both the first voice data and the second voice data as non-application-object voice data.
2. The method of claim 1, wherein in step A-2, executing step A-3 if the requirements are met comprises:
if so, performing voice cleaning on the first voice data and the second voice data and then executing step A-3.
3. The method of claim 1, wherein storing the input voice data stream of the microphone as first voice data and the output voice data stream of the earpiece as second voice data comprises:
step A-11: storing the input voice data stream of the microphone as third voice data, storing the output voice data stream of the earpiece as fourth voice data, and executing step A-12 and step A-13;
step A-12: when the third voice data reaches a preset duration, setting the first voice data equal to the third voice data and emptying the third voice data, executing step A-2, and returning to step A-11;
step A-13: when the voice in the fourth voice data reaches the preset duration, setting the second voice data equal to the fourth voice data and emptying the fourth voice data, executing step A-2, and returning to step A-11.
4. The method of claim 1, wherein step A-3 further comprises:
step A-31: using the speech recognition model to judge whether the first voice data comes from the application object of the speech recognition model, and outputting the confidence of the judgment result; if the confidence is less than a threshold, executing step A-32; if the judgment result is the application object of the speech recognition model and the confidence is greater than or equal to the threshold, executing step A-4; if the judgment result is not the application object of the speech recognition model and the confidence is greater than or equal to the threshold, executing step A-5;
step A-32: checking whether the user has confirmed their identity during this voice call; if not, asking the user to confirm whether they are the application object of the speech recognition model and recording the user's confirmation; if the user is the application object of the speech recognition model, executing step A-4, and if not, executing step A-5.
5. The method of claim 1, wherein detecting whether the first voice data and the second voice data meet the training requirements of the speech recognition model comprises:
detecting whether the first voice data and the second voice data contain non-silent features; if not, the data does not meet the training requirements; if so, continuing to detect whether the speech in the first voice data and the second voice data is clear, and if it is not clear, the data does not meet the training requirements.
6. A system for acquiring voice data, wherein the voice data is used for training a speech recognition model, the system comprising:
a saving module that, when a user makes a voice call, stores the voice data streams transmitted in real time in the smart terminal system, storing the input voice data stream of a microphone as first voice data and the output voice data stream of an earpiece as second voice data;
a detection module that detects whether the first voice data and the second voice data meet the training requirements of the speech recognition model and, if so, executes a user judgment module;
the user judgment module, which judges whether the first voice data comes from an application object of the speech recognition model, executing voice object marking module 1 if so and voice object marking module 2 if not;
voice object marking module 1, which marks the first voice data as application-object voice data and the second voice data as non-application-object voice data, wherein the application-object voice data is used for voice feature learning of the application object in the speech recognition model, and the non-application-object voice data is used for voice feature learning of non-application objects in the speech recognition model;
voice object marking module 2, which marks both the first voice data and the second voice data as non-application-object voice data.
7. The system of claim 6, wherein in the detection module, executing the user judgment module if the requirements are met further comprises:
if so, performing voice cleaning on the first voice data and the second voice data and then executing the user judgment module.
8. The system of claim 6, wherein the saving module further comprises:
a loop recording module that stores the input voice data stream of the microphone as third voice data and the output voice data stream of the earpiece as fourth voice data, and executes transfer module 1 and transfer module 2;
transfer module 1, which, when the third voice data reaches a preset duration, sets the first voice data equal to the third voice data, empties the third voice data, executes the detection module, and returns to the loop recording module;
transfer module 2, which, when the voice in the fourth voice data reaches the preset duration, sets the second voice data equal to the fourth voice data, empties the fourth voice data, executes the detection module, and returns to the loop recording module.
9. The system of claim 6, wherein the user judgment module further comprises:
a speech-recognition-model user judgment module that uses the speech recognition model to judge whether the first voice data comes from the application object of the speech recognition model and outputs the confidence of the judgment result; if the confidence is less than a threshold, executing a user confirmation module; if the judgment result is the application object of the speech recognition model and the confidence is greater than or equal to the threshold, executing voice object marking module 1; if the judgment result is not the application object of the speech recognition model and the confidence is greater than or equal to the threshold, executing voice object marking module 2;
the user confirmation module, which checks whether the user has confirmed their identity during this voice call; if not, asks the user to confirm whether they are the application object of the speech recognition model and records the user's confirmation; if the user is the application object of the speech recognition model, executing voice object marking module 1, and if not, executing voice object marking module 2.
10. The system of claim 6, wherein the detection module detecting whether the first voice data and the second voice data meet the training requirements of the speech recognition model comprises:
detecting whether the first voice data and the second voice data contain non-silent features; if not, the data does not meet the training requirements; if so, continuing to detect whether the speech in the first voice data and the second voice data is clear, and if it is not clear, the data does not meet the training requirements.
CN201810324045.2A 2018-04-12 2018-04-12 Method and system for acquiring voice data Active CN108510981B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201810324045.2A CN108510981B (en) 2018-04-12 2018-04-12 Method and system for acquiring voice data
KR1020190035388A KR102714096B1 (en) 2018-04-12 2019-03-27 Electronic apparatus and operation method thereof
US16/382,712 US10984795B2 (en) 2018-04-12 2019-04-12 Electronic apparatus and operation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810324045.2A CN108510981B (en) 2018-04-12 2018-04-12 Method and system for acquiring voice data

Publications (2)

Publication Number Publication Date
CN108510981A CN108510981A (en) 2018-09-07
CN108510981B true CN108510981B (en) 2020-07-24

Family

ID=63381824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810324045.2A Active CN108510981B (en) 2018-04-12 2018-04-12 Method and system for acquiring voice data

Country Status (2)

Country Link
KR (1) KR102714096B1 (en)
CN (1) CN108510981B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020096078A1 (en) * 2018-11-06 2020-05-14 주식회사 시스트란인터내셔널 Method and device for providing voice recognition service
KR20220120197A (en) * 2021-02-23 2022-08-30 삼성전자주식회사 Electronic apparatus and controlling method thereof
EP4207805A4 (en) * 2021-02-23 2024-04-03 Samsung Electronics Co., Ltd. ELECTRONIC DEVICE AND CONTROL METHOD THEREFOR
US12260865B2 (en) 2021-08-09 2025-03-25 Electronics And Telecommunications Research Institute Automatic interpretation server and method based on zero UI for connecting terminal devices only within a speech-receiving distance

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002196781A (en) * 2000-12-26 2002-07-12 Toshiba Corp Voice interactive system and recording medium used for the same
KR100864828B1 (en) * 2006-12-06 2008-10-23 한국전자통신연구원 System for obtaining speaker's information using the speaker's acoustic characteristics
JP5158174B2 (en) * 2010-10-25 2013-03-06 株式会社デンソー Voice recognition device
US9489950B2 (en) * 2012-05-31 2016-11-08 Agency For Science, Technology And Research Method and system for dual scoring for text-dependent speaker verification
CN104517587B (en) * 2013-09-27 2017-11-24 联想(北京)有限公司 A kind of screen display method and electronic equipment
KR102274317B1 (en) * 2013-10-08 2021-07-07 삼성전자주식회사 Method and apparatus for performing speech recognition based on information of device
KR101564087B1 (en) * 2014-02-06 2015-10-28 주식회사 에스원 Method and apparatus for speaker verification
CN103956169B (en) * 2014-04-17 2017-07-21 北京搜狗科技发展有限公司 A kind of pronunciation inputting method, device and system
KR20160098581A (en) * 2015-02-09 2016-08-19 홍익대학교 산학협력단 Method for certification using face recognition an speaker verification
KR102371697B1 (en) * 2015-02-11 2022-03-08 삼성전자주식회사 Operating Method for Voice function and electronic device supporting the same
KR101618512B1 (en) * 2015-05-06 2016-05-09 서울시립대학교 산학협력단 Gaussian mixture model based speaker recognition system and the selection method of additional training utterance
CN105976820B (en) * 2016-06-14 2019-12-31 上海质良智能化设备有限公司 Voice emotion analysis system

Also Published As

Publication number Publication date
CN108510981A (en) 2018-09-07
KR20190119521A (en) 2019-10-22
KR102714096B1 (en) 2024-10-08

Similar Documents

Publication Publication Date Title
US12080295B2 (en) System and method for dynamic facial features for speaker recognition
US10699702B2 (en) System and method for personalization of acoustic models for automatic speech recognition
CN112037799B (en) Voice interrupt processing method and device, computer equipment and storage medium
CN110428810B (en) Voice wake-up recognition method and device and electronic equipment
CN108510981B (en) Method and system for acquiring voice data
CN112037791B (en) Conference summary transcription method, apparatus and storage medium
TWI659409B (en) Speech endpoint detection method and speech recognition method
EP3890342B1 (en) Waking up a wearable device
WO2017197953A1 (en) Voiceprint-based identity recognition method and device
US8521525B2 (en) Communication control apparatus, communication control method, and non-transitory computer-readable medium storing a communication control program for converting sound data into text data
JP2013527490A (en) Smart audio logging system and method for mobile devices
CN112102850A (en) Processing method, device and medium for emotion recognition and electronic equipment
US20160077792A1 (en) Methods and apparatus for unsupervised wakeup
KR102326853B1 (en) User adaptive conversation apparatus based on monitoring emotion and ethic and method for thereof
CN112802498B (en) Voice detection method, device, computer equipment and storage medium
CN113707154B (en) Model training method, device, electronic equipment and readable storage medium
JPWO2017085992A1 (en) Information processing device
JP6915637B2 (en) Information processing equipment, information processing methods, and programs
CN114420121A (en) Voice interaction method, electronic device and storage medium
WO2017024835A1 (en) Voice recognition method and device
CN117894321B (en) Voice interaction method, voice interaction prompting system and device
CN111354358A (en) Control method, voice interaction device, voice recognition server, storage medium, and control system
CN112151070A (en) Voice detection method and device and electronic equipment
CN113539295B (en) Voice processing method and device
JP2000148187A (en) Speaker recognition method, apparatus using the method, and program recording medium therefor

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant