
CN115472161A - Voice wake-up method, apparatus, device and storage medium - Google Patents

Info

Publication number
CN115472161A
Authority
CN
China
Prior art keywords
voice
threshold
wake
confidence
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210891087.0A
Other languages
Chinese (zh)
Inventor
李良斌
王宇剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202210891087.0A
Publication of CN115472161A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4418 Suspend and resume; Hibernate and awake
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention provides a voice wake-up method, apparatus, device and storage medium. The method comprises: recognizing historical sampling data collected by a sound pickup device to obtain recognition result information indicating whether the historical sampling data includes speech; determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and performing wake-up processing based on the target voice wake-up threshold. The voice wake-up method, apparatus, device and storage medium provided by the invention can improve the reliability of voice wake-up.

Figure 202210891087

Description

Voice wake-up method, apparatus, device and storage medium

Technical field

The present invention relates to the technical field of speech processing, and in particular to a voice wake-up method, apparatus, device and storage medium.

Background

With the development of speech technology, many electronic devices have adopted speech recognition so that they can be woken up by voice, switching from a non-working state to a working state.

At present, to raise the wake-up success rate and lower the false wake-up rate of an electronic device, a common practice is to use a wake-up model to recognize the wake-up word in speech information. When the confidence of the recognized wake-up word meets a preset voice wake-up threshold, the electronic device wakes up successfully.

However, with this approach, false wake-ups occur frequently, and the device may also fail to wake even when the wake-up word is spoken. How to improve the reliability of voice wake-up is therefore a technical problem in urgent need of a solution.

Summary of the invention

The present invention provides a voice wake-up method, apparatus, device and storage medium that address the low reliability of voice wake-up in the prior art and improve that reliability.

The present invention provides a voice wake-up method, comprising:

recognizing historical sampling data collected by a sound pickup device to obtain recognition result information indicating whether the historical sampling data includes speech;

determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and

performing wake-up processing based on the target voice wake-up threshold.

According to a voice wake-up method provided by the present invention, determining, based on the preset voice wake-up threshold, the target voice wake-up threshold corresponding to the recognition result information comprises:

when the recognition result information indicates that the historical sampling data includes speech, determining the preset voice wake-up threshold as the target voice wake-up threshold, or determining the preset voice wake-up threshold after it has been raised as the target voice wake-up threshold; and

when the recognition result information indicates that the historical sampling data does not include speech, determining the preset voice wake-up threshold after it has been lowered as the target voice wake-up threshold.

According to a voice wake-up method provided by the present invention, the recognition result information includes a speech confidence indicating that the historical sampling data contains speech, and a non-speech confidence indicating that the historical sampling data contains non-speech;

the step of, when the recognition result information indicates that the historical sampling data includes speech, determining the preset voice wake-up threshold as the target voice wake-up threshold, or determining the raised preset voice wake-up threshold as the target voice wake-up threshold, comprises:

when the speech confidence is greater than or equal to a first confidence threshold and less than a second confidence threshold, determining the raised preset voice wake-up threshold as the target voice wake-up threshold;

when the speech confidence is greater than or equal to the second confidence threshold and the non-speech confidence is less than a third confidence threshold, determining the preset voice wake-up threshold as the target voice wake-up threshold;

wherein the first confidence threshold is less than the second confidence threshold, and the third confidence threshold is less than the second confidence threshold.

According to a voice wake-up method provided by the present invention, when the speech confidence is less than the first confidence threshold, the recognition result information indicates that the historical sampling data does not include speech; and

the recognition result information further includes a silence confidence indicating that the historical sampling data contains neither speech nor non-speech, and when the silence confidence is greater than or equal to a fourth confidence threshold, the recognition result information indicates that the historical sampling data does not include speech.
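
The threshold selection described above can be sketched in Python as follows. This is an illustrative sketch only: the numeric values of the four confidence thresholds, the adjustment `step`, the clamping to [0, 1], and the names `ConfidenceThresholds` and `select_target_threshold` are assumptions that do not come from the source. The text does not say what happens when the speech confidence reaches the second threshold while the non-speech confidence is at or above the third, so the sketch keeps the preset value in that case as a placeholder. Checking the no-speech conditions first is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceThresholds:
    # Hypothetical values; the text only requires first < second and third < second.
    first: float = 0.4
    second: float = 0.7
    third: float = 0.3
    fourth: float = 0.8


def select_target_threshold(preset, speech_conf, nonspeech_conf,
                            silence_conf=None,
                            t=ConfidenceThresholds(), step=0.1):
    """Return the target wake-up threshold for one recognition result."""
    # No speech: speech confidence below the first threshold, or
    # silence confidence at or above the fourth threshold -> lower the preset.
    no_speech = speech_conf < t.first or (
        silence_conf is not None and silence_conf >= t.fourth)
    if no_speech:
        return max(0.0, preset - step)
    # Speech present but uncertain (first <= conf < second) -> raise the preset.
    if speech_conf < t.second:
        return min(1.0, preset + step)
    # Confident speech with little non-speech -> keep the preset.
    if nonspeech_conf < t.third:
        return preset
    # Case not specified in the text; keeping the preset is an assumption.
    return preset
```

For example, with a preset of 0.6 and the assumed values above, a speech confidence of 0.2 lowers the target to about 0.5, while a speech confidence of 0.5 raises it to about 0.7.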

According to a voice wake-up method provided by the present invention, recognizing the historical sampling data collected by the microphone to obtain recognition result information indicating whether the historical sampling data includes speech comprises:

inputting the historical sampling data into a speech recognition model to obtain the recognition result information;

wherein the speech recognition model is obtained by training an initial speech recognition model on a plurality of audio samples, the plurality of audio samples including audio samples containing speech, audio samples containing non-speech, and audio samples containing neither speech nor non-speech.

According to a voice wake-up method provided by the present invention,

recognizing the historical sampling data collected by the sound pickup device comprises:

recognizing the historical sampling data collected by the sound pickup device within a preset time period that ends at the current moment.
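
One way to keep "the preset time period that ends at the current moment" available is a fixed-length buffer that discards the oldest samples as new ones arrive. The sketch below is an assumption about how this could be done, not something the source describes; the class name `HistoryWindow`, the sample rate and the window length are all hypothetical.

```python
from collections import deque


class HistoryWindow:
    """Keeps only the most recent `window_seconds` of audio samples."""

    def __init__(self, sample_rate_hz, window_seconds):
        # A deque with maxlen drops the oldest items automatically.
        self._buf = deque(maxlen=int(sample_rate_hz * window_seconds))

    def push(self, samples):
        """Append newly captured samples from the pickup device."""
        self._buf.extend(samples)

    def snapshot(self):
        """Return the historical sampling data ending at the current moment."""
        return list(self._buf)


# Toy usage: 4 samples per second, 1-second window.
window = HistoryWindow(sample_rate_hz=4, window_seconds=1.0)
window.push([1, 2, 3])
window.push([4, 5, 6])
recent = window.snapshot()  # the four most recent samples
```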

According to a voice wake-up method provided by the present invention, performing wake-up processing based on the target voice wake-up threshold comprises:

acquiring current sampling data collected by the sound pickup device at the current moment;

acquiring a wake-up confidence that the current sampling data contains a preset wake-up word, and a current sound intensity corresponding to the current sampling data; and

performing a wake-up operation when the wake-up confidence is greater than or equal to the target voice wake-up threshold and the current sound intensity is greater than a target sound intensity threshold.
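
The two-part wake-up condition above amounts to a single boolean test. A minimal sketch, in which the function and parameter names are hypothetical; note that the confidence comparison is inclusive while the intensity comparison is strict, as stated in the text.

```python
def should_wake(wake_conf, target_wake_threshold,
                sound_intensity, target_intensity_threshold):
    """Wake only if both the confidence and the intensity conditions hold."""
    confident_enough = wake_conf >= target_wake_threshold      # inclusive
    loud_enough = sound_intensity > target_intensity_threshold  # strict
    return confident_enough and loud_enough
```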

According to a voice wake-up method provided by the present invention, the method further comprises:

acquiring a historical sound intensity corresponding to the historical sampling data; and

determining the target sound intensity threshold based on the historical sound intensity.
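
The text above does not state how the target sound intensity threshold is derived from the historical sound intensity. As one hedged possibility, the sketch below measures the historical intensity as an RMS level in decibels relative to full scale and adds a fixed margin. The RMS measure, the 6 dB margin, the -120 dB floor and the function names are all assumptions chosen for illustration.

```python
import math


def rms_dbfs(samples, floor_db=-120.0):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return floor_db  # avoid log10(0) for digital silence
    return max(floor_db, 20.0 * math.log10(rms))


def intensity_threshold(history_samples, margin_db=6.0):
    """Assumed rule: require the current sound to exceed the recent level by a margin."""
    return rms_dbfs(history_samples) + margin_db
```

With this rule, a louder recent background raises the intensity a candidate wake-up word must reach; other rules (for example a percentile of recent frame levels) would fit the text equally well.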

The present invention also provides a voice wake-up apparatus, comprising:

a recognition module configured to recognize historical sampling data collected by a microphone and obtain recognition result information indicating whether the historical sampling data includes speech;

a determining module configured to determine, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and

a processing module configured to perform wake-up processing based on the target voice wake-up threshold.

The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any of the voice wake-up methods described above when executing the program.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements any of the voice wake-up methods described above.

The present invention also provides a computer program product comprising a computer program that, when executed by a processor, implements any of the voice wake-up methods described above.

In the voice wake-up method, apparatus, device and storage medium provided by embodiments of the present invention, historical sampling data collected by a sound pickup device is recognized to obtain recognition result information indicating whether the data includes speech; a target voice wake-up threshold corresponding to the recognition result information is then determined based on a preset voice wake-up threshold, and wake-up processing is performed based on that target threshold. Because different target voice wake-up thresholds are set for scenes that include speech and scenes that do not, the target threshold can be chosen according to whether the historical sampling data contains speech. This lowers the false wake-up rate and raises the wake-up success rate, and thereby improves the reliability of voice wake-up.

Brief description of the drawings

To explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application scenario of the voice wake-up method provided by an embodiment of the present invention;

FIG. 2 is the first schematic flowchart of the voice wake-up method provided by an embodiment of the present invention;

FIG. 3 is the second schematic flowchart of the voice wake-up method provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the voice wake-up apparatus provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of the physical structure of an electronic device provided by the present invention.

Detailed description

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Voice wake-up is an important branch of speech recognition technology. It listens to the user's speech to determine whether the user has spoken a specified wake-up word and, if so, performs wake-up processing. It is widely applied in in-vehicle systems, navigation, smart homes and similar areas. In actual use, speech may contain words that resemble the wake-up word. When a user mentions such a word in conversation, the electronic device may be woken up by mistake. Conversely, in a noisy environment the device may fail to wake even though the user has spoken the wake-up word. The false wake-up rate is therefore high or the wake-up success rate is low, and the reliability of voice wake-up is poor.

In view of these problems, an embodiment of the present invention provides a voice wake-up method. Its technical idea is that the accuracy with which an electronic device recognizes the wake-up word may differ between scenes in which the current environment contains speech and scenes in which it does not. Different target voice wake-up thresholds can therefore be determined, based on recognition result information indicating whether the historical sampling data contains speech, to match the different scenes. Performing wake-up processing with the target voice wake-up threshold then lowers the false wake-up rate and raises the wake-up success rate, which improves the reliability of voice wake-up.

The technical solution of the voice wake-up method of the present invention is described below with reference to FIGS. 1 to 3.

As an example, FIG. 1 is a schematic diagram of an application scenario of the voice wake-up method provided by an embodiment of the present invention. As shown in FIG. 1, suppose the wake-up word of electronic device 103 is "小灵小灵" ("Xiaoling Xiaoling"); once a user speaks this wake-up word, electronic device 103 starts. In practice, however, user 101 and user 102 may, while chatting, say something like "就小赢小赢一下而已" ("it was just a small win", which contains "xiaoying xiaoying"). If electronic device 103 is in a fairly noisy environment, or user 101 is far from electronic device 103, the device may well recognize "小赢小赢" as "小灵小灵" and wake up. To avoid this, in the present invention, when electronic device 103 recognizes that the historical sampling data contains speech, that is, while user 101 and user 102 are chatting, it can determine the target voice wake-up threshold for that scene based on the preset voice wake-up threshold, for example by raising the preset threshold or keeping it unchanged, so that electronic device 103 is not woken up by mistake. When electronic device 103 recognizes that the historical sampling data does not contain speech, that is, after user 101 and user 102 have finished chatting, it can determine a new target voice wake-up threshold for the no-speech scene, for example by lowering the preset threshold, which safeguards the wake-up success rate of electronic device 103. Dynamically adjusting the target voice wake-up threshold in this way improves the reliability of voice wake-up.

The voice wake-up method provided by embodiments of the present invention can be applied to electronic device 103 shown in FIG. 1. The electronic device in FIG. 1 is only an example. In practice, the electronic device may be the device being woken up, or it may be another device that performs wake-up recognition and controls the wake-up of that device. The device being woken up may be any device with a voice control function, such as a mobile terminal, a smart speaker or a smart air conditioner; the other device that performs wake-up recognition and controls the wake-up may be a server, a mobile terminal or the like.

FIG. 2 is the first schematic flowchart of the voice wake-up method provided by an embodiment of the present invention. As shown in FIG. 2, the method includes:

Step 201: recognize the historical sampling data collected by the sound pickup device, and obtain recognition result information indicating whether the historical sampling data includes speech.

The sound pickup device in the electronic device may be, for example, a microphone or a microphone array, or any other device capable of capturing sound.

In this step, the historical sampling data collected by the sound pickup device may be divided into frames, and recognition result information is determined for each frame. Aggregating the per-frame recognition result information yields the recognition result information of the historical sampling data, which indicates whether the historical sampling data includes speech.
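
The framing and aggregation in step 201 can be sketched as follows. The text does not specify the frame length or how per-frame results are combined, so the non-overlapping frames, the averaging rule and the function names `split_frames` and `aggregate` are assumptions made for illustration; per-frame results are represented here as dictionaries of confidences.

```python
def split_frames(samples, frame_len):
    """Split samples into non-overlapping frames, dropping a trailing partial frame."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def aggregate(frame_results):
    """Average each confidence over all frames (assumed aggregation rule)."""
    if not frame_results:
        raise ValueError("no frames to aggregate")
    count = len(frame_results)
    return {key: sum(r[key] for r in frame_results) / count
            for key in frame_results[0]}


# Toy usage with hand-written per-frame results standing in for model output.
frames = split_frames(list(range(7)), frame_len=3)
summary = aggregate([
    {"speech": 1.0, "non_speech": 0.0},
    {"speech": 0.0, "non_speech": 1.0},
])
```

Averaging is only one option; a majority vote over per-frame labels or a maximum over frames would also match the description.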

Step 202: determine, based on the preset voice wake-up threshold, the target voice wake-up threshold corresponding to the recognition result information.

The preset voice wake-up threshold can be understood as the initial voice wake-up threshold, or as the voice wake-up threshold set for the electronic device at the factory.

In this step, the accuracy with which the electronic device recognizes the wake-up word differs from scene to scene. In a scene where speech is present, such as a multi-person chat or meeting, the device may recognize a non-wake-up word as the wake-up word and wake up by mistake, whereas in a silent scene or one without speech the probability of a false wake-up is relatively low. The electronic device can therefore set different target voice wake-up thresholds for different scenes based on the preset voice wake-up threshold. For example, in a scene containing speech the target voice wake-up threshold is set to threshold 1, and in a scene not containing speech it is set to threshold 2, where threshold 1 is greater than threshold 2.

In addition, to avoid adjusting the target voice wake-up threshold back and forth, the electronic device may, after determining the recognition result information from the historical sampling data, wait until that recognition result information has remained unchanged for a preset duration before determining the corresponding target voice wake-up threshold based on the preset voice wake-up threshold. The preset duration can be set empirically, for example to 5 s.
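
The hold-off described above, in which a recognition result must persist for a preset duration before the threshold is changed, can be sketched as a small state holder. The class name `StableResult` and its interface are hypothetical; timestamps are passed in explicitly so the behaviour is deterministic, and the 5-second default follows the example in the text.

```python
class StableResult:
    """Commits a recognition result only after it has held for `hold_seconds`."""

    def __init__(self, hold_seconds=5.0):
        self.hold_seconds = hold_seconds
        self._candidate = None   # most recently observed result
        self._since = None       # time at which the candidate first appeared
        self.committed = None    # last result that stayed stable long enough

    def update(self, result, now):
        """Feed the latest result and the current time in seconds."""
        if result != self._candidate:
            # The result changed: restart the stability timer.
            self._candidate = result
            self._since = now
        if now - self._since >= self.hold_seconds:
            self.committed = self._candidate
        return self.committed
```

The target voice wake-up threshold would then be recomputed only when `committed` changes, so a brief pause in a conversation does not immediately lower the threshold.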

Step 203: perform wake-up processing based on the target voice wake-up threshold.

In this step, once the target voice wake-up threshold has been determined, it is used to decide whether to wake up the electronic device. For example, a voice wake-up model can determine the wake-up confidence that the current sampling data, collected at the current moment, contains the preset wake-up word. If the wake-up confidence is higher than the target voice wake-up threshold, the electronic device is woken up; otherwise it is not.

It should be understood that the target voice wake-up threshold is updated dynamically: until a new target voice wake-up threshold is determined, the electronic device performs wake-up processing based on the current one.

In the voice wake-up method provided by embodiments of the present invention, historical sampling data collected by the sound pickup device is recognized to obtain recognition result information indicating whether the data includes speech; the target voice wake-up threshold corresponding to the recognition result information is then determined based on the preset voice wake-up threshold, and wake-up processing is performed based on that target threshold. Because different target voice wake-up thresholds are set for scenes that include speech and scenes that do not, the target threshold can be chosen according to whether the historical sampling data contains speech. This lowers the false wake-up rate and raises the wake-up success rate, and thereby improves the reliability of voice wake-up. In addition, dynamically adjusting the preset voice wake-up threshold lowers the false wake-up rate without changing the accuracy of the wake-up model.

图3为本发明实施例提供的语音唤醒方法的流程示意图之二,本实施例在图2所示实施例的基础上,对步骤202中如何基于预设语音唤醒阈值,确定识别结果信息对应的目标语音唤醒阈值的过程进行详细说明。如图3所示,该方法包括:FIG. 3 is the second schematic flow diagram of the voice wake-up method provided by the embodiment of the present invention. On the basis of the embodiment shown in FIG. The process of targeting the voice arousal threshold is described in detail. As shown in Figure 3, the method includes:

步骤301:对拾音装置采集的历史采样数据进行识别,获得用于表征历史采样数据中是否包括语音的识别结果信息。Step 301: Recognize the historical sampling data collected by the sound pickup device, and obtain recognition result information used to indicate whether the historical sampling data includes speech.

在一种可能的实现方式中,在对历史采样数据进行识别时,可以将历史采样数据输入语音识别模型中,得到该识别结果信息;其中,语音识别模型为基于多个音频样本对初始语音识别模型进行训练得到的,多个音频样本中包括包含语音的音频样本、包含非语音的音频样本以及既不包含语音也不包含非语音的音频样本。In a possible implementation, when the historical sampling data is recognized, the historical sampling data can be input into the speech recognition model to obtain the recognition result information; wherein, the speech recognition model is based on multiple audio samples for the initial speech recognition The multiple audio samples include audio samples containing speech, audio samples containing non-speech, and audio samples containing neither speech nor non-speech.

具体地,可以将历史采样数据输入预先训练的语音识别模型中,即可输出识别结果信息。该语音识别模型可以为深度神经网络(Deep Neural Networks;DNN)。Specifically, the historical sampling data can be input into the pre-trained speech recognition model, and the recognition result information can be output. The speech recognition model may be a deep neural network (Deep Neural Networks; DNN).

其中,上述识别结果信息中可以包括用于表征在历史采样数据中包含语音的语音置信度,和用于表征在历史采样数据中包含非语音的非语音置信度;或者,该识别结果信息中可以包括用于表征在历史采样数据中包含语音的语音置信度、用于表征在历史采样数据中包含非语音的非语音置信度,以及用于表征在历史采样数据中既不包含语音也不包含非语音的静音置信度。Wherein, the above-mentioned recognition result information may include a speech confidence degree used to characterize the speech contained in the historical sample data, and a non-speech confidence degree used to represent the non-speech contained in the historical sample data; or, the recognition result information may be These include speech confidence for the presence of speech in historical sample data, non-speech confidence for the presence of non-speech in historical sample data, and non-speech confidence for the presence of neither speech nor non-speech in historical sample data. The silence confidence of the speech.

上述语音识别模型可以通过如下方式训练得到:The above speech recognition model can be trained as follows:

首先,采集大量的音频样本,这些音频样本中包括包含语音的音频样本、包含非语音的音频样本以及既不包含语音也不包含非语音的音频样本。应理解,为了使得语音识别模型能够识别出多种场景下的历史采样数据,上述的音频样本可以在多种场景下进行采集,如聊天场景、会议讨论场景和户外场景等。在采集到音频样本之后,可以对每个音频样本进行标注,如标注为包含语音、包含非语音或者静音等。First, a large number of audio samples are collected, and these audio samples include audio samples containing speech, audio samples containing non-speech, and audio samples containing neither speech nor non-speech. It should be understood that, in order to enable the speech recognition model to recognize historical sampling data in various scenarios, the above audio samples may be collected in various scenarios, such as chatting scenarios, meeting discussion scenarios, and outdoor scenarios. After the audio samples are collected, each audio sample may be marked, for example, as containing speech, containing non-speech, or mute.

The collected audio samples are input into an initial speech recognition model to obtain a prediction result for each audio sample, and the prediction result is compared with the label information of that audio sample to obtain loss information. The model parameters of the initial speech recognition model are adjusted based on the loss information, and this process is repeated until the model converges or the loss information is minimized. The model finally obtained is determined as the speech recognition model.
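The predict, compare, adjust, repeat loop described above can be sketched as follows. This is only an illustration under stated assumptions: a tiny softmax classifier stands in for the DNN, and the two-dimensional feature vectors, the learning rate, the tolerance, and every function name are hypothetical rather than taken from the patent.

```python
import math
import random

LABELS = ["speech", "non_speech", "silence"]  # the three annotation classes


def softmax(logits):
    """Turn raw scores into confidences that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, features):
    """Return [speech, non_speech, silence] confidences for one sample."""
    # Each row holds one weight per feature plus a trailing bias term.
    logits = [sum(w * x for w, x in zip(row, features)) + row[-1] for row in weights]
    return softmax(logits)


def train(samples, epochs=200, lr=0.5, tol=1e-6, seed=0):
    """samples: list of (features, label) pairs; returns the trained weights."""
    rng = random.Random(seed)
    dim = len(samples[0][0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim + 1)] for _ in LABELS]
    previous_loss = float("inf")
    for _ in range(epochs):
        loss = 0.0
        for features, label in samples:
            probs = predict(weights, features)           # prediction result
            target = LABELS.index(label)
            loss -= math.log(probs[target] + 1e-12)      # compare with the label
            for k, row in enumerate(weights):            # adjust the parameters
                grad = probs[k] - (1.0 if k == target else 0.0)
                for i, x in enumerate(features):
                    row[i] -= lr * grad * x
                row[-1] -= lr * grad
        if abs(previous_loss - loss) < tol:              # loss is flat: converged
            break
        previous_loss = loss
    return weights
```

Because the output passes through a softmax, the three confidences always sum to 1, which matches the three-output form of the recognition result information.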

In this embodiment, the recognition result information is obtained simply by inputting the historical sampling data into the speech recognition model, which improves the efficiency of determining the recognition result information.

Step 302: In the case where the recognition result information indicates that the historical sampling data includes speech, determine the preset voice wake-up threshold as the target voice wake-up threshold, or determine the raised preset voice wake-up threshold as the target voice wake-up threshold.

Optionally, the recognition result information includes a speech confidence, which characterizes that the historical sampling data contains speech, and a non-speech confidence, which characterizes that the historical sampling data contains non-speech. Whether the historical sampling data includes speech can therefore be determined from the magnitudes of the speech confidence and the non-speech confidence. For example, when the speech confidence is greater than or equal to a first confidence threshold, it is determined that the historical sampling data includes speech; when the speech confidence is less than the first confidence threshold, it is determined that the historical sampling data does not include speech.

It should be understood that the speech confidence can be regarded as a score for the speech in the historical sampling data, and that it is positively correlated with the clarity and/or the sound intensity of the speech. For example, if the historical sampling data contains near-field speech, or the speech it contains is relatively clear, or the sound intensity is relatively high, then the speech confidence for the historical sampling data is relatively high; otherwise, the speech confidence is relatively low.

Further, where the recognition result information indicates that the historical sampling data includes speech, that speech may be near-field or far-field, and may be relatively clear or relatively indistinct. These properties are reflected in the speech confidence in the recognition result information. For these different cases, the electronic device may therefore determine the target voice wake-up threshold in different ways, depending on the magnitude of the speech confidence.

Exemplarily, when the speech confidence is greater than or equal to the first confidence threshold and less than a second confidence threshold, the raised preset voice wake-up threshold is determined as the target voice wake-up threshold; when the speech confidence is greater than or equal to the second confidence threshold and the non-speech confidence is less than a third confidence threshold, the preset voice wake-up threshold is determined as the target voice wake-up threshold. Here, the first confidence threshold is less than the second confidence threshold, and the third confidence threshold is less than the second confidence threshold.

Specifically, for ease of description, the case where the speech confidence is greater than or equal to the first confidence threshold and less than the second confidence threshold is called the first scenario, and the case where the speech confidence is greater than or equal to the second confidence threshold and the non-speech confidence is less than the third confidence threshold is called the second scenario. In the first scenario, because the speech confidence is greater than or equal to the first confidence threshold and less than the second confidence threshold, the speech contained in the historical sampling data is relatively indistinct or its sound intensity is not high. In the first scenario, the electronic device may therefore recognize a non-wake-up word as the wake-up word, resulting in a false wake-up. To address this problem, in the embodiment of the present invention the preset voice wake-up threshold may be raised in the first scenario, and the raised preset voice wake-up threshold is determined as the target voice wake-up threshold.

In practical applications, the preset voice wake-up threshold may be raised by a preset value, for example by 0.1. It may also be raised according to the sound intensity of the speech contained in the historical sampling data; for example, the greater the sound intensity, the greater the increase.
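The two ways of raising the threshold can be sketched as below. The function name, the assumption that sound intensity has been normalized to the range [0, 1], the linear scaling, and the ceiling of 1.0 are all illustrative choices, not details given in the patent.

```python
from typing import Optional


def raise_wake_threshold(preset: float,
                         base_step: float = 0.1,
                         speech_intensity: Optional[float] = None,
                         ceiling: float = 1.0) -> float:
    """Return the preset wake-up threshold after raising it.

    With no intensity given, the threshold goes up by the fixed base_step.
    With an intensity (assumed normalized to [0, 1]), the increase grows
    linearly with the intensity, up to base_step at full intensity.
    """
    if speech_intensity is None:
        step = base_step
    else:
        clamped = min(max(speech_intensity, 0.0), 1.0)
        step = base_step * clamped
    # Keep the result a valid confidence threshold.
    return min(preset + step, ceiling)
```

For instance, `raise_wake_threshold(0.5)` applies the fixed step of 0.1, while `raise_wake_threshold(0.5, speech_intensity=0.5)` applies half of it.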

In addition, in the second scenario, because the speech confidence is greater than or equal to the second confidence threshold and the non-speech confidence is less than the third confidence threshold, the speech contained in the historical sampling data is relatively clear or relatively loud, while the non-speech it contains is relatively indistinct or relatively quiet. The probability of a false wake-up in the second scenario is therefore not high. To improve the wake-up success rate of the electronic device and to save power, the preset voice wake-up threshold is left unchanged in this case; that is, the preset voice wake-up threshold is directly determined as the target voice wake-up threshold.

Of course, to further reduce the false wake-up rate, a specific implementation may also raise the preset voice wake-up threshold in the second scenario, but the adjustment is usually smaller than the adjustment applied to the preset voice wake-up threshold in the first scenario.

The specific values of the first, second, and third confidence thresholds can be set according to actual conditions or experience. For example, the first confidence threshold may be set to 0.5, the second confidence threshold to 0.8, and the third confidence threshold to 0.4. The embodiment of the present invention does not limit the specific value of any confidence threshold.

In this embodiment, when the speech confidence is greater than or equal to the first confidence threshold and less than the second confidence threshold, the raised preset voice wake-up threshold is determined as the target voice wake-up threshold, which reduces false wake-ups and lowers the false wake-up rate of the electronic device. When the speech confidence is greater than or equal to the second confidence threshold and the non-speech confidence is less than the third confidence threshold, the preset voice wake-up threshold is determined as the target voice wake-up threshold. This not only improves the wake-up success rate of the electronic device but also, because the electronic device does not adjust the preset voice wake-up threshold, reduces its power consumption.
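A minimal sketch of the first-scenario and second-scenario branches follows, using the example threshold values 0.5, 0.8, and 0.4 mentioned in the description. The names are hypothetical, and returning `None` for inputs that match neither scenario is an assumption of this sketch, since the passage does not say what happens there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfidenceThresholds:
    first: float = 0.5   # example value from the description
    second: float = 0.8  # example value from the description
    third: float = 0.4   # example value from the description


def target_threshold_when_speech(speech_conf: float,
                                 non_speech_conf: float,
                                 preset: float,
                                 raise_step: float = 0.1,
                                 limits: ConfidenceThresholds = ConfidenceThresholds(),
                                 ) -> Optional[float]:
    """Pick the target wake-up threshold for the two speech scenarios."""
    # First scenario: indistinct or quiet speech, so raise the threshold.
    if limits.first <= speech_conf < limits.second:
        return min(preset + raise_step, 1.0)
    # Second scenario: clear speech with little non-speech, keep the preset.
    if speech_conf >= limits.second and non_speech_conf < limits.third:
        return preset
    # Neither scenario applies (assumption: leave the decision to the caller).
    return None
```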

Step 303: In the case where the recognition result information indicates that the historical sampling data does not include speech, determine the lowered preset voice wake-up threshold as the target voice wake-up threshold.

Specifically, the case where the historical sampling data does not include speech may be called the third scenario, which in turn covers the case where only non-speech is present and the case of silence. It should be understood that in either case, because the historical sampling data does not include speech, the probability that the electronic device recognizes a non-wake-up word as the wake-up word is very small; that is, the false wake-up rate in the third scenario is low. To improve the wake-up success rate of the electronic device, the preset voice wake-up threshold may be lowered in the third scenario; that is, the lowered preset voice wake-up threshold is determined as the target voice wake-up threshold.

In practical applications, the preset voice wake-up threshold may be lowered by a preset value, for example by 0.1. Where only non-speech is present, the preset voice wake-up threshold may also be lowered according to the sound intensity of the non-speech contained in the historical sampling data; for example, the greater the sound intensity, the greater the decrease.

Whether the historical sampling data contains no speech can likewise be determined from the magnitude of the speech confidence. Optionally, when the speech confidence is less than the first confidence threshold, the recognition result information indicates that the historical sampling data does not include speech.

In addition, the recognition result information may further include a silence confidence, which characterizes that the historical sampling data contains neither speech nor non-speech. When the silence confidence is greater than or equal to a fourth confidence threshold, the recognition result information indicates that the historical sampling data does not include speech.

It should be understood that when the recognition result information includes the speech confidence, the non-speech confidence, and the silence confidence, the three should sum to 1. A silence confidence greater than or equal to the fourth confidence threshold therefore means that the speech confidence and the non-speech confidence are both relatively low, and the scenario can be determined to be silence. The fourth confidence threshold and the third confidence threshold may take the same value or different values.
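The bound implied by the sum-to-1 property can be written out explicitly. The symbols $s$, $n$, $q$ (speech, non-speech, and silence confidence) and $T_4$ (fourth confidence threshold) are shorthand introduced here, not notation from the patent:

```latex
s + n + q = 1,\quad s, n, q \ge 0,
\qquad
q \ge T_4
\;\Longrightarrow\;
s + n = 1 - q \le 1 - T_4
\;\Longrightarrow\;
s \le 1 - T_4 \ \text{ and } \ n \le 1 - T_4 .
```

The bound is tight only when $T_4$ is close to 1. With a smaller value such as $T_4 = 0.4$, the speech confidence could still be as high as 0.6, so how low "relatively low" is depends on the chosen threshold.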

To determine the silence scenario more accurately, a specific implementation may also determine that the current scenario is silence only when the silence confidence is greater than or equal to the fourth confidence threshold and both the speech confidence and the non-speech confidence are less than a fifth confidence threshold.

In this embodiment, it can be determined that the historical sampling data does not include speech when the speech confidence is less than the first confidence threshold, or when the silence confidence is greater than or equal to the fourth confidence threshold, which makes classifying whether the historical sampling data contains speech relatively simple.
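The no-speech test and the stricter silence test can be sketched as two small predicates. The default values for the fourth and fifth confidence thresholds (0.4 and 0.3) are hypothetical, as are the function names; the description only gives example values for the first three thresholds.

```python
from typing import Optional


def includes_speech(speech_conf: float,
                    silence_conf: Optional[float] = None,
                    first: float = 0.5,
                    fourth: float = 0.4) -> bool:
    """True unless one of the two no-speech conditions holds."""
    if speech_conf < first:
        return False  # speech confidence below the first threshold
    if silence_conf is not None and silence_conf >= fourth:
        return False  # silence confidence at or above the fourth threshold
    return True


def is_silent(speech_conf: float,
              non_speech_conf: float,
              silence_conf: float,
              fourth: float = 0.4,
              fifth: float = 0.3) -> bool:
    """Stricter silence check that also bounds the other two confidences."""
    return (silence_conf >= fourth
            and speech_conf < fifth
            and non_speech_conf < fifth)
```

The `silence_conf` argument is optional so that the same function covers recognition results with only two confidences.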

It should be noted that the first, second, and third scenarios mentioned in steps 302 and 303 may also be determined by a pre-trained scenario classification model. Specifically, the speech confidence, the non-speech confidence, and the silence confidence can be input into the scenario classification model to obtain a confidence for the first scenario, a confidence for the second scenario, and a confidence for the third scenario, and the scenario with the highest of these three confidences is determined as the final scenario. The scenario classification model may be a DNN.
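The selection of the highest-confidence scenario can be sketched as follows. The `model` argument is a stand-in for the pre-trained scenario classification model and is assumed to be any callable that maps the three input confidences to three scenario confidences; that interface and the names are hypothetical.

```python
from typing import Callable, Sequence

SCENARIOS = ("first", "second", "third")


def pick_scenario(scenario_confidences: Sequence[float]) -> str:
    """Return the scenario with the highest confidence.

    scenario_confidences is ordered like SCENARIOS; on a tie the earlier
    scenario wins, because max() keeps the first maximum it finds.
    """
    best = max(range(len(SCENARIOS)), key=lambda i: scenario_confidences[i])
    return SCENARIOS[best]


def classify_scenario(model: Callable[[Sequence[float]], Sequence[float]],
                      speech_conf: float,
                      non_speech_conf: float,
                      silence_conf: float) -> str:
    """Feed the three recognition confidences to the model and pick a scenario."""
    return pick_scenario(model([speech_conf, non_speech_conf, silence_conf]))
```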

The scenario classification model can be trained as follows. The speech confidence, non-speech confidence, and silence confidence output by the speech recognition model are used as training samples, and each training sample is labeled, the label information including the scenario classification result. The training samples are input into an initial scenario classification model to obtain a scenario prediction result for each training sample, and the scenario prediction result is compared with the label information to obtain loss information. The model parameters of the initial scenario classification model are adjusted based on the loss information, and this process is repeated until the model converges or the loss information is minimized. The model finally obtained is determined as the scenario classification model.

Step 304: Perform wake-up processing based on the target voice wake-up threshold.

In the voice wake-up method provided by the embodiment of the present invention, when the recognition result information indicates that the historical sampling data includes speech, the preset voice wake-up threshold, or the raised preset voice wake-up threshold, is determined as the target voice wake-up threshold. When wake-up processing is then performed based on this target voice wake-up threshold, the electronic device is not falsely woken by words that resemble the wake-up word, which reduces its false wake-up rate. In addition, when the recognition result information indicates that the historical sampling data does not include speech, the lowered preset voice wake-up threshold is determined as the target voice wake-up threshold. When wake-up processing is then performed based on this target voice wake-up threshold, the situation in which the electronic device fails to wake after the wake-up word is spoken is avoided, which improves its wake-up rate.

On the basis of any of the above embodiments, to ensure that the target voice wake-up threshold remains valid, the historical sampling data that is recognized may be the data collected by the sound pickup device within a preset time period that ends at the current moment.

The preset time period can be set according to experience, for example to 5 s or 8 s. The shorter the preset time period, the higher the reliability of voice wake-up of the electronic device.
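A window that always ends at the current moment can be kept with a fixed-capacity buffer, as in the sketch below. The frame rate of 100 frames per second and the class name are assumptions made for illustration; the patent only specifies the window length.

```python
from collections import deque


class HistoryWindow:
    """Keeps only the frames from the most recent window_seconds."""

    def __init__(self, window_seconds: float = 5.0, frames_per_second: int = 100):
        capacity = max(1, int(round(window_seconds * frames_per_second)))
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity)

    def push(self, frame) -> None:
        """Add the newest frame from the sound pickup device."""
        self._frames.append(frame)

    def snapshot(self) -> list:
        """Return the historical sampling data, oldest frame first."""
        return list(self._frames)
```

Each call to `snapshot()` then yields the data to be recognized for the window ending at the most recently pushed frame.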

On the basis of the above embodiments, when wake-up processing is performed based on the target voice wake-up threshold, the current sampling data collected by the sound pickup device at the current moment can be obtained, together with the wake-up confidence that the current sampling data contains the preset wake-up word and the current sound intensity corresponding to the current sampling data. When the wake-up confidence is greater than or equal to the target voice wake-up threshold and the current sound intensity is greater than a target sound intensity threshold, the wake-up operation is performed.

Specifically, the wake-up confidence characterizes the probability that the current sampling data contains the preset wake-up word: the higher the wake-up confidence, the greater that probability. When the electronic device determines that the wake-up confidence is greater than or equal to the target voice wake-up threshold, the current sampling data may contain the preset wake-up word.

Further, a user who wakes the electronic device with the preset wake-up word usually speaks it at a relatively high volume. Therefore, when the wake-up confidence is greater than or equal to the target voice wake-up threshold, it can additionally be judged whether the current sound intensity is greater than the target sound intensity threshold. If it is, the user has spoken the preset wake-up word and the electronic device performs the wake-up operation; otherwise, the electronic device is not woken.

In this approach, the wake-up operation is performed only when the wake-up confidence that the current sampling data contains the preset wake-up word is greater than or equal to the target voice wake-up threshold and the current sound intensity corresponding to the current sampling data is greater than the target sound intensity threshold. Because whether the current sampling data contains the preset wake-up word is judged along different dimensions, the false wake-up rate of the electronic device can be reduced further.
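The two-condition wake-up decision reduces to a single predicate. The function and parameter names below are hypothetical; the comparison operators follow the description (greater than or equal to for the confidence, strictly greater than for the intensity).

```python
def should_wake(wake_conf: float,
                target_wake_threshold: float,
                current_intensity: float,
                target_intensity_threshold: float) -> bool:
    """Wake only if both the confidence test and the loudness test pass."""
    confident_enough = wake_conf >= target_wake_threshold
    loud_enough = current_intensity > target_intensity_threshold
    return confident_enough and loud_enough
```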

The target sound intensity threshold may be calculated in real time. Exemplarily, the historical sound intensity corresponding to the historical sampling data can be obtained, and the target sound intensity threshold determined based on it.

For example, the average of the historical sound intensities corresponding to the historical sampling data within the preset time period may be determined as the target sound intensity threshold, or the maximum of those historical sound intensities may be determined as the target sound intensity threshold.
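Both options can be sketched in one helper. The function name, the `mode` parameter, and the choice to reject an empty history are assumptions of this sketch.

```python
from statistics import fmean
from typing import Sequence


def target_intensity_threshold(history: Sequence[float], mode: str = "mean") -> float:
    """Derive the target sound intensity threshold from historical intensities."""
    if not history:
        raise ValueError("history must contain at least one intensity value")
    if mode == "mean":
        return fmean(history)  # average over the preset time period
    if mode == "max":
        return max(history)    # loudest value seen in the period
    raise ValueError("mode must be 'mean' or 'max'")
```

Using the maximum gives a stricter threshold than the mean, since the current sampling data must then be louder than anything in the window.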

In the above embodiment, determining the target sound intensity threshold from the historical sound intensity corresponding to the historical sampling data allows the target sound intensity threshold to be adjusted dynamically according to the sound intensity in the environment, which further reduces the false wake-up rate of the electronic device.

The voice wake-up apparatus provided by the present invention is described below. The voice wake-up apparatus described below and the voice wake-up method described above may be referred to in correspondence with each other.

FIG. 4 is a schematic diagram of a voice wake-up apparatus provided by an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:

a recognition module 11, configured to recognize historical sampling data collected by a microphone and obtain recognition result information characterizing whether the historical sampling data includes speech;

a determination module 12, configured to determine, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information;

a processing module 13, configured to perform wake-up processing based on the target voice wake-up threshold.
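One way to picture how the three modules cooperate is as three pluggable callables wired together in order, as in the sketch below. The class name, the callable signatures, the dictionary-shaped recognition result, and the default preset threshold of 0.5 are all hypothetical; the patent describes the modules functionally and does not prescribe an interface.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass
class VoiceWakeApparatus:
    # Recognition module 11: historical sampling data -> recognition result.
    recognize: Callable[[Sequence[float]], Mapping[str, float]]
    # Determination module 12: (recognition result, preset) -> target threshold.
    determine: Callable[[Mapping[str, float], float], float]
    # Processing module 13: (current sampling data, target threshold) -> woken?
    process: Callable[[Sequence[float], float], bool]
    preset_threshold: float = 0.5

    def run(self, history: Sequence[float], current: Sequence[float]) -> bool:
        """Run the three modules in order and report whether to wake."""
        result = self.recognize(history)
        target = self.determine(result, self.preset_threshold)
        return self.process(current, target)
```

Keeping the modules as separate callables mirrors the point made for the apparatus embodiments that the units need not be physically separate components.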

The apparatus of this embodiment can be used to execute the method of any of the foregoing electronic-device-side method embodiments. Its specific implementation process and technical effects are similar to those of the electronic-device-side method embodiments; for details, refer to the detailed description in those embodiments, which is not repeated here.

Optionally, the determination module 12 is specifically configured to:

in the case where the recognition result information indicates that the historical sampling data includes speech, determine the preset voice wake-up threshold as the target voice wake-up threshold, or determine the raised preset voice wake-up threshold as the target voice wake-up threshold;

in the case where the recognition result information indicates that the historical sampling data does not include speech, determine the lowered preset voice wake-up threshold as the target voice wake-up threshold.

Optionally, the recognition result information includes a speech confidence, which characterizes that the historical sampling data contains speech, and a non-speech confidence, which characterizes that the historical sampling data contains non-speech;

the determination module 12 is specifically configured to:

in the case where the speech confidence is greater than or equal to a first confidence threshold and less than a second confidence threshold, determine the raised preset voice wake-up threshold as the target voice wake-up threshold;

in the case where the speech confidence is greater than or equal to the second confidence threshold and the non-speech confidence is less than a third confidence threshold, determine the preset voice wake-up threshold as the target voice wake-up threshold;

wherein the first confidence threshold is less than the second confidence threshold, and the third confidence threshold is less than the second confidence threshold.

Optionally, in the case where the speech confidence is less than the first confidence threshold, the recognition result information indicates that the historical sampling data does not include speech; and

the recognition result information further includes a silence confidence, which characterizes that the historical sampling data contains neither speech nor non-speech, and in the case where the silence confidence is greater than or equal to a fourth confidence threshold, the recognition result information indicates that the historical sampling data does not include speech.

Optionally, the recognition module 11 is specifically configured to:

input the historical sampling data into a speech recognition model to obtain the recognition result information;

wherein the speech recognition model is obtained by training an initial speech recognition model on a plurality of audio samples, the plurality of audio samples including audio samples that contain speech, audio samples that contain non-speech, and audio samples that contain neither speech nor non-speech.

Optionally, the recognition module 11 is specifically configured to:

recognize the historical sampling data collected by the sound pickup device within a preset time period that ends at the current moment.

Optionally, the processing module 13 is specifically configured to:

obtain the current sampling data collected by the sound pickup device at the current moment;

obtain the wake-up confidence that the current sampling data contains a preset wake-up word, and the current sound intensity corresponding to the current sampling data;

in the case where the wake-up confidence is greater than or equal to the target voice wake-up threshold and the current sound intensity is greater than a target sound intensity threshold, perform the wake-up operation.

Optionally, the apparatus further includes an acquisition module, wherein:

the acquisition module is configured to acquire the historical sound intensity corresponding to the historical sampling data;

the determination module 12 is configured to determine the target sound intensity threshold based on the historical sound intensity.

The apparatus of this embodiment can be used to execute the method of any of the foregoing electronic-device-side method embodiments. Its specific implementation process and technical effects are similar to those of the electronic-device-side method embodiments; for details, refer to the detailed description in those embodiments, which is not repeated here.

FIG. 5 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in FIG. 5, the electronic device may include a processor 510, a communications interface 520, a memory 530, and a communication bus 540, where the processor 510, the communications interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 can call logic instructions in the memory 530 to execute the voice wake-up method, which includes: recognizing historical sampling data collected by a sound pickup device to obtain recognition result information characterizing whether the historical sampling data includes speech; determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and performing wake-up processing based on the target voice wake-up threshold.

In addition, when the logic instructions in the memory 530 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the voice wake-up method provided by the methods above, which includes: recognizing historical sampling data collected by a sound pickup device to obtain recognition result information characterizing whether the historical sampling data includes speech; determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and performing wake-up processing based on the target voice wake-up threshold.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the voice wake-up method provided by the methods above, the method comprising: recognizing historical sampling data collected by a sound pickup device to obtain recognition result information characterizing whether the historical sampling data includes voice; determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and performing wake-up processing based on the target voice wake-up threshold.
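By way of non-limiting illustration, the three steps of the method (recognizing the historical sampling data, determining the target voice wake-up threshold, and performing wake-up processing) may be sketched in Python as follows. The function name `wake_up_step` and the callables `recognize`, `adjust`, and `wake_confidence` are hypothetical placeholders introduced only for this sketch. Comparing a wake-up confidence against the target threshold is one possible form of wake-up processing, and the stand-ins in the usage lines are toy assumptions, not the models of the embodiments.

```python
from typing import Callable, Sequence

Samples = Sequence[float]


def wake_up_step(
    history: Samples,
    current: Samples,
    preset_threshold: float,
    recognize: Callable[[Samples], bool],
    adjust: Callable[[float, bool], float],
    wake_confidence: Callable[[Samples], float],
) -> bool:
    """Run one pass of the three-step flow and report whether to wake up."""
    # Step 1: recognize whether the historical sampling data includes voice.
    contains_voice = recognize(history)
    # Step 2: derive the target wake-up threshold from the preset threshold.
    target_threshold = adjust(preset_threshold, contains_voice)
    # Step 3: wake-up processing based on the target threshold (assumed form).
    return wake_confidence(current) >= target_threshold


if __name__ == "__main__":
    # Toy stand-ins (assumptions): a peak-amplitude "recognizer", a fixed
    # +/-0.1 adjustment, and a constant wake-word score.
    is_voice = lambda h: max(abs(x) for x in h) > 0.5
    shift = lambda preset, voiced: preset + 0.1 if voiced else preset - 0.1
    score = lambda cur: 0.65
    print(wake_up_step([0.0, 0.1], [0.2], 0.6, is_voice, shift, score))
    print(wake_up_step([0.9, 0.1], [0.2], 0.6, is_voice, shift, score))
```

Passing the three stages in as callables keeps the sketch independent of any particular recognition model or adjustment rule.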

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the above description of the implementations, those skilled in the art can clearly understand that each implementation may be realized by software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A voice wake-up method, comprising:
recognizing historical sampling data collected by a sound pickup device to obtain recognition result information characterizing whether the historical sampling data includes voice;
determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and
performing wake-up processing based on the target voice wake-up threshold.
2. The voice wake-up method according to claim 1, wherein the determining, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information comprises:
in a case where the recognition result information indicates that the historical sampling data includes voice, determining the preset voice wake-up threshold, or the preset voice wake-up threshold after being raised, as the target voice wake-up threshold; and
in a case where the recognition result information indicates that the historical sampling data includes no voice, determining the preset voice wake-up threshold after being lowered as the target voice wake-up threshold.
3. The voice wake-up method according to claim 2, wherein the recognition result information includes a voice confidence characterizing that the historical sampling data contains voice, and a non-voice confidence characterizing that the historical sampling data contains non-voice;
the determining, in a case where the recognition result information indicates that the historical sampling data includes voice, the preset voice wake-up threshold, or the preset voice wake-up threshold after being raised, as the target voice wake-up threshold comprises:
in a case where the voice confidence is greater than or equal to a first confidence threshold and less than a second confidence threshold, determining the preset voice wake-up threshold after being raised as the target voice wake-up threshold;
in a case where the voice confidence is greater than or equal to the second confidence threshold and the non-voice confidence is less than a third confidence threshold, determining the preset voice wake-up threshold as the target voice wake-up threshold;
wherein the first confidence threshold is less than the second confidence threshold, and the third confidence threshold is less than the second confidence threshold.
4. The voice wake-up method according to claim 3, wherein:
in a case where the voice confidence is less than the first confidence threshold, the recognition result information indicates that the historical sampling data includes no voice; and
the recognition result information further includes a silence confidence characterizing that the historical sampling data includes neither voice nor non-voice, and in a case where the silence confidence is greater than or equal to a fourth confidence threshold, the recognition result information indicates that the historical sampling data includes no voice.
5. The voice wake-up method according to claim 3 or 4, wherein the recognizing historical sampling data collected by a sound pickup device to obtain recognition result information characterizing whether the historical sampling data includes voice comprises:
inputting the historical sampling data into a voice recognition model to obtain the recognition result information;
wherein the voice recognition model is obtained by training an initial voice recognition model on a plurality of audio samples, the plurality of audio samples comprising audio samples containing voice, audio samples containing non-voice, and audio samples containing neither voice nor non-voice.
6. The voice wake-up method according to any one of claims 1 to 4, wherein the recognizing historical sampling data collected by a sound pickup device comprises:
recognizing historical sampling data collected by the sound pickup device within a preset time period ending at the current moment.
7. The voice wake-up method according to claim 6, wherein the performing wake-up processing based on the target voice wake-up threshold comprises:
acquiring current sampling data collected by the sound pickup device at the current moment;
acquiring a wake-up confidence that the current sampling data contains a preset wake-up word, and a current sound intensity corresponding to the current sampling data; and
executing a wake-up operation in a case where the wake-up confidence is greater than or equal to the target voice wake-up threshold and the current sound intensity is greater than a target sound intensity threshold.
8. The voice wake-up method according to claim 7, further comprising:
acquiring a historical sound intensity corresponding to the historical sampling data; and
determining the target sound intensity threshold based on the historical sound intensity.
9. A voice wake-up apparatus, comprising:
a recognition module configured to recognize historical sampling data collected by a sound pickup device and obtain recognition result information characterizing whether the historical sampling data includes voice;
a determining module configured to determine, based on a preset voice wake-up threshold, a target voice wake-up threshold corresponding to the recognition result information; and
a processing module configured to perform wake-up processing based on the target voice wake-up threshold.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice wake-up method according to any of claims 1 to 8 when executing the program.
11. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the voice wake-up method according to any one of claims 1 to 8.
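
By way of non-limiting illustration, the threshold selection of claims 2 to 4 and the wake-up decision of claim 7 may be sketched in Python as follows. The names `Recognition`, `Config`, `target_threshold`, and `should_wake` are hypothetical, and every numeric value (the preset threshold, the four confidence thresholds, and the adjustment step) is an assumption chosen only so that the example runs. The claims do not state what happens when the voice confidence reaches the second confidence threshold while the non-voice confidence is not below the third; the sketch raises the threshold in that case, which is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    """Recognition result for the historical sampling data."""
    voice: float      # voice confidence
    non_voice: float  # non-voice confidence
    silence: float    # silence confidence (neither voice nor non-voice)


@dataclass(frozen=True)
class Config:
    """Illustrative values only; none of them comes from the claims."""
    preset: float = 0.60  # preset voice wake-up threshold
    t1: float = 0.30      # first confidence threshold
    t2: float = 0.70      # second confidence threshold (t1 < t2)
    t3: float = 0.40      # third confidence threshold (t3 < t2)
    t4: float = 0.80      # fourth confidence threshold
    step: float = 0.10    # assumed size of the raise/lower adjustment


def target_threshold(r: Recognition, c: Config = Config()) -> float:
    """Pick the target voice wake-up threshold from the recognition result."""
    # No voice in the history (claim 4): use the lowered preset (claim 2).
    if r.voice < c.t1 or r.silence >= c.t4:
        return c.preset - c.step
    # Voice confidence in [t1, t2): use the raised preset (claim 3).
    if r.voice < c.t2:
        return c.preset + c.step
    # Confident voice with little non-voice: keep the preset (claim 3).
    if r.non_voice < c.t3:
        return c.preset
    # Case not covered by the claims; raising here is an assumption.
    return c.preset + c.step


def should_wake(wake_confidence: float, intensity: float,
                threshold: float, intensity_threshold: float) -> bool:
    """Wake-up condition of claim 7: both tests must pass."""
    return wake_confidence >= threshold and intensity > intensity_threshold


if __name__ == "__main__":
    quiet_room = Recognition(voice=0.05, non_voice=0.10, silence=0.90)
    threshold = target_threshold(quiet_room)
    print(should_wake(0.55, 0.5, threshold, 0.2))
```

In the usage lines, a quiet history lowers the target threshold, so a wake-up confidence that would fall short of the preset threshold can still trigger a wake-up.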
CN202210891087.0A 2022-07-27 2022-07-27 Voice wake-up method, device, device and storage medium Pending CN115472161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210891087.0A CN115472161A (en) 2022-07-27 2022-07-27 Voice wake-up method, device, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210891087.0A CN115472161A (en) 2022-07-27 2022-07-27 Voice wake-up method, device, device and storage medium

Publications (1)

Publication Number Publication Date
CN115472161A true CN115472161A (en) 2022-12-13

Family

ID=84368000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210891087.0A Pending CN115472161A (en) 2022-07-27 2022-07-27 Voice wake-up method, device, device and storage medium

Country Status (1)

Country Link
CN (1) CN115472161A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110047487A (en) * 2019-06-05 2019-07-23 广州小鹏汽车科技有限公司 Awakening method, device, vehicle and the machine readable media of vehicle-mounted voice equipment
CN111028841A (en) * 2020-03-10 2020-04-17 深圳市友杰智新科技有限公司 Method and device for awakening system to adjust parameters, computer equipment and storage medium
CN111128155A (en) * 2019-12-05 2020-05-08 珠海格力电器股份有限公司 Awakening method, device, equipment and medium for intelligent equipment
CN111554288A (en) * 2020-04-27 2020-08-18 北京猎户星空科技有限公司 Awakening method and device of intelligent device, electronic device and medium
CN111816178A (en) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 Voice equipment control method, device and equipment
CN111968644A (en) * 2020-08-31 2020-11-20 深圳市欧瑞博科技股份有限公司 Intelligent device awakening method and device and electronic device
CN112102821A (en) * 2019-06-18 2020-12-18 北京京东尚科信息技术有限公司 Data processing method, device, system and medium applied to electronic equipment
CN112509596A (en) * 2020-11-19 2021-03-16 北京小米移动软件有限公司 Wake-up control method and device, storage medium and terminal
WO2021147018A1 (en) * 2020-01-22 2021-07-29 Qualcomm Incorporated Electronic device activation based on ambient noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination