CN113259793A - Intelligent microphone and signal processing method thereof

Info

Publication number: CN113259793A
Application number: CN202010082783.8A
Authority: CN
Inventors: 张钟宣; 顾渝骢; 傅仁杰
Original assignee: Hangzhou Zhixinke Microelectronics Technology Co ltd
Current assignee: Hangzhou Zhixinke Microelectronics Technology Co ltd
Priority date: 2020-02-07
Filing date: 2020-02-07
Publication date: 2021-08-13
Anticipated expiration: 2040-02-07
Also published as: CN113259793B

Abstract

The invention relates to an intelligent microphone and a signal processing method thereof. The intelligent microphone comprises a sound sensor and an AI-dedicated sound processor. The sound sensor collects sound signals and converts them into audio signals. The AI-dedicated sound processor identifies and processes the audio signals, extracts audio characteristics from them, and judges from those characteristics whether to output a control signal. The control signal is used to wake up a back-end processor, which can respond to the sound signals collected by the intelligent microphone once it is awake. The intelligent microphone is arranged in a semiconductor package, and the sound sensor and the AI-dedicated sound processor are arranged in a die of the semiconductor package. In this scheme, the intelligent microphone itself contains the AI-dedicated sound processor that identifies the sound signal, so the back-end processor does not need to perform wake-up recognition on the sound signal, which reduces the power consumption of the back-end processor.

Description

Intelligent microphone and signal processing method thereof

Technical Field

The invention relates to the technical field of audio signal processing, in particular to an intelligent microphone and a signal processing method thereof.

Background

At present, microphones on the market generally serve as an interface for collecting voice information: they convert the collected voice information into an electrical signal and send it to a back-end processor for data processing.

Demand for voice control technology keeps growing, and voice signals received by a microphone must be responded to in real time. To do so, the microphone has to remain continuously in the wake-up state, which increases its power consumption. The microphones configured on most existing terminals therefore suffer from high power consumption. A few terminals provide a voice wake-up function: the terminal's processor receives speech through the microphone, processes the voice signal with a voice wake-up algorithm, and then wakes up the terminal. However, the processor draws a large current while executing the voice wake-up algorithm, which increases the power consumption of the terminal's processor.

Disclosure of Invention

Therefore, it is necessary to provide an intelligent microphone and a signal processing method thereof that address two problems: a conventional microphone consumes considerable power when it is continuously in the wake-up state, and the terminal's processor consumes considerable power when it executes a voice wake-up algorithm.

An intelligent microphone includes a sound sensor and an AI-dedicated sound processor connected to each other;

the sound sensor is used for collecting sound signals, converting the sound signals into audio signals and transmitting the audio signals to the AI special sound processor;

the AI special sound processor is used for receiving the audio signal, extracting audio characteristics from the audio signal and judging whether to output a control signal according to the audio characteristics, wherein the control signal is used for waking up the back-end processor, and the back-end processor is used for responding to the sound signal collected by the intelligent microphone;

the smart microphone is provided in a semiconductor package, and the sound sensor and the AI-specific sound processor are provided in a die of the semiconductor package.

The intelligent microphone described above comprises a sound sensor and an AI-dedicated sound processor connected to each other. The sound sensor senses and collects a sound signal and converts it into an audio signal. The AI-dedicated sound processor uses AI recognition technology to identify and process the audio signal: it extracts audio characteristics from the audio signal, judges whether they meet preset requirements, and decides from the result whether to output a control signal. The control signal is used to wake up the back-end processor, which can respond to the sound signal collected by the intelligent microphone once it is awake. In this scheme, the intelligent microphone itself contains the AI-dedicated sound processor, which identifies the sound signal and outputs the control signal that wakes the back-end processor; the back-end processor then responds to the sound signal without having to perform wake-up recognition itself, which reduces its power consumption. The AI-dedicated sound processor is dedicated to sound wake-up, whereas a back-end processor with a complex structure consumes more power when it executes a voice wake-up algorithm. When no sound signal is present, the intelligent microphone is in a low-power state: the AI-dedicated sound processor monitors incoming sound at low power, and the back-end processor can remain dormant. When a sound signal is present and the audio characteristics lead to a control signal being output, the back-end processor is woken and the intelligent microphone enters the wake-up state. Compared with a conventional microphone that is continuously in the wake-up state, power consumption is reduced. In addition, the intelligent microphone is arranged in a semiconductor package, and the sound sensor and the AI-dedicated sound processor are arranged in a die of the semiconductor package; the die and the package allow the intelligent microphone to be integrated and miniaturized, so that it can be conveniently applied in different scenarios.

In one embodiment, the smart microphone further includes an audio processor connected between the sound sensor and the AI-specific sound processor, the audio processor being disposed in a die of the semiconductor package;

the sound sensor is used for converting the sound signal into an electric signal and transmitting the electric signal to the audio processor;

the audio processor is configured to receive the electrical signal, convert the electrical signal into an audio signal, and transmit the audio signal to the AI-specific sound processor.

In one embodiment, the AI-specific sound processor includes a neural network, and determines the audio characteristics via the neural network, and outputs a control signal if the audio characteristics match a predetermined wake-up characteristic.

In one embodiment, the AI-specific sound processor further includes a digital IO interface, and obtains the neural network data through the digital IO interface, where the neural network data is obtained through machine learning and reorganization.

In one embodiment, the AI-specific sound processor comprises at least one of a bone conduction recognition module, a voiceprint recognition module, a keyword recognition module, a command word recognition module;

the bone conduction identification module is used for identifying bone conduction voiceprints in the audio features through a neural network, and outputting control signals corresponding to the bone conduction voiceprints if the bone conduction voiceprints are matched with preset bone conduction voiceprints;

or the voiceprint recognition module is used for recognizing the voiceprint in the audio features through the neural network, and outputting a control signal if the voiceprint is matched with the preset voiceprint;

or the keyword identification module is used for identifying keywords in the audio features through a neural network, and outputting a control signal corresponding to the keywords if the keywords are matched with preset keywords;

or the command word recognition module is used for recognizing the command words in the audio features through the neural network, and outputting control signals corresponding to the command words if the command words are matched with preset command words.

In one embodiment, an audio processor includes an audio amplifier and an analog-to-digital converter;

the audio amplifier is used for carrying out analog amplification on the sound signal to obtain an analog audio signal;

the analog-to-digital converter is used for performing analog-to-digital conversion on the analog audio signal to obtain a digital audio signal, and transmitting the digital audio signal to the AI special sound processor.

In one embodiment, the AI-specific sound processor further comprises a speech detection module;

if the voice detection module is an analog voice detection module, the analog voice detection module is used for extracting a first audio characteristic from an analog audio signal output by the audio amplifier and transmitting the first audio characteristic to the neural network;

if the voice detection module is a digital voice detection module, the digital voice detection module is used for extracting a second audio characteristic from a digital audio signal output by the analog-to-digital converter and transmitting the second audio characteristic to the neural network;

if the voice detection module is a mixed voice detection module, the mixed voice detection module is used for extracting a third audio feature from the analog audio signal output by the audio amplifier and the digital audio signal output by the analog-to-digital converter and transmitting the third audio feature to the neural network.

In one embodiment, the semiconductor package includes a first die, a second die, and a third die, the sound sensor being disposed in the first die, the audio amplifier and the analog-to-digital converter being disposed in the second die, the AI-specific sound processor being disposed in the third die, and the first die, the second die, and the third die being connected in sequence.

In one embodiment, the semiconductor package includes a first die, a fourth die, and a fifth die, the sound sensor being disposed in the first die, the audio amplifier being disposed in the fourth die, the analog-to-digital converter and the AI-specific sound processor being disposed in the fifth die, and the first die, the fourth die, and the fifth die being connected in sequence.

In one embodiment, the semiconductor package includes a first die and a sixth die, the sound sensor is disposed in the first die, the audio processor and the AI-specific sound processor are disposed in the sixth die, and the first die and the sixth die are connected to each other.
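The three die partitioning options above can be written down as data, which makes it easy to check that each option places every component exactly once. This is only an illustrative sketch: the component labels and the `is_complete_partition` helper are assumptions made for the example, and the audio processor is taken here to consist of the audio amplifier and the analog-to-digital converter.

```python
from typing import Dict, List

# Component names are illustrative labels, not identifiers from the patent.
COMPONENTS = {"sound_sensor", "audio_amplifier", "adc", "ai_sound_processor"}

# Each option lists its dies in connection order; each die lists what it holds.
PARTITIONS: Dict[str, List[List[str]]] = {
    "first/second/third die": [
        ["sound_sensor"], ["audio_amplifier", "adc"], ["ai_sound_processor"],
    ],
    "first/fourth/fifth die": [
        ["sound_sensor"], ["audio_amplifier"], ["adc", "ai_sound_processor"],
    ],
    "first/sixth die": [
        ["sound_sensor"], ["audio_amplifier", "adc", "ai_sound_processor"],
    ],
}


def is_complete_partition(dies: List[List[str]]) -> bool:
    """True when every component is placed in exactly one die."""
    placed = [component for die in dies for component in die]
    return len(placed) == len(set(placed)) and set(placed) == COMPONENTS
```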

In one embodiment, the smart microphone further comprises a wireless transmitter connected to the AI-dedicated sound processor. The wireless transmitter and the AI-dedicated sound processor are either disposed in the same die, or disposed in two different dies that are connected to each other;

the wireless transmitter is used for sending out the control signal in a wireless mode.

In one embodiment, the smart microphone further comprises a voice-to-digital interface;

the voice digital interface is connected with the output end of the analog-to-digital converter and used for outputting a digital audio signal to the back-end processor;

or the voice digital interface is connected with the output end of the automatic gain controller and used for outputting the digital audio signal to the back-end processor, wherein the automatic gain controller is connected with the analog-to-digital converter.

In one embodiment, the smart microphone further comprises a clock management circuit coupled to the AI-specific sound processor, the clock management circuit comprising a crystal interface for receiving an external clock signal.

A signal processing method applying the intelligent microphone comprises the following steps:

collecting a sound signal through a sound sensor, converting the sound signal into an audio signal, and transmitting the audio signal to an AI special sound processor;

the audio signal is received through the AI special sound processor, the audio characteristics are extracted from the audio signal, and whether a control signal is output or not is judged according to the audio characteristics, wherein the control signal is used for waking up the back-end processor, and the back-end processor is used for responding to the sound signal collected by the intelligent microphone.

In the signal processing method using the intelligent microphone described above, the sound sensor senses and collects a sound signal and converts it into an audio signal. The AI-dedicated sound processor identifies and processes the audio signal: it extracts audio characteristics from the audio signal, judges whether they meet preset requirements, and decides from the result whether to output a control signal. The control signal is used to wake up the back-end processor, which can respond to the sound signal collected by the intelligent microphone once it is awake. In this scheme, the back-end processor does not need to perform wake-up recognition on the sound signal, which reduces its power consumption. The AI-dedicated sound processor is dedicated to sound wake-up, whereas a back-end processor with a complex structure consumes more power when it executes a voice wake-up algorithm. When no sound signal is present, the intelligent microphone is in a low-power state: the AI-dedicated sound processor monitors incoming sound at low power, and the back-end processor can remain dormant. When a sound signal is present and the audio characteristics lead to a control signal being output, the back-end processor is woken and the intelligent microphone enters the wake-up state. Compared with a conventional microphone that is continuously in the wake-up state, power consumption is reduced.

In one embodiment, the signal processing method further comprises the steps of:

after the back-end processor is awakened through the control signal, a voice signal is collected through the sound sensor, converted into a voice audio signal and transmitted to the AI special sound processor;

the AI special sound processor receives the voice audio signal, extracts the voice characteristics from the voice audio signal and judges whether to output an instruction signal according to the voice characteristics, wherein the instruction signal is used for instructing the back-end processor to execute corresponding operation.
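The two-stage behaviour of this embodiment (a control signal first, instruction signals only afterwards) can be pictured as a small state machine. The sketch below is an illustration under stated assumptions: the `TwoStageController` class, the signal strings, and the idea that recognition has already produced a text phrase are all hypothetical and not taken from the patent.

```python
from enum import Enum
from typing import Optional, Set


class State(Enum):
    SLEEPING = "sleeping"
    AWAKE = "awake"


class TwoStageController:
    """Tracks whether the back-end processor has been woken (hypothetical helper)."""

    def __init__(self, wake_phrases: Set[str], instructions: Set[str]) -> None:
        self.wake_phrases = wake_phrases
        self.instructions = instructions
        self.state = State.SLEEPING

    def on_recognized(self, phrase: str) -> Optional[str]:
        """Return the signal to emit for a recognized phrase, or None."""
        if self.state is State.SLEEPING:
            if phrase in self.wake_phrases:
                self.state = State.AWAKE
                return "CONTROL_SIGNAL"
            return None  # instructions are ignored until the back end is awake
        if phrase in self.instructions:
            return "INSTRUCTION:" + phrase
        return None
```

In this sketch an instruction spoken before the wake phrase produces no signal, which mirrors the ordering of the two steps above.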

Drawings

FIG. 1 is a schematic diagram of a smart microphone in one embodiment;

FIG. 2 is a schematic structural diagram of a smart microphone in another embodiment;

FIG. 3 is a schematic structural diagram of a smart microphone in yet another embodiment;

FIG. 4 is a schematic diagram of an audio processor in a smart microphone in one embodiment;

FIG. 5 is a schematic diagram of an audio processor in a smart microphone in another embodiment;

FIG. 6 is a schematic diagram of an audio processor in a smart microphone in a further embodiment;

FIGS. 7-9 are schematic diagrams illustrating the connection of the voice detection module in the AI-specific sound processor of the smart microphone in one embodiment;

FIGS. 10-12 are schematic diagrams in die form of a sound sensor, audio amplifier, analog-to-digital converter, and AI-specific sound processor in one embodiment;

FIGS. 13-14 are schematic diagrams of die attach of a wireless transmitter in one embodiment;

FIGS. 15-18 are schematic diagrams of the connection of an infrared remote control transmitter in one embodiment;

FIGS. 19-20 are schematic diagrams of the connection of the voice-to-digital interface in one embodiment;

FIG. 21 is a schematic diagram illustrating the structural connections of a clock management circuit of a smart microphone in one embodiment;

FIG. 22 is a flow diagram of a signal processing method for a smart microphone in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate the invention and are not intended to limit its scope.

It should be noted that the terms "first", "second" and "third" in the embodiments of the present invention only distinguish similar objects and do not represent a specific ordering of those objects. Where permitted, "first", "second" and "third" may be interchanged in order or sequence, so that the embodiments described herein can be practiced in sequences other than those illustrated or described.

The intelligent microphone provided by the present application can be applied to various smart device terminals as an audio control terminal. Smart devices equipped with the smart microphone can carry out various commands and operations under voice control.

Referring to fig. 1, a schematic structural diagram of an intelligent microphone according to an embodiment is shown. The smart microphone in this embodiment includes the sound sensor 100 and the AI-dedicated sound processor 200 connected to each other;

a sound sensor (mic sensor) 100 for collecting a sound signal, converting the sound signal into an audio signal, and transmitting the audio signal to the AI-dedicated sound processor 200;

an AI-dedicated sound processor (AI voice processor) 200 for receiving the audio signal, extracting audio features from the audio signal, and determining whether to output a control signal according to the audio features, the control signal being used to wake up a back-end processor, the back-end processor being used to respond to the sound signal collected by the smart microphone;

the smart microphone is provided in a semiconductor package, and the sound sensor 100 and the AI-dedicated sound processor 200 are provided in a die of the semiconductor package.

In this embodiment, the smart microphone includes the sound sensor 100 and the AI-dedicated sound processor 200 connected to each other. The sound sensor 100 senses and collects a sound signal and converts it into an audio signal. The AI-dedicated sound processor 200 uses AI recognition technology to identify and process the audio signal: it extracts audio features from the audio signal, judges whether they meet preset requirements, and decides from the result whether to output a control signal. The control signal is used to wake up the back-end processor, which can respond to the sound signal collected by the smart microphone once it is awake. In this scheme, the smart microphone itself contains the AI-dedicated sound processor 200, which identifies the sound signal and outputs the control signal that wakes the back-end processor; the back-end processor then responds to the sound signal without having to perform wake-up recognition itself, which reduces its power consumption. The AI-dedicated sound processor 200 is dedicated to sound wake-up, whereas a back-end processor with a complex structure consumes more power when it executes a voice wake-up algorithm. When no sound signal is present, the smart microphone is in a low-power state: the AI-dedicated sound processor 200 monitors incoming sound at low power, and the back-end processor can remain dormant. When a sound signal is present and the audio features lead to a control signal being output, the back-end processor is woken and the smart microphone enters the wake-up state. Compared with a conventional microphone that is continuously in the wake-up state, power consumption is reduced. In addition, the smart microphone is provided in a semiconductor package, and the sound sensor 100 and the AI-dedicated sound processor 200 are provided in a die of the semiconductor package; the die and the package allow the smart microphone to be integrated and miniaturized, facilitating its application in different scenarios.
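The division of labour described in this embodiment can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `BackEndProcessor` class, the toy feature (mean absolute amplitude) and the threshold standing in for the "preset requirement" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BackEndProcessor:
    """Hypothetical stand-in for the terminal's main processor."""
    awake: bool = False

    def wake(self) -> None:
        self.awake = True


def extract_features(frame: Sequence[float]) -> List[float]:
    """Toy feature (an assumption): mean absolute amplitude of the frame."""
    if not frame:
        return [0.0]
    return [sum(abs(s) for s in frame) / len(frame)]


def should_output_control_signal(features: Sequence[float],
                                 threshold: float = 0.3) -> bool:
    """Check the features against a preset requirement (assumed: a threshold)."""
    return features[0] >= threshold


def process_frame(frame: Sequence[float], backend: BackEndProcessor) -> bool:
    """Always-on path: wake the back end only when the check passes."""
    control_signal = should_output_control_signal(extract_features(frame))
    if control_signal:
        backend.wake()
    return control_signal
```

The back-end object is only touched when a control signal is produced, which is the point of the scheme: the expensive processor stays dormant while the microphone-side logic does the screening.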

Further, the smart microphone may be separated from the back-end processor, and a wireless transmission module may be integrated in the AI-dedicated sound processor 200 to transmit the control signal to the back-end processor in a wireless transmission manner.

Further, the sound sensor 100 may detect sound by acoustic sensing, by bone conduction, or by a combination of the two; accordingly, the audio signal may be a bone conduction signal.

In one embodiment, as shown in fig. 2, the smart microphone further includes an audio processor 300 connected between the sound sensor 100 and the AI-specific sound processor 200, the audio processor 300 being provided in a die of a semiconductor package;

the sound sensor 100 is configured to convert a sound signal into an electrical signal and transmit the electrical signal to the audio processor 300;

the audio processor 300 is configured to receive the electrical signal, convert the electrical signal into an audio signal, and transmit the audio signal to the AI-specific sound processor 200.

In the present embodiment, an audio processor 300 is provided between the sound sensor 100 and the AI-dedicated sound processor 200; the audio processor 300 may perform a preliminary process on the electrical signal converted by the sound sensor 100 to convert the electrical signal into an audio signal for easy recognition and processing.

Further, the sound sensor 100 may be connected to both the AI-dedicated sound processor 200 and the audio processor 300, so that the sound sensor 100 may transmit a signal directly to the AI-dedicated sound processor 200 when the audio processor 300 malfunctions.

In one embodiment, the AI-specific sound processor 200 includes a neural network through which the audio characteristics are determined and a control signal is output if the audio characteristics match a predetermined wake-up characteristic.

In this embodiment, the AI-dedicated sound processor 200 stores a neural network. A neural network is a complex nonlinear network system formed by interconnecting a large number of processing units similar to neurons; it has a nonlinear, parallel structure and performs information processing in a brain-like manner by simulating how the brain's neural networks process and memorize information. When the neural network receives the audio features, it can judge and identify them quickly and accurately. The AI-dedicated sound processor 200 stores a preset wake-up feature and uses the neural network's analysis to judge whether the audio features are consistent with it; if they are, the control signal is output. Using a neural network reduces the likelihood of errors and improves the accuracy of audio feature recognition, and the neural network's mode of operation consumes little power.
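As a rough picture of how a small network could turn audio features into a wake-up decision, the sketch below scores a feature vector with one hidden layer and compares the score with a threshold. The network shape, the ReLU and sigmoid activations, the weights and the 0.5 threshold are all assumptions chosen for illustration; the patent does not specify the network architecture.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def wake_score(features: Sequence[float],
               hidden_weights: Sequence[Sequence[float]],
               hidden_biases: Sequence[float],
               output_weights: Sequence[float],
               output_bias: float) -> float:
    """Forward pass of a one-hidden-layer network; returns a score in (0, 1)."""
    hidden = [
        max(0.0, sum(w * f for w, f in zip(row, features)) + b)  # ReLU unit
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def matches_wake_feature(score: float, threshold: float = 0.5) -> bool:
    """Output the control signal only when the score reaches the threshold."""
    return score >= threshold
```

In practice the weights would come from training rather than being written by hand; here they are passed in as plain lists so the example stays self-contained.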

Further, as shown in fig. 3, the AI-dedicated sound processor 200 further includes a digital IO interface 210, and obtains the neural network data through the digital IO interface, where the neural network data is obtained through machine learning and reorganization. Through machine learning and training, the audio features can be stored in the weights of the connection nodes of the neural network in a distributed manner, so that the neural network can accurately identify the audio features.

Further, the AI-dedicated sound processor 200 may obtain the neural network data from the cloud server through the digital IO interface, and the process of machine learning and reconstructing the neural network data may be implemented in the cloud server.

Further, the AI-dedicated sound processor may include at least one of a bone conduction recognition module, a voiceprint recognition (voice print) module, a keyword recognition (keyword recognition) module, and a command word recognition (command recognition) module.

After receiving the audio features, the bone conduction recognition module may recognize the bone conduction voiceprint in the audio features through the neural network and compare it with preset bone conduction voiceprint information. If the two match, the current bone conduction signal was produced by a legitimate user, and a control signal may be output or the audio features may be recognized further.

The voiceprint recognition module may recognize voiceprint information in the audio features through the neural network and compare it with preset voiceprint information. If the two match, the current sound was produced by a legitimate user, and a control signal may be output or the audio features may be recognized further; if they do not match, the module does not respond to the audio features.

The keyword recognition module may recognize keywords in the audio features through the neural network and compare them with preset keywords. If they match, a control signal corresponding to the keyword may be output; specifically, this control signal may be a signal that wakes up the back-end processor or puts it to sleep.

The command word recognition module may recognize a command word in the audio features through the neural network and compare it with a preset command word. If they match, a control signal corresponding to the command word may be output; specifically, this control signal may be a signal for controlling a specific function of the back-end processor, such as "play music", "turn up volume", "turn down volume", "next", "previous", and the like.

The modules may also be combined. The AI-dedicated sound processor may include a voiceprint recognition module and a keyword recognition module, performing keyword recognition after voiceprint recognition confirms that the user is legitimate and outputting a control signal once the keyword is recognized. Alternatively, it may include a voiceprint recognition module and a command word recognition module, performing command word recognition after voiceprint recognition confirms that the user is legitimate and outputting a control signal once the command word is recognized. Alternatively, it may include a keyword recognition module and a command word recognition module, recognizing keywords and command words at the same time and outputting a control signal once either is recognized. Alternatively, it may include a voiceprint recognition module, a keyword recognition module and a command word recognition module, performing keyword and command word recognition after voiceprint recognition confirms that the user is legitimate and outputting a control signal once a keyword or command word is recognized.
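The combination in which voiceprint recognition gates keyword and command word recognition can be sketched as a short cascade. The example assumes the neural network has already reduced the audio features to a speaker label and a phrase; the `recognize` function, the label strings and the returned signal names are hypothetical and serve only to show the ordering of the checks.

```python
from typing import Optional, Set


def recognize(speaker: str, phrase: str,
              enrolled_speakers: Set[str],
              keywords: Set[str],
              command_words: Set[str]) -> Optional[str]:
    """Return a control signal name, or None when nothing should be output."""
    # Stage 1: voiceprint check. An unknown speaker gets no response at all.
    if speaker not in enrolled_speakers:
        return None
    # Stage 2: keyword and command word checks run only for a legitimate user.
    if phrase in keywords:
        return "WAKE"
    if phrase in command_words:
        return "COMMAND:" + phrase
    return None
```

For instance, with `enrolled_speakers={"alice"}` and `command_words={"play music"}`, the phrase "play music" yields a command signal for "alice" and nothing for any other speaker.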

In one embodiment, as shown in FIG. 4, audio processor 300 includes an audio amplifier 310 and an analog-to-digital converter 320;

the audio amplifier 310 is configured to perform analog amplification on the sound signal to obtain an analog audio signal;

the analog-to-digital converter 320 is configured to perform analog-to-digital conversion on the analog audio signal, obtain a digital audio signal, and transmit the digital audio signal to the AI-dedicated sound processor 200.

In this embodiment, the audio processor 300 mainly includes two components. The audio amplifier 310 performs analog amplification on the sound signal to obtain an analog audio signal; because the collected sound signal is weak, amplifying it facilitates subsequent recognition. The analog-to-digital converter 320 performs analog-to-digital conversion on the analog audio signal to obtain a digital audio signal, which is binary coded and therefore convenient to store, process and exchange.
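The amplifier and converter stages can be modelled numerically to show what each contributes. The sketch below is a simplified model under stated assumptions: samples are normalised to a full-scale range of [-1.0, 1.0], the amplifier clips at full scale, and the converter is an ideal 16-bit quantizer; none of these values come from the patent.

```python
def amplify(sample: float, gain: float) -> float:
    """Analog amplification, clipped to the assumed full-scale range [-1.0, 1.0]."""
    return max(-1.0, min(1.0, sample * gain))


def quantize(sample: float, bits: int = 16) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer code of the given width."""
    max_code = 2 ** (bits - 1) - 1
    return round(sample * max_code)


def capture(sample: float, gain: float = 20.0, bits: int = 16) -> int:
    """Amplifier followed by converter, as in the two-component audio processor."""
    return quantize(amplify(sample, gain), bits)
```

A weak input such as 0.001 would quantize to only about 33 codes on its own; after a gain of 20 it occupies roughly 655 codes, which is why amplification before conversion helps later recognition.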

In one embodiment, as shown in fig. 5, audio processor 300 includes an audio amplifier 310, an analog-to-digital converter 320, and an automatic gain controller 330;

an audio amplifier (audio amplifier) 310 is configured to perform analog amplification on the sound signal to obtain an analog audio signal;

an analog-to-digital converter (ADC) 320 is configured to perform analog-to-digital conversion on the analog audio signal to obtain a digital audio signal;

an Automatic Gain Controller (AGC) 330 for adjusting the gain amplitude of the audio amplifier 310 according to the intensity of the analog audio signal and transmitting the gain-amplified analog audio signal to the AI-dedicated sound processor 200; alternatively, the gain amplitude of the audio amplifier 310 is adjusted according to the intensity of the digital audio signal, and the gain-amplified digital audio signal is transmitted to the AI-dedicated sound processor 200.

In this embodiment, the audio processor 300 mainly includes three components. The audio amplifier 310 performs analog amplification on the sound signal to obtain an analog audio signal; because the collected sound signal is weak, amplifying it facilitates subsequent recognition. The analog-to-digital converter 320 performs analog-to-digital conversion on the analog audio signal to obtain a digital audio signal, which is binary coded and therefore convenient to store, process and exchange. The automatic gain controller 330 adjusts the gain amplitude of the audio amplifier 310 according to the intensity of the analog audio signal or of the digital audio signal, so that the amplitude of the audio signal is adjusted smoothly, large fluctuations are prevented, and the analog or digital audio signal is transmitted stably to the AI-dedicated sound processor 200.

Further, the automatic gain controller 330 may be replaced with a fixed gain controller, which amplifies the analog audio signal or the digital audio signal with a fixed gain. The automatic gain controller 330 or the fixed gain controller may be integrated in the audio amplifier 310 or the analog-to-digital converter 320.
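The feedback role of the automatic gain controller can be illustrated with a simple loop that nudges the amplifier gain towards a target output level. This is a sketch of one common AGC approach, not the circuit described in the patent: the peak-based level measurement, the target of 0.5, the adaptation rate and the gain limits are all assumed values.

```python
from typing import Iterable, List, Sequence, Tuple


def run_agc(frames: Iterable[Sequence[float]],
            target_peak: float = 0.5,
            rate: float = 0.5,
            gain: float = 1.0,
            min_gain: float = 0.1,
            max_gain: float = 100.0) -> Tuple[List[List[float]], float]:
    """Amplify each frame with the current gain, then adapt the gain.

    Returns the amplified frames and the final gain.
    """
    amplified_frames: List[List[float]] = []
    for frame in frames:
        amplified = [s * gain for s in frame]
        amplified_frames.append(amplified)
        peak = max((abs(s) for s in amplified), default=0.0)
        if peak > 0.0:
            # Move part of the way towards the gain that would hit the target,
            # so the output level changes smoothly instead of jumping.
            desired = gain * target_peak / peak
            gain += rate * (desired - gain)
            gain = max(min_gain, min(max_gain, gain))
    return amplified_frames, gain
```

With a steady quiet input the gain rises step by step until the output peak settles near the target, and with a loud input it falls; the `rate` parameter controls how quickly the loop reacts.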

In one embodiment, as shown in fig. 6, the audio processor 300 further comprises a charge pump 340 connected to the audio amplifier 310, the charge pump 340 being configured to increase the voltage input to the audio amplifier 310 by the sound sensor 100.

In one embodiment, as shown in fig. 7-9, the AI-specific sound processor 200 also includes a voice detection module (voice activity detection, VAD);

if the voice detection module is the analog voice detection module 220, the analog voice detection module 220 is configured to extract a first audio feature from the analog audio signal output by the audio amplifier 310, and transmit the first audio feature to the neural network;

if the voice detection module is the digital voice detection module 230, the digital voice detection module 230 is configured to extract a second audio feature from the digital audio signal output by the analog-to-digital converter 320, and transmit the second audio feature to the neural network;

if the voice detection module is the mixed voice detection module 240, the mixed voice detection module 240 is configured to extract a third audio feature from the analog audio signal output by the audio amplifier 310 and the digital audio signal output by the analog-to-digital converter 320, and transmit the third audio feature to the neural network.

In this embodiment, the voice detection module may be an analog voice detection module 220, a digital voice detection module 230, or a mixed voice detection module 240, and extracts audio features from the analog audio signal output by the audio amplifier 310 and/or the digital audio signal output by the analog-to-digital converter 320. The digital voice detection module 230 is generally used, but either of the other two voice detection modules may be used instead to suit different scenarios and parameter requirements.
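As an illustration of what a digital voice detection module might compute, the sketch below extracts two common frame-level features from digital audio samples: RMS energy and zero-crossing rate. The patent does not state which features the module uses, so these features, the threshold value, and the names `frame_features` and `is_voice_active` are assumptions.

```python
import math
from typing import List, Tuple


def frame_features(frame: List[int]) -> Tuple[float, float]:
    """Return (RMS energy, zero-crossing rate) for one frame of PCM samples."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    rms = math.sqrt(sum(s * s for s in frame) / n)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings / (n - 1)


def is_voice_active(frame: List[int], rms_threshold: float = 500.0) -> bool:
    """Assumed decision rule: the frame is active if its RMS energy is high enough."""
    rms, _ = frame_features(frame)
    return rms >= rms_threshold
```

In a real device the extracted features would be passed on to the neural network rather than compared against a single threshold; the threshold here only stands in for that later stage.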

Further, the audio features are analog features or digital features. If the sound signal is a speech signal, the audio features may be keywords or keyword groups in the speech and may further include the tone of the keywords or keyword groups. The corresponding wake-up features may be wake-up words or wake-up phrases, including their tone. The wake-up words or wake-up phrases are editable, and their number may be between 1 and 128.

Further, the voice detection module may detect audio features of sound signals in bone-conduction form as well as in other forms.

In one embodiment, as shown in fig. 10-12, the semiconductor package includes a first die in which the sound sensor 100 is disposed, a second die in which the audio amplifier 310 and the analog-to-digital converter 320 are disposed, and a third die in which the AI-specific sound processor 200 is disposed, the first die, the second die, and the third die being connected in this order;

alternatively, the semiconductor package includes a first die in which the sound sensor 100 is disposed, a fourth die in which the audio amplifier 310 is disposed, and a fifth die in which the analog-to-digital converter 320 and the AI-dedicated sound processor 200 are disposed, the first die, the fourth die, and the fifth die being connected in this order;

alternatively, the semiconductor package includes a first die in which the sound sensor 100 is disposed and a sixth die in which the audio processor 300 and the AI-specific sound processor 200 are disposed, the first die and the sixth die being connected to each other.

In the present embodiment, the sound sensor 100 is the main sensor for receiving sound signals and, to reduce interference from other circuits, is disposed on its own in the first die. The audio amplifier 310, the analog-to-digital converter 320, and the AI-dedicated sound processor 200 may be distributed across dies in different ways. In the first arrangement, the audio amplifier 310 and the analog-to-digital converter 320 are disposed in the second die and the AI-dedicated sound processor 200 is disposed in the third die; connecting the first, second, and third dies in sequence connects the sound sensor 100, the audio amplifier 310, the analog-to-digital converter 320, and the AI-dedicated sound processor 200 in sequence, which suits a scenario in which the second die and the third die are produced. In the second arrangement, the audio amplifier 310 is disposed in the fourth die, and the analog-to-digital converter 320 and the AI-dedicated sound processor 200 are disposed in the fifth die; connecting the first, fourth, and fifth dies in sequence connects the same four components in sequence, which suits a scenario in which the fourth die and the fifth die are produced. In the third arrangement, the audio processor 300 and the AI-specific sound processor 200 are disposed in the sixth die; connecting the first die and the sixth die to each other connects the same four components in sequence, which suits a scenario in which the sixth die is produced. Accordingly, during actual assembly, the composition of the smart microphone can be chosen according to production conditions, which allows diversified assembly.

Further, if the smart microphone includes the automatic gain controller 330, the automatic gain controller 330 may be disposed in the same die as the audio amplifier 310 or disposed in the same die as the analog-to-digital converter 320; additionally, if the smart microphone includes the charge pump 340, the charge pump 340 may be disposed in the same die as the audio amplifier 310.

In one embodiment, as shown in fig. 13 to 14, the smart microphone further includes a wireless transmitter 400, the wireless transmitter 400 being connected to the AI-specific sound processor 200 and being provided in the same die; or the wireless transmitter 400 and the AI-specific sound processor 200 are provided in different dies, respectively, and the two dies on which the wireless transmitter 400 and the AI-specific sound processor 200 are located are connected to each other;

the wireless transmitter 400 is used for transmitting the control signal in a wireless manner.

In this embodiment, in practical applications the smart microphone may be physically separate from the back-end processor, so a wireless transmitter 400 may be provided to transmit the control signal output by the AI-specific sound processor 200 wirelessly. The control signal can be received by a wireless receiver and further processed, for example stored, or analyzed by a processor to implement function control.

Further, the wireless transmitter may use any of various wireless communication modes, such as infrared, Bluetooth, Wi-Fi, near field communication, or Zigbee.

As shown in fig. 15 to 18, taking an infrared remote control transmitter as an example, the infrared remote control transmitter may be connected to the AI-dedicated sound processor 200 and provided in the same die; or the infrared remote control transmitter and the AI-dedicated sound processor 200 are respectively disposed in different dies, and the two dies in which they are disposed are connected to each other; in addition, if two or more types of wireless transmitters are used, such as an infrared remote control transmitter and a Bluetooth transmitter, each may be connected to the AI-specific sound processor 200.

In one embodiment, as shown in fig. 19-20, the smart microphone further includes a voice digital interface 500;

the voice digital interface 500 is connected to the output end of the analog-to-digital converter 320, and is used for outputting a digital audio signal to the back-end processor;

alternatively, the voice digital interface 500 is connected to an output terminal of the automatic gain controller 330 for outputting the digital audio signal to the back-end processor, wherein the automatic gain controller 330 is connected to the analog-to-digital converter 320.

In this embodiment, the AI-specific sound processor 200 mainly recognizes sound signals and outputs control signals. In practical applications of the smart microphone, the collected sound signals also need to be recorded and played back, so the voice digital interface 500 may be provided. Through it, the digital audio signal is transmitted from the analog-to-digital converter 320 or the automatic gain controller 330 to the back-end processor, which may further process the digital audio signal, for example by storing it or playing it through a speaker.

Further, the voice digital interface may be I²S (Inter-IC Sound, an audio bus between integrated circuits), PDM (pulse density modulation), TDM (time-division multiplexing), or MIPI SoundWire.

In one embodiment, as shown in fig. 21, the smart microphone further includes a clock management circuit 600 connected to the AI-specific sound processor 200, the clock management circuit 600 including a crystal oscillator interface 610 for receiving an external clock signal;

the clock management circuit 600 further includes a time processor 620 and a time register 630, the time register 630 being used to hold time information when the AI-specific sound processor 200 recognizes the time information from the audio features;

the time processor 620 is configured to output an interrupt signal to the AI-specific sound processor 200 when the crystal oscillator time corresponding to the time information is reached, where the interrupt signal is configured to instruct the AI-specific sound processor 200 to output the control signal.

In this embodiment, the smart microphone further includes a clock management circuit 600 connected to the AI-dedicated sound processor 200. The clock management circuit 600 includes a crystal oscillator interface 610 for receiving an external clock signal, such as the clock signal of the back-end processor, so that signals are synchronized with the back-end processor. When determining from the audio signal whether to output the control signal, the AI-dedicated sound processor 200 may also recognize whether the audio signal contains time information. If it does, the time information is stored in the time register 630 of the clock management circuit 600. When the crystal oscillator time corresponding to the time information is reached, the time processor 620 of the clock management circuit 600 outputs an interrupt signal to the AI-dedicated sound processor 200, instructing it to output the control signal. Timed output of the control signal is thus achieved through the time information processing of the clock management circuit 600.
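The interaction between the time register 630 and the time processor 620 can be sketched as a small software model. This is only an illustration under assumptions: time is modeled as an integer tick count driven by the external clock, the interrupt is modeled as a callback, and the class and method names are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ClockManagementCircuit:
    """Toy model of a clock management circuit with one time register."""
    on_interrupt: Callable[[int], None]   # stands in for the interrupt line
    now: int = 0                          # tick count derived from the external clock
    time_register: Optional[int] = None   # stored target time, if any

    def store_time(self, target_tick: int) -> None:
        """Save time information recognized from the audio features."""
        self.time_register = target_tick

    def tick(self) -> None:
        """Advance one clock tick and raise the interrupt when the target is reached."""
        self.now += 1
        if self.time_register is not None and self.now >= self.time_register:
            self.time_register = None     # fire once, then clear the register
            self.on_interrupt(self.now)


# Usage: the sound processor stores a target time, then is interrupted at that tick.
fired = []
clock = ClockManagementCircuit(on_interrupt=fired.append)
clock.store_time(3)
for _ in range(5):
    clock.tick()
```

In this model the callback would be where the sound processor outputs the control signal, so the wake-up of the back-end processor is deferred until the stored time.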

Further, the AI-dedicated sound processor 200 of the smart microphone may further include a microprocessor, a nonvolatile memory, and the like; the clock management circuit 600 may include an RTC (real-time clock) circuit, which can provide a periodic interrupt output and a 32 kHz clock output; the clock management circuit 600 may also be integrated inside the AI-specific sound processor 200.

Based on the smart microphone described above, embodiments of the present invention further provide a signal processing method using the smart microphone, and embodiments of this method are described in detail below.

Referring to fig. 22, a flowchart of a signal processing method using a smart microphone according to an embodiment is shown. The signal processing method using the smart microphone in this embodiment includes the steps of:

step S710: collecting a sound signal by a sound sensor, converting the sound signal into an audio signal, and transmitting the audio signal to an AI-dedicated sound processor;

step S720: receiving the audio signal through the AI-dedicated sound processor, extracting audio features from the audio signal, and judging whether to output a control signal according to the audio features, wherein the control signal is used for waking up the back-end processor, and the back-end processor is used for responding to the sound signal collected by the smart microphone.

In this embodiment, the sound sensor senses and collects a sound signal and converts it into an audio signal. The AI-dedicated sound processor recognizes the audio signal, extracts audio features from it, determines whether the audio features meet preset requirements, and decides according to the result whether to output a control signal. The control signal is used to wake up the back-end processor, which can respond to the sound signal collected by the smart microphone once it is awake. Because the back-end processor does not need to perform wake-up recognition on the sound signal, its power consumption is reduced. The AI-dedicated sound processor is dedicated to sound wake-up, so running the voice wake-up algorithm on it consumes less power than running it on a structurally complex back-end processor. When there is no sound signal, the smart microphone is in a low-power state: the AI-dedicated sound processor monitors incoming sound at low power, and the back-end processor can remain dormant. When a sound signal is present and the audio features lead to the control signal being output, the back-end processor is woken up and the smart microphone enters the awake state. Compared with a conventional microphone that is continuously awake, power consumption is reduced.
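The wake-up flow of steps S710 and S720 can be sketched as a simple state machine. The sketch is illustrative only: audio features are modeled as sets of recognized keywords, the neural-network decision is replaced by a plain set-overlap check, and all names are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Set


class BackEndState(Enum):
    SLEEPING = "sleeping"
    AWAKE = "awake"


def should_output_control_signal(features: Set[str], wake_features: Set[str]) -> bool:
    """Stand-in for the neural-network decision: true if any preset wake feature is present."""
    return not features.isdisjoint(wake_features)


def run_wake_flow(frames: Iterable[Set[str]], wake_features: Set[str]) -> List[BackEndState]:
    """Process feature frames in order and record the back-end state after each one."""
    state = BackEndState.SLEEPING
    trace: List[BackEndState] = []
    for features in frames:
        # Only the low-power path runs while the back end sleeps.
        if state is BackEndState.SLEEPING and should_output_control_signal(features, wake_features):
            state = BackEndState.AWAKE  # control signal wakes the back-end processor
        trace.append(state)
    return trace
```

The back end stays in `SLEEPING` for every frame that contains no wake feature, which is where the power saving described above comes from.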

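In one embodiment, the signal processing method further comprises the steps of:

after waking up the back-end processor by the control signal, acquiring a voice signal by the sound sensor, converting the voice signal into a voice audio signal, and transmitting the voice audio signal to the AI-dedicated sound processor;

receiving the voice audio signal through the AI-dedicated sound processor, extracting voice features from the voice audio signal, and judging whether to output an instruction signal according to the voice features, wherein the instruction signal is used for instructing the back-end processor to execute a corresponding operation.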

In this embodiment, after the back-end processor has been woken up and the smart microphone is in the awake state, the smart microphone may continue to receive voice signals and convert them into voice audio signals. The AI-dedicated sound processor recognizes the voice audio signal, extracts voice features from it, and then determines whether to output an instruction signal that instructs the back-end processor to perform a corresponding operation. This process is voice control and differs from the wake-up process: the instruction signal is equivalent to an action command that causes the back-end processor to perform audio-related actions, such as placing a call or playing music or video. If the back-end processor is interconnected with other smart devices, those devices can also be controlled, thereby achieving intelligent voice control.
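The second stage, in which a recognized voice feature is turned into an instruction signal, can be sketched as a lookup that is only active once the back-end processor is awake. The command table, the instruction names, and the function `instruction_signal` are hypothetical; the patent does not define an instruction encoding.

```python
from typing import Dict, Optional

# Hypothetical mapping from recognized command phrases to instruction signals.
COMMAND_TABLE: Dict[str, str] = {
    "call": "PLACE_CALL",
    "play music": "PLAY_MUSIC",
    "play video": "PLAY_VIDEO",
}


def instruction_signal(voice_feature: str, back_end_awake: bool,
                       table: Dict[str, str] = COMMAND_TABLE) -> Optional[str]:
    """Return the instruction for a recognized phrase, or None if nothing is output."""
    if not back_end_awake:
        # Before wake-up only the control signal path is active.
        return None
    return table.get(voice_feature.strip().lower())
```

Unrecognized phrases yield `None`, which corresponds to the processor deciding not to output an instruction signal.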

The signal processing method using the smart microphone of the embodiments of the invention corresponds to the smart microphone described above, and the technical features and beneficial effects described in the embodiments of the smart microphone all apply to the embodiments of the signal processing method.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several embodiments of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An intelligent microphone, comprising a sound sensor and an AI-specific sound processor connected to each other;

the AI special sound processor is used for receiving the audio signal, extracting audio features from the audio signal and judging whether to output a control signal according to the audio features, wherein the control signal is used for waking up a back-end processor, and the back-end processor is used for responding to the sound signal collected by the intelligent microphone;

2. The smart microphone of claim 1, further comprising an audio processor connected between the sound sensor and the AI-specific sound processor, the audio processor disposed in a die of the semiconductor package;

the audio processor is configured to receive the electrical signal, convert the electrical signal into the audio signal, and transmit the audio signal to the AI-specific sound processor.

3. The smart microphone of claim 2, wherein the audio processor comprises an audio amplifier and an analog-to-digital converter;

the audio amplifier is used for carrying out analog amplification on the electric signal to obtain an analog audio signal;

the analog-to-digital converter is used for performing analog-to-digital conversion on the analog audio signal to obtain a digital audio signal and transmitting the digital audio signal to the AI special sound processor;

the semiconductor package includes a first die, a second die, and a third die, the sound sensor being disposed in the first die, the audio amplifier and the analog-to-digital converter being disposed in the second die, the AI-specific sound processor being disposed in the third die, the first die, the second die, and the third die being connected in sequence.

4. The smart microphone of claim 2, wherein the audio processor comprises an audio amplifier and an analog-to-digital converter;

the semiconductor package includes a first die, a fourth die, and a fifth die, the sound sensor being disposed in the first die, the audio amplifier being disposed in the fourth die, the analog-to-digital converter and the AI-specific sound processor being disposed in the fifth die, the first die, the fourth die, and the fifth die being connected in this order.

5. The smart microphone of claim 2, wherein the semiconductor package includes a first die and a sixth die, the sound sensor being disposed in the first die, the audio processor and the AI-specific sound processor being disposed in the sixth die, the first die and the sixth die being connected to each other.

6. The smart microphone according to any one of claims 1 to 5, further comprising a wireless transmitter connected to the AI-specific sound processor and provided in the same die; or the wireless transmitter and the AI-specific sound processor are respectively disposed in different dies, and the two dies in which the wireless transmitter and the AI-specific sound processor are disposed are connected to each other;

the wireless transmitter is used for transmitting the control signal in a wireless mode.

7. The smart microphone according to claim 1, wherein the AI-specific sound processor includes a neural network, and the audio feature is determined by the neural network, and the control signal is output if the audio feature matches a preset wake-up feature.

8. The smart microphone of claim 7, wherein the AI-specific sound processor further comprises a digital IO interface through which neural network data is obtained, the neural network data being re-organized via machine learning, wherein the neural network data comprises adjustable weight parameters for audio features.

9. The smart microphone of claim 7, wherein the AI-specific sound processor comprises at least one of a bone conduction recognition module, a voiceprint recognition module, a keyword recognition module, a command word recognition module;

the bone conduction identification module is used for identifying a bone conduction voiceprint in the audio frequency feature through the neural network, and outputting a control signal corresponding to the bone conduction voiceprint if the bone conduction voiceprint is matched with a preset bone conduction voiceprint;

or the voiceprint recognition module is used for recognizing the voiceprint in the audio features through the neural network, and outputting the control signal if the voiceprint is matched with a preset voiceprint;

or the keyword identification module is used for identifying keywords in the audio features through the neural network, and outputting a control signal corresponding to the keywords if the keywords are matched with preset keywords;

10. The smart microphone of claim 2, wherein the audio processor comprises an audio amplifier and an analog-to-digital converter;

the AI-specific sound processor further comprises a voice detection module;

if the voice detection module is an analog voice detection module, the analog voice detection module is used for extracting a first audio feature from an analog audio signal output by the audio amplifier and transmitting the first audio feature to the neural network;

if the voice detection module is a digital voice detection module, the digital voice detection module is used for extracting a second audio feature from a digital audio signal output by the analog-to-digital converter and transmitting the second audio feature to the neural network;

11. The smart microphone of claim 10, further comprising a voice-to-digital interface;

the voice digital interface is connected with the output end of the analog-to-digital converter and used for outputting the digital audio signal to the back-end processor;

or, the voice digital interface is connected with an output end of an automatic gain controller, and is configured to output the digital audio signal to the back-end processor, where the automatic gain controller is connected with the analog-to-digital converter.

12. The smart microphone according to any one of claims 1 to 5 and 7 to 11, further comprising a clock management circuit connected to the AI-specific sound processor, the clock management circuit including a crystal oscillator interface for receiving an external clock signal;

the clock management circuit further includes a time processor and a time register for holding time information when the AI-specific sound processor recognizes the time information based on the audio feature;

the time processor is configured to output an interrupt signal to the AI-dedicated sound processor when the crystal oscillator time corresponding to the time information is reached, where the interrupt signal is configured to instruct the AI-dedicated sound processor to output the control signal.

13. A signal processing method using the smart microphone according to any one of claims 1 to 12, comprising the steps of:

collecting a sound signal through the sound sensor, converting the sound signal into an audio signal, and transmitting the audio signal to the AI-dedicated sound processor;

and receiving the audio signal through the AI special sound processor, extracting audio characteristics from the audio signal, and judging whether to output a control signal according to the audio characteristics, wherein the control signal is used for waking up a back-end processor, and the back-end processor is used for responding to the sound signal collected by the intelligent microphone.

14. The signal processing method of claim 13, further comprising the steps of:

after waking up the back-end processor by the control signal, acquiring a voice signal by the sound sensor, converting the voice signal into a voice audio signal, and transmitting the voice audio signal to the AI-dedicated sound processor;

and receiving the voice audio signal through the AI special sound processor, extracting voice characteristics from the voice audio signal, and judging whether to output an instruction signal according to the voice characteristics, wherein the instruction signal is used for instructing a back-end processor to execute corresponding operation.