CN113254579A

CN113254579A - Voice retrieval method and device and electronic equipment

Info

Publication number: CN113254579A
Application number: CN202110559408.2A
Authority: CN
Inventors: 李彬
Original assignee: Beijing Ziroom Information Technology Co Ltd
Current assignee: Beijing Ziroom Information Technology Co Ltd
Priority date: 2021-05-21
Filing date: 2021-05-21
Publication date: 2021-08-13

Abstract

The invention relates to the technical field of artificial intelligence, in particular to a voice retrieval method, a voice retrieval device and electronic equipment, wherein the method comprises the steps of receiving retrieval voice and obtaining a voice text corresponding to the retrieval voice; performing keyword matching by using the voice text, and determining a target keyword corresponding to the voice text; and forming retrieval data based on the target keywords, and sending the retrieval data to a server to obtain corresponding target data, wherein the target data is determined by the server based on a retrieval model, the retrieval model is obtained based on user portrait training, and the user portrait comprises the keywords and the corresponding data. The retrieval model is obtained by utilizing user portrait training, when voice retrieval is carried out, on one hand, the efficiency of data retrieval can be improved by utilizing voice retrieval, on the other hand, the retrieval of target data is carried out by combining the user portrait, the obtained target data can be ensured to meet the requirements of users, and the time for users to screen data is reduced.

Description

Voice retrieval method and device and electronic equipment

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a voice retrieval method and device and electronic equipment.

Background

Speech retrieval is retrieval centered on speech: it uses speech recognition and processing techniques to retrieve audio information such as radio programs, telephone conversations, and conference recordings. Common speech retrieval methods include searching with large-vocabulary speech recognition, subword-unit-based searching, searching based on recognized keywords, and segmentation and indexing based on speaker recognition.

Disclosure of Invention

In view of this, embodiments of the present invention provide a voice retrieval method, a voice retrieval device and an electronic device, so as to solve the problem of voice retrieval.

According to a first aspect, an embodiment of the present invention provides a speech retrieval method, including:

receiving retrieval voice, and obtaining a voice text corresponding to the retrieval voice;

performing keyword matching by using the voice text, and determining a target keyword corresponding to the voice text;

and forming retrieval data based on the target keywords, and sending the retrieval data to a server to obtain corresponding target data, wherein the target data is determined by the server based on a retrieval model, the retrieval model is obtained based on user portrait training, and the user portrait comprises the keywords and the corresponding data.

According to the voice retrieval method provided by the embodiment of the invention, the retrieval model is obtained by utilizing user portrait training, and corresponding target data is obtained based on the retrieval model when voice retrieval is carried out, so that on one hand, the efficiency of data retrieval can be improved by utilizing voice retrieval, on the other hand, the retrieval of the target data is carried out by combining the user portrait, the obtained target data can be ensured to meet the requirements of users, the time for screening data by the users is reduced, and the accuracy of data retrieval is improved.

With reference to the first aspect, in a first implementation manner of the first aspect, the determining a target keyword corresponding to a voice text by performing keyword matching using the voice text includes:

acquiring a keyword library;

dividing the keywords in the keyword library;

and matching keywords of the voice text in the divided keyword library to determine target keywords corresponding to the voice text.

According to the voice retrieval method provided by the embodiment of the invention, the keywords in the keyword library are divided, and then the keywords are matched in the divided keyword library, so that the efficiency of target keyword retrieval is improved.

With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the forming retrieval data based on the target keyword includes: encrypting the target keyword to determine the retrieval data.

The voice retrieval method provided by the embodiment of the invention carries out encryption transmission on the target keyword, thereby ensuring the safety of data transmission.

According to a second aspect, an embodiment of the present invention further provides a speech retrieval method, including:

receiving retrieval data, wherein the retrieval data is formed based on target keywords, and the target keywords are obtained by performing keyword matching on a voice text corresponding to retrieval voice;

obtaining the target keywords based on the retrieval data, and inputting the target keywords into a retrieval model to determine target data, wherein the retrieval model is obtained based on user portrait training, and the user portrait comprises the keywords and corresponding data;

and feeding the target data back to a voice input end.

With reference to the second aspect, in a first embodiment of the second aspect, the training process of the retrieval model includes the following steps:

obtaining a plurality of user portraits, wherein the user portraits comprise key words and corresponding data;

and training a preset retrieval model based on the plurality of user portraits to determine the retrieval model.

According to the voice retrieval method provided by the embodiment of the invention, the preset retrieval model is trained by utilizing a plurality of user portraits, so that the retrieval model can learn the user portraits, and the accurate pushing of data is realized.

With reference to the first embodiment of the second aspect, in a second embodiment of the second aspect, the acquiring a plurality of user portraits includes:

and adding the target keywords and the target data into a user portrait of a current user, and updating the user portrait of the current user so as to update the retrieval model.

According to the voice retrieval method provided by the embodiment of the invention, the target keyword and the target data are added into the user portrait of the current user, the user portrait of the current user is updated, and then the retrieval model is updated, namely the retrieval model is obtained by utilizing real data training, so that the reliability of the retrieval model is further ensured.

According to a third aspect, an embodiment of the present invention provides a speech retrieval apparatus, including:

the first receiving module is used for receiving retrieval voice and obtaining a voice text corresponding to the retrieval voice;

the matching module is used for performing keyword matching by using the voice text and determining a target keyword corresponding to the voice text;

and the acquisition module is used for forming retrieval data based on the target keywords and sending the retrieval data to the server to obtain corresponding target data, wherein the target data is determined by the server based on a retrieval model, the retrieval model is obtained based on user portrait training, and the user portrait comprises the keywords and the corresponding data.

The voice retrieval device provided by the embodiment of the invention utilizes the user portrait to train to obtain the retrieval model, and obtains the corresponding target data based on the retrieval model when performing voice retrieval, so that the efficiency of data retrieval can be improved by utilizing voice retrieval on one hand, and on the other hand, the retrieval of the target data is performed by combining the user portrait, the obtained target data can be ensured to meet the requirements of users, the time for screening data by the users is reduced, and the accuracy of data retrieval is improved.

According to a fourth aspect, an embodiment of the present invention provides a speech retrieval apparatus, including:

the second receiving module is used for receiving retrieval data, the retrieval data is formed based on target keywords, and the target keywords are obtained by performing keyword matching on a voice text corresponding to the retrieval voice;

the determining module is used for obtaining the target key words based on the retrieval data and inputting the target key words into a retrieval model to determine the target data, the retrieval model is obtained based on user portrait training, and the user portrait comprises the key words and corresponding data;

and the feedback module is used for feeding the target data back to the voice input end.

According to a fifth aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the voice retrieval method according to the first aspect or any one of the embodiments of the first aspect.

According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores computer instructions for causing a computer to execute the voice retrieval method described in the first aspect or any one of the implementation manners of the first aspect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a method of voice retrieval according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method of voice retrieval according to an embodiment of the present invention;

FIG. 3 is a flow chart of a method of voice retrieval according to an embodiment of the present invention;

FIG. 4 is a flow chart of a method of voice retrieval according to an embodiment of the present invention;

FIG. 5 is a flow chart of a method of voice retrieval according to an embodiment of the present invention;

FIG. 6 is a block diagram of the structure of a voice retrieval apparatus according to an embodiment of the present invention;

FIG. 7 is a block diagram of the structure of a voice retrieval apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The voice retrieval method provided by the embodiment of the invention can be used in an APP, for example to search for housing listings or for a nearby target location. The user inputs the retrieval voice through the electronic equipment, and the electronic equipment outputs the target data corresponding to the retrieval voice by executing the voice retrieval method. The target data may be presented as voice output, displayed on an interface of the electronic device, and so on. The output manner is not limited here and may be set according to actual requirements.

In accordance with an embodiment of the present invention, there is provided a speech retrieval method embodiment, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.

In this embodiment, a voice retrieval method is provided, which can be used in electronic devices, such as computers, mobile phones, tablet computers, and the like, and fig. 1 is a flowchart of the voice retrieval method according to an embodiment of the present invention, as shown in fig. 1, the flowchart includes the following steps:

S11, receiving the retrieval voice and obtaining the voice text corresponding to the retrieval voice.

The electronic equipment is provided with a voice input trigger label, and a user can trigger a voice input function by clicking the label. The user inputs the retrieval voice, and accordingly the electronic equipment can receive the retrieval voice.

Alternatively, the retrieval voice received by the electronic device may be obtained from a third-party device rather than being input directly by the user, or may be received in other ways. No limitation is imposed here.

After receiving the retrieval voice, the electronic device converts it into voice text. The electronic equipment may have an integrated voice recognition module and use it to convert the retrieval voice into voice text; or the electronic equipment may send the retrieval voice to a third party for text processing, and the third party feeds the resulting voice text back to the electronic equipment. For example, the electronic device may call a third-party voice processing SDK, and the voice text obtained by that SDK is fed back to the electronic device, so that the electronic device obtains the voice text corresponding to the retrieval voice.

S12, performing keyword matching by using the voice text, and determining a target keyword corresponding to the voice text.

After obtaining the voice text, the electronic device may preprocess it to remove meaningless words such as modal particles and stop words. Specifically, word segmentation may be performed on the voice text, and each resulting word may then be compared in turn with the words in a preset vocabulary table, so as to remove the meaningless words.

After the electronic equipment preprocesses the voice text, keyword matching is carried out on each word in the preprocessed voice text so as to determine a target keyword corresponding to the voice text. The number of the target keywords may be one, two or more, the specific number is not limited at all, and the target keywords may be set correspondingly according to actual requirements.
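The preprocessing and matching in S12 can be illustrated with a minimal Python sketch. It is not the patented implementation: the stop-word list, the keyword library, and the function names `preprocess` and `match_keywords` are hypothetical, and whitespace splitting stands in for a real word segmenter.

```python
from typing import List

# Hypothetical word lists; a real deployment would load its own
# preset vocabulary table and keyword library.
STOP_WORDS = {"um", "uh", "please", "the", "a", "find"}
KEYWORD_LIBRARY = {"apartment", "subway", "balcony", "two-bedroom"}


def preprocess(voice_text: str) -> List[str]:
    """Segment the voice text and drop words found in the stop-word table."""
    # Assumption: whitespace splitting replaces a real word segmenter.
    words = voice_text.lower().split()
    return [w for w in words if w not in STOP_WORDS]


def match_keywords(words: List[str]) -> List[str]:
    """Return the words that appear in the keyword library, in spoken order."""
    return [w for w in words if w in KEYWORD_LIBRARY]


words = preprocess("Um please find a two-bedroom apartment near the subway")
target_keywords = match_keywords(words)
print(target_keywords)
```

Here the number of target keywords depends only on how many words of the utterance appear in the library, which matches the statement that one, two, or more target keywords may result.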

S13, forming retrieval data based on the target keywords, and sending the retrieval data to the server to obtain corresponding target data.

The target data are determined by the server based on a retrieval model, the retrieval model is obtained based on user portrait training, and the user portrait comprises keywords and corresponding data.

After the target keywords are determined, the electronic equipment can directly send the target keywords to the server as search data, or can process the target keywords and then send the target keywords to the server. For example, the electronic device may encrypt the target keyword, and use the encrypted data as the search data.

After the electronic equipment sends the retrieval data to the server, the server retrieves the target data by using the retrieval model. The retrieval model is obtained based on user portrait training, so that the retrieval model can recommend corresponding data based on user requirements. Specific details regarding the retrieval model will be described in detail below.

According to the voice retrieval method provided by the embodiment, the retrieval model is obtained by utilizing user portrait training, and corresponding target data is obtained based on the retrieval model when voice retrieval is carried out, so that on one hand, the efficiency of data retrieval can be improved by utilizing voice retrieval, on the other hand, the retrieval of the target data is carried out by combining the user portrait, the obtained target data can be ensured to meet the user requirements, the time for screening data by the user is reduced, and the accuracy of data retrieval is improved.

In this embodiment, a voice retrieval method is provided, which can be used in electronic devices, such as a mobile phone, a tablet computer, and the like, fig. 2 is a flowchart of the voice retrieval method according to an embodiment of the present invention, and as shown in fig. 2, the flowchart includes the following steps:

S21, receiving the retrieval voice and obtaining the voice text corresponding to the retrieval voice.

Please refer to S11 in fig. 1, which is not described herein again.

S22, performing keyword matching by using the voice text, and determining a target keyword corresponding to the voice text.

Specifically, the above S22 may include:

S221, obtaining a keyword library.

The keyword library can be established according to business requirements, and different businesses correspond to different keyword libraries; of course, the same keyword library may also be used to contain all the service requirements, and the setting may be specifically performed according to the actual requirements.

S222, dividing the keywords in the keyword library.

Before the electronic device performs keyword matching, the electronic device may first divide the keywords in the keyword library into, for example, 2 parts; that is, the keywords in the keyword library are divided into halves.

S223, matching keywords of the voice text in the divided keyword library, and determining target keywords corresponding to the voice text.

The electronic equipment performs matching retrieval on the voice text on the basis of the divided keyword library to obtain the target keywords corresponding to the voice text. For example, the corresponding target keywords may be retrieved by a regular-expression matching algorithm.
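One possible reading of S221 to S223 is sketched below in Python: the keyword library is split into two halves and each half is searched with a regular expression. The helper names `divide_library` and `match_in_partitions` and the sample library are assumptions made for illustration; the sketch shows only the mechanics and makes no claim about the resulting speed-up.

```python
import re
from typing import List


def divide_library(keywords: List[str], parts: int = 2) -> List[List[str]]:
    """Split the sorted keyword library into `parts` roughly equal slices."""
    ordered = sorted(keywords)
    if not ordered:
        return []
    size = -(-len(ordered) // parts)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def match_in_partitions(voice_text: str, partitions: List[List[str]]) -> List[str]:
    """Search the voice text once per partition with an alternation pattern."""
    found: List[str] = []
    for part in partitions:
        # Longer keywords first, so a keyword is preferred over its own prefix.
        alternatives = sorted(part, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in alternatives))
        found.extend(pattern.findall(voice_text))
    return found


library = ["subway", "apartment", "balcony", "elevator"]
partitions = divide_library(library)
target_keywords = match_in_partitions("apartment with balcony near subway", partitions)
print(partitions)
print(target_keywords)
```

Each partition could also be searched by a separate worker, which is one way the division might pay off in practice.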

S23, forming retrieval data based on the target keywords, and sending the retrieval data to the server to obtain corresponding target data.

The electronic device may form the retrieval data by encrypting the target keyword. The retrieval data is represented in the form of key character codes; after receiving the retrieval data, the server parses it to extract the corresponding target keywords. Because the target keywords are transmitted in encrypted form, the security of data transmission is ensured.
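The round trip of forming retrieval data on the client and parsing it on the server can be sketched as follows. The source does not name a cipher or a wire format, so both are assumptions here: the payload is JSON, and the cipher is left as a pluggable callable. The default stand-in is Base64, which is an encoding and provides no confidentiality; a real deployment would substitute an actual encryption scheme.

```python
import base64
import json
from typing import Callable, List

Transform = Callable[[bytes], bytes]


def placeholder_encrypt(plaintext: bytes) -> bytes:
    # Stand-in only: Base64 is NOT encryption. Replace with a real cipher.
    return base64.urlsafe_b64encode(plaintext)


def placeholder_decrypt(ciphertext: bytes) -> bytes:
    # Inverse of the stand-in above.
    return base64.urlsafe_b64decode(ciphertext)


def form_retrieval_data(target_keywords: List[str],
                        encrypt: Transform = placeholder_encrypt) -> bytes:
    """Client side: serialize the target keywords and apply the cipher hook."""
    payload = json.dumps({"keywords": target_keywords}, ensure_ascii=False)
    return encrypt(payload.encode("utf-8"))


def parse_retrieval_data(retrieval_data: bytes,
                         decrypt: Transform = placeholder_decrypt) -> List[str]:
    """Server side: reverse the cipher hook and extract the target keywords."""
    payload = json.loads(decrypt(retrieval_data).decode("utf-8"))
    return payload["keywords"]


retrieval_data = form_retrieval_data(["apartment", "subway"])
recovered = parse_retrieval_data(retrieval_data)
print(recovered)
```

Keeping the cipher behind a callable means the client and server only have to agree on the pair of transforms, not on the payload layout.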

For the rest, please refer to S13 in the embodiment shown in fig. 1, which is not described herein again.

According to the voice retrieval method provided by the embodiment, the keywords in the keyword library are divided, and then the keywords are matched in the divided keyword library, so that the efficiency of target keyword retrieval is improved.

In this embodiment, a voice retrieval method is provided, which can be used in an electronic device, such as a server, etc., and fig. 3 is a flowchart of the voice retrieval method according to the embodiment of the present invention, as shown in fig. 3, the flowchart includes the following steps:

S31, receiving the retrieval data.

The retrieval data is formed based on target keywords, and the target keywords are obtained by performing keyword matching on voice texts corresponding to retrieval voices.

This step corresponds to S13 in the embodiment shown in fig. 1 or S23 in the embodiment shown in fig. 2, and as to the specific forming manner of the search data, reference may be made to the description of the corresponding step in fig. 1 or fig. 2, which is not described herein again.

S32, obtaining a target keyword based on the retrieval data, and inputting the target keyword into the retrieval model to determine the target data.

The retrieval model is obtained through training based on a user portrait, and the user portrait comprises keywords and corresponding data.

For the retrieval model, the specific network structure can be set correspondingly according to actual requirements, the network structure is not limited at all, and only the input of the retrieval model is ensured to comprise the target keyword, and the output of the retrieval model comprises the target data.

For example, the electronic device may train on user behavior data through a TensorFlow model file, that is, train the retrieval model using user portraits.

S33, feeding the target data back to the voice input end.

After the target data is retrieved, the electronic device may feed the target data back to the voice input end directly, may encrypt the target data before sending it, or may process the target data in another way before sending it.

In some optional implementations of this embodiment, the training process of the search model includes:

(1) A plurality of user portraits are obtained, wherein the user portraits comprise keywords and corresponding data.

The user portrait can be collected by first acquiring the identification of the current user and then storing the confirmed target keywords and the retrieved target data in the database of the current user to form the user portrait of the current user; alternatively, the matched target keywords may be stored in a database as tags of the user portrait to form the user portrait.

(2) A preset retrieval model is trained based on the plurality of user portraits to determine the retrieval model.

The user portraits are used as sample data to train the preset retrieval model, and the retrieval model is finally determined.

A plurality of user portraits are used for training the preset retrieval model, so that the retrieval model can learn the user portraits, and accurate pushing of data is achieved.

Further optionally, the step (1) may include: and adding the target keywords and the target data into the user portrait of the current user, and updating the user portrait of the current user so as to update the retrieval model.

The target keywords and the target data are added into the user portrait of the current user, the user portrait of the current user is updated, and the retrieval model is updated in turn; that is, the retrieval model is trained on real data, which further ensures the reliability of the retrieval model.
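A user portrait that "comprises keywords and corresponding data" can be modeled very simply. The Python sketch below is an illustration under assumptions: the class `UserPortrait`, the helper `build_training_samples`, and the listing identifiers are hypothetical, and the actual model training step (for example with TensorFlow) is deliberately left out because the source does not fix a network structure.

```python
from typing import Dict, List, Tuple


class UserPortrait:
    """Keywords a user has searched for, each mapped to the data returned."""

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.records: Dict[str, List[str]] = {}

    def add(self, target_keywords: List[str], target_data: List[str]) -> None:
        """Update the portrait with one retrieval (keywords plus results)."""
        for keyword in target_keywords:
            items = self.records.setdefault(keyword, [])
            for item in target_data:
                if item not in items:  # avoid duplicate entries
                    items.append(item)


def build_training_samples(portraits: List[UserPortrait]) -> List[Tuple[str, str, str]]:
    """Flatten portraits into (user_id, keyword, data) samples for a model."""
    return [
        (p.user_id, keyword, item)
        for p in portraits
        for keyword, items in p.records.items()
        for item in items
    ]


portrait = UserPortrait("u1")
portrait.add(["subway"], ["listing-17"])
portrait.add(["subway", "balcony"], ["listing-42"])
samples = build_training_samples([portrait])
print(samples)
```

Calling `add` after every completed retrieval corresponds to updating the portrait of the current user, and rebuilding the samples afterwards is the point at which a retraining job would pick up the new data.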

An embodiment of the present invention further provides a voice retrieval method, as shown in fig. 4, including:

S41, the client receives the retrieval voice to obtain the voice text corresponding to the retrieval voice. Please refer to S11 in the embodiment shown in fig. 1 or S21 in the embodiment shown in fig. 2 for details, which are not repeated herein.

S42, the client performs keyword matching by using the voice text and determines a target keyword corresponding to the voice text. Please refer to S12 in the embodiment shown in fig. 1 or S22 in the embodiment shown in fig. 2 for details, which are not repeated herein.

S43, the client forms the retrieval data based on the target keywords and sends the retrieval data to the server. Please refer to S13 in the embodiment shown in fig. 1 or S23 in the embodiment shown in fig. 2 for details, which are not repeated herein.

S44, the server receives the retrieval data. Please refer to S31 in fig. 3 for details, which are not described herein.

S45, the server obtains the target keywords based on the retrieval data, and inputs the target keywords into the retrieval model to determine the target data. Please refer to S32 in fig. 3 for details, which are not described herein.

S46, the server feeds the target data back to the voice input end. Please refer to S33 in fig. 3 for details, which are not described herein.

In some optional implementations of this embodiment, as shown in fig. 5, the voice retrieval method includes:

(1) user APP voice input;

(2) the APP calls the Baidu voice SDK;

(3) the SDK converts the voice into text information and feeds the text information back to the user APP through the internet;

(4) comparing the text information against data in key-value form through an algorithm, so that the specific key value is retrieved;

(5) storing the keys as labels in a database of the user portrait;

(6) the server background filters the key data by the user portrait key values, converts the data into JSON format, and transmits it through an interface to the APP for the corresponding operation.
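Step (6) of the flow above can be sketched in Python as a filter followed by JSON serialization. The record layout (an `id` plus a list of `tags`), the response envelope, and the function name `filter_by_portrait` are assumptions for illustration; the source states only that key data is filtered by the user portrait key values and returned as JSON.

```python
import json
from typing import Any, Dict, List


def filter_by_portrait(records: List[Dict[str, Any]], portrait_keys: List[str]) -> str:
    """Keep records tagged with any portrait key and serialize them as JSON."""
    wanted = set(portrait_keys)
    matched = [r for r in records if wanted & set(r["tags"])]
    return json.dumps({"data": matched}, ensure_ascii=False)


# Hypothetical backend records and portrait keys.
records = [
    {"id": 1, "tags": ["subway", "balcony"]},
    {"id": 2, "tags": ["garden"]},
    {"id": 3, "tags": ["subway"]},
]
response = filter_by_portrait(records, ["subway"])
print(response)
```

The APP would then decode the JSON body and carry out the corresponding operation, such as rendering the matched records.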

The voice retrieval method provided by this embodiment uses the voice function to retrieve the required keywords; the background server takes the retrieved keywords as user tags, stores them in a user portrait database, and retrieves the related tag data based on the user portrait. The method improves the user experience and reduces the page operations the user has to perform; because the user portrait is combined into the retrieval process, the user's requirements can be located quickly, the user's time cost is saved, and a corresponding accurate pushing function can subsequently be performed according to the user portrait.

In this embodiment, a voice retrieval apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.

The present embodiment provides a voice search device, as shown in fig. 6, including:

a first receiving module 51, configured to receive a retrieval voice, and obtain a voice text corresponding to the retrieval voice;

the matching module 52 is configured to perform keyword matching by using the voice text, and determine a target keyword corresponding to the voice text;

and an obtaining module 53, configured to form search data based on the target keyword, and send the search data to a server to obtain corresponding target data, where the target data is determined by the server based on a search model, the search model is obtained based on user portrait training, and the user portrait includes the keyword and corresponding data.

As shown in fig. 7, the speech retrieval apparatus provided in this embodiment includes:

a second receiving module 61, configured to receive search data, where the search data is formed based on a target keyword, and the target keyword is obtained by performing keyword matching on a speech text corresponding to a search speech;

a determining module 62, configured to obtain the target keyword based on the search data, and input the target keyword into a search model to determine the target data, where the search model is obtained based on user portrait training, and the user portrait includes the keyword and corresponding data;

and the feedback module 63 is configured to feed back the target data to the voice input end.

The voice retrieval device in this embodiment is presented in the form of a functional unit, where the unit refers to an ASIC circuit, a processor and memory executing one or more software or fixed programs, and/or other devices that may provide the above-described functionality.

Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.

An embodiment of the present invention further provides an electronic device, which has the voice retrieval apparatus shown in fig. 6 or fig. 7.

Referring to fig. 8, which is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention, the electronic device may include: at least one processor 71, such as a CPU (Central Processing Unit), at least one communication interface 73, a memory 74, and at least one communication bus 72. The communication bus 72 is used to enable connection and communication between these components. The communication interface 73 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. The memory 74 may be a high-speed RAM (volatile random access memory) or a non-volatile memory, such as at least one disk memory. The memory 74 may alternatively be at least one memory device located remotely from the processor 71. The processor 71 may be connected with the apparatus described in fig. 6 or fig. 7; an application program is stored in the memory 74, and the processor 71 calls the program code stored in the memory 74 to perform any of the above-mentioned method steps.

The communication bus 72 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 72 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.

The memory 74 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 74 may also comprise a combination of memories of the kinds described above.

The processor 71 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.

The processor 71 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

Optionally, the memory 74 is also used for storing program instructions. Processor 71 may invoke program instructions to implement a voice retrieval method as shown in the embodiments of fig. 1-2, or fig. 3 of the present application.

An embodiment of the present invention further provides a non-transitory computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions may execute the voice retrieval method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A method for voice retrieval, comprising:

2. The method of claim 1, wherein the determining the target keyword corresponding to the speech text by performing keyword matching using the speech text comprises:

acquiring a keyword library;

dividing the keywords in the keyword library;

3. The method of claim 2, wherein the forming search data based on the target keyword comprises: and encrypting the target keyword to determine the retrieval data.

4. A method for voice retrieval, comprising:

and feeding the target data back to a voice input end.

5. The method of claim 4, wherein the training process of the search model comprises the steps of:

6. The method of claim 5, wherein said obtaining a plurality of user representations comprises:

7. A speech retrieval apparatus, comprising:

8. A speech retrieval apparatus, comprising:

9. An electronic device, comprising:

a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the voice retrieval method of any one of claims 1 to 3, or to perform the voice retrieval method of any one of claims 4 to 6.

10. A computer-readable storage medium storing computer instructions for causing a computer to perform the voice retrieval method of any one of claims 1 to 3, or to perform the voice retrieval method of any one of claims 4 to 6.