CN116631412A - Method for judging voice robot through voiceprint matching - Google Patents
- Publication number
- CN116631412A CN116631412A CN202310519066.0A CN202310519066A CN116631412A CN 116631412 A CN116631412 A CN 116631412A CN 202310519066 A CN202310519066 A CN 202310519066A CN 116631412 A CN116631412 A CN 116631412A
- Authority
- CN
- China
- Prior art keywords
- voice
- voiceprint
- signal
- client
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The application provides a method for judging a voice robot through voiceprint matching, comprising the following steps: analyzing the information of the calling number and matching it with a code; sending preset voice text information to a client and waiting for the client's reply as the voice to be processed; establishing a voiceprint recognition model and processing the voice to be processed with it to obtain the client's voiceprint data; matching the voiceprint data against a customer voice library and, if no match is found, storing the code and the voiceprint data in the library; if a match is found, calling up the voiceprint data of the matched customer service person and simulating that person's voice to communicate with the client. Because the voice characteristics are indistinguishable from a real person's, the client always feels that a real person is communicating, which strengthens the client's desire to communicate and brings the client a better experience.
Description
Technical Field
The application relates in particular to a method for judging a voice robot through voiceprint matching.
Background
Artificial intelligence has developed rapidly in recent years, and the conditions for applying it to voice communication have matured. The relevant technologies mainly include automatic speech recognition (ASR), natural language processing (NLP), multi-turn dialog management (DM), natural language generation (NLG), and text-to-speech (TTS). By combining these technologies, an intelligent voice robot can realize its main functions: actively initiating a dialog with a client, supporting multi-turn task-oriented dialogs, recognizing the client's speech through ASR and interacting intelligently, and answering either with recordings of a customer service representative or by converting prepared text into speech through TTS and playing it to the client. The robot can also automatically recognize the client's intention, label and analyze it, and route the call to different processing platforms according to the intention, for example switching to a human agent or automatically sending a text message. The intelligent voice robot stores the voice and text of every conversation, which can be used for after-the-fact analysis, for continuously optimizing the branch tree of the preset dialog scenes, for analysis and statistics along many dimensions, and for automatically generating customer profiles that sort and score customer intentions;
the intelligent voice robot is widely applied to the field of enterprise telemarketing at present, and particularly has a relatively simple calling scene, such as financial product marketing, arrearage payment, telephone early warning and other application scenes. Because the intelligent voice robot has the characteristics of high calling efficiency and lower cost than a manual seat, in the field of enterprise telephone calling, the robot is a trend to replace manual calling in the long term. It is counted that there are currently 5 hundred million voice robot initiated calls per year in the united states. The market in China has a large development space; the traditional voice robot plays the fixed phone according to the preset text after dialing the telephone of the customer, but the traditional voice robot has the following defects:
(1) The script is stiff; the customer can clearly tell that it is a robot speaking and loses the desire to communicate;
(2) It cannot accurately understand the customer's needs or answer the questions the customer raises;
therefore, the application provides a method for judging the voice robot by voiceprint matching.
Disclosure of Invention
In order to solve the above problems, the present application proposes a method for judging a voice robot through voiceprint matching.
The application is realized by the following technical scheme:
the application provides a method for judging a voice robot through voiceprint matching, which comprises the following steps:
s1: analyzing the information of the calling number and matching with the code;
s2: sending the voice text information to a client according to preset voice text information, and waiting for the client to reply to the voice to be processed;
s3: establishing a voiceprint recognition model, and putting the voice to be processed into the voiceprint recognition model for processing to obtain voice voiceprint data of the client;
s4: the voice voiceprint data are put into a customer voice library for matching, and if the voice voiceprint data are not matched, the codes and the voice voiceprint data are transferred into the customer voice library;
s5: if so, calling out voiceprint data of matched customer service personnel, and simulating the voiceprint data of the customer service personnel to communicate with a client.
Further, the method for judging the voice robot through voiceprint matching, before the step S3, includes:
sampling voice to be processed to obtain a voice signal;
framing, windowing and extracting features of the voice signals;
training the characteristics and constructing a voiceprint model.
Further, in the voice robot judging method through voiceprint matching, the step of sampling the voice to be processed to obtain a voice signal includes:
the amplitude of the analog signal of the voice to be processed is read at a fixed frequency; this frequency is the sampling rate and represents the number of samples taken per second;
after the analog signal is sampled, the value of the original signal at each sampling point is obtained; these values are stored as integers for efficient storage and transmission.
Further, in the method for judging the voice robot by voiceprint matching, the steps of framing, windowing and extracting features of the voice signal include:
dividing the voice signal into many short segments along the time axis; this process of splitting the dense, continuous voice signal into a sequence of individual signal frames is called framing;
in the framing process, according to the length of the frame, the following can be obtained:
if the interval is equal to the length, there is no overlap between frames;
if the interval is smaller than the length, the frames overlap;
if the interval is greater than the length, then there will be gaps between frames that do not completely cover the original signal.
Further, in the method for judging the voice robot by voiceprint matching, the steps of framing, windowing and extracting the features of the voice signal further include:
extracting the characteristics of the voice signal taking time as an independent variable;
after framing and windowing, for an audio frame x(n) of length N, 0 ≤ n ≤ N−1, the short-time energy can be expressed simply as E = Σ x(n)², with the sum taken over n = 0, …, N−1.
Further, in the voice robot judging method through voiceprint matching, in the audio signal processing, the discrete voice signal after framing and windowing is analyzed; the signal frame at this point is a finite discrete signal, and its frequency-domain characteristics are calculated with the discrete Fourier transform:
X(k) = Σ x(n)·e^(−j2πkn/N), with the sum taken over n = 0, …, N−1 and k = 0, …, N−1;
after the discrete Fourier transform, the speech signal x is represented as a complex spectrum X(k).
Further, in the method for judging a voice robot through voiceprint matching, the step S4 includes:
A(i) is the client's voiceprint data, A(j) is voiceprint data in the voice library, ω is their similarity, and a judgment threshold T is set:
ω ≥ T: true reject TR;
ω < T: false accept FA;
true reject TR means the match is correct, and false accept FA means the match is an error.
Furthermore, in the voice robot judging method through voiceprint matching, the similarity decisions can be connected into a similarity curve: taking the client's voiceprint data as the abscissa and the voiceprint data in the voice library as the ordinate, each judgment threshold T corresponds to a point in the plane.
Further, in the method for judging a voice robot through voiceprint matching, the step S5 includes:
S51, pre-emphasis processing is carried out on the audio signal to boost the high-frequency energy that is attenuated in speech;
s52, framing and windowing the pre-emphasis processed signal;
s53, performing fast Fourier transform on each frame of signal to obtain a frequency spectrum;
s54, the frequency spectrum passes through a set of triangular filter banks designed according to the Mel scale to obtain a filtered result;
s55, correcting nonlinearity of the human ear with respect to sound intensity by using a logarithmic function;
s56, calculating a cepstrum through inverse discrete Fourier transform;
S57, the previous step yields 12 cepstrum coefficients; the energy of the frame is added as a 13th feature; the first-order and second-order differences of these 13 features are calculated across adjacent frames, finally giving 39 features, and these 39 features are the MFCC features.
Further, in the voice robot judging method through voice print matching, voice print data of matched customer service personnel are compared with the current voice print database, and 39 MFCC features are further adjusted.
The application has the beneficial effects that:
the voice robot voice print matching judging method can simulate the communication between voice print data of customer service personnel and a client, call up the last call record, know the current demands of the client, timely reply the problems of the client, adopt a manual access mode to realize the seamless connection between the manual and the machine when the voice robot cannot understand the meaning of the client, have no difference in voice characteristics, always provide the real person with the communication feeling, enhance the communication desire of the client and bring better experience to the client.
Because a voice's characteristics are subject to much interference (emotion, physical state, and so on all affect the customer service person's voiceprint data on a given day), in order to ensure consistency of the voiceprint data, the customer service person's voiceprint data recorded today are first compared with the voiceprint data in that person's voice library, the differences in the MFCC features are found, and the MFCC features are adjusted accordingly, so that the robot carries out voice communication with the client in the same voice as before.
Drawings
FIG. 1 is a flow chart of a method for determining a voice robot by voiceprint matching according to the present application;
FIG. 2 is a flow chart of voiceprint data of a simulated customer service person of the method for judging a voice robot by voiceprint matching according to the present application;
FIG. 3 is a block diagram of a computer device of an embodiment of a voice robot for voice print matching determination according to the present application;
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the application, whereby the application is not limited to the specific embodiments disclosed below;
it should be noted that the terms "first," "second," "symmetric," "array," and the like are used merely for distinguishing between description and location descriptions, and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of features indicated. Thus, a feature defining "first," "symmetry," or the like, may explicitly or implicitly include one or more such feature; also, where certain features are not limited in number by words such as "two," "three," etc., it should be noted that the feature likewise pertains to the explicit or implicit inclusion of one or more feature quantities;
in the present application, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature; meanwhile, all axial descriptions such as X-axis, Y-axis, Z-axis, one end of X-axis, the other end of Y-axis, or the other end of Z-axis are based on a cartesian coordinate system.
In the present application, unless explicitly specified and limited otherwise, terms such as "mounted," "connected," "secured," and the like are to be construed broadly; for example, the connection can be fixed connection, detachable connection or integrated molding; the connection may be mechanical, direct, welded, indirect via an intermediate medium, internal communication between two elements, or interaction between two elements. The specific meaning of the terms described above in the present application will be understood by those skilled in the art from the specification and drawings in combination with specific cases.
In the prior art, intelligent voice robots are increasingly widely applied in enterprise telemarketing, in particular to product sales with relatively simple calling scenarios, such as financial product marketing, payment reminders, and telephone early warning. Because an intelligent voice robot calls more efficiently and costs less than a human agent, robots replacing manual calling is the long-term trend in enterprise outbound calling. By one count there are currently about 500 million robot-initiated calls per year in the United States, and the Chinese market still has large room for development. A traditional voice robot plays a fixed script according to preset text after dialing the customer, but its script is stiff and carries no emotional features, so the customer easily hears that it is a robot and loses the desire to communicate. For this reason, referring to figs. 1-3, the present application provides a technical solution to the above technical problems: a method for judging a voice robot through voiceprint matching.
In this embodiment, a method for determining a voice robot by voiceprint matching includes:
S1: analyzing the information of the calling number and matching it with a code;
S2: sending preset voice text information to the client and waiting for the client's reply as the voice to be processed;
S3: establishing a voiceprint recognition model and putting the voice to be processed into the voiceprint recognition model for processing, so as to obtain the client's voiceprint data;
S4: putting the voiceprint data into a customer voice library for matching; if no match is found, storing the code and the voiceprint data in the customer voice library;
S5: if a match is found, calling up the voiceprint data of the matched customer service person and simulating that person's voiceprint data to communicate with the client.
In this embodiment, the method is applicable to both incoming and outgoing calls. By analyzing the calling number: if the number has been called before, the call records can be retrieved to learn the purpose and issues of the last call and how far the conversation progressed; if the number has not been called before, it is encoded and stored together with the call record. After the voice robot answers or dials out, it speaks a first sentence according to the preset text, for example "Hello, this is XX service company. How may I help you?" After the client replies, the reply is taken as the voice to be processed and put into the voiceprint recognition model, yielding the client's voiceprint data. The voiceprint data are matched against the customer voice library: if there is no match, the code and the voiceprint data are stored in the library to mark the client; if there is a match, the matched customer service person's voiceprint data are called up and that person's voice is simulated to communicate with the client, the last call record is retrieved, the client's current needs are learned, and the client's questions are answered in time. Meanwhile, when the voice robot cannot understand the client's meaning, a human agent can be connected to achieve a seamless handover between human and machine. Because the voice characteristics are indistinguishable, the client always feels that a real person is communicating, which strengthens the desire to communicate and brings the client a better experience.
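The S1 to S5 flow described above can be sketched in a few lines. The function names, the in-memory voice library, and the toy voiceprint summary below are illustrative assumptions for exposition only, not the claimed implementation:

```python
def extract_voiceprint(pending_voice):
    # Stand-in for the voiceprint recognition model of S3: summarize the
    # signal as (mean, mean energy); a real model would emit e.g. MFCC stats.
    n = len(pending_voice)
    mean = sum(pending_voice) / n
    energy = sum(x * x for x in pending_voice) / n
    return (round(mean, 6), round(energy, 6))

def handle_call(number, pending_voice, voice_library, codes):
    code = codes.setdefault(number, "C%04d" % len(codes))  # S1: number -> code
    voiceprint = extract_voiceprint(pending_voice)         # S3: model output
    if voiceprint in voice_library:                        # S5: matched client
        return "simulate:" + voice_library[voiceprint]
    voice_library[voiceprint] = code                       # S4: enrol new client
    return "enrolled"

library, codes = {}, {}
first = handle_call("13800000000", [0.1, -0.1, 0.2], library, codes)
second = handle_call("13800000000", [0.1, -0.1, 0.2], library, codes)
```

On the first call the client is enrolled (S4); on the second call the same voiceprint is found in the library and the matched voice is simulated (S5).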
In another embodiment, a semantic reader and a reply-term library are also provided to analyze the client's semantic information. The client's voice is first converted into text, and the corresponding reply term is retrieved from the reply-term library by searching for keywords in the text; the retrieval uses the BM25 algorithm. For the relevance between a keyword q_i and each text in the text set, given a text d in the set D, the relation between q_i and d is calculated as:
Score(q, d) = Σ w_i · R(q_i, d), with the sum taken over i = 1, …, m;
w_i is the weight of q_i, R(q_i, d) is the relevance of q_i to d, Score(q, d) is the weighted sum of the keywords' relevances to the text, and m is the number of keywords.
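A minimal sketch of the weighted-sum score Score(q, d) with a simplified BM25-style per-term relevance R(q_i, d). The tokenized corpus, the keyword weights, and the parameter values k1 and b are illustrative assumptions:

```python
import math

def bm25_score(keywords, weights, doc_tokens, corpus, k1=1.5, b=0.75):
    # Score(q, d) = sum_i w_i * R(q_i, d), with R a BM25-style term score.
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average doc length
    n_docs = len(corpus)
    score = 0.0
    for q, w in zip(keywords, weights):
        tf = doc_tokens.count(q)                       # term frequency in d
        df = sum(1 for d in corpus if q in d)          # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        r = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_tokens) / avgdl))
        score += w * r
    return score

docs = [["loan", "amount", "minimum"], ["interest", "rate"], ["repayment", "notice"]]
hit = bm25_score(["loan", "amount"], [1.0, 1.0], docs[0], docs)
miss = bm25_score(["loan", "amount"], [1.0, 1.0], docs[1], docs)
```

A document containing the keywords scores positively; one containing none of them scores zero, since each term's relevance is proportional to its term frequency.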
In a loan scenario, during the voice robot's conversation with the customer, the customer says: "I would like to ask how much I can borrow at most." The intention recognized by the speech-understanding model is "loan amount" and the keyword is "loan amount." According to rules predefined in the script, information such as the loan interest rate and the loan limit is looked up, and a prompt statement is dynamically constructed, for example "the loan interest rate is 5%, an individual borrows at least 10,000 yuan, and repayment is made 1 to 3 days in advance with notice to the institution"; historical experience and reply statements are then queried from the reply-term library according to the prompt information.
In one embodiment, the process of processing speech to be processed includes:
sampling voice to be processed to obtain a voice signal;
framing, windowing and extracting features of the voice signals;
training the characteristics and constructing a voiceprint model.
Further, in the step of sampling the voice to be processed to obtain a voice signal, the method includes:
the amplitude of the analog signal of the voice to be processed is read at a fixed frequency; this frequency is the sampling rate and represents the number of samples taken per second;
after the analog signal is sampled, the value of the original signal at each sampling point is obtained; these values are stored as integers for efficient storage and transmission.
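The sampling and integer quantization described above can be sketched on a synthetic tone. The 8 kHz sampling rate and 16-bit sample width are illustrative choices, not values fixed by the method:

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate=8000, bits=16):
    # Take sample_rate amplitude values per second of the analog waveform and
    # store each one as a signed integer for efficient storage/transmission.
    n_samples = int(sample_rate * duration_s)
    full_scale = 2 ** (bits - 1) - 1  # 32767 for 16-bit samples
    return [round(math.sin(2 * math.pi * freq_hz * t / sample_rate) * full_scale)
            for t in range(n_samples)]

pcm = sample_and_quantize(440.0, 0.01)  # 10 ms of a 440 Hz tone -> 80 samples
```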
Further, the steps of framing, windowing and extracting features of the voice signal include:
dividing the voice signal into many short segments along the time axis; this process of splitting the dense, continuous voice signal into a sequence of individual signal frames is called framing;
in the framing process, according to the length of the frame, the following can be obtained:
if the interval is equal to the length, there is no overlap between frames;
if the interval is smaller than the length, the frames overlap;
if the interval is greater than the length, then there will be gaps between frames that do not completely cover the original signal.
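The three interval-versus-length cases above can be sketched directly; the frame sizes used in the example are illustrative:

```python
def frame_signal(signal, frame_len, interval):
    # Take a frame of frame_len samples every `interval` samples:
    # interval == frame_len -> no overlap; interval < frame_len -> overlap;
    # interval > frame_len -> gaps (some samples are never covered).
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += interval
    return frames

sig = list(range(10))
no_overlap = frame_signal(sig, 4, 4)
overlapped = frame_signal(sig, 4, 2)
with_gaps = frame_signal(sig, 2, 4)
```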
Further, in the steps of framing, windowing and extracting features of the voice signal, the method further includes:
extracting the characteristics of the voice signal taking time as an independent variable;
after framing and windowing, for an audio frame x(n) of length N, 0 ≤ n ≤ N−1, the short-time energy can be expressed simply as E = Σ x(n)², with the sum taken over n = 0, …, N−1.
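Short-time energy for one framed-and-windowed audio frame is simply the sum of the squared samples; a sketch with illustrative frames:

```python
def short_time_energy(frame):
    # E = sum of x(n)^2 over n = 0 .. N-1 for one audio frame x of length N.
    return sum(x * x for x in frame)

loud = short_time_energy([0.5, -0.5, 0.5, -0.5])    # 4 * 0.25
quiet = short_time_energy([0.1, -0.1, 0.1, -0.1])   # 4 * 0.01
```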
Further, in the audio signal processing, the discrete voice signal after framing and windowing is analyzed; the signal frame at this point is a finite discrete signal, and its frequency-domain characteristics are calculated with the discrete Fourier transform:
X(k) = Σ x(n)·e^(−j2πkn/N), with the sum taken over n = 0, …, N−1 and k = 0, …, N−1;
after the discrete Fourier transform, the speech signal x is represented as a complex spectrum X(k).
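The discrete Fourier transform of one finite signal frame can be sketched directly from its definition (in practice a fast Fourier transform such as numpy.fft.fft would be used); the 4-sample test frame is illustrative:

```python
import cmath

def dft(frame):
    # X(k) = sum over n of x(n) * exp(-j*2*pi*k*n/N): a complex spectrum.
    n_len = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n, x in enumerate(frame))
            for k in range(n_len)]

spectrum = dft([1.0, 0.0, -1.0, 0.0])  # one cosine cycle over 4 samples
```

A single cosine cycle concentrates its energy in bins k = 1 and k = N − 1, with the other bins near zero.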
In one embodiment, the process of placing voice voiceprint data into a customer voice library for matching includes:
A(i) is the client's voiceprint data, A(j) is voiceprint data in the voice library, ω is their similarity, and a judgment threshold T is set:
ω ≥ T: true reject TR;
ω < T: false accept FA;
true reject TR means the match is correct, and false accept FA means the match is an error.
Furthermore, the similarity decisions can be connected into a similarity curve: taking the client's voiceprint data as the abscissa and the voiceprint data in the voice library as the ordinate, each judgment threshold T corresponds to a point in the plane.
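The decision of ω against the threshold T can be sketched as follows. The text does not fix the similarity measure itself, so cosine similarity between voiceprint feature vectors is an assumption here:

```python
import math

def cosine_similarity(a, b):
    # One common choice of similarity between two voiceprint feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_decision(client_vp, library_vp, threshold):
    # omega >= T: true reject TR (match correct); omega < T: false accept FA.
    omega = cosine_similarity(client_vp, library_vp)
    return "TR" if omega >= threshold else "FA"

same_speaker = match_decision([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 0.9)
different = match_decision([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.9)
```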
In one embodiment, the process of simulating voiceprint data of the customer service person comprises:
S51, pre-emphasis processing is carried out on the audio signal to boost the high-frequency energy that is attenuated in speech;
s52, framing and windowing the pre-emphasis processed signal;
s53, performing fast Fourier transform on each frame of signal to obtain a frequency spectrum;
s54, the frequency spectrum passes through a set of triangular filter banks designed according to the Mel scale to obtain a filtered result;
s55, correcting nonlinearity of the human ear with respect to sound intensity by using a logarithmic function;
s56, calculating a cepstrum through inverse discrete Fourier transform;
s57, obtaining 12 cepstrum coefficients in the previous step, adding the capacity of one frame to obtain 13 th features, calculating the first-order difference and the second-order difference of the 13 features through adjacent frames, and finally obtaining 39 features, wherein the 39 features are MFCC features.
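Steps S51 to S57 can be sketched for a single frame as follows. This is a minimal illustration: the pre-emphasis coefficient 0.97, the 26-band Mel filter bank, and the DCT-II cepstral step are conventional assumptions, not values prescribed by the patent, and the helper name is hypothetical:

```python
import numpy as np

def mfcc_frame(frame, fs=8000, n_filters=26, n_ceps=12):
    """Sketch of S51-S57 for one frame (parameter values are assumptions)."""
    # S51: pre-emphasis lifts the attenuated high-frequency energy.
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # S52: windowing (framing itself is assumed done by the caller).
    x = x * np.hamming(len(x))
    # S53: FFT magnitude spectrum of the frame.
    spec = np.abs(np.fft.rfft(x))
    # S54: triangular filter bank with center frequencies on the Mel scale.
    mel_max = 2595 * np.log10(1 + (fs / 2) / 700)
    hz = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((len(frame) + 1) * hz / fs).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # S55: logarithm models the ear's nonlinear loudness perception.
    log_e = np.log(fbank @ spec + 1e-10)
    # S56: cepstral coefficients; a DCT-II is the conventional transform here.
    k = np.arange(1, n_ceps + 1)[:, None]
    ceps = (log_e * np.cos(np.pi * k * (np.arange(n_filters) + 0.5)
                           / n_filters)).sum(axis=1)
    # S57: append the frame energy as the 13th feature; first- and
    # second-order differences across adjacent frames extend 13 to 39.
    return np.append(ceps, np.sum(x ** 2))

frame = np.sin(2 * np.pi * 200 * np.arange(256) / 8000)
print(mfcc_frame(frame).shape)   # (13,) static features per frame
```

The deltas of S57 require a sequence of frames, so only the 13 static features are computed here; differencing each of them over adjacent frames twice yields the full 39-dimensional MFCC vector.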
Further, the matched customer service personnel voiceprint data is compared with the current voiceprint database, and the 39 MFCC features are further adjusted.
In this embodiment, the characteristics of a voice are easily disturbed: emotion, physical state and the like all affect the customer service personnel's voiceprint data on a given day. To ensure consistency of the voiceprint data, the voiceprint data recorded by the customer service personnel today is first compared with the voiceprint data in the customer service voice library, the differences in the MFCC features are found, and the MFCC features are adjusted accordingly, so that voice communication with the client is carried out in the same voice as before.
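One plausible way to realize the adjustment described above is a per-feature mean shift toward the stored voiceprint. This is a hypothetical normalization sketch, not the patent's specified procedure; the function name and the random feature matrices are assumptions:

```python
import numpy as np

def adjust_mfcc(today, stored):
    """Shift today's MFCC feature matrix so its per-feature mean matches
    the stored voiceprint's mean (a simple, hypothetical normalization)."""
    offset = stored.mean(axis=0) - today.mean(axis=0)
    return today + offset

rng = np.random.default_rng(0)
today = rng.standard_normal((100, 39)) + 0.5   # 100 frames x 39 MFCC features
stored = rng.standard_normal((120, 39))        # library voiceprint frames
adjusted = adjust_mfcc(today, stored)

# After adjustment, the per-feature means agree with the stored voiceprint.
print(np.allclose(adjusted.mean(axis=0), stored.mean(axis=0)))  # True
```

Only the means are aligned here; a fuller scheme might also match the per-feature variances, but the patent does not specify the adjustment rule.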
In another embodiment, the voice robot also simulates emotional states, for example excited, happy, peaceful and low; the modal particles of the peaceful state include, for example, "En", "Fu" and "Hao". If the preset voice text information is "Hello, this is XX service company, how may I help you", a modal particle can be prepended to give "En, hello, this is XX service company, how may I help you", so that the voice robot speaks with natural, breathing pauses and does not sound mechanical. Words that fit the semantic context, such as the habitual catchphrases of the customer service personnel, can also be added, and the robot responds when the client mentions them, increasing the client's desire to communicate.
Referring to fig. 3, in an embodiment of the present application, there is further provided a computer device, which may be a server, and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store data such as the customer voice library. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of determining a voice robot by voiceprint matching.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method for determining a voice robot by voiceprint matching, specifically:
s1: analyzing the information of the calling number and matching with the code;
s2: sending preset voice text information to the client, and waiting for the client's reply as the voice to be processed;
s3: establishing a voiceprint recognition model, and putting the voice to be processed into the voiceprint recognition model for processing to obtain voice voiceprint data of the client;
s4: the voice voiceprint data are put into a customer voice library for matching, and if the voice voiceprint data are not matched, the codes and the voice voiceprint data are transferred into the customer voice library;
s5: if so, calling out voiceprint data of matched customer service personnel, and simulating the voiceprint data of the customer service personnel to communicate with a client.
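The control flow of steps S1 to S5 can be sketched as follows. All helper callables and the code-derivation rule are hypothetical placeholders standing in for the steps above, not part of the patent's disclosure:

```python
def handle_call(number, voice_library, extract_voiceprint, send_greeting,
                simulate_agent):
    """Sketch of S1-S5; every callable argument is a hypothetical stand-in."""
    code = number.strip()[-4:]              # S1: derive a code from the number
    reply = send_greeting(number)           # S2: send preset text, await reply
    voiceprint = extract_voiceprint(reply)  # S3: voiceprint recognition model
    agent = voice_library.get(code)         # S4: match against the voice library
    if agent is None:
        voice_library[code] = voiceprint    # S4: no match: store code + data
        return "stored"
    return simulate_agent(agent, voiceprint)  # S5: simulate agent's voiceprint

library = {}
result = handle_call("13800001234", library,
                     extract_voiceprint=lambda r: [0.1, 0.2],
                     send_greeting=lambda n: "hello",
                     simulate_agent=lambda a, v: "talking")
print(result)   # "stored" on the first call; "talking" once the code is known
```

On a repeat call with the same number, the library lookup succeeds and the S5 branch runs instead of the S4 store.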
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by instructing the relevant hardware through a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application or direct or indirect application in other related technical fields are included in the scope of the present application.
Although embodiments of the present application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.
Of course, the present application can be implemented in various other embodiments, and based on this embodiment, those skilled in the art can obtain other embodiments without any inventive effort, which fall within the scope of the present application.
Claims (10)
1. A method for judging a voice robot by voiceprint matching, comprising:
s1: analyzing the information of the calling number and matching with the code;
s2: sending preset voice text information to the client, and waiting for the client's reply as the voice to be processed;
s3: establishing a voiceprint recognition model, and putting the voice to be processed into the voiceprint recognition model for processing to obtain voice voiceprint data of the client;
s4: the voice voiceprint data are put into a customer voice library for matching, and if the voice voiceprint data are not matched, the codes and the voice voiceprint data are transferred into the customer voice library;
s5: if so, calling out voiceprint data of matched customer service personnel, and simulating the voiceprint data of the customer service personnel to communicate with a client.
2. The method for determining a voice robot according to claim 1, wherein before the step S3, the method comprises:
sampling voice to be processed to obtain a voice signal;
framing, windowing and extracting features of the voice signals;
training the characteristics and constructing a voiceprint model.
3. The method for determining a voice robot according to claim 2, wherein the step of sampling the voice to be processed to obtain a voice signal comprises:
the amplitude of the analog signal of the speech to be processed is sampled at a fixed frequency; this frequency is the sampling rate, representing the number of samples taken per second;
after the analog signal is sampled, the values of the original signal at each sampling point are obtained; these values are expressed as integers for efficient storage and transmission.
4. The method for determining a voice robot by voiceprint matching according to claim 3, wherein the steps of framing, windowing, and extracting features of the voice signal comprise:
dividing the voice signal on the time axis into a plurality of short segments, that is, splitting one long continuous voice signal, frame by frame, into individual signal frames; this process is called framing;
in the framing process, depending on the relationship between the frame interval and the frame length, the following cases can be obtained:
if the interval is equal to the length, there is no overlap between frames;
if the interval is smaller than the length, the frames overlap;
if the interval is greater than the length, then there will be gaps between frames that do not completely cover the original signal.
5. The method for determining a voice robot through voiceprint matching according to claim 4, wherein the steps of framing, windowing, and extracting features of the voice signal further comprise:
extracting the characteristics of the voice signal taking time as an independent variable;
after framing and windowing, for an audio frame x(n) of length N, where 0 ≤ n ≤ N-1, the short-time energy can be expressed simply as: E = Σ_{n=0}^{N-1} x(n)².
6. The method according to claim 5, wherein in the audio signal processing, the discrete voice signal after framing and windowing is analyzed; each signal frame is a finite discrete signal, and its frequency-domain characteristics are calculated using the discrete Fourier transform: X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πkn/N}, 0 ≤ k ≤ N-1;
after the discrete Fourier transform, e is the base of the natural exponential, and X(k) is the complex spectrum of the speech signal x(n).
7. The method for determining a voice robot according to claim 1, wherein the step S4 comprises:
a(i) is the voice voiceprint data of the client, a(j) is voice voiceprint data in the voice library, ω is their similarity, and a judgment threshold T is set:
ω ≥ T: true reject (TR);
ω < T: false accept (FA);
a true reject (TR) means the match is correct, and a false accept (FA) means the match is an error.
8. The method for voice robot determination by voiceprint matching of claim 7, wherein the similarity values can further be connected into a similarity curve,
with the voice voiceprint data of the client as the abscissa and the voice voiceprint data in the voice library as the ordinate; each judgment threshold T then corresponds to a point on this plane.
9. The method for determining a voice robot according to claim 1, wherein the step S5 comprises:
s51, pre-emphasis processing is carried out on the audio signal to lift the attenuated high-frequency energy;
s52, framing and windowing the pre-emphasized signal;
s53, performing a fast Fourier transform on each frame of the signal to obtain its frequency spectrum;
s54, passing the frequency spectrum through a set of triangular filter banks designed according to the Mel scale to obtain a filtered result;
s55, correcting for the human ear's nonlinear perception of sound intensity by using a logarithmic function;
s56, calculating the cepstrum through an inverse discrete Fourier transform;
s57, the previous step yields 12 cepstral coefficients; the energy of the frame is added as a 13th feature, the first-order and second-order differences of these 13 features are calculated across adjacent frames, and finally 39 features are obtained; these 39 features are the MFCC features.
10. The method for voice robot determination by voiceprint matching of claim 9, wherein the matched customer service personnel voiceprint data is compared with the current voiceprint database, and the 39 MFCC features are further adjusted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310519066.0A CN116631412A (en) | 2023-05-10 | 2023-05-10 | Method for judging voice robot through voiceprint matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310519066.0A CN116631412A (en) | 2023-05-10 | 2023-05-10 | Method for judging voice robot through voiceprint matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116631412A true CN116631412A (en) | 2023-08-22 |
Family
ID=87620457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310519066.0A Pending CN116631412A (en) | 2023-05-10 | 2023-05-10 | Method for judging voice robot through voiceprint matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116631412A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116884437A (en) * | 2023-09-07 | 2023-10-13 | 北京惠朗时代科技有限公司 | Speech recognition processor based on artificial intelligence |
CN117153185A (en) * | 2023-10-31 | 2023-12-01 | 建信金融科技有限责任公司 | Call processing method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||