CN116631412A - Method for judging voice robot through voiceprint matching - Google Patents
- Publication number
- CN116631412A CN116631412A CN202310519066.0A CN202310519066A CN116631412A CN 116631412 A CN116631412 A CN 116631412A CN 202310519066 A CN202310519066 A CN 202310519066A CN 116631412 A CN116631412 A CN 116631412A
- Authority
- CN
- China
- Prior art keywords
- voice
- voiceprint
- signal
- client
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The application provides a method for judging a voice robot through voiceprint matching, comprising the following steps: analyzing the information of the calling number and matching it with a code; sending preset voice text information to a client and waiting for the client's reply as the voice to be processed; establishing a voiceprint recognition model and processing the voice to be processed with it to obtain the client's voiceprint data; matching the voiceprint data against a customer voice library and, if no match is found, storing the code and the voiceprint data in the library; if a match is found, calling up the voiceprint data of the matched customer service person and simulating that person's voice to communicate with the client. Because the voice characteristics are indistinguishable from a real person's, the client always feels that a real person is communicating, which strengthens the client's desire to communicate and brings the client a better experience.
Description
Technical Field
The application relates in particular to a method for judging a voice robot through voiceprint matching.
Background
Artificial intelligence has developed rapidly in recent years, and the conditions for applying it to voice communication have matured. The relevant technologies mainly include automatic speech recognition (ASR), natural language processing (NLP), multi-turn dialog management (DM), natural language generation (NLG), and text-to-speech (TTS). By combining these technologies, an intelligent voice robot can realize its main functions: actively initiating a dialog with a client, supporting multi-turn task-oriented dialogs, recognizing the client's speech through ASR and interacting intelligently, and answering either with recordings of a customer service representative or by converting prepared text into speech through TTS and playing it to the client. The robot can also automatically recognize the client's intention, label and analyze it, and route the call to different processing platforms according to the intention, for example switching to a human agent or automatically sending a text message. The intelligent voice robot stores the voice and text of every conversation, which can be used for after-the-fact analysis, for continuously optimizing the branch tree of the preset dialog scenes, for analysis and statistics along many dimensions, and for automatically generating customer profiles that sort and score customer intentions;
the intelligent voice robot is widely applied to the field of enterprise telemarketing at present, and particularly has a relatively simple calling scene, such as financial product marketing, arrearage payment, telephone early warning and other application scenes. Because the intelligent voice robot has the characteristics of high calling efficiency and lower cost than a manual seat, in the field of enterprise telephone calling, the robot is a trend to replace manual calling in the long term. It is counted that there are currently 5 hundred million voice robot initiated calls per year in the united states. The market in China has a large development space; the traditional voice robot plays the fixed phone according to the preset text after dialing the telephone of the customer, but the traditional voice robot has the following defects:
(1) The script is stiff; the customer can clearly tell that it is a robot speaking and loses the desire to communicate;
(2) It cannot accurately understand the customer's needs or answer the questions the customer raises;
therefore, the application provides a method for judging the voice robot by voiceprint matching.
Disclosure of Invention
In order to solve the above problems, the present application proposes a method for judging a voice robot through voiceprint matching.
The application is realized by the following technical scheme:
the application provides a method for judging a voice robot through voiceprint matching, which comprises the following steps:
s1: analyzing the information of the calling number and matching with the code;
s2: sending the voice text information to a client according to preset voice text information, and waiting for the client to reply to the voice to be processed;
s3: establishing a voiceprint recognition model, and putting the voice to be processed into the voiceprint recognition model for processing to obtain voice voiceprint data of the client;
s4: the voice voiceprint data are put into a customer voice library for matching, and if the voice voiceprint data are not matched, the codes and the voice voiceprint data are transferred into the customer voice library;
s5: if so, calling out voiceprint data of matched customer service personnel, and simulating the voiceprint data of the customer service personnel to communicate with a client.
Further, the method for judging the voice robot through voiceprint matching, before the step S3, includes:
sampling voice to be processed to obtain a voice signal;
framing, windowing and extracting features of the voice signals;
training the characteristics and constructing a voiceprint model.
Further, in the voice robot judging method through voiceprint matching, the step of sampling the voice to be processed to obtain a voice signal includes:
the amplitude of the analog signal of the voice to be processed is read at a fixed frequency; this frequency is the sampling rate and represents the number of samples taken per second;
after the analog signal is sampled, the value of the original signal at each sampling point is obtained; these values are stored as integers for efficient storage and transmission.
Further, in the method for judging the voice robot by voiceprint matching, the steps of framing, windowing and extracting features of the voice signal include:
dividing the voice signal into many short segments along the time axis; this process of splitting the dense, continuous voice signal into a sequence of individual signal frames is called framing;
in the framing process, according to the length of the frame, the following can be obtained:
if the interval is equal to the length, there is no overlap between frames;
if the interval is smaller than the length, the frames overlap;
if the interval is greater than the length, then there will be gaps between frames that do not completely cover the original signal.
Further, in the method for judging the voice robot by voiceprint matching, the steps of framing, windowing and extracting the features of the voice signal further include:
extracting the characteristics of the voice signal taking time as an independent variable;
after framing and windowing, for an audio frame x(n) of length N, 0 ≤ n ≤ N−1, the short-time energy can be expressed simply as E = Σ x(n)², with the sum taken over n = 0, …, N−1.
Further, in the voice robot judging method through voiceprint matching, in the audio signal processing, the discrete voice signal after framing and windowing is analyzed; the signal frame at this point is a finite discrete signal, and its frequency-domain characteristics are calculated with the discrete Fourier transform:
X(k) = Σ x(n)·e^(−j2πkn/N), with the sum taken over n = 0, …, N−1 and k = 0, …, N−1;
after the discrete Fourier transform, the speech signal x is represented as a complex spectrum X(k).
Further, in the method for judging a voice robot through voiceprint matching, the step S4 includes:
A(i) is the client's voiceprint data, A(j) is voiceprint data in the voice library, ω is their similarity, and a judgment threshold T is set:
ω ≥ T: true reject TR;
ω < T: false accept FA;
true reject TR means the match is correct, and false accept FA means the match is an error.
Furthermore, in the voice robot judging method through voiceprint matching, the similarity decisions can be connected into a similarity curve: taking the client's voiceprint data as the abscissa and the voiceprint data in the voice library as the ordinate, each judgment threshold T corresponds to a point in the plane.
Further, in the method for judging a voice robot through voiceprint matching, the step S5 includes:
S51, pre-emphasis processing is carried out on the audio signal to boost the high-frequency energy that is attenuated in speech;
s52, framing and windowing the pre-emphasis processed signal;
s53, performing fast Fourier transform on each frame of signal to obtain a frequency spectrum;
s54, the frequency spectrum passes through a set of triangular filter banks designed according to the Mel scale to obtain a filtered result;
s55, correcting nonlinearity of the human ear with respect to sound intensity by using a logarithmic function;
s56, calculating a cepstrum through inverse discrete Fourier transform;
S57, the previous step yields 12 cepstrum coefficients; the energy of the frame is added as a 13th feature; the first-order and second-order differences of these 13 features are calculated across adjacent frames, finally giving 39 features, and these 39 features are the MFCC features.
Further, in the voice robot judging method through voice print matching, voice print data of matched customer service personnel are compared with the current voice print database, and 39 MFCC features are further adjusted.
The application has the beneficial effects that:
the voice robot voice print matching judging method can simulate the communication between voice print data of customer service personnel and a client, call up the last call record, know the current demands of the client, timely reply the problems of the client, adopt a manual access mode to realize the seamless connection between the manual and the machine when the voice robot cannot understand the meaning of the client, have no difference in voice characteristics, always provide the real person with the communication feeling, enhance the communication desire of the client and bring better experience to the client.
Because a voice's characteristics are subject to much interference (emotion, physical state, and so on all affect the customer service person's voiceprint data on a given day), in order to ensure consistency of the voiceprint data, the customer service person's voiceprint data recorded today are first compared with the voiceprint data in that person's voice library, the differences in the MFCC features are found, and the MFCC features are adjusted accordingly, so that the robot carries out voice communication with the client in the same voice as before.
Drawings
FIG. 1 is a flow chart of a method for determining a voice robot by voiceprint matching according to the present application;
FIG. 2 is a flow chart of voiceprint data of a simulated customer service person of the method for judging a voice robot by voiceprint matching according to the present application;
FIG. 3 is a block diagram of a computer device of an embodiment of a voice robot for voice print matching determination according to the present application;
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the application, whereby the application is not limited to the specific embodiments disclosed below;
it should be noted that the terms "first," "second," "symmetric," "array," and the like are used merely for distinguishing between description and location descriptions, and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of features indicated. Thus, a feature defining "first," "symmetry," or the like, may explicitly or implicitly include one or more such feature; also, where certain features are not limited in number by words such as "two," "three," etc., it should be noted that the feature likewise pertains to the explicit or implicit inclusion of one or more feature quantities;
in the present application, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature; meanwhile, all axial descriptions such as X-axis, Y-axis, Z-axis, one end of X-axis, the other end of Y-axis, or the other end of Z-axis are based on a cartesian coordinate system.
In the present application, unless explicitly specified and limited otherwise, terms such as "mounted," "connected," "secured," and the like are to be construed broadly; for example, the connection can be fixed connection, detachable connection or integrated molding; the connection may be mechanical, direct, welded, indirect via an intermediate medium, internal communication between two elements, or interaction between two elements. The specific meaning of the terms described above in the present application will be understood by those skilled in the art from the specification and drawings in combination with specific cases.
In the prior art, intelligent voice robots are increasingly widely applied in enterprise telemarketing, in particular to product sales with relatively simple calling scenarios, such as financial product marketing, payment reminders, and telephone early warning. Because an intelligent voice robot calls more efficiently and costs less than a human agent, robots replacing manual calling is the long-term trend in enterprise outbound calling. By one count there are currently about 500 million robot-initiated calls per year in the United States, and the Chinese market still has large room for development. A traditional voice robot plays a fixed script according to preset text after dialing the customer, but its script is stiff and carries no emotional features, so the customer easily hears that it is a robot and loses the desire to communicate. For this reason, referring to figs. 1-3, the present application provides a technical solution to the above technical problems: a method for judging a voice robot through voiceprint matching.
In this embodiment, a method for determining a voice robot by voiceprint matching includes:
S1: analyzing the information of the calling number and matching it with a code;
S2: sending preset voice text information to the client and waiting for the client's reply as the voice to be processed;
S3: establishing a voiceprint recognition model and putting the voice to be processed into the voiceprint recognition model for processing, so as to obtain the client's voiceprint data;
S4: putting the voiceprint data into a customer voice library for matching; if no match is found, storing the code and the voiceprint data in the customer voice library;
S5: if a match is found, calling up the voiceprint data of the matched customer service person and simulating that person's voiceprint data to communicate with the client.
In this embodiment, the method is applicable to both incoming and outgoing calls. By analyzing the calling number: if the number has been called before, the call records can be retrieved to learn the purpose and issues of the last call and how far the conversation progressed; if the number has not been called before, it is encoded and stored together with the call record. After the voice robot answers or dials out, it speaks a first sentence according to the preset text, for example "Hello, this is XX service company. How may I help you?" After the client replies, the reply is taken as the voice to be processed and put into the voiceprint recognition model, yielding the client's voiceprint data. The voiceprint data are matched against the customer voice library: if there is no match, the code and the voiceprint data are stored in the library to mark the client; if there is a match, the matched customer service person's voiceprint data are called up and that person's voice is simulated to communicate with the client, the last call record is retrieved, the client's current needs are learned, and the client's questions are answered in time. Meanwhile, when the voice robot cannot understand the client's meaning, a human agent can be connected to achieve a seamless handover between human and machine. Because the voice characteristics are indistinguishable, the client always feels that a real person is communicating, which strengthens the desire to communicate and brings the client a better experience.
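The S1 to S5 flow described above can be sketched in a few lines. The function names, the in-memory voice library, and the toy voiceprint summary below are illustrative assumptions for exposition only, not the claimed implementation:

```python
def extract_voiceprint(pending_voice):
    # Stand-in for the voiceprint recognition model of S3: summarize the
    # signal as (mean, mean energy); a real model would emit e.g. MFCC stats.
    n = len(pending_voice)
    mean = sum(pending_voice) / n
    energy = sum(x * x for x in pending_voice) / n
    return (round(mean, 6), round(energy, 6))

def handle_call(number, pending_voice, voice_library, codes):
    code = codes.setdefault(number, "C%04d" % len(codes))  # S1: number -> code
    voiceprint = extract_voiceprint(pending_voice)         # S3: model output
    if voiceprint in voice_library:                        # S5: matched client
        return "simulate:" + voice_library[voiceprint]
    voice_library[voiceprint] = code                       # S4: enrol new client
    return "enrolled"

library, codes = {}, {}
first = handle_call("13800000000", [0.1, -0.1, 0.2], library, codes)
second = handle_call("13800000000", [0.1, -0.1, 0.2], library, codes)
```

On the first call the client is enrolled (S4); on the second call the same voiceprint is found in the library and the matched voice is simulated (S5).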
In another embodiment, a semantic reader and a reply-term library are also provided to analyze the client's semantic information. The client's voice is first converted into text, and the corresponding reply term is retrieved from the reply-term library by searching for keywords in the text; the retrieval uses the BM25 algorithm. For the relevance between a keyword q_i and each text in the text set, given a text d in the set D, the relation between q_i and d is calculated as:
Score(q, d) = Σ w_i · R(q_i, d), with the sum taken over i = 1, …, m;
w_i is the weight of q_i, R(q_i, d) is the relevance of q_i to d, Score(q, d) is the weighted sum of the keywords' relevances to the text, and m is the number of keywords.
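A minimal sketch of the weighted-sum score Score(q, d) with a simplified BM25-style per-term relevance R(q_i, d). The tokenized corpus, the keyword weights, and the parameter values k1 and b are illustrative assumptions:

```python
import math

def bm25_score(keywords, weights, doc_tokens, corpus, k1=1.5, b=0.75):
    # Score(q, d) = sum_i w_i * R(q_i, d), with R a BM25-style term score.
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average doc length
    n_docs = len(corpus)
    score = 0.0
    for q, w in zip(keywords, weights):
        tf = doc_tokens.count(q)                       # term frequency in d
        df = sum(1 for d in corpus if q in d)          # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        r = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_tokens) / avgdl))
        score += w * r
    return score

docs = [["loan", "amount", "minimum"], ["interest", "rate"], ["repayment", "notice"]]
hit = bm25_score(["loan", "amount"], [1.0, 1.0], docs[0], docs)
miss = bm25_score(["loan", "amount"], [1.0, 1.0], docs[1], docs)
```

A document containing the keywords scores positively; one containing none of them scores zero, since each term's relevance is proportional to its term frequency.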
In a loan scenario, during the voice robot's conversation with the customer, the customer says: "I would like to ask how much I can borrow at most." The intention recognized by the speech-understanding model is "loan amount" and the keyword is "loan amount." According to rules predefined in the script, information such as the loan interest rate and the loan limit is looked up, and a prompt statement is dynamically constructed, for example "the loan interest rate is 5%, an individual borrows at least 10,000 yuan, and repayment is made 1 to 3 days in advance with notice to the institution"; historical experience and reply statements are then queried from the reply-term library according to the prompt information.
In one embodiment, the process of processing speech to be processed includes:
sampling voice to be processed to obtain a voice signal;
framing, windowing and extracting features of the voice signals;
training the characteristics and constructing a voiceprint model.
Further, in the step of sampling the voice to be processed to obtain a voice signal, the method includes:
the amplitude of the analog signal of the voice to be processed is read at a fixed frequency; this frequency is the sampling rate and represents the number of samples taken per second;
after the analog signal is sampled, the value of the original signal at each sampling point is obtained; these values are stored as integers for efficient storage and transmission.
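The sampling and integer quantization described above can be sketched on a synthetic tone. The 8 kHz sampling rate and 16-bit sample width are illustrative choices, not values fixed by the method:

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate=8000, bits=16):
    # Take sample_rate amplitude values per second of the analog waveform and
    # store each one as a signed integer for efficient storage/transmission.
    n_samples = int(sample_rate * duration_s)
    full_scale = 2 ** (bits - 1) - 1  # 32767 for 16-bit samples
    return [round(math.sin(2 * math.pi * freq_hz * t / sample_rate) * full_scale)
            for t in range(n_samples)]

pcm = sample_and_quantize(440.0, 0.01)  # 10 ms of a 440 Hz tone -> 80 samples
```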
Further, the steps of framing, windowing and extracting features of the voice signal include:
dividing the voice signal into many short segments along the time axis; this process of splitting the dense, continuous voice signal into a sequence of individual signal frames is called framing;
in the framing process, according to the length of the frame, the following can be obtained:
if the interval is equal to the length, there is no overlap between frames;
if the interval is smaller than the length, the frames overlap;
if the interval is greater than the length, then there will be gaps between frames that do not completely cover the original signal.
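The three interval-versus-length cases above can be sketched directly; the frame sizes used in the example are illustrative:

```python
def frame_signal(signal, frame_len, interval):
    # Take a frame of frame_len samples every `interval` samples:
    # interval == frame_len -> no overlap; interval < frame_len -> overlap;
    # interval > frame_len -> gaps (some samples are never covered).
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += interval
    return frames

sig = list(range(10))
no_overlap = frame_signal(sig, 4, 4)
overlapped = frame_signal(sig, 4, 2)
with_gaps = frame_signal(sig, 2, 4)
```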
Further, in the steps of framing, windowing and extracting features of the voice signal, the method further includes:
extracting the characteristics of the voice signal taking time as an independent variable;
after framing and windowing, for an audio frame x(n) of length N, 0 ≤ n ≤ N−1, the short-time energy can be expressed simply as E = Σ x(n)², with the sum taken over n = 0, …, N−1.
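Short-time energy for one framed-and-windowed audio frame is simply the sum of the squared samples; a sketch with illustrative frames:

```python
def short_time_energy(frame):
    # E = sum of x(n)^2 over n = 0 .. N-1 for one audio frame x of length N.
    return sum(x * x for x in frame)

loud = short_time_energy([0.5, -0.5, 0.5, -0.5])    # 4 * 0.25
quiet = short_time_energy([0.1, -0.1, 0.1, -0.1])   # 4 * 0.01
```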
Further, in the audio signal processing, the discrete voice signal after framing and windowing is analyzed; the signal frame at this point is a finite discrete signal, and its frequency-domain characteristics are calculated with the discrete Fourier transform:
X(k) = Σ x(n)·e^(−j2πkn/N), with the sum taken over n = 0, …, N−1 and k = 0, …, N−1;
after the discrete Fourier transform, the speech signal x is represented as a complex spectrum X(k).
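The discrete Fourier transform of one finite signal frame can be sketched directly from its definition (in practice a fast Fourier transform such as numpy.fft.fft would be used); the 4-sample test frame is illustrative:

```python
import cmath

def dft(frame):
    # X(k) = sum over n of x(n) * exp(-j*2*pi*k*n/N): a complex spectrum.
    n_len = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n, x in enumerate(frame))
            for k in range(n_len)]

spectrum = dft([1.0, 0.0, -1.0, 0.0])  # one cosine cycle over 4 samples
```

A single cosine cycle concentrates its energy in bins k = 1 and k = N − 1, with the other bins near zero.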
In one embodiment, the process of placing voice voiceprint data into a customer voice library for matching includes:
A(i) is the client's voiceprint data, A(j) is voiceprint data in the voice library, ω is their similarity, and a judgment threshold T is set:
ω ≥ T: true reject TR;
ω < T: false accept FA;
true reject TR means the match is correct, and false accept FA means the match is an error.
Furthermore, the similarity decisions can be connected into a similarity curve: taking the client's voiceprint data as the abscissa and the voiceprint data in the voice library as the ordinate, each judgment threshold T corresponds to a point in the plane.
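The decision of ω against the threshold T can be sketched as follows. The text does not fix the similarity measure itself, so cosine similarity between voiceprint feature vectors is an assumption here:

```python
import math

def cosine_similarity(a, b):
    # One common choice of similarity between two voiceprint feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_decision(client_vp, library_vp, threshold):
    # omega >= T: true reject TR (match correct); omega < T: false accept FA.
    omega = cosine_similarity(client_vp, library_vp)
    return "TR" if omega >= threshold else "FA"

same_speaker = match_decision([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 0.9)
different = match_decision([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.9)
```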
In one embodiment, the process of simulating voiceprint data of the customer service person comprises:
S51, pre-emphasis processing is carried out on the audio signal to boost the high-frequency energy that is attenuated in speech;
s52, framing and windowing the pre-emphasis processed signal;
s53, performing fast Fourier transform on each frame of signal to obtain a frequency spectrum;
s54, the frequency spectrum passes through a set of triangular filter banks designed according to the Mel scale to obtain a filtered result;
s55, correcting nonlinearity of the human ear with respect to sound intensity by using a logarithmic function;
s56, calculating a cepstrum through inverse discrete Fourier transform;
s57, obtaining 12 cepstrum coefficients in the previous step, adding the capacity of one frame to obtain 13 th features, calculating the first-order difference and the second-order difference of the 13 features through adjacent frames, and finally obtaining 39 features, wherein the 39 features are MFCC features.
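Steps S51 to S57 can be sketched for a single frame as follows. This is a minimal illustration: the pre-emphasis coefficient 0.97, the 26-band Mel filter bank, and the DCT-II cepstral step are conventional assumptions, not values prescribed by the patent, and the helper name is hypothetical:

```python
import numpy as np

def mfcc_frame(frame, fs=8000, n_filters=26, n_ceps=12):
    """Sketch of S51-S57 for one frame (parameter values are assumptions)."""
    # S51: pre-emphasis lifts the attenuated high-frequency energy.
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # S52: windowing (framing itself is assumed done by the caller).
    x = x * np.hamming(len(x))
    # S53: FFT magnitude spectrum of the frame.
    spec = np.abs(np.fft.rfft(x))
    # S54: triangular filter bank with center frequencies on the Mel scale.
    mel_max = 2595 * np.log10(1 + (fs / 2) / 700)
    hz = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((len(frame) + 1) * hz / fs).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # S55: logarithm models the ear's nonlinear loudness perception.
    log_e = np.log(fbank @ spec + 1e-10)
    # S56: cepstral coefficients; a DCT-II is the conventional transform here.
    k = np.arange(1, n_ceps + 1)[:, None]
    ceps = (log_e * np.cos(np.pi * k * (np.arange(n_filters) + 0.5)
                           / n_filters)).sum(axis=1)
    # S57: append the frame energy as the 13th feature; first- and
    # second-order differences across adjacent frames extend 13 to 39.
    return np.append(ceps, np.sum(x ** 2))

frame = np.sin(2 * np.pi * 200 * np.arange(256) / 8000)
print(mfcc_frame(frame).shape)   # (13,) static features per frame
```

The deltas of S57 require a sequence of frames, so only the 13 static features are computed here; differencing each of them over adjacent frames twice yields the full 39-dimensional MFCC vector.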
Further, the matched customer service personnel voiceprint data is compared with the current voiceprint database, and the 39 MFCC features are further adjusted.
In this embodiment, the characteristics of a voice are easily disturbed: emotion, physical state and the like all affect the customer service personnel's voiceprint data on a given day. To ensure consistency of the voiceprint data, the voiceprint data recorded by the customer service personnel today is first compared with the voiceprint data in the customer service voice library, the differences in the MFCC features are found, and the MFCC features are adjusted accordingly, so that voice communication with the client is carried out in the same voice as before.
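One plausible way to realize the adjustment described above is a per-feature mean shift toward the stored voiceprint. This is a hypothetical normalization sketch, not the patent's specified procedure; the function name and the random feature matrices are assumptions:

```python
import numpy as np

def adjust_mfcc(today, stored):
    """Shift today's MFCC feature matrix so its per-feature mean matches
    the stored voiceprint's mean (a simple, hypothetical normalization)."""
    offset = stored.mean(axis=0) - today.mean(axis=0)
    return today + offset

rng = np.random.default_rng(0)
today = rng.standard_normal((100, 39)) + 0.5   # 100 frames x 39 MFCC features
stored = rng.standard_normal((120, 39))        # library voiceprint frames
adjusted = adjust_mfcc(today, stored)

# After adjustment, the per-feature means agree with the stored voiceprint.
print(np.allclose(adjusted.mean(axis=0), stored.mean(axis=0)))  # True
```

Only the means are aligned here; a fuller scheme might also match the per-feature variances, but the patent does not specify the adjustment rule.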
In another embodiment, the voice robot also simulates emotional states, for example excited, happy, peaceful and low; the modal particles of the peaceful state include, for example, "En", "Fu" and "Hao". If the preset voice text information is "Hello, this is XX service company, how may I help you", a modal particle can be prepended to give "En, hello, this is XX service company, how may I help you", so that the voice robot speaks with natural, breathing pauses and does not sound mechanical. Words that fit the semantic context, such as the habitual catchphrases of the customer service personnel, can also be added, and the robot responds when the client mentions them, increasing the client's desire to communicate.
Referring to fig. 3, in an embodiment of the present application, there is further provided a computer device, which may be a server, and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store data such as the customer voice library. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of determining a voice robot by voiceprint matching.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method for determining a voice robot by voiceprint matching, specifically:
s1: analyzing the information of the calling number and matching with the code;
s2: sending preset voice text information to the client, and waiting for the client's reply as the voice to be processed;
s3: establishing a voiceprint recognition model, and putting the voice to be processed into the voiceprint recognition model for processing to obtain voice voiceprint data of the client;
s4: the voice voiceprint data are put into a customer voice library for matching, and if the voice voiceprint data are not matched, the codes and the voice voiceprint data are transferred into the customer voice library;
s5: if so, calling out voiceprint data of matched customer service personnel, and simulating the voiceprint data of the customer service personnel to communicate with a client.
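The control flow of steps S1 to S5 can be sketched as follows. All helper callables and the code-derivation rule are hypothetical placeholders standing in for the steps above, not part of the patent's disclosure:

```python
def handle_call(number, voice_library, extract_voiceprint, send_greeting,
                simulate_agent):
    """Sketch of S1-S5; every callable argument is a hypothetical stand-in."""
    code = number.strip()[-4:]              # S1: derive a code from the number
    reply = send_greeting(number)           # S2: send preset text, await reply
    voiceprint = extract_voiceprint(reply)  # S3: voiceprint recognition model
    agent = voice_library.get(code)         # S4: match against the voice library
    if agent is None:
        voice_library[code] = voiceprint    # S4: no match: store code + data
        return "stored"
    return simulate_agent(agent, voiceprint)  # S5: simulate agent's voiceprint

library = {}
result = handle_call("13800001234", library,
                     extract_voiceprint=lambda r: [0.1, 0.2],
                     send_greeting=lambda n: "hello",
                     simulate_agent=lambda a, v: "talking")
print(result)   # "stored" on the first call; "talking" once the code is known
```

On a repeat call with the same number, the library lookup succeeds and the S5 branch runs instead of the S4 store.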
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by instructing the relevant hardware through a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application or direct or indirect application in other related technical fields are included in the scope of the present application.
Although embodiments of the present application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the application, the scope of which is defined in the appended claims and their equivalents.
Of course, the present application can be implemented in various other embodiments, and based on this embodiment, those skilled in the art can obtain other embodiments without any inventive effort, which fall within the scope of the present application.
Claims (10)
1. A method for judging a voice robot by voiceprint matching, comprising:
s1: analyzing the information of the calling number and matching with the code;
s2: sending preset voice text information to the client, and waiting for the client's reply as the voice to be processed;
s3: establishing a voiceprint recognition model, and putting the voice to be processed into the voiceprint recognition model for processing to obtain voice voiceprint data of the client;
s4: the voice voiceprint data are put into a customer voice library for matching, and if the voice voiceprint data are not matched, the codes and the voice voiceprint data are transferred into the customer voice library;
s5: if so, calling out voiceprint data of matched customer service personnel, and simulating the voiceprint data of the customer service personnel to communicate with a client.
2. The method for determining a voice robot according to claim 1, wherein before the step S3, the method comprises:
sampling voice to be processed to obtain a voice signal;
framing, windowing and extracting features of the voice signals;
training the characteristics and constructing a voiceprint model.
3. The method for determining a voice robot according to claim 2, wherein the step of sampling the voice to be processed to obtain a voice signal comprises:
the amplitude of the analog signal of the speech to be processed is sampled at a fixed frequency; this frequency is the sampling rate, representing the number of samples taken per second;
after the analog signal is sampled, the values of the original signal at each sampling point are obtained; these values are expressed as integers for efficient storage and transmission.
4. The method for determining a voice robot by voiceprint matching according to claim 3, wherein the steps of framing, windowing, and extracting features of the voice signal comprise:
dividing the voice signal on the time axis into a plurality of short segments, that is, splitting one long continuous voice signal, frame by frame, into individual signal frames; this process is called framing;
in the framing process, depending on the relationship between the frame interval and the frame length, the following cases can be obtained:
if the interval is equal to the length, there is no overlap between frames;
if the interval is smaller than the length, the frames overlap;
if the interval is greater than the length, then there will be gaps between frames that do not completely cover the original signal.
5. The method for determining a voice robot through voiceprint matching according to claim 4, wherein the steps of framing, windowing, and extracting features of the voice signal further comprise:
extracting the characteristics of the voice signal taking time as an independent variable;
after framing and windowing, for an audio frame x(n) of length N, where 0 ≤ n ≤ N-1, the short-time energy can be expressed simply as: E = Σ_{n=0}^{N-1} x(n)².
6. The method according to claim 5, wherein in the audio signal processing, the discrete voice signal after framing and windowing is analyzed; each signal frame is a finite discrete signal, and its frequency-domain characteristics are calculated using the discrete Fourier transform: X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πkn/N}, 0 ≤ k ≤ N-1;
after the discrete Fourier transform, e is the base of the natural exponential, and X(k) is the complex spectrum of the speech signal x(n).
7. The method for determining a voice robot according to claim 1, wherein the step S4 comprises:
a(i) is the voice voiceprint data of the client, a(j) is voice voiceprint data in the voice library, ω is their similarity, and a judgment threshold T is set:
ω ≥ T: true reject (TR);
ω < T: false accept (FA);
a true reject (TR) means the match is correct, and a false accept (FA) means the match is an error.
8. The method for voice robot determination by voiceprint matching of claim 7, wherein the similarity values can further be connected into a similarity curve,
with the voice voiceprint data of the client as the abscissa and the voice voiceprint data in the voice library as the ordinate; each judgment threshold T then corresponds to a point on this plane.
9. The method for determining a voice robot according to claim 1, wherein the step S5 comprises:
s51, pre-emphasis processing is carried out on the audio signal to lift the attenuated high-frequency energy;
s52, framing and windowing the pre-emphasized signal;
s53, performing a fast Fourier transform on each frame of the signal to obtain its frequency spectrum;
s54, passing the frequency spectrum through a set of triangular filter banks designed according to the Mel scale to obtain a filtered result;
s55, correcting for the human ear's nonlinear perception of sound intensity by using a logarithmic function;
s56, calculating the cepstrum through an inverse discrete Fourier transform;
s57, the previous step yields 12 cepstral coefficients; the energy of the frame is added as a 13th feature, the first-order and second-order differences of these 13 features are calculated across adjacent frames, and finally 39 features are obtained; these 39 features are the MFCC features.
10. The method for voice robot determination by voiceprint matching of claim 9, wherein the matched customer service personnel voiceprint data is compared with the current voiceprint database, and the 39 MFCC features are further adjusted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310519066.0A CN116631412A (en) | 2023-05-10 | 2023-05-10 | Method for judging voice robot through voiceprint matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310519066.0A CN116631412A (en) | 2023-05-10 | 2023-05-10 | Method for judging voice robot through voiceprint matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116631412A true CN116631412A (en) | 2023-08-22 |
Family
ID=87620457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310519066.0A Pending CN116631412A (en) | 2023-05-10 | 2023-05-10 | Method for judging voice robot through voiceprint matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116631412A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116884437A (en) * | 2023-09-07 | 2023-10-13 | 北京惠朗时代科技有限公司 | Speech recognition processor based on artificial intelligence |
CN117153185A (en) * | 2023-10-31 | 2023-12-01 | 建信金融科技有限责任公司 | Call processing method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||