An automatic answering system and method applied to conversation service scenarios
Technical field
The invention belongs to the technical field of telephone communication, and in particular relates to an automatic answering system and method applied to conversation service scenarios.
Background art
In daily communication between an enterprise and its clients, considerable labor cost and staff energy are consumed. Service personnel differ in knowledge and experience, and in comprehension, expression, mood and diction, so service quality is uneven, which in turn affects the service result. Intelligent voice systems were therefore created to replace human agents in communicating with clients.
At present, an intelligent voice system can use ASR (real-time speech recognition) and NLP (natural language understanding) to let a machine understand human speech in real time and conduct intelligent AI communication in scenarios such as customer service and sales. Large-scale corpus training on human speech yields, for a given scenario, a recognition model of acceptable quality. The voice gateway module sends human speech to the ASR in real time to obtain a recognition result in text form, which is used for keyword matching or semantic processing; a preset answer is then retrieved and played back as audio, so that human and machine voices are matched in communication.
Although existing schemes support language communication between a voice gateway module and a human, the exchange between the person and the voice gateway module is essentially question-and-answer in form. It is difficult to achieve human-level barge-in, so the exchange is rigid and unnatural. If the voice gateway module is unmoved by a caller's sudden barge-in, it appears rude and the exchange unfriendly: the user must listen to the entire preset script of the voice gateway module and cannot interrupt or raise questions while the script is being played, making timely and quick communication hard to achieve. On the other hand, a caller's interruption may carry a more urgent question; if the dialog is not switched to the relevant node in time, the client's time is wasted. In summary, existing speech-exchange schemes between an intelligent voice gateway module and a person still need improvement in interactive experience and communication efficiency.
Summary of the invention
The object of the invention is to overcome the defects and deficiencies mentioned above by providing an automatic answering system applied to conversation service scenarios.
Another object of the invention is to provide an automatic answering method applied to conversation service scenarios.
An automatic answering system applied to conversation service scenarios is characterized by comprising a client session processing module and an AI session engine module.
The client session processing module serves as the adaptation layer before interaction, assisting with content recognition and event recognition before the scene dialog; it includes the client modules and the voice gateway module. The client modules are devices with a voice communication function, such as mobile phones and landlines, or social tools capable of text communication. The voice gateway module actively initiates or answers calls according to a dial plan; after the circuit is connected, it generates the corresponding ESL events for the series of actions produced by the client modules and issues them to the AI session engine module, and it receives and executes the corresponding actions from the AI session engine module.
The AI session engine module records and controls the current session state and, combining the events passed in by the voice gateway module, issues different instructions to the client session processing module.
Further, the voice gateway module converts the communication protocols of the various client-module channels into a unified communication protocol and performs event recognition. Within the life cycle of one connected call, the following core events can be generated:
1. SPEECH CONNET: generated after a communication connection is established with the client modules;
2. SPEECH CHANNEL_ANSWER: generated after the client modules answer the call;
3. SPEECH CHANNEL_EXECUTE: generated when playback of a segment of voice to the client modules begins;
4. SPEECH CHANNEL_EXECUTE_COMPLETE: generated after the voice finishes playing;
5. SPEECH ASR_START: generated after an incoming voice stream from the client modules is detected;
6. SPEECH ASR_END: generated after the client's voice stream ends; this event carries the client's ASR result, i.e. the speech transcription;
7. SPEECH HANGUP: generated after either party actively hangs up.
Further, the voice gateway module has event-generation capability and ASR understanding capability; it converts the voice stream passed in by the client modules into text and issues it to the AI session engine module in the form of an event.
Further, the AI session engine module internally contains a dialog control module, a tree-shaped data storage structure, a general knowledge base and a system knowledge base. The data storage structure is the storage component for dialog scripts; the general knowledge base is the storage component for general knowledge; the system knowledge base is the storage component for system knowledge.
Further, the session state is divided into: voice gateway module not playing, voice gateway module playing, voice gateway module paused, and call ended. The instructions that the session state combined with the incoming events of the voice gateway module can produce include: initialize, actively initiate dialog, the peer is speaking, the peer is silent, and the peer is interrupting the voice gateway module.
The session-state transfer logic of the AI session engine module is as follows:
1. AI CONNET: when the incoming event "SPEECH CONNET" of the voice gateway module arrives, an initialization instruction is generated and sent to the voice gateway module; the session state does not change;
2. AI CHANNEL_ANSWER: when the incoming event "SPEECH CHANNEL_ANSWER" arrives, an instruction that the peer is speaking is generated and sent to the voice gateway module, and the session state transfers to "voice gateway module not playing"; after "SPEECH CHANNEL_ANSWER", the dialog script is executed and an instruction to actively initiate dialog is generated;
3. AI CHANNEL_EXECUTE: when the incoming event "SPEECH CHANNEL_EXECUTE" arrives, the current session state is judged; if the state is "paused", it is not changed; otherwise the session state transfers to "voice gateway module playing" and an instruction that the peer is silent is generated;
4. AI CHANNEL_EXECUTE_COMPLETE: when the incoming event "SPEECH CHANNEL_EXECUTE_COMPLETE" arrives, the session state transfers to "voice gateway module not playing";
5. AI ASR_START: when the incoming event "SPEECH ASR_START" arrives, if the current state is "voice gateway module playing", an instruction that the peer is interrupting the voice gateway module is generated; the AI session engine module recognizes the interrupting speech through the speech recognition module and decides whether to pause the voice gateway module playback, transferring the session state to "paused" or leaving it unchanged: if playback is paused, the session state transfers to "paused" and an instruction that the peer is speaking is generated and sent to the voice gateway module; if playback is not paused, the session state is not transferred;
6. AI ASR_END: according to the event returned by the voice gateway module, it is decided whether to hang up while paused; an instruction to actively initiate dialog is generated, and the session state transfers to "call ended" or to "voice gateway module playing";
7. AI HANGUP: generated after either party actively hangs up.
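The state-transfer rules above can be sketched as a small event-driven state machine. This is an illustrative sketch rather than the patented implementation: the state and instruction names are invented for readability, and the decision of whether a barge-in is significant is passed in as a flag instead of being derived from speech recognition.

```python
from enum import Enum, auto

class State(Enum):
    NOT_PLAYING = auto()   # voice gateway module not playing
    PLAYING = auto()       # voice gateway module playing
    PAUSED = auto()        # playback paused because the peer spoke
    ENDED = auto()         # call has ended

class Session:
    """Minimal sketch of the session-state transfer logic."""

    def __init__(self):
        self.state = None          # no session state before the channel is answered
        self.instructions = []     # instructions sent back to the voice gateway module

    def on_event(self, event, interrupt=False):
        if event == "SPEECH CONNET":                       # rule 1: initialize, state unchanged
            self.instructions.append("INIT")
        elif event == "SPEECH CHANNEL_ANSWER":             # rule 2: start dialog, not playing
            self.state = State.NOT_PLAYING
            self.instructions.append("INITIATE_DIALOG")
        elif event == "SPEECH CHANNEL_EXECUTE":            # rule 3: playback began, unless paused
            if self.state is not State.PAUSED:
                self.state = State.PLAYING
                self.instructions.append("PEER_SILENT")
        elif event == "SPEECH CHANNEL_EXECUTE_COMPLETE":   # rule 4: playback finished
            self.state = State.NOT_PLAYING
        elif event == "SPEECH ASR_START":                  # rule 5: peer spoke during playback
            if self.state is State.PLAYING and interrupt:
                self.state = State.PAUSED
                self.instructions.append("PEER_INTERRUPTING")
        elif event == "SPEECH HANGUP":                     # rule 7: either side hung up
            self.state = State.ENDED
        return self.state
```

Note that a "paused" session ignores a subsequent SPEECH CHANNEL_EXECUTE, matching rule 3 above.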
An automatic answering method using the above system comprises the following steps:
Step 1. The client modules and the voice gateway module establish a bidirectional communication path, generating the event "SPEECH CONNET".
Step 2. The AI session engine module generates the event "AI CONNET" and produces an initialization instruction: it reads the dialog scripts, reads the knowledge bases, loads the global ambiguity dictionary, and returns the initialization instruction to the client modules via the voice gateway module.
Step 3. The client modules answer the call, generating the event "SPEECH CHANNEL_ANSWER". The voice gateway module receives the utterance sent by the client modules and sends it to the AI session engine module for dialog-script and knowledge matching; the AI session engine module generates the event "AI CHANNEL_ANSWER".
Step 3a. The utterance is corrected for ambiguous words; ambiguity correction includes homophone correction and synonym correction.
Step 3b. According to the session node that the current utterance has reached, the script keywords in the utterance are obtained, and regular-expression matching is applied to the session content of the utterance to match an intent branch.
If the utterance matches an intent branch, the next connected script node is fetched according to the tree-shaped data storage structure. Script nodes are divided into ordinary nodes and jump nodes.
Ordinary node: the script that answers the user; step 4 is executed, and an action flag for sending a short message can be added.
Jump node: divided into the following actions: 1. jump to the next main-flow node and continue answering the user with the script, then execute step 4; 2. jump to a specified main-flow node and answer the user with the script, then execute step 4; 3. hang-up action, ending the session; 4. send the short message required by the user.
After the feedback actions of the script node have been collected, the actions are sent to the voice gateway module for execution.
If no intent branch is matched, knowledge-base matching is performed. The knowledge bases are mainly divided into two classes:
General knowledge base: the user's business question is answered, and step 4 is executed;
System knowledge base, subdivided into three classes: 1. unanswerable handling: no answer can be retrieved for the user's question, so the AI session engine module executes a preset voice playback action and step 4 is executed; 2. interrupt handling: while an action fed back by the AI session engine module is being executed, the user interrupts midway, the AI engine stops the action in progress, and step 5 is executed; 3. repeat handling: the user did not hear or did not understand the fed-back action, and the AI session engine module repeats the previous action once.
Step 4. The voice gateway module generates the event "SPEECH CHANNEL_EXECUTE" and the AI session engine module generates the event "AI CHANNEL_EXECUTE". The AI session engine module judges whether a segment of utterance has been fully received. When it has, the AI session engine module generates an instruction to initiate dialog, sends the matched script to the voice gateway module, and transfers the session state to "voice gateway module playing". The voice gateway module starts playing the matched script audio to the client modules; step 3 is then executed.
Step 5. While the script audio is playing, if the voice gateway module detects an incoming voice stream from the client modules, it filters noise from the voice stream and forwards it to the AI session engine module for filter-word processing of the interruption; that is, the voice gateway module generates the events "SPEECH ASR_START" and "SPEECH ASR_END". If the voice stream consists entirely of filter words, the AI session engine module generates no new action. If it does not, the AI session engine module executes an interrupt operation and sends an interrupt instruction to the voice gateway module, which stops playing the script audio; that is, the AI session engine module generates the events "AI ASR_START" and "AI ASR_END", and step 3 is executed.
Further, in step 4, the AI session engine module judges whether a segment of utterance has been fully received using the following method:
The AI session engine module samples the session, presetting N sample points within a time period T. N, the total number of sample points, is the sum of the number of fixed sample points n1 and the number of random sample points n2, i.e. N = n1 + n2. The sampling times of the fixed sample points are x·t1 ± t2, where x is a positive integer not exceeding n1, t1 = T/n1, and t2 is a time interval generated by a random function with 0 < t2 < t1. The n2 random sample points are collected at random within time T, with 0 < n2 ≤ n1/2.
When the data recorded at a sample point is voiced, it is a valid sample point; when it is silent, it is an invalid sample point. When the valid sample points exceed half of the total number N, the utterance is judged not yet finished; otherwise the utterance is judged fully received.
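A minimal sketch of this end-of-utterance judgment, assuming each of the N sample points has already been classified as voiced or silent:

```python
def speech_finished(samples):
    """samples: list of booleans, one per sample point, True if voiced.
    Per the scheme above, if the voiced (valid) points exceed half the
    total, the caller is judged to still be speaking; otherwise the
    utterance is judged fully received."""
    voiced = sum(samples)
    return voiced <= len(samples) / 2
```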
The object of the invention is to use AI-related technology to realize interactive scenes between a machine and real people, with dialog interfaces covering scenarios such as telephone, network voice and IM messages. The invention can help an enterprise establish an intelligent customer-service system in which the great majority of repetitive work is executed by machines, provide clients with product introduction and guided-service interaction, and help the enterprise collect the significant data produced during interaction, providing a data basis for subsequent big-data analysis.
Compared with similar products in the current industry, the knowledge retrieval of the invention is fast. The access forms are diverse, supporting voice, text, or a mixture of the two, and other business access channels can be added. Compared with conventional intelligent voice access it is more flexible and has a wider range of application.
Description of the drawings
Fig. 1 is the structural block diagram of the system;
Fig. 2 is the flow chart of the invention;
Fig. 3 is the dialog-script processing sequence diagram.
Specific embodiment
An automatic answering system applied to conversation service scenarios includes a client session processing module and an AI session engine module.
The client session processing module serves as the adaptation layer before interaction, assisting with content recognition and event recognition before the scene dialog; it includes the client modules and the voice gateway module.
The client modules are devices with a voice communication function, such as mobile phones and landlines, or social tools capable of text communication, such as WeChat and WeChat official accounts.
The voice gateway module actively initiates or answers calls according to a dial plan; after the circuit is connected, it generates the corresponding ESL (Event Socket Library) events for the series of actions produced by the client modules and issues them to the AI session engine module, and it receives and executes the corresponding actions from the AI session engine module.
The voice gateway module converts the communication protocols of the various client-module channels into a unified communication protocol and performs event recognition. Within the life cycle of one connected call, the following core events can be generated:
1. SPEECH CONNET: generated after a communication connection is established with the client modules;
2. SPEECH CHANNEL_ANSWER: generated after the client modules answer the call;
3. SPEECH CHANNEL_EXECUTE: generated when playback of a segment of voice to the client modules begins;
4. SPEECH CHANNEL_EXECUTE_COMPLETE: generated after the voice finishes playing;
5. SPEECH ASR_START: generated after an incoming voice stream from the client modules is detected;
6. SPEECH ASR_END: generated after the client's voice stream ends; this event carries the client's ASR result, i.e. the speech transcription;
7. SPEECH HANGUP: generated after either party actively hangs up.
Under normal circumstances these events occur in a largely fixed order: SPEECH CHANNEL_EXECUTE and SPEECH CHANNEL_EXECUTE_COMPLETE mostly occur one after the other in pairs, as do SPEECH ASR_START and SPEECH ASR_END. The last event to occur is SPEECH HANGUP.
The voice gateway module has event-generation capability and ASR (Automatic Speech Recognition) understanding capability; it can convert the voice stream passed in by the client modules into text and issue it to the AI session engine module in the form of an event. For example, the voice gateway module integrates a speech recognition module, which may be Aitalk 2.0 of the iFLYTEK company, InterReco 2.0, etc.
The AI session engine module records and controls the current session state and, combining the events passed in by the voice gateway module, issues different instructions to the client session processing module. It internally contains a dialog control module, a tree-shaped data storage structure, a general knowledge base and a system knowledge base. The data storage structure is the storage component for dialog scripts. The general knowledge base is the storage component for general knowledge. The system knowledge base is the storage component for system knowledge, so that professional knowledge of a field can be added or removed according to the domain requirements of the dialog.
The session state is divided into: voice gateway module not playing, voice gateway module playing, voice gateway module paused, and call ended. The instructions that the session state combined with the incoming events of the voice gateway module can produce include: initialize, actively initiate dialog, the peer is speaking, the peer is silent, and the peer is interrupting the voice gateway module.
The session-state transfer logic of the AI session engine module is as follows:
1. AI CONNET: when the incoming event "SPEECH CONNET" of the voice gateway module arrives, an initialization instruction is generated and sent to the voice gateway module; the session state does not change;
2. AI CHANNEL_ANSWER: when the incoming event "SPEECH CHANNEL_ANSWER" arrives, an instruction that the peer is speaking is generated and sent to the voice gateway module, and the session state transfers to "voice gateway module not playing"; after "SPEECH CHANNEL_ANSWER", the dialog script is executed and an instruction to actively initiate dialog is generated;
3. AI CHANNEL_EXECUTE: when the incoming event "SPEECH CHANNEL_EXECUTE" arrives, the current session state is judged; if the state is "paused", it is not changed; otherwise the session state transfers to "voice gateway module playing" and an instruction that the peer is silent is generated;
4. AI CHANNEL_EXECUTE_COMPLETE: when the incoming event "SPEECH CHANNEL_EXECUTE_COMPLETE" arrives, the session state transfers to "voice gateway module not playing";
5. AI ASR_START: when the incoming event "SPEECH ASR_START" arrives, if the current state is "voice gateway module playing", an instruction that the peer is interrupting the voice gateway module is generated; the AI session engine module recognizes the interrupting speech through the speech recognition module and decides whether to pause the voice gateway module playback, transferring the session state to "paused" or leaving it unchanged: if playback is paused, the session state transfers to "paused" and an instruction that the peer is speaking is generated and sent to the voice gateway module; if playback is not paused, the session state is not transferred;
6. AI ASR_END: according to the event returned by the voice gateway module, it is decided whether to hang up while paused; an instruction to actively initiate dialog is generated, and the session state transfers to "call ended" or to "voice gateway module playing";
7. AI HANGUP: generated after either party actively hangs up.
The automatic answering method applied to conversation service scenarios comprises the following steps:
Step 1. The client modules and the voice gateway module establish a bidirectional communication path, generating the event "SPEECH CONNET".
Step 2. The AI session engine module generates the event "AI CONNET" and produces an initialization instruction: it reads the dialog scripts, reads the knowledge bases, loads the global ambiguity dictionary, and returns the initialization instruction to the client modules via the voice gateway module.
Step 3. The client modules answer the call, generating the event "SPEECH CHANNEL_ANSWER". The voice gateway module receives the utterance sent by the client modules and sends it to the AI session engine module for dialog-script and knowledge matching; the AI session engine module generates the event "AI CHANNEL_ANSWER".
Step 3a. The utterance is corrected for ambiguous words; ambiguity correction includes homophone correction and synonym correction.
Step 3b. According to the session node that the current utterance has reached, the script keywords in the utterance are obtained, and regular-expression matching is applied to the session content of the utterance to match an intent branch.
If the utterance matches an intent branch, the next connected script node is fetched according to the tree-shaped data storage structure. Script nodes are divided into ordinary nodes and jump nodes.
Ordinary node: the script that answers the user; step 4 is executed, and an action flag for sending a short message can be added.
Jump node: divided into the following actions: 1. jump to the next main-flow node and continue answering the user with the script, then execute step 4; 2. jump to a specified main-flow node and answer the user with the script, then execute step 4; 3. hang-up action, ending the session; 4. send the short message required by the user.
After the feedback actions of the script node have been collected, the actions are sent to the voice gateway module for execution.
If no intent branch is matched, knowledge-base matching is performed. The knowledge bases are mainly divided into two classes:
General knowledge base: the user's business question is answered, and step 4 is executed;
System knowledge base, subdivided into three classes: 1. unanswerable handling: no answer can be retrieved for the user's question, so the AI session engine module executes a preset voice playback action and step 4 is executed; 2. interrupt handling: while an action fed back by the AI session engine module is being executed, the user interrupts midway, the AI engine stops the action in progress, and step 5 is executed; 3. repeat handling: the user did not hear or did not understand the fed-back action, and the AI session engine module repeats the previous action once.
Step 4. The voice gateway module generates the event "SPEECH CHANNEL_EXECUTE" and the AI session engine module generates the event "AI CHANNEL_EXECUTE". The AI session engine module judges whether a segment of utterance has been fully received. When it has, the AI session engine module generates an instruction to initiate dialog, sends the matched script to the voice gateway module, and transfers the session state to "voice gateway module playing". The voice gateway module starts playing the matched script audio to the client modules; step 3 is then executed.
The AI session engine module judges whether a segment of utterance has been fully received using the following method:
The AI session engine module samples the session, presetting N sample points within a time period T. N, the total number of sample points, is the sum of the number of fixed sample points n1 and the number of random sample points n2, i.e. N = n1 + n2. The sampling times of the fixed sample points are x·t1 ± t2, where x is a positive integer not exceeding n1, t1 = T/n1, and t2 is a time interval generated by a random function with 0 < t2 < t1. The n2 random sample points are collected at random within time T, with 0 < n2 ≤ n1/2.
When the data recorded at a sample point is voiced, it is a valid sample point; when it is silent, it is an invalid sample point. When the valid sample points exceed half of the total number N, the utterance is judged not yet finished; otherwise the utterance is judged fully received.
For example, 35 sample points are set within a preset period of 30 seconds: the number of fixed sample points n1 is 30 and the number of random sample points n2 is 5. Each fixed sample point is collected within t2 before or after its whole-second position, and the random sample points are collected at random within the 30 seconds.
This method fully guarantees the randomness and evenness of the sample points. Some users speak with a regular rhythm; if the interval between fixed sample points were a fixed value, the two rhythms could easily overlap, causing several consecutive sample points to all fall on voiced points or all on silent points. A traditional fixed sample point therefore has a one-sidedness. In this method each fixed collection point is taken within t2 before or after the fixed interval point t1, the fixed interval points x·t1 are spread over the whole time interval T, and the interval between adjacent collection points is randomized.
Meanwhile, the method also sets random sample points, avoiding the defect that the time start point or end point cannot be sampled. For example, since the first fixed sample point lies in (t1 ± t2) and 0 < t2 < t1, the time start point itself can never be sampled by a fixed point. If t2 = t1, three adjacent fixed sample points could fall on the same point, increasing the overlap rate; therefore t2 ≠ t1. At the same time, the random sample points randomly increase the sampling density and improve the fidelity of the sampling.
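The jittered sampling schedule of the worked example can be sketched as follows. The jitter distribution is an assumption of this sketch (uniform in (0, t1)); the text only requires that t2 come from a random function with 0 < t2 < t1, and the clamping to [0, T] is likewise an illustrative choice.

```python
import random

def sampling_schedule(T=30.0, n1=30, n2=5, seed=0):
    """Sketch of the sampling scheme: n1 fixed points at x*t1, each
    jittered by a random t2 (0 < t2 < t1) before or after, plus n2
    uniformly random points in [0, T]. Defaults follow the worked
    example in the text (T=30 s, n1=30, n2=5, so N=35)."""
    rng = random.Random(seed)
    t1 = T / n1
    fixed = []
    for x in range(1, n1 + 1):
        t2 = rng.uniform(0.0, t1)            # jitter magnitude, below t1
        sign = rng.choice((-1, 1))           # before or after the fixed point
        fixed.append(min(max(x * t1 + sign * t2, 0.0), T))
    random_pts = [rng.uniform(0.0, T) for _ in range(n2)]
    return sorted(fixed + random_pts)
```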
Step 5. While the script audio is playing, if the voice gateway module detects an incoming voice stream from the client modules, it filters noise from the voice stream and forwards it to the AI session engine module for filter-word processing of the interruption; that is, the voice gateway module generates the events "SPEECH ASR_START" and "SPEECH ASR_END". If the voice stream consists entirely of filter words, the AI session engine module generates no new action. If it does not, the AI session engine module executes an interrupt operation and sends an interrupt instruction to the voice gateway module, which stops playing the script audio; that is, the AI session engine module generates the events "AI ASR_START" and "AI ASR_END", and step 3 is executed. Filter words are words without essential meaning, such as "uh", "eh", "good".
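The filter-word check of step 5 can be sketched as a simple set-membership test over the tokens of the ASR transcript. The word list here is illustrative only; a deployment would load it from configuration.

```python
# Illustrative filter-word list (words without essential meaning).
FILLER_WORDS = {"uh", "um", "eh", "ok", "okay", "right", "good"}

def is_all_filler(transcript):
    """True if every token of the ASR transcript is a filter word, in
    which case no interrupt instruction is generated and playback of
    the script audio continues."""
    tokens = transcript.lower().split()
    return bool(tokens) and all(t in FILLER_WORDS for t in tokens)
```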
The invention can answer telephone calls and any communication-class tool such as WeChat, Weibo and web IM, and is suitable for covering many communication tools: only the access-tool adaptation needs to be extended, and the underlying intelligent-answer engine is adapted without modification. A single configuration applies to scenarios across channels such as enterprise telephone customer service and WeChat customer service, rapidly expanding the user interaction channels and experience while reducing the enterprise's maintenance cost.
To improve the overall response speed of the system and handle possibly massive concurrent requests, the communication architecture between the event processing module and the voice gateway module is built on the high-performance Netty network framework widely used in the current industry.
In handling interruptions of voice communication, the invention is more humanized than the traditional approach: answers in conventional communication that do not require an interruption, such as "right" and "good", are recognized, and no interruption is made.
Compared with traditional training methods, the invention provides more training information, including the matching category of the knowledge point, keywords, ambiguous words, which other intent branches were matched, and which knowledge points were preferentially selected, improving the efficiency with which trainers tune the dialog scripts and investigate problems.
The conversational mode is flexible: dialog may be actively initiated or a service session may be received, and simple modifications of either should all fall within the framework of this technical solution.
It is to be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution of the invention and its inventive concept, and all such changes or replacements shall belong to the protection scope of the appended claims of the invention.