An automatic answering system and method applied to conversation service scenarios
Technical field
The invention belongs to the technical field of telephone communication, and in particular relates to an automatic answering system and method applied to conversation service scenarios.
Background art
In daily communication between an enterprise and its clients, considerable labor cost and staff energy are consumed. Service personnel differ in knowledge and experience, and in comprehension, expression, mood and diction, so service quality is uneven, which in turn affects the service result. Intelligent voice systems were therefore created to replace human agents in communicating with clients.
At present, an intelligent voice system can use ASR (real-time speech recognition) and NLP (natural language understanding) to let a machine understand human speech in real time and conduct intelligent AI communication in scenarios such as customer service and sales. Large-scale corpus training on human speech yields, for a given scenario, a recognition model of acceptable quality. The voice gateway module sends human speech to the ASR in real time to obtain a recognition result in text form, which is used for keyword matching or semantic processing; a preset answer is then retrieved and played back as audio, so that human and machine voices are matched in communication.
Although existing schemes support language communication between a voice gateway module and a human, the exchange between the person and the voice gateway module is essentially question-and-answer in form. It is difficult to achieve human-level barge-in, so the exchange is rigid and unnatural. If the voice gateway module is unmoved by a caller's sudden barge-in, it appears rude and the exchange unfriendly: the user must listen to the entire preset script of the voice gateway module and cannot interrupt or raise questions while the script is being played, making timely and quick communication hard to achieve. On the other hand, a caller's interruption may carry a more urgent question; if the dialog is not switched to the relevant node in time, the client's time is wasted. In summary, existing speech-exchange schemes between an intelligent voice gateway module and a person still need improvement in interactive experience and communication efficiency.
Summary of the invention
The object of the invention is to overcome the defects and deficiencies mentioned above by providing an automatic answering system applied to conversation service scenarios.
Another object of the invention is to provide an automatic answering method applied to conversation service scenarios.
An automatic answering system applied to conversation service scenarios is characterized by comprising a client session processing module and an AI session engine module.
The client session processing module serves as the adaptation layer before interaction, assisting with content recognition and event recognition before the scene dialog; it includes the client modules and the voice gateway module. The client modules are devices with a voice communication function, such as mobile phones and landlines, or social tools capable of text communication. The voice gateway module actively initiates or answers calls according to a dial plan; after the circuit is connected, it generates the corresponding ESL events for the series of actions produced by the client modules and issues them to the AI session engine module, and it receives and executes the corresponding actions from the AI session engine module.
The AI session engine module records and controls the current session state and, combining the events passed in by the voice gateway module, issues different instructions to the client session processing module.
Further, the voice gateway module converts the communication protocols of the various client-module channels into a unified communication protocol and performs event recognition. Within the life cycle of one connected call, the following core events can be generated:
1. SPEECH CONNET: generated after a communication connection is established with the client modules;
2. SPEECH CHANNEL_ANSWER: generated after the client modules answer the call;
3. SPEECH CHANNEL_EXECUTE: generated when playback of a segment of voice to the client modules begins;
4. SPEECH CHANNEL_EXECUTE_COMPLETE: generated after the voice finishes playing;
5. SPEECH ASR_START: generated after an incoming voice stream from the client modules is detected;
6. SPEECH ASR_END: generated after the client's voice stream ends; this event carries the client's ASR result, i.e. the speech transcription;
7. SPEECH HANGUP: generated after either party actively hangs up.
Further, the voice gateway module has event-generation capability and ASR understanding capability; it converts the voice stream passed in by the client modules into text and issues it to the AI session engine module in the form of an event.
Further, the AI session engine module internally contains a dialog control module, a tree-shaped data storage structure, a general knowledge base and a system knowledge base. The data storage structure is the storage component for dialog scripts; the general knowledge base is the storage component for general knowledge; the system knowledge base is the storage component for system knowledge.
Further, the session state is divided into: voice gateway module not playing, voice gateway module playing, voice gateway module paused, and call ended. The instructions that the session state combined with the incoming events of the voice gateway module can produce include: initialize, actively initiate dialog, the peer is speaking, the peer is silent, and the peer is interrupting the voice gateway module.
The session-state transfer logic of the AI session engine module is as follows:
1. AI CONNET: when the incoming event "SPEECH CONNET" of the voice gateway module arrives, an initialization instruction is generated and sent to the voice gateway module; the session state does not change;
2. AI CHANNEL_ANSWER: when the incoming event "SPEECH CHANNEL_ANSWER" arrives, an instruction that the peer is speaking is generated and sent to the voice gateway module, and the session state transfers to "voice gateway module not playing"; after "SPEECH CHANNEL_ANSWER", the dialog script is executed and an instruction to actively initiate dialog is generated;
3. AI CHANNEL_EXECUTE: when the incoming event "SPEECH CHANNEL_EXECUTE" arrives, the current session state is judged; if the state is "paused", it is not changed; otherwise the session state transfers to "voice gateway module playing" and an instruction that the peer is silent is generated;
4. AI CHANNEL_EXECUTE_COMPLETE: when the incoming event "SPEECH CHANNEL_EXECUTE_COMPLETE" arrives, the session state transfers to "voice gateway module not playing";
5. AI ASR_START: when the incoming event "SPEECH ASR_START" arrives, if the current state is "voice gateway module playing", an instruction that the peer is interrupting the voice gateway module is generated; the AI session engine module recognizes the interrupting speech through the speech recognition module and decides whether to pause the voice gateway module playback, transferring the session state to "paused" or leaving it unchanged: if playback is paused, the session state transfers to "paused" and an instruction that the peer is speaking is generated and sent to the voice gateway module; if playback is not paused, the session state is not transferred;
6. AI ASR_END: according to the event returned by the voice gateway module, it is decided whether to hang up while paused; an instruction to actively initiate dialog is generated, and the session state transfers to "call ended" or to "voice gateway module playing";
7. AI HANGUP: generated after either party actively hangs up.
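The state-transfer rules above can be sketched as a small event-driven state machine. This is an illustrative sketch rather than the patented implementation: the state and instruction names are invented for readability, and the decision of whether a barge-in is significant is passed in as a flag instead of being derived from speech recognition.

```python
from enum import Enum, auto

class State(Enum):
    NOT_PLAYING = auto()   # voice gateway module not playing
    PLAYING = auto()       # voice gateway module playing
    PAUSED = auto()        # playback paused because the peer spoke
    ENDED = auto()         # call has ended

class Session:
    """Minimal sketch of the session-state transfer logic."""

    def __init__(self):
        self.state = None          # no session state before the channel is answered
        self.instructions = []     # instructions sent back to the voice gateway module

    def on_event(self, event, interrupt=False):
        if event == "SPEECH CONNET":                       # rule 1: initialize, state unchanged
            self.instructions.append("INIT")
        elif event == "SPEECH CHANNEL_ANSWER":             # rule 2: start dialog, not playing
            self.state = State.NOT_PLAYING
            self.instructions.append("INITIATE_DIALOG")
        elif event == "SPEECH CHANNEL_EXECUTE":            # rule 3: playback began, unless paused
            if self.state is not State.PAUSED:
                self.state = State.PLAYING
                self.instructions.append("PEER_SILENT")
        elif event == "SPEECH CHANNEL_EXECUTE_COMPLETE":   # rule 4: playback finished
            self.state = State.NOT_PLAYING
        elif event == "SPEECH ASR_START":                  # rule 5: peer spoke during playback
            if self.state is State.PLAYING and interrupt:
                self.state = State.PAUSED
                self.instructions.append("PEER_INTERRUPTING")
        elif event == "SPEECH HANGUP":                     # rule 7: either side hung up
            self.state = State.ENDED
        return self.state
```

Note that a "paused" session ignores a subsequent SPEECH CHANNEL_EXECUTE, matching rule 3 above.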
An automatic answering method using the above system comprises the following steps:
Step 1. The client modules and the voice gateway module establish a bidirectional communication path, generating the event "SPEECH CONNET".
Step 2. The AI session engine module generates the event "AI CONNET" and produces an initialization instruction: it reads the dialog scripts, reads the knowledge bases, loads the global ambiguity dictionary, and returns the initialization instruction to the client modules via the voice gateway module.
Step 3. The client modules answer the call, generating the event "SPEECH CHANNEL_ANSWER". The voice gateway module receives the utterance sent by the client modules and sends it to the AI session engine module for dialog-script and knowledge matching; the AI session engine module generates the event "AI CHANNEL_ANSWER".
Step 3a. The utterance is corrected for ambiguous words; ambiguity correction includes homophone correction and synonym correction.
Step 3b. According to the session node that the current utterance has reached, the script keywords in the utterance are obtained, and regular-expression matching is applied to the session content of the utterance to match an intent branch.
If the utterance matches an intent branch, the next connected script node is fetched according to the tree-shaped data storage structure. Script nodes are divided into ordinary nodes and jump nodes.
Ordinary node: the script that answers the user; step 4 is executed, and an action flag for sending a short message can be added.
Jump node: divided into the following actions: 1. jump to the next main-flow node and continue answering the user with the script, then execute step 4; 2. jump to a specified main-flow node and answer the user with the script, then execute step 4; 3. hang-up action, ending the session; 4. send the short message required by the user.
After the feedback actions of the script node have been collected, the actions are sent to the voice gateway module for execution.
If no intent branch is matched, knowledge-base matching is performed. The knowledge bases are mainly divided into two classes:
General knowledge base: the user's business question is answered, and step 4 is executed;
System knowledge base, subdivided into three classes: 1. unanswerable handling: no answer can be retrieved for the user's question, so the AI session engine module executes a preset voice playback action and step 4 is executed; 2. interrupt handling: while an action fed back by the AI session engine module is being executed, the user interrupts midway, the AI engine stops the action in progress, and step 5 is executed; 3. repeat handling: the user did not hear or did not understand the fed-back action, and the AI session engine module repeats the previous action once.
Step 4. The voice gateway module generates the event "SPEECH CHANNEL_EXECUTE" and the AI session engine module generates the event "AI CHANNEL_EXECUTE". The AI session engine module judges whether a segment of utterance has been fully received. When it has, the AI session engine module generates an instruction to initiate dialog, sends the matched script to the voice gateway module, and transfers the session state to "voice gateway module playing". The voice gateway module starts playing the matched script audio to the client modules; step 3 is then executed.
Step 5. While the script audio is playing, if the voice gateway module detects an incoming voice stream from the client modules, it filters noise from the voice stream and forwards it to the AI session engine module for filter-word processing of the interruption; that is, the voice gateway module generates the events "SPEECH ASR_START" and "SPEECH ASR_END". If the voice stream consists entirely of filter words, the AI session engine module generates no new action. If it does not, the AI session engine module executes an interrupt operation and sends an interrupt instruction to the voice gateway module, which stops playing the script audio; that is, the AI session engine module generates the events "AI ASR_START" and "AI ASR_END", and step 3 is executed.
Further, in step 4, the AI session engine module judges whether a segment of utterance has been fully received using the following method:
The AI session engine module samples the session, presetting N sample points within a time period T. N, the total number of sample points, is the sum of the number of fixed sample points n1 and the number of random sample points n2, i.e. N = n1 + n2. The sampling times of the fixed sample points are x·t1 ± t2, where x is a positive integer not exceeding n1, t1 = T/n1, and t2 is a time interval generated by a random function with 0 < t2 < t1. The n2 random sample points are collected at random within time T, with 0 < n2 ≤ n1/2.
When the data recorded at a sample point is voiced, it is a valid sample point; when it is silent, it is an invalid sample point. When the valid sample points exceed half of the total number N, the utterance is judged not yet finished; otherwise the utterance is judged fully received.
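A minimal sketch of this end-of-utterance judgment, assuming each of the N sample points has already been classified as voiced or silent:

```python
def speech_finished(samples):
    """samples: list of booleans, one per sample point, True if voiced.
    Per the scheme above, if the voiced (valid) points exceed half the
    total, the caller is judged to still be speaking; otherwise the
    utterance is judged fully received."""
    voiced = sum(samples)
    return voiced <= len(samples) / 2
```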
The object of the invention is to use AI-related technology to realize interactive scenes between a machine and real people, with dialog interfaces covering scenarios such as telephone, network voice and IM messages. The invention can help an enterprise establish an intelligent customer-service system in which the great majority of repetitive work is executed by machines, provide clients with product introduction and guided-service interaction, and help the enterprise collect the significant data produced during interaction, providing a data basis for subsequent big-data analysis.
Compared with similar products in the current industry, the knowledge retrieval of the invention is fast. The access forms are diverse, supporting voice, text, or a mixture of the two, and other business access channels can be added. Compared with conventional intelligent voice access it is more flexible and has a wider range of application.
Description of the drawings
Fig. 1 is the structural block diagram of the system;
Fig. 2 is the flow chart of the invention;
Fig. 3 is the dialog-script processing sequence diagram.
Specific embodiment
An automatic answering system applied to conversation service scenarios includes a client session processing module and an AI session engine module.
The client session processing module serves as the adaptation layer before interaction, assisting with content recognition and event recognition before the scene dialog; it includes the client modules and the voice gateway module.
The client modules are devices with a voice communication function, such as mobile phones and landlines, or social tools capable of text communication, such as WeChat and WeChat official accounts.
The voice gateway module actively initiates or answers calls according to a dial plan; after the circuit is connected, it generates the corresponding ESL (Event Socket Library) events for the series of actions produced by the client modules and issues them to the AI session engine module, and it receives and executes the corresponding actions from the AI session engine module.
The voice gateway module converts the communication protocols of the various client-module channels into a unified communication protocol and performs event recognition. Within the life cycle of one connected call, the following core events can be generated:
1. SPEECH CONNET: generated after a communication connection is established with the client modules;
2. SPEECH CHANNEL_ANSWER: generated after the client modules answer the call;
3. SPEECH CHANNEL_EXECUTE: generated when playback of a segment of voice to the client modules begins;
4. SPEECH CHANNEL_EXECUTE_COMPLETE: generated after the voice finishes playing;
5. SPEECH ASR_START: generated after an incoming voice stream from the client modules is detected;
6. SPEECH ASR_END: generated after the client's voice stream ends; this event carries the client's ASR result, i.e. the speech transcription;
7. SPEECH HANGUP: generated after either party actively hangs up.
Under normal circumstances these events occur in a largely fixed order: SPEECH CHANNEL_EXECUTE and SPEECH CHANNEL_EXECUTE_COMPLETE mostly occur one after the other in pairs, as do SPEECH ASR_START and SPEECH ASR_END. The last event to occur is SPEECH HANGUP.
The voice gateway module has event-generation capability and ASR (Automatic Speech Recognition) understanding capability; it can convert the voice stream passed in by the client modules into text and issue it to the AI session engine module in the form of an event. For example, the voice gateway module integrates a speech recognition module, which may be Aitalk 2.0 of the iFLYTEK company, InterReco 2.0, etc.
The AI session engine module records and controls the current session state and, combining the events passed in by the voice gateway module, issues different instructions to the client session processing module. It internally contains a dialog control module, a tree-shaped data storage structure, a general knowledge base and a system knowledge base. The data storage structure is the storage component for dialog scripts. The general knowledge base is the storage component for general knowledge. The system knowledge base is the storage component for system knowledge, so that professional knowledge of a field can be added or removed according to the domain requirements of the dialog.
The session state is divided into: voice gateway module not playing, voice gateway module playing, voice gateway module paused, and call ended. The instructions that the session state combined with the incoming events of the voice gateway module can produce include: initialize, actively initiate dialog, the peer is speaking, the peer is silent, and the peer is interrupting the voice gateway module.
The session-state transfer logic of the AI session engine module is as follows:
1. AI CONNET: when the incoming event "SPEECH CONNET" of the voice gateway module arrives, an initialization instruction is generated and sent to the voice gateway module; the session state does not change;
2. AI CHANNEL_ANSWER: when the incoming event "SPEECH CHANNEL_ANSWER" arrives, an instruction that the peer is speaking is generated and sent to the voice gateway module, and the session state transfers to "voice gateway module not playing"; after "SPEECH CHANNEL_ANSWER", the dialog script is executed and an instruction to actively initiate dialog is generated;
3. AI CHANNEL_EXECUTE: when the incoming event "SPEECH CHANNEL_EXECUTE" arrives, the current session state is judged; if the state is "paused", it is not changed; otherwise the session state transfers to "voice gateway module playing" and an instruction that the peer is silent is generated;
4. AI CHANNEL_EXECUTE_COMPLETE: when the incoming event "SPEECH CHANNEL_EXECUTE_COMPLETE" arrives, the session state transfers to "voice gateway module not playing";
5. AI ASR_START: when the incoming event "SPEECH ASR_START" arrives, if the current state is "voice gateway module playing", an instruction that the peer is interrupting the voice gateway module is generated; the AI session engine module recognizes the interrupting speech through the speech recognition module and decides whether to pause the voice gateway module playback, transferring the session state to "paused" or leaving it unchanged: if playback is paused, the session state transfers to "paused" and an instruction that the peer is speaking is generated and sent to the voice gateway module; if playback is not paused, the session state is not transferred;
6. AI ASR_END: according to the event returned by the voice gateway module, it is decided whether to hang up while paused; an instruction to actively initiate dialog is generated, and the session state transfers to "call ended" or to "voice gateway module playing";
7. AI HANGUP: generated after either party actively hangs up.
The automatic answering method applied to conversation service scenarios comprises the following steps:
Step 1. The client modules and the voice gateway module establish a bidirectional communication path, generating the event "SPEECH CONNET".
Step 2. The AI session engine module generates the event "AI CONNET" and produces an initialization instruction: it reads the dialog scripts, reads the knowledge bases, loads the global ambiguity dictionary, and returns the initialization instruction to the client modules via the voice gateway module.
Step 3. The client modules answer the call, generating the event "SPEECH CHANNEL_ANSWER". The voice gateway module receives the utterance sent by the client modules and sends it to the AI session engine module for dialog-script and knowledge matching; the AI session engine module generates the event "AI CHANNEL_ANSWER".
Step 3a. The utterance is corrected for ambiguous words; ambiguity correction includes homophone correction and synonym correction.
Step 3b. According to the session node that the current utterance has reached, the script keywords in the utterance are obtained, and regular-expression matching is applied to the session content of the utterance to match an intent branch.
If the utterance matches an intent branch, the next connected script node is fetched according to the tree-shaped data storage structure. Script nodes are divided into ordinary nodes and jump nodes.
Ordinary node: the script that answers the user; step 4 is executed, and an action flag for sending a short message can be added.
Jump node: divided into the following actions: 1. jump to the next main-flow node and continue answering the user with the script, then execute step 4; 2. jump to a specified main-flow node and answer the user with the script, then execute step 4; 3. hang-up action, ending the session; 4. send the short message required by the user.
After the feedback actions of the script node have been collected, the actions are sent to the voice gateway module for execution.
If no intent branch is matched, knowledge-base matching is performed. The knowledge bases are mainly divided into two classes:
General knowledge base: the user's business question is answered, and step 4 is executed;
System knowledge base, subdivided into three classes: 1. unanswerable handling: no answer can be retrieved for the user's question, so the AI session engine module executes a preset voice playback action and step 4 is executed; 2. interrupt handling: while an action fed back by the AI session engine module is being executed, the user interrupts midway, the AI engine stops the action in progress, and step 5 is executed; 3. repeat handling: the user did not hear or did not understand the fed-back action, and the AI session engine module repeats the previous action once.
Step 4. The voice gateway module generates the event "SPEECH CHANNEL_EXECUTE" and the AI session engine module generates the event "AI CHANNEL_EXECUTE". The AI session engine module judges whether a segment of utterance has been fully received. When it has, the AI session engine module generates an instruction to initiate dialog, sends the matched script to the voice gateway module, and transfers the session state to "voice gateway module playing". The voice gateway module starts playing the matched script audio to the client modules; step 3 is then executed.
The AI session engine module judges whether a segment of utterance has been fully received using the following method:
The AI session engine module samples the session, presetting N sample points within a time period T. N, the total number of sample points, is the sum of the number of fixed sample points n1 and the number of random sample points n2, i.e. N = n1 + n2. The sampling times of the fixed sample points are x·t1 ± t2, where x is a positive integer not exceeding n1, t1 = T/n1, and t2 is a time interval generated by a random function with 0 < t2 < t1. The n2 random sample points are collected at random within time T, with 0 < n2 ≤ n1/2.
When the data recorded at a sample point is voiced, it is a valid sample point; when it is silent, it is an invalid sample point. When the valid sample points exceed half of the total number N, the utterance is judged not yet finished; otherwise the utterance is judged fully received.
For example, 35 sample points are set within a preset period of 30 seconds: the number of fixed sample points n1 is 30 and the number of random sample points n2 is 5. Each fixed sample point is collected within t2 before or after its whole-second position, and the random sample points are collected at random within the 30 seconds.
This method fully guarantees the randomness and evenness of the sample points. Some users speak with a regular rhythm; if the interval between fixed sample points were a fixed value, the two rhythms could easily overlap, causing several consecutive sample points to all fall on voiced points or all on silent points. A traditional fixed sample point therefore has a one-sidedness. In this method each fixed collection point is taken within t2 before or after the fixed interval point t1, the fixed interval points x·t1 are spread over the whole time interval T, and the interval between adjacent collection points is randomized.
Meanwhile, the method also sets random sample points, avoiding the defect that the time start point or end point cannot be sampled. For example, since the first fixed sample point lies in (t1 ± t2) and 0 < t2 < t1, the time start point itself can never be sampled by a fixed point. If t2 = t1, three adjacent fixed sample points could fall on the same point, increasing the overlap rate; therefore t2 ≠ t1. At the same time, the random sample points randomly increase the sampling density and improve the fidelity of the sampling.
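The jittered sampling schedule of the worked example can be sketched as follows. The jitter distribution is an assumption of this sketch (uniform in (0, t1)); the text only requires that t2 come from a random function with 0 < t2 < t1, and the clamping to [0, T] is likewise an illustrative choice.

```python
import random

def sampling_schedule(T=30.0, n1=30, n2=5, seed=0):
    """Sketch of the sampling scheme: n1 fixed points at x*t1, each
    jittered by a random t2 (0 < t2 < t1) before or after, plus n2
    uniformly random points in [0, T]. Defaults follow the worked
    example in the text (T=30 s, n1=30, n2=5, so N=35)."""
    rng = random.Random(seed)
    t1 = T / n1
    fixed = []
    for x in range(1, n1 + 1):
        t2 = rng.uniform(0.0, t1)            # jitter magnitude, below t1
        sign = rng.choice((-1, 1))           # before or after the fixed point
        fixed.append(min(max(x * t1 + sign * t2, 0.0), T))
    random_pts = [rng.uniform(0.0, T) for _ in range(n2)]
    return sorted(fixed + random_pts)
```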
Step 5. While the script audio is playing, if the voice gateway module detects an incoming voice stream from the client modules, it filters noise from the voice stream and forwards it to the AI session engine module for filter-word processing of the interruption; that is, the voice gateway module generates the events "SPEECH ASR_START" and "SPEECH ASR_END". If the voice stream consists entirely of filter words, the AI session engine module generates no new action. If it does not, the AI session engine module executes an interrupt operation and sends an interrupt instruction to the voice gateway module, which stops playing the script audio; that is, the AI session engine module generates the events "AI ASR_START" and "AI ASR_END", and step 3 is executed. Filter words are words without essential meaning, such as "uh", "eh", "good".
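The filter-word check of step 5 can be sketched as a simple set-membership test over the tokens of the ASR transcript. The word list here is illustrative only; a deployment would load it from configuration.

```python
# Illustrative filter-word list (words without essential meaning).
FILLER_WORDS = {"uh", "um", "eh", "ok", "okay", "right", "good"}

def is_all_filler(transcript):
    """True if every token of the ASR transcript is a filter word, in
    which case no interrupt instruction is generated and playback of
    the script audio continues."""
    tokens = transcript.lower().split()
    return bool(tokens) and all(t in FILLER_WORDS for t in tokens)
```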
The invention can answer telephone calls and any communication-class tool such as WeChat, Weibo and web IM, and is suitable for covering many communication tools: only the access-tool adaptation needs to be extended, and the underlying intelligent-answer engine is adapted without modification. A single configuration applies to scenarios across channels such as enterprise telephone customer service and WeChat customer service, rapidly expanding the user interaction channels and experience while reducing the enterprise's maintenance cost.
To improve the overall response speed of the system and handle possibly massive concurrent requests, the communication architecture between the event processing module and the voice gateway module is built on the high-performance Netty network framework widely used in the current industry.
In handling interruptions of voice communication, the invention is more humanized than the traditional approach: answers in conventional communication that do not require an interruption, such as "right" and "good", are recognized, and no interruption is made.
Compared with traditional training methods, the invention provides more training information, including the matching category of the knowledge point, keywords, ambiguous words, which other intent branches were matched, and which knowledge points were preferentially selected, improving the efficiency with which trainers tune the dialog scripts and investigate problems.
The conversational mode is flexible: dialog may be actively initiated or a service session may be received, and simple modifications of either should all fall within the framework of this technical solution.
It is to be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution of the invention and its inventive concept, and all such changes or replacements shall belong to the protection scope of the appended claims of the invention.