
CN109841207A - A kind of exchange method and robot, server and storage medium - Google Patents

A kind of exchange method and robot, server and storage medium Download PDF

Info

Publication number
CN109841207A
CN109841207A
Authority
CN
China
Prior art keywords
interactive object
interactive
voice signal
sound
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910157240.5A
Other languages
Chinese (zh)
Inventor
徐文浩
马世奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Shenzhen Robotics Systems Co Ltd
Cloudminds Inc
Original Assignee
Cloudminds Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cloudminds Inc filed Critical Cloudminds Inc
Priority to CN201910157240.5A priority Critical patent/CN109841207A/en
Publication of CN109841207A publication Critical patent/CN109841207A/en
Pending legal-status Critical Current

Landscapes

  • Manipulator (AREA)

Abstract

The invention relates to the field of voice interaction and discloses an interaction method, a robot, a server, and a storage medium. In this application the method is applied to a robot and comprises: acquiring the sound signals of at least two interactive objects; recognizing the audio information of each sound signal separately; determining the interaction order of the interactive objects according to the audio information; and exchanging information with each interactive object in turn according to that order. The robot can therefore recognize sound signals that contain multiple interactive objects during an interaction, enabling interaction with several people.

Description

A kind of exchange method and robot, server and storage medium
Technical field
Embodiments of the present application relate to the field of voice interaction, and in particular to an interaction method, a robot, a server, and a storage medium.
Background technique
An intelligent service robot is a robot that replaces or assists humans in completing certain services in human living or working environments. Such robots require extensive human-computer interaction: through interaction the robot recognizes or understands the instructions issued by humans and completes the specified tasks according to those instructions, thereby achieving human-machine collaboration.
However, the inventors have found that the prior art has at least the following problems. Common intelligent service robots, including customer-service robots, reception robots, and security robots, can interact with humans through speech recognition, yet several problems arise in conversation. For example, when several people actively start a dialogue with the robot at the same time, the robot has no specific way to respond; and when the robot is interrupted by another person while conversing with someone, it cannot resume the interrupted dialogue.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary of the invention
Embodiments of the present application aim to provide an interaction method, a robot, a server, and a storage medium, so that during interaction the robot can recognize sound signals containing multiple interactive objects and thus interact with several people.
To achieve the above technical objective, an embodiment of the present application provides an interaction method applied to a robot, comprising:
acquiring the sound signals of at least two interactive objects;
recognizing the audio information of each sound signal separately;
determining the interaction order of the interactive objects according to each piece of audio information;
exchanging information with each interactive object in turn according to the interaction order.
An embodiment of the present application further provides a robot, comprising: an acquisition module, a recognition module, a determination module, and an interaction module;
the acquisition module is configured to acquire the sound signals of at least two interactive objects;
the recognition module is configured to recognize the audio information of each sound signal separately;
the determination module is configured to determine the interaction order of the interactive objects according to each piece of audio information;
the interaction module is configured to exchange information with each interactive object in turn according to the interaction order.
An embodiment of the present application further provides a server, comprising: at least one processor; and
a memory communicatively connected to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above interaction method.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above interaction method.
Compared with the prior art, in the embodiments of the present application the robot can acquire a sound signal that contains at least two interactive objects and can recognize the audio information of each sound signal, so the robot can process sound signals containing multiple interactive objects, which improves its ability to process sound. The robot determines the interaction order according to the audio information of each interactive object and exchanges information with each interactive object in turn, which improves the efficiency of interaction between the robot and the interactive objects, solves the problem of continuing the information exchange when the acquired sound contains at least two interactive objects during the robot's interaction, and improves the robot's interaction capability.
In addition, recognizing the audio information of each sound signal separately specifically includes performing the following processing on each sound signal: identifying the speech start point and the speech end point in the sound signal, determining a speech segment according to the speech start point and the speech end point, and analyzing the speech segment to obtain the audio information.
In this embodiment the speech segment of each sound signal is identified and the audio information is then obtained from that segment, so that the robot can exchange information with the interactive object on the basis of the speech segment.
In addition, the audio information includes voiceprint information and a loudness value.
Determining the interaction order of the interactive objects according to each piece of audio information specifically includes:
determining that the voiceprint information in the audio information that belongs to a preset voiceprint library is first-class voiceprint information, and sorting the interactive objects corresponding to the first-class voiceprint information according to their loudness values to generate a first sequence; determining that the voiceprint information in the audio information that does not belong to the preset voiceprint library is second-class voiceprint information, and sorting the interactive objects corresponding to the second-class voiceprint information according to their loudness values to generate a second sequence; and determining the interaction order of the interactive objects according to the first sequence and the second sequence, wherein the interactive objects in the first sequence are placed ahead of the interactive objects in the second sequence.
In addition, exchanging information with each interactive object in turn according to the interaction order specifically includes: determining the current interactive object according to the interaction order; identifying the interaction instruction of the current interactive object according to its sound signal; and, after responding to the interaction instruction, switching the interactive object according to the interaction order.
In this embodiment the robot can exchange information with the interactive objects one by one according to the interaction order, which changes the robot's interaction mode and improves its degree of intelligence.
In addition, after responding to the interaction instruction and before switching the interactive object according to the interaction order, the interaction method further includes: determining that no voice information from the current interactive object has been obtained within a preset time.
In addition, identifying the speech start point and the speech end point in the sound signal and determining the speech segment according to them specifically includes: dividing the sound signal into frames to generate i sound frames, where i is a positive integer greater than 1; identifying the speech frames and speech endpoints among the i sound frames; determining, according to the time of each speech endpoint, whether it is a speech start point or a speech end point; and determining a complete speech segment according to the start point and the end point.
In addition, identifying the speech frames and speech endpoints among the i sound frames specifically includes performing the following processing on each sound frame:
calculating the short-time energy of the sound frame; if the short-time energy is less than or equal to a preset energy threshold, determining that the sound frame is a speech endpoint; if the short-time energy is greater than the preset energy threshold, calculating the average zero-crossing rate of the sound frame and judging whether it is greater than a preset zero-crossing rate; if it is, determining that the sound frame is a speech endpoint, and if not, determining that the sound frame is a speech frame.
In addition, after determining the current interactive object according to the interaction order, the interaction method further includes: acquiring the sound signal of an unknown object; and, if it is determined that the interaction order of the sound signal of the unknown object precedes that of the current interactive object, updating the unknown object to be the current interactive object.
In addition, before updating the unknown object to be the current interactive object, the interaction method further includes: issuing a wait prompt to the current interactive object.
Brief description of the drawings
One or more embodiments are illustrated by the figures in the corresponding drawings. These exemplary illustrations do not constitute a limitation on the embodiments; elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated the figures in the drawings are not drawn to scale.
Fig. 1 is a flowchart of the interaction method in the first embodiment of the present application;
Fig. 2 is a flowchart of the interaction method in the second embodiment of the present application;
Fig. 3 is a schematic diagram of the push (stacking) process in the second embodiment of the present application;
Fig. 4 is a schematic diagram of the pop process in the second embodiment of the present application;
Fig. 5 is a schematic structural diagram of the robot in the third embodiment of the present application;
Fig. 6 is a schematic structural diagram of the robot in the fourth embodiment of the present application;
Fig. 7 is a schematic structural diagram of the server in the fifth embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, each embodiment of the present application is explained in detail below with reference to the drawings. However, a person skilled in the art will understand that many technical details are set forth in each embodiment in order to help the reader better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and should not constitute any limitation on the specific implementation of the present application, and the embodiments may be combined with and refer to one another provided they are not contradictory.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
The first embodiment of the present application relates to an interaction method applied to a robot. The detailed flow is shown in Fig. 1 and includes the following steps.
Step 101: acquire the sound signals of at least two interactive objects.
Specifically, this embodiment applies to the robot's interaction process, which may be an interaction between a person and a robot or between one robot and another; no particular limitation is imposed here. This embodiment is described using the interaction between a person and a robot as an example, so the interactive objects are people.
The interaction between a robot and a person mainly consists of the interactive object exchanging information with the robot by voice. For example, the robot may be an intelligent service robot such as a customer-service robot, which gives a corresponding spoken answer according to the acquired sound signal of the interactive object; or the robot may be a command-executing robot such as a cleaning robot, which determines the cleaning task to be performed according to the acquired sound signal of the interactive object and completes the corresponding task. The robot applying the interaction method in this embodiment is not particularly limited, as long as it can acquire the sound signals of the interactive objects.
In one specific implementation, taking a conversation robot as an example, the robot includes a microphone, a loudspeaker, and a pre-processing unit. The loudspeaker is responsible for producing sound, realizing the robot's speaking function; the microphone is responsible for collecting sound, realizing the robot's hearing function; and the pre-processing unit completes functions such as pre-processing and transmission of the sound. After the robot collects sound signals through the microphone, it determines the number of interactive objects, pre-processes the sound signal of each interactive object separately, and transmits the pre-processed sound signals over the network to a cloud processing system. Because the robot has acquired the sound signals of at least two interactive objects, the cloud processing system performs statistical calculations on the pre-processed sound signals to determine the interaction order, determines the interactive sound signal according to that order, and transmits it back to the robot, which plays the interactive sound signal through the loudspeaker, thereby exchanging information with the interactive object.
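By way of illustration only, the following minimal Python sketch mirrors the front-end/cloud split described above. Every name in it (preprocess, recognize_audio_info, determine_order, and the louder-first ordering) is an assumption introduced for the example, not part of the patented implementation.
```python
# Minimal sketch of the robot front-end / cloud split described above.
# Every function here is an illustrative stub, not the patented implementation.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InteractiveObject:
    object_id: str
    sound_signal: List[float]          # pre-processed samples from the microphone
    audio_info: Optional[dict] = None  # filled in by the cloud side


def preprocess(raw: List[float]) -> List[float]:
    """Robot pre-processing unit: here just a pass-through stub."""
    return raw


def recognize_audio_info(signal: List[float]) -> dict:
    """Cloud side: stand-in for audio-information recognition (loudness only here)."""
    loudness = sum(abs(x) for x in signal) / max(len(signal), 1)
    return {"loudness": loudness}


def determine_order(objects: List[InteractiveObject]) -> List[InteractiveObject]:
    """Cloud side: order the interactive objects, e.g. louder speakers first."""
    return sorted(objects, key=lambda o: o.audio_info["loudness"], reverse=True)


def interact(raw_channels: List[List[float]]) -> None:
    """Robot side: capture, hand off to the 'cloud', then reply to each object in turn."""
    objects = [InteractiveObject(f"obj-{i}", preprocess(ch))
               for i, ch in enumerate(raw_channels)]
    for obj in objects:
        obj.audio_info = recognize_audio_info(obj.sound_signal)
    for obj in determine_order(objects):
        print(f"robot -> {obj.object_id}: reply based on {obj.audio_info}")


if __name__ == "__main__":
    interact([[0.1, -0.2, 0.3], [0.6, -0.5, 0.7]])
```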
It should be noted that in the interaction mode provided in this embodiment, the robot can acquire the sound signals of at least two interactive objects, achieving the purpose of exchanging information with at least two interactive objects.
Step 102: recognize the audio information of each sound signal separately.
Specifically, even when the robot acquires a sound signal containing at least two interactive objects, it still exchanges information with each interactive object one by one, which requires the robot to recognize the sound signal of each interactive object.
In one specific implementation, each sound signal is processed as follows: the speech start point and the speech end point in the sound signal are identified, a speech segment is determined according to the speech start point and the speech end point, and the speech segment is analyzed to obtain the audio information. The detailed process of obtaining the audio information from the speech segment is not limited here.
Identifying the speech start point and the speech end point in the sound signal and determining the speech segment from them specifically includes: dividing the sound signal into frames to generate i sound frames, where i is a positive integer greater than 1; identifying the speech frames and speech endpoints among the i sound frames; determining, according to the time of each speech endpoint, whether it is a speech start point or a speech end point; and determining a complete speech segment from the start point and the end point.
In one specific implementation, the sound is divided into frames and the speech start and end points in the sound signal are detected by processing each frame as follows: calculate the short-time energy of the sound frame; if the short-time energy is less than or equal to a preset energy threshold, determine that the sound frame is a speech endpoint; if the short-time energy is greater than the preset energy threshold, calculate the average zero-crossing rate of the sound frame and judge whether it is greater than a preset zero-crossing rate; if it is, determine that the sound frame is a speech endpoint, and if not, determine that the sound frame is a speech frame.
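The frame-level decision described above can be sketched as follows. The frame length and the two thresholds are illustrative assumptions (the embodiment does not fix their values), and the helper names are invented for the example.
```python
# Sketch of the frame-level endpoint decision described above:
# low short-time energy -> endpoint; otherwise check the average zero-crossing rate.

from typing import List, Tuple


def frame_signal(signal: List[float], frame_len: int) -> List[List[float]]:
    """Split the sound signal into consecutive frames of frame_len samples."""
    return [signal[k:k + frame_len] for k in range(0, len(signal), frame_len)]


def short_time_energy(frame: List[float]) -> float:
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def classify_frames(signal: List[float], frame_len: int = 160,
                    energy_thr: float = 0.01, zcr_thr: float = 0.4) -> List[str]:
    """Label each frame 'endpoint' or 'speech' following the rule in the text."""
    labels = []
    for frame in frame_signal(signal, frame_len):
        if short_time_energy(frame) <= energy_thr:
            labels.append("endpoint")
        elif zero_crossing_rate(frame) > zcr_thr:
            labels.append("endpoint")
        else:
            labels.append("speech")
    return labels


def speech_segments(labels: List[str]) -> List[Tuple[int, int]]:
    """Turn the frame labels into (start_frame, end_frame) speech segments."""
    segments, start = [], None
    for i, label in enumerate(labels):
        if label == "speech" and start is None:
            start = i
        elif label == "endpoint" and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(labels) - 1))
    return segments
```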
For example, let the sound signal be x(n) and divide x(n) into i frames, each frame denoted s_i(n), n = 1, 2, 3, ..., N. The short-time energy of a sound frame is given by formula 1:
E_i = Σ_{n=1}^{N} s_i(n)^2        (formula 1)
where E_i denotes the short-time energy of the i-th frame of the sound signal, i indexes the frame, N denotes the frame length, and n denotes the time index of the samples within the frame.
To calculate the average zero-crossing rate of the sound frames, the short-time zero-crossing rate of each frame is calculated first, and the average zero-crossing rate is then obtained from the short-time zero-crossing rates. The short-time zero-crossing rate is calculated as shown in formula 2:
Z_i = (1/2) Σ_{n=2}^{N} |sgn[s_i(n)] - sgn[s_i(n-1)]|        (formula 2)
where Z_i denotes the short-time zero-crossing rate of the i-th frame, s_i(n) denotes the sample of the frame at time index n, s_i(n-1) denotes the sample at time index n-1, and sgn[s_i(n)] is a step (sign) function whose value depends on s_i(n) and is expressed as shown in formula 3:
sgn[x] = 1 for x ≥ 0, and sgn[x] = -1 for x < 0        (formula 3)
where the letters have the same meanings as in the formulas above.
The short-time zero-crossing rate of every frame is calculated in this way, and the average zero-crossing rate is obtained by summing them, which is not described again here.
It should be noted that short-time energy is generally used to distinguish unvoiced from voiced sounds in a speech signal, as well as silence from sound, initials from finals, and the gaps between words; for example, the short-time energy of unvoiced segments is small while that of voiced segments is large. Since in this embodiment the robot acquires sound signals containing at least two interactive objects, the signals have a high signal-to-noise ratio: the noise energy is very small when there is no speech and the short-time energy rises noticeably when there is, so setting a preset short-time energy threshold makes it possible to distinguish whether a sound frame is a speech endpoint.
The average zero-crossing rate of a sound frame is the number of times the signal crosses zero within the frame; the zero-crossing rate reflects the spectral characteristics of the signal. For a continuous speech signal, the short-time zero-crossing rate can be determined by observing where the time-domain waveform crosses the time axis; for a discrete signal, a zero crossing occurs whenever two adjacent samples have different signs. In this embodiment the signal x(n) divided into i frames is a discrete frame signal, so the average zero-crossing rate is determined by counting the number of sign changes of the sample values per unit time.
It should be noted that the speech endpoints of a sound signal can also be determined using only one of the short-time energy and the average zero-crossing rate. In practice, short-time energy separates the noise background from the speech segments; for systems with a low signal-to-noise ratio, combining short-time energy with the average zero-crossing rate improves the accuracy of speech endpoint detection. The above combination of short-time energy and average zero-crossing rate is merely an example and does not specifically limit this embodiment.
Step 103: determine the interaction order of the interactive objects according to each piece of audio information.
In a specific implementation, the audio information includes voiceprint information, loudness information, frequency information, pitch information, and the like, and the interaction order can be determined according to at least one of these. The voiceprint can uniquely identify an interactive object: an adult's voice remains relatively stable over a long period, and even when someone deliberately imitates another voice the voiceprint stays the same. The loudness information indicates the average sound intensity within a speech segment; the frequency information indicates the range of sound frequencies within a speech segment; and the pitch information indicates how high or low the sound in a speech segment is. The audio information may also include other information describing sound characteristics; the above are only examples.
It should be noted that in practical applications only the audio information actually needed is extracted from the sound signal, and the specific extraction method is not limited here.
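As one possible way of extracting only the features the later ordering step uses, the sketch below computes a loudness value as the mean absolute amplitude of a segment and delegates the voiceprint to a caller-supplied model; the voiceprint_model interface is a hypothetical placeholder, not something specified by this embodiment.
```python
# Assumed feature extraction for the ordering step: loudness plus a voiceprint
# produced by an external, caller-supplied model (hypothetical interface).

from typing import Callable, List, Sequence


def extract_audio_info(segment: Sequence[float],
                       voiceprint_model: Callable[[Sequence[float]], List[float]]) -> dict:
    loudness = sum(abs(x) for x in segment) / max(len(segment), 1)
    return {
        "loudness": loudness,                     # average sound intensity of the segment
        "voiceprint": voiceprint_model(segment),  # speaker embedding from an external model
    }
```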
In one specific implementation, the voiceprint information and loudness value in the audio information determine the interaction order of the interactive objects, which specifically includes: determining that the voiceprint information in the audio information that belongs to a preset voiceprint library is first-class voiceprint information, and sorting the corresponding interactive objects by their loudness values to generate a first sequence; determining that the voiceprint information that does not belong to the preset voiceprint library is second-class voiceprint information, and sorting the corresponding interactive objects by their loudness values to generate a second sequence; and determining the interaction order of the interactive objects from the first and second sequences, wherein the interactive objects in the first sequence come before those in the second sequence.
Each interactive object corresponds to one piece of sound information. After acquiring the sound information of at least two interactive objects, the robot needs to process it and determine the interaction order. Voiceprint information in the voiceprint library is first-class voiceprint information; in practice the scope of a shared voiceprint library can be delimited according to business needs, for example one voiceprint library per residential community, shared by all robots in that community.
In practical applications, after the robot acquires the sound information of at least two interactive objects, it recognizes the audio information corresponding to each sound signal and extracts the voiceprint information from it, and the voiceprint determines whether an interactive object has high priority: interactive objects whose voiceprints are in the voiceprint library have higher priority than those whose voiceprints are not. If more than one voiceprint belongs to the voiceprint library, the interactive objects whose voiceprints belong to the library are sorted by loudness value and exchange information with the robot one by one, after which the interactive objects whose voiceprints do not belong to the library are sorted by loudness value and exchange information with the robot one by one. In other words, interactive objects whose voiceprints are in the library interact before those whose voiceprints are not. If the sound information acquired by the robot contains no voiceprint from the library, the objects are sorted directly by loudness value and exchange information with the robot one by one in that order.
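The two-tier ordering described above can be sketched as follows, assuming each interactive object already carries its voiceprint and loudness value and that library membership is decided by a caller-supplied matcher. Sorting louder objects first is also an assumption; the embodiment only states that the objects are ranked by loudness value.
```python
# Sketch of the two-tier ordering: objects whose voiceprint is in the preset
# library come first, each tier sorted by descending loudness (assumed direction).

from typing import Callable, List, Sequence


def determine_interaction_order(
    objects: List[dict],
    in_voiceprint_library: Callable[[Sequence[float]], bool],
) -> List[dict]:
    first_class = [o for o in objects if in_voiceprint_library(o["voiceprint"])]
    second_class = [o for o in objects if not in_voiceprint_library(o["voiceprint"])]
    first_sequence = sorted(first_class, key=lambda o: o["loudness"], reverse=True)
    second_sequence = sorted(second_class, key=lambda o: o["loudness"], reverse=True)
    # Objects in the first sequence interact before those in the second sequence.
    return first_sequence + second_sequence
```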
It should be noted that the way the interaction order is determined from voiceprint and loudness values can be adapted in a specific implementation, and the interaction order can also be determined from other interaction information; the above is only an illustration and does not specifically limit how the interaction order is determined.
Step 104: exchange information with each interactive object in turn according to the interaction order.
Specifically, after the interaction order has been determined, the information exchange between the robot and the multiple interactive objects proceeds as follows: determine the current interactive object according to the interaction order; identify the interaction instruction of the current interactive object from its sound signal; and, after responding to the interaction instruction, switch interactive objects according to the interaction order.
In one concrete application, taking a customer-service robot as an example, while interacting with each interactive object in turn according to the determined interaction order, the robot identifies the interaction instruction from the sound information of the current interactive object, determines the corresponding answer, and plays the answer through the loudspeaker.
It should be noted that during an ongoing dialogue with one interactive object, the robot waits for a preset time after executing each interaction instruction of that object; if no sound information from the object is received within the waiting time, the robot assumes the current interactive object no longer wishes to interact and selects the next interactive object from the interaction order to continue the information exchange.
Therefore, after responding to the interaction instruction, in order to guarantee efficient information exchange, the robot needs to judge whether the current interactive object has finished interacting, which specifically includes: determining that no voice information from the current interactive object has been obtained within the preset time.
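The turn-taking rule described above, including the preset waiting time, can be sketched as below. The wait_seconds default and the listen_for and respond_to helpers are assumptions used only to illustrate the switching logic.
```python
# Sketch of the turn-taking rule: keep talking with the current object until no
# new sound is heard from it within a preset waiting time, then move on.

from typing import Callable, Dict, List, Optional


def interact_in_order(order: List[str],
                      first_instruction: Dict[str, str],
                      listen_for: Callable[[str, float], Optional[str]],
                      respond_to: Callable[[str, str], None],
                      wait_seconds: float = 5.0) -> None:
    """Talk to each object in order; switch when it stays silent past the waiting time."""
    for object_id in order:
        instruction: Optional[str] = first_instruction.get(object_id)
        while instruction is not None:
            respond_to(object_id, instruction)   # respond to the recognized instruction
            # Wait up to wait_seconds for new sound from the same object; None on timeout.
            instruction = listen_for(object_id, wait_seconds)
        # No sound within the preset time: the object is considered finished, move on.
```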
It is noted that the first embodiment of the present application mainly solves the problem that the robot cannot interact when at least two interactive objects exchange information with it simultaneously during an interaction. The main improvements lie in the content of each step of this embodiment; the specific implementations given for each step are technical details provided to help understand this embodiment and are not essential to it.
It should be noted that the above specific implementations are given by way of example only and do not limit the technical solution of the present application.
Compared with the prior art, the robot can acquire a sound signal containing at least two interactive objects and recognize the audio information of each sound signal, so it can process sound signals containing multiple interactive objects, which improves its ability to process sound. The robot determines the interaction order according to the audio signal of each interactive object and exchanges information with each interactive object in turn, which improves the efficiency of interaction between the robot and the interactive objects, solves the problem of continuing the information exchange when the acquired sound contains at least two interactive objects during the robot's interaction, and improves the robot's interaction capability.
The second embodiment of the present application relates to an interaction method. The second embodiment is substantially the same as the first; this embodiment mainly concerns the handling of unknown sound signals while information is being exchanged with each interactive object in turn.
Specifically, as shown in Fig. 2, this embodiment adds the processing of unknown sound signals on the basis of the first embodiment. This embodiment includes steps 201 to 208, of which steps 201 to 204 are identical to steps 101 to 104 of the first embodiment and are not described again here. The differences are mainly introduced below.
Step 205: acquire the sound signal of an unknown object.
Step 206: judge whether the interaction order of the sound signal of the unknown object precedes that of the sound signal of the current interactive object; if so, execute step 207, otherwise execute step 208.
Step 207: update the unknown object to be the current interactive object.
Step 208: keep the current interactive object unchanged.
After the interaction order has been determined, the robot can assign an identity (identification, ID) to each interactive object, so that it exchanges information with each interactive object according to the interaction order.
In a specific implementation, for the determined interaction order, the robot exchanges information with each interactive object one by one, and while it interacts with each interactive object its microphone keeps collecting the surrounding sound information. If the interaction order of the sound signal of an unknown object precedes that of the sound signal of the current interactive object, the robot updates the unknown object to be the current interactive object. In one specific implementation, to avoid terminating the interaction of the current interactive object, a stack is used: the ID and session content of the current interactive object are pushed onto the stack, and the unknown object is then updated to be the current interactive object.
Specifically, taking the case where the robot's microphone picks up the sound signal of a surrounding unknown object during an interaction as an example, as shown in Fig. 3: X is the current interactive object; while X is exchanging information with the robot, the robot picks up the sound signal of a nearby object Y, judges that the interaction order of Y's sound signal precedes that of X's, pushes X's ID and session information onto the stack, and changes the current interactive object to Y. If, during the interaction between the robot and Y, the robot picks up the sound signal of a nearby object Z and judges that the interaction order of Z's sound signal precedes that of Y's, it pushes Y's ID and session information onto the stack and updates the current interactive object to Z. By analogy, whenever an interactive object with a higher priority appears during the interaction, the current interactive object is pushed onto the stack.
Specifically, while the robot interacts with the interactive objects one by one, after the interactive object with the highest priority has finished its information exchange, the interactive objects saved on the stack are popped in turn and the interrupted interactions are resumed until all interactions are complete.
In one specific implementation, as shown in Fig. 4, with interactive object Z as the current interactive object: after the interaction with Z is completed and it is determined that no new interactive object has joined, Y is popped and the robot continues exchanging information with Y; after the information exchange with Y is completed and no new interactive object has joined, X is popped and the robot continues exchanging information with X, until there is no interactive object left on the stack.
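The push/pop handling of interruptions illustrated by Figs. 3 and 4 can be sketched as follows, assuming each conversation carries an ID, a priority standing in for its place in the interaction order, and some session content; the waiting prompt mentioned earlier is reduced to a print call here.
```python
# Sketch of interruption handling with a stack (Figs. 3 and 4): when a
# higher-priority unknown object speaks up, push the current conversation and
# switch; when a conversation finishes, pop and resume the interrupted one.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    object_id: str
    priority: int                       # smaller value = earlier in the interaction order
    dialogue: List[str] = field(default_factory=list)


class InterruptibleDialogue:
    def __init__(self) -> None:
        self.stack: List[Session] = []
        self.current: Optional[Session] = None

    def start(self, session: Session) -> None:
        self.current = session

    def on_unknown_object(self, newcomer: Session) -> None:
        """Called when the microphone picks up a sound signal from an unknown object."""
        if self.current is not None and newcomer.priority < self.current.priority:
            print(f"robot -> {self.current.object_id}: please wait a moment")  # waiting prompt
            self.stack.append(self.current)   # push the interrupted conversation
            self.current = newcomer           # the newcomer becomes the current object
        # otherwise the current interactive object is kept unchanged

    def on_finished(self) -> None:
        """Called when the current interactive object has finished interacting."""
        self.current = self.stack.pop() if self.stack else None  # pop and resume
```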
It should be noted that the above is only an example and does not limit the technical solution of the present application.
The division of the above methods into steps is only for clarity of description; in implementation they may be merged into a single step, or a step may be split into multiple steps, and as long as the same logical relationship is included they all fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or flow without changing its core design also falls within the protection scope of this patent.
The third embodiment of the present application relates to a robot, as shown in Fig. 5, comprising: an acquisition module 501, a recognition module 502, a determination module 503, and an interaction module 504;
the acquisition module 501 is configured to acquire the sound signals of at least two interactive objects;
the recognition module 502 is configured to recognize the audio information of each sound signal separately;
the determination module 503 is configured to determine the interaction order of the interactive objects according to each piece of audio information;
the interaction module 504 is configured to exchange information with each interactive object in turn according to the interaction order.
It is not difficult to see that this embodiment is the system embodiment corresponding to the first embodiment and can be implemented in cooperation with it. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here in order to reduce repetition; correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.
It should be noted that each module in this embodiment is a logical module; in practical applications a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative parts of the present application, this embodiment does not introduce units that are less closely related to solving the technical problem addressed by the present application, but this does not mean that no other units exist in this embodiment.
The fourth embodiment of the present application relates to a robot. The fourth embodiment is roughly the same as the third; the main difference is that in the fourth embodiment, while the interaction module 504 executes its processing task, the robot keeps collecting the surrounding voice information, and the robot further includes: a receiving module 601, a judging module 602, a first updating module 603, and a second updating module 604. Its structure is shown in Fig. 6.
The receiving module 601 is configured to acquire the sound signal of an unknown object.
The judging module 602 is configured to judge whether the interaction order of the sound signal of the unknown object precedes that of the sound signal of the current interactive object.
The first updating module 603 is configured to update the unknown object to be the current interactive object.
The second updating module 604 is configured to keep the current interactive object unchanged.
Since the second embodiment corresponds to this embodiment, this embodiment can be implemented in cooperation with the second embodiment. The relevant technical details mentioned in the second embodiment remain valid in this embodiment, and the technical effects achievable in the second embodiment can likewise be achieved in this embodiment; they are not repeated here in order to reduce repetition. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the second embodiment.
The fifth embodiment of the present application relates to a server which, as shown in Fig. 7, comprises at least one processor 701 and a memory 702 communicatively connected to the at least one processor 701, wherein the memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 so that the at least one processor 701 can perform the interaction method.
In this embodiment the processor 701 is, for example, a central processing unit (CPU) and the memory 702 is, for example, a random access memory (RAM). The processor 701 and the memory 702 may be connected by a bus or in other ways; connection by a bus is taken as the example in Fig. 7. As a non-volatile computer-readable storage medium, the memory 702 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules; the program implementing the interaction method in the embodiments of the present application is stored in the memory 702. By running the non-volatile software programs, instructions, and modules stored in the memory 702, the processor 701 executes the various functional applications and data processing of the device, that is, implements the above interaction method.
The memory 702 may include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required for at least one function, and the data storage area can store option lists and the like. In addition, the memory may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 702 optionally includes memory remotely located relative to the processor 701, and such remote memory may be connected to the external device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more program modules are stored in the memory 702 and, when executed by the one or more processors 701, perform the interaction method of the first or second method embodiment described above.
The above product can perform the interaction method provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects. For technical details not described in detail in this embodiment, reference may be made to the interaction method provided by the embodiments of the present application.
The sixth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the above method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will understand that the above embodiments are specific embodiments for realizing the present application, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of the present application.

Claims (12)

1. An interaction method, characterized in that it is applied to a robot and comprises:
acquiring the sound signals of at least two interactive objects;
recognizing the audio information of each sound signal separately;
determining the interaction order of the interactive objects according to each piece of audio information;
exchanging information with each interactive object in turn according to the interaction order.
2. The interaction method according to claim 1, characterized in that recognizing the audio information of each sound signal separately specifically comprises:
performing the following processing on each sound signal: identifying the speech start point and the speech end point in the sound signal, determining a speech segment according to the speech start point and the speech end point, and analyzing the speech segment to obtain the audio information.
3. The interaction method according to claim 1 or 2, characterized in that the audio information comprises voiceprint information and a loudness value;
determining the interaction order of the interactive objects according to each piece of audio information specifically comprises:
determining that the voiceprint information in each piece of audio information that belongs to a preset voiceprint library is first-class voiceprint information, and sorting the interactive objects corresponding to the first-class voiceprint information according to the loudness values corresponding to the first-class voiceprint information to generate a first sequence;
determining that the voiceprint information in each piece of audio information that does not belong to the preset voiceprint library is second-class voiceprint information, and sorting the interactive objects corresponding to the second-class voiceprint information according to the loudness values corresponding to the second-class voiceprint information to generate a second sequence;
determining the interaction order of the interactive objects according to the first sequence and the second sequence, wherein the interactive objects in the first sequence are ahead of the interactive objects in the second sequence in the interaction order.
4. The interaction method according to claim 1 or 2, characterized in that exchanging information with each interactive object in turn according to the interaction order specifically comprises:
determining the current interactive object according to the interaction order;
identifying the interaction instruction of the current interactive object according to the sound signal of the current interactive object;
after responding to the interaction instruction, switching the interactive object according to the interaction order.
5. The interaction method according to claim 4, characterized in that after responding to the interaction instruction and before switching the interactive object according to the interaction order, the interaction method further comprises:
determining that no voice information from the current interactive object has been obtained within a preset time.
6. The interaction method according to claim 2, characterized in that identifying the speech start point and the speech end point in the sound signal and determining the speech segment according to the speech start point and the speech end point specifically comprises:
dividing the sound signal into frames to generate i sound frames, wherein i is a positive integer greater than 1;
identifying the speech frames and the speech endpoints among the i sound frames;
determining, according to the time of each speech endpoint, whether the speech endpoint is a speech start point or a speech end point;
determining a complete speech segment according to the speech start point and the speech end point.
7. The interaction method according to claim 6, characterized in that identifying the speech frames and the speech endpoints among the i sound frames specifically comprises:
performing the following processing on each sound frame:
calculating the short-time energy of the sound frame;
if it is determined that the short-time energy is less than or equal to a preset energy threshold, determining that the sound frame is a speech endpoint;
if it is determined that the short-time energy is greater than the preset energy threshold, calculating the average zero-crossing rate of the sound frame and judging whether the average zero-crossing rate is greater than a preset zero-crossing rate; if it is, determining that the sound frame is a speech endpoint, and if not, determining that the sound frame is a speech frame.
8. The interaction method according to claim 4, characterized in that after determining the current interactive object according to the interaction order, the interaction method further comprises:
acquiring the sound signal of an unknown object;
if it is determined that the interaction order of the sound signal of the unknown object precedes the interaction order of the sound signal of the current interactive object,
updating the unknown object to be the current interactive object.
9. The interaction method according to claim 8, characterized in that before updating the unknown object to be the current interactive object, the interaction method further comprises:
issuing a wait prompt to the current interactive object.
10. A robot, characterized by comprising: an acquisition module, a recognition module, a determination module, and an interaction module;
the acquisition module is configured to acquire the sound signals of at least two interactive objects;
the recognition module is configured to recognize the audio information of each sound signal separately;
the determination module is configured to determine the interaction order of the interactive objects according to each piece of audio information;
the interaction module is configured to exchange information with each interactive object in turn according to the interaction order.
11. A server, characterized by comprising: at least one processor; and
a memory communicatively connected to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the interaction method according to any one of claims 1 to 9.
12. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the interaction method according to any one of claims 1 to 9.
CN201910157240.5A 2019-03-01 2019-03-01 A kind of exchange method and robot, server and storage medium Pending CN109841207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910157240.5A CN109841207A (en) 2019-03-01 2019-03-01 A kind of exchange method and robot, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910157240.5A CN109841207A (en) 2019-03-01 2019-03-01 A kind of exchange method and robot, server and storage medium

Publications (1)

Publication Number Publication Date
CN109841207A (en) 2019-06-04

Family

ID=66885155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910157240.5A Pending CN109841207A (en) 2019-03-01 2019-03-01 A kind of exchange method and robot, server and storage medium

Country Status (1)

Country Link
CN (1) CN109841207A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262879A (en) * 2010-05-24 2011-11-30 乐金电子(中国)研究开发中心有限公司 Voice command competition processing method and device as well as voice remote controller and digital television
CN107408027A (en) * 2015-03-31 2017-11-28 索尼公司 Message processing device, control method and program
CN105139858A (en) * 2015-07-27 2015-12-09 联想(北京)有限公司 Information processing method and electronic equipment
CN108847225A (en) * 2018-06-04 2018-11-20 上海木木机器人技术有限公司 A kind of robot and its method of the service of airport multi-person speech
CN108962260A (en) * 2018-06-25 2018-12-07 福来宝电子(深圳)有限公司 A kind of more human lives enable audio recognition method, system and storage medium
CN108922528A (en) * 2018-06-29 2018-11-30 百度在线网络技术(北京)有限公司 Method and apparatus for handling voice

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570847A (en) * 2019-07-15 2019-12-13 云知声智能科技股份有限公司 Man-machine interaction system and method for multi-person scene
CN116991246A (en) * 2023-09-27 2023-11-03 之江实验室 Algorithm scheduling method and device for navigation robot and navigation robot system

Similar Documents

Publication Publication Date Title
CN110136749B (en) Method and device for detecting end-to-end voice endpoint related to speaker
EP3459077B1 (en) Permutation invariant training for talker-independent multi-talker speech separation
CN109977218B (en) A kind of automatic answering system and method applied to session operational scenarios
CN103811003B (en) A kind of audio recognition method and electronic equipment
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
CN110557451B (en) Dialogue interaction processing method and device, electronic equipment and storage medium
CN107680597A (en) Audio recognition method, device, equipment and computer-readable recording medium
EP1455341A2 (en) Block synchronous decoding
CN107423363A (en) Art generation method, device, equipment and storage medium based on artificial intelligence
CN106683661A (en) Role separation method and device based on voice
CN107767861A (en) voice awakening method, system and intelligent terminal
CN103003876A (en) Modification of speech quality in conversations over voice channels
CN111916058A (en) Voice recognition method and system based on incremental word graph re-scoring
CN104766608A (en) Voice control method and voice control device
CN106297773A (en) A kind of neutral net acoustic training model method
CN110704618B (en) Method and device for determining standard problem corresponding to dialogue data
CN112131359A (en) Intention identification method based on graphical arrangement intelligent strategy and electronic equipment
CN110995943B (en) Multi-user streaming voice recognition method, system, device and medium
CN113345473A (en) Voice endpoint detection method and device, electronic equipment and storage medium
CN109841207A (en) A kind of exchange method and robot, server and storage medium
CN111625629B (en) Task type dialogue robot response method and device, robot and storage medium
CN110299140A (en) A kind of key content extraction algorithm based on Intelligent dialogue
US20240046921A1 (en) Method, apparatus, electronic device, and medium for speech processing
CN108932943A (en) Command word sound detection method, device, equipment and storage medium
CN113139044A (en) Question-answering multi-turn dialogue method supporting multi-intention switching for question-answering system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190604