CN103366732A

CN103366732A - Voice broadcast method and device and vehicle-mounted system

Info

Publication number: CN103366732A
Application number: CN2012101000372A
Authority: CN
Inventors: 刘根华
Original assignee: Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Current assignee: Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority date: 2012-04-06
Filing date: 2012-04-06
Publication date: 2013-10-23

Abstract

The invention provides a voice broadcast method and device and a vehicle-mounted system. The voice broadcast method comprises the following steps: determining the information to be broadcast; checking for pre-stored recorded elements contained in the information to be broadcast; and, upon confirming that the information to be broadcast contains such recorded elements, broadcasting the information by combining the recorded elements with speech synthesis. Building on existing speech-synthesis broadcasting, the technical scheme uses pre-stored human recordings as recorded elements, so that the voice broadcast is more fluent and sounds better, which improves the user experience.

Description

Voice broadcast method and device, and vehicle-mounted system

Technical field

The present invention relates to the field of voice technology, and in particular to a voice broadcast method and device and a vehicle-mounted system.

Background technology

With the development of voice technology, its importance to the development of computers and to social life has become increasingly prominent. Speech synthesis is a highly practical and important voice technology: together with speech recognition, it enables spoken communication between humans and machines, giving a computer a human-like ability to speak. Compared with speech recognition, speech synthesis is relatively mature and is gradually being applied across the information industry.

Text-to-speech (TTS) technology converts arbitrary text into standard, fluent speech in real time and reads it aloud. TTS draws on several disciplines, including acoustics, linguistics, digital signal processing and computer science. In the prior art, text-to-speech conversion usually proceeds in two steps. In the first step, the text sequence is converted into a phonological sequence; in the second step, the TTS system generates a speech waveform from the phonological sequence. The first step involves linguistic processing, such as word segmentation and grapheme-to-phoneme conversion, together with a complete set of effective prosody-control rules. The second step requires advanced speech synthesis techniques that can synthesize a high-quality speech stream in real time on demand. In other words, a TTS system can be regarded as an artificial intelligence system: to synthesize high-quality speech, it must not only rely on various rules, including semantic, lexical and phonetic rules, but also understand the text well, which brings in the problem of natural language understanding. For more on TTS technology, see publication No. CN 101785048A, entitled "HMM-based bilingual (Mandarin-English) TTS techniques".

In practice, however, speech played by a TTS system inevitably sounds somewhat stiff and unnatural. When people listen to speech played back by a TTS system, or hold a human-machine dialogue with one, they still feel that they are talking to a machine, and the user experience is poor.

Summary of the invention

The problem addressed by the present invention is to provide, on the basis of existing TTS technology, a more fluent voice broadcast method that improves the user experience.

To solve the above problem, an embodiment of the invention provides a voice broadcast method, comprising:

determining information to be broadcast; checking for pre-stored recorded elements contained in the information to be broadcast; and, upon confirming that the information to be broadcast contains the recorded elements, broadcasting the information to be broadcast by combining the recorded elements with speech synthesis.

Optionally, determining the information to be broadcast comprises: determining the information to be broadcast according to an input voice instruction or text information.

Optionally, checking for pre-stored recorded elements contained in the information to be broadcast comprises: splitting the information to be broadcast into a plurality of speech elements; and checking, according to whether the speech elements match the recorded elements, for pre-stored recorded elements contained in the information to be broadcast.

Optionally, broadcasting the information to be broadcast by combining the recorded elements with speech synthesis comprises: playing the recorded elements that match speech elements in the information to be broadcast, and broadcasting the other speech elements in the information to be broadcast by speech synthesis.

Optionally, the recorded elements comprise any of single characters, words, phrases, simple sentences or paragraphs.

Optionally, the speech elements comprise any of single characters, words, phrases, simple sentences or paragraphs.

Optionally, broadcasting the information to be broadcast by combining the recorded elements with speech synthesis comprises: playing the recorded elements contained in the information to be broadcast, and broadcasting the other parts of the information to be broadcast by speech synthesis.

Optionally, the speech synthesis comprises text-to-speech conversion.

An embodiment of the invention also provides a voice broadcast device, comprising: a determining unit, configured to determine information to be broadcast; a checking unit, configured to check for pre-stored recorded elements contained in the information to be broadcast determined by the determining unit; and a broadcast unit, configured to, when the checking unit confirms that the information to be broadcast contains the pre-stored recorded elements, broadcast the information to be broadcast by combining the recorded elements with speech synthesis.

An embodiment of the invention also provides a vehicle-mounted system comprising the above voice broadcast device.

Compared with the prior art, the technical solution of the present invention has the following beneficial effects:

After the information to be broadcast is determined, if it contains stored recorded elements, the information is broadcast by combining the recorded elements with speech synthesis; that is, the recorded elements contained in the information are played, and the other parts of the information are broadcast by speech synthesis. Building on existing speech-synthesis broadcasting, the technical solution uses pre-stored human recordings as recorded elements, so that the voice broadcast is more fluent and sounds better, which improves the user experience.

Further, because the parts of the information to be broadcast that contain recorded elements are played directly from the pre-stored recordings and need not be synthesized, the voice broadcast device processes faster. In particular, when the information to be broadcast contains many recorded elements, the response time is shorter, which further improves the user experience.

Description of drawings

Fig. 1 is a schematic flowchart of an embodiment of a voice broadcast method of the present invention;

Fig. 2 is a schematic structural diagram of a specific embodiment of a voice broadcast device of the present invention.

Embodiment

In view of the above problem in the prior art, the inventor has, through research, arrived at a voice broadcast method and device and a vehicle-mounted system. Building on existing speech-synthesis broadcasting, the technical solution uses pre-stored human recordings as recorded elements, so that the voice broadcast is more fluent and sounds better, which improves the user experience.

To make the above objects, features and advantages of the present invention clearer, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.

Specific details are set forth in the following description to provide a thorough understanding of the present invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the substance of the invention. The invention is therefore not limited by the specific embodiments disclosed below.

Fig. 1 is a schematic flowchart of an embodiment of a voice broadcast method of the present invention. With reference to Fig. 1, the voice broadcast method comprises:

Step S1: determine information to be broadcast;

Step S2: check for pre-stored recorded elements contained in the information to be broadcast;

Step S3: upon confirming that the information to be broadcast contains the recorded elements, broadcast the information to be broadcast by combining the recorded elements with speech synthesis.

Specifically, in step S1 the information to be broadcast can be determined in several ways. In this embodiment it is mainly determined in one of two ways. The first is to determine it from text information: the information to be broadcast is simply existing text that is to be read out in speech form. Such information can be called static information. The second is to determine it from an input voice instruction. Unlike the first way, a suitable reply must first be made to the input voice, and the content of that reply is then taken as the information to be broadcast. Such information can be called dynamic information: different replies are made to different input speech, or different replies are made to the same input speech. This process uses speech recognition and speech synthesis: the computer recognizes and understands the input speech, converts the speech signal into the corresponding text, and the text is then broadcast by speech synthesis. Speech recognition and speech synthesis are well known to those skilled in the art and are not described further here.
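The two ways of determining the information to be broadcast in step S1 can be illustrated with a minimal sketch. Everything named here is hypothetical: `determine_info`, the `recognize` callback (standing in for a speech recognizer) and the `respond` callback (standing in for whatever logic builds the reply) are assumptions for illustration and are not specified by the patent.

```python
from typing import Callable, Optional


def determine_info(text: Optional[str] = None,
                   voice_command: Optional[bytes] = None,
                   recognize: Optional[Callable[[bytes], str]] = None,
                   respond: Optional[Callable[[str], str]] = None) -> str:
    """Return the information to be broadcast (step S1)."""
    if text is not None:
        # Static information: the existing text is read out as it is.
        return text
    if voice_command is None or recognize is None or respond is None:
        raise ValueError("need text, or a voice command with recognizer and responder")
    # Dynamic information: recognize the command, then build a reply to it.
    recognized = recognize(voice_command)
    return respond(recognized)
```

With a stub recognizer that simply decodes bytes and a responder that wraps the command in a question, the same function serves both the static and the dynamic case.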

In step S2, the information to be broadcast is checked for pre-stored recorded elements. In the prior art, once the information to be broadcast has been determined, it is broadcast directly by speech synthesis, that is, by TTS. In practice, however, the inventor found that broadcasting directly by TTS does not sound ideal and gives the user a poor listening experience. In embodiments of the present invention, therefore, a technician records in advance, by human voice, some commonly used single characters, words, phrases, simple sentences or paragraphs. These are collectively referred to here as recorded elements, and they are pre-stored in a recording database. Then, after the information to be broadcast has been determined, it is judged whether that information contains any pre-stored recorded elements.

In a specific implementation, step S2 first splits the information to be broadcast into a plurality of speech elements, where the speech elements likewise comprise any of single characters, words, phrases, simple sentences or paragraphs. In a specific embodiment, the splitting should first take into account the characteristics of the pre-stored recorded elements. For example, if the pre-stored recorded elements are mainly words or phrases, the information to be broadcast is likewise split into words or phrases; if they are mainly simple sentences, the information is likewise split into simple sentences. This makes it easier to compare the resulting speech elements with the pre-stored recorded elements afterwards.

If the pre-stored recorded elements include a mix of single characters, words, phrases, simple sentences and paragraphs, the split is made mainly according to the characteristics of the information to be broadcast. For example, a single splitting method can be set for all information to be broadcast, using one kind of speech element as the unit; for instance, all information is split into words. Alternatively, different splitting methods can be set for different information: information with many characters can be split into phrases, simple sentences or even paragraphs, while information with few characters can be split into words or even single characters.
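As a rough sketch of choosing the splitting granularity from the length of the information, the function below splits long messages into sentences and short ones into words. The function name, the 40-character threshold and the punctuation set are assumptions made for illustration; the patent does not fix any of them.

```python
import re
from typing import List


def split_by_length(info: str, long_threshold: int = 40) -> List[str]:
    """Split long information into sentences and short information into words."""
    if len(info) > long_threshold:
        # Long message: break after sentence-ending punctuation.
        parts = re.split(r"(?<=[.!?。！？])\s*", info)
    else:
        # Short message: break on whitespace into words.
        parts = info.split()
    # Drop the empty string that a trailing punctuation mark leaves behind.
    return [p for p in parts if p]
```

A fixed threshold is only one possible policy; the same function could equally pick the granularity from the kinds of recorded elements that are actually stored.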

Further, how the information to be broadcast is split can also be decided by the characteristics of the language. Take Chinese and English as examples. In Chinese, it usually works better to split a sentence into words and phrases than into single characters, so Chinese information is preferably split into words, phrases or simple sentences. In English, it usually works better to split a sentence into words (the counterpart of single characters in Chinese) or into phrases made up of several words, so English information is preferably split into words or phrases. For other languages, a suitable splitting method can be chosen in the same way to split the information to be broadcast into speech elements.

In practice, the splitting methods are not limited to those above; a different splitting method can be chosen according to actual needs, and the details are not repeated here.
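The language-dependent split can be sketched as follows. The patent does not name a segmentation algorithm; forward maximum matching against a word list is used here purely as a stand-in for Chinese, and the `lexicon` argument, the language codes and the function names are hypothetical.

```python
from typing import List, Set


def split_chinese(info: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon entry at each position."""
    elements: List[str] = []
    i = 0
    while i < len(info):
        for size in range(min(max_len, len(info) - i), 0, -1):
            candidate = info[i:i + size]
            # Fall back to a single character when no longer entry is known.
            if size == 1 or candidate in lexicon:
                elements.append(candidate)
                i += size
                break
    return elements


def split_info(info: str, language: str, lexicon: Set[str]) -> List[str]:
    """Pick a splitting method by language (assumed codes: 'zh', otherwise English)."""
    if language == "zh":
        return split_chinese(info, lexicon)
    # English words are already delimited by whitespace.
    return info.split()
```

For instance, `split_info("导航到北京", "zh", {"导航", "北京"})` yields `["导航", "到", "北京"]`: the two known words are kept whole and only the unknown character is emitted on its own.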

Next, whether the information to be broadcast contains pre-stored recorded elements is checked according to whether the speech elements match the recorded elements. In this embodiment, the matching criterion is that the speech element must be exactly identical to the recorded element. Specifically, the pre-stored recorded elements can be stored in the recording database, classified by kind, with recorded elements of the same kind stored in the same data table. During matching, the data table corresponding to the kind of the speech element can then be searched for a matching recorded element, which improves matching efficiency. Of course, in practice those skilled in the art can also match in other ways, which are not described here.
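A minimal sketch of exact matching against tables organized by kind is given below. An in-memory dictionary stands in for the recording database, and the `classify` rule (one token is a word, several tokens are a phrase) is an assumption for English text only; the patent does not say how the kind of a speech element is decided.

```python
from typing import Dict, Optional

# Assumed layout: kind of element -> {element text -> path of the recorded clip}
RecordingDB = Dict[str, Dict[str, str]]


def classify(element: str) -> str:
    """Hypothetical rule for English text: one token is a word, otherwise a phrase."""
    return "word" if len(element.split()) == 1 else "phrase"


def find_recording(element: str, db: RecordingDB) -> Optional[str]:
    """Return the clip whose text is identical to the element, or None."""
    # Only the table for this kind of element is searched.
    table = db.get(classify(element), {})
    # Exact match only: differing case or spacing counts as no match.
    return table.get(element)
```

Searching a single table per lookup is what keeps matching cheap as the number of recorded elements grows.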

In step S3, upon confirming that the information to be broadcast contains the recorded elements, the information is broadcast by combining the recorded elements with speech synthesis. Based on the matching result of step S2, if the information to be broadcast contains recorded elements, those recorded elements are played and the other parts of the information are broadcast by speech synthesis, where the other parts are the parts of the information to be broadcast that contain no recorded element.

In a specific embodiment, the information to be broadcast has already been split into speech elements by the method of step S2. Therefore, if some speech elements in the information to be broadcast match pre-stored recorded elements, the matching recorded elements are played, and the other speech elements in the information are broadcast by speech synthesis. The other speech elements are those for which no matching recorded element exists.
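The mixed playback in step S3 can be sketched as building an ordered plan: each speech element is either played from its recording or handed to the synthesizer. The names `build_plan`, `"play"` and `"tts"` and the flat text-to-clip mapping are illustrative assumptions, not terms from the patent.

```python
from typing import Dict, List, Tuple


def build_plan(elements: List[str],
               recordings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Decide, element by element and in order, how each part is voiced."""
    plan: List[Tuple[str, str]] = []
    for element in elements:
        if element in recordings:
            # A matching recorded element exists: play the stored clip.
            plan.append(("play", recordings[element]))
        else:
            # No match: this element is left to speech synthesis.
            plan.append(("tts", element))
    return plan
```

Keeping the plan as data, separate from the audio back end, makes it easy to check which share of a message would be covered by recordings before anything is played.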

Further, a speech element that has been broadcast by speech synthesis can also be added to the recording database as a new recorded element. If later information to be broadcast contains that speech element again, a matching recorded element can be found in the recording database and played directly. As the recording database accumulates more recorded elements, it becomes more likely that matching recorded elements will be found after the information to be broadcast is split into speech elements, so the voice broadcast becomes more fluent and the user experience improves. Of course, in practice the user can configure, as needed, whether speech elements are added to the recording database as new recorded elements; the details are not described here.
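Adding synthesized elements back into the recording database can be sketched as a simple cache. The sketch assumes that the stored item is the synthesized audio itself and that a boolean flag models the user's setting; `broadcast_and_learn`, `synthesize` and `auto_add` are hypothetical names.

```python
from typing import Callable, Dict, List


def broadcast_and_learn(elements: List[str],
                        recordings: Dict[str, bytes],
                        synthesize: Callable[[str], bytes],
                        auto_add: bool = True) -> List[bytes]:
    """Return the audio clips for the elements, growing the database on misses."""
    audio: List[bytes] = []
    for element in elements:
        clip = recordings.get(element)
        if clip is None:
            # No recorded element yet: fall back to speech synthesis.
            clip = synthesize(element)
            if auto_add:
                # Store the result so the next occurrence is found directly.
                recordings[element] = clip
        audio.append(clip)
    return audio
```

With `auto_add` left on, a second message containing the same name no longer calls the synthesizer for it; with `auto_add` off, the database stays exactly as it was recorded.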

In this embodiment, the speech synthesis mainly uses text-to-speech (TTS) technology, but in practice it is not limited to this and can also include other existing speech synthesis techniques. These are all well known to those skilled in the art and are not described further here.

An application example of the voice broadcast method of this embodiment follows:

For example, the voice instruction input by the user is "Please phone Wang Xiaohua".

First, according to this voice instruction, the response information (that is, the information to be broadcast) is determined to be "Do you want to phone Wang Xiaohua". Then the determined information is split into a plurality of speech elements. Suppose it is split into single characters and phrases, namely "you", "whether", "want", "make a phone call", "to" and "Wang Xiaohua".

Next, the speech elements obtained by the split are matched against the recorded elements pre-stored in the recording database. Suppose the recorded elements pre-stored in the recording database include "you", "whether", "want", "make a phone call" and "to". The recorded elements contained in the information to be broadcast, "Do you want to phone Wang Xiaohua", are then "you", "whether", "want", "make a phone call" and "to", and the remaining speech element is "Wang Xiaohua". When the information is broadcast, the recorded elements "you", "whether", "want", "make a phone call" and "to" are played directly, and "Wang Xiaohua" is broadcast by speech synthesis.

The information the user finally hears is therefore "Do you want to phone Wang Xiaohua": the part broadcast by speech synthesis has pauses between its individual characters, whereas the part played directly from recorded elements is fluent.

Further, in practice, after "Wang Xiaohua" has been broadcast by speech synthesis, it can also be stored in the recording database as a new recorded element. When later information to be broadcast again contains the speech element "Wang Xiaohua", a matching recorded element can be found in the recording database and played directly.

Based on the above voice broadcast method, an embodiment of the invention also provides a voice broadcast device. Fig. 2 is a schematic structural diagram of a specific embodiment of a voice broadcast device of the present invention. With reference to Fig. 2, the voice broadcast device 1 comprises a determining unit 11, a checking unit 12 and a broadcast unit 13. The determining unit 11 is configured to determine information to be broadcast. The checking unit 12 is configured to check for pre-stored recorded elements contained in the information determined by the determining unit 11. The broadcast unit 13 is configured to, when the checking unit 12 confirms that the information contains pre-stored recorded elements, broadcast the information by combining the recorded elements with speech synthesis.

Specifically, the determining unit 11 can determine the information to be broadcast in several ways. In this embodiment it is mainly determined in one of two ways. The first is to determine it from text information; in this case, the determining unit 11 can read the text information through a reading device (not shown) and take the text information as the information to be broadcast. The second is to determine it from an input voice instruction; in this case, the determining unit 11 can receive the externally input voice instruction through a receiving device (not shown) and determine the corresponding information to be broadcast according to the voice instruction.

The checking unit 12 comprises a splitting unit 121 and a processing unit 122. The splitting unit 121 is configured to split the information to be broadcast into a plurality of speech elements. The processing unit 122 is configured to judge, according to whether the speech elements produced by the splitting unit 121 match the recorded elements, whether the information to be broadcast contains pre-stored recorded elements. In this embodiment, the speech elements comprise any of single characters, words, phrases, simple sentences or paragraphs, and the recorded elements likewise comprise any of single characters, words, phrases, simple sentences or paragraphs. In practice, the splitting unit 121 can split the information to be broadcast in different ways; see the method embodiment above for details, which are not repeated here. The processing unit 122 matches the speech elements produced by the splitting unit 121 against the pre-stored recorded elements, and thereby checks for pre-stored recorded elements contained in the information to be broadcast.

The broadcast unit 13 is configured to play the recorded elements contained in the information to be broadcast and to broadcast the other parts of the information by speech synthesis, where the other parts are the parts of the information to be broadcast that contain no recorded element. In a specific embodiment, according to the result from the checking unit 12, if some speech elements in the information to be broadcast match pre-stored recorded elements, the broadcast unit 13 plays the matching recorded elements and broadcasts the other speech elements in the information by speech synthesis, where the other speech elements are those for which no matching recorded element exists. In this embodiment, the speech synthesis comprises text-to-speech (TTS) technology, but in practice it is not limited to this and can also include other existing speech synthesis techniques, which are not described further here.
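The unit structure of Fig. 2 can be mirrored in a short sketch. The class and field names below are hypothetical, the callbacks stand in for real reading, playback and synthesis components, and the reference numerals in the comments only indicate which unit each piece is meant to correspond to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Checked = List[Tuple[str, Optional[str]]]


@dataclass
class CheckingUnit:  # corresponds to checking unit 12
    split: Callable[[str], List[str]]  # role of splitting unit 121
    recordings: Dict[str, str]         # element text -> recorded clip

    def check(self, info: str) -> Checked:
        # Role of processing unit 122: pair each element with its clip, if any.
        return [(e, self.recordings.get(e)) for e in self.split(info)]


@dataclass
class BroadcastUnit:  # corresponds to broadcast unit 13
    play: Callable[[str], None]
    synthesize: Callable[[str], None]

    def broadcast(self, checked: Checked) -> None:
        for element, clip in checked:
            if clip is not None:
                self.play(clip)           # matched: play the recording
            else:
                self.synthesize(element)  # unmatched: speech synthesis


@dataclass
class VoiceBroadcastDevice:  # corresponds to device 1
    determine: Callable[[str], str]  # role of determining unit 11
    checker: CheckingUnit
    broadcaster: BroadcastUnit

    def handle(self, source: str) -> None:
        info = self.determine(source)
        self.broadcaster.broadcast(self.checker.check(info))
```

Passing the splitter, player and synthesizer in as callables keeps each unit replaceable, which matches the description's point that splitting and synthesis methods can vary.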

An embodiment of the invention also provides a vehicle-mounted system comprising the voice broadcast device shown in Fig. 2. The voice broadcast device can be powered by the power supply of the vehicle-mounted system, and provides a voice broadcast service to the user by cooperating with other equipment in the vehicle-mounted system. It should be noted that the voice broadcast method and device provided by embodiments of the invention are not limited to use in a vehicle-mounted system. Those skilled in the art can also install the voice broadcast device on other equipment, such as a ticket booking and inquiry system or a meal ordering system, and use it to provide the corresponding service to users; the details are not described here.

In summary, the technical solution has at least the following beneficial effects. After the information to be broadcast is determined, if it contains stored recorded elements, the information is broadcast by combining the recorded elements with speech synthesis; that is, the recorded elements contained in the information are played, and the other parts of the information are broadcast by speech synthesis. Building on existing speech-synthesis broadcasting, the technical solution uses pre-stored human recordings as recorded elements, so that the voice broadcast is more fluent and sounds better, which improves the user experience.

Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the invention, use the methods and technical content disclosed above to make possible changes and modifications to the technical solution. Therefore, any simple modification, equivalent variation or refinement made to the above embodiments in accordance with the technical substance of the invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the present invention.

Claims

1. A voice broadcast method, characterized by comprising:

determining information to be broadcast;

checking for pre-stored recorded elements contained in the information to be broadcast; and

upon confirming that the information to be broadcast contains the recorded elements, broadcasting the information to be broadcast by combining the recorded elements with speech synthesis.

2. The voice broadcast method according to claim 1, characterized in that determining the information to be broadcast comprises: determining the information to be broadcast according to an input voice instruction or text information.

3. The voice broadcast method according to claim 1, characterized in that checking for pre-stored recorded elements contained in the information to be broadcast comprises:

splitting the information to be broadcast into a plurality of speech elements; and

checking, according to whether the speech elements match the recorded elements, for pre-stored recorded elements contained in the information to be broadcast.

4. The voice broadcast method according to claim 3, characterized in that broadcasting the information to be broadcast by combining the recorded elements with speech synthesis comprises:

playing the recorded elements that match speech elements in the information to be broadcast, and broadcasting the other speech elements in the information to be broadcast by speech synthesis.

5. The voice broadcast method according to claim 1, characterized in that the recorded elements comprise any of single characters, words, phrases, simple sentences or paragraphs.

6. The voice broadcast method according to claim 3, characterized in that the speech elements comprise any of single characters, words, phrases, simple sentences or paragraphs.

7. The voice broadcast method according to claim 1, characterized in that broadcasting the information to be broadcast by combining the recorded elements with speech synthesis comprises:

playing the recorded elements contained in the information to be broadcast, and broadcasting the other parts of the information to be broadcast by speech synthesis.

8. The voice broadcast method according to claim 1, characterized in that the speech synthesis comprises text-to-speech conversion technology.

9. A voice broadcast device, characterized by comprising:

a determining unit, configured to determine information to be broadcast;

a checking unit, configured to check for pre-stored recorded elements contained in the information to be broadcast determined by the determining unit; and

a broadcast unit, configured to, when the checking unit confirms that the information to be broadcast contains the pre-stored recorded elements, broadcast the information to be broadcast by combining the recorded elements with speech synthesis.

10. The voice broadcast device according to claim 9, characterized in that the determining unit is configured to determine the information to be broadcast according to an input voice instruction or text information.

11. The voice broadcast device according to claim 9, characterized in that the checking unit comprises:

a splitting unit, configured to split the information to be broadcast into a plurality of speech elements; and

a processing unit, configured to check, according to whether the speech elements produced by the splitting unit match the recorded elements, for pre-stored recorded elements contained in the information to be broadcast.

12. The voice broadcast device according to claim 11, characterized in that the broadcast unit is configured to: play the recorded elements that match speech elements in the information to be broadcast, and broadcast the other speech elements in the information to be broadcast by speech synthesis.

13. The voice broadcast device according to claim 9, characterized in that the recorded elements comprise any of single characters, words, phrases, simple sentences or paragraphs.

14. The voice broadcast device according to claim 11, characterized in that the speech elements comprise any of single characters, words, phrases, simple sentences or paragraphs.

15. The voice broadcast device according to claim 9, characterized in that the broadcast unit is configured to: play the recorded elements contained in the information to be broadcast, and broadcast the other parts of the information to be broadcast by speech synthesis.

16. The voice broadcast device according to claim 9, characterized in that the speech synthesis comprises text-to-speech conversion technology.

17. A vehicle-mounted system, characterized by comprising the voice broadcast device according to any one of claims 9 to 16.