Voice Broadcasting System and Method
Technical Field
The present invention relates to the field of speech synthesis, and more particularly to a voice broadcasting system and method.
Background
In daily life we often cannot read a text message because our hands are busy, for example while driving or typing. In that situation the message can only be read once the task at hand is finished. For an important message, an opportunity may be missed because the message was not seen and answered in time, which can lead to a loss.
In the prior art, text messages are converted into speech so that missed calls can be announced by voice and unread messages can be read aloud. Speech synthesis, also known as text-to-speech (TTS), is a technology that produces artificial speech by mechanical or electronic means. It converts text generated by a computer or entered from outside into fluent spoken language that a human listener can understand.
However, existing voice broadcasting approaches all use a single voice model extracted in advance, so every synthesized message sounds the same. The announced speech cannot reproduce the intonation and rhythm of the person who sent the text message, so it sounds stiff, conveys little emotion, lacks individuality and is not readily accepted by the listener. There is therefore an urgent need for a voice broadcasting system that speaks with the voice characteristics of the sender of the text information.
Summary of the Invention
In view of the shortcomings of the prior art, the technical problem to be solved by the invention is to provide a voice broadcasting system and method that speak with the voice characteristics of the sender of the text information.
The technical solution by which the present invention solves the above technical problem is as follows. A voice broadcasting system comprises a client system and a server system. The client system comprises a characteristic voice recording module, a text information acquisition module, a first network communication module and a characteristic voice playing module; the server system comprises a voice storage module, a characteristic voice training module, a second network communication module and a characteristic voice synthesis module;
The characteristic voice recording module is used to record sample speech matched to the role of the announcer of the text information (the person whose voice is to be used when the text is read aloud), and to send the sample speech to the voice storage module via the first network communication module and the second network communication module;
The first network communication module is used to send and receive data transmitted between the client system and the server system;
The second network communication module is used to send and receive data transmitted between the client system and the server system;
The voice storage module is used to store the sample speech data collected by the characteristic voice recording module and the characteristic speech data synthesized by the characteristic voice synthesis module;
The characteristic voice training module is used to extract voice feature parameters from the sample speech stored in the voice storage module, to perform model training to obtain a characteristic voice model, and to send the characteristic voice model to the characteristic voice synthesis module;
The text information acquisition module is used to collect the text information that the user wants read aloud, and to send the collected text information to the characteristic voice synthesis module via the first network communication module and the second network communication module;
The characteristic voice synthesis module is used to synthesize, from the characteristic voice model and the text information, characteristic speech that carries both the announcer's voice characteristics and the content of the text information, and to store the characteristic speech data in the voice storage module;
The characteristic voice playing module is used to play the characteristic speech synthesized by the characteristic voice synthesis module.
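The division of work between the client system and the server system can be illustrated with a small in-process simulation. This is a hypothetical sketch only: the class names, the dictionary standing in for the characteristic voice model and the string standing in for synthesized speech are all invented for illustration, and the two network communication modules are reduced to direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server system: storage, training and synthesis modules."""
    samples: dict = field(default_factory=dict)      # voice storage module: sample speech
    synthesized: list = field(default_factory=list)  # voice storage module: characteristic speech
    models: dict = field(default_factory=dict)       # output of the training module

    def store_sample(self, contact: str, sample: str) -> None:
        # Voice storage module keeps every recorded sample per contact.
        self.samples.setdefault(contact, []).append(sample)

    def train(self, contact: str) -> None:
        # Training module stand-in: the "model" only records its speaker
        # and how many samples it was built from.
        self.models[contact] = {"speaker": contact, "n_samples": len(self.samples[contact])}

    def synthesize(self, contact: str, text: str) -> str:
        # Synthesis module combines model and text, then stores the result
        # back in the voice storage module.
        model = self.models[contact]
        speech = f"<{model['speaker']} voice>{text}"
        self.synthesized.append(speech)
        return speech


@dataclass
class Client:
    """Hypothetical client system: recording, text acquisition and playback."""
    server: Server
    played: list = field(default_factory=list)

    def record(self, contact: str, sample: str) -> None:
        # Recording module -> (network) -> voice storage module.
        self.server.store_sample(contact, sample)

    def announce(self, sender: str, text: str) -> None:
        # Text acquisition module -> (network) -> synthesis module; the
        # returned speech is handed to the playing module.
        self.played.append(self.server.synthesize(sender, text))


server = Server()
client = Client(server)
client.record("Alice", "sample-001")
server.train("Alice")
client.announce("Alice", "Running late, see you at 7")
print(client.played[0])  # <Alice voice>Running late, see you at 7
```

Keeping storage, training and synthesis on the server side, as the system description does, leaves the mobile client with only recording, text collection and playback.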
The beneficial effects of the invention are as follows. All kinds of text information can be read aloud on various mobile terminals, such as mobile phones and tablet computers. The text information includes news text, e-books, SMS messages, and messages received by instant messaging software such as QQ, Fetion, WeChat and MoMo. When the user uses the invention to read news text or e-books aloud, playback can use whichever voice timbre in the voice storage module the user prefers. When the user uses the invention to announce messages exchanged in text with other people, the announced speech carries the voice characteristics of the sender of the message. This fully meets the user's wish to hear a particular speaker's timbre, is highly personalized, is readily accepted by the listener and gives the user a better experience.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, the characteristic voice recording module includes a voice collection unit and a contact list binding unit;
The voice collection unit is used to collect the announcer's raw speech and send the collected raw speech to the contact list binding unit;
The contact list binding unit is used to bind the announcer's raw speech to the announcer's role information, and to send the bound sample speech data to the voice storage module via the first network communication module and the second network communication module.
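The binding performed by the contact list binding unit can be pictured as attaching a role label to each raw recording before it is uploaded. The sketch below assumes a simple record layout (`SampleSpeech`) and a contact dictionary; both names are hypothetical, and the patent does not specify how the bound data is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleSpeech:
    """Raw speech bound to the announcer's role information (hypothetical layout)."""
    role: str          # e.g. contact-list name or instant-messaging nickname
    audio: bytes       # raw recording from the voice collection unit
    sample_rate: int   # samples per second of the recording


def bind_to_contact(audio: bytes, sample_rate: int, contacts: dict, contact_id: str) -> SampleSpeech:
    """Contact list binding unit: label a recording with the speaker's role.

    `contacts` maps a contact id to its display name. An unknown id raises
    KeyError, so an unlabelled recording never reaches the voice storage module.
    """
    role = contacts[contact_id]
    return SampleSpeech(role=role, audio=audio, sample_rate=sample_rate)


contacts = {"c42": "Mom"}
bound = bind_to_contact(b"\x00\x01" * 4, 16000, contacts, "c42")
print(bound.role, len(bound.audio))  # Mom 8
```

Binding at recording time means that later lookups (for example, finding the model for a message's sender) can use the same name or nickname the messaging software shows.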
Further, the first network communication module includes a voice sending unit, a text information sending unit and a characteristic speech sending unit; the second network communication module includes a voice receiving unit, a text information receiving unit and a characteristic speech receiving unit;
The voice sending unit is used to receive the sample speech data output by the characteristic voice recording module and send the sample speech data to the voice receiving unit;
The voice receiving unit is used to receive the speech data output by the voice sending unit and send the speech data to the voice storage module;
The text information sending unit is used to receive the text information output by the text information acquisition module and send the text information to the text information receiving unit;
The text information receiving unit is used to receive the text information output by the text information sending unit and send the text information to the characteristic voice synthesis module;
The characteristic speech sending unit is used to receive the characteristic speech data output by the voice storage module and send the characteristic speech data to the characteristic speech receiving unit;
The characteristic speech receiving unit is used to receive the characteristic speech data output by the characteristic speech sending unit and send the characteristic speech data to the characteristic voice playing module.
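The three sending/receiving unit pairs all carry payloads over the same connection, so the receiver must know each payload's type and length. The patent does not specify a wire format; the sketch below assumes a simple length-prefixed framing (1-byte type, 4-byte big-endian length) built on Python's standard `struct` module, with hypothetical type codes and a routing table that mirrors the destinations named above.

```python
import struct

# Hypothetical message types, one per sending/receiving unit pair.
SAMPLE_SPEECH = 1           # voice sending unit -> voice receiving unit
TEXT_INFO = 2               # text information sending unit -> receiving unit
CHARACTERISTIC_SPEECH = 3   # characteristic speech sending unit -> receiving unit

# Where each receiving unit forwards its payload.
ROUTES = {
    SAMPLE_SPEECH: "voice storage module",
    TEXT_INFO: "characteristic voice synthesis module",
    CHARACTERISTIC_SPEECH: "characteristic voice playing module",
}

HEADER = struct.Struct(">BI")  # 1-byte type, 4-byte big-endian payload length


def encode(msg_type: int, payload: bytes) -> bytes:
    """Frame a payload so the receiver knows its type and where it ends."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_stream(data: bytes):
    """Split a byte stream back into (type, payload) frames."""
    frames, offset = [], 0
    while offset + HEADER.size <= len(data):
        msg_type, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            break  # incomplete frame: wait for more bytes
        frames.append((msg_type, data[start:end]))
        offset = end
    return frames


stream = encode(TEXT_INFO, "Call me back".encode("utf-8")) + encode(SAMPLE_SPEECH, b"\x01\x02")
print(decode_stream(stream))  # [(2, b'Call me back'), (1, b'\x01\x02')]
```

A length prefix is one common choice because speech payloads are binary and can contain any byte value, which rules out delimiter-based framing.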
Further, the voice storage module includes a sample speech storage unit and a characteristic speech storage unit;
The sample speech storage unit is used to receive and store the sample speech data collected by the characteristic voice recording module;
The characteristic speech storage unit is used to receive and store the characteristic speech data synthesized by the characteristic voice synthesis module.
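The two storage units can be sketched as two separate per-speaker collections. This in-memory version is an assumption made for illustration (the patent describes a database but not its schema); the class and method names are hypothetical.

```python
from collections import defaultdict


class VoiceStorage:
    """Voice storage module with its two units (hypothetical in-memory sketch)."""

    def __init__(self):
        self.sample_unit = defaultdict(list)          # role -> recorded sample speech
        self.characteristic_unit = defaultdict(list)  # role -> synthesized speech

    def store_sample(self, role, audio):
        # Sample speech storage unit: input from the recording module.
        self.sample_unit[role].append(audio)

    def store_characteristic(self, role, audio):
        # Characteristic speech storage unit: output of the synthesis module.
        self.characteristic_unit[role].append(audio)

    def samples_for(self, role):
        # .get() avoids creating an empty entry for unknown roles;
        # a copy is returned so callers cannot mutate stored data.
        return list(self.sample_unit.get(role, []))


store = VoiceStorage()
store.store_sample("Mom", b"clip-1")
store.store_characteristic("Mom", b"synth-1")
print(store.samples_for("Mom"))  # [b'clip-1']
```

One plausible reason for keeping the units apart is that only genuine recordings should be fed back into training; synthesized output stays out of the training data.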
Further, the characteristic voice training module includes a speech annotation unit, a parameter extraction unit, a model training unit and a model storage unit;
The speech annotation unit is used to obtain the announcer's sample speech and annotate it;
The parameter extraction unit is used to extract acoustic feature parameters from the annotated sample speech;
The model training unit is used to train a model on the acoustic feature parameters to obtain the announcer's characteristic voice model, and to store the characteristic voice model in the model storage unit;
The model storage unit is used to receive and store the announcer's characteristic voice model and send the model to the characteristic voice synthesis module.
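The four training units form a chain: annotate, extract, train, store. The sketch below shows only that chain. The "model" is a deliberately toy stand-in (per-dimension mean and variance of assumed `(f0_hz, log_energy)` frames, i.e. one diagonal Gaussian), not the acoustic model a real system would train, and every function name is hypothetical.

```python
from statistics import fmean, pvariance


def annotate(sample):
    # Speech annotation unit (stub): a real system would add phoneme,
    # prosody and part-of-speech labels; here only the speaker is kept.
    return {"speaker": sample["speaker"], "frames": sample["frames"]}


def extract_parameters(annotated):
    # Parameter extraction unit: frames are assumed to already be
    # (f0_hz, log_energy) pairs, so extraction just collects them.
    return [tuple(frame) for frame in annotated["frames"]]


def train(parameters):
    # Model training unit (toy): per-dimension mean and population variance.
    dims = list(zip(*parameters))
    return {"mean": [fmean(d) for d in dims], "var": [pvariance(d) for d in dims]}


model_store = {}  # model storage unit: speaker -> characteristic voice model


def run_training(sample):
    annotated = annotate(sample)
    model = train(extract_parameters(annotated))
    model_store[annotated["speaker"]] = model
    return model


m = run_training({"speaker": "Mom", "frames": [(200.0, 1.0), (220.0, 3.0)]})
print(m)  # {'mean': [210.0, 2.0], 'var': [100.0, 1.0]}
```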
Further, the characteristic voice synthesis module includes a text processing unit, a parameter prediction unit and a speech synthesis unit;
The text processing unit is used to receive, via the first network communication module and the second network communication module, the text information output by the text information acquisition module, and to convert the text information into labels that the speech synthesis unit can recognize;
The parameter prediction unit is used to derive the acoustic parameters corresponding to the current text information from the labels of the text information and the characteristic voice model output by the characteristic voice training module;
The speech synthesis unit is used to perform speech synthesis according to the acoustic parameters corresponding to the text information, and to output characteristic speech that matches the current text information and has the announcer's pronunciation characteristics.
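The text processing unit's job of turning raw text into labels can be illustrated with a toy front end. The label format (`w:` word labels and two pause labels) is an assumption made here; a real front end would also produce phonemes, stress and prosody labels and, for Chinese, would need word segmentation, since Python's `\w+` treats a run of Chinese characters as one token.

```python
import re

# Hypothetical mapping from punctuation to pause labels.
PAUSE_MARKS = {",": "<pause-short>", ";": "<pause-short>",
               ".": "<pause-long>", "?": "<pause-long>", "!": "<pause-long>"}


def text_to_labels(text):
    """Toy text processing unit: words become word labels, punctuation becomes pauses.

    Characters that are neither word characters nor listed punctuation are dropped.
    """
    labels = []
    for token in re.findall(r"\w+|[,;.?!]", text):
        if token in PAUSE_MARKS:
            labels.append(PAUSE_MARKS[token])
        else:
            labels.append(f"w:{token.lower()}")
    return labels


print(text_to_labels("Running late, see you soon."))
# ['w:running', 'w:late', '<pause-short>', 'w:see', 'w:you', 'w:soon', '<pause-long>']
```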
To solve the above technical problem, the present invention also provides a voice broadcasting method comprising the following steps:
S101: recording sample speech matched to the role of the announcer of the text information, and sending the sample speech to the voice storage module via the first network communication module and the second network communication module;
S102: obtaining the speech data, extracting voice feature parameters from the obtained speech data, and training a model on the voice feature parameters to obtain a characteristic voice model;
S103: collecting the text information that the user wants read aloud, and sending the collected text information to the characteristic voice synthesis module via the first network communication module and the second network communication module;
S104: obtaining the characteristic voice model and the text information, synthesizing characteristic speech that carries the announcer's voice characteristics and the content of the text information, and storing the characteristic speech data in the voice storage module;
S105: playing the characteristic speech.
The beneficial effects of the invention are as follows. All kinds of text information can be read aloud on various mobile terminals, such as mobile phones, tablet computers, notebook computers and in-vehicle computers. The text information includes news text, e-books, SMS messages, and messages received by instant messaging software such as QQ, Fetion, WeChat and MoMo. When the user uses the invention to read news text or e-books aloud, playback can use whichever voice timbre in the voice storage module the user prefers. When the user uses the invention to announce messages exchanged in text with other people, the announced speech carries the voice characteristics of the sender of the message. This fully meets the user's wish to hear a particular speaker's timbre, is highly personalized, is readily accepted by the listener and gives the user a better experience.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, step S101 specifically comprises:
S101a: collecting the announcer's raw speech and sending the collected raw speech to the contact list binding unit;
S101b: binding the announcer's raw speech to the announcer's role information, and sending the bound speech data to the voice storage module via the first network communication module and the second network communication module.
Further, step S102 specifically comprises:
S102a: obtaining the announcer's sample speech and annotating it;
S102b: obtaining the annotated sample speech and extracting acoustic feature parameters from it;
S102c: obtaining the acoustic feature parameters and training a model on them to obtain the announcer's characteristic voice model;
S102d: obtaining and storing the announcer's characteristic voice model.
Further, step S104 specifically comprises:
S104a: obtaining the collected text information and converting it into labels that the speech synthesis unit can recognize;
S104b: obtaining the labels of the text information and the characteristic voice model output by the characteristic voice training module, and deriving the acoustic parameters corresponding to the current text information;
S104c: obtaining the acoustic parameters corresponding to the text information, performing speech synthesis according to those acoustic parameters, and outputting characteristic speech that matches the current text information and has the announcer's pronunciation characteristics.
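Step S104b, mapping text labels plus a speaker model to acoustic parameters, can be illustrated with a toy predictor. Everything here is an assumption for illustration: labels follow a hypothetical `w:`/`<pause...>` format, the model is assumed to store a mean F0 in `model["mean"][0]`, and each label gets a fixed duration in frames. A real parameter prediction unit would predict per-frame spectra, F0 contours and durations from context.

```python
def predict_parameters(labels, model, frames_per_word=20, frames_per_pause=10):
    """Toy parameter prediction: map each label to (duration_frames, f0_hz).

    Words are voiced at the speaker's mean F0; pauses are unvoiced (f0 = 0.0).
    """
    mean_f0 = model["mean"][0]
    params = []
    for label in labels:
        if label.startswith("<pause"):
            params.append((frames_per_pause, 0.0))
        else:
            params.append((frames_per_word, mean_f0))
    return params


print(predict_parameters(["w:call", "w:me", "<pause-long>"], {"mean": [210.0, 2.0]}))
# [(20, 210.0), (20, 210.0), (10, 0.0)]
```

Using the speaker's own F0 statistics is the simplest way the characteristic voice model can shape the output; S104c would then render these parameters as a waveform.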
Brief Description of the Drawings
Fig. 1 is a structural diagram of the voice broadcasting system;
Fig. 2 is a structural diagram of the internal modules of the voice broadcasting system;
Fig. 3 is a flow chart of the voice broadcasting method.
Detailed Description of Embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Fig. 1 is a structural diagram of the voice broadcasting system. As shown in Fig. 1, a voice broadcasting system comprises a client system and a server system. The client system can be installed on various mobile terminals, including mobile phones, tablet computers, notebook computers and in-vehicle computers. The client system is started when the user needs voice broadcasting.
The client system includes a characteristic voice recording module, a text information acquisition module, a first network communication module and a characteristic voice playing module; the server system includes a voice storage module, a characteristic voice training module, a second network communication module and a characteristic voice synthesis module;
The characteristic voice recording module records sample speech matched to the role of the announcer of the text information, and sends the sample speech to the voice storage module via the first network communication module and the second network communication module;
A characteristic voice is a voice with a particular person's pronunciation features, from which the speaker's identity can be roughly recognized. In the present invention, speech samples of different people are recorded so that speech with a given speaker's voice characteristics can be synthesized from those samples. In addition, when a characteristic voice is recorded for a particular person, that person's identity and role information can be bound to the recording, which amounts to labelling the characteristic voice with the speaker's role. The role information can be the speaker's name as stored in the phone's contact list, or the nickname in the friend list of instant messaging software, such as a QQ, Fetion, WeChat or MoMo nickname.
The first network communication module and the second network communication module send and receive data transmitted between the client system and the server system; the network used by these two modules can be a wide area network or a local area network.
The voice storage module stores the sample speech data collected by the characteristic voice recording module and the characteristic speech data synthesized by the characteristic voice synthesis module. The voice storage module is a speech database that stores particular people's sample speech and the characteristic speech synthesized later. The sample speech is stored as multiple short recorded sentences. As needed, the user can record the voices of frequent contacts and store them in the database as characteristic speech samples; the initial speech samples for each contact can be half an hour to one hour long. Because the synthesized characteristic voice improves as the database grows, a user who later wants a more realistic voice can expand the database by adding more sample speech.
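Because the sample speech is stored as many short sentences and each contact's initial samples are half an hour to one hour long, a client could track the accumulated duration per contact. The readiness check below is a hypothetical illustration: the 30-minute threshold is taken from the lower end of the stated range, while `Recording` and `ready_for_training` are names invented here.

```python
from dataclasses import dataclass

MIN_INITIAL_SECONDS = 30 * 60  # lower end of the half-hour-to-one-hour guideline


@dataclass
class Recording:
    text: str        # the short sentence that was read aloud
    seconds: float   # duration of the recording


def total_duration(recordings):
    return sum(r.seconds for r in recordings)


def ready_for_training(recordings, minimum=MIN_INITIAL_SECONDS):
    """Hypothetical check: has this contact recorded enough sample speech?"""
    return total_duration(recordings) >= minimum


clips = [Recording(f"sentence {i}", 6.0) for i in range(250)]
print(total_duration(clips), ready_for_training(clips))  # 1500.0 False
```

Expanding the database later, as described above, would simply mean appending more `Recording` entries and retraining.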
The characteristic voice training module extracts voice feature parameters from the sample speech stored in the voice storage module, performs model training to obtain a characteristic voice model, and sends the characteristic voice model to the characteristic voice synthesis module;
The text information acquisition module collects the text information that the user wants read aloud and sends it to the characteristic voice synthesis module via the first network communication module and the second network communication module. The collected text information can be news text, e-books, SMS messages, or text received by instant messaging software such as QQ, Fetion, WeChat and MoMo. The client system collects the text information in the form of cards, and the collected text serves as the input to the subsequent speech synthesis system.
The characteristic voice synthesis module synthesizes, from the characteristic voice model and the text information, characteristic speech that carries both the announcer's voice characteristics and the content of the text information, and stores the characteristic speech data in the voice storage module;
The characteristic voice playing module plays the characteristic speech synthesized by the characteristic voice synthesis module.
With the present invention, all kinds of text information can be read aloud on various mobile terminals, such as mobile phones and tablet computers. The text information includes news text, e-books, SMS messages, and messages received by instant messaging software such as QQ, Fetion, WeChat and MoMo. When the user uses the invention to read news text or e-books aloud, playback can use whichever voice timbre in the voice storage module the user prefers. When the user uses the invention to announce messages exchanged in text with other people, the announced speech carries the voice characteristics of the sender of the message. This fully meets the user's wish to hear a particular speaker's timbre, is highly personalized, is readily accepted by the listener and gives the user a better experience.
Fig. 2 is a structural diagram of the internal modules of the voice broadcasting system. As shown in Fig. 2, the characteristic voice recording module includes a voice collection unit and a contact list binding unit. The voice collection unit collects the announcer's raw speech and sends the collected raw speech to the contact list binding unit. The contact list binding unit binds the announcer's raw speech to the announcer's role information, and sends the bound sample speech data to the voice storage module via the first network communication module and the second network communication module.
The first network communication module includes a voice sending unit, a text information sending unit and a characteristic speech sending unit; the second network communication module includes a voice receiving unit, a text information receiving unit and a characteristic speech receiving unit. The voice sending unit receives the sample speech data output by the characteristic voice recording module and sends it to the voice receiving unit. The voice receiving unit receives the speech data output by the voice sending unit and sends it to the voice storage module. The text information sending unit receives the text information output by the text information acquisition module and sends it to the text information receiving unit. The text information receiving unit receives the text information output by the text information sending unit and sends it to the characteristic voice synthesis module. The characteristic speech sending unit receives the characteristic speech data output by the voice storage module and sends it to the characteristic speech receiving unit. The characteristic speech receiving unit receives the characteristic speech data output by the characteristic speech sending unit and sends it to the characteristic voice playing module.
The voice storage module includes a sample speech storage unit and a characteristic speech storage unit. The sample speech storage unit receives and stores the sample speech data collected by the characteristic voice recording module; the characteristic speech storage unit receives and stores the characteristic speech data synthesized by the characteristic voice synthesis module.
The characteristic voice training module includes a speech annotation unit, a parameter extraction unit, a model training unit and a model storage unit. The speech annotation unit obtains the announcer's sample speech and annotates it. The annotation covers syllable and phoneme segmentation and labelling of the speech data, stress and prosody labelling, character and word boundary and part-of-speech labelling, and labelling of background noise in the speech. The parameter extraction unit extracts acoustic feature parameters from the annotated sample speech; the acoustic feature parameters include the fundamental frequency and spectral feature parameters. The model training unit trains a model on the acoustic feature parameters to obtain the announcer's characteristic voice model and stores it in the model storage unit. The model storage unit receives and stores the announcer's characteristic voice model and sends the model to the characteristic voice synthesis module.
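The fundamental frequency (F0) named above as one of the acoustic feature parameters can be estimated per frame in several ways. The sketch below uses plain autocorrelation, a textbook method chosen here for illustration; the patent does not say which extractor is used. Plain autocorrelation is prone to octave errors on real speech, and production systems usually prefer more robust methods such as YIN or RAPT.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the F0 of one voiced frame by autocorrelation.

    Returns the frequency (Hz) of the lag with the highest autocorrelation
    inside the [f0_min, f0_max] search range.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]              # remove DC offset
    min_lag = int(sample_rate / f0_max)        # shortest period searched
    max_lag = min(int(sample_rate / f0_min), len(x) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]  # 50 ms at 200 Hz
print(round(estimate_f0(tone, sr)))  # 200
```

Running this over consecutive frames of a recording yields the F0 contour that, together with spectral features, would feed the model training unit.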
The characteristic voice synthesis module includes a text processing unit, a parameter prediction unit and a speech synthesis unit. The text processing unit receives, via the first network communication module and the second network communication module, the text information output by the text information acquisition module, and converts the text information into labels that the speech synthesis unit can recognize. The parameter prediction unit derives the acoustic parameters corresponding to the current text information from the labels of the text information and the characteristic voice model output by the characteristic voice training module. When synthesis is requested after text information has been collected, and the model storage unit already holds a voice model of the sender of the text information (that is, the sender's voice was previously collected as characteristic speech samples), the sender's characteristic voice model is called as one of the inputs of the parameter prediction unit. If the model storage unit does not hold a voice model of the sender, speech samples can be collected in real time and the resulting characteristic voice model output as one of the inputs of the parameter prediction unit.
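The model selection described above (use the stored model if there is one, otherwise collect samples in real time and train) is a lookup with a fallback. In the sketch below, `record_samples` and `train` are hypothetical callables passed in by the caller, and caching the newly trained model for the sender's next message is an added assumption that the source does not state.

```python
def model_for_sender(sender, model_store, record_samples, train):
    """Choose the characteristic voice model fed to the parameter prediction unit.

    model_store    : dict mapping sender -> stored characteristic voice model
    record_samples : hypothetical callable that collects speech samples in real time
    train          : hypothetical callable that builds a model from samples
    """
    model = model_store.get(sender)
    if model is None:
        # No stored model for this sender: collect samples now and train one.
        model = train(record_samples(sender))
        model_store[sender] = model  # cache for the next message (assumption)
    return model


store = {"Mom": "mom-model"}
calls = []


def rec(sender):
    calls.append(sender)
    return [f"{sender}-clip"]


def tr(samples):
    return f"model-from-{len(samples)}-clips"


print(model_for_sender("Mom", store, rec, tr))  # mom-model
print(model_for_sender("Bob", store, rec, tr))  # model-from-1-clips
```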
The speech synthesis unit performs speech synthesis according to the acoustic parameters corresponding to the text information, and outputs characteristic speech that matches the current text information and has the announcer's pronunciation characteristics.
Fig. 3 is a flow chart of the voice broadcasting method. As shown in Fig. 3, the voice broadcasting method comprises the following steps:
S101: recording sample speech matched to the role of the announcer of the text information, and sending the sample speech to the voice storage module via the first network communication module and the second network communication module;
S102: obtaining the speech data, extracting voice feature parameters from the obtained speech data, and training a model on the voice feature parameters to obtain a characteristic voice model;
S103: collecting the text information that the user wants read aloud, and sending the collected text information to the characteristic voice synthesis module via the first network communication module and the second network communication module;
S104: obtaining the characteristic voice model and the text information, synthesizing characteristic speech that carries the announcer's voice characteristics and the content of the text information, and storing the characteristic speech data in the voice storage module;
S105: playing the characteristic speech.
Step S101 specifically comprises:
S101a: collecting the announcer's raw speech and sending the collected raw speech to the contact list binding unit;
S101b: binding the announcer's raw speech to the announcer's role information, and sending the bound speech data to the voice storage module via the first network communication module and the second network communication module.
Step S102 specifically comprises:
S102a: obtaining the announcer's sample speech and annotating it;
S102b: obtaining the annotated sample speech and extracting acoustic feature parameters from it;
S102c: obtaining the acoustic feature parameters and training a model on them to obtain the announcer's characteristic voice model;
S102d: obtaining and storing the announcer's characteristic voice model.
Step S104 specifically comprises:
S104a: obtaining the collected text information and converting it into labels that the speech synthesis unit can recognize;
S104b: obtaining the labels of the text information and the characteristic voice model output by the characteristic voice training module, and deriving the acoustic parameters corresponding to the current text information;
S104c: obtaining the acoustic parameters corresponding to the text information, performing speech synthesis according to those acoustic parameters, and outputting characteristic speech that matches the current text information and has the announcer's pronunciation characteristics.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.