CN100369107C

CN100369107C - Musical tone and speech reproducing device and method

Info

Publication number: CN100369107C
Application number: CNB2004100953808A
Authority: CN
Inventors: 川岛隆宏
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2003-11-26
Filing date: 2004-11-24
Publication date: 2008-02-13
Anticipated expiration: 2024-11-24
Also published as: KR20050050583A; HK1073169A1; JP2005156946A; KR100650071B1; CN1622194A

Abstract

The invention provides a musical tone reproducing device, a voice reproducing device, and a musical tone and voice reproducing method and program. They reproduce musical tones from a text description that can be understood intuitively, and they can also reproduce language written in the text. The musical tone reproducing device reproduces music from text data made up of text that expresses imitative (onomatopoeic) sounds of the musical tones constituting the music. When text data (HV-Script) is input in which a character string to be speech-synthesized is written intermixed with music text (text expressing imitative sounds of the musical tones constituting the music), the voice reproducing device interprets the music text, converts it into musical tone data, and reproduces that data. It also interprets the character string to be speech-synthesized and reproduces it as speech.

Description

Musical tone and voice reproduction device and musical tone and voice reproduction method

Technical field

The present invention relates to a musical tone and voice reproduction device and a musical tone and voice reproduction method, and in particular to a device and method that reproduce musical tones and voice based on text data.

Background art

A long-established way to reproduce music is to supply a sound source with performance data that conforms to the MIDI (Musical Instrument Digital Interface) standard and carries timing information, so that the sound source reproduces a given melody. Another known technique creates music from text data written in MML (Music Macro Language): prescribed letters and symbols are used, and sound is generated from the data described in text form. For example, Japanese Patent Application Laid-Open No. 2002-49371 discloses a technique that represents the notes, rests, and so on that make up a melody with English letters.

However, the method of reproducing music by supplying a sound source with timed performance data such as MIDI data requires a dedicated input application, such as a MIDI sequencer, to enter that data; ordinary consumers are unfamiliar with such tools and find them hard to operate. Methods that control sound generation with prescribed characters and symbols, such as MML, are likewise unfamiliar and hard to understand for ordinary consumers. In the technique disclosed in the above Japanese publication, the English letters used can be edited as text, but it is not intuitively clear what sound each letter represents.

Speech synthesis of character strings written in text form is also known. This technique pronounces a character string intended for speech synthesis in a form close to the human voice, but it does not address the generation of musical tones.

There is therefore a need for a technique that makes it easy to reproduce musical tones from a text description and that also allows language to be pronounced by speech synthesis to be mixed into the same text.

Summary of the invention

The present invention was made in view of the above problems. Its object is to provide a musical tone and voice reproduction device and a musical tone and voice reproduction method that reproduce musical tones from a text description people can grasp intuitively, and that can at the same time reproduce language written in the text.

A first feature of the present invention is that the musical tone and voice reproduction device reproduces a melody based on text data made up of text expressing imitative sounds of the musical tones that constitute the melody.

A second feature is that, in the musical tone and voice reproduction device, at least the tone color of the musical tone to be sounded is defined for each text expressing an imitative sound written in the text data.

A third feature is that, in the musical tone and voice reproduction device, at least one of the pitch and the duration of the musical tone to be sounded is also defined for each text expressing an imitative sound written in the text data.

A fourth feature is a musical tone and voice reproduction device that interprets a character string written for speech synthesis and produces speech, comprising: a conversion section that receives text data in which the character string is written intermixed with music text made up of text expressing imitative sounds of the musical tones constituting a melody, and that interprets the music text and converts it into musical tone data; a sound source section that reproduces the musical tone data; and a voice reproduction section that interprets the character string and reproduces it as speech.

A fifth feature is a control method for a musical tone and voice reproduction device capable of speech synthesis and musical tone data reproduction, comprising the steps of: inputting text data in which a character string written for speech synthesis is intermixed with music text made up of text expressing imitative sounds of the musical tones constituting a melody; interpreting the music text and converting it into musical tone data; reproducing the musical tone data; and interpreting the character string and reproducing it as speech.

A sixth feature is a program applied to a computer equipped with a speech synthesis function and a musical tone data reproduction function, the program comprising the steps of: inputting text data in which a character string written for speech synthesis is intermixed with music text made up of text expressing imitative sounds of the musical tones constituting a melody; interpreting the music text and converting it into musical tone data; reproducing the musical tone data; and interpreting the character string and reproducing it as speech.

According to the present invention, the musical tones to be reproduced are described with characters that express imitative sounds, so people can grasp the description intuitively, and ordinary consumers, not only specialists such as engineers, can understand it easily.

In addition, because both the musical tones to be reproduced and the language to be pronounced (optionally with specified prosody) are written as text, they can be described easily with an ordinary text editor.

Furthermore, because musical tones to be reproduced can be written into the language to be speech-synthesized within a single text, users can easily combine language and musical tone reproduction.

Brief description of the drawings

Fig. 1 is a block diagram showing the configuration of a musical tone and voice reproduction device according to an embodiment of the invention;

Fig. 2 shows an example of text containing an HV-Script written with prosodic symbols;

Fig. 3 A is the view that the example of prosodic sign is shown;

Fig. 3 B is the view of the frequency characteristics control the when voice reproduction of being represented by prosodic sign is shown;

Fig. 3 C is the view of the frequency characteristics control the when voice reproduction of being represented by prosodic sign is shown;

Fig. 4 shows examples of note words contained in an HV-Script;

Fig. 5 shows examples of notation that specifies note interval, note length, and tempo;

Fig. 6 A is a view of controlling the example of note interval and note length when illustrating by note word reproduction plan sound;

Fig. 6 B is the view that the example of the speed of controlling the note word is shown;

Fig. 7 is a block diagram showing the configuration of the HV sound source shown in Fig. 1;

Fig. 8 is a block diagram showing the structure of the formant generator shown in Fig. 7;

Fig. 9 is a flowchart showing the registration process of the musical tone and voice reproduction device;

Fig. 10 is a flowchart showing the HV-Script interpretation process of the musical tone and voice reproduction device;

Fig. 11 is a block diagram showing the structure of a mobile telephone to which the musical tone and voice reproduction device is applied.

Embodiment

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

Fig. 1 is a block diagram showing the structure of a musical tone and voice reproduction device according to an embodiment of the invention. The device uses a description format called HV-Script (HV: Human Voice), a text that contains prescribed symbols for reproducing musical tones and speech. An HV-Script contains utterance character strings, which are the objects of speech synthesis, and note words; the utterance character strings may contain prosodic symbols (rhythm or intonation symbols). Prosodic symbols specify the manner of pronunciation, such as accent.

In Fig. 1, reference numeral 1 denotes the HV-Script player, which controls reproduction and stopping of an HV-Script and similar operations. When an HV-Script has been input and a reproduction instruction is received, the HV-Script player 1 begins interpreting the HV-Script. That is, depending on what is written in the HV-Script, it has either the HV driver 2 or the note word converter 5 process it. When the HV-Script contains note words, the player also manages the timing of the note intervals (durations).

The HV driver 2 refers to the speech synthesis dictionary (sound/speech synthesis dictionary) stored in memory 3 and performs the following processing.

The human voice has formants (characteristic frequency spectra) that depend on the shape of the vocal cords, oral cavity, and so on, and the synthesis dictionary stores parameters relating to these formants. The synthesis dictionary is a database that stores in advance, per unit of pronounced characters (for Japanese, per kana such as "あ" or "い"), parameters obtained by sampling and analyzing actual pronunciations, as formant frame data. The database also stores data for changing the formant parameters according to prosodic symbols.

The HV driver 2 interprets the utterance character strings (including prosodic symbols) in the HV-Script and uses the synthesis dictionary to convert them into formant frame data representing the standard manner of pronunciation. It then turns these into a formant frame sequence to which the pitch changes and other modifications specified by the prosodic symbols have been added, and supplies the result to the HV sound source 4. The HV sound source 4 generates a speech signal from the formant frame sequence output by the HV driver 2 and outputs it to the adder 8.

Prosodic symbols are described below.

Fig. 2 shows an example of a Japanese sentence containing an HV-Script. In this example, the character string "か_3さがほ^5_4い'ね$2ー", enclosed by the special control character marked (1) (here "S"), is the HV-Script, and the remaining parts are ordinary text. This HV-Script is the phrase "かさがほいねー" written with prosodic symbols that add the desired intonation for speech synthesis. The symbols "'", "^", "_", "$", and so on are prosodic symbols; they indicate the kind of intonation attached to the other characters (here, kana). Each prosodic symbol applies its accent to the character written immediately after it (or, when a number immediately follows the symbol, to the character following that number).

Fig. 3A shows what these prosodic symbols (typical examples) mean for pronunciation control. The prosodic symbol "'" means that the pitch rises at the start of the character and specifies frequency characteristic control (1) in Fig. 3B; "^" means that the pitch rises during pronunciation and specifies control (3) in Fig. 3C. The symbol "_" means that the pitch falls at the start of the character and specifies control (2) in Fig. 3B; "$" means that the pitch falls during pronunciation and specifies control (4) in Fig. 3C. Each prosodic symbol thus controls pronunciation according to its frequency characteristic control, and the desired phrase is speech-synthesized. A number following a prosodic symbol specifies the amount of accent change. For example, in "か_3さが", the pitch at the start of "さ" falls by an amount of "3", the following "が" is pronounced keeping the lowered pitch, and "か" is pronounced at the standard pitch.
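The prosodic-symbol rules above can be sketched as a small parser that turns an utterance string into per-character pitch offsets. This is a minimal illustration, not the patent's implementation: the name `parse_prosody`, the default amount of 1 for a symbol written without a number, applying onset symbols before in-pronunciation symbols when several precede one character, and letting a pitch change persist to later characters (generalizing the "か_3さが" example) are all assumptions.

```python
import re

DEFAULT_AMOUNT = 1              # assumed step when no number follows a symbol
ONSET = {"'": +1, "_": -1}      # pitch change at the start of the next character
GLIDE = {"^": +1, "$": -1}      # pitch change during the next character

# Either a prosodic symbol with an optional number, or any single plain character.
TOKEN = re.compile(r"(['_^$])(\d*)|(.)", re.S)

def parse_prosody(text):
    """Return (char, onset_offset, end_offset) triples relative to the standard pitch."""
    level = 0        # current pitch offset; carried over to later characters (assumption)
    pending = []     # symbols waiting for the next plain character
    result = []
    for m in TOKEN.finditer(text):
        symbol, digits, char = m.groups()
        if symbol:
            pending.append((symbol, int(digits) if digits else DEFAULT_AMOUNT))
            continue
        for sym, amount in pending:          # onset changes first
            if sym in ONSET:
                level += ONSET[sym] * amount
        onset = level
        for sym, amount in pending:          # then changes during the character
            if sym in GLIDE:
                level += GLIDE[sym] * amount
        result.append((char, onset, level))
        pending.clear()
    return result
```

For example, `parse_prosody("か_3さが")` yields `[("か", 0, 0), ("さ", -3, -3), ("が", -3, -3)]`, which matches the behavior described for that fragment.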

As described above, to add an accent (or intonation) to a character in a phrase to be pronounced, a prosodic symbol from Fig. 3A (indicating the pitch change) and a number are placed before that character, and the sentence is written in HV-Script form as shown in Fig. 2. In the present embodiment only pitch-related prosodic symbols are described, but prosodic symbols that control volume, speed, voice quality, and so on can also be used.

The note word converter 5 converts the note words contained in an HV-Script into note information (musical tone data) by referring to the default note word list (and any registered user-defined note word list), and outputs the note information to the sound source 7. Reference numeral 6 denotes the note word list memory that stores the default note word list. As shown in Fig. 4, this default list defines a "tone color name", "program change", "note number", and "note length" for each predefined note word.

Here, a note word is a word written with a character or character string that expresses an imitative sound people can grasp intuitively (for example, onomatopoeia such as "どん" or "ぽん"). As in the examples of Fig. 4, a symbol indicating the scale (pitch), for example C3, C#3, C4, E3, F3, or G3, may also be appended to a note word.
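A note word with an appended scale symbol can be split into its imitative sound and its pitch. The sketch below does this and maps the scale symbol to a MIDI note number. The helper names are hypothetical, and the octave numbering assumes C4 = 60; conventions differ between manufacturers (some number middle C as C3), so the exact mapping is illustrative only.

```python
import re

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Imitative sound, optionally followed by a scale symbol such as C3, C#3, or G3.
NOTE_WORD = re.compile(r"^(?P<word>.+?)(?P<pitch>[A-G]#?-?\d)?$")

def pitch_to_midi(name):
    """Convert a scale symbol like 'C4' or 'C#3' to a MIDI note number (C4 = 60 assumed)."""
    m = re.fullmatch(r"([A-G])(#?)(-?\d)", name)
    if m is None:
        raise ValueError(f"not a scale symbol: {name!r}")
    letter, sharp, octave = m.groups()
    return 12 * (int(octave) + 1) + SEMITONES[letter] + (1 if sharp else 0)

def split_note_word(token):
    """Split 'ぶんC4' into ('ぶん', 'C4'); a bare 'どん' gives ('どん', None)."""
    m = NOTE_WORD.match(token)
    return m.group("word"), m.group("pitch")
```

With this convention, "E3" maps to 52 and "C#3" to 49.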

An HV-Script also uses a prescribed symbol that specifies note word mode, in which the characters or character strings that follow are treated as note words (in the present embodiment, "Z1"), and a symbol that cancels note word mode ("Z0").

In the note word list, the tone color name indicates the tone color used when the corresponding note word is reproduced. The set of data consisting of the program change (tone color data), note number (pitch data), and note length (length data) corresponds to the note information mentioned above. The example in Fig. 4 uses the MIDI standard: the program change indicates the instrument category, the note number indicates the pitch, and the note length indicates the length of the sounded note. In Fig. 4 the note length is shown as a note value for illustration; the actual data defines a gate time that realizes the corresponding note length.
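The note word lookup can be sketched as a dictionary keyed by note word whose values hold the four columns of the list. The entries below are hypothetical stand-ins for the table of Fig. 4 (36 and 38 are the General MIDI bass drum and acoustic snare note numbers), the 480-tick resolution is an assumption, and letting a user-defined entry override a default entry with the same name is also an assumption; the text only says both lists are consulted.

```python
from collections import ChainMap
from dataclasses import dataclass

@dataclass(frozen=True)
class NoteInfo:
    tone_name: str     # "tone color name" column
    program: int       # program change: tone color data
    note_number: int   # note number: pitch data
    gate_ticks: int    # gate time realizing the note length

TICKS_PER_QUARTER = 480  # hypothetical timing resolution

# Hypothetical default entries, not the actual contents of Fig. 4.
DEFAULT_NOTE_WORDS = {
    "どん": NoteInfo("bass drum", 0, 36, TICKS_PER_QUARTER // 2),
    "ぱん": NoteInfo("snare drum", 0, 38, TICKS_PER_QUARTER // 2),
}

def make_note_word_table(user_table=None):
    """Combine lists; user-defined entries shadow defaults with the same note word."""
    return ChainMap(dict(user_table or {}), DEFAULT_NOTE_WORDS)

def convert(note_word, table):
    """Return the note information for a registered note word."""
    if note_word not in table:
        raise KeyError(f"note word not registered: {note_word!r}")
    return table[note_word]
```

A `ChainMap` keeps the default list untouched while the user list is layered on top, which loosely mirrors registering a user list alongside the defaults.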

In addition, to support more detailed input of note interval and note length, numerical notation such as that shown in Fig. 5 can be defined (notation specifying note interval, note length, and tempo). In Fig. 5, X (value) sets the time interval (note interval) from the previous sound to the next one to the specified value. Y (value) extends the corresponding sound by a time corresponding to the specified value; examples are shown in Fig. 6A. T (value) specifies the tempo; an example is shown in Fig. 6B.
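One possible reading of the X, Y, and T notation is sketched below as a scheduler that assigns onset times and durations to note words. The semantics are assumptions drawn from the text and the examples that follow: T is read as a tempo in beats per minute, each note word defaults to one beat, X sets the gap in milliseconds before the next note word (as with the 400 ms of Example 1), and Y extends the preceding note by the given milliseconds (as with the 800 ms of Example 2). Note words are assumed to be separated by whitespace, and the function name is hypothetical.

```python
import re

# X/Y/T commands with a number, or a note word (a run without X, Y, T, or whitespace).
TOKENS = re.compile(r"([XYT])(\d+)|([^XYT\s]+)")

def schedule_note_words(note_text, tempo=120.0):
    """Return (note_word, onset_ms, duration_ms) tuples under the assumptions above."""
    events = []
    beat_ms = 60000.0 / tempo   # default spacing and length: one beat per note word
    gap_ms = None               # set by X, consumed by the next note word
    onset = None                # onset of the previous note word
    for cmd, num, word in TOKENS.findall(note_text):
        if cmd == "T":
            if float(num) <= 0:
                raise ValueError("tempo must be positive")
            beat_ms = 60000.0 / float(num)
        elif cmd == "X":
            gap_ms = float(num)
        elif cmd == "Y":
            if events:  # extend the preceding note
                w, start, dur = events[-1]
                events[-1] = (w, start, dur + float(num))
        else:
            if gap_ms is not None:
                step = gap_ms
            else:
                step = 0.0 if onset is None else beat_ms
            onset = step if onset is None else onset + step
            events.append((word, onset, beat_ms))
            gap_ms = None
    return events
```

For instance, `schedule_note_words("T100 ぶE3 ぶF3 ぶんC4Y800")` gives onsets of 0, 600, and 1200 ms, with the last note lasting 1400 ms.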

Examples of HV-Scripts containing note words are described below.

(Example 1) "えいとびいとだよ Z1X400 どんぱんどどぱん Z0 いかが?"

In this example, the range enclosed between the symbols Z1 and Z0 consists of note words. When this script is reproduced, the phrase before Z1 (meaning "It's eight-beat") is spoken, and then an eight-beat drum rhythm is reproduced. An interval of 400 ms elapses before the first imitative sound "どん" is sounded. Finally, the phrase "いかが?" ("How about it?") is spoken.

(example 2) " T100 ぶ E3 ぶ F3 ぶん G3 ぶん E4 ぶん C4Y800 "

In this example, bass tones are reproduced as written at a tempo of 100. The final "ぶん C4Y800" sounds the imitative sound "ぶん" at pitch C4 and extends it by 800 ms.

The user of the musical tone and voice reproduction device of this embodiment can transfer the user-defined note word list stored in the user data memory 10 of Fig. 1 to the note word list memory 6 through the registration API (Application Program Interface) 11, where it is stored. When the note word converter 5 receives note words from the HV-Script player 1, it refers to the note word lists in the note word list memory 6, converts each note word into note information, and outputs it to the sound source 7. The sound source 7 generates a musical tone signal based on the note information supplied by the note word converter 5 and outputs it to the adder 8. An FM (Frequency Modulation) sound source or a PCM (Pulse Code Modulation) sound source compatible with the MIDI standard, among others, can be used as the sound source.

The adder 8 adds the speech signal supplied by the HV sound source 4 and the musical tone signal supplied by the sound source 7, and outputs the sum to the speaker 9. The speaker 9 sounds the speech and musical tones based on the combined signal from the adder 8.
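The adder stage amounts to a sample-wise sum of the two signals. A minimal sketch, assuming both signals are lists of float samples in [-1, 1] at the same sample rate; zero-padding the shorter signal and hard clipping the sum are assumptions, since the text only says the two signals are added.

```python
from itertools import zip_longest

def mix(voice, tone, limit=1.0):
    """Sample-wise sum of speech and musical tone signals, hard-clipped to [-limit, limit]."""
    return [max(-limit, min(limit, v + t))
            for v, t in zip_longest(voice, tone, fillvalue=0.0)]
```

A real device might scale the inputs instead of clipping; clipping just keeps the sketch short.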

Details of the HV sound source 4 are described next with reference to the block diagrams of Figs. 7 and 8.

The HV sound source 4 operates according to the CSM (Composite Sinusoidal Wave Model) speech synthesis method. Each phoneme (a speech component such as a vowel or a consonant) is composed of eight formants, and the synthesis dictionary stores, as parameters, eight sets of formant frequency, formant level, pitch information, and so on.

As shown in Fig. 7, the HV sound source 4 of Fig. 1 has eight formant generators 40a-40h and a pitch generator 50. Based on the formant parameters and pitch information output from a sounding sequencer (not shown), the formant signals generated by the formant generators 40a-40h are mixed in the mixer 60 to produce the desired phoneme. By performing this phoneme generation continuously, the desired speech is synthesized. To generate its formant signal, each formant generator 40a-40h generates an underlying basic waveform; a known FM waveform generator, for example, can be used for this. The pitch generator 50 generates the pitch by computation, and the computed pitch is applied to the generated phoneme only when the phoneme is voiced.

The structure of each formant generator 40a-40h is described below with reference to Fig. 8.

Each formant generator 40a-40h consists of a waveform generator 41, a noise generator 42, an adder 43, and an amplifier 44.

The waveform generator 41 sequentially generates the formants that make up each phoneme, based on the formant frequency specified for each formant of each phoneme, the basic waveform (sine wave, triangular wave, etc.) of the formant, and the phase of each waveform. The noise generator 42 operates according to the type of formant produced by the waveform generator 41, that is, whether it is voiced or unvoiced: for an unvoiced sound, it generates noise and supplies it to the adder 43.

The adder 43 adds the formant produced by the waveform generator 41 and the noise produced by the noise generator 42. The amplifier 44 amplifies this sum to the prescribed formant level and outputs it.

Each formant generator 40a-40h handles one of the formants that make up a phoneme. A phoneme is formed by combining several formants (eight in this embodiment), so the formants of each phoneme must be generated and combined. The configuration of Fig. 7 therefore performs speech synthesis using formant parameters.
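The formant generator of Fig. 8 and the mixing of Fig. 7 can be approximated in a few lines. This is a simplified sketch, not the CSM implementation used in the device: it uses a plain sine wave as the basic waveform, seeded uniform random noise for unvoiced formants, and a hypothetical 16 kHz sample rate, and it omits the pitch generator 50 and any frame-to-frame parameter changes. The function names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; hypothetical output rate

def formant_generator(freq_hz, level, n_samples, voiced, rng,
                      phase=0.0, noise_level=0.1, sample_rate=SAMPLE_RATE):
    """One generator of Fig. 8: waveform (+ noise if unvoiced), then amplification."""
    out = []
    for n in range(n_samples):
        s = math.sin(2.0 * math.pi * freq_hz * n / sample_rate + phase)  # waveform generator 41
        if not voiced:
            s += noise_level * rng.uniform(-1.0, 1.0)  # noise generator 42 into adder 43
        out.append(level * s)                          # amplifier 44: formant level
    return out

def synthesize_phoneme(formants, n_samples, voiced=True, seed=0):
    """Mix up to eight (frequency_hz, level) formants, like generators 40a-40h into mixer 60."""
    if not 1 <= len(formants) <= 8:
        raise ValueError("expected between 1 and 8 formants")
    rng = random.Random(seed)  # seeded so unvoiced output is reproducible
    tracks = [formant_generator(f, a, n_samples, voiced, rng) for f, a in formants]
    return [sum(samples) for samples in zip(*tracks)]
```

Calling `synthesize_phoneme([(1000.0, 0.5), (2000.0, 0.25)], 16)` produces 16 samples that start at 0.0 and reach about 0.5 at the fourth sample, where the 1000 Hz component peaks.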

In CSM speech synthesis, as described above, the data content of each phoneme is determined by combining several formant tones generated from frequency parameters, amplitude parameters, and so on, and phonemes are then combined to synthesize speech. For example, to synthesize the word "さくら", multiple sets of frequency and amplitude parameters are set every few milliseconds to several tens of milliseconds, and the following six phonemes are synthesized and pronounced in order.

/S/→/A/→/K/→/U/→/R/→/A/

The parameters supplied to each formant generator are predefined for each phoneme as described above and registered in the synthesis dictionary. Information about the phonemes that make up each character is also registered in the dictionary; for "さ", for example, it indicates that the kana consists of two phonemes, the consonant /S/ and the vowel /A/. When an accent is changed by a prosodic symbol, changes corresponding to that symbol are applied to the formant frame data of each affected phoneme before the data is supplied to the HV sound source 4.
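The dictionary lookup from kana to phonemes, and the periodic parameter frames used for "さくら", can be sketched as follows. The table holds only the three kana needed for the example, and the frame count per phoneme and the 10 ms frame period are hypothetical values within the "few milliseconds to several tens of milliseconds" range mentioned above; the function names are also hypothetical.

```python
# Hypothetical excerpt of the synthesis dictionary: kana -> constituent phonemes.
KANA_PHONEMES = {
    "さ": ("S", "A"),  # consonant /S/ + vowel /A/, as described for "さ"
    "く": ("K", "U"),
    "ら": ("R", "A"),
}

def to_phonemes(kana_text):
    """Expand each kana into its phonemes using the dictionary."""
    phonemes = []
    for kana in kana_text:
        if kana not in KANA_PHONEMES:
            raise KeyError(f"kana not registered in the dictionary: {kana!r}")
        phonemes.extend(KANA_PHONEMES[kana])
    return phonemes

def frame_schedule(phonemes, frames_per_phoneme=3, frame_ms=10):
    """List (time_ms, phoneme) pairs: one parameter frame every frame_ms milliseconds."""
    schedule = []
    t = 0
    for p in phonemes:
        for _ in range(frames_per_phoneme):
            schedule.append((t, p))
            t += frame_ms
    return schedule
```

`to_phonemes("さくら")` returns the six phonemes S, A, K, U, R, A in the order shown above.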

The operation of the musical tone and voice reproduction device of this embodiment is described below using the flowcharts of Figs. 9 and 10.

First, as shown in Fig. 9, the user inputs and registers a user-defined note word list as needed (step S01). If the user relies only on the default note word list, step S01 can be omitted. The user-defined note word list that the user has input and stored in the user data memory 10 is read by the registration API 11 and stored in the note word list memory 6.

Next, the user creates an HV-Script with a text editor and registers it in the HV-Script player (step S02).

When the user then instructs the device to start interpreting the HV-Script, the process shown in Fig. 10 is executed.

The following description assumes that the registered HV-Script contains an utterance character string to be speech-synthesized and that note words are also written within that string.

In response to the start instruction from the user, the HV-Script player 1 begins interpreting the character string written in the HV-Script. The player interprets the script sequentially and determines whether it contains the note word mode designation symbol "Z1" (step S11).

When the note word mode designation symbol "Z1" is detected, the player further determines whether the note word mode cancel symbol "Z0" has been reached (step S12).

If the result of step S12 is "No", that is, "Z1" has been detected but the cancel symbol "Z0" has not, the HV-Script player 1 interprets the characters that follow, up to the next occurrence of "Z0" in the sequentially interpreted string, as note words and outputs them to the note word converter 5 (step S13). If "Z1" has been detected but no "Z0" appears anywhere in the rest of the string, all characters after "Z1" up to the last character are interpreted as note words and output to the note word converter 5.

On receiving note word data, the note word converter 5 refers to the default and user-defined note word lists stored in the note word list memory 6 and converts each note word into the corresponding note information. The converter also interprets the timing information written alongside the note words, manages timing accordingly, and outputs the required note information to the sound source 7 when the specified time is reached (step S14).

The sound source 7, having received the note information from the note word converter 5, generates a musical tone signal from it and outputs the signal through the adder 8 to the speaker 9 (step S15). Musical tones are thus reproduced from the speaker 9.

If, on the other hand, step S11 determines that the HV-Script contains no note word mode designation symbol "Z1", or step S12 determines that the cancel symbol "Z0" has been detected, the characters from the character currently being interpreted up to the next "Z1" are recognized as an utterance character string and output to the HV driver 2 (step S16).

The HV driver 2, having received the utterance character string, refers to the synthesis dictionary stored in memory 3 and converts the string into a formant frame sequence. When the string contains prosodic symbols, the driver generates a formant frame sequence with the corresponding changes applied and outputs it to the HV sound source 4 (step S17).

The HV sound source 4 performs speech synthesis based on the formant frame sequence supplied by the HV driver 2, generates a speech signal, and outputs it through the adder 8 to the speaker 9 (step S18). The speech-synthesized utterance string is thus reproduced from the speaker 9.

Thereafter, according to the determination in step S19, the HV-Script player 1 repeats steps S11-S19 until the last character of the HV-Script is detected, and ends the process of Fig. 10 when the last character is detected.
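The segmentation performed in steps S11-S16 can be sketched as a split at the "Z1" and "Z0" markers, followed by dispatching each segment to a speech handler or a note handler. This is a minimal illustration with hypothetical function names; it ignores prosodic symbols and timing, and, following step S13, treats everything after an unmatched "Z1" as note words.

```python
def split_hv_script(script):
    """Split an HV-Script into ('speech', text) and ('note', text) segments."""
    segments = []
    mode, pos = "speech", 0
    while pos < len(script):
        marker = "Z1" if mode == "speech" else "Z0"
        end = script.find(marker, pos)
        if end == -1:
            end = len(script)  # no further marker: the current mode runs to the end
        if end > pos:
            segments.append((mode, script[pos:end]))
        pos = end + len(marker)
        mode = "note" if mode == "speech" else "speech"
    return segments

def run_hv_script(script, speak, play_notes):
    """Send speech segments to `speak` (HV driver role), note segments to `play_notes`."""
    for mode, text in split_hv_script(script):
        (play_notes if mode == "note" else speak)(text)
```

For Example 1 written without spaces, the split yields a speech segment, the note segment "X400どんぱん...", and a final speech segment.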

The contents of the flowcharts of Figs. 9 and 10 are examples only; the present invention is not limited to these processes.

In Fig. 11, reference numeral 21 denotes a CPU (Central Processing Unit) that controls the internal circuits and functional blocks of the mobile telephone. Reference numeral 22 denotes an antenna for transmitting and receiving data to and from the outside. Reference numeral 23 denotes a communication section that modulates transmit data and sends it via the antenna 22, and demodulates data received via the antenna 22. Reference numeral 24 denotes a speech processing section. During a call between the mobile telephone and an external telephone, it converts the other party's speech data output by the communication section 23 into a speech signal and outputs it to the earpiece (not shown), and it converts the speech signal picked up by a microphone (not shown) into speech data and outputs it to the communication section 23.

Reference numeral 25 denotes a sound source having the same functions as the HV sound source 4 and the sound source 7 of Fig. 1. Reference numeral 26 denotes a speaker that sounds speech and musical tones. Reference numeral 27 denotes an operation section that accepts user operations. Reference numeral 28 denotes a RAM (Random-Access Memory) that stores text data such as the HV-Scripts described above and user-defined note word lists. Reference numeral 29 denotes a ROM (Read-Only Memory) that stores the programs executed by the CPU 21, the synthesis dictionary, the default note word list, and so on. Reference numeral 30 denotes a display section, such as an LCD, that shows the user's operations, the state of the mobile telephone, and so on. Reference numeral 31 denotes a vibrator that vibrates on an incoming call in response to an instruction from the CPU 21. These circuits and functional blocks are interconnected by a bus B.

The mobile telephone can generate waveform data from actual speech: speech picked up by the microphone is converted into waveform data by the speech processing section 24, and the waveform data is stored in the RAM 28. In addition, when music phrase data is downloaded from a Web server through the communication section 23, that data is stored in the RAM 28.

Following the programs stored in the ROM 29, the CPU 21 performs the same operations as the HV-Script player 1, the HV driver 2, and the note word transducer 5 shown in Figure 1. That is, the CPU 21 reads an HV-Script from the RAM 28 and interprets its contents. A portion of the HV-Script enclosed by the prescribed special control characters is a pronunciation character string to be speech-synthesized, so the CPU 21 converts that string into a formant frame sequence by referring to the synthesis dictionary stored in the ROM 29, and outputs the sequence to the sound source 25.
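The conversion of a pronunciation string into a formant frame sequence can be pictured as a per-character dictionary lookup whose results are concatenated. The sketch below is illustrative only: `FormantFrame`, `SYNTH_DICT`, and `to_formant_frames` are hypothetical names, each Latin letter stands in for a pronunciation unit, and the frame values are rough textbook vowel formants, not data from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormantFrame:
    """Hypothetical frame layout: formant centre frequencies (Hz) and duration (ms)."""
    formants_hz: tuple[float, ...]
    duration_ms: int


# Hypothetical toy synthesis dictionary; a real one would be read from ROM 29.
SYNTH_DICT: dict[str, list[FormantFrame]] = {
    "a": [FormantFrame((800.0, 1200.0, 2500.0), 100)],
    "i": [FormantFrame((300.0, 2300.0, 3000.0), 100)],
}


def to_formant_frames(text: str, dictionary: dict[str, list[FormantFrame]]) -> list[FormantFrame]:
    """Look up each character and concatenate its frames into one sequence."""
    frames: list[FormantFrame] = []
    for ch in text:
        if ch not in dictionary:
            raise KeyError(f"no synthesis-dictionary entry for {ch!r}")
        frames.extend(dictionary[ch])
    return frames
```

For example, `to_formant_frames("ai", SYNTH_DICT)` yields the frame for "a" followed by the frame for "i", which would then be handed to the sound source.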

On the other hand, a portion of the HV-Script enclosed between the note word mode designation symbol "Z1" and the note word mode release symbol "Z0" consists of note words used for musical tones. The CPU 21 therefore converts these note words into note information by referring to the default note word list stored in the ROM 29 and the user-defined note word list stored in the RAM 28, and outputs the note information to the sound source 25.
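A minimal sketch of this step follows, assuming that note words inside a "Z1"…"Z0" span are matched greedily (longest match first), that user-defined entries override default entries with the same spelling, and that unmatched characters are skipped. None of these rules is stated in the text. `NoteInfo`, `DEFAULT_LIST`, `USER_LIST`, and the word and note values are hypothetical.

```python
from typing import NamedTuple


class NoteInfo(NamedTuple):
    """Hypothetical note information: note number and duration in ticks."""
    pitch: int
    duration: int


# Hypothetical note word lists (default list in ROM, user list in RAM).
DEFAULT_LIST = {"don": NoteInfo(36, 480), "tan": NoteInfo(38, 240)}
USER_LIST = {"tan": NoteInfo(40, 240)}


def note_word_spans(script: str) -> list[str]:
    """Return the text between each 'Z1' and the next 'Z0'."""
    spans, pos = [], 0
    while (start := script.find("Z1", pos)) != -1:
        end = script.find("Z0", start + 2)
        if end == -1:
            break  # unterminated note word mode: ignore the rest (assumption)
        spans.append(script[start + 2:end])
        pos = end + 2
    return spans


def convert_note_words(span: str, default: dict, user: dict) -> list[NoteInfo]:
    """Convert a note word span to note information by longest-match lookup."""
    table = {**default, **user}  # assumption: user entries override defaults
    longest = max(map(len, table), default=0)
    notes, i = [], 0
    while i < len(span):
        for n in range(min(longest, len(span) - i), 0, -1):
            word = span[i:i + n]
            if word in table:
                notes.append(table[word])
                i += n
                break
        else:
            i += 1  # no note word starts here: skip one character (assumption)
    return notes
```

Greedy longest matching is one simple way to split an undelimited run of note words; a real implementation might instead require explicit separators.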

When the CPU 21 supplies a formant frame sequence, the sound source 25 generates a speech signal from it and outputs the signal to the speaker 26. When the CPU 21 supplies note information, the sound source 25 generates a musical tone signal from that note information and outputs it to the speaker 26. The speaker 26 sounds voice or musical tones according to the speech signal or the musical tone signal.
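The two input paths of the sound source can be sketched as a type-based dispatch. This is a crude stand-in, not formant synthesis and not the patent's sound source: note information becomes a single sine tone, and a formant frame becomes an average of sinusoids at its formant frequencies, just to show that each input type selects a different signal path. `NoteEvent`, `FormantSegment`, `render`, and `SAMPLE_RATE` are hypothetical.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 8000  # hypothetical output sample rate in Hz


@dataclass(frozen=True)
class NoteEvent:
    """Hypothetical note information: MIDI-style note number plus duration."""
    midi_note: int
    duration_s: float


@dataclass(frozen=True)
class FormantSegment:
    """Hypothetical formant frame: formant centre frequencies plus duration."""
    formants_hz: tuple[float, ...]
    duration_s: float


def midi_to_hz(note: int) -> float:
    # Equal temperament with A4 (note 69) = 440 Hz.
    return 440.0 * 2 ** ((note - 69) / 12)


def render(event) -> list[float]:
    """Crude stand-in for sound source 25: choose the signal path by input type."""
    if isinstance(event, NoteEvent):
        freqs = [midi_to_hz(event.midi_note)]  # musical tone path
    elif isinstance(event, FormantSegment):
        freqs = list(event.formants_hz)  # speech path (very rough placeholder)
    else:
        raise TypeError(f"unsupported event: {event!r}")
    n = round(event.duration_s * SAMPLE_RATE)
    if not freqs:
        return [0.0] * n
    # Average of unit sinusoids, so every sample stays within [-1, 1].
    return [
        sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs) / len(freqs)
        for t in range(n)
    ]
```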

By operating the operation unit 27 to start text editing software, the user can create an HV-Script while checking what is shown on the display unit 30. The created HV-Script can also be saved in the RAM 28.

A created HV-Script can also be used as an incoming-call melody (ringtone). The operation in this case is as follows.

Suppose that setting information specifying that an HV-Script is to be used for incoming calls has been stored in the RAM 28 in advance. When the communication unit 23 receives call information (call setup information) sent from another mobile phone or the like via the antenna 22, it notifies the CPU 21 of the incoming call. On receiving this notification, the CPU 21 reads the setting information from the RAM 28, reads the HV-Script indicated by that setting information from the RAM 28, and begins interpreting it. From then on, as described above, voice or musical tones are sounded from the speaker 26 according to the contents of the HV-Script.
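The incoming-call sequence reduces to two reads from memory followed by interpretation. In the minimal sketch below, RAM is modelled as a plain dictionary, and the keys `"ringtone_setting"` and `"hv_scripts"`, the setting fields, and the `interpret` callback are all hypothetical stand-ins for the stored setting information, the stored HV-Scripts, and the interpreter.

```python
from typing import Callable


def on_incoming_call(ram: dict, interpret: Callable[[str], None]) -> bool:
    """Sketch of CPU 21's response to an incoming-call notification."""
    setting = ram.get("ringtone_setting")  # hypothetical key for the setting information
    if not setting or setting.get("type") != "hv_script":
        return False  # no HV-Script ringtone configured; handled elsewhere (assumption)
    script = ram["hv_scripts"][setting["script_id"]]  # HV-Script named by the setting
    interpret(script)  # from here on, playback proceeds as for any HV-Script
    return True
```

For example, with `ram = {"ringtone_setting": {"type": "hv_script", "script_id": "song1"}, "hv_scripts": {"song1": "Z1donZ0"}}`, calling `on_incoming_call(ram, print)` prints the stored script and returns `True`.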

The user can also create an e-mail whose text data includes an HV-Script and send it to an external terminal.

That is, as in the example sentence shown in Figure 2, an HV-Script may be written in the body or subject of an e-mail, in a portion enclosed by the special control characters (S) indicated by symbol (1). Alternatively, the HV-Script may be written in a prescribed attachment (for example, one identifiable as containing an HV-Script by a prescribed file extension) that is attached to the e-mail when it is sent. The CPU 21 may then interpret the HV-Script contained in the e-mail body or the attachment and, when the user performs a prescribed operation, issue a reproduction instruction to the speech processing unit 24 according to the contents of the HV-Script. When the HV-Script is mixed into other character strings, as in Figure 2, the CPU 21 skips every character outside the portions enclosed by the special control characters and does not treat those characters as objects of speech synthesis or musical tone reproduction.
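Extracting the HV-Script from an e-mail can be sketched as follows. The actual special control character and file extension are not given here, so `HV_MARK` (a single character used both to open and to close a portion) and `HV_EXTENSION` are hypothetical placeholders. Text outside paired marks, including text after an unpaired final mark, is skipped.

```python
HV_MARK = "\x1f"  # hypothetical stand-in for the special control character (S)
HV_EXTENSION = ".hvs"  # hypothetical extension identifying HV-Script attachments


def hv_segments(text: str, mark: str = HV_MARK) -> list[str]:
    """Keep only the text between paired marks; everything outside is skipped."""
    parts = text.split(mark)
    # Odd-indexed parts lie between an opening and a closing mark. The last part
    # is always outside a pair (it follows the final mark or there is no mark).
    return [parts[i] for i in range(1, len(parts) - 1, 2)]


def scripts_from_email(subject: str, body: str, attachments: dict[str, str]) -> list[str]:
    """Collect HV-Script text from the subject, body and matching attachments."""
    scripts = hv_segments(subject) + hv_segments(body)
    for name, content in attachments.items():
        if name.lower().endswith(HV_EXTENSION):
            scripts.append(content)  # whole attachment treated as HV-Script (assumption)
    return scripts
```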

Furthermore, the functions of the HV-Script player 1, the HV driver 2, the waveform reproduction player (HV sound source) 4, and the phrase reproduction player (sound source) 7 need not all be implemented by the CPU 21; any of them may instead be implemented in the sound source 25. Nor is the scope of application of the present invention limited to mobile phones: the functions of the musical tone and voice reproduction device of this embodiment may also be incorporated into other portable terminals such as a PHS (Personal Handyphone System, a registered trademark in Japan) or a PDA (Personal Digital Assistant).

Musical tones and voice can also be reproduced from an HV-Script by loading programs that implement the functions of the HV-Script player 1, the HV driver 2, and the note word transducer 5 shown in Figure 1 into a computer system capable of speech synthesis and musical tone reproduction, and executing them there. Here, the term "computer system" covers not only the computer hardware but also its peripheral devices and software such as an OS (Operating System).

The program may also be transmitted from a computer system that holds it in a storage device or the like to another computer system, either via a prescribed transmission medium (such as a network system) or by transmission waves within that medium. The "transmission medium" that carries the program means a medium capable of transmitting information, such as a communication network like the Internet or a communication line such as a telephone line.

The program need not implement all of the functions described above; it may implement only some of them. The functions may also be realized in combination with a program already recorded in the computer system, that is, by means of a differential file (differential program).

Although embodiments of the present invention and suitable examples have been described in detail with reference to the drawings, the specific configuration and operation of the present invention are not limited to these embodiments, and configurations that do not depart from the gist of the present invention are also included within its scope.

Claims

1. A musical tone reproduction device, comprising:

a first storage unit that stores text data containing text that expresses musical instrument sounds onomatopoeically;

a second storage unit that stores a table associating text that expresses musical instrument sounds onomatopoeically with note information on the musical instrument sounds that the text represents; and

a musical tone signal generation unit that generates a musical tone signal representing a musical instrument sound based on the note information corresponding to the text, the note information being obtained by referring to the table based on the text contained in the text data.

2. The musical tone reproduction device according to claim 1, wherein

the note information includes information representing pitch or UL, and

the musical tone signal generation unit generates the musical tone signal according to the information, included in the note information, that represents the pitch or UL.

3. A musical tone and voice reproduction device, comprising:

a first storage unit that stores text data containing a character string composed of characters to be used for speech synthesis and text that expresses musical instrument sounds onomatopoeically;

a third storage unit that stores a synthesis dictionary associating characters with formant frame data;

a conversion unit that converts the text into note information by referring to the table based on the text contained in the text data;

a sound source that generates a musical tone signal representing a musical instrument sound based on the converted note information; and

a voice reproduction unit that reproduces voice based on the formant frame data corresponding to each character constituting the character string, the formant frame data being obtained by referring to the synthesis dictionary based on the character string.

4. The musical tone and voice reproduction device according to claim 3, wherein

the sound source generates the musical tone signal according to the information, included in the note information, that represents the pitch or UL.

5. A musical tone reproduction method, comprising:

a step of storing text data containing text that expresses musical instrument sounds onomatopoeically;

a step of storing a table associating text that expresses musical instrument sounds onomatopoeically with note information on the musical instrument sounds that the text represents; and

a step of generating a musical tone signal representing a musical instrument sound based on the note information corresponding to the text data, the note information being obtained by referring to the table based on the text contained in the text data.

6. A musical tone and voice reproduction method, comprising:

a step of storing text data containing a character string composed of characters to be used for speech synthesis and text that expresses musical instrument sounds onomatopoeically;

a step of storing a synthesis dictionary associating characters with formant frame data;

a step of converting the text into note information by referring to the table based on the text contained in the text data;

a step of generating a musical tone signal representing a musical instrument sound based on the converted note information; and

a step of reproducing voice based on the formant frame data corresponding to each character constituting the character string, the formant frame data being obtained by referring to the synthesis dictionary based on the character string.