CN112863530A - Method and device for generating sound works - Google Patents
- Publication number
- CN112863530A (application CN202110018240.4A)
- Authority
- CN
- China
- Prior art keywords
- sound
- gain
- segments
- segment
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
The invention discloses a method and a device for generating sound works. The method comprises: receiving a plurality of recorded sub-segments; configuring sound effects for the plurality of recording sub-segments respectively according to a received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized; and splicing the plurality of sound segments to be synthesized to generate a target sound work. This makes it convenient for users to create flexibly from recorded segments and provides diversified choices of sound segments and sound effects, generating sound works with more varied effects and better sound quality.
Description
Technical Field
The invention relates to the technical field of audio processing, in particular to a method and a device for generating a sound work.
Background
Sound is the most natural and convenient way to communicate from person to person.
With the continuous development of social networking on the internet, the number of voice-based social products has increased, as have the ways of playing with sound. For example, many audio APPs provide this functionality: a user records his or her own voice, applies certain preset effects, auditions the different sound effects, and finally selects a satisfactory one to generate a sound clip.
However, when recording and generating sound segments in existing sound APPs, a user can only select one effect for the final sound segment. The selection scheme for sound segments and sound effects is thus generally limited and cannot meet users' needs for flexible creation of sound works.
Disclosure of Invention
The invention provides a method and a device for generating sound works, solving the technical problem in the prior art that the selection of sound segments and sound effects is limited and cannot meet users' needs for flexible creation of sound works.
The invention provides a method for generating a sound work, which comprises the following steps:
receiving a plurality of recorded sub-segments;
respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
and splicing the plurality of sound segments to be synthesized to generate the target sound work.
Optionally, the sound effect includes a sound changing effect and a scene sound effect, and the step of configuring the sound effect for a plurality of the recording sub-segments respectively according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized includes:
responding to an input sound effect configuration instruction, and selecting the sound variation effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
and respectively configuring the plurality of recording sub-segments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound segments to be synthesized.
Optionally, the step of generating the target sound work by splicing the plurality of sound segments to be synthesized includes:
splicing the sound segments to be synthesized and carrying out short-time fade-in and fade-out processing to generate intermediate sound segments;
and carrying out volume harmony processing on the intermediate sound fragment to generate the target sound work.
Optionally, the step of performing volume coordination processing on the intermediate sound segment to generate the target sound work includes:
dividing the middle sound segment into a plurality of frames of sound segments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound segments to be processed;
determining a log-domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
and executing double-gain smoothing operation on each frame of the sound segment to be processed based on the logarithmic domain amplitude gain to generate a target sound work.
Optionally, the dual-gain smoothing operation includes a first smoothing process and a second smoothing process, and the step of performing the dual-gain smoothing operation on each frame of the to-be-processed sound segment based on the log-domain amplitude gain to generate the target sound piece includes:
calculating a first smoothing gain value corresponding to the jth frame of the sound segment to be processed according to the log domain amplitude gain corresponding to the jth frame of the sound segment to be processed and the log domain amplitude gain corresponding to the (j-3)th frame of the sound segment to be processed; wherein j is not less than 4, and j is an integer;
executing a first gain smoothing operation on the to-be-processed sound segment of the jth frame by using the first smoothing gain value, so that the first smoothing processing process of the jth frame of the to-be-processed sound segment is completed;
converting the log domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from the j frame of the smoothed sound segment;
calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the jth frame of the smooth sound segment;
executing second gain smoothing operation on each sampling point by adopting the sampling point gain to finish the second smoothing processing process of the jth frame of the sound segment to be processed;
and when the second smoothing processing process of each frame of the sound segment to be processed is finished, obtaining the target sound work.
The invention also provides a device for generating the sound works, which comprises:
the recording sub-segment receiving module is used for receiving a plurality of recording sub-segments;
the sound effect configuration module is used for respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
and the splicing processing module is used for splicing the plurality of sound segments to be synthesized to generate the target sound works.
Optionally, the sound effect includes a sound changing effect and a scene sound effect, and the sound effect configuration module includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound variation effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
and the sound clip generation submodule to be synthesized is used for respectively configuring the plurality of recording sub-clips by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound clips to be synthesized.
Optionally, the splicing processing module includes:
the splicing submodule is used for splicing the sound segments to be synthesized and carrying out short-time fade-in and fade-out processing to generate intermediate sound segments;
and the volume coordination processing submodule is used for carrying out volume coordination processing on the middle sound fragment to generate the target sound work.
Optionally, the volume harmony processing sub-module includes:
the segment dividing unit is used for dividing the middle sound segment into a plurality of frames of sound segments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound segments to be processed;
a log-domain amplitude gain determining unit, configured to determine, according to a preset dynamic range control curve, a log-domain amplitude gain corresponding to each input frame amplitude;
and the dual-gain smoothing operation unit is used for executing dual-gain smoothing operation on each frame of the sound segment to be processed based on the logarithmic domain amplitude gain to generate a target sound work.
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
a first smoothing gain value determining unit, configured to calculate a first smoothing gain value corresponding to a jth frame of the to-be-processed sound segment according to the log domain amplitude gain corresponding to the jth frame of the to-be-processed sound segment and the log domain amplitude gain corresponding to a j-3 th frame of the to-be-processed sound segment; wherein j is not less than 4, and j is an integer;
a first gain smoothing operation executing unit, configured to execute a first gain smoothing operation on the to-be-processed sound segment of the jth frame by using the first smoothing gain value, so that the first smoothing processing procedure of the jth frame of the to-be-processed sound segment is completed;
a gain conversion unit for converting the logarithmic domain amplitude gain into the linear domain amplitude gain;
a sampling point determination unit for determining a plurality of sampling points from a jth frame of the smoothed sound piece;
the sampling point gain determining unit is used for calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the jth frame of the smooth sound segment;
a second gain smoothing operation executing unit, configured to execute a second gain smoothing operation on each sampling point by using the sampling point gain, so that the second smoothing processing procedure of the jth frame of the to-be-processed sound clip is completed;
and the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound segment to be processed is finished.
According to the technical scheme, the invention has the following advantages:
the method comprises the steps of receiving a plurality of recording sub-segments input by a user, configuring corresponding sound effects for each recording sub-segment according to a sound effect configuration instruction input by the user for each recording sub-segment to obtain a plurality of sound segments to be synthesized, and finally splicing the sound segments to be synthesized to generate the target sound work. The technical problem that the selection scheme of sound segments and sound effects is single and the flexible creation requirements of users on sound works cannot be met in the prior art is solved. The user of being convenient for is to the nimble creation of recording segment, realizes the diversified selection to selection schemes such as sound segment and audio, and then generates the sound works that the audio is more diversified, and tone quality is better.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flowchart illustrating steps of a method for generating a sound work according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for generating a sound work according to a second embodiment of the present invention;
fig. 3 is a schematic diagram of a dynamic range control curve according to a second embodiment of the present invention;
fig. 4 is a block diagram of a sound work generation apparatus according to a third embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method and a device for generating a sound work, which are used for solving the technical problem in the prior art that the selection of sound segments and sound effects is limited and cannot meet users' needs for flexible creation of sound works. A user can single-handedly record a multi-person dialogue and, by selecting different sound processing methods, give each sub-segment a different sound effect, finally synthesizing a sound work of a multi-person conversation. The whole process resembles dubbing a film or television work and then editing and splicing the takes into a complete work in post-production. Traditionally that requires several people cooperating to dub, clip and synthesize; with the present scheme, because sound effects such as different voice changes can be set for each sound sub-segment, a single user can generate a sound dialogue work automatically through simple operations.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for generating a sound work according to an embodiment of the present invention.
The invention provides a method for generating a sound work, which comprises the following steps:
101, receiving a plurality of recorded sub-segments;
In the embodiment of the invention, the plurality of recording sub-segments may be recorded by a single user or by multiple users.
It should be noted that each recording sub-segment may have the same recording time duration or different recording time durations, which is not limited in the embodiment of the present invention.
102, respectively configuring sound effects for the plurality of recording sub-segments according to a received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
after the plurality of recording sub-segments are obtained, corresponding sound effects can be configured for the plurality of recording sub-segments respectively based on a sound effect configuration instruction input by a user, so that each recording sub-segment can be set to meet the required scene sound effect and character sound changing effect, and a plurality of sound segments to be synthesized can be obtained.
And 103, splicing the sound segments to be synthesized to generate a target sound work.
After the plurality of sound segments to be synthesized are obtained, differences in recording conditions or sound effect settings between segments may leave defects such as uneven volume, inconsistent sound effects and irregular endpoints. The segments therefore need further splicing processing, with smoothing, post-processing and the like applied at the joints, to generate the target sound work.
In the embodiment of the invention, a plurality of recording sub-segments input by a user are received, a corresponding sound effect is configured for each recording sub-segment according to the sound effect configuration instruction the user inputs for it to obtain a plurality of sound segments to be synthesized, and finally the sound segments to be synthesized are spliced to generate the target sound work. This solves the technical problem in the prior art that the selection of sound segments and sound effects is limited and cannot meet users' needs for flexible creation of sound works. Users can create flexibly from recorded segments, with diversified choices of sound segments and sound effects, generating sound works with more varied effects and better sound quality.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for generating a sound work according to a second embodiment of the present invention.
The invention provides a method for generating a sound work, which comprises the following steps:
in the embodiment of the present invention, the specific implementation process of step 201 is similar to that of step 101, and is not described herein again.
optionally, the sound effects include a sound changing effect and a scene sound effect, and step 202 may include the following sub-steps:
responding to an input sound effect configuration instruction, and selecting the sound variation effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
and respectively configuring the plurality of recording sub-segments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound segments to be synthesized.
In the embodiment of the invention, after the plurality of recording sub-segments are obtained, the sound changing effect and the scene sound effect corresponding to the sound effect configuration instruction can be selected from a preset sound effect library in response to the sound effect configuration instruction input by a user, the voice of a person in the recording sub-segments is changed by the corresponding sound changing effect, and then the corresponding scene sound effect is configured for the background sound of the recording sub-segments, so that the plurality of corresponding sound segments to be synthesized are obtained.
The sound effect library may store various voice-changing effects, such as loli voice, girl voice, young-boy voice, uncle voice, horror voice, monster voice, robot voice, heavy mechanical voice, electronic voice and the like, and various scene sound effects, such as birdsong, ocean waves, light music, cheerful atmosphere sound, horror atmosphere sound and the like, which is not limited by the embodiment of the present invention.
In the embodiment of the present invention, the technical feature of step 103 in the first embodiment that "the target sound composition is generated by performing the splicing process by using the plurality of sound segments to be synthesized" may be replaced by the following steps 203 and 204:
since the process of recording sound is actively controlled by the user, the start and stop are not usually gradual processes, and if adjacent sound segments are directly spliced, a noise such as a "click" may be generated at the spliced portion.
In order to avoid the occurrence of multiple splicing noises in the finally generated sound work, the embodiment of the invention can adopt short-time fade-in and fade-out processing when a plurality of sound segments to be synthesized are spliced, so that the sound is gradually increased at the beginning of each sound segment and gradually decreased at the end of each sound segment, and adjacent sound segments can have smooth transition when being spliced without generating noises. And then splicing the plurality of sound segments to be synthesized according to the sequence to generate an intermediate sound segment.
Optionally, in order not to affect the main body of the sound, the duration of the fade-in and fade-out processing is controlled within 100 ms, or may be set by a technician according to scene requirements, which is not limited by the embodiment of the present invention.
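The splicing with short fade-in/fade-out described above can be sketched as follows. This is a minimal illustration only, assuming mono floating-point samples, a 16 kHz sample rate and a 100 ms linear ramp; the function and parameter names are illustrative, not taken from the patent:

```python
import numpy as np

def splice_with_fades(segments, sample_rate=16000, fade_ms=100):
    """Splice sound segments in order, applying a short linear fade-in at
    the start and fade-out at the end of each segment so adjacent segments
    transition smoothly without click noise at the joints."""
    n_fade = int(sample_rate * fade_ms / 1000)
    processed = []
    for seg in segments:
        seg = np.asarray(seg, dtype=np.float64).copy()
        n = min(n_fade, len(seg) // 2)  # keep the two ramps from overlapping
        ramp = np.linspace(0.0, 1.0, n)
        seg[:n] *= ramp          # volume gradually increases at the start
        seg[-n:] *= ramp[::-1]   # volume gradually decreases at the end
        processed.append(seg)
    return np.concatenate(processed)
```

Because each segment both fades out and the next fades in, the joint samples approach zero from both sides, which is what suppresses the "click".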
And 204, carrying out volume harmony processing on the intermediate sound segments to generate the target sound works.
After splicing together the sound segments, it is also necessary to perform a volume coordination process, i.e. dynamic range control of the sound, on the whole sound work. To achieve the desired results and to increase processing efficiency, step 204 may include the following substeps S1-S3:
s1, dividing the middle sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragments to be processed;
further, after the intermediate sound segment is obtained, the intermediate sound segment is divided into a plurality of frames of sound segments to be processed according to a preset time length, and simultaneously, the input frame amplitude of each frame of sound segment to be processed is calculated.
In a specific implementation, taking a preset time length of 10 ms as an example, the input frame amplitude A can be calculated in two ways, peak and rms, expressed as formula (1) and formula (2) respectively:
A_peak = max{ |x_i| , i = 1, 2, ..., N } (1)
A_rms = sqrt( (1/N) * Σ_{i=1..N} x_i² ) (2)
where N is the number of sampling points in each frame of data and |x_i| is the absolute value of the amplitude of the ith sampling point. A_peak is the maximum absolute amplitude of each frame of data, and A_rms is the root mean square of each frame of data.
According to the psychoacoustic principle of the human ear, the ear's response to signal loudness is approximately logarithmic in signal amplitude, not linear. The input frame amplitude A in the linear domain can therefore be converted into a log-domain input frame amplitude A_dB, and the gain value to be applied to the sound segment to be processed is calculated in the logarithmic domain.
Taking 16-bit sample data as an example, the conversion formula is shown in formula (3):
A_dB = 20 * log10( A / 32768 ) (3)
where 32768 = 2^15 is the maximum absolute value representable with a 16-bit width, the remaining 1 bit being the sign bit.
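The amplitude calculations and the dB conversion can be sketched as follows; the function names are illustrative, and the small floor on the amplitude (to avoid log(0) on silent frames) is my own addition, not from the patent:

```python
import math

def frame_amplitude(frame, mode="peak"):
    """Input frame amplitude A: peak (formula (1)) or rms (formula (2))."""
    if mode == "peak":
        return max(abs(x) for x in frame)
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def to_db(amplitude, full_scale=32768.0):
    """Linear amplitude -> log-domain amplitude A_dB for 16-bit samples,
    per formula (3); a tiny floor avoids log(0) on silent frames."""
    return 20.0 * math.log10(max(amplitude, 1e-12) / full_scale)
```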
S2, determining a log domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
dynamic range is the ratio of the maximum and minimum values of a variable signal (e.g., sound or light). It can also be expressed in base 10 logarithms (decibels) or base 2 logarithms.
In a specific implementation, a corresponding dynamic range control curve may be configured based on user input, as shown in fig. 3, where the dynamic range control curve includes an input amplitude a and an output amplitude B, the unit is dB, a straight line portion is a dynamic range control curve of an unadjusted input amplitude and output amplitude, and a curve portion is a dynamic range control curve of an adjusted input amplitude and output amplitude according to an embodiment of the present invention.
Each corresponding output amplitude B can be obtained from each input amplitude A, from which the required log-domain amplitude gain Δ (the amount to increase or decrease) is calculated as formula (4):
Δ = B_dB − A_dB (4)
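The patent does not give the exact shape of the curve in fig. 3, but a typical hard-knee compression curve illustrates how the log-domain gain of formula (4) is derived from the curve; the threshold and ratio values below are assumptions for illustration:

```python
def drc_gain_db(a_db, threshold_db=-20.0, ratio=4.0):
    """Map an input amplitude A_dB to an output B_dB along a simple
    dynamic-range-control curve, then return the log-domain gain
    delta = B_dB - A_dB (formula (4)).
    Below the threshold the curve is the straight line B = A (the
    unadjusted portion); above it, the excess is compressed by ratio."""
    if a_db <= threshold_db:
        b_db = a_db
    else:
        b_db = threshold_db + (a_db - threshold_db) / ratio
    return b_db - a_db
```

Quiet frames pass through with zero gain, while loud frames receive a negative gain that pulls them toward the threshold, which is what evens out the volume across spliced segments.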
S3, performing a dual-gain smoothing operation on each frame of the sound segment to be processed based on the log-domain amplitude gain to generate the target sound work.
Further, the dual-gain smoothing operation includes a first smoothing process and a second smoothing process, and step S3 may include the following sub-steps:
calculating a first smoothing gain value corresponding to the jth frame of the sound segment to be processed according to the log domain amplitude gain corresponding to the jth frame of the sound segment to be processed and the log domain amplitude gain corresponding to the (j-3)th frame of the sound segment to be processed; wherein j is not less than 4, and j is an integer;
executing a first gain smoothing operation on the to-be-processed sound segment of the jth frame by using the first smoothing gain value, so that the first smoothing processing process of the jth frame of the to-be-processed sound segment is completed;
converting the log domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from the j frame of the smoothed sound segment;
calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the jth frame of the smooth sound segment;
executing second gain smoothing operation on each sampling point by adopting the sampling point gain to finish the second smoothing processing process of the jth frame of the sound segment to be processed;
and when the second smoothing processing process of each frame of the sound segment to be processed is finished, obtaining the target sound work.
In one example of the invention, in order for the adjusted signal to transition smoothly between frames without producing crackling noise, a dual-gain smoothing operation is required between adjacent frames, comprising a first smoothing process and a second smoothing process.
A first smoothing process: after the log-domain gain is calculated, one pass of gain smoothing is performed. The smoothed gain of the current jth frame is the weighted moving average of the current jth frame and the previous 3 frames, calculated as formula (5):
Δ'_j = Σ_{k=j−3..j} θ_k * Δ_k (5)
where θ_k represents the weighting factor of the kth frame and Δ_k represents the log-domain gain value of the kth frame.
In a specific implementation, the number of frames used for gain smoothing may correspond to a time length of 40 to 60 ms, or may be set by a technician through tuning, which is not limited by the embodiment of the present invention.
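The first smoothing pass (a weighted moving average over the current frame and the previous three frames) can be sketched as follows; the specific weight values are illustrative assumptions, not taken from the patent:

```python
def smooth_gains_db(deltas, weights=(0.1, 0.2, 0.3, 0.4)):
    """First smoothing: for each frame j >= 4 (1-based, as in the patent),
    replace the log-domain gain with the weighted moving average of
    frames j-3 .. j.  The first three frames are left unsmoothed."""
    out = list(deltas[:3])
    for j in range(3, len(deltas)):
        window = deltas[j - 3:j + 1]   # frames j-3 .. j
        out.append(sum(w * d for w, d in zip(weights, window)))
    return out
```

The later weights are larger so the current frame dominates, while the history damps any abrupt gain jump between frames.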
Since the amplitude representations of the input sound segment to be processed and the output target sound work are in the linear domain, after the first smoothing process the log-domain gain value Δ needs to be converted into a linear-domain gain δ before gain processing is applied to the input signal, in the manner shown in formula (6):
δ = 10^(Δ/20) (6)
when the second smoothing process is executed, the gain difference alpha of the current j frame relative to the j-1 frame is firstly obtainedjAnd gain increase per point alphajN (N is the number of points of each frame), and finally the gain g of each sampling point of the jth frame is obtainedj,iCorresponding to the input signal at a sampling point xj,iMultiplying to obtain the target sound work yj,i。
αj=δj-δj-1 (7)
gj,i=δj-1+i*αj/N (8)
yj,i=gj,i*xj,i (9)
Where i denotes the ith sample point in each frame of the smoothed sound fragment.
Optionally, the multi-segment dynamic range control described above can be used not only in non-real-time, file-based processing scenarios but also in real-time, frame-by-frame processing scenarios. In the file-based case, further volume adjustment can be performed according to the whole audio file, so that the spliced segments sound more harmonious overall.
In the embodiment of the invention, a plurality of sound segments to be synthesized are obtained by receiving a plurality of recording sub-segments input by a user and configuring a corresponding sound effect for each recording sub-segment according to the sound effect configuration instruction the user inputs for it; the sound segments to be synthesized are then spliced to generate the target sound work. This solves the technical problem in the prior art that the choice of sound segments and sound effects is limited and cannot meet users' needs for flexible creation of sound works. Users can thus create flexibly from recorded segments, choose among diverse sound segments and sound effects, and generate sound works with more varied effects and better sound quality.
Referring to fig. 4, fig. 4 is a block diagram of a sound work generation apparatus according to a third embodiment of the present invention.
The invention provides a sound work generation device, comprising:
a recording sub-segment receiving module 401, configured to receive a plurality of recording sub-segments;
a sound effect configuration module 402, configured to configure sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction, so as to obtain a plurality of sound segments to be synthesized;
and a splicing processing module 403, configured to perform splicing processing on multiple sound segments to be synthesized, so as to generate a target sound work.
Optionally, the sound effects include a sound changing effect and a scene sound effect, and the sound effect configuration module 402 includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound variation effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
and the to-be-synthesized sound segment generation submodule is used for configuring the plurality of recording sub-segments with the sound changing effect and the scene sound effect, respectively, to obtain a plurality of sound segments to be synthesized.
Optionally, the splicing processing module 403 includes:
the splicing submodule is used for splicing the sound segments to be synthesized and performing short-time fade-in and fade-out processing to generate an intermediate sound segment;
and the volume harmonization processing submodule is used for performing volume harmonization processing on the intermediate sound segment to generate the target sound work.
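The short-time fade-in and fade-out splicing performed by the splicing submodule can be sketched as a linear crossfade over an overlap region; the fade duration and curve shape are assumptions, since the text does not fix them:

```python
def splice_with_fade(seg_a, seg_b, fade_len=100):
    """Concatenate two sample lists with a short linear fade-out/fade-in
    overlap, so the junction has no audible click. fade_len is assumed."""
    fade_len = min(fade_len, len(seg_a), len(seg_b))
    out = seg_a[:len(seg_a) - fade_len]
    for i in range(fade_len):
        w = (i + 1) / fade_len  # fade-in weight for seg_b at sample i
        out.append(seg_a[len(seg_a) - fade_len + i] * (1.0 - w) + seg_b[i] * w)
    out.extend(seg_b[fade_len:])
    return out
```

Because the two weights always sum to 1, splicing two segments of equal constant level leaves the level unchanged across the junction.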
Optionally, the volume harmonization processing submodule includes:
a segment dividing unit, configured to divide the intermediate sound segment into multiple frames of sound segments to be processed according to a preset time length, and to calculate the input frame amplitude of each frame of the sound segment to be processed;
a log-domain amplitude gain determining unit, configured to determine, according to a preset dynamic range control curve, a log-domain amplitude gain corresponding to each input frame amplitude;
and the dual-gain smoothing operation unit is used for executing dual-gain smoothing operation on each frame of the sound segment to be processed based on the logarithmic domain amplitude gain to generate a target sound work.
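As an illustration of the frame-amplitude calculation and log-domain gain lookup performed by these units: the RMS amplitude measure, the decibel scale, and the compressor-style shape of the control curve below are all assumptions; the text only requires a preset dynamic range control curve.

```python
import math

def frame_amplitude_db(frame):
    """RMS amplitude of one frame, in dB relative to full scale
    (an assumed amplitude measure)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)

def drc_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """An illustrative compressor-style control curve: above the
    threshold the output level rises at slope 1/ratio, so the
    returned log-domain gain is negative (attenuation)."""
    if level_db <= threshold_db:
        return 0.0
    return (threshold_db + (level_db - threshold_db) / ratio) - level_db
```

For example, with a −20 dB threshold and 4:1 ratio, an input frame at −12 dB receives a −6 dB gain, compressing loud frames toward the quiet ones so the spliced segments sound more uniform.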
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
a first smoothing gain value determining unit, configured to calculate a first smoothing gain value corresponding to a jth frame of the to-be-processed sound segment according to the log domain amplitude gain corresponding to the jth frame of the to-be-processed sound segment and the log domain amplitude gain corresponding to a j-3 th frame of the to-be-processed sound segment; wherein j is not less than 4, and j is an integer;
a first gain smoothing operation executing unit, configured to execute a first gain smoothing operation on the to-be-processed sound segment of the jth frame by using the first smoothing gain value, so that the first smoothing processing procedure of the jth frame of the to-be-processed sound segment is completed;
a gain conversion unit for converting the logarithmic domain amplitude gain into the linear domain amplitude gain;
a sampling point determination unit, configured to determine a plurality of sampling points from the j-th frame of the smoothed sound segment;
the sampling point gain determining unit is used for calculating the sampling point gain of each sampling point according to the linear-domain amplitude gain corresponding to the j-th frame of the smoothed sound segment;
a second gain smoothing operation executing unit, configured to execute a second gain smoothing operation on each sampling point by using the sampling point gain, so that the second smoothing processing procedure of the jth frame of the to-be-processed sound clip is completed;
and the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound segment to be processed is finished.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of generating a sound work, comprising:
receiving a plurality of recorded sub-segments;
respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
and splicing the plurality of sound segments to be synthesized to generate the target sound work.
2. The method for generating sound works according to claim 1, wherein the sound effects include a sound variation effect and a scene sound effect, and the step of configuring sound effects for the plurality of recording sub-segments respectively according to the received sound effect configuration command to obtain a plurality of sound segments to be synthesized includes:
responding to an input sound effect configuration instruction, and selecting the sound variation effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
and respectively configuring the plurality of recording sub-segments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound segments to be synthesized.
3. The method of claim 1, wherein the step of generating the target sound work by splicing the plurality of sound segments to be synthesized comprises:
splicing the plurality of sound segments to be synthesized and performing short-time fade-in and fade-out processing to generate an intermediate sound segment;
and performing volume harmonization processing on the intermediate sound segment to generate the target sound work.
4. The method of claim 3, wherein the step of performing volume harmonization processing on the intermediate sound segment to generate the target sound work comprises:
dividing the intermediate sound segment into multiple frames of sound segments to be processed according to a preset time length and calculating the input frame amplitude of each frame of the sound segment to be processed;
determining a log-domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
and performing a dual gain smoothing operation on each frame of the sound segment to be processed based on the log-domain amplitude gain to generate the target sound work.
5. The method of generating a sound work according to claim 4, wherein the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the step of generating a target sound work by performing the dual gain smoothing operation for the sound segment to be processed for each frame based on the log domain amplitude gain comprises:
calculating a first smoothing gain value corresponding to the j frame of the sound segment to be processed according to the log domain amplitude gain corresponding to the j frame of the sound segment to be processed and the log domain amplitude gain corresponding to the j-3 frame of the sound segment to be processed; wherein j is not less than 4, and j is an integer;
executing a first gain smoothing operation on the to-be-processed sound segment of the jth frame by using the first smoothing gain value, so that the first smoothing processing process of the jth frame of the to-be-processed sound segment is completed;
converting the log domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from the j-th frame of the smoothed sound segment;
calculating the sampling point gain of each sampling point according to the linear-domain amplitude gain corresponding to the j-th frame of the smoothed sound segment;
executing second gain smoothing operation on each sampling point by adopting the sampling point gain to finish the second smoothing processing process of the jth frame of the sound segment to be processed;
and when the second smoothing processing process of each frame of the sound segment to be processed is finished, obtaining the target sound work.
6. An apparatus for generating a sound work, comprising:
the recording sub-segment receiving module is used for receiving a plurality of recording sub-segments;
the sound effect configuration module is used for respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
and the splicing processing module is used for splicing the plurality of sound segments to be synthesized to generate the target sound works.
7. The apparatus for generating sound works according to claim 6, wherein the sound effects include sound variation effects and scene sound effects, and the sound effect configuration module comprises:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound variation effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
and the to-be-synthesized sound segment generation submodule is used for configuring the plurality of recording sub-segments with the sound changing effect and the scene sound effect, respectively, to obtain a plurality of sound segments to be synthesized.
8. The apparatus for generating a sound work according to claim 6, wherein the splicing processing module comprises:
the splicing submodule is used for splicing the sound segments to be synthesized and performing short-time fade-in and fade-out processing to generate an intermediate sound segment;
and the volume harmonization processing submodule is used for performing volume harmonization processing on the intermediate sound segment to generate the target sound work.
9. The apparatus of claim 8, wherein the volume harmonization processing submodule comprises:
the segment dividing unit is used for dividing the intermediate sound segment into multiple frames of sound segments to be processed according to a preset time length and calculating the input frame amplitude of each frame of the sound segment to be processed;
a log-domain amplitude gain determining unit, configured to determine, according to a preset dynamic range control curve, a log-domain amplitude gain corresponding to each input frame amplitude;
and the dual-gain smoothing operation unit is used for executing dual-gain smoothing operation on each frame of the sound segment to be processed based on the logarithmic domain amplitude gain to generate a target sound work.
10. The apparatus for generating a sound work according to claim 9, wherein the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
a first smoothing gain value determining unit, configured to calculate a first smoothing gain value corresponding to a jth frame of the to-be-processed sound segment according to the log domain amplitude gain corresponding to the jth frame of the to-be-processed sound segment and the log domain amplitude gain corresponding to a j-3 th frame of the to-be-processed sound segment; wherein j is not less than 4, and j is an integer;
a first gain smoothing operation executing unit, configured to execute a first gain smoothing operation on the to-be-processed sound segment of the jth frame by using the first smoothing gain value, so that the first smoothing processing procedure of the jth frame of the to-be-processed sound segment is completed;
a gain conversion unit for converting the logarithmic domain amplitude gain into the linear domain amplitude gain;
a sampling point determination unit, configured to determine a plurality of sampling points from the j-th frame of the smoothed sound segment;
the sampling point gain determining unit is used for calculating the sampling point gain of each sampling point according to the linear-domain amplitude gain corresponding to the j-th frame of the smoothed sound segment;
a second gain smoothing operation executing unit, configured to execute a second gain smoothing operation on each sampling point by using the sampling point gain, so that the second smoothing processing procedure of the jth frame of the to-be-processed sound clip is completed;
and the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound segment to be processed is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110018240.4A CN112863530B (en) | 2021-01-07 | 2021-01-07 | Sound work generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112863530A true CN112863530A (en) | 2021-05-28 |
CN112863530B CN112863530B (en) | 2024-08-27 |
Family
ID=76004855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110018240.4A Active CN112863530B (en) | 2021-01-07 | 2021-01-07 | Sound work generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112863530B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870896A (en) * | 2021-09-27 | 2021-12-31 | 动者科技(杭州)有限责任公司 | Motion sound false judgment method and device based on time-frequency graph and convolutional neural network |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101710488A (en) * | 2009-11-20 | 2010-05-19 | 安徽科大讯飞信息科技股份有限公司 | Method and device for voice synthesis |
CN104410748A (en) * | 2014-10-17 | 2015-03-11 | 广东小天才科技有限公司 | Method for adding background sound effect according to position of mobile terminal and mobile terminal |
CN104517605A (en) * | 2014-12-04 | 2015-04-15 | 北京云知声信息技术有限公司 | Speech segment assembly system and method for speech synthesis |
CN106060707A (en) * | 2016-05-27 | 2016-10-26 | 北京小米移动软件有限公司 | Reverberation processing method and device |
CN107154264A (en) * | 2017-05-18 | 2017-09-12 | 北京大生在线科技有限公司 | The method that online teaching wonderful is extracted |
CN107197404A (en) * | 2017-05-05 | 2017-09-22 | 广州盈可视电子科技有限公司 | A kind of audio Automatic adjustment method, device and a kind of recording and broadcasting system |
KR101800362B1 (en) * | 2016-09-08 | 2017-11-22 | 최윤하 | Music composition support apparatus based on harmonics |
WO2018077364A1 (en) * | 2016-10-28 | 2018-05-03 | Transformizer Aps | Method for generating artificial sound effects based on existing sound clips |
CN108010512A (en) * | 2017-12-05 | 2018-05-08 | 广东小天才科技有限公司 | Sound effect acquisition method and recording terminal |
CN108064406A (en) * | 2015-06-22 | 2018-05-22 | 时光机资本有限公司 | It is synchronous for the rhythm of the cross-fade of music audio frequency segment for multimedia |
CN108347529A (en) * | 2018-01-31 | 2018-07-31 | 维沃移动通信有限公司 | A kind of audio frequency playing method and mobile terminal |
CN108877753A (en) * | 2018-06-15 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
CN109346044A (en) * | 2018-11-23 | 2019-02-15 | 广州酷狗计算机科技有限公司 | Audio-frequency processing method, device and storage medium |
CN109686347A (en) * | 2018-11-30 | 2019-04-26 | 北京达佳互联信息技术有限公司 | Sound effect treatment method, sound-effect processing equipment, electronic equipment and readable medium |
CN110675848A (en) * | 2019-09-30 | 2020-01-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device and storage medium |
CN111798831A (en) * | 2020-06-16 | 2020-10-20 | 武汉理工大学 | Sound particle synthesis method and device |
CN112133277A (en) * | 2020-11-20 | 2020-12-25 | 北京猿力未来科技有限公司 | Sample generation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN112863530B (en) | 2024-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8874245B2 (en) | Effects transitions in a music and audio playback system | |
US12089021B2 (en) | Method and apparatus for listening scene construction and storage medium | |
CA2488689C (en) | Acoustical virtual reality engine and advanced techniques for enhancing delivered sound | |
JP4810541B2 (en) | Non-natural response | |
Rämö et al. | High-precision parallel graphic equalizer | |
JP2012235310A (en) | Signal processing apparatus and method, program, and data recording medium | |
CN103828232A (en) | Dynamic range control | |
US10728688B2 (en) | Adaptive audio construction | |
EP3761672B1 (en) | Using metadata to aggregate signal processing operations | |
CN112863530B (en) | Sound work generation method and device | |
US20220303710A1 (en) | Sound Field Related Rendering | |
CN113077771B (en) | Asynchronous chorus sound mixing method and device, storage medium and electronic equipment | |
JP2017525292A (en) | Apparatus and method for manipulating input audio signals | |
US20230230607A1 (en) | Automated mixing of audio description | |
CN113905307A (en) | Color slide block | |
CN113766307A (en) | Techniques for audio track analysis to support audio personalization | |
CN116994545B (en) | Dynamic original sound adjusting method and device for K song system | |
CN115835112A (en) | Projector-based adaptive sound effect debugging method, system, device and platform | |
US9979369B2 (en) | Audio peak limiting | |
AU2003251403B2 (en) | Acoustical virtual reality engine and advanced techniques for enhancing delivered sound | |
CN116627377A (en) | Audio processing method, device, electronic equipment and storage medium | |
CN118918912A (en) | Singing voice synthesizing method, singing voice synthesizing equipment and computer readable storage medium | |
JP2000276186A (en) | Device and method for voice processing and recording medium where the method is recorded |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||