WO2020119150A1 - Rhythm point recognition method, apparatus, electronic device, and storage medium - Google Patents

Rhythm point recognition method, apparatus, electronic device, and storage medium

Info

Publication number
WO2020119150A1
WO2020119150A1 PCT/CN2019/099640 CN2019099640W WO2020119150A1 WO 2020119150 A1 WO2020119150 A1 WO 2020119150A1 CN 2019099640 W CN2019099640 W CN 2019099640W WO 2020119150 A1 WO2020119150 A1 WO 2020119150A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
point
rhythm
signal
rhythm point
Prior art date
Application number
PCT/CN2019/099640
Other languages
English (en)
French (fr)
Inventor
范旭
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020119150A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • Embodiments of the present disclosure relate to the technical field of data processing, for example, to a rhythm point recognition method, device, electronic device, and storage medium.
  • the music interactive application displays interactive prompts to the user according to the rhythm of the music, and the user inputs interactive operations according to the interactive prompts, thereby realizing the function of activating the video special effects and displaying the video special effects.
  • the rhythm points in the related art are generally determined by manual labeling, which results in a high time cost for identifying the rhythm points and a long music update cycle in music interactive applications.
  • Embodiments of the present disclosure provide a rhythm point recognition method, device, electronic device, and storage medium, which can automatically and accurately identify rhythm points, and improve the efficiency of rhythm point recognition.
  • An embodiment of the present disclosure provides a rhythm point recognition method.
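  • The method includes:
  • determining candidate rhythm points in an audio signal to be identified according to the spectral characteristics of the audio signal, and obtaining the starting time corresponding to each candidate rhythm point;
  • mapping the candidate rhythm points onto the trend-fitting envelope signal of the audio signal according to their corresponding starting times, and determining target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal;
  • determining the volume information corresponding to each target rhythm point according to the beat information of the audio signal, and determining the duration corresponding to each target rhythm point according to the fluctuation-fitting envelope signal of the audio signal;
  • using the starting time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.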
  • An embodiment of the present disclosure also provides a rhythm point recognition device, which includes:
  • a candidate rhythm point determination module, configured to determine candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be identified, and to obtain the starting time corresponding to each candidate rhythm point;
  • a target rhythm point determination module, configured to map the candidate rhythm points onto the trend-fitting envelope signal of the audio signal according to their corresponding starting times, and to determine target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal;
  • a volume information and duration determination module, configured to determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and to determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal;
  • a rhythm point recognition result determination module, configured to use the starting time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
  • An embodiment of the present disclosure also provides an electronic device.
  • the electronic device includes:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the rhythm point recognition method described in the embodiments of the present disclosure.
  • An embodiment of the present disclosure also provides a computer-readable storage medium that stores a computer program, and when the program is executed by a processor, a rhythm point recognition method as described in an embodiment of the present disclosure is implemented.
  • FIG. 1a is a flowchart of a rhythm point recognition method provided in Embodiment 1 of the present disclosure
  • FIG. 1b is a schematic diagram of an audio signal provided by Embodiment 1 of the present disclosure.
  • FIG. 2a is a flowchart of a rhythm point recognition method provided in Embodiment 2 of the present disclosure
  • FIG. 2b is a schematic diagram of an audio signal provided by Embodiment 2 of the present disclosure.
  • FIG. 3 is a flowchart of a rhythm point recognition method provided in Embodiment 3 of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a rhythm point recognition device according to Embodiment 4 of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device according to Embodiment 5 of the present disclosure.
  • FIG. 1a is a flowchart of a rhythm point recognition method according to Embodiment 1 of the present disclosure. This embodiment can be applied to the case of recognizing rhythm points in an audio signal.
  • the method can be performed by a rhythm point recognition device. It can be implemented in software and/or hardware, and the device can be configured in an electronic device, such as a computer. As shown in Figure 1a, the method includes the following steps:
  • S110 Determine an alternative rhythm point in the audio signal according to the spectral characteristics of the audio signal to be identified, and obtain a starting time corresponding to the alternative rhythm point.
  • the audio signal to be recognized refers to an audio signal generated by preprocessing the original audio signal.
  • the original audio signal refers to a continuous time-domain signal, but since the computer can only process discrete signals, the original audio signal needs to be sampled and quantized to obtain a discrete digital signal that is easy to analyze.
  • a discrete time-domain signal can be obtained by sampling the original audio signal at a set frequency. In one embodiment, the set frequency is 44.1 kHz. In other words, the audio signal is actually a signal formed by sampling discrete signal points.
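The sampling and quantization described above can be sketched as follows. This is a minimal illustration only: the 44.1 kHz rate comes from the text, while the 16-bit depth, the 440 Hz test tone, and the function names are assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 44_100  # set frequency given in the text


def sample_and_quantize(signal, duration_s, sample_rate_hz=SAMPLE_RATE_HZ, bits=16):
    """Sample a continuous function signal(t) with values in [-1, 1] and
    quantize each sample to a signed integer (bit depth is an assumption)."""
    full_scale = 2 ** (bits - 1) - 1
    n_samples = int(round(duration_s * sample_rate_hz))
    return [int(round(signal(i / sample_rate_hz) * full_scale))
            for i in range(n_samples)]


def sample_index_to_seconds(index, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time point (in seconds) corresponding to a discrete signal point."""
    return index / sample_rate_hz


def tone(t):
    """Hypothetical 440 Hz test tone standing in for the original audio."""
    return math.sin(2 * math.pi * 440.0 * t)


samples = sample_and_quantize(tone, duration_s=0.01)
```

Each entry of `samples` is one discrete signal point, and `sample_index_to_seconds` recovers the time point that a starting time would refer to.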
  • the spectrum characteristics mainly refer to information of changes in parameters such as frequency, frequency domain amplitude, and frequency domain phase of the audio signal.
  • The amplitude in the time domain differs from the amplitude calculated in the frequency domain: the amplitude at a signal point in the time-domain signal is the value, at the time point corresponding to that signal point, of the superposition of the sinusoidal components of different frequencies in the frequency-domain signal.
  • the audio signal is a sound wave signal
  • the rhythm point can be used to represent the rhythm characteristic of the sound wave signal.
  • the rhythm point is used to characterize a musical note.
  • the signal point closest to the time point at which the note starts in the audio signal is used as the rhythm point.
  • The rhythm characteristic of a note is that it lasts for a period of time and has a set volume value.
  • the analysis result of the rhythm point includes the starting time, duration, and volume value of the rhythm point.
  • The starting time of the rhythm point may refer to the time point at which the rhythm point begins in the audio signal; the duration may be the length of time that the rhythm point lasts. The starting time is also the starting time of the duration and of the volume of the rhythm point. The volume information may refer to the sound intensity of the rhythm point, which characterizes the strength of the sound corresponding to the rhythm point.
  • The sound intensity of a note is not necessarily a fixed value over its duration; for example, it may decay continuously. In that case, the average of the time-domain amplitudes of the signal points in the audio signal within the duration can be used as the sound intensity.
  • The candidate (alternative) rhythm points may refer to the rhythm points coarsely screened from the audio signal.
  • At least one candidate rhythm point is determined in the audio signal.
  • the audio signal may be sequentially subjected to differential processing, Fourier transform, and differential processing, and the candidate rhythm point and the corresponding starting time are determined based on the short-term energy method.
  • embodiments of the present disclosure may also determine alternative rhythm points by other methods, and the present disclosure does not limit this.
  • S120 Map the candidate rhythm points onto the trend-fitting envelope signal of the audio signal according to their corresponding starting times, and determine the target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal.
  • the trend fitting envelope signal may refer to a signal that fits the amplitude characteristic of the audio signal in the time domain, and is used to characterize the change trend of the audio signal in the time domain.
  • the trend-fit envelope signal can be obtained by Hilbert transform.
  • the waveform characteristic of the trend fitting envelope signal may refer to the trend characteristic of the time-domain amplitude change of the audio signal.
  • The waveform characteristics of the trend-fitting envelope signal may include its peaks and troughs, which correspond to the time-domain amplitude peaks and valleys of the audio signal. The candidate rhythm points are screened according to these waveform characteristics.
  • Each peak actually corresponds to a note, so the target rhythm points can be screened according to the peaks and troughs of the trend-fitting envelope signal. For example, among the candidate rhythm points lying between a peak and the adjacent trough preceding it, the candidate rhythm point closest in time to the peak is selected as the target rhythm point, so that one target rhythm point is determined for each peak.
  • Mapping the candidate rhythm points onto the trend-fitting envelope signal of the audio signal according to their corresponding starting times, and determining the target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal, may include: identifying the peak points in the trend-fitting envelope signal according to its waveform characteristics; mapping the candidate rhythm points onto the trend-fitting envelope signal according to their corresponding starting times; and using the candidate rhythm point closest in time to each peak point as a target rhythm point.
  • the peak point is identified in the trend-fit envelope signal.
  • the signal point is the peak point.
  • Each candidate rhythm point is mapped to the trend fitting envelope signal according to the corresponding starting time, and the time relationship between the starting time corresponding to each candidate rhythm point and the time corresponding to the peak point can be determined.
  • each peak can be regarded as a note, and therefore, a matching alternative rhythm point is selected from at least one alternative rhythm point as the target rhythm point according to each peak point.
  • the candidate rhythm point closest in time to the peak point is selected as the target rhythm point matching the peak point.
  • the target rhythm points are determined therefrom to further filter the rhythm points, thereby improving the accuracy of rhythm point recognition.
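The peak-based screening above can be sketched as follows. This is a rough illustration, not the disclosed implementation: the peak criterion (a value greater than its left neighbor and not less than its right neighbor) and all function names are assumptions, since the text here does not state the exact condition.

```python
def find_peak_indices(envelope):
    """Indices of local maxima in the trend-fitting envelope.
    The comparison rule is an assumed peak criterion."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i - 1] < envelope[i] >= envelope[i + 1]]


def select_target_points(candidate_times, envelope, envelope_times):
    """For each envelope peak, keep the candidate starting time that is
    closest in time to the peak; each candidate is kept at most once."""
    if not candidate_times:
        return []
    targets = []
    for peak_index in find_peak_indices(envelope):
        peak_time = envelope_times[peak_index]
        nearest = min(candidate_times, key=lambda t: abs(t - peak_time))
        if nearest not in targets:
            targets.append(nearest)
    return targets
```

With an envelope that peaks at 0.2 s and 0.6 s and candidates at 0.05, 0.18, 0.35, and 0.55 s, this keeps 0.18 s and 0.55 s and discards the other two.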
  • S130 Determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal.
  • The beat indicates the period over which strong and weak sounds alternate regularly, and the beat information may refer to the characteristic information describing how the strong and weak sounds in the music repeat periodically.
  • the beat information includes beats per minute (bpm).
  • Notes are generally measured in units of a beat. The duration of one beat can be determined from the number of beats per minute of the audio signal; mapping this duration onto the audio signal gives the signal interval corresponding to a rhythm point (note) and the signal points it contains, and the volume information of the rhythm point can then be determined from the time-domain amplitudes of those signal points.
  • the signal interval is an array interval formed by discrete signal points. Therefore, the volume information corresponding to the rhythm point is determined according to the time-domain amplitudes of multiple signal points in the signal interval. Exemplarily, the average value of the time-domain amplitudes of multiple signal points in the signal interval is taken as the volume value corresponding to the rhythm point.
  • The fluctuation-fitting envelope signal may likewise refer to a signal that fits the amplitude characteristics of the audio signal in the time domain, and its waveform characteristics also reflect the trend of the time-domain amplitude change of the audio signal.
  • Of the two, the fluctuation-fitting envelope signal fluctuates more, and the trend-fitting envelope signal is smoother.
  • The trend-fitting envelope signal can be obtained by smoothing the fluctuation-fitting envelope signal.
  • The trend-fitting envelope signal 102 of the audio signal 101 to be recognized is smoother than the fluctuation-fitting envelope signal 103.
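The relationship between the two envelopes can be sketched as follows. This is a simplified illustration under stated assumptions: the fluctuation envelope is approximated by the absolute value of each sample rather than by the Hilbert transform mentioned in the text, and the smoothing is a plain centered moving average whose window size is hypothetical.

```python
def fluctuation_envelope(samples):
    """Stand-in fluctuation envelope: absolute value of each sample.
    (Assumption for illustration; not the Hilbert transform.)"""
    return [abs(s) for s in samples]


def smooth(values, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def total_variation(values):
    """Sum of absolute differences between neighbors; lower means smoother."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))
```

Applying `smooth` to the output of `fluctuation_envelope` gives a trend-like envelope with a lower `total_variation`, which matches the qualitative relation between signals 102 and 103.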
  • The signal interval over which the note corresponding to the rhythm point lasts in the audio signal can be determined by mapping the beat information onto the fluctuation-fitting envelope signal, which gives the signal interval corresponding to the beat information.
  • the end signal point of the note may be any trough point in the signal interval corresponding to the beat information.
  • Determining the volume information corresponding to the target rhythm point according to the beat information of the audio signal may include: determining a volume interval matching the target rhythm point according to the starting time corresponding to the target rhythm point and the beat information of the audio signal; and calculating the volume information corresponding to the target rhythm point according to the signal time-domain characteristic parameters of the multiple signal points in the volume interval.
  • For example, the starting time corresponding to the target rhythm point is used as the starting endpoint, and the duration of one beat in the beat information of the audio signal is used as the interval length, to determine the volume interval of the target rhythm point.
  • The volume information of the target rhythm point is determined according to the time-domain amplitudes of the multiple signal points of the audio signal in the volume interval. Exemplarily, the average of those time-domain amplitudes may be used as the volume value of the target rhythm point.
  • Alternatively, the squares of the time-domain amplitudes of the multiple signal points in the volume interval may be calculated, and the maximum of these squares used as the volume value of the target rhythm point; the embodiments of the present disclosure do not limit this.
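The two volume calculations above can be sketched as follows, assuming the bpm is already known. The function names are hypothetical, and taking the absolute value before averaging is an assumption, since averaging signed samples of an oscillating signal would tend toward zero.

```python
def beat_duration_seconds(bpm):
    """Duration of one beat for a given number of beats per minute."""
    return 60.0 / bpm


def volume_interval(start_time_s, bpm, sample_rate_hz=44_100):
    """Sample-index interval [start, end) that begins at the rhythm point's
    starting time and lasts one beat."""
    start = int(round(start_time_s * sample_rate_hz))
    length = int(round(beat_duration_seconds(bpm) * sample_rate_hz))
    return start, start + length


def mean_amplitude_volume(samples, start, end):
    """Average of the absolute time-domain amplitudes in the interval."""
    window = samples[start:end]
    return sum(abs(s) for s in window) / len(window) if window else 0.0


def peak_power_volume(samples, start, end):
    """Maximum of the squared time-domain amplitudes in the interval."""
    return max((s * s for s in samples[start:end]), default=0.0)
```

At 120 bpm one beat lasts 0.5 s, so a rhythm point starting at 0.2 s gets the volume interval from 0.2 s to 0.7 s.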
  • The bpm can be calculated with at least one of the complex-domain spectral difference function, the spectral difference function, and the beat emphasis function.
  • Several of these functions can be used, and the required bpm selected from their calculation results.
  • other methods can also be used to calculate bpm, which is not limited in the embodiments of the present disclosure.
  • the volume information matching the target rhythm point is determined by the beat information of the audio signal, and the volume information of the target rhythm point can be accurately determined.
  • Determining the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal includes: mapping any two adjacent target rhythm points onto the fluctuation-fitting envelope signal of the audio signal according to their corresponding starting times, and determining, according to the waveform characteristics of the fluctuation-fitting envelope signal, the time corresponding to the signal point matching the two adjacent target rhythm points; and using the length of time between the starting time corresponding to the first of the two adjacent target rhythm points and the time corresponding to the matching signal point as the duration corresponding to the first target rhythm point.
  • The first target rhythm point refers to the one of the two target rhythm points whose starting time is earlier.
  • the duration of any one rhythm point is less than the duration between the starting time corresponding to the rhythm point and the starting time corresponding to the next following rhythm point.
  • At the end of a note, the energy of the note is smallest, which appears in the audio signal as the smallest amplitude. The trough point between two adjacent target rhythm points can therefore be used as the signal point matching the two target rhythm points, and the length of time between the time corresponding to the trough point and the starting time corresponding to the first of the two adjacent target rhythm points is taken as the duration corresponding to the first target rhythm point.
  • The fluctuation-fitting envelope signal of the audio signal follows the amplitude changes of the audio signal more closely than the trend-fitting envelope signal, so the trough point between two adjacent target rhythm points is determined from the fluctuation-fitting envelope signal.
  • the signal point is the trough point.
  • Determining the duration of the target rhythm point from the waveform characteristics of the fluctuation-fitting envelope signal makes it possible to locate accurately the end time corresponding to the first of two adjacent target rhythm points, and thus to determine accurately the duration corresponding to the first target rhythm point.
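The duration rule above can be sketched as follows. This is an illustrative reading only: the trough is taken to be the lowest envelope value between the two adjacent target rhythm points, which is an assumption because the exact trough criterion is not given in this text, and the function name is hypothetical.

```python
def durations_from_troughs(target_times, envelope, sample_rate_hz):
    """For each pair of adjacent target rhythm points, return the time from
    the first point's starting time to the lowest envelope point between
    them. The last target point has no successor and gets no duration."""
    durations = []
    for first, second in zip(target_times, target_times[1:]):
        lo = int(round(first * sample_rate_hz))
        hi = min(int(round(second * sample_rate_hz)), len(envelope) - 1)
        if hi <= lo:
            durations.append(0.0)
            continue
        # Assumed trough: index of the minimum envelope value in [lo, hi].
        trough = min(range(lo, hi + 1), key=lambda i: envelope[i])
        durations.append(trough / sample_rate_hz - first)
    return durations
```

Because the trough lies before the next target rhythm point, each duration is no longer than the gap between the two starting times, consistent with the constraint stated above.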
  • S140 Use the starting time, volume information, and duration corresponding to the target rhythm point as the result of identifying the rhythm point of the audio signal.
  • the rhythm is composed of a variety of notes with different time values.
  • the diversified form is closely related to the length and strength of the notes.
  • the recognition result of each rhythm point includes the starting time, volume information, and duration corresponding to the target rhythm point.
  • At least one candidate rhythm point of the audio signal and its corresponding starting time are determined according to the spectral characteristics of the audio signal; the target rhythm points are screened from the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal of the audio signal; and finally the volume information and duration corresponding to each target rhythm point are determined according to the beat information and the fluctuation-fitting envelope signal of the audio signal, giving the recognition result of the target rhythm point. This solves the high cost and low efficiency of manually labeling rhythm points in the related art, realizes automatic recognition of rhythm points, and improves recognition accuracy by screening the rhythm points multiple times.
  • After the starting time, volume information, and duration corresponding to the target rhythm point are obtained, the method may further include: adding, at the starting time corresponding to the target rhythm point, a music special effect matching the target rhythm point according to the volume information and duration of the target rhythm point.
  • That is, a music special effect is added at the starting time; the duration of the music special effect is the same as the duration of the target rhythm point, and the volume information of the music special effect matches the volume information of the target rhythm point. For example, if the volume of the target rhythm point gradually attenuates from 35 decibels, the volume of the added music special effect also gradually attenuates from 35 decibels.
  • the music effects matching each target rhythm point may be the same or different.
  • FIG. 2a is a flowchart of a rhythm point recognition method according to Embodiment 2 of the present disclosure. This embodiment is described based on the optional solutions in the above embodiments.
  • Determining a candidate rhythm point in the audio signal according to the spectral characteristics of the audio signal to be identified, and acquiring the starting time corresponding to the candidate rhythm point, may include: grouping the multiple signal points in the audio signal to determine multiple groups, where each group includes a set number of adjacent signal points and the signal points included in different groups are different or partially overlapping; calculating the group frequency-domain characteristic parameter corresponding to each group according to the signal frequency-domain characteristic parameters of the signal points in that group; selecting target groups from the groups according to the group frequency-domain characteristic parameters and a preset feature screening condition, and determining a candidate rhythm point according to the multiple signal points in each target group; and selecting, within the time interval corresponding to the multiple signal points in the target group, a time point as the starting time of the candidate rhythm point corresponding to the target group.
  • The audio signal is a discrete signal. The set number can be 1024, and the grouping can take 1024 adjacent signal points as a group, with consecutive groups starting 512 signal points apart.
  • If the discrete signal points of the audio signal are numbered in chronological order, with the first signal point numbered 0, the second numbered 1, and so on, then the first group covers signal points [0, 1023], the second group covers [512, 512+1023], the third group covers [1024, 1024+1023], and so on.
  • the corresponding value in each array is the time-domain amplitude corresponding to each signal point.
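The grouping above can be sketched as follows. The group size of 1024 and the group boundaries come from the text; the function names are hypothetical, and dropping a trailing partial group is an assumption, since the text does not say how the end of the signal is handled.

```python
def group_start_indices(n_samples, group_size=1024, hop=512):
    """Index of the first signal point of every full group."""
    return list(range(0, n_samples - group_size + 1, hop))


def split_into_groups(samples, group_size=1024, hop=512):
    """Overlapping groups of adjacent signal points; each group is an array
    of time-domain amplitudes."""
    return [samples[start:start + group_size]
            for start in group_start_indices(len(samples), group_size, hop)]
```

For a signal of 2048 points this yields groups starting at 0, 512, and 1024, that is [0, 1023], [512, 1535], and [1024, 2047], so neighboring groups share half of their signal points.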
  • For the audio signal, spectral characteristics, candidate rhythm points, starting time, trend-fitting envelope signal, beat information, volume information, fluctuation-fitting envelope signal, and rhythm point recognition result in this embodiment, refer to the description in the above embodiments.
  • S220 Calculate the group frequency-domain characteristic parameter corresponding to each group according to the signal frequency-domain characteristic parameters of the multiple signal points in each group.
  • the frequency domain characteristic parameters of the signal may refer to the frequency domain phase and frequency domain amplitude obtained when the audio signal is converted from the time domain signal to the frequency domain signal.
  • the grouped frequency domain characteristic parameter may refer to the corresponding rhythm point feature value of each group, and the rhythm point feature value is used to identify the rhythm point.
  • the Fourier transform can realize the conversion of the audio signal from the time domain signal to the frequency domain signal.
  • Windowing is performed first, that is, the aforementioned grouping followed by window function processing, which truncates an infinitely long time segment into multiple short segments; a Fourier transform is then performed on each group.
  • Within each group, the signal point at the center of the numbering is used as a reference, the data of the signal points at symmetrical positions are swapped and multiplied by the preset window function, and the result is then subjected to the Fourier transform.
  • For example, the first group covers signal points numbered [0, 1023]; point 512 is used as the reference, and the time-domain amplitudes corresponding to signal points [0, 511] and [512, 1023] are exchanged and multiplied by the Hanning coefficients, giving the group data to be Fourier transformed.
  • After the Fourier transform, the frequency-domain phase and frequency-domain amplitude corresponding to the multiple signal points in each group are obtained and used as the signal frequency-domain characteristic parameters of those signal points.
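The per-group preparation and transform above can be sketched as follows. This is one possible reading, not the disclosed implementation: "swapping" is interpreted as exchanging the two halves of the group around its center point, the window is a standard Hann window, and a naive discrete Fourier transform is used for clarity (a real implementation would use an FFT). All function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Hann (Hanning) window coefficients for a group of n points."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def swap_halves(group):
    """Exchange the first and second half of the group (assumed reading of
    swapping the data at symmetrical positions around the center)."""
    half = len(group) // 2
    return group[half:] + group[:half]


def dft(values):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]


def group_spectrum(group):
    """Frequency-domain amplitude and phase of one prepared group."""
    prepared = [v * w for v, w in zip(swap_halves(group), hann(len(group)))]
    spectrum = dft(prepared)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]
```

`group_spectrum` returns the amplitude and phase arrays that the text uses as the signal frequency-domain characteristic parameters of a group.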
  • Calculating the group frequency-domain characteristic parameters corresponding to each group according to the signal frequency-domain characteristic parameters of multiple signal points in each group may be calculated using an onset detection method.
  • The rhythm point characteristic value of each signal point in each group can be calculated from the signal frequency-domain characteristic parameters of the multiple signal points in the group, based on a formula that is not reproduced in this text. In that formula, i denotes the i-th signal point, Onset[i] is the rhythm point characteristic value of the i-th signal point, D[i] is the amplitude of the i-th signal point, and P[i] is the phase of the i-th signal point. If i-1 is less than 0, P[i-1] is 0; if i-2 is less than 0, P[i-2] is 0.
  • the frequency domain characteristic parameter corresponding to each group is the sum of the characteristic values of the rhythm points of the multiple signal points in the group.
  • The normalization divides the group frequency-domain characteristic parameter corresponding to each group by the largest of the group frequency-domain characteristic parameters of the multiple groups; the window smoothing may be infinite impulse response (IIR) smoothing. In one embodiment, the window used in the window smoothing is 5.
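The summation, normalization, and smoothing above can be sketched as follows. The per-point Onset values are taken as given input, since their formula is not available here. The smoothing is a single-pole exponential filter, which is one simple IIR smoother; mapping the window of 5 to a coefficient of 2 / (window + 1) is an assumption, as are the function names.

```python
def group_parameters(onset_values_per_group):
    """Group frequency-domain characteristic parameter: the sum of the
    rhythm point characteristic values of the points in each group."""
    return [sum(values) for values in onset_values_per_group]


def normalize_by_max(params):
    """Divide every group parameter by the largest group parameter."""
    largest = max(params, default=0.0)
    if largest <= 0:
        return [0.0 for _ in params]
    return [p / largest for p in params]


def iir_smooth(values, window=5):
    """Single-pole IIR (exponential) smoothing; alpha is derived from the
    window size by an assumed convention."""
    alpha = 2.0 / (window + 1)
    smoothed = []
    for value in values:
        previous = smoothed[-1] if smoothed else value
        smoothed.append(alpha * value + (1 - alpha) * previous)
    return smoothed
```

After `normalize_by_max`, all group parameters lie in [0, 1], so a single set threshold can be applied to any audio signal regardless of its overall level.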
  • A target group is selected from the multiple groups, and a candidate rhythm point is determined according to the multiple signal points in the target group.
  • the feature screening condition may include at least one screening step for determining at least one target group from a plurality of groups, and at the same time, each target group determines an alternative rhythm point, so as to realize preliminary identification of the rhythm point in the audio signal.
  • The feature screening condition may be that a group whose group frequency-domain characteristic parameter exceeds a set threshold is used as a target group.
  • the feature selection condition may also be other conditions, and this embodiment of the present disclosure is not limited thereto.
  • Selecting the target groups from the multiple groups may include: taking a set number of consecutive groups as one grouping set, so as to determine multiple grouping sets; for each grouping set, when the grouping set satisfies the frequency-domain characteristic threshold condition, using the first group in the grouping set as a candidate target group; and eliminating, from the multiple candidate target groups, those that satisfy the adjacent rejection condition, and using the remaining candidate target groups as the target groups.
  • the frequency domain characteristic threshold condition may be a condition that defines the size relationship of the frequency domain characteristic parameters of multiple packets in the packet set. For example, a grouping set includes 5 groups, which are sequentially numbered in chronological order.
  • The frequency-domain characteristic threshold condition is given by a formula that is not reproduced in this text. In that formula, the grouping set includes a total of five groups from i to i+4, and Onsets_ma[i] represents the group frequency-domain characteristic parameter of the i-th group.
  • each group can be modified when the group set is determined.
  • The method may further include: setting to 0 the group frequency-domain characteristic parameters that are below the set threshold.
  • The grouping sets are then determined from the modified groups, and the candidate target groups are determined according to the frequency-domain characteristic threshold condition, which reduces the amount of data to be judged and so improves the efficiency of screening the candidate target groups.
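The two threshold operations described above can be sketched as follows. The threshold value itself is not specified in the text, so it is passed in as a parameter, and the function names are hypothetical.

```python
def zero_below_threshold(params, threshold):
    """Set group parameters below the set threshold to 0 so that later
    screening steps have fewer non-zero values to examine."""
    return [p if p >= threshold else 0.0 for p in params]


def groups_above_threshold(params, threshold):
    """Indices of groups whose parameter exceeds the set threshold, as in
    the simple feature screening condition."""
    return [i for i, p in enumerate(params) if p > threshold]
```

For instance, with parameters 0.05, 0.4, 0.1, and 0.9 and a hypothetical threshold of 0.1, the first value is zeroed and only the groups at indices 1 and 3 exceed the threshold.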
  • The adjacent rejection condition may refer to a condition defined on the adjacency relationship between candidate target groups.
  • If the interval between two rhythm points is extremely short, the two rhythm points are adjacent in time. The occurrence of two adjacent rhythm points is due to noise rather than to real rhythm points, and since each group determines one rhythm point, adjacent groups can be eliminated from the candidate target groups to refine the recognition of rhythm points.
  • Groups being adjacent means that the starting times corresponding to the first signal points of two or more groups are adjacent in time, that is, no starting time corresponding to the first signal point of any other group falls between them.
  • According to the starting times corresponding to the first signal points of the multiple candidate target groups, at least two candidate target groups whose starting times are adjacent are determined to satisfy the adjacent rejection condition; these candidate target groups are eliminated, and the remaining candidate target groups are used as the target groups.
  • Here, the other groups are not limited to candidate target groups; they are any of the groups formed by the grouping described above.
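The adjacent rejection step can be sketched as follows, assuming groups are identified by their position in the grouping order, so that two groups are adjacent exactly when their indices differ by 1. The function name is hypothetical.

```python
def reject_adjacent(candidate_group_indices):
    """Drop every candidate target group that has another candidate target
    group immediately before or after it; isolated candidates remain."""
    candidates = sorted(set(candidate_group_indices))
    present = set(candidates)
    return [g for g in candidates
            if g - 1 not in present and g + 1 not in present]
```

Given candidate groups 2, 3, 7, 10, 11, 12, and 20, the runs 2-3 and 10-12 are removed as adjacent and groups 7 and 20 remain as target groups. Note that the whole run is removed, following the statement that the adjacent candidate target groups are eliminated.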
  • the signal point 201 in the audio signal is determined as the target rhythm point according to the trend fitting envelope signal 202.
  • the target group is finally determined, two-step screening of rhythm points is realized, and the accuracy of rhythm point recognition is improved.
  • determining an alternative rhythm point according to multiple signal points in the target group may be confirming an alternative rhythm point from multiple signal points in the target group according to a preset rule. In an embodiment, any signal point in the target group is used as an alternative rhythm point.
  • S240 In the time interval corresponding to the multiple signal points in the target group, select a time point as the starting time of the candidate rhythm point corresponding to the target group.
  • the time interval may refer to an interval formed between the time corresponding to the first signal point in the target packet and the time corresponding to the end signal point in the target packet. Select a time point from the interval as the starting time of the candidate rhythm point corresponding to the target group. In an embodiment, the time point corresponding to the first signal point may be used as the starting time of the candidate rhythm point.
  • S250 Map the candidate rhythm points to the trend-fitting envelope signal of the audio signal according to the corresponding start times, and determine the target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal.
  • S260 Determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal.
  • S270 Use the starting time, volume information, and duration corresponding to the target rhythm point as the result of identifying the rhythm point of the audio signal.
  • the group frequency domain characteristic parameter corresponding to each group is determined, the groups are screened according to these parameters to determine the target groups, and one candidate rhythm point is determined for each target group, so that the groups are screened before the candidate rhythm points are determined, which reduces the number of candidate rhythm points and improves the efficiency and accuracy of rhythm point recognition.
  • FIG. 3 is a flowchart of a rhythm point recognition method provided in Embodiment 3 of the present disclosure. This embodiment is described based on the optional solutions in the above embodiments.
  • each group includes a set number of adjacent signal points, and the signal points included in different groups are different or partially overlap.
  • S330 Take a set number of consecutive groups as a group set, to determine a plurality of group sets.
  • S350 Eliminate the candidate target group satisfying the adjacent culling condition from the candidate target group, and use the remaining candidate target group as the target group.
  • S360 Determine a candidate rhythm point according to multiple signal points in the target group.
  • S390 Map the candidate rhythm points to the trend fitting envelope signal according to the corresponding starting time, and use the candidate rhythm point closest to the peak point in time as the target rhythm point.
  • S3100 Determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal.
  • S3110 Use the starting time, volume information, and duration corresponding to the target rhythm point as the result of identifying the rhythm point of the audio signal.
  • FIG. 4 is a schematic structural diagram of a rhythm point recognition device according to an embodiment of the present disclosure. This embodiment can be applied to the case of recognizing a rhythm point in an audio signal.
  • the device may be implemented in software and/or hardware, and the device may be configured in an electronic device.
  • the apparatus may include: a candidate rhythm point determination module 410, a target rhythm point determination module 420, a volume information and duration determination module 430, and a rhythm point recognition result determination module 440.
  • the candidate rhythm point determination module 410 is configured to determine candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be recognized, and to obtain the start time corresponding to each candidate rhythm point; the target rhythm point determination module 420 is configured to map the candidate rhythm points to the trend-fitting envelope signal of the audio signal according to the corresponding start times, and to determine the target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal; the volume information and duration determination module 430 is configured to determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and to determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal; the rhythm point recognition result determination module 440 is configured to use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
  • at least one candidate rhythm point of the audio signal and the corresponding start time are determined according to the spectral characteristics of the audio signal, the target rhythm points are selected from the at least one candidate rhythm point according to the waveform characteristics of the trend-fitting envelope signal, and finally the volume information and duration corresponding to each target rhythm point are determined according to the fluctuation-fitting envelope signal and the beat information of the audio signal, which gives the recognition result of the target rhythm points. This solves the problem that manually marking rhythm points is costly in time and inefficient, realizes automatic recognition of rhythm points, and improves recognition accuracy by screening the rhythm points multiple times.
  • the candidate rhythm point determination module 410 includes: a grouping module configured to group a plurality of signal points in the audio signal to determine a plurality of groups, where each group includes a set number of adjacent signal points and the signal points included in different groups are different or partially overlapping;
  • a frequency domain characteristic parameter calculation module configured to calculate the group frequency domain characteristic parameter corresponding to each group according to the signal frequency domain characteristic parameters of the signal points in the group;
  • a candidate rhythm point screening module configured to select target groups from the plurality of groups according to the group frequency domain characteristic parameter corresponding to each group and preset characteristic screening conditions, and to determine one candidate rhythm point according to the signal points in each target group;
  • a start time determination module configured to select, in the time interval corresponding to the signal points in the target group, a time point as the start time of the candidate rhythm point corresponding to the target group.
  • the candidate rhythm point screening module includes: a group set determination module configured to take a set number of consecutive groups as a group set, to determine a plurality of group sets; a candidate target group determination module configured to, for each group set, take the first group in the group set as a candidate target group when it is determined that the group frequency domain characteristic parameters corresponding to the groups in the group set satisfy the frequency domain characteristic threshold condition; and a target group determination module configured to eliminate, from the plurality of candidate target groups, the candidate target groups that satisfy the adjacent rejection condition, and to use the remaining candidate target groups as target groups.
  • the target rhythm point determination module 420 includes: a peak point identification module configured to identify the peak points in the trend-fitting envelope signal according to the waveform characteristics of the trend-fitting envelope signal of the audio signal; and a target rhythm point screening module configured to map the candidate rhythm points to the trend-fitting envelope signal according to the corresponding start times, and to use the candidate rhythm point closest in time to a peak point as a target rhythm point.
  • the volume information and duration determination module 430 includes: a volume interval determination module configured to determine the volume interval matching the target rhythm point according to the start time corresponding to the target rhythm point and the beat information of the audio signal; and a volume information calculation module configured to calculate the volume information corresponding to the target rhythm point according to the signal time domain characteristic parameters of the signal points in the volume interval.
  • the volume information and duration determination module 430 includes: an end time determination module configured to map any two adjacent target rhythm points to the fluctuation-fitting envelope signal of the audio signal according to the corresponding start times, and to determine, according to the waveform characteristics of the fluctuation-fitting envelope signal, the start time of the signal point matching the two adjacent target rhythm points; and a duration calculation module configured to use the length of time between the start time corresponding to the first of the two adjacent target rhythm points and the start time of the signal point matching the two adjacent target rhythm points as the duration corresponding to the first of the two adjacent target rhythm points.
  • the rhythm point recognition device further includes: a music special effect adding module configured to add, at the start time corresponding to the target rhythm point, a music special effect matching the target rhythm point according to the volume information and duration of the target rhythm point.
  • the rhythm point recognition device provided by the embodiment of the present disclosure and the rhythm point recognition method provided by the first embodiment belong to the same inventive concept.
  • An embodiment of the present disclosure provides an electronic device.
  • FIG. 5 shows a schematic structural diagram of an electronic device (eg, client or server) 500 suitable for implementing the embodiment of the present disclosure.
  • the electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA), tablet computers (Portable Android Device, PAD), portable multimedia players (Portable Media Player, PMP), and in-vehicle terminals (such as in-vehicle navigation terminals), and fixed terminals such as digital televisions (Television, TV) and desktop computers.
  • the electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 500 may include a processing device (for example, a central processing unit, a graphics processor, etc.) 501, which may perform at least one appropriate action and process according to a program stored in a read-only memory (Read-only Memory, ROM) 502 or a program loaded from the storage device 508 into a random access memory (Random Access Memory, RAM) 503.
  • the processing device 501, ROM 502, and RAM 503 are connected to each other via a bus 504.
  • An input/output (Input/Output, I/O) interface 505 is also connected to the bus 504.
  • the following devices can be connected to the I/O interface 505: input devices 506 such as a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 507 such as a liquid crystal display (Liquid Crystal Display, LCD), speakers, and vibrators; storage devices 508 such as magnetic tape and hard disk; and communication devices 509.
  • the communication device 509 may allow the electronic device 500 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 5 shows an electronic device 500 with multiple devices, it is not required to implement or have all the devices shown. More or fewer devices may be implemented or provided instead.
  • an embodiment of the present disclosure includes a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 509, or from the storage device 508, or from the ROM 502.
  • when the computer program is executed by the processing device 501, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed.
  • Embodiments of the present disclosure also provide a computer-readable storage medium.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard drives, RAM, ROM, erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM or Flash memory), optical fiber, portable compact disk read-only memory (Compact Disc Read-Only Memory, CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • the computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave, and the computer-readable signal medium carries computer-readable program code.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium may be transmitted on any appropriate medium, including but not limited to: electric wires, optical cables, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • the computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the electronic device, the electronic device is caused to: determine candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be recognized, and obtain the start time corresponding to each candidate rhythm point; map the candidate rhythm points to the trend-fitting envelope signal of the audio signal according to the corresponding start times, and determine the target rhythm points among the candidate rhythm points according to the waveform characteristics of the trend-fitting envelope signal; determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal; and use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
  • the computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof.
  • the above programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN) or a wide area network (Wide Area Network, WAN), or it can be connected to an external computer (for example, using an Internet service provider to connect through the Internet).
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or a part of code contains one or more executable instructions for implementing a prescribed logical function.
  • the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two blocks represented in succession may actually be executed in parallel, and they may sometimes be executed in reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with dedicated hardware-based systems that perform the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
  • the modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
  • for example, the candidate rhythm point determination module can also be described as "a module that determines candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be recognized and obtains the start time corresponding to each candidate rhythm point".

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

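Disclosed herein are a rhythm point recognition method, device, electronic device, and storage medium. The method includes: determining candidate rhythm points in an audio signal to be recognized according to the spectral characteristics of the audio signal, and obtaining the start time corresponding to each candidate rhythm point; mapping the candidate rhythm points, according to their corresponding start times, onto a trend-fitting envelope signal of the audio signal, and determining target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal; determining the volume information corresponding to each target rhythm point according to the beat information of the audio signal, and determining the duration corresponding to each target rhythm point according to a fluctuation-fitting envelope signal of the audio signal; and using the start time, volume information, and duration corresponding to each target rhythm point as the rhythm point recognition result for the audio signal.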

Description

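Rhythm point recognition method, device, electronic device, and storage medium
This application claims priority to Chinese patent application No. 201811519398.4, filed with the Chinese Patent Office on December 12, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of data processing, for example, to a rhythm point recognition method, device, electronic device, and storage medium.
Background
With the development of communication technology and electronic devices, electronic devices such as mobile phones and tablet computers have become an indispensable part of people's work and life, and as electronic devices grow more popular, the interactive applications installed on them have become a main channel for communication and entertainment.
In the related art, a music interactive application displays interactive prompts to the user according to the rhythm points of the music, and the user inputs interactive operations according to the prompts, thereby activating and displaying video special effects. However, rhythm points in the related art are generally determined by manual annotation, which makes rhythm point recognition costly in time and lengthens the update cycle of the music in music interactive applications.
Summary
Embodiments of the present disclosure provide a rhythm point recognition method, device, electronic device, and storage medium that can recognize rhythm points automatically and accurately and improve the efficiency of rhythm point recognition.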
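An embodiment of the present disclosure provides a rhythm point recognition method, including:
determining candidate rhythm points in an audio signal to be recognized according to the spectral characteristics of the audio signal, and obtaining the start time corresponding to each candidate rhythm point;
mapping the candidate rhythm points, according to their corresponding start times, onto a trend-fitting envelope signal of the audio signal, and determining target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal;
determining the volume information corresponding to each target rhythm point according to the beat information of the audio signal, and determining the duration corresponding to each target rhythm point according to a fluctuation-fitting envelope signal of the audio signal;
using the start time, volume information, and duration corresponding to each target rhythm point as the rhythm point recognition result for the audio signal.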
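An embodiment of the present disclosure further provides a rhythm point recognition device, including:
a candidate rhythm point determination module, configured to determine candidate rhythm points in an audio signal to be recognized according to the spectral characteristics of the audio signal, and to obtain the start time corresponding to each candidate rhythm point;
a target rhythm point determination module, configured to map the candidate rhythm points, according to their corresponding start times, onto a trend-fitting envelope signal of the audio signal, and to determine target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal;
a volume information and duration determination module, configured to determine the volume information corresponding to each target rhythm point according to the beat information of the audio signal, and to determine the duration corresponding to each target rhythm point according to a fluctuation-fitting envelope signal of the audio signal;
a rhythm point recognition result determination module, configured to use the start time, volume information, and duration corresponding to each target rhythm point as the rhythm point recognition result for the audio signal.
An embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a storage device, configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the rhythm point recognition method described in the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the rhythm point recognition method described in the embodiments of the present disclosure.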
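Brief Description of the Drawings
FIG. 1a is a flowchart of a rhythm point recognition method provided in Embodiment 1 of the present disclosure;
FIG. 1b is a schematic diagram of an audio signal provided in Embodiment 1 of the present disclosure;
FIG. 2a is a flowchart of a rhythm point recognition method provided in Embodiment 2 of the present disclosure;
FIG. 2b is a schematic diagram of an audio signal provided in Embodiment 2 of the present disclosure;
FIG. 3 is a flowchart of a rhythm point recognition method provided in Embodiment 3 of the present disclosure;
FIG. 4 is a schematic structural diagram of a rhythm point recognition device provided in Embodiment 4 of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present disclosure.
Detailed Description
The present disclosure is described below with reference to the drawings and embodiments. The specific embodiments described here are only used to explain the present disclosure, not to limit it. In addition, for ease of description, the drawings show only the parts related to the present disclosure rather than the entire structure.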
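Embodiment 1
FIG. 1a is a flowchart of a rhythm point recognition method provided in Embodiment 1 of the present disclosure. This embodiment is applicable to recognizing the rhythm points in a segment of audio signal. The method may be executed by a rhythm point recognition device, which may be implemented in software and/or hardware and may be configured in an electronic device such as a computer. As shown in FIG. 1a, the method includes the following steps:
S110: Determine candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be recognized, and obtain the start time corresponding to each candidate rhythm point.
The audio signal to be recognized is the audio signal generated by preprocessing an original audio signal. In an embodiment, the original audio signal is a continuous time-domain signal; because a computer can only process discrete signals, the original audio signal must be sampled and quantized to obtain a discrete digital signal that is convenient to analyze. A discrete time-domain signal can be obtained by sampling the original audio signal at a set frequency; in an embodiment, the set frequency is 44.1 kHz. In other words, the audio signal is in fact a signal formed by sampled discrete signal points.
In the embodiments of the present disclosure, spectral characteristics mainly refer to information about changes in parameters such as the frequency, frequency-domain amplitude, and frequency-domain phase of the audio signal.
In an embodiment, the time-domain amplitude differs from the amplitude computed in the frequency domain. The amplitude at a signal point in the time-domain signal is the superposition, at the time point corresponding to that signal point, of the sinusoidal components of different frequencies in the frequency-domain signal. That is, the time-domain amplitude of each signal point is a superposition that involves both the amplitude information and the phase information of multiple frequency components, not a simple sum of their amplitudes.
An audio signal is a sound wave signal, and rhythm points can be used to represent its rhythmic features. Usually a rhythm point characterizes a note; for example, the signal point in the audio signal closest to the time at which a note begins is taken as a rhythm point. The rhythmic feature of a note is that it lasts for a period of time and has a set volume value; correspondingly, the analysis result for a rhythm point includes its start time, duration, and volume value.
In this embodiment, the start time of a rhythm point may refer to the time point at which the rhythm point begins in the audio signal; the duration may be the length of time the rhythm point lasts, and the start time is also the start of that duration; the volume information may refer to the sound intensity of the rhythm point, which characterizes how loud the corresponding sound is. The intensity of a note is not a fixed value over its duration; for example, it may decay continuously. In that case, the mean time-domain amplitude of the signal points of the audio signal within the duration can be used as the intensity.
A candidate rhythm point may refer to a rhythm point obtained by coarse screening of the audio signal.
At least one candidate rhythm point is determined in the audio signal according to the spectral characteristics of the audio signal to be recognized. For example, the audio signal may be subjected in turn to differencing, a Fourier transform, and differencing, and the candidate rhythm points and their start times determined based on the short-time energy method.
In addition, the embodiments of the present disclosure may determine candidate rhythm points by other methods, which the present disclosure does not limit.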
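S120: Map the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal of the audio signal, and determine the target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal.
The trend-fitting envelope signal may refer to a signal fitted to the amplitude characteristics of the audio signal in the time domain, used to characterize the trend of the audio signal's time-domain amplitude. In an embodiment, the trend-fitting envelope signal may be obtained by the Hilbert transform. The waveform features of the trend-fitting envelope signal may refer to the trend features of the time-domain amplitude; they may include the peaks and troughs of the trend-fitting envelope signal, which correspond to the time-domain amplitude peaks and troughs in the audio signal. The candidate rhythm points are screened according to these waveform features. Because a rhythm point characterizes a note, each peak can be regarded as a note, and the target rhythm points can be selected according to the peaks and troughs of the trend-fitting envelope signal. For example, among the candidate rhythm points lying between each peak and the adjacent trough preceding that peak, the candidate closest in time to the peak is selected as the target rhythm point, so that one target rhythm point is determined for each peak.
In an embodiment, mapping the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal of the audio signal and determining the target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal may include: identifying the peak points in the trend-fitting envelope signal according to its waveform features; and mapping the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal, and taking the candidate rhythm point closest in time to a peak point as a target rhythm point.
Peak points are identified in the trend-fitting envelope signal according to its waveform features. In an embodiment, if the time-domain amplitudes of the signal points before and after a signal point are both smaller than that of the signal point, the signal point is a peak point. Mapping each candidate rhythm point onto the trend-fitting envelope signal according to its start time makes it possible to determine the temporal relationship between the start time of each candidate rhythm point and the time of each peak point. Generally, each peak can be regarded as a note, so for each peak point one matching candidate is selected from the at least one candidate rhythm point as a target rhythm point. In an embodiment, the candidate rhythm point closest in time to the peak point is selected as the target rhythm point matching that peak point.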
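The peak test and the nearest-candidate rule described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names `find_peaks` and `select_targets`, the list-based envelope, and the toy sample rate of 10 Hz are all introduced here for the example.

```python
def find_peaks(envelope):
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i - 1] < envelope[i] > envelope[i + 1]]


def select_targets(candidate_times, envelope, sample_rate):
    """For each envelope peak, keep the candidate start time closest to it."""
    targets = set()
    for peak_index in find_peaks(envelope):
        peak_time = peak_index / sample_rate
        if candidate_times:
            nearest = min(candidate_times, key=lambda t: abs(t - peak_time))
            targets.add(nearest)
    return sorted(targets)


# Toy envelope sampled at 10 Hz (hypothetical data).
envelope = [0.0, 0.2, 0.9, 0.3, 0.1, 0.4, 1.0, 0.5, 0.2]
candidates = [0.1, 0.15, 0.45, 0.55]  # candidate start times in seconds
targets = select_targets(candidates, envelope, sample_rate=10)
```

A set is used so that a candidate chosen by two different peaks is reported only once; the source does not say how such a case should be handled, so this is a design choice of the sketch.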
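Screening the candidate rhythm points according to the waveform features of the trend-fitting envelope signal and determining the target rhythm points from them further filters the rhythm points and improves the accuracy of rhythm point recognition.
S130: Determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal.
In an audio signal, the beat denotes the period over which strong and weak sounds vary regularly; beat information may refer to feature information describing the regular, periodic repetition of strong and weak sounds in music. In an embodiment, the beat information includes the number of beats per minute (bpm). A note is generally measured in units of one beat. The length of one beat can be determined from the bpm of the audio signal; mapped onto the audio signal, this gives the duration of a rhythm point (note) and the signal points it contains, and the volume information of the rhythm point can be determined from the time-domain amplitudes of those signal points.
From the beat information, the signal interval over which the note corresponding to a rhythm point lasts in the audio signal can be determined; in this embodiment, the signal interval is an array interval formed by discrete signal points. The volume information of the rhythm point is then determined from the time-domain amplitudes of the signal points in the interval. For example, the mean of those amplitudes is taken as the volume value of the rhythm point.
The fluctuation-fitting envelope signal may refer to a signal fitted to the amplitude characteristics of the audio signal in the time domain, and its waveform features likewise refer to the trend features of the time-domain amplitude. In this embodiment, the fluctuation-fitting envelope signal fluctuates more and the trend-fitting envelope signal is smoother; the trend-fitting envelope signal may be obtained by applying a smoothing operation to the fluctuation-fitting envelope signal.
In an embodiment, as shown in FIG. 1b, the trend-fitting envelope signal 102 of the audio signal 101 to be recognized is smoother than the fluctuation-fitting envelope signal 103.
From the beat information, the signal interval over which the note corresponding to a rhythm point lasts in the audio signal can be determined. The beat information is mapped onto the fluctuation-fitting envelope signal to obtain the signal interval corresponding to the beat information; according to the waveform features of the fluctuation-fitting envelope signal, the end signal point of the note is determined within that interval, the time of the end signal point is taken as the end time of the rhythm point, and together with the start time this gives the duration of the rhythm point. In an embodiment, the end signal point of the note may be any trough point in the signal interval corresponding to the beat information.
In an embodiment, determining the volume information corresponding to the target rhythm point according to the beat information of the audio signal may include: determining the volume interval matching the target rhythm point according to the start time of the target rhythm point and the beat information of the audio signal; and calculating the volume information corresponding to the target rhythm point according to the signal time-domain feature parameters of the signal points in the volume interval.
In an embodiment, the volume information corresponding to the target rhythm point may be determined from the beat information as follows: the start time of the target rhythm point is taken as the starting endpoint and the length of one beat in the beat information as the interval length, which determines the volume interval of the target rhythm point. The volume information of the target rhythm point is then determined from the time-domain amplitudes of the signal points of the audio signal in the volume interval. For example, the mean of those amplitudes may be taken as the volume value of the target rhythm point. Alternatively, the squares of the time-domain amplitudes of the signal points in the volume interval may be computed and the maximum of the squares taken as the volume value; the embodiments of the present disclosure do not limit this.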
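As an illustration of the volume interval, the sketch below derives the length of one beat from the bpm, cuts that many samples starting at the target rhythm point, and computes both volume measures mentioned above. The function name `volume_for_target` and the toy data are hypothetical. Taking the mean of absolute amplitudes (rather than signed amplitudes, which tend to average toward zero) is an assumption of this sketch; the source only says "mean of the time-domain amplitudes".

```python
def volume_for_target(samples, start_time, bpm, sample_rate):
    """Return (mean absolute amplitude, maximum squared amplitude)
    over one beat starting at start_time (seconds)."""
    beat_seconds = 60.0 / bpm                       # length of one beat
    start = int(round(start_time * sample_rate))    # first sample of the interval
    length = int(round(beat_seconds * sample_rate)) # samples in one beat
    interval = samples[start:min(len(samples), start + length)]
    if not interval:
        return 0.0, 0.0
    mean_abs = sum(abs(x) for x in interval) / len(interval)
    max_square = max(x * x for x in interval)
    return mean_abs, max_square


# Toy signal sampled at 4 Hz with a tempo of 60 bpm (one beat = 4 samples).
samples = [0.0, 0.5, -0.5, 0.25, -0.25, 0.0, 0.0, 0.0]
mean_abs, max_square = volume_for_target(samples, start_time=0.25, bpm=60, sample_rate=4)
```

With real audio at 44.1 kHz and, say, 120 bpm, the same function would average 22050 samples per target rhythm point.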
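In this embodiment, the bpm may be computed by at least one of a complex-domain spectral difference function, a spectral difference function, a beat emphasis function, and the like. In an embodiment, several functions may be used and the required bpm selected from their results. Other methods may also be used to compute the bpm; the embodiments of the present disclosure do not limit this.
Determining the volume interval matching the target rhythm point from the beat information of the audio signal allows the volume information of the target rhythm point to be determined accurately.
In an embodiment, determining the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal includes: mapping any two adjacent target rhythm points, according to their corresponding start times, onto the fluctuation-fitting envelope signal of the audio signal, and determining, according to the waveform features of the fluctuation-fitting envelope signal, the start time corresponding to the signal point matching the two adjacent target rhythm points; and taking the length of time between the start time of the first of the two adjacent target rhythm points and the start time corresponding to the matching signal point as the duration of the first of the two adjacent target rhythm points.
The first target rhythm point is the one of the two whose start time is earlier. The duration of any rhythm point is less than the time between its start time and the start time of the next adjacent rhythm point. In an embodiment, when a note ends its energy is at a minimum, which appears in the audio signal as a minimum amplitude; the trough point between two adjacent target rhythm points can therefore be taken as the signal point matching them, and the time between the start time of the trough point (in fact, the time point of the trough) and the start time of the first of the two target rhythm points is taken as the duration of that first target rhythm point. Moreover, the fluctuation-fitting envelope signal follows the amplitude changes of the audio signal more closely than the trend-fitting envelope signal, so the trough point between two adjacent target rhythm points can be determined from the fluctuation-fitting envelope signal. In an embodiment, if the amplitudes of the signal points before and after a signal point are both greater than that of the signal point, the signal point is a trough point.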
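The trough rule for durations can be sketched as below. The names `find_troughs` and `durations` are hypothetical, and target rhythm points are given as sample indices to keep the arithmetic exact. When several troughs lie between two adjacent target rhythm points, this sketch picks the lowest one; the source does not specify the choice, so that is an assumption.

```python
def find_troughs(envelope):
    """Indices whose value is strictly smaller than both neighbours."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i - 1] > envelope[i] < envelope[i + 1]]


def durations(target_indices, envelope, sample_rate):
    """Duration in seconds of each target rhythm point except the last one.

    None is returned where no trough lies between two adjacent targets.
    """
    troughs = find_troughs(envelope)
    result = []
    for first, second in zip(target_indices, target_indices[1:]):
        between = [i for i in troughs if first < i < second]
        if between:
            end = min(between, key=lambda i: envelope[i])  # lowest trough (assumption)
            result.append((end - first) / sample_rate)
        else:
            result.append(None)
    return result


# Toy fluctuation-fitting envelope sampled at 10 Hz (hypothetical data).
envelope = [0.1, 0.8, 0.5, 0.2, 0.6, 0.9, 0.4, 0.3, 0.7]
note_lengths = durations([1, 5, 8], envelope, sample_rate=10)
```

The last target rhythm point has no following neighbour, so its duration would need a separate rule, for example the beat-based interval described earlier.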
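Determining the duration of a target rhythm point from the waveform features of the fluctuation-fitting envelope signal makes it possible to find accurately the end time of the first of two adjacent target rhythm points, and thus to determine its duration accurately.
S140: Use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
Rhythm is the varied pattern formed by combining notes of different time values, and is closely related to the length and strength of notes. To express the features of the rhythm, each rhythm point recognition result includes the start time, volume information, and duration corresponding to a target rhythm point.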
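One way to hold such a recognition result is a small record type. The class name `RhythmPoint`, the field names, and the units are assumptions of this sketch; the source only states that each result contains a start time, volume information, and a duration.

```python
from dataclasses import dataclass


@dataclass
class RhythmPoint:
    start_time: float  # seconds from the start of the audio signal (assumed unit)
    volume: float      # volume value, e.g. mean amplitude over the volume interval
    duration: float    # seconds (assumed unit)


# Hypothetical recognition result for a short clip.
result = [RhythmPoint(0.50, 0.375, 0.20), RhythmPoint(1.00, 0.250, 0.35)]
```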
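The embodiments of the present disclosure determine at least one candidate rhythm point of the audio signal and its start time according to the spectral characteristics of the audio signal, select target rhythm points from the at least one candidate rhythm point according to the waveform features of the trend-fitting envelope signal, and finally determine the volume information and duration of each target rhythm point according to the fluctuation-fitting envelope signal and the beat information of the audio signal, which gives the recognition result of the target rhythm points. This solves the problem in the related art that manually annotating rhythm points is costly in time and inefficient, achieves automatic recognition of rhythm points, and improves recognition accuracy by screening the rhythm points several times.
On the basis of the above embodiment, after the start time, volume information, and duration corresponding to the target rhythm point are used as the rhythm point recognition result for the audio signal, the method may further include: adding, at the start time corresponding to the target rhythm point, a music special effect matching the target rhythm point according to its volume information and duration.
After the rhythm point recognition result of the audio signal is obtained, a music special effect is added for each target rhythm point, beginning at its start time. In this embodiment, the duration of the music special effect equals that of the target rhythm point, and its volume information matches that of the target rhythm point; for example, if the volume of the target rhythm point decays gradually from 35 dB, the volume of the added music special effect likewise decays gradually from 35 dB. The music special effects matched to different target rhythm points may be the same or different.
Adding music special effects that match the target rhythm points after the rhythm points in the audio signal are recognized gives the audio signal special effects and makes it richer.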
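Embodiment 2
FIG. 2a is a flowchart of a rhythm point recognition method provided in Embodiment 2 of the present disclosure. This embodiment is described on the basis of the optional solutions in the above embodiment. In this embodiment, determining candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be recognized and obtaining the start time corresponding to each candidate rhythm point may include: grouping the signal points in the audio signal to determine a plurality of groups, where each group includes a set number of adjacent signal points and the signal points in different groups are distinct or partially overlapping; calculating the group frequency-domain feature parameter corresponding to each group according to the signal frequency-domain feature parameters of the signal points in the group; selecting target groups from the plurality of groups according to the group frequency-domain feature parameter corresponding to each group and a preset feature screening condition, and determining one candidate rhythm point from the signal points in each target group; and selecting, in the time interval corresponding to the signal points in a target group, a time point as the start time of the candidate rhythm point corresponding to that target group.
The method of this embodiment may include:
S210: Group the signal points in the audio signal to determine a plurality of groups.
In this embodiment, each group includes a set number of adjacent signal points, and the signal points in different groups are distinct or partially overlapping.
The audio signal is a discrete signal. The set number may be 1024, and the grouping may take 1024 consecutive adjacent signal points as a group at intervals of 511 signal points. In an embodiment, the discrete signal points of the audio signal are numbered in time order, the first numbered 0, the second 1, and so on; the 1st group then corresponds to the signal points numbered [0, 1023], the 2nd group to [512, 512+1023], the 3rd group to [1024, 1024+1023], and so on. In this embodiment, the values in each array are the time-domain amplitudes of the corresponding signal points.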
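The grouping can be sketched as overlapping slices of the sample array. The function name `split_into_groups` is hypothetical. The step of 512 between group starts is taken from the numbered example above ([0, 1023], [512, 512+1023], [1024, 1024+1023]); dropping a final incomplete group is an assumption of this sketch.

```python
def split_into_groups(samples, size=1024, hop=512):
    """Split samples into overlapping groups of `size` points,
    with consecutive groups starting `hop` points apart."""
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, hop)]


# Use the sample numbers themselves as data so the group boundaries are visible.
samples = list(range(2560))
groups = split_into_groups(samples)
```

Here `groups[0]` covers samples 0 to 1023 and `groups[1]` covers samples 512 to 1535, so neighbouring groups share half of their signal points.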
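For the audio signal, spectral characteristics, candidate rhythm points, start time, trend-fitting envelope signal, beat information, volume information, fluctuation-fitting envelope signal, rhythm point recognition result, and so on in this embodiment, refer to the descriptions in the above embodiment.
S220: Calculate the group frequency-domain feature parameter corresponding to each group according to the signal frequency-domain feature parameters of the signal points in the group.
The signal frequency-domain feature parameters may refer to the frequency-domain phase and frequency-domain amplitude obtained when the audio signal is converted from a time-domain signal to a frequency-domain signal. The group frequency-domain feature parameter may refer to the rhythm point feature value of each group, which is used to identify rhythm points.
In an embodiment, the Fourier transform converts the audio signal from a time-domain signal to a frequency-domain signal. To avoid signals of different frequencies in the audio signal being mixed together and hard to distinguish, and to improve the resolution of the audio signal, the audio signal is first windowed, that is, subjected to the grouping described above and to window-function processing, which truncates the infinitely long time segment into multiple short segments; the Fourier transform is then applied to each group.
In an embodiment, after the signal points in the audio signal are grouped, within each group the data of the signal points at symmetric positions are swapped about the signal point with the middle number, the result is multiplied by a preset window function, and the Fourier transform is then applied. Continuing the earlier example, the 1st group corresponds to the signal points numbered [0, 1023]; taking the point numbered 512 as the reference, the time-domain amplitudes of the signal points numbered [0, 511] and [512, 1023] are swapped and multiplied by the Hann window coefficients to obtain the group data before the Fourier transform. The Fourier transform is then applied to each group's data, and the frequency-domain phases and frequency-domain amplitudes obtained for the signal points in the group are used as the signal frequency-domain feature parameters of those signal points.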
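The per-group preprocessing can be sketched in plain Python as below: swap the two halves of the group, multiply by Hann coefficients, and take a discrete Fourier transform. The names `swap_halves`, `hann`, and `group_spectrum` are hypothetical. The periodic form of the Hann window and the direct O(n²) DFT are assumptions made to keep the sketch dependency-free; the source does not specify the window variant, and a practical implementation would normally call an FFT routine.

```python
import cmath
import math


def swap_halves(group):
    """Exchange the first and second half of a group."""
    half = len(group) // 2
    return group[half:] + group[:half]


def hann(n):
    """Hann window coefficients (periodic form, an assumption of this sketch)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def group_spectrum(group):
    """Return (magnitudes, phases) of one group after swap and windowing."""
    n = len(group)
    windowed = [x * w for x, w in zip(swap_halves(group), hann(n))]
    spectrum = [sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
                for k in range(n)]
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return magnitudes, phases


# A constant group of 8 points: the bin-0 magnitude equals the sum of the window.
magnitudes, phases = group_spectrum([1.0] * 8)
```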
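The group frequency-domain feature parameter of each group may be calculated from the signal frequency-domain feature parameters of its signal points using an onset detection method. The rhythm point feature value of each signal point in a group may be calculated from those parameters with the following formulas: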
Onset[i]=2×D[i]×sin((P[i]-2×P[i-1]+P[i-2])×0.5)
Onset[i]=Onset[i]×Onset[i]
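In this embodiment, i denotes the i-th signal point, Onset[i] is the rhythm point feature value of the i-th signal point, D[i] is the amplitude of the i-th signal point, and P[i] is the phase of the i-th signal point. If i-1 is less than 0, P[i-1] is 0; if i-2 is less than 0, P[i-2] is 0. The group frequency-domain feature parameter of a group is the sum of the rhythm point feature values of the signal points in the group.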
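A direct transcription of the two formulas, followed by the sum over the group, might look like this. The function name `group_onset_feature` is hypothetical; `magnitudes` and `phases` stand for D and P as defined above, and out-of-range phases are treated as 0 as the text specifies.

```python
import math


def group_onset_feature(magnitudes, phases):
    """Sum over a group of Onset[i] squared, where
    Onset[i] = 2 * D[i] * sin((P[i] - 2*P[i-1] + P[i-2]) * 0.5)."""
    total = 0.0
    for i in range(len(magnitudes)):
        p_prev1 = phases[i - 1] if i - 1 >= 0 else 0.0  # P[i-1], 0 when out of range
        p_prev2 = phases[i - 2] if i - 2 >= 0 else 0.0  # P[i-2], 0 when out of range
        onset = 2.0 * magnitudes[i] * math.sin((phases[i] - 2.0 * p_prev1 + p_prev2) * 0.5)
        total += onset * onset                          # Onset[i] = Onset[i] * Onset[i]
    return total


# Two points with unit magnitude and phase pi: each contributes (2 * 1 * 1)^2 = 4.
feature = group_onset_feature([1.0, 1.0], [math.pi, math.pi])
```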
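In addition, the group frequency-domain feature parameters of the groups may be normalized and window-smoothed, and each group frequency-domain feature parameter corrected according to the processed result. In an embodiment, normalization divides the group frequency-domain feature parameter of each group by the largest group frequency-domain feature parameter among the groups; the window smoothing may be infinite impulse response (IIR) smoothing. In an embodiment, the window used in the window smoothing is 5.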
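The normalization step is simple to sketch. For the smoothing step the source names IIR smoothing with a window of 5 but gives no filter coefficients, so the example below substitutes a plain centered moving average over 5 values; this is a stand-in chosen for illustration, not the filter of the disclosure. The names `normalize` and `smooth` are hypothetical.

```python
def normalize(features):
    """Divide every group feature by the largest one."""
    peak = max(features) if features else 0.0
    if peak <= 0:
        return [0.0 for _ in features]
    return [f / peak for f in features]


def smooth(values, window=5):
    """Centered moving average; edges average over the values available.
    Stand-in for the IIR smoothing named in the text (assumption)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


normalized = normalize([2.0, 4.0, 1.0])
smoothed = smooth([0.0, 0.0, 1.0, 0.0, 0.0])
```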
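S230: Select target groups from the plurality of groups according to the group frequency-domain feature parameter corresponding to each group and a preset feature screening condition, and determine one candidate rhythm point from the signal points in each target group.
In an embodiment, the feature screening condition may include at least one screening step used to determine at least one target group from the plurality of groups, with one candidate rhythm point determined for each target group, which achieves a preliminary recognition of the rhythm points in the audio signal. For example, the feature screening condition may be that the groups whose group frequency-domain feature parameters exceed a set threshold are taken as target groups. The feature screening condition may also be another condition; the embodiments of the present disclosure do not limit this.
In an embodiment, selecting target groups from the plurality of groups according to the group frequency-domain feature parameter corresponding to each group and the preset feature screening condition may include: taking a set number of consecutive groups as a group set, to determine a plurality of group sets; for each group set, when it is determined that the group set satisfies a frequency-domain feature threshold condition, taking the first group in the group set as a candidate target group; and removing from the candidate target groups those that satisfy an adjacent rejection condition, and taking the remaining candidate target groups as target groups.
In an embodiment, the frequency-domain feature threshold condition may be a condition that constrains the relative magnitudes of the group frequency-domain feature parameters in a group set. For example, a group set includes 5 groups numbered in time order, and the frequency-domain feature threshold condition is as follows: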
Figure PCTCN2019099640-appb-000001
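In this embodiment, the group set includes five groups, from i to i+4, and Onsets_ma[i] denotes the group frequency-domain feature parameter of the i-th group. When the above inequality is satisfied, the group set satisfies the frequency-domain feature threshold condition, and the first group, that is, the group corresponding to Onsets_ma[i], is taken as a candidate target group.
In addition, each group may be corrected when the group sets are determined. In an embodiment, before a set number of consecutive groups is taken as a group set, the method may further include: setting to 0 the group frequency-domain feature parameters that are below a set threshold. Correcting the groups, determining the group sets from the corrected groups, and determining the candidate target groups according to the frequency-domain feature threshold condition reduces the amount of data examined when judging candidate target groups and thus improves the efficiency of screening them.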
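The screening flow (zero the small features, slide over sets of five consecutive groups, keep the first group of every set that passes the condition) can be sketched as follows. The actual inequality is given only in a figure that is not reproduced in this text, so the condition is passed in as a function; the example predicate `first_is_strict_max` is purely hypothetical and is not the condition of the disclosure. Sliding the set by one group at a time is also an assumption.

```python
def pick_candidate_groups(features, condition, set_size=5, floor=0.0):
    """Return indices of groups that start a group set satisfying `condition`."""
    # Features below the floor are corrected to 0 before the sets are formed.
    cleaned = [f if f >= floor else 0.0 for f in features]
    picked = []
    for i in range(len(cleaned) - set_size + 1):
        if condition(cleaned[i:i + set_size]):
            picked.append(i)
    return picked


def first_is_strict_max(group_set):
    """Hypothetical placeholder condition: the first feature is positive
    and larger than every other feature in the set."""
    return group_set[0] > 0 and all(group_set[0] > x for x in group_set[1:])


features = [0.1, 0.9, 0.2, 0.05, 0.3, 0.1, 0.8, 0.2, 0.1, 0.1, 0.0]
candidates = pick_candidate_groups(features, first_is_strict_max, floor=0.15)
```

Zeroing small features first means many sets fail the condition immediately on their first element, which is the data reduction the paragraph above describes.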
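The adjacent rejection condition may refer to a condition that constrains the adjacency relationship between candidate target groups. In an embodiment, if the interval between two rhythm points is extremely short, the two rhythm points are adjacent in time. Usually two adjacent rhythm points arise from noise rather than being real rhythm points, and because one group determines one rhythm point, adjacent groups can be removed from the candidate target groups to achieve recognition of the rhythm points. In this embodiment, groups are adjacent when the start times of the first signal points of two or more groups are adjacent in time, that is, when no start time of the first signal point of any other group lies between them.
In an embodiment, according to the start times of the first signal points of the candidate target groups, at least two candidate target groups whose start times are adjacent are determined to satisfy the adjacent rejection condition; those candidate target groups are removed, and the remaining candidate target groups are taken as target groups.
In an embodiment, if no start time of the first signal point of any other group lies in the interval between the start time of the first signal point of the 30th candidate target group and that of the 31st candidate target group, the 30th and 31st candidate target groups are determined to satisfy the adjacent rejection condition. If the 32nd and 31st candidate target groups also satisfy the adjacent rejection condition, the 30th, 31st, and 32nd candidate target groups are all removed. In this embodiment, the other groups are not limited to candidate target groups; they are the groups formed in the grouping described above.
That is, no two target groups remaining after screening by the adjacent rejection condition are adjacent.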
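If groups are identified by their position in the original grouping, two candidate target groups are adjacent exactly when their indices differ by 1, because no other group can start between them. Under that reading (an interpretation of the example with groups 30, 31, and 32), the rejection step can be sketched as below; the function name `reject_adjacent` is hypothetical.

```python
def reject_adjacent(candidate_indices):
    """Drop every candidate group that has a neighbouring candidate
    directly before or after it in the original grouping."""
    ordered = sorted(set(candidate_indices))
    chosen = set(ordered)
    return [i for i in ordered
            if (i - 1) not in chosen and (i + 1) not in chosen]


# Groups 30, 31 and 32 are mutually adjacent, so all three are removed.
target_groups = reject_adjacent([3, 10, 30, 31, 32, 40])
```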
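In a specific example, as shown in FIG. 2b, the signal point 201 in the audio signal is determined to be a target rhythm point according to the trend-fitting envelope signal 202. Screening the groups in two steps, threshold screening and adjacency screening, finally determines the target groups, achieves two-step screening of the rhythm points, and improves the accuracy of rhythm point recognition.
In an embodiment, determining one candidate rhythm point from the signal points in a target group may be confirming one candidate rhythm point from those signal points according to a preset rule. In an embodiment, any one signal point in the target group is taken as the candidate rhythm point.
S240: In the time interval corresponding to the signal points in the target group, select a time point as the start time of the candidate rhythm point corresponding to the target group.
The time interval may refer to the interval between the time of the first signal point in the target group and the time of the last signal point in the target group. A time point is selected from this interval as the start time of the candidate rhythm point corresponding to the target group; in an embodiment, the time point of the first signal point may be used as the start time of the candidate rhythm point.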
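When the first signal point of the group is used, the start time follows directly from the group index. The sketch below assumes the step of 512 points between group starts from the numbered grouping example and the 44.1 kHz sampling frequency mentioned in Embodiment 1; the function name `group_start_time` is hypothetical.

```python
def group_start_time(group_index, hop=512, sample_rate=44100):
    """Time in seconds of the first signal point of a group (0-based index)."""
    return group_index * hop / sample_rate


start = group_start_time(2)  # third group starts at sample 1024
```

At these settings consecutive groups start about 11.6 ms apart, which sets the time resolution of the candidate rhythm points.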
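S250: Map the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal of the audio signal, and determine the target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal.
S260: Determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal.
S270: Use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
The embodiments of the present disclosure group the audio signal and obtain the signal frequency-domain feature parameters of the signal points in each group, from which the group frequency-domain feature parameter of each group is determined. The groups are screened according to these parameters to determine the target groups, and one candidate rhythm point is determined for each target group. The groups are thus screened before the candidate rhythm points are determined, which reduces the number of candidate rhythm points and improves the efficiency and accuracy of rhythm point recognition.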
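Embodiment 3
FIG. 3 is a flowchart of a rhythm point recognition method provided in Embodiment 3 of the present disclosure. This embodiment is described on the basis of the optional solutions in the above embodiments.
The method of this embodiment may include:
S310: Group the signal points in the audio signal to determine a plurality of groups.
In this embodiment, each group includes a set number of adjacent signal points, and the signal points in different groups are distinct or partially overlapping.
S320: Calculate the group frequency-domain feature parameter corresponding to each group according to the signal frequency-domain feature parameters of the signal points in the group.
S330: Take a set number of consecutive groups as a group set, to determine a plurality of group sets.
S340: For each group set, when it is determined that the group set satisfies the frequency-domain feature threshold condition, take the first group in the group set as a candidate target group.
S350: Remove from the candidate target groups those that satisfy the adjacent rejection condition, and take the remaining candidate target groups as target groups.
S360: Determine one candidate rhythm point from the signal points in each target group.
S370: In the time interval corresponding to the signal points in a target group, select a time point as the start time of the candidate rhythm point corresponding to that target group.
S380: Identify the peak points in the trend-fitting envelope signal according to the waveform features of the trend-fitting envelope signal of the audio signal.
S390: Map the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal, and take the candidate rhythm point closest in time to a peak point as a target rhythm point.
S3100: Determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal.
S3110: Use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
S3120: At the start time corresponding to the target rhythm point, add a music special effect matching the target rhythm point according to the volume information and duration of the target rhythm point.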
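Embodiment 4
FIG. 4 is a schematic structural diagram of a rhythm point recognition device provided in an embodiment of the present disclosure. This embodiment is applicable to recognizing the rhythm points in a segment of audio signal. The device may be implemented in software and/or hardware and may be configured in an electronic device. As shown in FIG. 4, the device may include: a candidate rhythm point determination module 410, a target rhythm point determination module 420, a volume information and duration determination module 430, and a rhythm point recognition result determination module 440.
The candidate rhythm point determination module 410 is configured to determine candidate rhythm points in the audio signal according to the spectral characteristics of the audio signal to be recognized, and to obtain the start time corresponding to each candidate rhythm point. The target rhythm point determination module 420 is configured to map the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal of the audio signal, and to determine the target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal. The volume information and duration determination module 430 is configured to determine the volume information corresponding to the target rhythm point according to the beat information of the audio signal, and to determine the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal. The rhythm point recognition result determination module 440 is configured to use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
The embodiments of the present disclosure determine at least one candidate rhythm point of the audio signal and its start time according to the spectral characteristics of the audio signal, select target rhythm points from the at least one candidate rhythm point according to the waveform features of the trend-fitting envelope signal, and finally determine the volume information and duration of each target rhythm point according to the fluctuation-fitting envelope signal and the beat information of the audio signal, which gives the recognition result of the target rhythm points. This solves the problem in the related art that manually annotating rhythm points is costly in time and inefficient, achieves automatic recognition of rhythm points, and improves recognition accuracy by screening the rhythm points several times.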
一实施例中,所述备选节奏点确定模块410,包括:分组模块,设置为将所述音频信号中的多个信号点进行分组处理,确定多个分组,其中,每个分组中包括设定数量的相邻信号点,不同分组中包括的信号点相异或者部分重叠;频域特征参数计算模块,设置为根据每个分组中多个信号点的信号频域特征参数,计算与所述每个分组对应的分组频域特征参数;备选节奏点筛选模块,设置为根据与每个分组对应的分组频域特征参数,以及预设的特征筛选条件,在所述多个分组中筛选出目标分组,并根据目标分组中的多个信号点确定一个备选节奏点;起点时间确定模块,设置为在目标分组中的多个信号点对应的时间区间中,选择一个时间点作为与所述目标分组对应的备选节奏点的起点时间。
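To make the frequency-domain feature parameter calculation concrete, the sketch below computes one number per group. This section does not state which spectral quantity is used, so the choice here is an assumption: the energy of the one-sided magnitude spectrum, obtained with a naive discrete Fourier transform. The function names are hypothetical.

```python
import cmath


def dft_magnitudes(group):
    """Magnitudes of the DFT bins 0..N/2 of one group (naive O(N^2) DFT)."""
    n = len(group)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(group):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes


def group_frequency_feature(group):
    """Assumed group feature: spectral energy normalised by group length."""
    if not group:
        return 0.0
    return sum(m * m for m in dft_magnitudes(group)) / len(group)
```

A production implementation would normally use a fast Fourier transform and a window function; the naive transform is used here only to keep the sketch dependency-free.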
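In an embodiment, the candidate rhythm point screening module includes: a group set determination module, configured to take a set number of consecutive groups as one group set and determine multiple group sets; a candidate target group determination module, configured to, for each group set, use the first group in the group set as a candidate target group when the group frequency-domain feature parameters corresponding to the groups in the group set are determined to satisfy the frequency-domain feature threshold condition; and a target group determination module, configured to remove, from the multiple candidate target groups, those that satisfy the adjacent elimination condition, and to use the remaining candidate target groups as target groups.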
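The group set screening can be sketched as a sliding window over the per-group features. The exact frequency-domain feature threshold condition is not restated in this section, so the sketch assumes one plausible form: every group in the set must have a feature value at or above a threshold. The names `candidate_target_groups`, `set_size`, and `threshold` are hypothetical.

```python
def candidate_target_groups(group_features, set_size, threshold):
    """Return indices of the first group of every group set that passes.

    group_features: one frequency-domain feature value per group, in order.
    set_size: the set number of consecutive groups forming one group set.
    """
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    candidates = []
    for first in range(0, len(group_features) - set_size + 1):
        group_set = group_features[first:first + set_size]
        # Assumed threshold condition: all groups in the set reach the threshold.
        if all(value >= threshold for value in group_set):
            candidates.append(first)
    return candidates
```

Adjacent candidates produced by this step (for example indices 4 and 5) are exactly what the subsequent adjacent elimination step is meant to remove.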
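In an embodiment, the target rhythm point determination module 420 includes: a peak point identification module, configured to identify the peak points in the trend-fitting envelope signal according to the waveform features of the trend-fitting envelope signal of the audio signal; and a target rhythm point screening module, configured to map the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal, and to use the candidate rhythm points nearest in time to the peak points as target rhythm points.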
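A minimal sketch of the peak identification and nearest-candidate selection follows. It assumes the trend-fitting envelope is available as a list of values with matching time stamps, treats a peak as a simple local maximum, and picks one candidate per peak; all names are hypothetical.

```python
def find_peak_times(envelope, times):
    """Return the times of local maxima of the envelope."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        if envelope[i - 1] < envelope[i] >= envelope[i + 1]:
            peaks.append(times[i])
    return peaks


def select_target_rhythm_points(candidate_times, peak_times):
    """For each peak, keep the candidate start time nearest to it in time."""
    targets = []
    if not candidate_times:
        return targets
    for peak in peak_times:
        nearest = min(candidate_times, key=lambda t: abs(t - peak))
        if nearest not in targets:  # one candidate may be nearest to two peaks
            targets.append(nearest)
    return sorted(targets)
```

Candidates that are not the nearest one to any peak are discarded, which is how this step thins out the candidate rhythm points.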
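In an embodiment, the volume information and duration determination module 430 includes: a volume interval determination module, configured to determine the volume interval matching the target rhythm point according to the start time corresponding to the target rhythm point and the beat information of the audio signal; and a volume information calculation module, configured to calculate the volume information corresponding to the target rhythm point according to the signal time-domain feature parameters of the multiple signal points in the volume interval.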
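The volume calculation can be illustrated as below. The sketch makes several assumptions that are not confirmed by this section: the beat information is a tempo in beats per minute, the volume interval spans one beat starting at the rhythm point's start time, the time-domain feature of a signal point is its amplitude, and the volume is the root mean square of those amplitudes. All names are hypothetical.

```python
import math


def volume_interval(start_time, bpm, beats=1.0):
    """Assumed volume interval: `beats` beats starting at start_time."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return start_time, start_time + beats * 60.0 / bpm


def volume_for_rhythm_point(signal, sample_rate, start_time, bpm):
    """Root-mean-square amplitude of the signal points in the volume interval."""
    begin, end = volume_interval(start_time, bpm)
    first = max(0, int(round(begin * sample_rate)))
    last = min(len(signal), int(round(end * sample_rate)))
    window = signal[first:last]
    if not window:
        return 0.0
    return math.sqrt(sum(x * x for x in window) / len(window))
```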
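In an embodiment, the volume information and duration determination module 430 includes: an end time determination module, configured to map any two adjacent target rhythm points, according to their corresponding start times, onto the fluctuation-fitting envelope signal of the audio signal, and to determine, according to the waveform features of the fluctuation-fitting envelope signal, the start time of a signal point matching the two adjacent target rhythm points; and a duration calculation module, configured to use the time length between the start time corresponding to the first of the two adjacent target rhythm points and the start time of the signal point matching the two adjacent target rhythm points as the duration corresponding to the first of the two adjacent target rhythm points.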
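The duration calculation can be sketched as follows. How the matching signal point is chosen from the waveform features is not specified in this section, so the sketch assumes it is the first trough (local minimum) of the fluctuation-fitting envelope strictly between the two adjacent target rhythm points, falling back to the second rhythm point's start time when no trough exists. The function name and parameters are hypothetical.

```python
def duration_of_first_point(first_start, second_start, envelope, times):
    """Duration of the first of two adjacent target rhythm points.

    envelope/times: fluctuation-fitting envelope values and their time stamps.
    """
    end_time = second_start  # fallback: the note lasts until the next point
    for i in range(1, len(envelope) - 1):
        if not (first_start < times[i] < second_start):
            continue
        # Assumed matching signal point: first trough between the two points.
        if envelope[i - 1] > envelope[i] <= envelope[i + 1]:
            end_time = times[i]
            break
    return end_time - first_start
```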
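In an embodiment, the rhythm point recognition apparatus further includes: a music special effect adding module, configured to add, at the start time corresponding to the target rhythm point, a music special effect matching the target rhythm point according to the volume information and duration of the target rhythm point.
The rhythm point recognition apparatus provided by this embodiment of the present disclosure belongs to the same inventive concept as the rhythm point recognition method provided by Embodiment 1. For technical details not described exhaustively in this embodiment, refer to Embodiment 1. This embodiment has the same beneficial effects as Embodiment 1.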
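Embodiment 5
An embodiment of the present disclosure provides an electronic device. Refer to FIG. 5, which shows a schematic structural diagram of an electronic device (for example, a client or a server) 500 suitable for implementing embodiments of the present disclosure. The electronic device in embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (Portable Android Device, PAD), a portable media player (PMP), and a vehicle-mounted terminal (for example, a vehicle navigation terminal), as well as fixed terminals such as a digital television (TV) and a desktop computer. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 500 may include a processing apparatus (for example, a central processing unit or a graphics processing unit) 501, which may perform at least one appropriate action and process according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 also stores at least one program and the data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In an embodiment, the following apparatuses may be connected to the I/O interface 505: an input apparatus 506 including, for example, a touchscreen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 508 including, for example, a magnetic tape and a hard disk; and a communication apparatus 509. The communication apparatus 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 500 with a variety of apparatuses, it is not required that all of the apparatuses shown be implemented or present. More or fewer apparatuses may alternatively be implemented or present.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 509, installed from the storage apparatus 508, or installed from the ROM 502. When the computer program is executed by the processing apparatus 501, the functions defined in the methods of the embodiments of the present disclosure are performed.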
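Embodiment 6
An embodiment of the present disclosure further provides a computer-readable storage medium. The computer-readable medium may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted over any appropriate medium, including but not limited to a wire, an optical cable, radio frequency (RF), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, they cause the electronic device to: determine candidate rhythm points in the audio signal to be recognized according to the spectral characteristics of the audio signal, and obtain the start times corresponding to the candidate rhythm points; map the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal of the audio signal, and determine target rhythm points among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal; determine volume information corresponding to the target rhythm point according to beat information of the audio signal, and determine a duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal; and use the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to at least one embodiment of the present disclosure. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases the name of a module does not limit the module itself. For example, the candidate rhythm point determination module may also be described as "a module that determines candidate rhythm points in the audio signal to be recognized according to the spectral characteristics of the audio signal and obtains the start times corresponding to the candidate rhythm points".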

Claims (16)

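  1. A rhythm point recognition method, comprising:
    determining candidate rhythm points in an audio signal to be recognized according to spectral characteristics of the audio signal, and obtaining start times corresponding to the candidate rhythm points;
    mapping the candidate rhythm points, according to their corresponding start times, onto a trend-fitting envelope signal of the audio signal, and determining a target rhythm point among the candidate rhythm points according to waveform features of the trend-fitting envelope signal;
    determining volume information corresponding to the target rhythm point according to beat information of the audio signal, and determining a duration corresponding to the target rhythm point according to a fluctuation-fitting envelope signal of the audio signal;
    using the start time, volume information, and duration corresponding to the target rhythm point as a rhythm point recognition result for the audio signal.
  2. The method according to claim 1, wherein determining the candidate rhythm points in the audio signal to be recognized according to the spectral characteristics of the audio signal, and obtaining the start times corresponding to the candidate rhythm points, comprises:
    grouping multiple signal points in the audio signal to determine multiple groups, wherein each group includes a set number of adjacent signal points, and the signal points included in different groups are distinct or partially overlapping;
    calculating a group frequency-domain feature parameter corresponding to each group according to signal frequency-domain feature parameters of the multiple signal points in that group;
    screening a target group from the multiple groups according to the group frequency-domain feature parameter corresponding to each group and a preset feature screening condition, and determining one candidate rhythm point according to the multiple signal points in the target group;
    selecting, in a time interval corresponding to the multiple signal points in the target group, a time point as the start time of the candidate rhythm point corresponding to the target group.
  3. The method according to claim 2, wherein screening the target group from the multiple groups according to the group frequency-domain feature parameter corresponding to each group and the preset feature screening condition comprises:
    taking a set number of consecutive groups as one group set, and determining multiple group sets;
    for each group set, when the group frequency-domain feature parameters corresponding to the groups in the group set are determined to satisfy a frequency-domain feature threshold condition, using the first group in the group set as a candidate target group;
    removing, from the multiple candidate target groups, the candidate target groups that satisfy an adjacent elimination condition, and using the remaining candidate target groups as target groups.
  4. The method according to any one of claims 1-3, wherein mapping the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal of the audio signal, and determining the target rhythm point among the candidate rhythm points according to the waveform features of the trend-fitting envelope signal, comprises:
    identifying a peak point in the trend-fitting envelope signal according to the waveform features of the trend-fitting envelope signal of the audio signal;
    mapping the candidate rhythm points, according to their corresponding start times, onto the trend-fitting envelope signal, and using the candidate rhythm point nearest in time to the peak point as the target rhythm point.
  5. The method according to any one of claims 1-4, wherein determining the volume information corresponding to the target rhythm point according to the beat information of the audio signal comprises:
    determining a volume interval matching the target rhythm point according to the start time corresponding to the target rhythm point and the beat information of the audio signal;
    calculating the volume information corresponding to the target rhythm point according to signal time-domain feature parameters of the multiple signal points in the volume interval.
  6. The method according to any one of claims 1-5, wherein determining the duration corresponding to the target rhythm point according to the fluctuation-fitting envelope signal of the audio signal comprises:
    mapping any two adjacent target rhythm points, according to their corresponding start times, onto the fluctuation-fitting envelope signal of the audio signal, and determining, according to waveform features of the fluctuation-fitting envelope signal, the start time of a signal point matching the two adjacent target rhythm points;
    using the time length between the start time corresponding to the first of the two adjacent target rhythm points and the start time of the signal point matching the two adjacent target rhythm points as the duration corresponding to the first of the two adjacent target rhythm points.
  7. The method according to any one of claims 1-6, further comprising, after using the start time, volume information, and duration corresponding to the target rhythm point as the rhythm point recognition result for the audio signal:
    adding, at the start time corresponding to the target rhythm point, a music special effect matching the target rhythm point according to the volume information and duration of the target rhythm point.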
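  8. A rhythm point recognition apparatus, comprising:
    a candidate rhythm point determination module, configured to determine candidate rhythm points in an audio signal to be recognized according to spectral characteristics of the audio signal, and to obtain start times corresponding to the candidate rhythm points;
    a target rhythm point determination module, configured to map the candidate rhythm points, according to their corresponding start times, onto a trend-fitting envelope signal of the audio signal, and to determine a target rhythm point among the candidate rhythm points according to waveform features of the trend-fitting envelope signal;
    a volume information and duration determination module, configured to determine volume information corresponding to the target rhythm point according to beat information of the audio signal, and to determine a duration corresponding to the target rhythm point according to a fluctuation-fitting envelope signal of the audio signal;
    a rhythm point recognition result determination module, configured to use the start time, volume information, and duration corresponding to the target rhythm point as a rhythm point recognition result for the audio signal.
  9. The apparatus according to claim 8, wherein the candidate rhythm point determination module comprises:
    a grouping module, configured to group multiple signal points in the audio signal to determine multiple groups, wherein each group includes a set number of adjacent signal points, and the signal points included in different groups are distinct or partially overlapping;
    a frequency-domain feature parameter calculation module, configured to calculate a group frequency-domain feature parameter corresponding to each group according to signal frequency-domain feature parameters of the multiple signal points in that group;
    a candidate rhythm point screening module, configured to screen a target group from the multiple groups according to the group frequency-domain feature parameter corresponding to each group and a preset feature screening condition, and to determine one candidate rhythm point according to the multiple signal points in the target group;
    a start time determination module, configured to select, in a time interval corresponding to the multiple signal points in the target group, a time point as the start time of the candidate rhythm point corresponding to the target group.
  10. The apparatus according to claim 9, wherein the candidate rhythm point screening module comprises:
    a group set determination module, configured to take a set number of consecutive groups as one group set and determine multiple group sets;
    a candidate target group determination module, configured to, for each group set, use the first group in the group set as a candidate target group when the group frequency-domain feature parameters corresponding to the groups in the group set are determined to satisfy a frequency-domain feature threshold condition;
    a target group determination module, configured to remove, from the multiple candidate target groups, the candidate target groups that satisfy an adjacent elimination condition, and to use the remaining candidate target groups as target groups.
  11. The apparatus according to any one of claims 8-10, wherein the target rhythm point determination module comprises:
    a peak point identification module, configured to identify a peak point in the trend-fitting envelope signal according to the waveform features of the trend-fitting envelope signal of the audio signal;
    a target rhythm point screening module, configured to map each candidate rhythm point, according to its corresponding start time, onto the trend-fitting envelope signal, and to use the candidate rhythm point nearest in time to the peak point as the target rhythm point.
  12. The apparatus according to any one of claims 8-11, wherein the volume information and duration determination module comprises:
    a volume interval determination module, configured to determine a volume interval matching the target rhythm point according to the start time corresponding to the target rhythm point and the beat information of the audio signal;
    a volume information calculation module, configured to calculate the volume information corresponding to the target rhythm point according to signal time-domain feature parameters of the multiple signal points in the volume interval.
  13. The apparatus according to any one of claims 8-12, wherein the volume information and duration determination module comprises:
    an end time determination module, configured to map any two adjacent target rhythm points, according to their corresponding start times, onto the fluctuation-fitting envelope signal, and to determine, according to waveform features of the fluctuation-fitting envelope signal, the start time corresponding to a signal point matching the two adjacent target rhythm points;
    a duration calculation module, configured to use the time length between the start time corresponding to the first of the two adjacent target rhythm points and the start time corresponding to the signal point matching the two adjacent target rhythm points as the duration corresponding to the first of the two adjacent target rhythm points.
  14. The apparatus according to any one of claims 8-13, further comprising:
    a music special effect adding module, configured to add, at the start time corresponding to the target rhythm point, a music special effect matching the target rhythm point according to the volume information and duration of the target rhythm point.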
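  15. An electronic device, comprising:
    at least one processor;
    a storage apparatus, configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the rhythm point recognition method according to any one of claims 1-7.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the rhythm point recognition method according to any one of claims 1-7.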
PCT/CN2019/099640 2018-12-12 2019-08-07 Rhythm point recognition method and apparatus, electronic device, and storage medium WO2020119150A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811519398.4 2018-12-12
CN201811519398.4A CN109670074B (zh) 2018-12-12 2018-12-12 Rhythm point recognition method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020119150A1 true WO2020119150A1 (zh) 2020-06-18

Family

ID=66144273

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099640 WO2020119150A1 (zh) 2018-12-12 2019-08-07 Rhythm point recognition method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN109670074B (zh)
WO (1) WO2020119150A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023246496A1 (zh) * 2022-06-23 2023-12-28 深圳市智岩科技有限公司 Audio rhythm detection method, smart lamp, apparatus, electronic device, and medium

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
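CN109670074B (zh) * 2018-12-12 2020-05-15 北京字节跳动网络技术有限公司 Rhythm point recognition method and apparatus, electronic device, and storage medium
CN110390943B (zh) * 2019-06-28 2022-07-08 上海元笛软件有限公司 Audio synthesis method and apparatus, computer device, and storage medium
CN110392045B (zh) * 2019-06-28 2022-03-18 上海元笛软件有限公司 Audio playback method and apparatus, computer device, and storage medium
CN110265057B (zh) * 2019-07-10 2024-04-26 腾讯科技(深圳)有限公司 Method and apparatus for generating multimedia, electronic device, and storage medium
CN110415669B (zh) * 2019-07-19 2022-03-04 北京字节跳动网络技术有限公司 Rhythm device implementation method and apparatus, electronic device, and storage medium
CN110519638B (zh) * 2019-09-06 2023-05-16 Oppo广东移动通信有限公司 Processing method, processing apparatus, electronic apparatus, and storage medium
CN110753238B (zh) * 2019-10-29 2022-05-06 北京字节跳动网络技术有限公司 Video processing method and apparatus, terminal, and storage medium
CN111128100B (zh) * 2019-12-20 2021-04-20 网易(杭州)网络有限公司 Rhythm point detection method and apparatus, and electronic device
CN111128232B (zh) * 2019-12-26 2022-11-15 广州酷狗计算机科技有限公司 Method and apparatus for determining bar information of music, storage medium, and device
CN111429942B (zh) * 2020-03-19 2023-07-14 北京火山引擎科技有限公司 Audio data processing method and apparatus, electronic device, and storage medium
CN111785237B (zh) * 2020-06-09 2024-04-19 Oppo广东移动通信有限公司 Audio rhythm determination method and apparatus, storage medium, and electronic device
CN112466267B (zh) * 2020-11-24 2024-04-02 瑞声新能源发展(常州)有限公司科教城分公司 Vibration generation method, vibration control method, and related devices
CN112435687B (zh) * 2020-11-25 2024-06-25 腾讯科技(深圳)有限公司 Audio detection method and apparatus, computer device, and readable storage medium
CN114845145B (zh) * 2021-01-30 2024-04-12 华为技术有限公司 Method for generating an action prompt icon sequence, electronic device, and readable storage medium
CN113053339B (zh) * 2021-03-10 2024-04-02 百果园技术(新加坡)有限公司 Rhythm adjustment method, apparatus, device, and storage medium
CN113096689B (zh) * 2021-04-02 2024-06-14 腾讯音乐娱乐科技(深圳)有限公司 Method, device, and medium for evaluating song singing
CN113613061B (zh) * 2021-07-06 2023-03-21 北京达佳互联信息技术有限公司 Beat-sync template generation method, apparatus, device, and storage medium
CN113643717B (zh) * 2021-07-07 2024-09-06 深圳市联洲国际技术有限公司 Music rhythm detection method, apparatus, device, and storage medium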

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
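JP3674950B2 * 2002-03-07 2005-07-27 ヤマハ株式会社 Method and apparatus for estimating the tempo of music data
CN108320730A (zh) * 2018-01-09 2018-07-24 广州市百果园信息技术有限公司 Music classification method, beat point detection method, storage device, and computer device
CN108364660A (zh) * 2018-02-09 2018-08-03 腾讯音乐娱乐科技(深圳)有限公司 Accent recognition method and apparatus, and computer-readable storage medium
CN109670074A (zh) * 2018-12-12 2019-04-23 北京字节跳动网络技术有限公司 Rhythm point recognition method and apparatus, electronic device, and storage medium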

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3017224B1 * 2014-02-04 2017-07-21 Michael Brouard Method for synchronizing a musical score with an audio signal
CN206134252U (zh) * 2016-07-07 2017-04-26 惠州市新斯贝克动力科技有限公司 Audio rhythm recognition circuit

Also Published As

Publication number Publication date
CN109670074B (zh) 2020-05-15
CN109670074A (zh) 2019-04-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19896199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27/09/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19896199

Country of ref document: EP

Kind code of ref document: A1