
WO2024004564A1 - Acoustic analysis system, acoustic analysis method, and program - Google Patents

Acoustic analysis system, acoustic analysis method, and program

Info

Publication number
WO2024004564A1
Authority
WO
WIPO (PCT)
Prior art keywords
beat
point
beat point
points
target
Prior art date
Application number
PCT/JP2023/021287
Other languages
French (fr)
Japanese (ja)
Inventor
Kazuhiko Yamamoto
Original Assignee
Yamaha Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Publication of WO2024004564A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10GREPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G3/00Recording music in notation form, e.g. recording the mechanical operation of a musical instrument
    • G10G3/04Recording music in notation form, e.g. recording the mechanical operation of a musical instrument using electrical means
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present disclosure relates to techniques for analyzing acoustic signals.
  • Patent Document 1 discloses a technique for estimating beat points of a song using a probability model such as a hidden Markov model.
  • An acoustic analysis system includes: a beat point estimation unit that estimates a plurality of first beat points by an estimation process performed on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
  • An acoustic analysis method estimates a plurality of first beat points by an estimation process performed on an acoustic signal; moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points; and estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
  • A program causes a computer system to function as: a beat point estimation unit that estimates a plurality of first beat points by an estimation process performed on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
  • FIG. 1 is a block diagram illustrating the configuration of an acoustic analysis system in a first embodiment.
  • FIG. 2 is a block diagram illustrating the functional configuration of the acoustic analysis system.
  • FIG. 3 is a flowchart of the estimation process.
  • FIG. 4 is an explanatory diagram of machine learning for establishing an estimation model.
  • FIG. 5 is a schematic diagram of a confirmation image.
  • FIG. 6 is an explanatory diagram of beat point movement and the update process.
  • FIG. 7 is a flowchart of the update process.
  • FIG. 8 is a flowchart of the acoustic analysis process.
  • FIG. 9 is a block diagram illustrating the functional configuration of an acoustic analysis system in a second embodiment.
  • FIG. 1 is a block diagram illustrating the configuration of an acoustic analysis system 100 according to a first embodiment.
  • the acoustic analysis system 100 is a computer system that estimates a plurality of beat points B of a song by analyzing an acoustic signal A representing the performance sound of the song.
  • the acoustic analysis system 100 includes a control device 11, a storage device 12, a display device 13, an operating device 14, and a sound emitting device 15.
  • the acoustic analysis system 100 is realized by, for example, an information device such as a smartphone, a tablet terminal, or a personal computer. Note that the acoustic analysis system 100 is realized not only as a single device but also as a plurality of devices configured separately from each other.
  • The control device 11 is one or more processors that control each element of the acoustic analysis system 100. Specifically, the control device 11 is composed of one or more types of processors such as a CPU (Central Processing Unit), GPU (Graphics Processing Unit), SPU (Sound Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), or ASIC (Application Specific Integrated Circuit).
  • the storage device 12 is one or more memories that store programs executed by the control device 11 and various data used by the control device 11.
  • a known recording medium such as a semiconductor recording medium and a magnetic recording medium, or a combination of multiple types of recording media is used as the storage device 12.
  • a portable recording medium that can be attached to and detached from the acoustic analysis system 100 or a recording medium that can be accessed by the control device 11 via a communication network (for example, cloud storage) is used as the storage device 12.
  • the storage device 12 stores the acoustic signal A.
  • the acoustic signal A is a sample series representing the waveform of the performance sound of the music piece. Specifically, the acoustic signal A represents at least one of an instrumental sound and a singing sound of a song.
  • the data format of the acoustic signal A is arbitrary. Note that the acoustic signal A may be supplied to the acoustic analysis system 100 from a signal supply device separate from the acoustic analysis system 100.
  • The signal supply device is, for example, a playback device that supplies the acoustic signal A recorded on a recording medium to the acoustic analysis system 100, or a device that receives the acoustic signal A from a distribution device (not shown) via a communication network and supplies it to the acoustic analysis system 100.
  • the display device 13 displays images under the control of the control device 11.
  • various display panels such as a liquid crystal display panel or an organic EL (Electroluminescence) panel are used as the display device 13.
  • a display device 13 that is separate from the acoustic analysis system 100 may be connected to the acoustic analysis system 100 by wire or wirelessly.
  • the operating device 14 is an input device that accepts instructions from a user.
  • the operating device 14 is, for example, an operator operated by a user or a touch panel that detects a touch by the user.
  • the sound emitting device 15 reproduces sound under the control of the control device 11.
  • a speaker or headphones are used as the sound emitting device 15.
  • a sound emitting device 15 that is separate from the acoustic analysis system 100 may be connected to the acoustic analysis system 100 by wire or wirelessly.
  • FIG. 2 is a block diagram illustrating the functional configuration of the acoustic analysis system 100.
  • The control device 11 implements a plurality of functions for processing the acoustic signal A (a beat point estimation section 21, a display control section 22, a playback control section 23, a beat point editing section 24, and an update processing section 25) by executing a program stored in the storage device 12.
  • the beat point estimation unit 21 estimates a plurality of beat points B in the song by analyzing the acoustic signal A. Specifically, the beat point estimating unit 21 generates time series data that specifies the time of each of the plurality of beat points B in the song.
  • the beat point estimation unit 21 of the first embodiment includes a feature extraction unit 30, a first processing unit 31, and a second processing unit 32.
  • the feature extraction unit 30 calculates the feature amount F(t) of the acoustic signal A for each of a plurality of time points (hereinafter referred to as "analysis time points") t on the time axis.
  • Each analysis time point t is a time point set on the time axis at predetermined intervals. The interval between the analysis time points t is sufficiently smaller than the interval between the beat points B assumed in the song.
  • the feature quantity F(t) is information representing the acoustic characteristics of the acoustic signal A at the analysis time t.
  • the feature amount F(t) at each analysis time point t is a time series of acoustic information within a predetermined period of time including the analysis time point t.
  • the acoustic information is, for example, information regarding the intensity of the acoustic signal A, such as volume and amplitude.
  • information regarding the frequency characteristics (timbre) of the acoustic signal A is also used as acoustic information.
  • Examples of information regarding frequency characteristics include MFCC (Mel-Frequency Cepstrum Coefficients), MSLS (Mel-Scale Log Spectrum), and Constant-Q Transform (CQT). Note that a plurality of pieces of acoustic information corresponding to one analysis time point t may be used as the feature amount F(t). Furthermore, the types of acoustic information are not limited to the above examples. The acoustic information may be a combination of multiple types of acoustic information regarding the acoustic signal A.
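  • As a concrete illustration of the feature extraction described above, the following is a minimal Python sketch assuming the librosa library (a library choice not specified in the source); the hop length standing in for the analysis-time-point interval and the use of MFCCs as the acoustic information are illustrative assumptions.

    import librosa
    import numpy as np

    def extract_features(path, hop_length=512, n_mfcc=20):
        """Compute a feature amount F(t) for each analysis time point t.

        Each row of the returned matrix is the acoustic information at
        one analysis time point; MFCCs are one of the feature types the
        text mentions (MSLS or CQT could be substituted).
        """
        y, sr = librosa.load(path, sr=None, mono=True)
        # Analysis time points are spaced hop_length / sr seconds apart,
        # which should be well below the expected inter-beat interval.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    hop_length=hop_length)
        times = librosa.frames_to_time(np.arange(mfcc.shape[1]),
                                       sr=sr, hop_length=hop_length)
        return mfcc.T, times  # shapes: (num_time_points, n_mfcc), (num_time_points,)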
  • the first processing unit 31 and the second processing unit 32 estimate a plurality of beat points B from each feature amount F(t) of the acoustic signal A.
  • FIG. 3 is a flowchart of the process S2 of estimating a plurality of beat points B (hereinafter referred to as "estimation process").
  • the estimation process S2 includes a first process S21 and a second process S22.
  • the first processing section 31 executes the first processing S21
  • the second processing section 32 executes the second processing S22.
  • the first process S21 is a process that generates a probability P(t) that each analysis time point t corresponds to the beat point B of the song. The greater the probability P(t) of each analysis time point t, the higher the probability that the analysis time point t corresponds to the beat point B.
  • The first processing unit 31 generates a time series of probabilities P(t) by repeating the first process S21 at every analysis time point t. An estimation model M is used in the first process S21.
  • The estimation model M is a statistical model that has learned the correlation between the feature amount F(t) and the probability P(t). That is, the estimation model M is a learned model that has learned the relationship between the feature amount F(t) and the probability P(t) by machine learning; equivalently, it is a trained model that has acquired that relationship through training (machine learning).
  • the first processing unit 31 generates a probability P(t) by processing the feature amount F(t) of the acoustic signal A at each analysis time point t using the estimation model M. Specifically, the first processing unit 31 generates the probability P(t) by inputting input data including the feature amount F(t) to the estimation model M.
  • the estimation model M is composed of, for example, a deep neural network (DNN).
  • any type of deep neural network such as a recurrent neural network (RNN) or a convolutional neural network (CNN) is used as the estimation model M.
  • the estimation model M may be configured by a combination of multiple types of deep neural networks. Further, additional elements such as long short-term memory (LSTM) or attention may be included in the estimation model M.
  • The estimation model M is realized by a combination of a program that causes the control device 11 to execute a calculation for generating the probability P(t) from the feature amount F(t), and a plurality of variables (specifically, weight values and biases) applied to that calculation. The program and the plurality of variables that realize the estimation model M are stored in the storage device 12. The numerical value of each of the plurality of variables that define the estimation model M is set in advance by machine learning.
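  • The source does not specify a network architecture, so the following PyTorch sketch (an assumed library choice) only illustrates the shape of the estimation model M: a mapping from a feature amount F(t) to a probability P(t). The feed-forward structure and layer sizes are assumptions; as noted above, recurrent, convolutional, or attention-based variants are equally possible.

    import torch
    import torch.nn as nn

    class EstimationModel(nn.Module):
        """Assumed sketch of the estimation model M: F(t) -> P(t)."""

        def __init__(self, feature_dim=20, hidden_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feature_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
                nn.Sigmoid(),  # output is a probability in [0, 1]
            )

        def forward(self, features):
            # features: (num_time_points, feature_dim)
            return self.net(features).squeeze(-1)  # (num_time_points,)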
  • the second process S22 in FIG. 3 is a process for estimating a plurality of beat points B in the song from the time series of probabilities P(t) generated in the first process S21.
  • Various state transition models are used in the second process S22.
  • The state transition model is composed of, for example, a Hidden Semi-Markov Model (HSMM), and the plurality of beat points B are estimated by the Viterbi algorithm, which is an example of dynamic programming. For example, time points at which the probability P(t) reaches a local maximum are estimated as beat points B.
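  • The HSMM/Viterbi decoding itself is not spelled out in the source. As a simplified stand-in for the second process S22, the sketch below merely picks prominent local maxima of P(t) subject to a minimum inter-beat spacing; the scipy peak finder, the probability threshold, and the spacing heuristic are all assumptions rather than the patent's method.

    import numpy as np
    from scipy.signal import find_peaks

    def second_process(probabilities, times, min_beat_interval=0.25):
        """Simplified stand-in for S22: pick beat points from P(t).

        A real implementation would decode an HSMM with the Viterbi
        algorithm; here we simply take prominent local maxima of P(t)
        separated by at least min_beat_interval seconds.
        """
        dt = times[1] - times[0]  # spacing of analysis time points
        distance = max(1, int(min_beat_interval / dt))
        peaks, _ = find_peaks(probabilities, height=0.5, distance=distance)
        return times[peaks]  # estimated beat points B (in seconds)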
  • FIG. 4 is an explanatory diagram of machine learning to establish the estimation model M.
  • the estimated model M is established by machine learning using a machine learning system 200 that is separate from the acoustic analysis system 100.
  • An estimated model M is provided from the machine learning system 200 to the acoustic analysis system 100.
  • the functions of the machine learning system 200 may be installed in the acoustic analysis system 100.
  • A plurality of pieces of training data Z are used for machine learning of the estimation model M.
  • Each piece of training data Z is composed of a combination of a feature amount Fm for machine learning and a probability Pm for machine learning.
  • the feature amount Fm is a feature amount F(t) at a specific time point of the acoustic signal Am prepared for machine learning.
  • the acoustic signal Am is a signal recording the sound radiated into the acoustic space, or a signal synthesized by known sound synthesis processing.
  • the probability Pm for machine learning corresponding to a specific point in time is the probability (that is, the correct value) that the point in time corresponds to beat point B of the song.
  • Pieces of training data Z are prepared for a large number of songs whose beat points B are known. Note that the acoustic signal Am is an example of a "learning acoustic signal."
  • The machine learning system 200 calculates an error function representing the error between the probability P(t) that an initial or provisional model (hereinafter referred to as the "provisional model") M0 outputs when the feature amount Fm of each piece of training data Z is input, and the corresponding probability Pm.
  • The machine learning system 200 then updates the plurality of variables of the provisional model M0 so that the error function is reduced.
  • The provisional model M0 at the point when the above process has been repeated for each of the plurality of pieces of training data Z is determined as the estimation model M.
  • When a feature amount F(t) of an unknown acoustic signal A is input, the estimation model M outputs a statistically valid probability P(t) under the tendency latent in the training data. That is, the estimation model M is a learned model that has learned the relationship between the feature amount Fm of the acoustic signal Am for machine learning and the probability Pm that the time point at which the feature amount is observed corresponds to a beat point B.
  • The first processing unit 31 processes the feature amount F(t) at each analysis time point t using the estimation model M established by the above procedure, thereby generating the probability P(t) that the analysis time point t corresponds to a beat point B of the song.
  • As described above, the relationship between the feature amount Fm of the acoustic signal Am for machine learning and the probability Pm that the time point at which the feature amount Fm is observed corresponds to a beat point B is learned, and the plurality of beat points B are estimated from the acoustic signal A using the learned estimation model M. Therefore, a plurality of beat points B can be estimated with high accuracy even for an unknown acoustic signal A whose feature amount F(t) varies in various ways.
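  • As a hedged sketch of this training procedure (the library, the binary cross-entropy error function, and the optimizer are assumptions; the source requires only that an error function between the model output and the probability Pm be reduced), a provisional model M0 could be updated as follows.

    import torch
    import torch.nn as nn

    def train_estimation_model(model, features_fm, probabilities_pm,
                               epochs=100, lr=1e-3):
        """Update the provisional model M0 so the error function shrinks.

        features_fm:      tensor (num_examples, feature_dim), feature Fm
        probabilities_pm: tensor (num_examples,), correct probability Pm
        """
        criterion = nn.BCELoss()  # assumed form of the error function
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            optimizer.zero_grad()
            p = model(features_fm)              # model output P(t)
            loss = criterion(p, probabilities_pm)
            loss.backward()                     # gradient of the error
            optimizer.step()                    # reduce the error function
        return model  # the determined estimation model M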
  • the display control unit 22 in FIG. 2 displays an image on the display device 13. Specifically, the display control unit 22 displays the confirmation image G in FIG. 5 on the display device 13.
  • The confirmation image G includes a waveform area Ga and a beat point area Gb. A common time axis is set for the waveform area Ga and the beat point area Gb.
  • A waveform within a specific range (hereinafter referred to as the "display range") of the acoustic signal A is displayed in the waveform area Ga.
  • the display control unit 22 changes the display range of the acoustic signal A according to a user's instruction to the operating device 14.
  • a plurality of beat points B estimated from the acoustic signal A by the beat point estimation unit 21 are displayed in the beat point area Gb.
  • a plurality of beat points B within the display range of the acoustic signal A are displayed in the beat point area Gb.
  • The beat point area Gb is an example of a "beat point image."
  • The user can instruct reproduction of the acoustic signal A by operating the operating device 14.
  • the reproduction control unit 23 in FIG. 2 reproduces the sound represented by the acoustic signal A by supplying the acoustic signal A to the sound emitting device 15.
  • The display control unit 22 displays the reproduction position Gc on the confirmation image G in parallel with the reproduction of the acoustic signal A, as illustrated in FIG. 5.
  • the reproduction position Gc is the point in time when the acoustic signal A is being reproduced by the sound emitting device 15. Therefore, the reproduction position Gc advances in parallel with the reproduction of the acoustic signal A in the direction of the time axis.
  • the user can confirm the position of the beat point B estimated by the immediately preceding estimation process S2 by visually recognizing the beat point area Gb while listening to the sound reproduced by the sound emitting device 15. If the current position of beat point B does not match the user's intention, the user can instruct correction of the estimated beat point B position by operating the operating device 14.
  • the beat point editing unit 24 in FIG. 2 moves each beat point B on the time axis according to instructions from the user.
  • Moving the beat point B is a process of changing the position of the beat point B on the time axis.
  • FIG. 6 is an explanatory diagram regarding the movement of the beat point B.
  • State 1 in FIG. 6 is a state in which a plurality of beat points B have been estimated by the estimation process S2 described above.
  • FIG. 6 also shows a time series of the probability P(t) calculated in the first process S21.
  • The user can select any one of the plurality of beat points B displayed in the beat point area Gb (hereinafter referred to as the "target beat point Bn") by operating the operating device 14 while checking the beat point area Gb. Further, the user can instruct movement of the target beat point Bn on the time axis by operating the operating device 14. Specifically, the user can instruct the movement direction (forward/backward) and the movement amount δ of the target beat point Bn; for example, the user instructs that the target beat point Bn be moved to a point that he or she deems appropriate.
  • As illustrated as state 2 in FIG. 6, the beat point editing unit 24 moves the target beat point Bn on the time axis in the direction (forward/backward) specified by the user, by the movement amount δ specified by the user.
  • Although FIG. 6 illustrates a case in which the target beat point Bn moves forward, the target beat point Bn may also move backward.
  • The beat point editing unit 24 also moves, in conjunction with the target beat point Bn, the beat point B located immediately before the target beat point Bn (hereinafter referred to as the "adjacent beat point Bn-1") and the beat point B located immediately after the target beat point Bn (hereinafter referred to as the "adjacent beat point Bn+1") among the plurality of beat points B. Specifically, the beat point editing unit 24 moves the adjacent beat point Bn-1 and the adjacent beat point Bn+1 on the time axis in the movement direction (forward/backward) specified by the user for the target beat point Bn, by the movement amount δ specified by the user for the target beat point Bn.
  • the beat point area Gb displayed on the display device 13 includes the target beat point Bn and the adjacent beat points Bn ⁇ 1.
  • the display control section 22 causes the movement of each beat point B by the beat point editing section 24 to be reflected in the beat point area Gb displayed on the display device 13. Specifically, the display control unit 22 moves the target beat point Bn and each adjacent beat point Bn ⁇ 1 in the beat point area Gb on the time axis according to an instruction from the user.
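  • A minimal sketch of this linked editing operation might look as follows (pure Python; the representation of beat points as a sorted list of times in seconds, and the sign convention for the movement amount, are assumptions).

    def move_beats(beats, target_index, delta):
        """Move the target beat point Bn and its two neighbors by delta.

        beats:        sorted list of beat times B (seconds)
        target_index: index n of the target beat point Bn
        delta:        signed movement amount (negative = forward/earlier,
                      positive = backward/later; a convention assumed from
                      the modification described later in the text)
        """
        edited = list(beats)
        for i in (target_index - 1, target_index, target_index + 1):
            if 0 <= i < len(edited):        # skip a missing neighbor
                edited[i] += delta          # Bn-1, Bn, Bn+1 move together
        return edited

    # usage: move beat 5 and its neighbors 30 ms earlier
    # new_beats = move_beats(beats, target_index=5, delta=-0.030)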
  • the update processing unit 25 in FIG. 2 updates the estimated model M according to the movement of the target beat point Bn and each adjacent beat point Bn ⁇ 1. Specifically, the update processing unit 25 updates the estimated model M by machine learning according to the movement of the target beat point Bn and each adjacent beat point Bn ⁇ 1.
  • FIG. 7 is a flowchart of the process (hereinafter referred to as "update process") S8 in which the control device 11 (update processing unit 25) updates the estimated model M.
  • the updating process S8 is started with the movement of the target beat point Bn and each adjacent beat point Bn ⁇ 1.
  • the update processing unit 25 sets a numerical value string C corresponding to the moved target beat point Bn and each adjacent beat point Bn ⁇ 1 on the time axis (S81). As illustrated as state 4 in FIG. 6, the numerical value string C is a time series of numerical values Q(t) set at each analysis time point t on the time axis.
  • the numerical value sequence C includes a numerical value distribution D corresponding to the target beat point Bn after movement and each adjacent beat point Bn ⁇ 1.
  • Numerical distribution D is a distribution of numerical values Q(t) in a specific range on the time axis.
  • the numerical distribution D is expressed by a probability distribution function defined with time t as a variable on the time axis.
  • the numerical distribution D in the first embodiment is a line-symmetric triangular distribution over a predetermined distribution width.
  • A numerical distribution D is individually set for each beat point B. The position of the numerical distribution D corresponding to each beat point B on the time axis is determined so that the distribution takes its maximum value at that beat point B.
  • the numerical distribution D corresponding to the target beat point Bn takes the maximum value at the target beat point Bn
  • the numerical distribution D corresponding to each adjacent beat point Bn ⁇ 1 takes the maximum value at the relevant adjacent beat point Bn ⁇ 1.
  • In the numerical value string C, the numerical values Q(t) at analysis time points t outside the numerical distributions D are set to zero.
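  • A hedged numpy sketch of step S81 is shown below; the half-width of the triangular distribution is an assumed parameter, since the source specifies only a line-symmetric triangular shape of predetermined width with its maximum at each beat point.

    import numpy as np

    def build_numeric_sequence(times, moved_beats, half_width=0.05):
        """Set the numerical value string C: one value Q(t) per time point.

        A symmetric triangular distribution D with maximum value 1.0 is
        placed at each moved beat point (Bn and its neighbors); Q(t) is
        zero outside every distribution.
        """
        q = np.zeros_like(times)
        for b in moved_beats:                     # Bn-1, Bn, Bn+1
            tri = 1.0 - np.abs(times - b) / half_width
            q = np.maximum(q, np.clip(tri, 0.0, None))
        return q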
  • The update processing unit 25 calculates, for each analysis time point t within an applicable interval T on the time axis, the error e(t) between the probability P(t) and the numerical value Q(t) (S82).
  • the applicable section T is a series of sections including the adjacent beat point Bn-1 and the adjacent beat point Bn+1. Specifically, a period on the time axis whose end points are the adjacent beat point Bn-1 and the adjacent beat point Bn+1 is set as the applicable section T.
  • the update processing unit 25 calculates an error function E from a plurality of errors e(t) calculated for different analysis time points t within the applicable interval T (S83).
  • the error function E is an objective function representing the difference between the probability P(t) and the numerical value Q(t) within the application interval T. For example, the sum of multiple errors e(t) within the applicable interval T is calculated as the error function E.
  • The update processing unit 25 updates the estimation model M so that the error function E is minimized (S84). Any known technique may be adopted for updating the estimation model M; for example, adaptive processing using Self-Attention may be employed.
  • The adaptive processing for the estimation model M is described in, for example, Kazuhiko Yamamoto, "HUMAN-IN-THE-LOOP ADAPTATION FOR INTERACTIVE MUSICAL BEAT TRACKING," Proceedings of the 22nd ISMIR Conference, Online, November 7-12, 2021.
  • In other words, the update processing unit 25 updates the estimation model M so that the error e(t) between the numerical distributions D (numerical values Q(t)) corresponding to the moved target beat point Bn and each adjacent beat point Bn±1, and the time series of probabilities P(t) estimated in the preceding estimation process S2 (first process S21), is reduced. Therefore, the movement of the target beat point Bn and each adjacent beat point Bn±1 can be appropriately reflected in the estimation model M.
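  • A hedged PyTorch sketch of the update process S8 follows (the squared-error form of e(t) and the small number of gradient steps are assumptions; the source defines the error function E only as a sum of per-time-point errors over the applicable interval T). It reuses the EstimationModel sketch above.

    import torch

    def update_process(model, features, q_values, t_mask, steps=10, lr=1e-4):
        """S8: update the estimation model M from the user's edit.

        features: tensor (num_time_points, feature_dim), F(t)
        q_values: tensor (num_time_points,), numerical value string C
        t_mask:   boolean tensor, True inside the applicable interval T
                  (from adjacent beat Bn-1 to adjacent beat Bn+1)
        """
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            p = model(features)                       # P(t), first process
            e = (p[t_mask] - q_values[t_mask]) ** 2   # error e(t) within T
            loss = e.sum()                            # error function E
            loss.backward()
            optimizer.step()                          # minimize E (S84)
        return model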
  • FIG. 8 is a flowchart of the process (hereinafter referred to as "acoustic analysis process") executed by the control device 11. For example, the acoustic analysis process is started in response to a user's instruction to the operating device 14.
  • the control device 11 calculates the feature amount F(t) of the acoustic signal A for each analysis time point t on the time axis (S1).
  • The control device 11 (beat point estimation unit 21) estimates a plurality of beat points B from the feature amounts F(t) of the acoustic signal A by the estimation process S2 illustrated in FIG. 3.
  • In the estimation process S2, an estimation model M that has learned the relationship between the feature amount F(t) and the probability P(t) by machine learning is used.
  • the control device 11 (display control section 22) displays the confirmation image G on the display device 13 (S3). In the beat point area Gb of the confirmation image G, a plurality of beat points B estimated by the estimation process S2 are displayed.
  • the control device 11 determines whether the termination condition is satisfied (S4).
  • the termination condition is, for example, that the user instructs to terminate the acoustic analysis process by operating the operating device 14. If the termination condition is satisfied (S4: YES), the control device 11 terminates the acoustic analysis process. If the end condition is not satisfied (S4: NO), the control device 11 (beat point editing unit 24) determines whether an instruction to move the target beat point Bn has been received from the user (S5). If movement of the target beat point Bn is not instructed (S5: NO), the control device 11 moves the process to step S4. That is, the control device 11 waits for an instruction to end the acoustic analysis process or an instruction to move the target beat point Bn.
  • When receiving an instruction to move the target beat point Bn (S5: YES), the control device 11 (beat point editing unit 24) moves the target beat point Bn and the adjacent beat points Bn±1 before and after it on the time axis according to the instruction from the user (S6). Further, the control device 11 (display control unit 22) moves the target beat point Bn and each adjacent beat point Bn±1 in the beat point area Gb according to the instruction from the user (S7).
  • The control device 11 (update processing unit 25) updates the estimation model M by the update process S8 illustrated in FIG. 7.
  • the control device 11 shifts the process to estimation processing S2. That is, the control device 11 (beat point estimating unit 21) estimates a plurality of beat points B by performing estimation processing S2 on the acoustic signal A using the updated estimation model M.
  • the feature amount F(t) calculated immediately after the start of the acoustic analysis process is applied to the second and subsequent estimation processes S2.
  • the updating process S8 of the estimated model M and the estimation process S2 using the updated estimated model M are repeated every time the target beat point Bn moves. Therefore, the position of each beat point B estimated by the estimation process S2 approaches the position where the user's instruction is reflected each time the estimation process S2 is repeated.
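  • Putting the flowchart of FIG. 8 together, the skeleton below is a hedged sketch of the overall acoustic analysis process, reusing the earlier sketches (extract_features, second_process, move_beats, build_numeric_sequence, update_process); the get_instruction callback is a hypothetical stand-in for the operating device 14, not part of the source.

    import torch

    def acoustic_analysis_process(path, model, get_instruction):
        """Skeleton of FIG. 8, reusing the sketches above.

        get_instruction() returns None to end the process (S4), or a
        (target_index, delta) pair instructing a move (S5).
        """
        features_np, times = extract_features(path)                 # S1
        features = torch.tensor(features_np, dtype=torch.float32)
        while True:
            with torch.no_grad():
                p = model(features).numpy()                         # S21
            beats = list(second_process(p, times))                  # S22
            # S3: display the confirmation image G (omitted here)
            instruction = get_instruction()                         # S4/S5
            if instruction is None:
                return beats                                        # end
            target_index, delta = instruction
            beats = move_beats(beats, target_index, delta)          # S6/S7
            moved = beats[max(0, target_index - 1):target_index + 2]
            q = torch.tensor(build_numeric_sequence(times, moved),
                             dtype=torch.float32)                   # S81
            t_mask = torch.tensor((times >= moved[0]) &
                                  (times <= moved[-1]))             # interval T
            model = update_process(model, features, q, t_mask)      # S82-S84
            # loop back to S2 with the updated estimation model M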
  • The beat points B estimated by any one estimation process S2 are an example of the "first beat points," and the beat points B estimated by the next estimation process S2 after the estimation model M is updated are an example of the "second beat points."
  • As described above, the estimation model M is updated according to the movement of the target beat point Bn selected by the user and of the adjacent beat points Bn±1 around the target beat point Bn, and the plurality of beat points B are re-estimated by the estimation process S2 applying the updated estimation model M. That is, in updating the estimation model M, not only the movement of the target beat point Bn but also the temporal relationship between the target beat point Bn and each adjacent beat point Bn±1 is reflected in the estimation model M. Therefore, compared with a configuration in which only the movement of the target beat point Bn is reflected in the estimation model M (hereinafter referred to as the "comparative example"), a beat point B that appropriately matches the user's intention can be estimated.
  • For example, in the comparative example, if the target beat point Bn alone is moved backward, the estimation model M may be given a tendency (ritardando) in which the performance speed decreases after the target beat point Bn.
  • However, when the target beat point Bn is moved, it is more likely that the user intends to correct the beat points B throughout the song than that the user intends such a local change in performance speed.
  • In the first embodiment, because the adjacent beat points Bn±1 move together with the target beat point Bn, the problem that the estimation model M learns an unintended local change in performance speed is resolved. That is, as described above, compared with the comparative example, it is possible to estimate beat points B that appropriately match the user's intention, and to provide the user with an experience in which the estimated beat points B appropriately reflect the user's intention.
  • In the first embodiment, because the beat point area Gb is displayed on the display device 13, the user can visually confirm the movement of the target beat point Bn and each adjacent beat point Bn±1 in accordance with the user's instructions. Therefore, the user can instruct the movement of the target beat point Bn and each adjacent beat point Bn±1 while anticipating the beat points B that will be estimated by the updated estimation model M.
  • FIG. 9 is a block diagram illustrating the functional configuration of the acoustic analysis system 100 in the second embodiment.
  • The control device 11 of the second embodiment functions as a section setting section 26 in addition to the same elements as in the first embodiment (the beat point estimation section 21, display control section 22, playback control section 23, beat point editing section 24, and update processing section 25).
  • the section setting unit 26 sets a partial section (hereinafter referred to as a "specific section") of the acoustic signal A on the time axis. Specifically, the section setting unit 26 sets a specific section according to an instruction from the user. For example, by operating the operating device 14, the user can specify a specific section of the acoustic signal A displayed in the waveform area Ga. The section setting unit 26 sets the section specified by the user as a specific section.
  • the control device 11 of the second embodiment executes the acoustic analysis process of FIG. 8 for a specific section of the acoustic signal A.
  • The estimation process S2 by the beat point estimation unit 21 is executed only for the specific section. That is, a plurality of beat points B are estimated within the specific section of the song.
  • The second embodiment achieves the same effects as the first embodiment. Further, in the second embodiment, the beat points B can be estimated selectively for a partial section (the specific section) of the acoustic signal A.
  • the section setting unit 26 may set the specific section according to predetermined rules without requiring instructions from the user.
  • the section setting unit 26 may set any one of a plurality of structural sections of the song represented by the acoustic signal A as the specific section.
  • a structural section is a section in which a piece of music is divided on the time axis according to musical meaning.
  • the structural sections are, for example, sections such as an intro, a verse, a bridge, a chorus, and an outro.
  • the section setting unit 26 divides the acoustic signal A into a plurality of structural sections by analyzing the acoustic signal A, and sets a specific structural section among the plurality of structural sections as a specific section. According to the above configuration, beat points B can be estimated in a limited manner for a specific structural section.
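  • As a brief hedged sketch of the section setting unit 26 (the start/end-in-seconds interface is an assumption), restricting the analysis to a specific section amounts to masking the analysis time points before the estimation process S2 is executed.

    import numpy as np

    def restrict_to_section(features, times, section):
        """Keep only analysis time points inside the specific section.

        section: (start, end) in seconds, e.g. one structural section
                 such as a chorus, or a range specified by the user.
        """
        start, end = section
        mask = (times >= start) & (times <= end)
        return features[mask], times[mask]

    # usage: estimate beats only within a user-specified specific section
    # f_sec, t_sec = restrict_to_section(features, times, (30.0, 60.0))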
  • In each of the above embodiments, the adjacent beat point Bn-1 immediately before the target beat point Bn and the adjacent beat point Bn+1 immediately after the target beat point Bn are moved together with the target beat point Bn. However, a mode in which only one of the adjacent beat point Bn-1 and the adjacent beat point Bn+1 is moved together with the target beat point Bn is also assumed.
  • For example, the beat point editing unit 24 may move only the target beat point Bn and the immediately preceding adjacent beat point Bn-1 on the time axis according to an instruction from the user, and the update processing unit 25 may calculate the error e(t) within an applicable interval T between the adjacent beat point Bn-1 and the target beat point Bn.
  • Likewise, the beat point editing unit 24 may move only the target beat point Bn and the immediately following adjacent beat point Bn+1 on the time axis according to an instruction from the user, and the update processing unit 25 may calculate the error e(t) within an applicable interval T between the target beat point Bn and the adjacent beat point Bn+1.
  • As understood from the above examples, the beat point editing unit 24 is comprehensively expressed as an element that moves, on the time axis, one or more adjacent beat points Bn±1 located around the target beat point Bn among the plurality of beat points B.
  • In each of the above embodiments, a triangular distribution was exemplified as the numerical distribution D corresponding to the moved target beat point Bn and each adjacent beat point Bn±1, but the type or shape of the numerical distribution D is not limited to the above example. For example, a probability distribution such as a normal distribution, or a pulse-like distribution, may also be employed as the numerical distribution D.
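  • For illustration, the alternative shapes mentioned above could replace the triangular kernel in the earlier build_numeric_sequence sketch; this is a hedged numpy sketch and the widths are assumed parameters.

    import numpy as np

    def triangular(times, center, half_width=0.05):
        # line-symmetric triangular distribution, maximum 1 at the beat
        return np.clip(1.0 - np.abs(times - center) / half_width, 0.0, None)

    def normal(times, center, sigma=0.02):
        # normal distribution, scaled so the maximum value is 1 at the beat
        return np.exp(-0.5 * ((times - center) / sigma) ** 2)

    def pulse(times, center):
        # pulse-like distribution: 1 at the nearest analysis time point only
        q = np.zeros_like(times)
        q[np.argmin(np.abs(times - center))] = 1.0
        return q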
  • the type of feature amount F(t) that the feature extraction unit 30 calculates from the acoustic signal A is not limited to the examples in each of the above embodiments.
  • a time series of a predetermined number of samples constituting the acoustic signal A may be applied to the estimation process S2 as the feature amount F(t).
  • In this case, the feature extraction unit 30 extracts the time series of samples from the acoustic signal A; however, from the viewpoint that the acoustic signal A itself is partially applied to the estimation process S2, this can also be interpreted as a form in which the feature extraction unit 30 is omitted.
  • In each of the above embodiments, the target beat point Bn and each adjacent beat point Bn±1 were moved according to the movement direction (forward/backward) and movement amount δ specified by the user, but the method by which the user instructs the movement of the target beat point Bn and each adjacent beat point Bn±1 is not limited to the above example.
  • the beat point editing unit 24 may move the target beat point Bn and each adjacent beat point Bn ⁇ 1 according to the sign ( ⁇ ) and numerical value input by the user.
  • For example, when a negative number is input, the beat point editing unit 24 moves the target beat point Bn and each adjacent beat point Bn±1 forward on the time axis by a movement amount δ corresponding to the absolute value of the negative number. When a positive number is input, the beat point editing unit 24 moves the target beat point Bn and each adjacent beat point Bn±1 backward on the time axis by a movement amount δ corresponding to the positive number.
  • The beat point editing unit 24 may also move the target beat point Bn and each adjacent beat point Bn±1 by a predetermined unit amount on the time axis for each of a number of instructions from the user. For example, every time a movement instruction is received from the user, the beat point editing unit 24 moves the target beat point Bn and each adjacent beat point Bn±1 by the unit amount in the direction (forward/backward) specified by the user. Consequently, the target beat point Bn and each adjacent beat point Bn±1 move on the time axis by a movement amount δ corresponding to the product of the predetermined unit amount and the number of movement instructions.
  • moving the beat point in response to an instruction from the user means that the conditions for moving the beat point (for example, the direction and amount of movement) are based on the instruction from the user.
  • the method of instruction by the user and the matters instructed by the user are arbitrary in this disclosure.
  • moving the beat point means changing the position of the beat point on the time axis.
  • a deep neural network is illustrated as the estimation model M, but the configuration of the estimation model M is not limited to the above examples.
  • a statistical model such as a Hidden Markov Model (HMM) or a Support Vector Machine (SVM) may also be used as the estimation model M.
  • the estimated model M was updated by the update process S8. Since the estimation model M is applied to the estimation process S2, the update process S8 can also be expressed as a process for updating the estimation process S2.
  • the acoustic analysis system 100 may be realized by a server device that communicates with an information device such as a smartphone or a tablet terminal.
  • the acoustic analysis system 100 estimates a plurality of beat points B by analyzing the acoustic signal A received from the information device, and transmits data representing the plurality of beat points B to the information device.
  • a program according to the present disclosure may be provided in a form stored in a computer-readable recording medium and installed on a computer.
  • The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but recording media of any known form, such as semiconductor recording media and magnetic recording media, are also included.
  • the non-transitory recording medium includes any recording medium excluding transitory, propagating signals, and does not exclude volatile recording media.
  • a storage device that stores the program in the distribution device corresponds to the above-mentioned non-transitory recording medium.
  • An acoustic analysis system according to one aspect includes: a beat point estimation unit that estimates a plurality of first beat points by an estimation process performed on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
  • In the above aspect, the estimation process is updated according to the movement, on the time axis, of the target beat point selected by the user and of the one or more adjacent beat points located around the target beat point, and a plurality of second beat points are estimated by the updated estimation process.
  • the acoustic analysis system may also be expressed as an acoustic analysis device. It does not matter whether the "acoustic analysis system" or the “acoustic analysis device” is composed of a single device or a plurality of mutually separate devices.
  • the "estimation process” is a process for estimating multiple beat points (first beat point/second beat point) from the acoustic signal.
  • an example of “estimation processing” is a process that uses an estimation model that has learned the relationship between the feature quantity of the acoustic signal for learning and the probability that the time point at which the feature quantity is observed corresponds to a beat point. Specifically, by processing the feature amount at a specific point in time of the acoustic signal to be processed using the estimation model, the probability that the point in time corresponds to a beat point is output.
  • Updating estimation processing means processing for updating elements applied to estimation processing. For example, assuming an estimation process that uses an estimation model, machine learning that updates variables that define the estimation model corresponds to “updating the estimation process.”
  • In one example, the one or more adjacent beat points include a first beat point located immediately before the target beat point among the plurality of first beat points, and a first beat point located immediately after the target beat point among the plurality of first beat points.
  • In one example, the acoustic analysis system further includes a display control unit 22 that displays, on the display device 13, a beat point image representing the target beat point and the one or more adjacent beat points, and that moves the target beat point and the one or more adjacent beat points included in the beat point image in accordance with an instruction from the user.
  • the user can visually confirm how the target beat point and one or more adjacent beat points move according to instructions from the user. Therefore, the user can instruct the movement of the target beat point and one or more adjacent beat points while predicting the second beat point estimated by the updated estimation process.
  • In one example, the estimation process includes: a first process of generating, for each time point of the acoustic signal, the probability that the time point corresponds to a beat point, by processing the feature amount at that time point using an estimation model that has learned the relationship between the feature amount of a learning acoustic signal and the probability that the time point at which the feature amount is observed corresponds to a beat point; and a second process of identifying the plurality of first beat points from the time series of probabilities generated by the first process.
  • In the above aspect, multiple beat points are estimated from the acoustic signal using an estimation model that has learned the relationship between the feature amount of the learning acoustic signal and the probability that the time point of the feature amount corresponds to a beat point. Therefore, a plurality of beat points (first beat points/second beat points) can be estimated with high accuracy for an unknown acoustic signal whose feature amount changes in various ways.
  • In one example, the update processing unit updates the estimation model so that the error between a numerical distribution set on the time axis corresponding to the moved target beat point and the one or more adjacent beat points, and the time series of probabilities estimated by the first process, is reduced.
  • In the above aspect, the estimation model is updated so that the error between the numerical distribution corresponding to the target beat point and the adjacent beat points and the time series of probabilities estimated by the first process is reduced, making it possible to appropriately reflect the movement of the target beat point and the adjacent beat points in the estimation model.
  • “Numerical distribution” is the distribution of numerical values on the time axis.
  • the type and shape of the numerical distribution are arbitrary. For example, a triangular distribution, a normal distribution, or a pulsed distribution is exemplified as a "numeric distribution.”
  • “set corresponding to (target/adjacent) beat points” means that the position of the beat point on the time axis and the position of the numerical distribution on the time axis correspond to each other. That is, as the position of the beat point on the time axis changes, the position of the numerical distribution on the time axis also changes. For example, a relationship in which the maximum point of the numerical distribution coincides with a beat point is a typical example of a relationship "set corresponding to a beat point.”
  • In one example, the acoustic analysis system further includes a section setting unit 26 that sets a specific section that is a part of the acoustic signal on the time axis, and the estimation process by the beat point estimation unit is performed for the specific section.
  • the second beat point can be estimated in a limited manner for a part of the acoustic signal.
  • the “specific section” is an arbitrary part of the section of the acoustic signal on the time axis.
  • a section specified by the user is an example of a "specific section.”
  • the beat point may be estimated by using any one of a plurality of structural sections of the music represented by the acoustic signal as a "specific section.”
  • a structural section is a section in which a piece of music is divided on the time axis according to musical meaning.
  • the structural sections are, for example, sections such as an intro, a verse, a bridge, a chorus, and an outro.
  • An acoustic analysis method according to one aspect estimates a plurality of first beat points by an estimation process performed on an acoustic signal; moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points; and estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal. Note that each aspect illustrated for the acoustic analysis system is similarly applied to the acoustic analysis method according to the present disclosure.
  • A program according to one aspect causes a computer system to function as: a beat point estimation unit that estimates a plurality of first beat points by an estimation process performed on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal. Note that each aspect illustrated for the acoustic analysis system is similarly applied to the program according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

This acoustic analysis system 100 comprises: a beat point estimation unit 21 for estimating a plurality of beat points B by an estimation process performed on an acoustic signal A; a beat point editing unit 24 for moving, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of beat points B and one or more adjacent beat points located around the target beat point from among the plurality of beat points B; and an update processing unit 25 for updating the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit 21 re-estimates the plurality of beat points B by executing the updated estimation process on the acoustic signal A.

Description

Acoustic analysis system, acoustic analysis method, and program

The present disclosure relates to techniques for analyzing acoustic signals.

Analysis techniques that estimate the beats of a song by analyzing an acoustic signal representing the performance sound of the song have previously been proposed. For example, Patent Document 1 discloses a technique for estimating beat points of a song using a probability model such as a hidden Markov model.

Japanese Patent Application Publication No. 2015-114361

In conventional techniques for estimating the beat points of a song, there is a possibility that, for example, the backbeats of the song are erroneously estimated as beat points, or that beat points corresponding to a tempo twice the original tempo of the song are erroneously estimated. There is also a possibility that the beat point estimation result does not match the user's intention, as in the case where the backbeats of a song are estimated in a situation where the user expects the front beats to be estimated. Considering the above circumstances, a configuration that allows the user to change the positions, on the time axis, of a plurality of beat points estimated from an acoustic signal is important. In view of the above circumstances, one object of one aspect of the present disclosure is to estimate beat points that appropriately match the user's intention.

In order to solve the above problems, an acoustic analysis system according to one aspect of the present disclosure includes: a beat point estimation unit that estimates a plurality of first beat points by an estimation process performed on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.

An acoustic analysis method according to one aspect of the present disclosure estimates a plurality of first beat points by an estimation process performed on an acoustic signal; moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points; and estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.

A program according to one aspect of the present disclosure causes a computer system to function as: a beat point estimation unit that estimates a plurality of first beat points by an estimation process performed on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points. The beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
FIG. 1 is a block diagram illustrating the configuration of an acoustic analysis system according to a first embodiment. FIG. 2 is a block diagram illustrating the functional configuration of the acoustic analysis system. FIG. 3 is a flowchart of an estimation process. FIG. 4 is an explanatory diagram of machine learning for establishing an estimation model. FIG. 5 is a schematic diagram of a confirmation image. FIG. 6 is an explanatory diagram of beat point movement and an update process. FIG. 7 is a flowchart of the update process. FIG. 8 is a flowchart of an acoustic analysis process. FIG. 9 is a block diagram illustrating the functional configuration of an acoustic analysis system according to a second embodiment.
A: First Embodiment
FIG. 1 is a block diagram illustrating the configuration of an acoustic analysis system 100 according to the first embodiment. The acoustic analysis system 100 is a computer system that estimates a plurality of beat points B of a piece of music by analyzing an acoustic signal A representing the performance sound of the piece.
The acoustic analysis system 100 includes a control device 11, a storage device 12, a display device 13, an operation device 14, and a sound emitting device 15. The acoustic analysis system 100 is realized by an information device such as a smartphone, a tablet terminal, or a personal computer. Note that the acoustic analysis system 100 may be realized not only as a single device but also as a plurality of devices configured separately from one another.
The control device 11 is one or more processors that control the elements of the acoustic analysis system 100. Specifically, the control device 11 is composed of one or more types of processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an SPU (Sound Processing Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit).
The storage device 12 is one or more memories that store a program executed by the control device 11 and various data used by the control device 11. For example, a known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of plural types of recording media, is used as the storage device 12. Note that, for example, a portable recording medium that can be attached to and detached from the acoustic analysis system 100, or a recording medium that the control device 11 can access via a communication network (for example, cloud storage), may be used as the storage device 12.
The storage device 12 stores an acoustic signal A. The acoustic signal A is a sample series representing the waveform of the performance sound of a piece of music. Specifically, the acoustic signal A represents at least one of an instrumental sound and a singing sound of the piece. The data format of the acoustic signal A is arbitrary. Note that the acoustic signal A may be supplied to the acoustic analysis system 100 from a signal supply device separate from the acoustic analysis system 100. The signal supply device is, for example, a playback device that supplies the acoustic signal A recorded on a recording medium to the acoustic analysis system 100, or a communication device that supplies, to the acoustic analysis system 100, the acoustic signal A received from a distribution device (not shown) via a communication network.
The display device 13 displays images under the control of the control device 11. For example, any of various display panels such as a liquid crystal display panel or an organic EL (Electroluminescence) panel is used as the display device 13. Note that a display device 13 separate from the acoustic analysis system 100 may be connected to the acoustic analysis system 100 by wire or wirelessly. The operation device 14 is an input device that accepts instructions from the user. The operation device 14 is, for example, a set of controls operated by the user, or a touch panel that detects contact by the user.
The sound emitting device 15 reproduces sound under the control of the control device 11. For example, a speaker or headphones are used as the sound emitting device 15. Note that a sound emitting device 15 separate from the acoustic analysis system 100 may be connected to the acoustic analysis system 100 by wire or wirelessly.
FIG. 2 is a block diagram illustrating the functional configuration of the acoustic analysis system 100. By executing the program stored in the storage device 12, the control device 11 implements a plurality of functions for processing the acoustic signal A (a beat point estimation unit 21, a display control unit 22, a playback control unit 23, a beat point editing unit 24, and an update processing unit 25).
The beat point estimation unit 21 estimates a plurality of beat points B in the piece by analyzing the acoustic signal A. Specifically, the beat point estimation unit 21 generates time-series data that specifies the time of each of the plurality of beat points B in the piece. The beat point estimation unit 21 of the first embodiment includes a feature extraction unit 30, a first processing unit 31, and a second processing unit 32.
The feature extraction unit 30 calculates a feature quantity F(t) of the acoustic signal A for each of a plurality of time points t on the time axis (hereinafter referred to as "analysis time points"). Each analysis time point t is a time point set on the time axis at a predetermined interval. The interval between analysis time points t is sufficiently smaller than the interval between the beat points B assumed for the piece.
The feature quantity F(t) is information representing the acoustic characteristics of the acoustic signal A at the analysis time point t. For example, the feature quantity F(t) at each analysis time point t is a time series of acoustic information within a period of predetermined length that includes the analysis time point t. The acoustic information is, for example, information on the intensity of the acoustic signal A, such as volume and amplitude. Information on the frequency characteristics (timbre) of the acoustic signal A is also used as acoustic information. Examples of information on frequency characteristics include MFCC (Mel-Frequency Cepstrum Coefficients), MSLS (Mel-Scale Log Spectrum), and the constant-Q transform (CQT: Constant-Q Transform). Note that a plurality of pieces of acoustic information corresponding to one analysis time point t may be used as the feature quantity F(t). The types of acoustic information are not limited to the above examples; the acoustic information may be a combination of plural types of acoustic information on the acoustic signal A.
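The disclosure does not prescribe any particular implementation of the feature extraction unit 30; the following is a minimal sketch assuming the librosa library, where the hop length, the number of coefficients, and the choice of MFCC plus RMS energy (one timbre feature and one intensity feature) are illustrative assumptions rather than the patented method itself.

```python
# Minimal sketch of per-frame feature extraction F(t), assuming librosa.
import librosa
import numpy as np

def extract_features(path: str, hop_length: int = 512) -> np.ndarray:
    """Return a (num_frames, num_coeffs) matrix; row t is the feature F(t)."""
    y, sr = librosa.load(path, sr=None, mono=True)
    # MFCCs as acoustic information on frequency characteristics (timbre).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, hop_length=hop_length)
    # RMS energy as acoustic information on intensity (volume/amplitude).
    rms = librosa.feature.rms(y=y, hop_length=hop_length)
    return np.vstack([mfcc, rms]).T  # one feature vector per analysis time point t
```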
The first processing unit 31 and the second processing unit 32 estimate a plurality of beat points B from the feature quantities F(t) of the acoustic signal A. FIG. 3 is a flowchart of the process S2 of estimating a plurality of beat points B (hereinafter referred to as the "estimation process"). The estimation process S2 includes a first process S21 and a second process S22. The first processing unit 31 executes the first process S21, and the second processing unit 32 executes the second process S22.
The first process S21 is a process that generates, for each analysis time point t, a probability P(t) that the analysis time point t corresponds to a beat point B of the piece. The greater the probability P(t) of an analysis time point t, the higher the likelihood that the analysis time point t corresponds to a beat point B. The first processing unit 31 generates a time series of the probability P(t) by repeating the first process S21 for every analysis time point t. An estimation model M is used in the first process S21.
There is a correlation between the feature quantity F(t) of the acoustic signal A at each analysis time point t and the probability P(t) that the analysis time point t corresponds to a beat point B. The estimation model M is a statistical model that has learned this correlation. That is, the estimation model M is a learned model that has learned the relationship between the feature quantity F(t) and the probability P(t) by machine learning. The estimation model M can also be described as a trained model that has acquired the relationship between the feature quantity F(t) and the probability P(t) through training (machine learning). The first processing unit 31 generates the probability P(t) by processing the feature quantity F(t) of the acoustic signal A at each analysis time point t with the estimation model M. Specifically, the first processing unit 31 generates the probability P(t) by inputting input data including the feature quantity F(t) to the estimation model M.
The estimation model M is composed of, for example, a deep neural network (DNN: Deep Neural Network). A deep neural network of any form, such as a recurrent neural network (RNN: Recurrent Neural Network) or a convolutional neural network (CNN: Convolutional Neural Network), may be used as the estimation model M. The estimation model M may also be configured as a combination of plural types of deep neural networks. Additional elements such as long short-term memory (LSTM: Long Short-Term Memory) or Attention may be incorporated in the estimation model M.
The estimation model M is realized by a combination of a program that causes the control device 11 to execute an operation that generates the probability P(t) from the feature quantity F(t), and a plurality of variables applied to that operation (specifically, weights and biases). The program and the plurality of variables that realize the estimation model M are stored in the storage device 12. The numerical value of each of the plurality of variables that define the estimation model M is set in advance by machine learning.
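As one concrete illustration, a minimal sketch of such a model is shown below, assuming PyTorch; the particular architecture (a small one-dimensional CNN over the feature sequence) and its layer sizes are illustrative assumptions, since the disclosure permits any form of deep neural network.

```python
# Minimal sketch of an estimation model M mapping F(t) to P(t), assuming PyTorch.
import torch
import torch.nn as nn

class EstimationModel(nn.Module):
    def __init__(self, num_coeffs: int = 21):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(num_coeffs, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=1),
            nn.Sigmoid(),  # constrain P(t) to [0, 1]
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_frames, num_coeffs) -> P: (batch, num_frames)
        return self.net(features.transpose(1, 2)).squeeze(1)
```

The model's weights and biases correspond to the "plurality of variables" described above; they are the quantities set by machine learning and later adjusted by the update process.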
The second process S22 in FIG. 3 is a process of estimating the plurality of beat points B in the piece from the time series of probabilities P(t) generated by the first process S21. Various state transition models can be used in the second process S22. The state transition model is composed of, for example, a hidden semi-Markov model (HSMM: Hidden Semi-Markov Model), and the plurality of beat points B are estimated by the Viterbi algorithm, which is an example of dynamic programming. For example, a time point at which the probability P(t) reaches a local maximum is estimated as a beat point B.
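A full hidden semi-Markov model with Viterbi decoding is beyond the scope of a short example; the sketch below captures only the simplest reading of the second process S22, under the simplifying assumption that beat points are local maxima of P(t) separated by at least a minimum interval. The threshold and minimum-gap values are illustrative.

```python
# Minimal sketch of the second process S22 as thresholded peak picking on P(t).
import numpy as np

def pick_beats(p: np.ndarray, min_gap: int = 20, threshold: float = 0.5) -> list[int]:
    """Return frame indices t at which P(t) is a sufficiently high local maximum."""
    beats: list[int] = []
    for t in range(1, len(p) - 1):
        if p[t] >= threshold and p[t] >= p[t - 1] and p[t] >= p[t + 1]:
            if not beats or t - beats[-1] >= min_gap:
                beats.append(t)
    return beats
```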
FIG. 4 is an explanatory diagram of the machine learning that establishes the estimation model M. The estimation model M is established by machine learning in, for example, a machine learning system 200 separate from the acoustic analysis system 100. The estimation model M is provided from the machine learning system 200 to the acoustic analysis system 100. Note that the functions of the machine learning system 200 may instead be incorporated in the acoustic analysis system 100.
A plurality of pieces of teacher data Z are used for the machine learning of the estimation model M. Each piece of teacher data Z is composed of a combination of a feature quantity Fm for machine learning and a probability Pm for machine learning. The feature quantity Fm is the feature quantity F(t) at a specific time point of an acoustic signal Am prepared for machine learning. The acoustic signal Am is a signal recording sound radiated into an acoustic space, or a signal synthesized by known sound synthesis processing. The probability Pm for machine learning corresponding to a specific time point is the probability (that is, the correct value) that the time point corresponds to a beat point B of the piece. The plurality of pieces of teacher data Z are prepared for a large number of pieces of music whose beat points B are known. Note that the acoustic signal Am is an example of a "learning acoustic signal."
The machine learning system 200 calculates an error function representing the error between the probability P(t) that an initial or provisional model (hereinafter referred to as the "provisional model") M0 outputs when the feature quantity Fm of each piece of teacher data Z is input, and the probability Pm of that piece of teacher data Z. The machine learning system 200 then updates the plurality of variables of the provisional model M0 so that the error function is reduced. The provisional model M0 at the time when the above processing has been repeated for each of the plurality of pieces of teacher data Z is fixed as the estimation model M.
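A minimal sketch of this training loop is shown below, assuming PyTorch and the EstimationModel sketched earlier; the dataset object yielding (Fm, Pm) tensor pairs, the choice of optimizer, the mean-squared-error loss, and the learning rate are illustrative assumptions.

```python
# Minimal sketch of the machine learning that establishes the estimation model M.
import torch

def train(model: "EstimationModel", dataset, epochs: int = 10) -> None:
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()  # error function between P(t) and Pm
    for _ in range(epochs):
        for fm, pm in dataset:  # teacher data Z = (feature Fm, correct probability Pm)
            optimizer.zero_grad()
            loss = loss_fn(model(fm), pm)  # error of the provisional model M0
            loss.backward()
            optimizer.step()  # update the variables so that the error is reduced
```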
Therefore, the estimation model M outputs a statistically valid probability P(t) for an unknown feature quantity F(t) under the latent relationship between the feature quantities Fm and the probabilities Pm in the plurality of pieces of teacher data Z. That is, the estimation model M is a learned model that has learned the relationship between the feature quantity Fm of the acoustic signal Am for machine learning and the probability Pm that the time point at which that feature quantity is observed corresponds to a beat point B. The first processing unit 31 processes the feature quantity F(t) of each analysis time point t with the estimation model M established by the above procedure, thereby generating the probability P(t) that the analysis time point t corresponds to a beat point B of the piece.
As explained above, in the first embodiment, a plurality of beat points B are estimated from the acoustic signal A by using the estimation model M, which has learned the relationship between the feature quantity Fm of the acoustic signal Am for machine learning and the probability Pm that the analysis time point t at which the feature quantity Fm is observed corresponds to a beat point B. Therefore, a plurality of beat points B can be estimated with high accuracy for an unknown acoustic signal A whose feature quantity F(t) varies in diverse ways.
The display control unit 22 in FIG. 2 displays images on the display device 13. Specifically, the display control unit 22 displays the confirmation image G of FIG. 5 on the display device 13. The confirmation image G includes a waveform area Ga and a beat point area Gb. A common time axis is set for the waveform area Ga and the beat point area Gb.
In the waveform area Ga, the waveform of a specific range of the acoustic signal A (hereinafter referred to as the "display range") is displayed. The display control unit 22 changes the display range of the acoustic signal A in accordance with an instruction given by the user to the operation device 14. In the beat point area Gb, the plurality of beat points B estimated from the acoustic signal A by the beat point estimation unit 21 are displayed. Specifically, the plurality of beat points B within the display range of the acoustic signal A are displayed in the beat point area Gb. The beat point area Gb is an example of a "beat point image."
The user can instruct reproduction of the acoustic signal A by operating the operation device 14. The playback control unit 23 in FIG. 2 reproduces the sound represented by the acoustic signal A by supplying the acoustic signal A to the sound emitting device 15. As illustrated in FIG. 5, the display control unit 22 displays a reproduction position Gc in the confirmation image G in parallel with the reproduction of the acoustic signal A. The reproduction position Gc is the point of the acoustic signal A currently being reproduced by the sound emitting device 15. The reproduction position Gc therefore advances in the direction of the time axis in parallel with the reproduction of the acoustic signal A. By viewing the beat point area Gb while listening to the sound reproduced by the sound emitting device 15, the user can confirm the positions of the beat points B estimated by the most recent estimation process S2. If the current positions of the beat points B do not match the user's intention, the user can instruct correction of the positions of the estimated beat points B by operating the operation device 14.
The beat point editing unit 24 in FIG. 2 moves each beat point B on the time axis in accordance with instructions from the user. Moving a beat point B is a process of changing the position of the beat point B on the time axis. FIG. 6 is an explanatory diagram of the movement of a beat point B. State 1 in FIG. 6 is a state in which a plurality of beat points B have been estimated by the estimation process S2 described above. FIG. 6 also shows the time series of the probability P(t) calculated in the first process S21.
By operating the operation device 14 while checking the beat point area Gb, the user can select any one of the plurality of beat points B displayed in the beat point area Gb (hereinafter referred to as the "target beat point Bn"). By operating the operation device 14 while checking the beat point area Gb, the user can also instruct movement of the target beat point Bn on the time axis. Specifically, the user can specify the movement direction (forward/backward) and the movement amount δ of the target beat point Bn. For example, the user can instruct that the target beat point Bn be moved to the time point the user considers appropriate. As illustrated as state 2 in FIG. 6, the beat point editing unit 24 moves the target beat point Bn on the time axis in the direction (forward/backward) specified by the user by the movement amount δ specified by the user. Although FIG. 6 illustrates a case in which the target beat point Bn moves forward, the target beat point Bn may also move backward.
As illustrated as state 3 in FIG. 6, the beat point editing unit 24 moves, in conjunction with the target beat point Bn, the beat point B located immediately before the target beat point Bn among the plurality of beat points B (hereinafter referred to as the "adjacent beat point Bn-1") and the beat point B located immediately after the target beat point Bn (hereinafter referred to as the "adjacent beat point Bn+1") on the time axis. Specifically, the beat point editing unit 24 moves the adjacent beat point Bn-1 and the adjacent beat point Bn+1 on the time axis in the movement direction (forward/backward) specified by the user for the target beat point Bn, by the movement amount δ specified by the user for the target beat point Bn. That is, the three beat points B, namely the target beat point Bn and the adjacent beat points Bn±1 before and after it, move on the time axis in the same manner in accordance with the instruction from the user. Therefore, the temporal relationship between the target beat point Bn and each adjacent beat point Bn±1 is maintained before and after the movement.
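This joint move can be sketched as follows; representing beat points as frame indices in a Python list and passing the shift as a signed integer delta are illustrative assumptions.

```python
# Minimal sketch of the joint move performed by the beat point editing unit 24.
def move_beats(beats: list[int], n: int, delta: int) -> list[int]:
    """Shift the target beat point Bn and its adjacent beat points Bn-1 and Bn+1
    by the same signed amount, preserving their temporal relationship."""
    moved = list(beats)
    for i in (n - 1, n, n + 1):  # adjacent beat, target beat, adjacent beat
        if 0 <= i < len(moved):
            moved[i] += delta  # negative delta: earlier on the time axis
    return moved
```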
The beat point area Gb displayed on the display device 13 includes the target beat point Bn and the adjacent beat points Bn±1. The display control unit 22 reflects the movement of each beat point B by the beat point editing unit 24 in the beat point area Gb displayed on the display device 13. Specifically, the display control unit 22 moves the target beat point Bn and each adjacent beat point Bn±1 in the beat point area Gb on the time axis in accordance with the instruction from the user.
The update processing unit 25 in FIG. 2 updates the estimation model M in accordance with the movement of the target beat point Bn and each adjacent beat point Bn±1. Specifically, the update processing unit 25 updates the estimation model M by machine learning in accordance with the movement of the target beat point Bn and each adjacent beat point Bn±1.
FIG. 7 is a flowchart of the process S8 (hereinafter referred to as the "update process") in which the control device 11 (update processing unit 25) updates the estimation model M. The update process S8 is started upon the movement of the target beat point Bn and each adjacent beat point Bn±1.
When the update process S8 is started, the update processing unit 25 sets, on the time axis, a numerical sequence C corresponding to the moved target beat point Bn and each adjacent beat point Bn±1 (S81). As illustrated as state 4 in FIG. 6, the numerical sequence C is a time series of numerical values Q(t) set for each analysis time point t on the time axis.
The numerical sequence C includes a numerical distribution D corresponding to the moved target beat point Bn and each adjacent beat point Bn±1. The numerical distribution D is a distribution of numerical values Q(t) over a specific range on the time axis. The numerical distribution D is expressed by a probability distribution function defined on the time axis with the time t as a variable. The numerical distribution D in the first embodiment is a line-symmetric triangular distribution over a predetermined distribution width. The numerical distribution D is set individually for each beat point B. The position on the time axis of the numerical distribution D corresponding to each beat point B is determined so that the distribution takes its maximum value at that beat point B. For example, the numerical distribution D corresponding to the target beat point Bn takes its maximum value at the target beat point Bn, and the numerical distribution D corresponding to each adjacent beat point Bn±1 takes its maximum value at that adjacent beat point Bn±1. The numerical values Q(t) of the numerical sequence C at the analysis time points t outside the numerical distributions D are set to zero.
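Step S81 can be sketched as follows; the half-width of the triangular distribution and its peak value of 1 are illustrative assumptions not fixed by the disclosure.

```python
# Minimal sketch of step S81: building the numerical sequence C as a time
# series Q(t) containing a symmetric triangular distribution D at each
# moved beat point, and zero elsewhere.
import numpy as np

def build_target(num_frames: int, beat_frames: list[int], half_width: int = 5) -> np.ndarray:
    q = np.zeros(num_frames)
    for b in beat_frames:  # target beat Bn and adjacent beats Bn-1, Bn+1
        for t in range(max(0, b - half_width), min(num_frames, b + half_width + 1)):
            q[t] = max(q[t], 1.0 - abs(t - b) / half_width)  # triangular D, peak at b
    return q
```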
As illustrated as state 5 in FIG. 6, the update processing unit 25 calculates an error e(t) for each analysis time point t within an application interval T on the time axis (S82). The application interval T is a continuous interval including the adjacent beat point Bn-1 and the adjacent beat point Bn+1. Specifically, the period on the time axis whose end points are the adjacent beat point Bn-1 and the adjacent beat point Bn+1 is set as the application interval T. The error e(t) at each analysis time point t is a numerical value corresponding to the difference between the probability P(t) at that analysis time point t and the numerical value Q(t) of the numerical sequence C at that analysis time point t. For example, the square of the difference between the probability P(t) and the numerical value Q(t) (= {P(t) − Q(t)}²) is calculated as the error e(t).
The update processing unit 25 calculates an error function E from the plurality of errors e(t) calculated for the different analysis time points t within the application interval T (S83). The error function E is an objective function representing the difference between the probability P(t) and the numerical value Q(t) within the application interval T. For example, the sum of the plurality of errors e(t) within the application interval T is calculated as the error function E.
The update processing unit 25 updates the estimation model M so that the error function E is minimized (S84). Any known technique may be adopted for updating the estimation model M. For example, adaptation processing using Self-Attention may be adopted for updating the estimation model M. Adaptation processing for the estimation model M is described in, for example, Kazuhiko Yamamoto, "HUMAN-IN-THE-LOOP ADAPTATION FOR INTERACTIVE MUSICAL BEAT TRACKING," Proceedings of the 22nd ISMIR Conference, Online, November 7-12, 2021.
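Steps S82 to S84 can be sketched as a few gradient steps, assuming PyTorch and the model sketch above; the number of steps and the learning rate are illustrative, and the Self-Attention-based adaptation cited above is not reproduced here.

```python
# Minimal sketch of steps S82-S84: e(t) = {P(t) - Q(t)}^2 within the
# application interval T, summed into the error function E, followed by
# gradient updates of the model variables.
import torch

def adapt(model, features: torch.Tensor, q: torch.Tensor,
          t_start: int, t_end: int, steps: int = 50) -> None:
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(steps):
        optimizer.zero_grad()
        p = model(features)                                          # P(t) for every t
        e = (p[:, t_start:t_end + 1] - q[t_start:t_end + 1]) ** 2    # e(t) within T
        loss = e.sum()                                               # error function E
        loss.backward()
        optimizer.step()                                             # update M so that E decreases
```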
As understood from the above explanation, the update processing unit 25 updates the estimation model M so that the errors e(t) between the numerical distributions D (numerical values Q(t)) corresponding to the moved target beat point Bn and each adjacent beat point Bn±1 and the time series of the probability P(t) estimated by the preceding estimation process S2 (first process S21) are reduced. Therefore, the movement of the target beat point Bn and each adjacent beat point Bn±1 can be appropriately reflected in the estimation model M.
FIG. 8 is a flowchart of the process executed by the control device 11 (hereinafter referred to as the "acoustic analysis process"). The acoustic analysis process is started in response to, for example, an instruction given by the user to the operation device 14.
When the acoustic analysis process is started, the control device 11 (feature extraction unit 30) calculates the feature quantity F(t) of the acoustic signal A for each analysis time point t on the time axis (S1). The control device 11 (beat point estimation unit 21) estimates a plurality of beat points B from the feature quantities F(t) of the acoustic signal A by the estimation process S2 illustrated in FIG. 3. In the first process S21 of the estimation process S2, the estimation model M, which has learned the relationship between the feature quantity F(t) and the probability P(t) by machine learning, is used. The control device 11 (display control unit 22) displays the confirmation image G on the display device 13 (S3). In the beat point area Gb of the confirmation image G, the plurality of beat points B estimated by the estimation process S2 are displayed.
The control device 11 determines whether an end condition is satisfied (S4). The end condition is, for example, that the user has instructed the end of the acoustic analysis process by operating the operation device 14. If the end condition is satisfied (S4: YES), the control device 11 ends the acoustic analysis process. If the end condition is not satisfied (S4: NO), the control device 11 (beat point editing unit 24) determines whether an instruction to move a target beat point Bn has been received from the user (S5). If movement of a target beat point Bn has not been instructed (S5: NO), the control device 11 moves the processing to step S4. That is, the control device 11 waits for an instruction to end the acoustic analysis process or an instruction to move a target beat point Bn.
If an instruction to move the target beat point Bn has been received (S5: YES), the control device 11 (beat point editing unit 24) moves the target beat point Bn and the adjacent beat points Bn±1 before and after it on the time axis in accordance with the instruction from the user (S6). The control device 11 (display control unit 22) also moves the target beat point Bn and each adjacent beat point Bn±1 in the beat point area Gb in accordance with the instruction from the user (S7).
The control device 11 (update processing unit 25) updates the estimation model M by the update process S8 illustrated in FIG. 7. When the estimation model M has been updated, the control device 11 moves the processing to the estimation process S2. That is, the control device 11 (beat point estimation unit 21) estimates a plurality of beat points B by executing the estimation process S2 using the updated estimation model M on the acoustic signal A. The feature quantities F(t) calculated immediately after the start of the acoustic analysis process are applied to the second and subsequent estimation processes S2.
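Tying the sketches above together, the overall loop of FIG. 8 might look as follows. The functions show_beats and wait_for_instruction are hypothetical placeholders for the display and operation-device interaction, which the disclosure describes only at the level of the flowchart, and the sketch assumes the target beat point is neither the first nor the last beat.

```python
# Minimal sketch of the acoustic analysis process of FIG. 8 (S1 is assumed
# done: `features` is the (num_frames, num_coeffs) matrix from extract_features).
import numpy as np
import torch

def acoustic_analysis(model: "EstimationModel", features: np.ndarray) -> list[int]:
    x = torch.tensor(features, dtype=torch.float32).unsqueeze(0)  # (1, frames, coeffs)
    beats = pick_beats(model(x)[0].detach().numpy())              # S2
    while True:
        show_beats(beats)                          # S3: hypothetical display call
        move = wait_for_instruction()              # S4/S5: hypothetical UI call
        if move is None:                           # end condition satisfied
            return beats
        n, delta = move                            # selected beat index and shift
        beats = move_beats(beats, n, delta)        # S6/S7
        q = torch.tensor(build_target(len(features), beats[n - 1:n + 2]),
                         dtype=torch.float32)
        adapt(model, x, q, beats[n - 1], beats[n + 1])            # S8
        beats = pick_beats(model(x)[0].detach().numpy())          # S2 again
```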
As understood from the above explanation, each time the target beat point Bn is moved, the update process S8 of the estimation model M and the estimation process S2 using the updated estimation model M are repeated. Therefore, the position of each beat point B estimated by the estimation process S2 approaches, with each repetition of the estimation process S2, a position in which the instructions from the user are reflected. The beat points B estimated by any one execution of the estimation process S2 are an example of the "first beat points," and the beat points B estimated by the next execution of the estimation process S2 after the update of the estimation model M are an example of the "second beat points."
As explained above, in the first embodiment, the estimation model M is updated in accordance with the movement of the target beat point Bn selected by the user from among the plurality of beat points B estimated by the estimation process S2 and the adjacent beat points Bn±1 around the target beat point Bn, and the plurality of beat points B are re-estimated by the estimation process S2 applying the updated estimation model M. That is, in the update of the estimation model M, not only the movement of the target beat point Bn but also the temporal relationship between the target beat point Bn and each adjacent beat point Bn±1 is reflected in the estimation model M. Therefore, compared with a configuration in which only the movement of the target beat point Bn is reflected in the estimation model M (hereinafter referred to as the "comparative example"), beat points B that appropriately match the user's intention can be estimated.
Specifically, in the comparative example, the reduction of the interval between the target beat point Bn and the immediately preceding adjacent beat point Bn-1 and the enlargement of the interval between the target beat point Bn and the immediately following adjacent beat point Bn+1 are reflected in the estimation model M. A tendency for the performance speed to decrease after the target beat point Bn (a ritardando) is therefore imparted to the estimation model M. However, when the target beat point Bn is moved, it is more likely that the user intends a correction of the beat points B across the entire piece than a change in performance speed. In the first embodiment, since the temporal relationship between the target beat point Bn and each adjacent beat point Bn±1 is reflected in the estimation model M, the problem of the comparative example, in which the performance speed decreases after the target beat point Bn, is resolved. That is, as described above, compared with the comparative example, beat points B that appropriately match the user's intention can be estimated. In turn, the user can be provided with a customer experience in which beat points B that appropriately reflect the user's intention can be estimated.
In the first embodiment, since the beat point area Gb is displayed on the display device 13, the user can visually confirm how the target beat point Bn and each adjacent beat point Bn±1 move in accordance with the instruction from the user. Therefore, the user can instruct the movement of the target beat point Bn and each adjacent beat point Bn±1 while predicting the beat points B that will be estimated with the updated estimation model M.
B: Second Embodiment
The second embodiment will now be described. In each of the embodiments exemplified below, elements whose functions are the same as in the first embodiment are given the same reference signs as used in the description of the first embodiment, and their detailed descriptions are omitted as appropriate.
FIG. 9 is a block diagram illustrating the functional configuration of the acoustic analysis system 100 in the second embodiment. By executing the program stored in the storage device 12, the control device 11 of the second embodiment functions as a section setting unit 26 in addition to the same elements as in the first embodiment (the beat point estimation unit 21, the display control unit 22, the playback control unit 23, the beat point editing unit 24, and the update processing unit 25).
The section setting unit 26 sets a partial section of the acoustic signal A on the time axis (hereinafter referred to as the "specific section"). Specifically, the section setting unit 26 sets the specific section in accordance with an instruction from the user. For example, by operating the operation device 14, the user can designate a specific section of the acoustic signal A displayed in the waveform area Ga. The section setting unit 26 sets the section designated by the user as the specific section.
The control device 11 of the second embodiment executes the acoustic analysis process of FIG. 8 on the specific section of the acoustic signal A. For example, the estimation process S2 by the beat point estimation unit 21 is executed only for the specific section. That is, a plurality of beat points B are estimated within the specific section of the piece.
The specific procedure of the acoustic analysis process is the same as in the first embodiment. Therefore, the second embodiment also achieves the same effects as the first embodiment. In addition, in the second embodiment, beat points B can be estimated selectively for a partial section (the specific section) of the acoustic signal A.
Although the above description exemplifies a configuration in which the specific section is set in accordance with an instruction from the user, the method of setting the specific section is arbitrary and is not limited to this example. For example, the section setting unit 26 may set the specific section according to a predetermined rule without requiring an instruction from the user. For example, the section setting unit 26 may set any of a plurality of structural sections of the piece represented by the acoustic signal A as the specific section. A structural section is a section into which the piece is divided on the time axis according to musical meaning. The structural sections are, for example, sections such as an intro, a verse, a bridge, a chorus, and an outro. The section setting unit 26 divides the acoustic signal A into a plurality of structural sections by analyzing the acoustic signal A, and sets a particular structural section among the plurality of structural sections as the specific section. According to this configuration, beat points B can be estimated selectively for a particular structural section.
C: Modifications
Specific modifications that may be added to each of the embodiments exemplified above are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate to the extent that they do not contradict one another.
(1) Each of the above embodiments exemplifies a configuration in which the adjacent beat point Bn-1 immediately before the target beat point Bn and the adjacent beat point Bn+1 immediately after it are moved together with the target beat point Bn, but a configuration in which only one of the adjacent beat point Bn-1 and the adjacent beat point Bn+1 is moved together with the target beat point Bn is also conceivable. For example, the beat point editing unit 24 may move only the target beat point Bn and the immediately preceding adjacent beat point Bn-1 on the time axis in accordance with the instruction from the user, and the update processing unit 25 may calculate the errors e(t) within the application interval T between the adjacent beat point Bn-1 and the target beat point Bn. Similarly, the beat point editing unit 24 may move only the target beat point Bn and the immediately following adjacent beat point Bn+1 on the time axis in accordance with the instruction from the user, and the update processing unit 25 may calculate the errors e(t) within the application interval T between the target beat point Bn and the adjacent beat point Bn+1. As understood from the above explanation, the beat point editing unit 24 can be expressed as an element that moves, on the time axis, one or more adjacent beat points Bn±1 located around the target beat point Bn among the plurality of beat points B.
Note that in each of the above embodiments, not only the movement of the target beat point Bn but also the temporal relationship between the target beat point Bn and the immediately preceding adjacent beat point Bn-1 and the temporal relationship between the target beat point Bn and the immediately following adjacent beat point Bn+1 are reflected in the estimation model M. Therefore, compared with a configuration in which only the target beat point Bn and one surrounding adjacent beat point B are reflected in the estimation model M, the estimation model M can be updated so that beat points B that appropriately match the user's intention can be estimated.
(2) Each of the above embodiments exemplifies a triangular distribution as the numerical distribution D corresponding to the moved target beat point Bn and each adjacent beat point Bn±1, but the type or shape of the numerical distribution D is not limited to this example. For example, a probability distribution such as a normal distribution, or a pulse-like distribution, may also be adopted as the numerical distribution D.
(3) The type of feature quantity F(t) that the feature extraction unit 30 calculates from the acoustic signal A is not limited to the examples in the above embodiments. For example, a time series of a predetermined number of samples constituting the acoustic signal A may be applied to the estimation process S2 as the feature quantity F(t). This configuration can be interpreted as the feature extraction unit 30 extracting a time series of samples from the acoustic signal A; on the other hand, from the viewpoint that the acoustic signal A itself is partially applied to the estimation process S2, it can also be interpreted as a configuration in which the feature extraction unit 30 is omitted.
(4) In each of the above embodiments, the target beat point Bn and each adjacent beat point Bn±1 are moved in accordance with the movement direction (forward/backward) and the movement amount δ specified by the user, but the method by which the user instructs the movement of the target beat point Bn and each adjacent beat point Bn±1 is not limited to this example.
For example, the beat point editing unit 24 may move the target beat point Bn and each adjacent beat point Bn±1 in accordance with a sign (±) and a numerical value input by the user. If the user inputs a negative number, the beat point editing unit 24 moves the target beat point Bn and each adjacent beat point Bn±1 forward on the time axis by a movement amount δ corresponding to the absolute value of that negative number. If the user inputs a positive number, the beat point editing unit 24 moves the target beat point Bn and each adjacent beat point Bn±1 backward on the time axis by a movement amount δ corresponding to that positive number.
The beat point editing unit 24 may also move the target beat point Bn and each adjacent beat point Bn±1 on the time axis by a predetermined unit amount for each instruction issued by the user. For example, each time a movement instruction is received from the user, the beat point editing unit 24 moves the target beat point Bn and each adjacent beat point Bn±1 by the unit amount in the direction (forward/backward) designated by the user. Therefore, the target beat point Bn and each adjacent beat point Bn±1 move on the time axis by a movement amount δ corresponding to the product of the predetermined unit amount and the number of movement instructions.
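Both instruction styles reduce to the same signed shift and can be sketched on top of the move_beats sketch above; the unit amount and the input representations are illustrative assumptions.

```python
# Minimal sketch of the two instruction styles described in this modification.
UNIT = 1  # predetermined unit amount, in frames (assumption)

def move_by_signed_value(beats: list[int], n: int, value: int) -> list[int]:
    # A negative input moves the points forward (earlier); a positive one, backward.
    return move_beats(beats, n, value)

def move_by_repeats(beats: list[int], n: int, direction: int, count: int) -> list[int]:
    # direction: -1 (forward) or +1 (backward); total shift = UNIT * count.
    return move_beats(beats, n, direction * UNIT * count)
```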
As understood from the above explanation, in the present disclosure, moving a beat point "in accordance with an instruction from the user" means that the conditions of the movement of the beat point (for example, the movement direction and the movement amount) change in accordance with an instruction from the user; the method of instruction by the user and the matters instructed by the user are arbitrary in the present disclosure. "Movement of a beat point" means changing the position of the beat point on the time axis.
(5) Each of the above embodiments exemplifies a deep neural network as the estimation model M, but the configuration of the estimation model M is not limited to this example. For example, a statistical model such as a hidden Markov model (HMM: Hidden Markov Model) or a support vector machine (SVM: Support Vector Machine) may also be used as the estimation model M. Note that in each of the above embodiments, the estimation model M is updated by the update process S8. Since the estimation model M is applied to the estimation process S2, the update process S8 can also be expressed as a process of updating the estimation process S2.
(6) The acoustic analysis system 100 may be realized by a server device that communicates with an information device such as a smartphone or a tablet terminal. For example, the acoustic analysis system 100 estimates a plurality of beat points B by analyzing an acoustic signal A received from the information device, and transmits data representing the plurality of beat points B to the information device.
(7) As described above, the functions of the acoustic analysis system 100 exemplified above are realized by the cooperation of the single processor or plurality of processors constituting the control device 11 and the program stored in the storage device 12. The program according to the present disclosure may be provided in a form stored in a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known form of recording medium such as a semiconductor recording medium or a magnetic recording medium is also included. Note that a non-transitory recording medium includes any recording medium excluding a transitory, propagating signal; a volatile recording medium is not excluded. In a configuration in which a distribution device distributes the program via a communication network, a storage device that stores the program in the distribution device corresponds to the above-mentioned non-transitory recording medium.
D: Supplementary Notes
From the embodiments exemplified above, for example, the following configurations can be derived.
An acoustic analysis system according to one aspect (aspect 1) of the present disclosure includes: a beat point estimation unit that estimates a plurality of first beat points by an estimation process on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points, wherein the beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
According to the above aspect, the estimation process is updated in accordance with the movement on the time axis of the target beat point selected by the user and the one or more adjacent beat points located around the target beat point, and a plurality of second beat points are estimated by the updated estimation process. In the update of the estimation process, not only the movement of the target beat point but also the temporal relationship between the target beat point and the one or more adjacent beat points is reflected in the estimation process. Therefore, compared with a configuration in which only the movement of the target beat point is reflected in the estimation process, second beat points that appropriately match the user's intention can be estimated. Note that the acoustic analysis system may also be expressed as an acoustic analysis device. For both the "acoustic analysis system" and the "acoustic analysis device," it does not matter whether it is composed of a single device or of a plurality of mutually separate devices.
 The "estimation process" is a process for estimating a plurality of beat points (first beat points / second beat points) from the acoustic signal. One example of the "estimation process" is a process that uses an estimation model that has learned the relationship between a feature amount of a training acoustic signal and the probability that the time point at which the feature amount is observed corresponds to a beat point. Specifically, by processing the feature amount at a particular time point of the acoustic signal to be analyzed with the estimation model, the probability that the time point corresponds to a beat point is output.
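 As one illustrative rendering of this idea, the following Python sketch computes a per-frame feature and applies a trained estimation model to obtain, for each time point, the probability that the point is a beat. The feature (a magnitude spectrum) and the callable `model` are assumptions for illustration; the publication does not fix either.

```python
import numpy as np

def extract_features(signal, frame_len=1024, hop=512):
    """Per-frame magnitude spectrum, standing in for the feature amount
    described in the text."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([np.abs(np.fft.rfft(signal[i * hop:i * hop + frame_len]))
                     for i in range(n_frames)])

def beat_probabilities(feats, model):
    """Apply the trained estimation model to the feature at each time point;
    the output is the probability that the point corresponds to a beat."""
    return np.array([model(f) for f in feats])
```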
 "Updating the estimation process" means updating an element applied to the estimation process. For example, assuming an estimation process that uses an estimation model, machine learning that updates the variables defining the estimation model corresponds to "updating the estimation process".
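 As a hedged illustration of such an update, the sketch below treats the estimation model as a toy logistic model (a weight vector `w` over the per-frame features) and applies one cross-entropy gradient step toward targets derived from the user-corrected beat points. The actual model in the publication is not specified to this level; every name here is illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_model(w, feats, targets, lr=0.01):
    """One machine-learning step that updates the variables defining the
    model so that the predicted beat probabilities move toward the targets
    (binary cross-entropy gradient for a logistic model)."""
    probs = sigmoid(feats @ w)                       # probability per frame
    grad = feats.T @ (probs - targets) / len(targets)
    return w - lr * grad
```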
 In a specific example of aspect 1 (aspect 2), the one or more adjacent beat points include a first beat point located immediately before the target beat point among the plurality of first beat points and a first beat point located immediately after the target beat point among the plurality of first beat points. In this aspect, not only the movement of the target beat point but also the temporal relationship between the target beat point and the immediately preceding first beat point and the temporal relationship between the target beat point and the immediately following first beat point are reflected in the estimation process. Therefore, compared with a configuration in which only the target beat point and a single surrounding adjacent beat point are reflected in the estimation process, the estimation process can be updated so as to estimate second beat points that better match the user's intention.
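 A minimal sketch of this neighbour selection and joint movement, assuming beat times in seconds; the rule that drags the neighbours by half the target's displacement is an illustrative policy, not one stated in the publication:

```python
def move_with_neighbours(beats, i, delta, neighbour_ratio=0.5):
    """Move the selected target beat beats[i] by `delta` seconds and drag
    the beats immediately before and after it by a fraction of `delta`."""
    moved = list(beats)
    moved[i] += delta
    if i > 0:
        moved[i - 1] += delta * neighbour_ratio  # immediately preceding beat
    if i < len(moved) - 1:
        moved[i + 1] += delta * neighbour_ratio  # immediately following beat
    return moved
```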
 The acoustic analysis system according to a specific example of aspect 1 or aspect 2 (aspect 3) further includes a display control unit that displays, on a display device, a beat point image representing the target beat point and the one or more adjacent beat points, and that moves the target beat point and the one or more adjacent beat points included in the beat point image in accordance with an instruction from the user. In this aspect, the user can visually confirm how the target beat point and the one or more adjacent beat points move in accordance with the user's instruction. Therefore, the user can instruct the movement of the target beat point and the one or more adjacent beat points while anticipating the second beat points that will be estimated by the updated estimation process.
 In a specific example of any one of aspects 1 to 3 (aspect 4), the estimation process includes: a first process of generating, by processing the feature amount at each time point of the acoustic signal with an estimation model that has learned the relationship between a feature amount of a training acoustic signal and the probability that the time point at which the feature amount is observed corresponds to a beat point, the probability that each such time point corresponds to a beat point; and a second process of identifying the plurality of first beat points from the time series of probabilities generated by the first process. In this aspect, the plurality of beat points are estimated from the acoustic signal using an estimation model that has learned the relationship between the feature amount of the training acoustic signal and the probability that the time point of the feature amount corresponds to a beat point. Therefore, a plurality of beat points (first beat points / second beat points) can be estimated with high accuracy for unknown acoustic signals whose feature amounts vary in diverse ways.
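 The second process can be pictured as simple peak picking over the probability time series; the threshold and the hop length below are assumptions for illustration:

```python
def pick_beats(probs, threshold=0.5, hop_seconds=512 / 44100):
    """Identify beat points as local maxima of the probability time series
    that exceed a threshold, and convert frame indices to times in seconds."""
    beats = []
    for i in range(1, len(probs) - 1):
        if (probs[i] >= threshold
                and probs[i] > probs[i - 1]
                and probs[i] >= probs[i + 1]):
            beats.append(i * hop_seconds)
    return beats
```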
 In a specific example of aspect 4 (aspect 5), the update processing unit updates the estimation model so as to reduce the error between a numerical distribution set on the time axis in correspondence with the moved target beat point and the one or more adjacent beat points, and the time series of probabilities estimated by the first process. In this aspect, since the estimation model is updated so that the error between the numerical distribution corresponding to the target beat point and the adjacent beat points and the time series of probabilities estimated by the first process is reduced, the movement of the target beat point and the adjacent beat points can be appropriately reflected in the estimation model.
 The "numerical distribution" is a distribution of numerical values on the time axis. The type and shape of the numerical distribution are arbitrary; for example, a triangular distribution, a normal distribution, or a pulse-shaped distribution can serve as the "numerical distribution". With respect to the numerical distribution, "set in correspondence with a (target/adjacent) beat point" means that the position of the beat point on the time axis and the position of the numerical distribution on the time axis correspond to each other. That is, as the position of the beat point on the time axis changes, the position of the numerical distribution on the time axis changes accordingly. For example, a relationship in which the maximum of the numerical distribution coincides with the beat point is a typical example of a relationship "set in correspondence with the beat point".
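 The sketch below builds one such distribution (a triangular peak of height 1 centred on each moved beat point, one of the shapes named above) and measures its squared error against the probability series from the first process; the half-width is an illustrative parameter:

```python
import numpy as np

def target_distribution(n_frames, beat_frames, half_width=4):
    """Numerical distribution on the time axis: a triangular peak centred
    on each (moved) beat point, zero elsewhere, so that the maximum of the
    distribution coincides with the beat point."""
    target = np.zeros(n_frames)
    for b in beat_frames:
        for k in range(-half_width, half_width + 1):
            if 0 <= b + k < n_frames:
                target[b + k] = max(target[b + k], 1.0 - abs(k) / half_width)
    return target

def distribution_error(probs, target):
    """Error between the probability time series and the distribution;
    the update of the estimation model reduces this value."""
    return float(np.mean((probs - target) ** 2))
```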
 The acoustic analysis system according to a specific example of any one of aspects 1 to 5 (aspect 6) further includes a section setting unit that sets a specific section, which is a partial section of the acoustic signal on the time axis, and the estimation process by the beat point estimation unit is executed for the specific section. According to this aspect, the second beat points can be estimated in a manner limited to a partial section of the acoustic signal.
 The "specific section" is an arbitrary partial section of the acoustic signal on the time axis. For example, a section designated by the user is one example of the "specific section". Alternatively, beat points may be estimated using any one of a plurality of structural sections of the musical piece represented by the acoustic signal as the "specific section". A structural section is a section obtained by dividing a musical piece on the time axis according to its musical meaning; examples include the intro, verse, bridge, chorus, and outro.
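 Restricting the estimation to such a section amounts to running the beat identification only on the frames inside it, as in this standalone sketch (section boundaries in seconds; the threshold and hop length are illustrative):

```python
def beats_in_section(probs, start_s, end_s, hop_s=512 / 44100, threshold=0.5):
    """Identify beat points only within the specific section [start_s, end_s)
    and map the frame indices back to absolute times."""
    lo, hi = int(start_s / hop_s), int(end_s / hop_s)
    section = probs[lo:hi]
    times = []
    for i in range(1, len(section) - 1):
        if (section[i] >= threshold
                and section[i] > section[i - 1]
                and section[i] >= section[i + 1]):
            times.append((lo + i) * hop_s)
    return times
```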
 An acoustic analysis method according to one aspect of the present disclosure estimates a plurality of first beat points by an estimation process on an acoustic signal; moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points; and estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal. Note that each aspect exemplified for the acoustic analysis system applies equally to the acoustic analysis method according to the present disclosure.
 A program according to one aspect of the present disclosure causes a computer system to function as: a beat point estimation unit that estimates a plurality of first beat points by an estimation process on an acoustic signal; a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and an update processing unit that updates the estimation process in accordance with the movement of the target beat point and the one or more adjacent beat points, wherein the beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal. Note that each aspect exemplified for the acoustic analysis system applies equally to the program according to the present disclosure.
DESCRIPTION OF REFERENCE SIGNS: 100: acoustic analysis system; 11: control device; 12: storage device; 13: display device; 14: operation device; 15: sound emitting device; 21: beat point estimation unit; 22: display control unit; 23: playback control unit; 24: beat point editing unit; 25: update processing unit; 26: section setting unit; 30: feature extraction unit; 31: first processing unit; 32: second processing unit.

Claims (13)

  1.  An acoustic analysis system comprising:
     a beat point estimation unit that estimates a plurality of first beat points by an estimation process on an acoustic signal;
     a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and
     an update processing unit that updates the estimation process in accordance with movement of the target beat point and the one or more adjacent beat points,
     wherein the beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.
  2.  The acoustic analysis system according to claim 1, wherein the one or more adjacent beat points include:
     a first beat point located immediately before the target beat point among the plurality of first beat points; and
     a first beat point located immediately after the target beat point among the plurality of first beat points.
  3.  The acoustic analysis system according to claim 1 or claim 2, further comprising a display control unit that displays, on a display device, a beat point image representing the target beat point and the one or more adjacent beat points, and moves the target beat point and the one or more adjacent beat points included in the beat point image in accordance with an instruction from the user.
  4.  The acoustic analysis system according to claim 1 or claim 2, wherein the estimation process includes:
     a first process of generating, by processing a feature amount at each time point of the acoustic signal with an estimation model that has learned a relationship between a feature amount of a training acoustic signal and a probability that a time point at which the feature amount is observed corresponds to a beat point, a probability that the time point corresponds to a beat point; and
     a second process of identifying the plurality of first beat points from a time series of the probabilities generated by the first process.
  5.  The acoustic analysis system according to claim 4, wherein the update processing unit updates the estimation model so as to reduce an error between:
     a numerical distribution set on the time axis in correspondence with the moved target beat point and the one or more adjacent beat points; and
     the time series of the probabilities estimated by the first process.
  6.  The acoustic analysis system according to claim 1, further comprising a section setting unit that sets a specific section that is a partial section of the acoustic signal on the time axis,
     wherein the estimation process by the beat point estimation unit is executed for the specific section.
  7.  An acoustic analysis method realized by a computer system, the method comprising:
     estimating a plurality of first beat points by an estimation process on an acoustic signal;
     moving, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points;
     updating the estimation process in accordance with movement of the target beat point and the one or more adjacent beat points; and
     estimating a plurality of second beat points by executing the updated estimation process on the acoustic signal.
  8.  The acoustic analysis method according to claim 7, wherein the one or more adjacent beat points include:
     a first beat point located immediately before the target beat point among the plurality of first beat points; and
     a first beat point located immediately after the target beat point among the plurality of first beat points.
  9.  The acoustic analysis method according to claim 7 or claim 8, further comprising:
     displaying, on a display device, a beat point image representing the target beat point and the one or more adjacent beat points; and
     moving the target beat point and the one or more adjacent beat points included in the beat point image in accordance with an instruction from the user.
  10.  The acoustic analysis method according to claim 7 or claim 8, wherein the estimation process includes:
     a first process of generating, by processing a feature amount at each time point of the acoustic signal with an estimation model that has learned a relationship between a feature amount of a training acoustic signal and a probability that a time point at which the feature amount is observed corresponds to a beat point, a probability that the time point corresponds to a beat point; and
     a second process of identifying the plurality of first beat points from a time series of the probabilities generated by the first process.
  11.  The acoustic analysis method according to claim 10, wherein updating the estimation process includes updating the estimation model so as to reduce an error between:
     a numerical distribution set on the time axis in correspondence with the moved target beat point and the one or more adjacent beat points; and
     the time series of the probabilities estimated by the first process.
  12.  The acoustic analysis method according to claim 7, further comprising setting a specific section that is a partial section of the acoustic signal on the time axis,
     wherein the estimation process is executed for the specific section.
  13.  A program that causes a computer system to function as:
     a beat point estimation unit that estimates a plurality of first beat points by an estimation process on an acoustic signal;
     a beat point editing unit that moves, on a time axis in accordance with an instruction from a user, a target beat point selected by the user from among the plurality of first beat points and one or more adjacent beat points located around the target beat point among the plurality of first beat points; and
     an update processing unit that updates the estimation process in accordance with movement of the target beat point and the one or more adjacent beat points,
     wherein the beat point estimation unit estimates a plurality of second beat points by executing the updated estimation process on the acoustic signal.