US20210326102A1 - Method and device for determining mixing parameters based on decomposed audio data
- Publication number
- US20210326102A1 (application Ser. No. 17/343,386)
- Authority
- US
- United States
- Prior art keywords
- audio
- track
- data
- decomposed
- audio track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000002156 mixing Methods 0.000 title claims abstract description 121
- 238000000034 method Methods 0.000 title claims abstract description 101
- 230000005236 sound signal Effects 0.000 claims abstract description 52
- 238000012545 processing Methods 0.000 claims abstract description 42
- 230000007704 transition Effects 0.000 claims description 226
- 230000001755 vocal effect Effects 0.000 claims description 65
- 238000000354 decomposition reaction Methods 0.000 claims description 25
- 238000013473 artificial intelligence Methods 0.000 claims description 23
- 238000013528 artificial neural network Methods 0.000 claims description 22
- 238000012952 Resampling Methods 0.000 claims description 5
- 230000006870 function Effects 0.000 description 86
- 230000000875 corresponding effect Effects 0.000 description 15
- 230000000694 effects Effects 0.000 description 12
- 239000000203 mixture Substances 0.000 description 10
- 230000008569 process Effects 0.000 description 10
- 230000001020 rhythmical effect Effects 0.000 description 9
- 230000008901 benefit Effects 0.000 description 7
- 238000012549 training Methods 0.000 description 6
- 230000008859 change Effects 0.000 description 5
- 238000001514 detection method Methods 0.000 description 4
- 238000003860 storage Methods 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 3
- 230000000295 complement effect Effects 0.000 description 3
- 238000004590 computer program Methods 0.000 description 3
- 238000004519 manufacturing process Methods 0.000 description 3
- 230000005540 biological transmission Effects 0.000 description 2
- 238000010924 continuous production Methods 0.000 description 2
- 230000001276 controlling effect Effects 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000009826 distribution Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000001914 filtration Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000012886 linear function Methods 0.000 description 2
- 238000009527 percussion Methods 0.000 description 2
- 238000003825 pressing Methods 0.000 description 2
- 230000033764 rhythmic process Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 230000036962 time dependent Effects 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000004040 coloring Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Classifications
- H04R29/008—Visual indication of individual signal levels (monitoring/testing arrangements)
- G06F3/0482—Interaction with lists of selectable items, e.g. menus
- G06F3/04847—Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
- G06F3/04883—Inputting data by handwriting on a touch-screen or digitiser, e.g. gesture or text
- G06F3/04886—Partitioning the display area of the touch-screen into independently controllable areas, e.g. virtual keyboards or menus
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
- G06N3/02—Neural networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G10H1/0008—Associated control or indicating means for electrophonic musical instruments
- G10H1/0091—Means for obtaining special acoustic effects
- G10H1/08—Circuits for establishing the harmonic content of tones by combining tones
- G10H1/40—Rhythm (accompaniment arrangements)
- G10H1/46—Volume control
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
- G10L21/0316—Speech enhancement by changing the amplitude
- G10L21/034—Automatic adjustment of amplitude
- G10L21/043—Time compression or expansion by changing speed
- G10L25/30—Speech or voice analysis techniques using neural networks
- G10L25/51—Speech or voice analysis specially adapted for comparison or discrimination
- G11B20/10527—Audio or video recording; data buffering arrangements
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating discs
- H04B1/1646—Circuits adapted for the reception of stereophonic signals
- H04H60/04—Studio equipment; interconnection of studios
- H04H60/05—Mobile studios
- H04N21/439—Processing of audio elementary streams
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/12—Circuits for distributing signals to two or more loudspeakers
- H04R5/04—Circuit arrangements for stereophonic arrangements
- H04R27/00—Public address systems
- H04S1/007—Two-channel systems in which the audio signals are in digital form
- G10H2210/056—Extraction or identification of individual instrumental parts, e.g. melody, chords, bass
- G10H2210/076—Extraction of timing, tempo; beat detection
- G10H2210/081—Automatic key or tonality recognition
- G10H2210/125—Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
- G10H2210/155—Musical effects
- G10H2210/241—Scratch effects, i.e. emulating playback velocity or pitch manipulation effects of manually rotating an LP record
- G10H2210/325—Musical pitch modification
- G10H2210/391—Automatic tempo adjustment, correction or control
- G10H2220/101—Graphical user interface [GUI] for graphical creation, edition or control of musical data or parameters
- G10H2220/106—GUI using icons, on-screen symbols, screen regions or segments representing musical elements or parameters
- G10H2230/015—PDA or palmtop computing devices used for musical purposes, e.g. portable music players, tablet computers or smartphones
- G10H2240/325—Synchronizing two or more audio tracks or files according to musical features or musical timings
- G10H2250/031—Spectrum envelope processing
- G10H2250/035—Crossfade, i.e. time-domain amplitude envelope control of the transition between musical sounds or melodies
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. musical recognition or automatic composition
- G10H2250/641—Waveform sampler, i.e. music samplers; sampled music loop processing
- H04R2227/003—Digital PA systems using, e.g. LAN or Internet
- H04R2227/005—Audio distribution systems for home, i.e. multi-room use
- H04R2420/01—Input selection or mixing for amplifiers or loudspeakers
- H04R2430/01—Aspects of volume control, not necessarily automatic, in sound systems
- H04R2430/03—Synergistic effects of band splitting and sub-band processing
- H04S2400/13—Aspects of volume control, not necessarily automatic, in stereophonic sound systems
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
- H04S2420/07—Synergistic effects of band splitting and sub-band processing
Definitions
- the present invention relates to a method for processing audio data based on one or more audio tracks of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres.
- for example, two different audio tracks representing two different pieces of music are mixed when a DJ crossfades from one of the pieces of music to the other, so as to avoid any audible interruption in the music performance.
- in a digital audio workstation (DAW), a mixing engineer mixes different audio tracks representing different instruments, vocals, etc.
- as another example, a sound engineer records different audio sources, such as different instruments or voices, by means of a plurality of microphones, pickups, etc., so as to produce mixed audio data for transmission through radio/TV broadcasting services or via the Internet.
- the main parameters for successfully mixing audio tracks comprise the volumes of the audio tracks, the timing or phase of the audio tracks relative to one another, and audio effects that may be applied to the individual audio tracks before mixing.
- the audio engineer may obtain information about the musical content of the individual audio tracks, including for example a key of the music, a tempo, a beat grid (time signature, beat emphases or accents etc.) or a particular instrument or a group of instruments contained in the audio tracks.
- a DJ intending to change the song currently played usually tries to find a suitable transition point between the two songs, i.e. a point in time within the first song at which the first song is faded out, and a point in time within the second song at which the second song is faded in.
- the DJ needs to determine the song parts of both songs in order to find a suitable transition point including a suitable timing for starting the second song.
- a transition between two songs can sound particularly smooth if both songs have the same or matching chords at the transition points and/or if both songs have mutually matching timbres, i.e. timbres which mix well with one another, for example a drum timbre and a piano timbre, while avoiding clashing of certain timbres, for example two vocal timbres at the same point in time at the transition point.
- this object is achieved by a method for processing audio data, comprising the steps of providing a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres; decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres; providing a second audio track; analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter; generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
- At least the mixed input data of the first audio track are decomposed such as to extract therefrom decomposed data representing only some of the timbres of the mixed input data, and the decomposed data are analyzed to determine at least one mixing parameter. Mixing of first and second audio tracks is then performed based on the at least one mixing parameter.
- in this way, the content of the audio information contained in the mixed input data is accessible at a significantly higher level of detail, or is even made available for analysis at all.
- detection of the beats of a song can be achieved with higher accuracy when separating a drum timbre, and detecting a key or a chord progression of a piece of music can be achieved with higher certainty by analyzing decomposed data representing a bass timbre.
- the output track may then be generated by matching the beats or matching the keys of the two audio tracks before mixing the audio tracks.
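For illustration, a minimal sketch of such beat and tempo detection on a separated drum stem (the file name "drums.wav" is a hypothetical output of a decomposition step; librosa is an assumed third-party audio library, not part of the patent):

```python
import librosa

# Load the separated drum stem produced by a decomposition step.
y, sr = librosa.load("drums.wav", sr=None, mono=True)

# Estimate tempo (BPM) and beat positions from the isolated drums,
# which avoids interference from vocals or harmonic instruments.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print("Estimated tempo (BPM):", tempo)
print("First beat positions (s):", beat_times[:4])
```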
- audio tracks may include digital audio data such as contained in audio files or digital audio streams.
- the files or streams may have a specific length or playback duration or, alternatively, may have an undefined or indefinite length or playback duration, such as in the case of a live stream or a continuous data stream received from a content provider via the Internet.
- digital audio tracks are usually stored in an audio file in association with consecutive time frames, the length of each time frame depending on the sampling rate of the audio data, as conventionally known. For example, in an audio file sampled at 44.1 kHz, one time frame has a length of approximately 0.023 ms.
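As a worked example of this frame-length arithmetic (Python, for illustration only):

```python
# One time frame corresponds to one sample period, i.e. 1 / sample_rate seconds.
sample_rate_hz = 44_100
frame_length_ms = 1_000.0 / sample_rate_hz
print(f"{frame_length_ms:.4f} ms")  # prints 0.0227 ms, i.e. roughly 0.023 ms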
- audio tracks may be embodied by analog audio signals, for example signals played by an analog playback device such as a vinyl player, a tape player etc.
- audio tracks may be songs or other pieces of music provided in digital or analog format.
- audio signal refers to an audio track or any part or portion of an audio track at a certain position or time within the audio track.
- the audio signal may be a digital signal processed, stored or transmitted through an electronic control system, in particular computer hardware, or may be an analog signal processed, stored or transmitted by analog audio hardware such as an analog mixer, a PA system or the like.
- the output track may comprise a first portion containing predominantly the first output data, and a second portion arranged after said first portion and containing predominantly the second output data.
- This method may be used in a DJ environment, in particular when mixing two songs using DJ equipment. In the first portion of the output track, only the first song is played as the first output data, while in a second portion only the second song is played as the second output data. The output track therefore switches from playback of the first song to playback of the second song.
- the step of analyzing audio data may include analyzing the decomposed data to determine a transition point as the mixing parameter, and the output track may be generated using the transition point such that the first portion is arranged before the transition point, and the second portion is arranged after the transition point.
- the method of the present invention may be used to find a suitable transition point at which playback is swapped from the first song to the second song.
- a transition point on the timeline of the output track may be defined by a first transition point on the timeline of the first audio track (e.g. corresponding to the first song) and a second transition point on the timeline of the second audio track (e.g. corresponding to the second song), wherein the output track then comprises the first portion containing predominantly the first output data obtained from the first audio track in a portion before the first transition point, and comprises the second portion containing predominantly the second output data obtained from the second audio track in a portion after the second transition point.
- the method of the invention may in particular include decomposing the first audio track to obtain first decomposed data, decomposing the second audio track to obtain second decomposed data, analyzing the first decomposed data to determine the first transition point as a first mixing parameter, analyzing the second decomposed data to determine the second transition point as a second mixing parameter, and generating the output track based on the first and second mixing parameters, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
- the transition point(s) may be found more appropriately to allow a smooth transition between the songs, for example at a point where the decomposed drum track has a break or pause such that abrupt rhythmic changes can be avoided.
- the end of a chorus, a verse or any other song part can be determined automatically and a transition point can be determined at a junction between adjacent song parts.
- the output track may further include a transition portion, which is a time interval larger than zero, arranged between the first portion and the second portion and associated with (i.e. including) the transition point on the timeline of the output track, wherein in the transition portion a volume level of the first output data is reduced and/or a volume level of the second output data is increased. Therefore, within some sections of the transition portion, or even during the entire transition portion, first output data and second output data overlap, i.e. are mixed to be played at the same time, wherein the volume levels of the first output data and the second output data may be adjusted to allow for a smooth transition from the first output data to the second output data without sudden breaks, sound artefacts or dissonant mixes. For example, the volume of the first output data may be continuously decreased over a part or the entirety of the transition portion, while the volume level of the second output data may be continuously increased over a part or the entirety of the transition portion. Transitions of the above-described type are called crossfades.
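A minimal crossfade sketch along these lines, assuming two mono tracks given as NumPy float arrays at the same sample rate that have already been tempo- and beat-aligned (linear gain ramps are used here; equal-power ramps are a common alternative):

```python
import numpy as np

def crossfade(first: np.ndarray, second: np.ndarray, fade_len: int) -> np.ndarray:
    """Fade out the end of `first` while fading in the start of `second`."""
    fade_out = np.linspace(1.0, 0.0, fade_len)   # volume of the first output data
    fade_in = 1.0 - fade_out                     # volume of the second output data
    overlap = first[-fade_len:] * fade_out + second[:fade_len] * fade_in
    return np.concatenate([first[:-fade_len], overlap, second[fade_len:]])
```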
- audio data, which include at least the decomposed data, are analyzed to determine one or more mixing parameters.
- Mixing parameters therefore include, but are not limited to, the following examples:
- the mixing parameter may be a tempo of the first and/or second audio track, in particular a BPM (beats per minute) of the first and/or second audio track.
- Generation of the output track, i.e. mixing, may then include a tempo matching process in which the tempo or BPM of at least one of the first and second audio tracks, or of at least one of the first and second output data, is changed such that the audio tracks or output data have the same or matching tempi or BPM.
- by analyzing the decomposed data, the tempo or BPM can be determined with higher accuracy and/or higher reliability.
- the at least one mixing parameter may refer to a beat grid of the first and/or second audio track.
- the beat grid refers to the rhythmic framework of a piece of music.
- the individual beats of each bar, optionally including information about the time signature (for example a three-four time, a four-four time, a six-eight time, etc.), beat emphases or accents, etc., may form the beat grid of a piece of music.
- the beat grid may be determined as a mixing parameter based on analyzing decomposed data, for example decomposed drum data or decomposed bass data.
- the beat grid can be determined with higher accuracy and higher reliability according to the present invention.
- the step of generating an output track may take into account the determined beat grid or the determined beat grids of the first and/or second audio track by synchronizing the beat grids of the two audio tracks. Synchronizing beat grids may comprise resampling of audio data of the first and/or second audio track such as to stretch or compress the tempo of at least one of the audio tracks and thereby match the beat grids of the audio data.
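A hedged sketch of such tempo matching, assuming the BPM values of both tracks have already been determined; librosa's time_stretch changes tempo without changing pitch, whereas plain resampling would change both, as on a vinyl deck played at the wrong speed:

```python
import librosa

def match_tempo(y_second, bpm_second: float, bpm_first: float):
    # rate > 1.0 speeds the track up; stretching by bpm_first / bpm_second
    # makes the second track's tempo equal to the first track's tempo.
    return librosa.effects.time_stretch(y_second, rate=bpm_first / bpm_second)
```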
- the at least one mixing parameter may refer to a beat phase of the first and/or second audio track.
- the beat phase relates to a position (i.e. a timing) on the timeline of a piece of music comprising multiple bars, each bar having multiple beats according to the time signature of the music, wherein the beat phase is defined relative to a beginning of the current bar, i.e. relative to the previous downbeat position (first beat of a bar).
- Synchronizing beat phase may comprise time-shifting the audio tracks relative to one another such as to achieve matching beats.
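For illustration, beat-phase alignment by time-shifting one track, with the offset expressed in samples (a simplified sketch; a real implementation would derive the offset from the detected beat positions):

```python
import numpy as np

def shift_track(y: np.ndarray, offset_samples: int) -> np.ndarray:
    # Positive offset: delay the track by prepending silence.
    if offset_samples >= 0:
        return np.concatenate([np.zeros(offset_samples, dtype=y.dtype), y])
    # Negative offset: advance the track by trimming its start.
    return y[-offset_samples:]
```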
- the at least one mixing parameter may refer to a downbeat position within a first and/or a second audio track.
- a downbeat position refers to the position of the first beat of each bar.
- the at least one mixing parameter may refer to a beat shift between the first audio track and the second audio track.
- This embodiment achieves advantages similar to those described above for the mixing parameters beat grid, beat phase and downbeat position.
- smooth mixing may be achieved by introducing a time shift between the first output data and the second output data in such a manner as to achieve zero beat shift or a beat shift equal to one or more beats.
- the at least one mixing parameter may refer to a key or a chord progression of the first and/or second audio track.
- a chord progression of a piece of music is a time-dependent parameter which denotes certain chords or root tones at certain points in time on the timeline of the music, such as for example C Major, C Major 7, A Minor etc.
- a key of the music is basically constant over the whole piece of music and relates to the root or key note of the tonic (home key) of the piece of music. Mixing of a first audio track and a second audio track, in particular mixing of different pieces of music or different portions or components of a piece of music, achieves more favorable results if the two audio tracks have equal or mutually matching keys.
- mutually matching keys may refer to keys which have a total interval of a fourth, a fifth or an octave or multiples thereof in between.
- other intervals may be regarded as matching in the sense of the present invention.
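An illustrative encoding of this matching rule, with keys reduced to root pitch classes 0-11 (C = 0); the admissible intervals here are only those named above:

```python
# Unison/octave, perfect fourth and perfect fifth, expressed in semitones.
MATCHING_INTERVALS = {0, 5, 7}

def keys_match(root_a: int, root_b: int) -> bool:
    # The set {0, 5, 7} covers both directions: a fourth one way is a fifth the other.
    return (root_a - root_b) % 12 in MATCHING_INTERVALS

print(keys_match(0, 7))  # C and G (a fifth apart) -> True
```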
- the key of the first and/or second audio track is determined by decomposing the input audio data and analyzing the decomposed data obtained in the step of decomposing. This will achieve more accurate and more reliable results. For example, it may be advantageous to analyze decomposed bass data or decomposed guitar data or decomposed piano data, etc., as these instruments usually play an important role in defining the harmony of a piece of music and thereby the relevant key of the music.
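A minimal key-estimation sketch on decomposed data, correlating an averaged chroma profile of a separated bass stem ("bass.wav" is hypothetical) against the twelve rotations of the Krumhansl-Schmuckler major-key template; this is a standard heuristic, not the patent's prescribed algorithm:

```python
import librosa
import numpy as np

# Krumhansl-Schmuckler major-key profile; index 0 is the tonic.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

y, sr = librosa.load("bass.wav", sr=None, mono=True)   # decomposed bass stem
chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

# Correlate the observed chroma with the profile rotated to each candidate tonic.
scores = [np.corrcoef(np.roll(MAJOR_PROFILE, k), chroma)[0, 1] for k in range(12)]
key_root = int(np.argmax(scores))  # 0 = C, 1 = C#, ...
```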
- from the chord progression, valuable information may be obtained regarding the structure of a piece of music, such as the sequence of particular song parts, for example verses, choruses, bridges, intros, outros, etc.
- the same chord progressions are usually used for each verse or for each chorus. Analyzing a chord progression may therefore be useful to find particular positions within the first audio track, which are suitable for mixing with a particular position in the second audio track such that these positions qualify as first and second transition points for generating a crossfade from the first audio track to the second audio track as described above, for example.
- the invention may generate an output track in which the first output data and the second output data are mixed together with similar volumes during a portion corresponding to the first and second portions, to create a mashup of two songs, while predominantly only the first output data or the second output data may be contained in the mix in other portions of the output track.
- the at least one mixing parameter may refer to a timbre or a group of timbres of the first and/or second audio track.
- This embodiment is based on the idea that some timbres mix better than other timbres.
- a vocal timbre mixes well with instrumental timbres such as a guitar timbre or a piano timbre, while mixing of two vocal timbres is usually unfavorable due to the clashing of the two voices.
- timbres transporting strong harmonic information may be more difficult to mix with other harmonic timbres, but may more easily be combined with non-harmonic timbres such as drums.
- determining that the first and/or the second audio track contains a particular timbre may provide useful information to assist the user in mixing, or may even allow semi-automatic or automatic mixing of the audio tracks.
- the at least one mixing parameter may refer to a song part junction of the first and/or second audio track.
- song part junctions may be suitable positions within a song at which various mixing effects, including crossfades or transitions to another song, remixing with another song, audio effects (reverb, loop effects, equalizer etc.), may be applied in a natural manner.
- the determination of song part junctions can therefore be used to assist the mixing process or to allow for semi-automatic or even automatic mixing of two audio tracks.
- the mixing parameter, in this example a song part junction, may be determined by analyzing decomposed data.
- a component of the audio mix that most clearly represents the structure of the song, for example a bass component, may be used to determine the song part junctions more accurately and more reliably.
- any of the above-mentioned mixing parameters is suitable to achieve the effects of the present invention, in particular to assist the mixing process.
- the results will become even better if a plurality of different mixing parameters are determined by analyzing the same or different decomposed data.
- a structure of a piece of music can be determined with particularly high accuracy and reliability, if for example a first mixing parameter referring to a beat grid is determined by analyzing decomposed drum data, and a second mixing parameter relating to a chord progression is determined by analyzing decomposed bass data, while a third mixing parameter relating to a song part junction may then be determined based on the determined structure of the piece of music, i.e. based on the first mixing parameter and the second mixing parameter.
- the step of analyzing audio data may include detecting silence data within the decomposed data, said silence data preferably representing an audio signal having a volume level smaller than −30 dB.
- a value of −30 dB refers to −30 dB FS (peak), i.e. to a volume level which is 30 dB smaller than the volume level of the loudest sound of the track.
- said silence data preferably represents an audio signal having a volume level smaller than −60 dB FS (RMS), i.e. referring to the absolute mean value.
- Silence within a particular timbre of the audio data, i.e. silence of a particular musical instrument or a voice component, may provide valuable information regarding the structure of the piece of music.
- a bridge part is often characterized by a certain interval, such as four, eight or sixteen bars of silence in the bass component of the music.
- the onset of the vocals may be an indication for the beginning of the first verse. Therefore, the step of analyzing audio data may preferably include detecting silence data continuously extending over a predetermined time span, for example over a time span of one, two, four, eight, twelve or sixteen bars, thus indicating a certain song part.
- an onset of a signal or a first signal peak within decomposed data after the predetermined time span of silence may indicate a downbeat position of a next song part, i.e. a song part junction.
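A sketch of such silence detection using the −60 dB FS (RMS) threshold mentioned above; frames whose RMS falls below the threshold are flagged, and a contiguous run of flagged frames spanning, say, four or eight bars at the estimated tempo would then indicate a candidate song part junction ("bass.wav" again denotes a hypothetical decomposed stem):

```python
import librosa
import numpy as np

y, sr = librosa.load("bass.wav", sr=None, mono=True)  # decomposed stem
rms = librosa.feature.rms(y=y)[0]                     # per-frame RMS, full scale = 1.0
rms_db = 20.0 * np.log10(np.maximum(rms, 1e-10))      # approximate dB FS (RMS)
silent_frames = rms_db < -60.0                        # boolean mask of silent frames
```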
- the step of analyzing audio data may include determining at least a first mixing parameter based on the decomposed data, and at least a second mixing parameter based on the first mixing parameter.
- the first mixing parameter may be a key of the first or second audio track
- the second mixing parameter may be a pitch shift value referring to a pitch shift to be applied to either one of the first and second audio tracks such as to match the keys of the first and second audio tracks.
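A sketch of deriving this second mixing parameter from the first: given the two detected key roots as pitch classes, the smallest semitone shift that aligns them is computed and applied to the second track (librosa assumed as above):

```python
import librosa

def match_key(y_second, sr: int, root_first: int, root_second: int):
    # Smallest semitone shift, in the range [-6, +5], that moves
    # root_second onto root_first.
    n_steps = (root_first - root_second + 6) % 12 - 6
    return librosa.effects.pitch_shift(y_second, sr=sr, n_steps=n_steps)
```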
- the second mixing parameter may be the transition point at which the output track includes a transition from the first output data to the second output data, for example by means of a crossfade.
- the first mixing parameter may for example be a song part junction, a beat phase or any other mixing parameter referring to a particular position or range within a piece of music relative to the musical content (song parts, bars, musical breaks, etc.).
- Such embodiments are particularly suitable to allow a DJ to find suitable transition points for changing from a first song to a second song.
- if the transition point is one of the mixing parameters, semi-automatic or automatic transitions can be realized in which a user, for example a DJ, just inputs his/her intention to change from playback of the first song to playback of the second song, or just specifies which songs should be mixed, and a suitable transition point is then automatically determined by a computer program according to a method of the present invention.
- One or more suitable transition points may then be proposed to the DJ for manual selection (semi-automatic mixing) or, alternatively, mixing is automatically initiated and carried out at a suitable transition point without any further user interaction (automatic mixing).
- Methods according to the first aspect of the invention use a step of decomposing mixed input data to obtain decomposed data.
- decomposing algorithms and services are known in the art, which allow decomposing audio signals to separate therefrom one or more signal components of different timbres, such as vocal components, drum components or instrumental components.
- Such decomposed signals and decomposed tracks have been used in the past to create certain artificial effects such as removing vocals from a song to create a karaoke version of a song.
- Such AI systems usually implement a convolutional neural network (CNN) which has been trained on a plurality of data sets, for example each including a vocal track, a harmonic/instrumental track and a mix of the vocal track and the harmonic/instrumental track.
- Examples of such conventional AI systems capable of separating source tracks, such as a singing voice track, from a mixed audio signal include: Prétet, "Singing Voice Separation: A study on training data", Acoustics, Speech and Signal Processing (ICASSP), 2019, pages 506-510; "spleeter", an open-source tool provided by the music streaming company Deezer based on the teaching of Prétet above; "PhonicMind" (https://phonicmind.com), a voice and source separator based on deep neural networks; "Open-Unmix", a music source separator based on deep neural networks in the frequency domain; and "Demucs" by Facebook AI Research, a music source separator based on deep neural networks in the waveform domain.
- These tools accept music files in standard formats (for example MP3, WAV, AIFF) and decompose the song to provide decomposed/separated tracks of the song, for example a vocal track, a bass track, a drum track, an accompaniment track or any mixture thereof.
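- As one concrete illustration, the open-source spleeter tool mentioned above can be driven from Python roughly as follows; exact API details may vary between versions.

```python
from spleeter.separator import Separator

# 4-stem model: vocals, drums, bass and "other" (remaining accompaniment)
separator = Separator('spleeter:4stems')

# Writes vocals.wav, drums.wav, bass.wav and other.wav under output/song/
separator.separate_to_file('song.mp3', 'output/')
```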
- the step of decomposing the mixed input data includes processing the mixed input data, in particular the first audio track and/or the second audio track, within an AI system comprising a trained neural network.
- AI systems achieve a high level of quality and in particular allow decomposing different timbres of a mixed audio signal, which in particular may correspond or resemble certain source tracks that were originally mixed when producing or generating the input audio track, such as certain instrumental tracks, vocal tracks, drum tracks etc.
- the step of decomposing may include decomposing the first/second audio tracks with regard to predetermined timbres such as to obtain decomposed signals of different timbres, preferably being selected from the group consisting of a vocal timbre, a non-vocal timbre, a drum timbre, a non-drum timbre, a harmonic timbre, a non-harmonic timbre, and any combination thereof.
- the non-vocal timbre, the non-drum timbre and the non-harmonic timbre may in particular be respective complement signals to that of the vocal timbre, the drum timbre and the harmonic timbre.
- Complement signals may be obtained by removing (e.g. subtracting) from the input signal a decomposed signal of a specific timbre.
- an input signal may be decomposed or separated into two decomposed signals, a decomposed vocal signal of a vocal timbre, and its complement, a decomposed non-vocal signal of a non-vocal timbre, which means that a mixture of the decomposed vocal signal and the decomposed non-vocal signal results in a signal substantially equal to the input signal.
- decomposition can be carried out to obtain a decomposed vocal track and a plurality of decomposed non-vocal tracks such as a decomposed drum track and a decomposed harmonic track (including harmonic instruments such as guitars, piano, synthesizer).
- At least one of the steps of analyzing the audio data and generating the output track may include processing of audio data within an AI system comprising a trained neural network.
- a neural network capable of analyzing audio data to determine at least one mixing parameter as described above may be obtained by training using training data containing a plurality of pieces of music together with data relating to the respective musical structure, such as beat grid, downbeat position, key, chord progression, song parts or song part junctions. After the training process, the neural network may then be capable of detecting such mixing parameters based on decomposed data of new pieces of music.
- a neural network suitable for generating the output track may be trained using training data in which each set of training data contains two audio tracks and one or more associated mixing parameters suitable for mixing the two audio tracks without dissonances or sound artefacts.
- the trained neural network will then be capable of mixing new audio tracks based on at least one mixing parameter determined by analyzing decomposed data and additional mixing parameters determined through artificial intelligence (AI).
- the method of the present invention may generally be used in all situations of audio processing, in which two audio tracks are to be mixed.
- the present invention may be implemented as a plugin or in the form of any other suitable software algorithm in order to help a user to mix different audio tracks referring to different instruments, song parts, songs or other audio signals in general.
- the method may be used in a DJ environment, for example in a DJ software application, in order to assist a DJ when mixing a piece of music with any other audio signal such as a second piece of music, and even to allow automatic, autonomous mixes without needing any human supervision.
- the method of the present invention may further include a step of playing the output track, including a playback through a PA system, loudspeakers, headphones or any other sound-reproducing equipment.
- the method of the present invention can be applied to any type of input audio track.
- the input audio track may be stored on a local device such as a storing means of a computer, and may be present as a digital audio file.
- the first audio track or the second audio track may be received as a continuous stream, for example a data stream received via Internet, a real-time audio stream received from a live audio source or from a playback device in playback mode.
- the range of applications is basically not limited to a specific medium.
- playback of the output track may be started while continuing to receive the continuous stream.
- decomposing first and/or second audio tracks is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal.
- Partitioning the first and/or second input signals into segments (preferably segments of equal lengths) and operating the method of the invention based on these segments allows using the decomposition result for generating the output track at an earlier point in time, i.e. after finishing decomposition of just one segment, without having to wait until the decomposition result of an entire audio file for example is available.
- decomposition of the second audio track can start at an arbitrary point within the second audio track. For example, when a transition is to be made from the first audio track towards the second audio track such as to start playback of the second audio track at e.g. 01:20 (one minute, twenty seconds), decomposition of the second audio track can start at the segment closest to 01:20, and the unused beginning part of the second audio track does not have to be decomposed. This saves processing resources and ensures that decomposition results are available much faster.
- one segment has a playback duration which is smaller than 20 seconds.
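- The segment-wise scheme can be sketched as a simple pipeline in which segment k plays while segment k+1 is being decomposed; `decompose` and `play` below are hypothetical stand-ins for the AI decomposition step and the audio output, not interfaces defined by this disclosure.

```python
import threading
from queue import Queue

def stream_decomposed(segments, decompose, play):
    """Play segment k while segment k+1 is being decomposed.

    `segments` is an iterable of raw audio chunks (e.g. 10 s each, below
    the 20-second bound mentioned above); `decompose` and `play` are
    hypothetical stand-ins for the AI decomposition and audio output.
    """
    done = Queue(maxsize=1)
    it = iter(segments)

    def work(seg):
        done.put(decompose(seg))

    try:
        threading.Thread(target=work, args=(next(it),)).start()
    except StopIteration:
        return  # no segments at all
    for nxt in it:
        current = done.get()                                # previous segment ready
        threading.Thread(target=work, args=(nxt,)).start()  # decompose next...
        play(current)                                       # ...while this one plays
    play(done.get())                                        # final segment
```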
- the method steps, in particular the steps of providing the first and second audio tracks, decomposing the mixed input data, analyzing the decomposed data and generating the output track, may be carried out in a continuous process, wherein a time shift between receiving the first audio track or a first portion of a continuous stream of the first audio track and obtaining the output track or the first segments of the output track is preferably less than 10 seconds, more preferably less than 2 seconds.
- At least one, preferably all of the mixed input data, the first and second audio tracks, the decomposed data, the output track, and the first and second output data represent stereo signals, each comprising a left channel signal portion and a right channel signal portion, respectively.
- the method is thus suitable for playing music at high quality.
- a device for processing audio data, preferably a device adapted to carry out a method according to at least one of the embodiments described above, said device comprising a first input unit for receiving a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres; a second input unit for receiving a second audio track; a decomposition unit for decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres; an analyzing unit for analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter; and an output generation unit for generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
- the second aspect of the invention provides a device having similar or corresponding features as the method of the first aspect of the present invention described above. Therefore, similar or corresponding effects and advantages may be achieved by a device of the second aspect of the present invention as described above for the first aspect of the present invention.
- a device of the second aspect of the invention may be adapted to carry out a method of the first aspect of the present invention.
- embodiments of the device of the second aspect of the present invention may be particularly adapted to carry out one or more of the steps described above for embodiments of the first aspect of the present invention in order to achieve the same effects and advantages.
- the device of the second aspect of the present invention is preferably embodied as a computer, in particular a tablet, a smartphone, a smartwatch or another wearable device, and may include, as conventionally known, a RAM, a ROM, a microprocessor and suitable input/output means. Included in the computer, or connected to it, may be an audio interface which may be connected, for example wirelessly (e.g. via Bluetooth or similar technology), to speakers, headphones or a PA system in order to output sound when playing the first and second output signals, respectively. As a further alternative, the device may be embodied as a standalone DJ device including suitable electronic hardware or computing means.
- the device preferably runs a suitable software application in order to control its hardware components, usually standard hardware components of general-purpose computers, tablets, smartphones, smartwatches or other wearable devices, such that they function as the units of the device of the second aspect and/or implement the steps of the method of the first aspect of the invention.
- the device preferably has a decomposition unit which includes the AI system comprising a trained neural network.
- the complete AI system including the trained neural network may be integrated within the device, for example as a software application or software plugin running locally in a memory integrated within the device.
- the device preferably includes a user interface embodied by either a display such as a touch display or a display to be operated by a pointer device, or as one or more hardware control elements such as a hardware fader or rotatable hardware knobs, or by a voice command or by any other user input/output technology.
- the above-mentioned object is achieved by a computer program which is adapted, when run on a computer, such as a tablet, a smartphone, a smartwatch or another wearable device, to carry out a method according to the first aspect of the present invention, or to control the computer as a device according to the second aspect of the present invention.
- a computer program according to the third aspect of the present invention therefore achieves the same or corresponding effects and advantages as described above for the first and second aspects of the present invention.
- the above-mentioned object is achieved by a method for processing audio data, comprising the steps of providing an audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres; decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres; and analyzing the decomposed data to determine a transition point or a song part junction between a first song part and a second song part within the audio track, or to determine any other track parameter.
- a method of the fourth aspect of the present invention allows determination of one or more song part junctions within a piece of music based on analyzing decomposed data.
- in particular, a song structure of an audio track containing mixed input data, i.e. of a song containing a plurality of different timbres, can be determined based on the decomposed data.
- Song parts may therefore be determined more accurately and more reliably.
- the junctions between the song parts provide valuable information to the user, in particular to a DJ or an audio engineer during music production.
- one or more junctions within a piece of music may be indicated graphically on a screen, and the method may allow a user to control a mixing process based on the one or more junctions, for example to jump to a junction, to cut out a song part between two junctions, to time-shift songs such as to synchronize junctions, etc.
- the method of the fourth aspect allows determination of any other track parameter, such as at least one of a tempo, a beat, a BPM value, a beat grid, a beat phase, a key and a chord progression of the respective audio track.
- a method for processing audio data comprising the steps of providing a set of audio tracks, each including mixed input data, said mixed input data representing audio signals containing a plurality of different timbres; decomposing each audio track of the set of audio tracks, such as to obtain a decomposed track associated with the respective audio track, wherein the decomposed track represents an audio signal containing at least one, but not all, of the plurality of different timbres of the respective audio track, thereby obtaining a set of decomposed tracks; analyzing each decomposed track of the set of decomposed tracks to determine at least one track parameter of the respective audio track which the decomposed track is associated with; selecting or allowing a user to select at least one selected audio track out of the set of audio tracks, based on at least one of the track parameters; and generating an output track based on the at least one selected audio track.
- a method of the fifth aspect of the present invention basically assists a user in selecting one of a plurality of audio tracks for further processing, in particular mixing, editing and playback.
- in a typical situation, a user is to select one of a plurality of pieces of music for which only conventional metadata are available, i.e. metadata provided through conventional music distribution services such as Internet streaming providers.
- the method according to the fifth aspect of the invention allows adding additional information related to the musical content of the particular audio tracks in the form of the at least one track parameter, wherein the track parameter, according to the fifth aspect of the invention, is determined through analyzing at least one decomposed track obtained from the particular audio track.
- the selection of songs is greatly assisted, in particular in cases where the candidate pieces of music are partially or fully unknown to the user. Selection and processing of music is thus improved, in particular for inexperienced users or when less common pieces of music are to be selected.
- automatic selection of audio tracks by an algorithm, based on the track parameter and without user interaction, can be implemented.
- playlists could automatically be generated based on timbres or proportions of individual timbres included in the audio tracks. For example, a non-vocal playlist or instrumental playlist could be generated by automatic selection of songs that do not contain vocal timbres.
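- A minimal sketch of such automatic playlist generation is given below, assuming each track carries a precomputed 'vocal_proportion' parameter in [0, 1] derived from its decomposed vocal stem; the data layout is illustrative only.

```python
def instrumental_playlist(tracks, max_vocal=0.05):
    """Select tracks whose decomposed vocal stem carries (almost) no energy.

    `tracks` is assumed to be a list of dicts with a precomputed
    'vocal_proportion' track parameter in [0, 1] -- an illustrative
    data layout, not one prescribed by this disclosure.
    """
    return [t for t in tracks if t['vocal_proportion'] <= max_vocal]
```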
- the track parameter may refer to at least one timbre of the respective audio track.
- the user may therefore be informed about timbres contained in the plurality of audio tracks.
- the method may indicate to a user which of a plurality of audio tracks contains vocal music or which tracks contain a predominant piano timbre. Audio tracks may be suitably marked or highlighted such as to inform the user about the timbres included therein, or the method may allow for sorting or filtering a list of audio tracks based on timbres.
- a DJ currently playing a song that includes vocals may look for a second song predominantly containing a guitar or a piano timbre, wherein the method of the fifth aspect of the invention may assist and accelerate such selection and/or even allow selection of guitar/piano songs from a list of audio tracks unknown to the user as such.
- the method of the fifth aspect of the invention may be useful to accelerate the process of selecting a suitable audio track.
- the track parameter may refer to at least one of a tempo, a beat, a BPM value, a beat grid, a beat phase, a key and a chord progression of the respective audio track.
- the at least one track parameter may likewise be indicated to the user by virtue of a suitable graphical representation, highlighting, coloring or numeral representation.
- sorting or filtering of lists of audio tracks may be based on the at least one track parameter.
- the method according to the fifth aspect of the invention may be used to search for a second song among a set of audio tracks, which contains the same or at least partially the same chord progression as the first song, such that mixing of the two songs or crossfading between the songs will result in a particularly continuous sound of the output track without audible breaks or dissonances.
- the selected audio track is just played back, in particular without mixing, editing or otherwise changing its content.
- the method of the fifth aspect of the invention may in particular be applied in a music player and may assist a user in finding and selecting a desired song for playback.
- the at least one track parameter relates to a beat grid of the respective audio tracks (for example a time signature)
- a user may be enabled to easily find songs of certain beat grids, for example three-four time songs from among a plurality of audio tracks.
- a second audio track may contain mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, wherein the mixed input data are decomposed to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, wherein analyzing may be carried out taking into account the decomposed data obtained from the second audio track. Accordingly, in the step of analyzing and determining the at least one mixing parameter, both the first audio track and the second audio track may be analyzed on the basis of their respective decomposed data.
- mixing parameters determined in this manner may include tempo, beat, BPM value, beat grid (the beats contained within a song, optionally including information about at least one of time signature, emphases and downbeat positions), beat phase, key, chord progression, song parts and song part junctions, etc.
- the mixed input data of the first and/or second audio track are decomposed to obtain at least decomposed data of a vocal timbre, decomposed data of a harmonic timbre and decomposed data of a drum timbre or to obtain exactly three decomposed tracks which are a decomposed track of a vocal timbre, a decomposed track of a harmonic timbre and a decomposed track of a drum timbre, wherein the three tracks preferably sum up to an audio track substantially equal to the first and/or second audio track, respectively.
- a vocal timbre may include a simple vocal component or a mixture of different vocal components of the piece of music.
- a drum timbre may include the sound of a single drum instrument, a drum ensemble, a percussion instrument, etc.
- the drum timbre usually does not contain harmonic information.
- a harmonic timbre may include timbres of harmonic instruments such as a piano, a guitar, synthesizers, brass, etc.
- Decomposition into vocal, drum and harmonic timbres produces the most important components defining the musical content and structure of most music, in particular most pieces of western music. Such decomposition therefore provides a good yet efficient basis for analyzing the audio data and determining at least one mixing parameter and/or at least one track parameter.
- decomposition into vocal, drum and harmonic timbres greatly assists the mixing process, i.e. generation of an output track based on mixing two or more of the decomposed tracks.
- FIG. 1 a shows a device according to an embodiment of the present invention
- FIG. 1 b shows a song select window that may be displayed by a device of the embodiment of the invention
- FIG. 2 shows a schematic functional diagram of components of the device of the embodiment shown in FIG. 1 a
- FIG. 3 shows a schematic illustration of an example mode of operation of the device shown in FIG. 1 a , 1 b and 2 , and a method for processing audio data according to an embodiment of the invention.
- a device 10 may be formed by a computer such as a tablet computer, a smartphone, a smartwatch or another wearable device, which comprises standard hardware components such as input/output ports, wireless connectivity, a housing, a touchscreen, an internal storage as well as a plurality of microprocessors, RAM and ROM.
- Essential features of the present invention are implemented in device 10 by means of a suitable software application or a software plugin running on device 10 .
- the display of device 10 preferably has a first section 12 a associated to a first song A and a second section 12 b associated to a second song B.
- First section 12 a includes a first waveform display region 14 a which displays at least one graphical representation of song A, in particular one or more waveform signals associated to song A.
- the first waveform display region 14 a may display a waveform of song A and/or one or more waveforms of decomposed signals obtained from decomposing song A.
- decomposition of song A may be carried out to obtain a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which may be displayed within the first waveform display region 14 a .
- a second waveform display region 14 b may be included in the second section 12 b such as to display a graphical representation related to song B in the same or corresponding manner as described above for song A.
- the second waveform display region 14 b may display one or more waveforms of song B and/or at least one waveform of a decomposed signal obtained from song B.
- first and second waveform display regions 14 a , 14 b may each display a play-head 16 a , 16 b , respectively, which show a current playback position within song A and song B, respectively.
- the first waveform display region 14 a may have a song select button A, which may be pressed by a user to select song A from among a plurality of audio tracks offered by an Internet provider or stored on a local storage device.
- a second waveform display region 14 b includes a song select button B, which may be activated by a user to select song B from a plurality of audio tracks.
- FIG. 1 b shows an example of a song select window, which may pop up when song select button A is activated by a user.
- the song select window offers a list of audio tracks and invites the user to select one of the audio tracks as song A.
- the list of audio tracks as shown in FIG. 1 b shows metadata of each audio track which include, for each audio track, a title, an artist name, a track length, a BPM value, a main timbre and timbre component data referring to proportions of individual timbres within the audio track.
- the title, the artist and the track length may be directly read from metadata of the audio file as usually provided through commercial music providers, or may be stored as metadata together with the audio data of the audio track on a storage device.
- the BPM value, the main timbre and the timbre component data are examples for track parameters in the sense of the present invention, which are usually not provided by the distributors with the original audio tracks but which are obtained by device 10 according to the embodiment of the invention through decomposing the particular audio track and then analyzing the decomposed data.
- for example, a BPM value can be obtained for a given audio track by analyzing its decomposed drum track. From a plurality of decomposed tracks associated to particular timbres, such as a vocal timbre, a harmonic/instrumental timbre or a drum timbre, information regarding the presence and/or distribution (i.e. relative proportions) of certain timbres, i.e. certain instruments, can be obtained.
- a predominant timbre of an audio track can be determined, which represents a main character of the music contained in the audio track and is denoted as a main timbre for each audio track in the example of FIG. 1 b .
- a proportion of a drum timbre within the audio track is indicated by a drum proportion indicator
- a proportion of a harmonic/instrumental timbre within the audio track is indicated by a harmonic/instrumental indicator
- a proportion of a vocal timbre within the audio track is indicated by a vocal indicator.
- the indicators may be formed by level indicators showing the proportion of the respective timbre from a minimum value (not present, for example 0) to a maximum value (maximum proportion, for example 5).
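- One plausible way to compute such indicator values is to take each stem's share of the summed RMS energy and quantize it to the 0-5 scale, as sketched below; the proportion definition is an assumption, not mandated by this disclosure.

```python
import numpy as np

def timbre_levels(stems, levels=5):
    """Map decomposed-stem energies to 0..5 indicator values.

    `stems` maps timbre names ('drums', 'harmonic', 'vocals') to the
    decomposed signals; each proportion is taken as that stem's share
    of the summed RMS energy -- one plausible definition among several.
    """
    rms = {name: np.sqrt(np.mean(np.asarray(x, float) ** 2))
           for name, x in stems.items()}
    total = sum(rms.values()) or 1.0
    return {name: round(levels * r / total) for name, r in rms.items()}

# The timbre with the highest level would be shown as the "main timbre".
```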
- device 10 may analyze decomposed harmonic tracks (instrumental, vocals etc.) of the audio tracks in order to determine a key or a chord progression as track parameters of the audio tracks.
- each of the first and second sections 12 a and 12 b may further include a number of control elements for controlling playback, effects and other features related to song A and song B, respectively.
- the first section 12 a may include a play button 18 a which can be pushed by a user to alternatively start and stop playback of song A (more precisely audio signals obtained from Song A, such as decomposed signals).
- the second section 12 b may include a play button 18 b which may be pushed by a user to alternatively start and stop playback of song B (more precisely audio signals obtained from Song B, such as decomposed signals).
- An output signal generated by device 10 in accordance with the settings of device 10 and with a control input received from a user may be output at an output port 20 in digital or analog format, such as to be transmitted to a further audio processing unit or directly to a PA system, speakers or head phones. Alternatively, the output signal may be output through internal speakers of device 10 .
- device 10 can perform a smooth transition from playback of song A to playback of song B by virtue of a transition unit, which will be explained in more detail below.
- device 10 may comprise a transition button 22 displayed on the display of device 10 , which may be pushed by a user to initiate a transition from playback of song A towards playback of song B.
- transition button 22 By a single operation of transition button 22 (pushing the button 22 ), device 10 starts changing individual volumes of individual decomposed signals of songs A and B according to respective transition functions (volume level as a function of time) such as to smoothly cross-fade from song A to song B within a predetermined transition time interval.
- Pressing the transition button 22 can directly or immediately start the transition from song A to song B or may control a transition unit, which is to be described in more detail later, such as to analyze decomposed signals of song A and/or song B in order to determine at least one mixing parameter and to play an automatic transition based on the at least one mixing parameter.
- a suitable transition point i.e. a suitable first transition point on the timeline of song A and/or a suitable second transition point on the timeline of song B, and/or a length of a transition portion (duration of the transition) may be determined by the transition unit in response to an activation of transition button 22 .
- device 10 may include a transition controller 24 which can be moved by a user between one controller end point referring to playback of only song A and a second controller end point referring to playback of only song B. This allows controlling the volumes of individual decomposed signals of songs A and B using transition functions which are based not on time but on the controller position of the transition controller 24 . In this manner, in particular the speed and progress of the transition can be controlled manually through the transition controller 24 .
- FIG. 2 shows a schematic illustration of internal components of device 10 and a signal flow within device 10 .
- Audio processing is based on a first input track and a second input track, which may be stored within the device 10 , for example in an internal memory of the device, a hard drive or any other storage medium.
- First and second input tracks are preferably digital audio files of a standard compressed or uncompressed audio file format such as mp3, WAV, AIFF or the like.
- first and second input tracks may be received as continuous streams, for example via an Internet connection of device 10 or from an external playback device via an input audio interface or via a microphone.
- First and second input tracks are preferably processed within first and second input units 26 a and 26 b , respectively, which may be configured to decrypt or decompress the audio data, if necessary, and/or may be configured to extract a segment of the first input track and a segment of the second input track in order to continue processing based on the segments.
- This has an advantage that time-consuming processing algorithms, such as the decomposition based on a neural network, will not have to analyze the entire first or second input track upfront, but will perform processing based on shorter segments, which allows continuing processing and eventually start playback at an earlier point in time.
- the outputs of the first and second input units 26 a , 26 b form first and second input signals, which are input into first and second AI systems 28 a , 28 b of a decomposition unit 40 .
- Each AI system 28 a , 28 b includes a neural network trained to decompose the first and second input signals, respectively, with respect to sound components of different timbres.
- Decomposition unit 40 thus decomposes the first input signal to obtain a first group of decomposed signals and decomposes the second input signal to obtain a second group of decomposed signals.
- each group of decomposed signals includes a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which each form a complete set of decomposed signals or a complete decomposition, which means that a sum of all decomposed signals of the first group will resemble the first input signal, and the sum of all decomposed signals of the second group will resemble the second input signal.
- decomposition unit 40 may also include only one AI system and only one neural network, which is trained and configured to determine all decomposed signals of the first input signal as well as all decomposed signals of the second input signal.
- more than two AI systems may be used, for example a separate AI system and a separate neural network may be used to generate each of the decomposed signals.
- Playback unit 42 comprises a transition unit 44 , which is basically adapted to recombine the decomposed signals of both groups taking into account specific volume levels associated to each of the decomposed signals.
- Transition unit 44 is configured to recombine the decomposed signals in such a manner as to either play only a first output signal obtained from a sum of all decomposed signals of the first input signal, or a second output signal obtained from a sum of all decomposed signals of the second input signal, or any transition in between the first and the second output signals where decomposed signals of both first and second input signals are played.
- transition unit 44 may store individual transition functions DA, VA, HA, DB, VB, HB for each of the decomposed signals, which each define a specific volume level for each time frame within a transition interval, i.e. a time interval in which one of the songs A and B is crossfaded into the other song (first and second output signals are crossfaded in one or the other direction), or for each controller position of the transition controller within a controller range. Taking into account the respective volume levels according to the respective transition functions DA, VA, HA, DB, VB, HB, all decomposed signals will then be recombined to obtain the output signal.
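- The recombination according to per-stem transition functions can be sketched as follows; linear fades are used as placeholder shapes for DA, VA, HA, DB, VB, HB, and song B's stems are assumed to be already time-aligned so that its transition point coincides with the start of the transition interval.

```python
import numpy as np

def crossfade(stems_a, stems_b, sr, start, end, funcs_a=None, funcs_b=None):
    """Recombine decomposed stems of songs A and B over a transition interval.

    `stems_a`/`stems_b` are lists of mono numpy signals (e.g. drums,
    vocals, harmonic); each transition function maps a progress value p
    in [0, 1] to a volume level. Linear fades stand in for the per-stem
    curves DA, VA, HA, DB, VB, HB described above.
    """
    n = min(min(len(s) for s in stems_a), min(len(s) for s in stems_b))
    i0, i1 = int(start * sr), int(end * sr)
    # progress through the transition interval: 0 before it, 1 after it
    p = np.clip((np.arange(n) - i0) / max(i1 - i0, 1), 0.0, 1.0)
    fade_out = funcs_a or [lambda p: 1.0 - p] * len(stems_a)
    fade_in = funcs_b or [lambda p: p] * len(stems_b)
    out = np.zeros(n)
    for stem, f in zip(stems_a, fade_out):
        out += f(p) * stem[:n]
    for stem, f in zip(stems_b, fade_in):
        out += f(p) * stem[:n]
    return out
```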
- Playback unit 42 may further include a control unit 45 , which is adapted to control at least one of the transition functions DA, VA, HA, DB, VB, HB based on a user input.
- the output signal generated by playback unit 42 may then be routed to an output audio interface 46 for a sound output.
- one or more sound effects may be inserted into the audio signal by means of one or more effect chains 48 .
- effect chain 48 is located between playback unit 42 and output audio interface 46 .
- FIG. 3 illustrates an operation of transition unit 44 according to an embodiment of the present invention and a method for processing audio data according to an embodiment of the present invention.
- Decomposed data as received from the first input track (first audio track) representing song A comprises, in the particular embodiment, a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal (denoted by drum, vocal and harmonic in FIG. 3 ).
- Decomposed data received from the second input track (second audio track) relating to song B comprises a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal (denoted by drum, vocal and harmonic in FIG. 3 ).
- the decomposed signals are each shown by respective waveforms, wherein the horizontal axis represents the timeline of song A and the timeline of song B, respectively, and the vertical axis represents the time-dependent amplitude of the corresponding audio signal.
- the decomposed signals are analyzed to determine at least one mixing parameter.
- the decomposed drum signal of song A is analyzed to determine, inter alia, a tempo value, a BPM value and a beat grid of song A
- a decomposed drum signal of song B is analyzed to determine, inter alia, a tempo value, a BPM value and a beat grid of song B.
- the algorithm can then determine a rhythmic pattern of song A, including a first beat at the beginning of song A at a time t0 and a sequence of beats following one another at substantially equal time intervals, wherein four beats form a bar, resulting in a beat grid of a four-four time type.
- the bars are denoted by vertical lines, wherein each bar includes four beats that are not illustrated.
- transition unit 44 analyzes the decomposed drum signal of song B in order to determine beats, bars, a tempo, a BPM value, a beat grid etc., as mixing parameters of song B.
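- A deliberately simple stand-in for this analysis step (not the algorithm of this disclosure) is to build an onset-strength envelope from the decomposed drum signal and pick the strongest autocorrelation lag in a plausible tempo range:

```python
import numpy as np

def estimate_bpm(drums, sr, hop=512, lo=70.0, hi=180.0):
    """Rough BPM estimate from a mono decomposed drum signal.

    Builds an onset-strength envelope (positive energy increments per
    hop) and returns the tempo whose autocorrelation lag is strongest
    within [lo, hi] BPM -- a sketch, not the disclosure's algorithm.
    """
    n = len(drums) // hop
    energy = np.array([np.sum(drums[i * hop:(i + 1) * hop] ** 2) for i in range(n)])
    onset = np.maximum(np.diff(energy), 0.0)
    ac = np.correlate(onset, onset, mode='full')[len(onset) - 1:]
    lags = np.arange(1, len(ac))
    bpm = 60.0 * sr / (hop * lags)
    valid = (bpm >= lo) & (bpm <= hi)
    if not np.any(valid):
        return None
    best = lags[valid][np.argmax(ac[1:][valid])]
    return 60.0 * sr / (hop * best)
```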
- a structure of song A and/or song B, i.e. a sequence of song parts such as intro, verse, bridge, chorus, interlude and outro, may be detected as mixing parameters by analyzing the decomposed data.
- the decomposed drum signal of song A shows a first pattern within the first four bars of the song, whereas in the following eight bars (bars 5 to 12), the drum timbre shows a second pattern different from the first pattern.
- in the following eight bars (bars 13 to 20), silence is detected in the decomposed drum signal, which means that the drums have a break for eight bars. Then, throughout the rest of song A, the decomposed drum data again show the first pattern.
- analyzing the decomposed vocal signal reveals that the first four bars as well as the last four bars of song A do not contain vocals (decomposed vocal signal is silent), whereas the rest of song A contains vocals.
- the decomposed harmonic signal is analyzed by a chord/harmony detection algorithm known as such in the prior art, in order to detect a chord progression of the harmonic components of song A. Since the decomposed harmonic signal does not contain the vocal components and the drum components of the original audio track, the chord/harmony detection algorithm can operate with much higher accuracy and reliability. Accordingly, a sequence of chords is detected, which usually changes for each bar.
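- As a sketch of such a chord detection step, chroma features of the decomposed harmonic signal can be matched against binary triad templates, one chord per bar; this uses librosa's chroma computation and is an illustrative stand-in for the prior-art algorithm referred to above.

```python
import numpy as np
import librosa

# Binary triad templates: one major and one minor chord per root note.
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
TEMPLATES, LABELS = [], []
for root in range(12):
    for quality, intervals in (('major', (0, 4, 7)), ('minor', (0, 3, 7))):
        t = np.zeros(12)
        t[[(root + i) % 12 for i in intervals]] = 1.0
        TEMPLATES.append(t / np.linalg.norm(t))
        LABELS.append(f'{NOTES[root]} {quality}')
TEMPLATES = np.array(TEMPLATES)

def chords_per_bar(harmonic, sr, bar_seconds):
    """One chord label per bar, estimated from the decomposed harmonic signal."""
    chords, step = [], int(bar_seconds * sr)
    for i in range(0, len(harmonic) - step + 1, step):
        chroma = librosa.feature.chroma_stft(y=harmonic[i:i + step], sr=sr).mean(axis=1)
        chroma /= np.linalg.norm(chroma) + 1e-12
        chords.append(LABELS[int(np.argmax(TEMPLATES @ chroma))])
    return chords
```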
- the chord progression shows a four-bar pattern which repeats three times within the first 12 bars, i.e. a pattern G major, D major, E minor, C major.
- in the following eight bars (bars 13 to 20), the chord progression deviates from the before-mentioned four-bar pattern and now shows a new four-bar pattern D major, E minor, C major, C major, which is repeated once to obtain eight bars in total.
- the first four-bar pattern that was played at the beginning of song A is then repeated until the end of song A.
- the method according to the embodiment of the invention can deduce, from analyzing the three decomposed signals of song A, particular song parts, namely: a first song part that may be called "intro", forming the first four bars of song A; a second song part that may be called "verse 1", forming the following eight bars after the intro; a third song part that may be called "bridge", forming the following eight bars after verse 1; a fourth song part that may be called "chorus 1", forming the following eight bars after the bridge; a fifth song part that may be called "interlude", forming the following four bars after chorus 1; a sixth song part that may be called "chorus 2", forming the following eight bars after the interlude; and a seventh song part that may be called "outro", forming the following four bars after chorus 2.
- the method thus recognizes different song parts and corresponding song part junctions, i.e. the junction between the last bar of a previous song part and the first bar of a following song part.
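- Once per-bar cues (drum activity, vocal activity, chord) are available, candidate junctions can be read off wherever the cues change from one bar to the next, as in the following sketch; the feature layout is hypothetical.

```python
def song_part_junctions(per_bar_features):
    """Candidate song part junctions from per-bar feature tuples.

    `per_bar_features` holds one tuple per bar, e.g. (drums_active,
    vocals_active, chord); a junction is proposed wherever the tuple
    changes from one bar to the next.
    """
    return [bar for bar in range(1, len(per_bar_features))
            if per_bar_features[bar] != per_bar_features[bar - 1]]

# e.g. a four-bar intro without vocals followed by verse 1 with vocals:
bars = [(True, False, 'G major')] * 4 + [(True, True, 'G major')] * 8
assert song_part_junctions(bars) == [4]   # junction after bar 4
```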
- the method may determine a song structure of song B by analyzing the decomposed drum signal, the decomposed vocal signal and the decomposed harmonic signal of song B.
- the method may determine that song B has a song structure comprising four bars of intro, eight bars of verse 1, eight bars of chorus 1, eight bars of verse 2, eight bars of chorus 2 and four bars of outro.
- the mixing parameters determined based on an analysis of the decomposed data of song A and song B as described above may be used by device 10 and in a method according to the embodiment of the present invention for assisting a DJ in mixing songs A and B or for achieving semi-automatic or even automatic mixing of songs A and B.
- the mixing parameters described above may simply be displayed on a screen of device 10 such as to inform a user of the device 10 , in particular show the detected song parts and thereby assist mixing.
- a DJ may recognize certain song parts or song part junctions as suitable transition points at which a crossfade from song A to song B or vice versa can suitably be initiated, for example by pressing transition button 22 or operating transition controller 24 at a suitable point in time.
- the device 10 and the method according to the embodiment of the invention may automatically generate an output track by automatically mixing songs A and B, for example by playing a transition from song A to song B at a suitable point in time as determined from the song structure.
- transition points may be determined as the mixing parameters based on the detected song parts. For example, a first transition point on the timeline of song A may be the end of the interlude of song A, whereas a second transition point on the timeline of song B may be the beginning of chorus 1 of song B.
- the device 10 may then generate an output track that plays song A from its beginning to shortly before the end of the interlude, then plays a cross fade to song B starting song B at the beginning of its chorus 1, and then plays the rest of song B from the beginning of chorus 1 till the outro of song B.
- Other examples for suitable transition points would be the end of chorus 2 of song A on the one hand, and the beginning of verse 1 of song B (or the beginning of chorus 1 of song B) on the other hand.
- song B could be played almost from the beginning after song A has reached almost its end. This could be used as an automatic crossfade between subsequent songs of a playlist, for example.
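- A sketch of such an automatic choice of transition points from two detected song structures follows; the (name, start_bar, end_bar) layout and the preference order (interlude end, falling back to chorus 2 end, into chorus 1) merely mirror the examples above.

```python
def pick_transition_points(parts_a, parts_b):
    """Choose transition points from two detected song structures.

    `parts_a`/`parts_b` are lists of (name, start_bar, end_bar) tuples,
    a hypothetical layout. Song A fades out at the end of its interlude
    (falling back to the end of chorus 2); song B fades in at the start
    of its chorus 1 (falling back to its beginning).
    """
    def end_of(parts, name):
        return next((end for n, _, end in parts if n == name), None)

    def start_of(parts, name):
        return next((start for n, start, _ in parts if n == name), None)

    out_bar = end_of(parts_a, 'interlude') or end_of(parts_a, 'chorus 2')
    in_bar = start_of(parts_b, 'chorus 1') or 0
    return out_bar, in_bar
```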
- T1 transition start time
- T3 transition end time
- step of decomposing includes processing the first audio signal and/or the second audio signal within an AI system comprising a trained neural network.
- step of decomposing includes decomposing the first audio signal and/or the second audio signal with regard to predetermined timbres, such as to obtain decomposed signals of different timbres, said timbres preferably being selected from the group consisting of:
- Method of item 9 wherein the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or wherein at least at a transition reference time and/or a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
- Method of item 9 or item 10 wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or wherein at least at a transition reference time and/or at a controller reference position a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function.
- Method of at least one of the preceding items further including a step of analyzing an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction.
- song parts of a song are usually distinguishable by an analyzing algorithm since they differ in several characteristics such as instrumental density, medium pitch or rhythmic pattern.
- Song parts may in particular be a verse, a chorus, a bridge, an intro or an outro as conventionally known.
- Certain instrumental or rhythmic patterns will remain constant within a song part and will change in the next song part.
- Recognition of song parts may be supported by analyzing not only the entire input signal but instead or in addition thereto at least one of the decomposed signals, as described in item 14. For example, by analyzing a decomposed bass signal in isolation from the remaining sound components, it will be easy to derive therefrom a chord progression of the song which is one of the key criteria to differentiate song parts.
- an analysis of the decomposed drum signals allows a more accurate recognition of a rhythmic pattern and thus a more accurate detection of certain song parts.
- a song part junction then refers to a junction between one song part and the next song part.
- transition time intervals may include song part junctions, which allows carrying out the transition between two songs at the end of a song part, further improving the smoothness and likeability of the transition.
- Song parts may be detected by analyzing at least one of the decomposed signals within an AI system comprising a trained neural network.
- such analyzing includes detecting silence within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than −30 dB.
- the step of analyzing decomposed signals may include detecting silence continuously extending over a predetermined time span within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than −30 dB.
- start and/or end points of silence may be taken as song part junctions.
- first input audio track and/or the second input audio track are received as a continuous stream, for example a data stream received via the Internet, a real-time audio stream received from a live audio source or from a playback device in playback mode, and wherein playback of the first output signal and/or second output signal is started while continuing to receive the continuous stream.
- decomposing first and/or second input signal is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal.
- Method of at least one of the preceding items wherein the method steps, in particular the steps of providing the first and second input signals, decomposing the first input signal, starting playback of the first output signal and starting playback of the second output signal, are carried out in a continuous process, wherein a time shift between receiving the first input audio track or a first portion of a continuous stream of the first input audio track and starting playback of the first output signal is preferably less than 10 seconds, more preferably less than 2 seconds, and/or wherein a time shift between receiving the second input audio track or a first portion of a continuous stream of the second input audio track and starting playback of the second output signal is preferably less than 10 seconds, more preferably less than 2 seconds.
- Device for processing audio signals comprising:
- the decomposition unit is configured to decompose the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal
- Device of item 30 wherein the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or wherein at least at a transition reference time and/or a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
- the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or wherein a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) and/or between the controller first end position and the controller second end position.
- Device of at least one of items 22 to 34 further including an analyzing unit configured to analyze an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction.
- Device of at least one of items 22 to 35 further including a user interface configured to accept a user input referring to a transition command, including at least one transition parameter, wherein the transition unit is configured to set at least one of the transition functions according to the transition parameter, wherein the transition parameter is preferably selected from the group consisting of:
- Device of item 36 wherein the device includes a display unit configured to display a graphical representation of the first input audio track and/or the second input audio track, wherein the user interface is configured to receive at least one transition parameter through a selection or marker applied by the user in relation to the graphical representation of the first input audio track and/or the second input audio track.
- Device of item 36 or item 37 wherein the device includes a display unit configured to display a graphical representation of at least one of the decomposed signals, wherein the user interface is configured to allow a user to assign or deassign a preset transition function to or from a selected one of the plurality of decomposed tracks.
- Device of at least one of items 22 to 38 further comprising a tempo matching unit configured to determine a tempo of the first and/or second input track, and to carry out a tempo matching processing based on the determined tempo, including a time stretching or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching tempos.
- Device of at least one of items 22 to 39 further comprising a key matching unit configured to determine a key of the first and/or second input track, and to carry out a key matching processing based on the determined key, including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys.
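- The tempo matching computation (complementing the key matching sketch given earlier) can be sketched as follows; note that the naive resampling shown shifts pitch along with tempo, which is why a pitch-preserving time-stretching algorithm, or a subsequent key matching step, would be used in practice.

```python
import numpy as np

def tempo_match_ratio(bpm_a, bpm_b):
    """Stretch factor for track B so that its tempo matches track A.

    A ratio above 1 means track B is faster than track A and must be
    slowed down, i.e. its audio stretched to a longer duration.
    """
    return bpm_b / bpm_a

def resample_naive(x, ratio):
    """Naive resampling by linear interpolation (changes pitch with tempo).

    A production implementation would use a pitch-preserving time
    stretcher (e.g. a phase vocoder), as the "time stretching or
    resampling" wording above implies.
    """
    n = int(len(x) * ratio)
    return np.interp(np.linspace(0, len(x) - 1, n), np.arange(len(x)), x)
```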
- a transition point as mentioned in the first to fifth aspects of the invention and in the claims may correspond to any of the transition start time, the transition end time and the transition reference time as described in the above items.
Description
- The present invention relates to a method for processing audio data based on one or more audio tracks of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres.
- Processing and reproducing audio data frequently involves mixing of different audio files. For example, in a DJ environment, two different audio tracks representing two different pieces of music are mixed when a DJ crossfades from one of the pieces of music to the other such as to avoid any audible interruption in the music performance. In other applications, such as during music production in a digital audio workstation (DAW), a mixing engineer mixes different audio tracks representing different instruments, vocals, etc. In a yet further example, during live broadcasting or live recording of a concert, a sound engineer is recording different audio sources such as different instruments or voices, by means of a plurality of microphones, pickups, etc., so as to produce mixed audio data for transmission through radio/TV broadcasting services or via the Internet.
- In all cases, mixing of audio tracks requires a significant amount of work by an experienced audio engineer or DJ to provide a satisfactory mixing result. The main parameters for successfully mixing audio tracks comprise the volumes of the audio tracks, the timing or phase of the audio tracks relative to one another, and audio effects that may be applied to the individual audio tracks before mixing. In order to correctly set those parameters such as to avoid any audio artefacts, dissonances or timing inaccuracies, the audio engineer may obtain information about the musical content of the individual audio tracks, including for example a key of the music, a tempo, a beat grid (time signature, beat emphases or accents etc.) or a particular instrument or a group of instruments contained in the audio tracks. Other relevant information relates to certain song parts such as a verse, a chorus, a bridge, an intro or an outro of a song. The audio engineer usually takes into account all of these parameters as mixing parameters when deciding about a suitable process for mixing particular audio tracks during production, processing or reproducing of audio.
- As a particular example, a DJ intending to change the song currently played usually tries to find a suitable transition point between the two songs, i.e. a point in time within the first song at which the first song is faded out, and a point in time within the second song at which the second song is faded in. For example, it may be advantageous to fade out the first song at the end of a chorus of the first song and, at the same time, to fade in the second song with the beginning of a verse of the second song. Accordingly, the DJ needs to determine the song parts of both songs in order to find a suitable transition point including a suitable timing for starting the second song. Furthermore, a transition between two songs can sound particularly smooth if both songs have the same or matching chords at the transition points and/or if both songs have mutually matching timbres, i.e. timbres which mix well with one another, for example a drum timbre and a piano timbre, while avoiding clashing of certain timbres, for example two vocal timbres at the same point in time at the transition point.
- As a result, the mixing of audio tracks requires a large amount of experience and attention of an audio engineer such that mixing of audio tracks is limited to professional applications.
- It was therefore an object of the present invention to provide a method and a device for processing audio data which assist mixing of audio tracks, in particular to obtain one or more mixing parameters that can be used to determine suitable mixing conditions or even allow semi-automatic or automatic mixing of audio tracks.
- According to a first aspect of the present invention, this object is achieved by a method for processing audio data, comprising the steps of providing a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres; decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres; providing a second audio track; analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter; generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
- Therefore, according to an important feature of the present invention, at least the mixed input data of the first audio track are decomposed such as to extract therefrom decomposed data representing only some of the timbres of the mixed input data, and the decomposed data are analyzed to determine at least one mixing parameter. Mixing of first and second audio tracks is then performed based on the at least one mixing parameter.
- By decomposing the mixed input data according to the different timbres contained therein, the content of the audio information contained in the mixed input data becomes accessible at a significantly higher level of detail or is even made available for analysis in the first place.
- For example, detection of the beats of a song can be achieved with higher accuracy when separating a drum timbre, and detecting a key or a chord progression of a piece of music can be achieved with higher certainty by analyzing decomposed data representing a bass timbre. The output track may then be generated by matching the beats or matching the keys of the two audio tracks before mixing the audio tracks.
- In the present disclosure, audio tracks, in particular the first audio track and the second audio track, may include digital audio data such as contained in audio files or digital audio streams. The files or streams may have a specific length or playback duration or alternatively may have an undefined or infinite length or playback duration, such as for example in case of a live stream or a continuous data stream received from a content provider via the Internet. Note that digital audio tracks are usually stored in an audio file in association with consecutive time frames, the length of each time frame being dependent on the sampling rate of the audio data as conventionally known. For example, in an audio file sampled at a sampling rate of 44.1 kHz, one time frame will have a length of 0.023 ms. Furthermore, audio tracks may be embodied by analog audio signals, for example signals played by an analog playback device such as a vinyl player, a tape player etc. In specific embodiments, audio tracks may be songs or other pieces of music provided in digital or analog format.
- Furthermore, the term “audio signal” refers to an audio track or any part or portion of an audio track at a certain position or time within the audio track. The audio signal may be a digital signal processed, stored or transmitted through an electronic control system, in particular computer hardware, or may be an analog signal processed, stored or transmitted by analog audio hardware such as an analog mixer, a PA system or the like.
- In a preferred embodiment of the present invention, the output track may comprise a first portion containing predominantly the first output data, and a second portion arranged after said first portion and containing predominantly the second output data. This method may be used in a DJ environment, in particular when mixing two songs using DJ equipment. In the first portion of the output track, only the first song is played as the first output data, while in a second portion only the second song is played as the second output data. The output track therefore switches from playback of the first song to playback of the second song.
- In the above embodiment, the step of analyzing audio data may include analyzing the decomposed data to determine a transition point as the mixing parameter, and the output track may be generated using the transition point such that the first portion is arranged before the transition point, and the second portion is arranged after the transition point. Thus, in a DJ application in which playback is switched from a first song to a second song, the method of the present invention may be used to find a suitable transition point at which playback of the songs is swapped.
- In particular, a transition point on the timeline of the output track may be defined by a first transition point on the timeline of the first audio track (e.g. corresponding to the first song) and a second transition point on the timeline of the second audio track (e.g. corresponding to the second song), wherein the output track then comprises the first portion containing predominantly the first output data obtained from the first audio track in a portion before the first transition point, and comprises the second portion containing predominantly the second output data obtained from the second audio track in a portion after the second transition point. Thus, the method of the invention may in particular include decomposing the first audio track to obtain first decomposed data, decomposing the second audio track to obtain second decomposed data, analyzing the first decomposed data to determine the first transition point as a first mixing parameter, analyzing the second decomposed data to determine the second transition point as a second mixing parameter, and generating the output track based on the first and second mixing parameters, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
- Since the decomposed data are analyzed, the transition point(s) may be found more appropriately to allow a smooth transition between the songs, for example at a point where the decomposed drum track has a break or pause such that abrupt rhythmic changes can be avoided. In another example, by analyzing a decomposed bass track, in particular a chord progression defined by the bass track, the end of a chorus, a verse or any other song part can be determined automatically and a transition point can be determined at a junction between adjacent song parts.
- Moreover, in the embodiments described above, the output track may further include a transition portion, which is a time interval larger than zero, arranged between the first portion and the second portion and associated to (including) the transition point on the timeline of the output track, wherein in the transition portion a volume level of the first output data is reduced and/or a volume level of the second output data is increased. Therefore, within some sections of the transition portion or even during the entire transition portion, first output data and second output data overlap, i.e. are mixed to be played at the same time, wherein the volume levels of the first output data and the second output data may be adjusted to allow for a smooth transition from the first output data to the second output data without sudden breaks, sound artefacts or dissonant mixes. For example, the volume of the first output data may be continuously decreased over a part or the entire transition portion, while the volume level of the second output data may be continuously increased over a part or the entire transition portion. Transitions of the above-described type are called crossfades.
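- By way of illustration only, a crossfade of the above-described type may be sketched as follows, assuming mono sample arrays at a common sample rate; the equal-power curve is one common choice for the continuous volume ramps, and all names are illustrative rather than part of the disclosure:

```python
import numpy as np

def crossfade(first: np.ndarray, second: np.ndarray, sr: int,
              fade_seconds: float = 8.0) -> np.ndarray:
    """Overlap the tail of `first` with the head of `second` and crossfade."""
    n = int(fade_seconds * sr)
    t = np.linspace(0.0, 1.0, n)          # progress through the transition portion
    gain_out = np.cos(t * np.pi / 2)      # volume of first output data falls to 0
    gain_in = np.sin(t * np.pi / 2)       # volume of second output data rises to 1
    overlap = first[-n:] * gain_out + second[:n] * gain_in
    # First portion (song A only), transition portion, second portion (song B only).
    return np.concatenate([first[:-n], overlap, second[n:]])
```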
- As stated above, according to an important feature of the present invention, audio data, which include at least the decomposed data, are analyzed to determine one or more mixing parameters. Basically all parameters having an influence on the mixing process qualify as mixing parameters in the sense of the present invention. Mixing parameters therefore include, but are not limited to, the following examples:
- The mixing parameter may be a tempo of the first and/or second audio track, in particular a BPM (beats per minute) of the first and/or second audio track. Generation of the output track, i.e. mixing, may then include a tempo matching process in which the tempo or BPM of at least one of the first and second audio tracks or at least one of the first and second output data may be changed, such that the audio tracks or output data have the same or matching tempi or BPM. By analyzing decomposed data, for example a drum timbre, the tempo or BPM can be determined with higher accuracy and/or higher reliability.
- In a further embodiment of the invention, the at least one mixing parameter may refer to a beat grid of the first and/or second audio track. The beat grid refers to the rhythmic framework of a piece of music. In particular, the individual beats of each bar, including optionally information about time signature (for example a three-four time, a four-four time, a six-eight time, etc.), beat emphases or accents etc., may form the beat grid of a piece of music. Although conventional algorithms are known to recognize a beat grid of a piece of music, according to the present invention, the beat grid may be determined as a mixing parameter based on analyzing decomposed data, for example decomposed drum data or decomposed bass data. Since the beat grid is frequently determined by a drum pattern or a bass pattern, the beat grid can be determined with higher accuracy and higher reliability according to the present invention. Based on a determined beat grid, the step of generating an output track may take into account the determined beat grid or the determined beat grids of the first and/or second audio track by synchronizing the beat grids of the two audio tracks. Synchronizing beat grids may comprise resampling of audio data of the first and/or second audio track such as to stretch or compress the tempo of at least one of the audio tracks and thereby match the beat grids of the audio data.
- In another embodiment of the invention, the at least one mixing parameter may refer to a beat phase of the first and/or second audio track. The beat phase relates to a position (i.e. a timing) on the timeline of a piece of music comprising multiple bars, each bar having multiple beats according to the time signature of the music, wherein the beat phase is defined relative to a beginning of the current bar, i.e. relative to the previous downbeat position (first beat of a bar). For example, by matching beat phases of two pieces of music defined by the first and second audio tracks, a timing of the two pieces of music relative to their respective downbeat positions can be synchronized to achieve smooth mixing of the audio data without rhythmic artefacts. Synchronizing beat phase may comprise time-shifting the audio tracks relative to one another such as to achieve matching beats.
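- The tempo matching and beat-phase matching described above may be sketched, under the assumption that BPM values and one reference beat position per track are already known, as follows; naive linear-interpolation resampling is used for brevity (it alters pitch along with tempo, whereas a production system would typically use pitch-preserving time-stretching), and all names are illustrative:

```python
import numpy as np

def tempo_and_phase_match(b: np.ndarray, sr: int, bpm_a: float, bpm_b: float,
                          beat_a: float, beat_b: float) -> np.ndarray:
    """Stretch track B to track A's tempo, then shift it so that a known beat
    of B (beat_b, in seconds) coincides with a known beat of A (beat_a)."""
    ratio = bpm_a / bpm_b                     # >1 speeds B up, <1 slows it down
    positions = np.arange(0, len(b) - 1, ratio)
    left = positions.astype(int)
    frac = positions - left
    stretched = (1.0 - frac) * b[left] + frac * b[left + 1]
    # After stretching, B's reference beat has moved to beat_b / ratio seconds.
    shift = int(round((beat_a - beat_b / ratio) * sr))
    if shift >= 0:
        return np.concatenate([np.zeros(shift), stretched])  # delay B
    return stretched[-shift:]                                 # advance B
```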
- In a further embodiment of the present invention, the at least one mixing parameter may refer to a downbeat position within a first and/or a second audio track. In audio data containing music comprising a plurality of bars, a downbeat position refers to the position of the first beat of each bar. By analyzing decomposed data referring to an instrument of a rhythm section of the piece of music, for example drums, percussion, bass, rhythm guitar, etc., determination of the downbeat position can be achieved with higher accuracy and higher reliability as compared to results achieved by analyzing the mixed input data. In the step of generating an output track, first and second output data may be mixed in such a manner that their respective downbeat positions are synchronized in order to avoid any rhythmic clashes in the mix.
- In a further embodiment of the present invention, the at least one mixing parameter may refer to a beat shift between the first audio track and the second audio track. This embodiment achieves advantages similar to those described above for the mixing parameters beat grid, beat phase or downbeat position. In particular, if the beat shift between the first and second audio tracks is determined as the mixing parameter, smooth mixing may be achieved by introducing a time shift between the first output data and the second output data in such a manner as to achieve zero beat shift or a beat shift equal to one or more beats.
- According to a further embodiment of the present invention, the at least one mixing parameter may refer to a key or a chord progression of the first and/or second audio track. As used herein, a chord progression of a piece of music is a time-dependent parameter which denotes certain chords or root tones at certain points in time on the timeline of the music, such as for example C Major, C Major 7, A Minor etc. A key of the music is basically constant over the whole piece of music and relates to the root or key note of the tonic (home key) of the piece of music. Mixing of a first audio track and second audio track, in particular mixing of different pieces of music or different portions or components of a piece of music, achieves more favorable results, if the two audio tracks have equal or mutually matching keys. This will avoid any harmonic dissonances or other sound artefacts. Therein, mutually matching keys may refer to keys which have a total interval of a fourth, a fifth or an octave or multiples thereof in between. However, in order to achieve certain artistic effects, other intervals may be regarded as matching in the sense of the present invention. Although it is in general known to determine the key of an audio track, according to the present invention, the key of the first and/or second audio track is determined by decomposing the input audio data and analyzing the decomposed data obtained in the step of decomposing. This will achieve more accurate and more reliable results. For example, it may be advantageous to analyze decomposed bass data or decomposed guitar data or decomposed piano data, etc., as these instruments usually play an important role in defining the harmony of a piece of music and thereby the relevant key of the music.
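- A minimal sketch of testing for mutually matching keys and of deriving a pitch-shift value from two detected root notes, using the intervals named above (unison/octave, fourth, fifth), may look as follows; all names are illustrative:

```python
NOTE_TO_SEMITONE = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def keys_match(root_a: str, root_b: str) -> bool:
    """Keys match when the root interval is a unison/octave, fourth or fifth."""
    interval = (NOTE_TO_SEMITONE[root_b] - NOTE_TO_SEMITONE[root_a]) % 12
    return interval in (0, 5, 7)

def pitch_shift_semitones(root_a: str, root_b: str) -> int:
    """Smallest pitch shift to apply to track B so that both keys coincide."""
    up = (NOTE_TO_SEMITONE[root_a] - NOTE_TO_SEMITONE[root_b]) % 12
    return up if up <= 6 else up - 12        # prefer the shorter direction

# keys_match("C", "G") -> True (fifth); pitch_shift_semitones("C", "D") -> -2
```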
- Furthermore, by analyzing a chord progression, valuable information may be obtained regarding the structure of a piece of music, such as the sequence of particular song parts, for example verses, choruses, bridges, intros, outros, etc. In particular, in songs of western music, the same chord progressions are usually used for each verse or for each chorus. Analyzing a chord progression may therefore be useful to find particular positions within the first audio track, which are suitable for mixing with a particular position in the second audio track such that these positions qualify as first and second transition points for generating a crossfade from the first audio track to the second audio track as described above, for example. In another example, by identifying equal or similar chord progressions within a first portion within the first audio track and a second portion within the second audio track, the invention may generate an output track in which the first output data and the second output data are mixed together with similar volumes during a portion corresponding to the first and second portions, to create a mashup of two songs, while predominantly only the first output data or the second output data may be contained in the mix in other portions of the output track.
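- A naive sketch of locating equal chord progressions within two tracks, assuming each progression is given as one chord symbol per bar, could look as follows; the four-bar minimum and all names are illustrative:

```python
def shared_progressions(prog_a: list[str], prog_b: list[str],
                        min_bars: int = 4) -> list[tuple[int, int]]:
    """Bar positions (in A, in B) where both tracks play the same chord
    sequence for at least `min_bars` bars; candidates for mashup portions."""
    hits = []
    for i in range(len(prog_a) - min_bars + 1):
        for j in range(len(prog_b) - min_bars + 1):
            if prog_a[i:i + min_bars] == prog_b[j:j + min_bars]:
                hits.append((i, j))
    return hits

# shared_progressions(["Am", "F", "C", "G"] * 2, ["Am", "F", "C", "G"])
# -> [(0, 0), (4, 0)]
```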
- In a further embodiment of the present invention, the at least one mixing parameter may refer to a timbre or a group of timbres of the first and/or second audio track. This embodiment is based on the idea that some timbres mix better than other timbres. For example, a vocal timbre mixes well with instrumental timbres such as a guitar timbre or a piano timbre, while mixing of two vocal timbres is usually unfavorable due to the clashing of the two voices. Furthermore, timbres transporting strong harmonic information may be more difficult to mix with other harmonic timbres, but may more easily be combined with non-harmonic timbres such as drums. In essence, determining that the first and/or the second audio track contains a particular timbre, for example within a predetermined time interval of the respective track, may be useful information for the user to assist mixing or may even allow a semi-automatic or automatic mixing of the audio tracks.
- In a further embodiment of the present invention, the at least one mixing parameter may refer to a song part junction of the first and/or second audio track. As mentioned already above, song part junctions may be suitable positions within a song at which various mixing effects, including crossfades or transitions to another song, remixing with another song, audio effects (reverb, loop effects, equalizer etc.), may be applied in a natural manner. The determination of song part junctions can therefore be used to assist the mixing process or to allow for semi-automatic or even automatic mixing of two audio tracks. According to the present invention, the mixing parameter, in this example a song part junction, may be determined by analyzing decomposed data. Thus, a component of the audio mix that most clearly represents the structure of the song, for example a bass component, may be used to more accurately and more reliably determine the song part junctions.
- It should be noted that any of the above-mentioned mixing parameters is suitable to achieve the effects of the present invention, in particular to assist the mixing process. However, the results will become even better if a plurality of different mixing parameters are determined by analyzing the same or different decomposed data. For example, a structure of a piece of music can be determined with particularly high accuracy and reliability, if for example a first mixing parameter referring to a beat grid is determined by analyzing decomposed drum data, and a second mixing parameter relating to a chord progression is determined by analyzing decomposed bass data, while a third mixing parameter relating to a song part junction may then be determined based on the determined structure of the piece of music, i.e. based on the first mixing parameter and the second mixing parameter.
- The step of analyzing audio data may include detecting silence data within the decomposed data, said silence data preferably representing an audio signal having a volume level smaller than −30 dB. Herein a value of −30 dB refers to −30 dB FS (peak), i.e. to a volume level which is 30 dB smaller than the volume level of the loudest sound of the track. Alternatively, said silence data preferably represents an audio signal having a volume level smaller than −60 dB FS (RMS), i.e. referring to the absolute mean value. Silence within a particular timbre of the audio data, i.e. silence of a particular musical instrument or a voice component, may provide valuable information regarding the structure of the piece of music. For example, a bridge part is often characterized by a certain interval, such as four, eight or sixteen bars of silence in the bass component of the music. Further, while an intro part of a song is usually without any vocal timbres, the onset of the vocals may be an indication of the beginning of the first verse. Therefore, the step of analyzing audio data may preferably include detecting silence data continuously extending over a predetermined time span, for example over a time span of one, two, four, eight, twelve or sixteen bars, thus indicating a certain song part. Furthermore, an onset of a signal or a first signal peak within decomposed data after the predetermined time span of silence may indicate a downbeat position of a next song part, i.e. a song part junction.
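- Such silence detection on decomposed data may be sketched as follows, applying the −30 dB FS peak criterion per beat-long window and requiring a minimum run length expressed in 4/4 bars; the −60 dB FS RMS criterion could be implemented analogously, and all names are illustrative:

```python
import numpy as np

def silent_spans(stem: np.ndarray, sr: int, bpm: float, bars: int = 4,
                 peak_db: float = -30.0) -> list[tuple[float, float]]:
    """Spans (start, end) in seconds where a decomposed stem stays below the
    peak threshold for at least `bars` bars of 4/4, hinting at song parts."""
    full_scale = max(float(np.max(np.abs(stem))), 1e-12)  # 0 dB FS reference
    beat = int(sr * 60.0 / bpm)                           # one beat per window
    peaks = [float(np.max(np.abs(stem[i:i + beat])))
             for i in range(0, len(stem), beat)]
    silent = [20 * np.log10(p / full_scale + 1e-12) < peak_db for p in peaks]
    spans, start = [], None
    for i, s in enumerate(silent + [False]):   # sentinel closes a trailing run
        if s and start is None:
            start = i
        elif not s and start is not None:
            if i - start >= bars * 4:          # number of beats in `bars` bars
                spans.append((start * beat / sr, i * beat / sr))
            start = None
    return spans
```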
- According to a preferred embodiment of the present invention, the step of analyzing audio data may include determining at least a first mixing parameter based on the decomposed data, and at least a second mixing parameter based on the first mixing parameter. For example, the first mixing parameter may be a key of the first or second audio track, while the second mixing parameter may be a pitch shift value referring to a pitch shift to be applied to either one of the first and second audio tracks such as to match the keys of the first and second audio tracks. In another example, the second mixing parameter may be the transition point at which the output track includes a transition from the first output data to the second output data, for example by means of a crossfade. If the second mixing parameter is the transition point, the first mixing parameter may for example be a song part junction, a beat phase or any other mixing parameter referring to a particular position or range within a piece of music relative to the musical content (song parts, bars, musical breaks, etc.). Such embodiments are particularly suitable to allow a DJ to find suitable transition points for changing from a first song to a second song. In particular, if the transition point is one of the mixing parameters, semi-automatic or automatic transitions can be realized in which a user, for example a DJ, just inputs his/her intention to change from playback of the first song towards playback of the second song or just specifies which songs should be mixed, wherein a suitable transition point is then automatically determined by a computer program according to a method of the present invention. One or more suitable transition points may then be proposed to the DJ for manual selection (semi-automatic mixing) or, alternatively, mixing is automatically initiated and carried out at a suitable transition point without any further user interaction (automatic mixing).
- Methods according to the first aspect of the invention use a step of decomposing mixed input data to obtain decomposed data. Several decomposing algorithms and services are known in the art, which allow decomposing audio signals to separate therefrom one or more signal components of different timbres, such as vocal components, drum components or instrumental components. Such decomposed signals and decomposed tracks have been used in the past to create certain artificial effects such as removing vocals from a song to create a karaoke version of a song.
- More specifically, with regard to decomposing audio data there have also been several approaches based on artificial intelligence (AI) and deep neural networks in order to decompose mixed audio signals and separate therefrom signals of certain timbres. Such AI systems usually implement a convolutional neural network (CNN), which has been trained by a plurality of data sets, for example each including a vocal track, a harmonic/instrumental track and a mix of the vocal track and the harmonic/instrumental track. Examples of such conventional AI systems capable of separating source tracks such as a singing voice track from a mixed audio signal include: Prétet, “Singing Voice Separation: A study on training data”, Acoustics, Speech and Signal Processing (ICASSP), 2019, pages 506-510; “spleeter”, an open-source tool provided by the music streaming company Deezer based on the teaching of Prétet above; “PhonicMind” (https://phonicmind.com), a voice and source separator based on deep neural networks; “Open-Unmix”, a music source separator based on deep neural networks in the frequency domain; and “Demucs” by Facebook AI Research, a music source separator based on deep neural networks in the waveform domain. These tools accept music files in standard formats (for example MP3, WAV, AIFF) and decompose the song to provide decomposed/separated tracks of the song, for example a vocal track, a bass track, a drum track, an accompaniment track or any mixture thereof.
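- As an illustration of how one of the cited tools may be invoked, the following sketch uses the Python API of “spleeter” as published by Deezer; the exact module, class and model names depend on the installed version:

```python
# pip install spleeter
from spleeter.separator import Separator

# 'spleeter:4stems' separates vocals, drums, bass and other (accompaniment),
# which maps onto the vocal/drum/harmonic timbres discussed in this disclosure.
separator = Separator('spleeter:4stems')

# Writes vocals.wav, drums.wav, bass.wav and other.wav below output/song/.
separator.separate_to_file('song.mp3', 'output/')
```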
- In general, all types of decomposing algorithms can be used for decomposing the mixed input data. Different algorithms, for example algorithms as known in the art and mentioned above, achieve different results with respect to quality of the decomposition and speed of processing. Preferably, in embodiments of the present invention, the step of decomposing the mixed input data includes processing the mixed input data, in particular the first audio track and/or the second audio track, within an AI system comprising a trained neural network. AI systems achieve a high level of quality and in particular allow decomposing different timbres of a mixed audio signal, which in particular may correspond to or resemble certain source tracks that were originally mixed when producing or generating the input audio track, such as certain instrumental tracks, vocal tracks, drum tracks etc. More particularly, the step of decomposing may include decomposing the first/second audio tracks with regard to predetermined timbres such as to obtain decomposed signals of different timbres, preferably being selected from the group consisting of a vocal timbre, a non-vocal timbre, a drum timbre, a non-drum timbre, a harmonic timbre, a non-harmonic timbre, and any combination thereof. The non-vocal timbre, the non-drum timbre and the non-harmonic timbre may in particular be respective complement signals to that of the vocal timbre, the drum timbre and the harmonic timbre.
- Complement signals may be obtained by excising from the input signal one decomposed signal of a specific timbre. For example, an input signal may be decomposed or separated into two decomposed signals, a decomposed vocal signal of a vocal timbre, and its complement, a decomposed non-vocal signal of a non-vocal timbre, which means that a mixture of the decomposed vocal signal and the decomposed non-vocal signal results in a signal substantially equal to the input signal. Alternatively, decomposition can be carried out to obtain a decomposed vocal track and a plurality of decomposed non-vocal tracks such as a decomposed drum track and a decomposed harmonic track (including harmonic instruments such as guitars, piano, synthesizer).
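- Assuming sample-aligned signals at equal scale, a complement signal as described above is simply the difference between the input signal and the decomposed signal, as in the following sketch (names are illustrative):

```python
import numpy as np

def complement(mixed: np.ndarray, decomposed: np.ndarray) -> np.ndarray:
    """Excise one decomposed signal from the input signal; the decomposed
    signal plus its complement then reproduces the input signal."""
    n = min(len(mixed), len(decomposed))
    return mixed[:n] - decomposed[:n]
```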
- Furthermore, at least one of the steps of analyzing the audio data and generating the output track may include processing of audio data within an AI system comprising a trained neural network. For example, a neural network capable of analyzing audio data to determine at least one mixing parameter as described above may be obtained by training using training data containing a plurality of pieces of music together with data relating to the respective musical structure, such as beat grid, downbeat position, key, chord progression, song parts or song part junctions. After the training process, the neural network may then be capable of detecting such mixing parameters based on decomposed data of new pieces of music. On the other hand, a neural network suitable for generating the output track may be trained using training data in which each set of training data contains two audio tracks and one or more associated mixing parameters suitable for mixing the two audio tracks without dissonances or sound artefacts. The trained neural network will then be capable of mixing new audio tracks based on at least one mixing parameter determined by analyzing decomposed data and additional mixing parameters determined through artificial intelligence (AI).
- The method of the present invention may generally be used in all situations of audio processing, in which two audio tracks are to be mixed. For example, in a DAW, the present invention may be implemented as a plugin or in the form of any other suitable software algorithm in order to help a user to mix different audio tracks referring to different instruments, song parts, songs or other audio signals in general. In a further preferred application, the method may be used in a DJ environment, for example in a DJ software application, in order to assist a DJ when mixing a piece of music with any other audio signal such as a second piece of music, and even to allow automatic, autonomous mixes without needing any human supervision. In view of this background, the method of the present invention may further include a step of playing the output track, including a playback through a PA system, loudspeakers, headphones or any other sound-reproducing equipment.
- In general, the method of the present invention can be applied to any type of input audio track. For example, the input audio track may be stored on a local device such as a storing means of a computer, and may be present as a digital audio file. Furthermore, the first audio track or the second audio track may be received as a continuous stream, for example a data stream received via the Internet, a real-time audio stream received from a live audio source or from a playback device in playback mode. Thus, the range of applications is basically not limited to a specific medium. When receiving the first/second audio track as a continuous stream, playback of the output track may be started while continuing to receive the continuous stream. This has particular advantages in many situations where the audio tracks do not have a certain length or playback duration as the length is either unlimited or undefined, for example in case of processing signals from a live concert or live broadcasting. Furthermore, it is not necessary to wait until a certain audio file is completely downloaded or received or until a certain audio track has completely been played by the playback device, but instead playback of the output signals based on the received input signals can be started earlier.
- In another preferred embodiment of the present invention, decomposing first and/or second audio tracks is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal. Partitioning the first and/or second input signals into segments (preferably segments of equal lengths) and operating the method of the invention based on these segments allows using the decomposition result for generating the output track at an earlier point in time, i.e. after finishing decomposition of just one segment, without having to wait until the decomposition result of an entire audio file, for example, is available. Another advantage of the segmentation is that decomposition of the second audio track, if applicable, can start at an arbitrary point within the second audio track. For example, when a transition is to be made from the first audio track towards the second audio track such as to start playback of the second audio track at e.g. 01:20 (one minute, twenty seconds), decomposition of the second audio track can start at the segment closest to 01:20, and the beginning part of the second audio track which is not used does not have to be decomposed. This saves processing resources and ensures that decomposition results are available much faster. Preferably, one segment has a playback duration which is smaller than 20 seconds.
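- The segment bookkeeping described above may be sketched as follows, assuming a fixed segment length below the 20-second bound; the chosen length and all names are illustrative:

```python
SEGMENT_SECONDS = 10.0   # below the 20-second bound mentioned above

def first_segment_for(start_seconds: float) -> int:
    """Segment at which decomposition begins when playback of a track is
    requested from an arbitrary position, e.g. 01:20 = 80 seconds."""
    return int(start_seconds // SEGMENT_SECONDS)

def segment_bounds(index: int, sr: int) -> tuple[int, int]:
    """Sample range covered by one segment."""
    length = int(SEGMENT_SECONDS * sr)
    return index * length, (index + 1) * length

# Playback from 01:20: segment 8 is decomposed first; segment 9 can be
# decomposed while segment 8 is already playing. Earlier segments are skipped.
assert first_segment_for(80.0) == 8
```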
- The method steps, in particular the steps of providing the first and second audio tracks, decomposing the mixed input data, analyzing the decomposed data and generating the output track, may be carried out in a continuous process, wherein a time shift between receiving the first audio track or a first portion of a continuous stream of the first audio track and obtaining the output track or the first segments of the output track is preferably less than 10 seconds, more preferably less than 2 seconds.
- In a further embodiment of the present invention, at least one, preferably all of the mixed input data, the first and second audio tracks, the decomposed data, the output track, and the first and second output data, represent stereo signals, each comprising a left channel signal portion and a right channel signal portion, respectively. The method is thus suitable for playing music at high quality.
- According to a second aspect of the present invention, the above-mentioned object is achieved by a device for processing audio data, preferably a device adapted to carry out a method according to at least one of the embodiments described above, said device comprising a first input unit for receiving a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres; a second input unit for receiving a second audio track; a decomposition unit for decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres; an analyzing unit for analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter; and an output generation unit for generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
- Thus, the second aspect of the invention provides a device having similar or corresponding features as the method of the first aspect of the present invention described above. Therefore, similar or corresponding effects and advantages may be achieved by a device of the second aspect of the present invention as described above for the first aspect of the present invention. In addition, a device of the second aspect of the invention may be adapted to carry out a method of the first aspect of the present invention. Furthermore, embodiments of the device of the second aspect of the present invention may be particularly adapted to carry out one or more of the steps described above for embodiments of the first aspect of the present invention in order to achieve the same effects and advantages.
- The device of the second aspect of the present invention is preferably embodied as a computer, in particular a tablet, a smartphone, a smartwatch or another wearable device, and may include in the manner as conventionally known a RAM, a ROM, a microprocessor and suitable input/output means. Included in the computer or connected to the computer may be an audio interface which may be connected, for example wirelessly (e.g. via Bluetooth or similar technology), to speakers, headphones or a PA system in order to output sound when playing the first and second output signals, respectively. As a further alternative, the device may be embodied as a standalone DJ device including suitable electronic hardware or computing means. Preferably, the device is running a suitable software application in order to control its hardware components, usually standard hardware components of general purpose computers, tablets, smartphones, smartwatches or other wearable devices, such as to function as units of the device of the second aspect and/or such as to implement the steps of the method of the first aspect of the invention.
- If the device uses an AI system for decomposing audio data, the device preferably has a decomposition unit which includes the AI system comprising a trained neural network. This means that the complete AI system including the trained neural network may be integrated within the device, for example as a software application or software plugin running locally in a memory integrated within the device. Furthermore, the device preferably includes a user interface embodied by a display such as a touch display or a display to be operated by a pointer device, by one or more hardware control elements such as a hardware fader or rotatable hardware knobs, by voice command, or by any other user input/output technology.
- According to a third aspect of the present invention, the above-mentioned object is achieved by a computer program which is adapted, when run on a computer, such as a tablet, a smartphone, a smartwatch or another wearable device, to carry out a method according to the first aspect of the present invention, or to control the computer as a device according to the second aspect of the present invention. A computer program according to the third aspect of the present invention therefore achieves the same or corresponding effects and advantages as described above for the first and second aspects of the present invention.
- According to a fourth aspect of the present invention, the above-mentioned object is achieved by a method for processing audio data, comprising the steps of providing an audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres; decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres; and analyzing the decomposed data to determine a transition point or a song part junction between a first song part and a second song part within the audio track, or to determine any other track parameter. A method of the fourth aspect of the present invention allows determination of one or more song part junctions within a piece of music based on analyzing decomposed data. It therefore becomes possible to analyze a song structure of an audio track containing mixed input data, i.e. a song containing a plurality of different timbres, for example by analyzing decomposed audio data representing characteristic timbres such as a bass timbre. Song parts may therefore be determined more accurately and more reliably. The junctions between the song parts provide valuable information to the user, in particular to a DJ or an audio engineer during music production. For example, one or more junctions within a piece of music may be indicated graphically on a screen, and the method may allow a user to control a mixing process based on the one or more junctions, for example to jump to a junction, to cut out a song part between two junctions, to time-shift songs such as to synchronize junctions, etc. Furthermore, the method of the fourth aspect allows determination of any other track parameter, such as at least one of a tempo, a beat, a BPM value, a beat grid, a beat phase, a key and a chord progression of the respective audio track.
- According to a fifth aspect of the present invention, the above object is achieved by a method for processing audio data, comprising the steps of providing a set of audio tracks, each including mixed input data, said mixed input data representing audio signals containing a plurality of different timbres; decomposing each audio track of the set of audio tracks, such as to obtain a decomposed track associated with the respective audio track, wherein the decomposed track represents an audio signal containing at least one, but not all, of the plurality of different timbres of the respective audio track, thereby obtaining a set of decomposed tracks; analyzing each decomposed track of the set of decomposed tracks to determine at least one track parameter of the respective audio track which the decomposed track is associated with; selecting or allowing a user to select at least one selected audio track out of the set of audio tracks, based on at least one of the track parameters; and generating an output track based on the at least one selected audio track.
- A method of the fifth aspect of the present invention basically assists a user in selecting one of a plurality of audio tracks for further processing, in particular mixing, editing and playback. For example, in a situation where a user is to select one of a plurality of pieces of music, while conventional metadata available for music provided through conventional music distribution services, such as through Internet streaming providers, are limited to certain standard information such as the title of a song, the length of a song, an artist name, a musical genre, etc., the method according to the fifth aspect of the invention allows adding additional information related to the musical content of the particular audio tracks in the form of the at least one track parameter, wherein the track parameter, according to the fifth aspect of the invention, is determined through analyzing at least one decomposed track obtained from the particular audio track. Accordingly, the selection of songs is greatly assisted, in particular in cases where the candidate pieces of music are partially or fully unknown to the user. Selection and processing of music is thus improved in particular for inexperienced users or when less common pieces of music are to be selected. Furthermore, automatic selection of audio tracks by an algorithm based on the track parameter, without user interaction, can be implemented. In this way, playlists could automatically be generated based on timbres or proportions of individual timbres included in the audio tracks. For example, a non-vocal playlist or instrumental playlist could be generated by automatic selection of songs that do not contain vocal timbres.
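- As an illustrative sketch, such an instrumental playlist could be derived from previously computed timbre proportions as follows; the dictionary layout and the vocal threshold are assumptions, not part of the disclosure:

```python
def instrumental_playlist(tracks: list[dict], max_vocal: float = 0.05) -> list[dict]:
    """Select tracks whose decomposed vocal stem carries (almost) no energy.

    Each entry is assumed to hold track parameters previously obtained by
    decomposing and analyzing the track, e.g.
    {"title": "...", "vocal": 0.02, "drums": 0.55, "harmonic": 0.43},
    with the three timbre proportions summing to roughly 1.
    """
    return [t for t in tracks if t["vocal"] <= max_vocal]
```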
- For example, the track parameter may refer to at least one timbre of the respective audio track. The user may therefore be informed about timbres contained in the plurality of audio tracks. For example, the method may indicate to a user which of a plurality of audio tracks contains vocal music or which tracks contain a predominant piano timbre. Audio tracks may be suitably marked or highlighted such as to inform the user about the timbres included therein, or the method may allow for sorting or filtering a list of audio tracks based on timbres. As a mere example, a DJ currently playing a song that includes vocals may look for a second song predominantly containing a guitar or a piano timbre, wherein the method of the fifth aspect of the invention may assist and accelerate such selection and/or even allow selection of guitar/piano songs from a list of audio tracks unknown to the user as such. However, even for experienced DJs who are familiar with all songs of the set of audio tracks, the method of the fifth aspect of the invention may be useful to accelerate the process of selecting a suitable audio track.
- In further embodiments of the invention according to the fifth aspect, the track parameter may refer to at least one of a tempo, a beat, a BPM value, a beat grid, a beat phase, a key and a chord progression of the respective audio track. The at least one track parameter may likewise be indicated to the user by virtue of a suitable graphical representation, highlighting, coloring or numeral representation. Moreover, sorting or filtering of lists of audio tracks may be based on the at least one track parameter. For example, if a DJ plays a particular first song having a particular first chord progression, the method according to the fifth aspect of the invention may be used to search for a second song among a set of audio tracks, which contains the same or at least partially the same chord progression as the first song, such that mixing of the two songs or crossfading between the songs will result in a particularly continuous sound of the output track without audible breaks or dissonances.
- In a particularly simple embodiment of the invention, the selected audio track is just played back, in particular without mixing, editing or otherwise changing its content. In this embodiment, the method of the fifth aspect of the invention may in particular be applied in a music player and may assist a user in finding and selecting a desired song for playback. For example, if the at least one track parameter relates to a beat grid of the respective audio tracks (for example a time signature), a user may be enabled to easily find songs of certain beat grids, for example three-four time songs from among a plurality of audio tracks.
- In the methods and embodiments mentioned above, a second audio track may contain mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, wherein the mixed input data are decomposed to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, wherein analyzing may be carried out taking into account the decomposed data obtained from the second audio track. Accordingly, in the step of analyzing and determining the at least one mixing parameter, both the first audio track and the second audio track may be analyzed on the basis of their respective decomposed data. This will in particular allow comparing the first audio track and the second audio track with regard to parameters such as tempo, beat, BPM value, beat grid (the beats contained within a song, optionally including information about at least one of time signature, emphases and downbeat positions), beat phase, key, chord progression, song parts and song part junctions, etc.
- In a further embodiment of the present invention, the mixed input data of the first and/or second audio track are decomposed to obtain at least decomposed data of a vocal timbre, decomposed data of a harmonic timbre and decomposed data of a drum timbre, or to obtain exactly three decomposed tracks which are a decomposed track of a vocal timbre, a decomposed track of a harmonic timbre and a decomposed track of a drum timbre, wherein the three tracks preferably sum up to an audio track substantially equal to the first and/or second audio track, respectively. A vocal timbre may include a single vocal component or a mixture of different vocal components of the piece of music. A drum timbre may include the sound of a single drum instrument, a drum ensemble, a percussion instrument, etc. The drum timbre usually does not contain harmonic information. A harmonic timbre may include timbres of harmonic instruments such as a piano, a guitar, synthesizers, brass, etc. Decomposition into vocal, drum and harmonic timbres produces the most important components defining the musical content and structure of most music, in particular most pieces of western music. Such decomposition therefore provides a good yet efficient basis for analyzing the audio data and determining at least one mixing parameter and/or at least one track parameter. In addition, decomposition into vocal, drum and harmonic timbres greatly assists the mixing process, i.e. generation of an output track based on mixing two or more of the decomposed tracks.
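- Whether three decomposed tracks sum up to an audio track substantially equal to the input track may be verified, for example, via the RMS level of the residual relative to the input, as in the following sketch; the −40 dB tolerance is an illustrative choice:

```python
import numpy as np

def is_complete_decomposition(mix: np.ndarray, vocal: np.ndarray,
                              drums: np.ndarray, harmonic: np.ndarray,
                              tolerance_db: float = -40.0) -> bool:
    """True if the vocal, drum and harmonic tracks sum up to an audio track
    substantially equal to the input track (negligible residual)."""
    residual = mix - (vocal + drums + harmonic)
    rms = float(np.sqrt(np.mean(residual ** 2)))
    ref = float(np.sqrt(np.mean(mix ** 2))) + 1e-12
    return 20 * np.log10(rms / ref + 1e-12) < tolerance_db
```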
- Preferred embodiments of the present invention will be described in the following on the basis of the attached drawings, wherein
- FIG. 1a shows a device according to an embodiment of the present invention,
- FIG. 1b shows a song select window that may be displayed by a device of the embodiment of the invention,
- FIG. 2 shows a schematic functional diagram of components of the device of the embodiment shown in FIG. 1a, and
- FIG. 3 shows a schematic illustration of an example mode of operation of the device shown in FIGS. 1a, 1b and 2, and a method for processing audio data according to an embodiment of the invention.
- A device 10 according to an embodiment of the present invention may be formed by a computer such as a tablet computer, a smartphone, a smartwatch or another wearable device, which comprises standard hardware components such as input/output ports, wireless connectivity, a housing, a touchscreen, an internal storage as well as a plurality of microprocessors, RAM and ROM. Essential features of the present invention are implemented in device 10 by means of a suitable software application or a software plugin running on device 10.
- The display of device 10 preferably has a first section 12a associated to a first song A and a second section 12b associated to a second song B. First section 12a includes a first waveform display region 14a which displays at least one graphical representation of song A, in particular one or more waveform signals associated to song A. For example, the first waveform display region 14a may display a waveform of song A and/or one or more waveforms of decomposed signals obtained from decomposing song A. For example, decomposition of song A may be carried out to obtain a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which may be displayed within the first waveform display region 14a. Likewise, a second waveform display region 14b may be included in the second section 12b such as to display a graphical representation related to song B in the same or corresponding manner as described above for song A. Thus, the second waveform display region 14b may display one or more waveforms of song B and/or at least one waveform of a decomposed signal obtained from song B.
- The first
waveform display region 14a may have a song select button A, which may be pressed by a user to select song A from among a plurality of audio tracks offered by an Internet provider or stored on a local storage device. In a corresponding manner, the second waveform display region 14b includes a song select button B, which may be activated by a user to select song B from a plurality of audio tracks. FIG. 1b shows an example of a song select window, which may pop up when song select button A is activated by a user. The song select window offers a list of audio tracks and invites the user to select one of the audio tracks as song A.
- According to an embodiment of the present invention, the list of audio tracks as shown in
FIG. 1b shows metadata of each audio track which include, for each audio track, a title, an artist name, a track length, a BPM value, a main timbre and timbre component data referring to proportions of individual timbres within the audio track. While the title, the artist and the track length may be directly read from metadata of the audio file as usually provided through commercial music providers, or may be stored as metadata together with the audio data of the audio track on a storage device, the BPM value, the main timbre and the timbre component data are examples of track parameters in the sense of the present invention, which are usually not provided by the distributors with the original audio tracks but which are obtained by device 10 according to the embodiment of the invention through decomposing the particular audio track and then analyzing the decomposed data.
- For example, by analyzing a decomposed drum track, a BPM value can be obtained for a given audio track. Likewise, by analyzing a plurality of decomposed tracks associated to particular timbres such as a vocal timbre, a harmonic/instrumental timbre or a drum timbre, information regarding the presence and/or distribution (i.e. relative proportions) of certain timbres, i.e. certain instruments, can be obtained. In particular, a predominant timbre of an audio track can be determined, which represents a main character of the music contained in the audio track and is denoted as a main timbre for each audio track in the example of
FIG. 1b. Furthermore, in the example of FIG. 1b, a proportion of a drum timbre within the audio track is indicated by a drum proportion indicator, a proportion of a harmonic/instrumental timbre within the audio track is indicated by a harmonic/instrumental indicator, and a proportion of a vocal timbre within the audio track is indicated by a vocal indicator. The indicators may be formed by level indicators showing the proportion of the respective timbre from a minimum value (not present, for example 0) to a maximum value (maximum proportion, for example 5).
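- Such level indicators may, for example, be derived from the decomposed stems by mapping each stem's share of the total RMS energy onto the 0-to-5 scale, as in the following sketch; names and the energy measure are illustrative:

```python
import numpy as np

def timbre_levels(stems: dict[str, np.ndarray], steps: int = 5) -> dict[str, int]:
    """Map each decomposed stem's share of the total RMS energy onto a level
    indicator from 0 (not present) to `steps` (maximum proportion)."""
    rms = {name: float(np.sqrt(np.mean(s ** 2))) for name, s in stems.items()}
    total = sum(rms.values()) or 1.0
    return {name: round(steps * value / total) for name, value in rms.items()}

# timbre_levels({"drums": d, "harmonic": h, "vocal": v})
# -> e.g. {"drums": 2, "harmonic": 2, "vocal": 1}
```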
- Therefore, the user may easily create desired mixes, for example a mix of a vocal song and an instrumental song. In addition or alternatively, device 10 may analyze decomposed harmonic tracks (instrumental, vocals etc.) of the audio tracks in order to determine a key or a chord progression as track parameters of the audio tracks.
- With reference again to
FIG. 1a, each of the first and second sections 12a and 12b may include further control elements. For example, the first section 12a may include a play button 18a which can be pushed by a user to alternately start and stop playback of song A (more precisely, audio signals obtained from song A, such as decomposed signals). Likewise, the second section 12b may include a play button 18b which may be pushed by a user to alternately start and stop playback of song B (more precisely, audio signals obtained from song B, such as decomposed signals).
- An
device 10 in accordance with the settings ofdevice 10 and with a control input received from a user may be output at anoutput port 20 in digital or analog format, such as to be transmitted to a further audio processing unit or directly to a PA system, speakers or head phones. Alternatively, the output signal may be output through internal speakers ofdevice 10. - According to the embodiment of the present invention,
device 10 can perform a smooth transition from playback of song A to playback of song B by virtue of a transition unit, which will be explained in more detail below. In the present embodiment, device 10 may comprise a transition button 22 displayed on the display of device 10, which may be pushed by a user to initiate a transition from playback of song A towards playback of song B. By a single operation of transition button 22 (pushing the button 22), device 10 starts changing individual volumes of individual decomposed signals of songs A and B according to respective transition functions (volume level as a function of time) such as to smoothly crossfade from song A to song B within a predetermined transition time interval.
- Pressing the
transition button 22 can directly or immediately start the transition from song A to song B or may control a transition unit, which is to be described in more detail later, such as to analyze decomposed signals of song A and/or song B in order to determine at least one mixing parameter and to play an automatic transition based on the at least one mixing parameter. For example, as will be described later as well, a suitable transition point, i.e. a suitable first transition point on the timeline of song A and/or a suitable second transition point on the timeline of song B, and/or a length of a transition portion (duration of the transition), may be determined by the transition unit in response to an activation of transition button 22.
- In addition or alternatively,
device 10 may include a transition controller 24 which can be moved by a user between a first controller end point referring to playback of only song A and a second controller end point referring to playback of only song B. This allows controlling the volumes of individual decomposed signals of songs A and B using transition functions, which are based not on time but on the controller position of the transition controller 24. In this manner, in particular the speed and progress of the transition can manually be controlled through the transition controller 24.
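- The following sketch illustrates per-stem transition functions of the kind described above, using the names DA, VA, HA, DB, VB, HB that appear in the description of FIG. 2 below; the particular curves (vocals of song A faded out early, vocals of song B faded in late, drums swapped near the middle of the transition) are illustrative choices only:

```python
import numpy as np

# One transition function per decomposed signal, mapping transition progress
# x in [0, 1] (elapsed transition time, or the position of transition
# controller 24) to a volume level in [0, 1].
TRANSITION_FUNCTIONS = {
    "DA": lambda x: 1.0 if x < 0.5 else 2.0 * (1.0 - x),  # drums A out late
    "VA": lambda x: max(0.0, 1.0 - 2.0 * x),              # vocals A out early
    "HA": lambda x: 1.0 - x,                              # harmonic A, linear
    "DB": lambda x: 0.0 if x < 0.5 else 2.0 * (x - 0.5),  # drums B in late
    "VB": lambda x: max(0.0, 2.0 * x - 1.0),              # vocals B in late
    "HB": lambda x: x,                                    # harmonic B, linear
}

def recombine(stems_a: dict, stems_b: dict, x: float) -> np.ndarray:
    """Volume-weighted sum of all six decomposed signals at progress x; the
    stem dictionaries use keys 'D', 'V', 'H' for drum, vocal and harmonic."""
    out = sum(TRANSITION_FUNCTIONS[k + "A"](x) * stems_a[k] for k in "DVH")
    return out + sum(TRANSITION_FUNCTIONS[k + "B"](x) * stems_b[k] for k in "DVH")
```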
- FIG. 2 shows a schematic illustration of internal components of device 10 and a signal flow within device 10.
- Audio processing is based on a first input track and a second input track, which may be stored within the
device 10, for example in an internal memory of the device, a hard drive or any other storage medium. First and second input tracks are preferably digital audio files of a standard compressed or uncompressed audio file format such as mp3, WAV, AIFF or the like. Alternatively, first and second input tracks may be received as continuous streams, for example via an Internet connection of device 10 or from an external playback device via an input audio interface or via a microphone.
- First and second input tracks are preferably processed within first and
second input units 26a and 26b, respectively, which may be configured to decrypt or decompress the audio data, if necessary, and/or to extract a segment of the first input track and a segment of the second input track in order to continue processing based on those segments. This has the advantage that time-consuming processing algorithms, such as the decomposition based on a neural network, do not have to analyze the entire first or second input track upfront but can operate on shorter segments, which allows processing to continue and playback to start at an earlier point in time. In addition, when the first and second input tracks are received as continuous streams, it would in many cases not be feasible to wait until the complete input tracks have been received before starting to process the data.
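- The segment-wise pipelining described here can be sketched as follows; `decompose` and `play` are hypothetical callables standing in for the neural-network decomposition and the playback path.

```python
from concurrent.futures import ThreadPoolExecutor

def play_segmentwise(segments, decompose, play):
    """Decompose the next segment in a worker thread while the current,
    already decomposed segment is being played, so playback can begin
    before the whole input track has been received or analyzed."""
    segments = iter(segments)
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(decompose, next(segments))  # first segment upfront
        for seg in segments:
            current = future.result()             # decomposed segment, ready
            future = pool.submit(decompose, seg)  # start on the next segment
            play(current)                         # plays while decomposing
        play(future.result())                     # last segment
```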
- The output of the first and second input units 26a, 26b, for example the segments of the first and second input tracks, forms the first and second input signals, which are input into first and second AI systems of a decomposition unit 40. Each AI system comprises a trained neural network. Decomposition unit 40 thus decomposes the first input signal to obtain a first group of decomposed signals and decomposes the second input signal to obtain a second group of decomposed signals. In the present example, each group of decomposed signals includes a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which together form a complete set of decomposed signals or a complete decomposition, meaning that the sum of all decomposed signals of the first group will resemble the first input signal, and the sum of all decomposed signals of the second group will resemble the second input signal.
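- The "complete decomposition" property can be checked numerically: the decomposed signals should sum back to the input signal within a small tolerance. A minimal sketch, assuming the stems are NumPy arrays of equal length:

```python
import numpy as np

def is_complete_decomposition(mix, stems, tol=1e-3):
    """True if the decomposed drum, vocal and harmonic signals in `stems`
    (a dict of arrays) sum back to the input signal `mix`."""
    recombined = sum(stems.values())             # element-wise sum of arrays
    return np.max(np.abs(recombined - mix)) < tol
```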
- It should be noted that although in the present embodiment two AI systems are used, decomposition unit 40 may also include only one AI system and only one neural network, trained and configured to determine all decomposed signals of the first input signal as well as all decomposed signals of the second input signal. As a further alternative, more than two AI systems may be used; for example, a separate AI system with a separate neural network may be used to generate each of the decomposed signals. - All decomposed signals, in particular both groups of decomposed signals, are then input into a
playback unit 42 in order to generate an output signal for playback. Playback unit 42 comprises a transition unit 44, which is basically adapted to recombine the decomposed signals of both groups, taking into account specific volume levels associated with each of the decomposed signals. Transition unit 44 is configured to recombine the decomposed signals in such a manner as to play either only a first output signal obtained from a sum of all decomposed signals of the first input signal, or only a second output signal obtained from a sum of all decomposed signals of the second input signal, or any transition in between the first and second output signals in which decomposed signals of both the first and second input signals are played. - In particular,
transition unit 44 may store individual transition functions DA, VA, HA, DB, VB, HB for each of the decomposed signals, each of which defines a specific volume level for each time frame within a transition interval, i.e. a time interval in which one of the songs A and B is crossfaded into the other (the first and second output signals are crossfaded in one or the other direction), or for each controller position of transition controller 24 within a controller range. Taking into account the respective volume levels according to the respective transition functions DA, VA, HA, DB, VB, HB, all decomposed signals are then recombined to obtain the output signal. Playback unit 42 may further include a control unit 45, which is adapted to control at least one of the transition functions DA, VA, HA, DB, VB, HB based on a user input.
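- A recombination of this kind might be sketched as follows, assuming mono stems and assuming each transition function is available as a vectorized Python callable from time to volume level; the dictionaries funcs_a and funcs_b stand in loosely for DA, VA, HA and DB, VB, HB.

```python
import numpy as np

def render_transition(stems_a, stems_b, funcs_a, funcs_b, sr, t_start):
    """Recombine all decomposed signals of songs A and B into one output
    signal, weighting every sample of each stem with the value of that
    stem's transition function at the corresponding time."""
    n = len(next(iter(stems_a.values())))
    t = t_start + np.arange(n) / sr          # timeline of the interval
    out = np.zeros(n)
    for name, stem in stems_a.items():
        out += funcs_a[name](t) * stem       # fading-out stems of song A
    for name, stem in stems_b.items():
        out += funcs_b[name](t) * stem       # fading-in stems of song B
    return out
```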
- The output signal generated by playback unit 42 may then be routed to an output audio interface 46 for a sound output. At any location within the signal flow, one or more sound effects may be inserted into the audio signal by means of one or more effect chains 48. In the present example, effect chain 48 is located between playback unit 42 and output audio interface 46.
- FIG. 3 illustrates an operation of transition unit 44 according to an embodiment of the present invention and a method for processing audio data according to an embodiment of the present invention. - Decomposed data as received from the first input track (first audio track) representing song A comprises, in the particular embodiment, a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal (denoted by drum, vocal and harmonic in
FIG. 3). Decomposed data received from the second input track (second audio track) relating to song B comprises a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal (denoted by drum, vocal and harmonic in FIG. 3). The decomposed signals are each shown by respective waveforms, wherein the horizontal axis represents the timeline of song A and the timeline of song B, respectively, and the vertical axis represents the time-dependent amplitude of the corresponding audio signal. - According to the present invention, the decomposed signals are analyzed to determine at least one mixing parameter. In the example shown in
FIG. 3, for example, the decomposed drum signal of song A is analyzed to determine, inter alia, a tempo value, a BPM value and a beat grid of song A, and the decomposed drum signal of song B is analyzed to determine, inter alia, a tempo value, a BPM value and a beat grid of song B. From the rhythmic pattern of the separated drum timbre of song A, the algorithm can then determine a rhythmic pattern of song A including a first beat at the beginning of song A at a time t0 and a sequence of beats following one another at substantially equal time intervals, wherein four beats form a bar, giving a beat grid of a four-four time type. In FIG. 3, the bars are denoted by vertical lines, wherein each bar includes four beats that are not illustrated. In a similar manner, transition unit 44 analyzes the decomposed drum signal of song B in order to determine beats, bars, a tempo, a BPM value, a beat grid etc. as mixing parameters of song B.
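- A rough BPM estimate from a decomposed drum signal can be obtained with a generic onset-autocorrelation heuristic such as the sketch below; this is an illustrative stand-in, not necessarily the algorithm of the embodiment.

```python
import numpy as np

def estimate_bpm(drums, sr, bpm_lo=60, bpm_hi=180, hop=512):
    """Estimate BPM from a decomposed drum signal: build an onset envelope
    from frame-to-frame energy increases, then pick the inter-beat lag
    with the strongest autocorrelation."""
    frames = drums[: len(drums) // hop * hop].reshape(-1, hop)
    energy = (frames ** 2).sum(axis=1)
    onset = np.maximum(np.diff(energy), 0.0)        # rising energy only
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    fps = sr / hop                                  # envelope frames per second
    lags = np.arange(int(fps * 60 / bpm_hi), int(fps * 60 / bpm_lo) + 1)
    best_lag = lags[np.argmax(ac[lags])]
    return 60.0 * fps / best_lag
```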
- Furthermore, according to this embodiment, a structure of song A and/or song B, i.e. a sequence of song parts such as intro, verse, bridge, chorus, interlude and outro, may be detected as a mixing parameter by analyzing the decomposed data. In the particular example shown in FIG. 3, the decomposed drum signal of song A shows a first pattern within the first four bars of the song, whereas in the following eight bars (bars 5 to 12) the drum timbre shows a second pattern different from the first pattern. In the following eight bars (bars 13 to 20), silence is detected in the decomposed drum signal, which means that the drums have a break for eight bars. Then, throughout the rest of song A, the decomposed drum data again show the first pattern. In a similar manner, analyzing the decomposed vocal signal reveals that the first four bars as well as the last four bars of song A do not contain vocals (the decomposed vocal signal is silent), whereas the rest of song A contains vocals. In addition, the decomposed harmonic signal is analyzed by a chord/harmony detection algorithm known as such from the prior art, so as to detect a chord progression of the harmonic components of song A. Since the decomposed harmonic signal contains neither the vocal components nor the drum components of the original audio track, the chord/harmony detection algorithm can be operated with much higher accuracy and reliability. Accordingly, a sequence of chords is detected, which usually changes for each bar. In the present example, it turns out that the chord progression shows a four-bar pattern, G major, D major, E minor, C major, which repeats three times within the first 12 bars. In the following eight bars (bars 13 to 20), the chord progression deviates from the aforementioned four-bar pattern and now shows a new four-bar pattern, D major, E minor, C major, C major, which is repeated once to obtain eight bars in total. After that, the first four-bar pattern played at the beginning of song A is repeated again until the end of song A.
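- The silence cues used above, such as the eight-bar drum break, can be located with a simple per-bar energy test once a beat grid is known; the following sketch assumes the BPM and a 4/4 meter have already been determined.

```python
import numpy as np

def silent_bars(stem, sr, bpm, beats_per_bar=4, thresh_db=-30.0):
    """Flag the bars of a decomposed signal that are effectively silent
    (e.g. an eight-bar drum break, or a vocal-free intro). Boundaries of
    such runs are candidates for song part junctions."""
    bar_len = int(sr * 60.0 / bpm * beats_per_bar)  # samples per bar
    flags = []
    for start in range(0, len(stem) - bar_len + 1, bar_len):
        bar = stem[start : start + bar_len]
        rms = np.sqrt(np.mean(bar ** 2)) + 1e-12
        flags.append(20.0 * np.log10(rms) < thresh_db)
    return flags                                    # one bool per bar
```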
- In this way, the method according to the embodiment of the invention can deduce, from analyzing the three decomposed signals of song A, particular song parts, namely a first song part that may be called “intro”, forming the first four bars of song A, a second song part which may be called “verse 1”, forming the following eight bars after the intro, a third song part which may be called “bridge”, forming the following eight bars after verse 1, a fourth song part which may be called “chorus 1”, forming the following eight bars after the bridge, a fifth song part which may be called “interlude”, forming the following four bars after chorus 1, a sixth song part which may be called “chorus 2”, forming the following eight bars after the interlude, and a seventh song part which may be called “outro”, forming the following four bars after chorus 2. The method thus recognizes different song parts and corresponding song part junctions, i.e. the junction between the last bar of a previous song part and the first bar of a following song part. - In the same or corresponding way, the method may determine a song structure of song B by analyzing the decomposed drum signal, the decomposed vocal signal and the decomposed harmonic signal of song B. Thus, by detecting different drum patterns within
chorus 1 and chorus 2, detecting silence of the decomposed vocal signal in an outro, detecting silence of the decomposed harmonic signal in an intro, and detecting different chord progression patterns within verse 1 and verse 2 on the one hand and chorus 1 and chorus 2 on the other hand, the method may determine that song B has a song structure comprising four bars of intro, eight bars of verse 1, eight bars of chorus 1, eight bars of verse 2, eight bars of chorus 2 and four bars of outro. These specifications defining the song parts of song B form mixing parameters according to the present invention. - The mixing parameters determined based on an analysis of the decomposed data of song A and song B as described above may be used by
device 10 and by a method according to the embodiment of the present invention for assisting a DJ in mixing songs A and B, or for achieving semi-automatic or even fully automatic mixing of songs A and B. In particular, the mixing parameters described above may simply be displayed on a screen of device 10 so as to inform a user of device 10, in particular to show the detected song parts and thereby assist mixing. A DJ may recognize certain song parts or song part junctions as suitable transition points at which a crossfade from song A to song B, or vice versa, can suitably be initiated, for example by pressing transition button 22 or operating transition controller 24 at a suitable point in time. In another example, device 10 and the method according to the embodiment of the invention may automatically generate an output track by automatically mixing songs A and B, for example by playing a transition from song A to song B at a suitable point in time as determined from the song structure. In particular, transition points may be determined as the mixing parameters based on the detected song parts. For example, a first transition point on the timeline of song A may be the end of the interlude of song A, whereas a second transition point on the timeline of song B may be the beginning of chorus 1 of song B. Device 10 may then generate an output track that plays song A from its beginning to shortly before the end of the interlude, then plays a crossfade to song B starting at the beginning of its chorus 1, and then plays the rest of song B from the beginning of chorus 1 until the outro of song B. Other examples of suitable transition points would be the end of chorus 2 of song A on the one hand, and the beginning of verse 1 of song B (or the beginning of chorus 1 of song B) on the other hand. In the latter example, song B could be played almost from the beginning after song A has reached almost its end. This could be used as an automatic crossfade between subsequent songs of a playlist, for example. - It should be noted that the mixing results are improved if songs A and B have similar keys and/or similar BPM values. Conventional methods known as such from DJ equipment, including DJ software, may be used to pitch-shift, time-stretch or time-compress one or both of songs A and B so as to ensure that songs A and B have matching keys and/or BPM values.
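- As a toy illustration of the transition-point selection described two paragraphs above, assuming detected song structures are available as lists of (part_name, start_bar, end_bar) tuples (a hypothetical data layout), the heuristic from that example might read:

```python
def pick_transition_points(parts_a, parts_b):
    """Derive a first transition point (in song A) and a second transition
    point (in song B) from detected song structures: leave song A at the
    end of its interlude, enter song B at the start of its first chorus."""
    first_point = next(end for name, _, end in parts_a if name == "interlude")
    second_point = next(start for name, start, _ in parts_b if name == "chorus 1")
    return first_point, second_point
```

- For the example above this returns the last bar of song A's interlude and the first bar of song B's chorus 1, which device 10 could then convert into sample positions on the two timelines.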
- Further aspects of the present invention are described by the following items:
- 1. Method for processing audio signals, comprising the steps of
-
- providing a first input signal of a first input audio track and a second input signal of a second input audio track,
- decomposing the first input signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal,
- assigning a first volume level to the first decomposed signal and a second volume level to the second decomposed signal,
- starting playback of a first output signal obtained from recombining at least the first decomposed signal at the first volume level with the second decomposed signal at the second volume level, such that the first output signal substantially equals the first input signal,
- while playing the first output signal, reducing the first volume level according to a first transition function and reducing the second volume level according to a second transition function different from said first transition function,
- starting playback of a second output signal obtained from the second input signal after starting playback of the first output signal but before volume levels of all decomposed signals of the first input signal have reached substantially zero.
- 2. Method of
item 1, further comprising the steps of -
- decomposing the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal,
- assigning a third volume level to the third decomposed signal and a fourth volume level to the fourth decomposed signal,
- starting playback of the second output signal obtained from recombining at least the third decomposed signal and the fourth decomposed signal,
- while playing the second output signal, increasing the third volume level according to a third transition function and increasing the fourth volume level according to a fourth transition function different from said third transition function, until the second output signal substantially equals the second input signal.
- 3. Method of
item 1 or item 2, wherein each of the transition functions assigns a predetermined volume level or a predetermined change in volume level - to each of a plurality of time frames within a transition time interval defined between a transition start time (T1) and a transition end time (T3), and/or
- to each of a plurality of controller positions within a controller range of a user operated controller defined between a controller first end position and a controller second end position.
- 4. Method of
item 3, -
- wherein the first transition function and the second transition function are defined such that the volume level is at a maximum at the transition start time (T1) and/or at the controller first end position, and at a minimum, in particular corresponding to substantially silence at the transition end time (T3) and/or at the controller second end position, and/or
- wherein the third transition function and the fourth transition function are defined such that the volume level is at a minimum, in particular corresponding to substantially silence at the transition start time (T1) and/or at the controller first end position, and at a maximum at the transition end time (T3) and/or at the controller second end position.
- 5. Method of at least one of the preceding items, wherein at least one of the transition functions is a linear function or contains a linear portion.
- 6. Method of at least one of the preceding items, wherein at least one of the transition functions is a continuous function and/or a monotonic function.
- 7. Method of at least one of the preceding items, wherein the first transition function and the second transition function differ from each other with regard to slope and/or wherein the third transition function and the fourth transition function differ from each other with regard to slope.
- 8. Method of at least one of the preceding items, wherein the step of decomposing includes processing the first audio signal and/or the second audio signal within an AI system comprising a trained neural network.
- 9. Method of at least one of the preceding items, wherein the step of decomposing includes decomposing the first audio signal and/or the second audio signal with regard to predetermined timbres, such as to obtain decomposed signals of different timbres, said timbres preferably being selected from the group consisting of:
-
- a vocal timbre,
- a non-vocal timbre,
- a drum timbre,
- a non-drum timbre,
- a harmonic timbre,
- a non-harmonic timbre,
- any combination thereof.
- 10. Method of item 9, wherein the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or wherein at least at a transition reference time and/or a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
- 11. Method of item 9 or
item 10, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or wherein at least at a transition reference time and/or at a controller reference position a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function. - 12. Method of item 4 and at least one of items 9 to 11, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, and/or wherein a sum of the first transition function and the third transition function is substantially constant, preferably a maximum volume level, throughout the entire transition time interval and/or the entire controller range.
- 13. Method of item 4 and at least one of items 9 to 12, wherein the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or wherein a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) and/or between the controller first end position and the controller second end position.
- 14. Method of at least one of the preceding items, further including a step of analyzing an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction.
- Referring to item 14, song parts of a song are usually distinguishable by an analyzing algorithm since they differ in several characteristics such as instrumental density, medium pitch or rhythmic pattern. Song parts may in particular be a verse, a chorus, a bridge, an intro or an outro as conventionally known. Certain instrumental or rhythmic patterns will remain constant within a song part and will change in the next song part. Recognition of song parts may be supported by analyzing not only the entire input signal but instead or in addition thereto at least one of the decomposed signals, as described in item 14. For example, by analyzing a decomposed bass signal in isolation from the remaining sound components, it will be easy to derive therefrom a chord progression of the song which is one of the key criteria to differentiate song parts. Furthermore, an analysis of the decomposed drum signals allows a more accurate recognition of a rhythmic pattern and thus a more accurate detection of certain song parts. A song part junction then refers to a junction between one song part and the next song part.
- According to item 14, transition time intervals may include song part junctions, which allows the transition between two songs to be carried out at the end of a song part, further improving the smoothness and likeability of the transition.
- Song parts may be detected by analyzing at least one of the decomposed signals within an AI system comprising a trained neural network. Preferably, such analyzing includes detecting silence within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than −30 dB. In particular, the step of analyzing decomposed signals may include detecting silence extending continuously over a predetermined time span within the decomposed signal. Thus, in embodiments of the invention, start and/or end points of such silence may be taken as song part junctions.
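- A windowed-RMS silence detector along these lines might look as follows; the window size and minimum duration are illustrative choices around the −30 dB threshold mentioned above.

```python
import numpy as np

def find_silence(signal, sr, thresh_db=-30.0, min_sec=1.0, win=1024):
    """Return (start, end) times in seconds of spans in which the signal
    stays below thresh_db for at least min_sec, computed over windowed
    RMS levels."""
    n_win = len(signal) // win
    rms = np.sqrt(np.mean(signal[: n_win * win].reshape(-1, win) ** 2, axis=1))
    quiet = 20.0 * np.log10(rms + 1e-12) < thresh_db
    spans, start = [], None
    for i, q in enumerate(np.append(quiet, False)):  # False flushes a last run
        if q and start is None:
            start = i
        elif not q and start is not None:
            if (i - start) * win / sr >= min_sec:
                spans.append((start * win / sr, i * win / sr))
            start = None
    return spans
```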
- 15. Method of at least one of the preceding items, further including the steps of
-
- receiving a user input referring to a transition command, including at least one transition parameter,
- setting at least one of the transition functions according to the transition parameter,
wherein the transition parameter is preferably selected from the group consisting of: - a transition start time (T1) of a transition time interval of at least one of the transition functions,
- a transition end time (T3) of a transition time interval of at least one of the transition functions,
- a length (T3-T1) of a transition time interval of at least one of the transition functions,
- a transition reference time (T2) within the transition time interval of at least one of the transition functions,
- a slope, shape or offset of at least one of the transition functions,
- an assignment or deassignment of a preset transition function to or from a selected one of the plurality of decomposed signals.
- 16. Method of at least one of the preceding items, further comprising the steps of
-
- determining at least one tempo parameter of the first and/or second input track, in particular a BPM (beats per minute) and/or a beat grid and/or a beat phase of the first and/or second input track and
- a tempo matching processing based on the determined tempo parameter, including a time stretching and/or time shifting and/or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching BPM and/or mutually matching beat phases.
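- Referring to item 16, the simplest tempo-matching operation is plain resampling, sketched below for a mono track; note that resampling changes pitch together with tempo, whereas true time stretching (tempo change at constant pitch) requires more elaborate techniques.

```python
import numpy as np

def tempo_match_by_resampling(audio, bpm_src, bpm_dst):
    """Resample a track so that it plays back at the target BPM. Plain
    resampling shifts pitch along with tempo; keeping the pitch constant
    would require true time stretching instead."""
    n_out = int(len(audio) * bpm_src / bpm_dst)   # fewer samples -> faster
    idx = np.linspace(0.0, len(audio) - 1.0, n_out)
    return np.interp(idx, np.arange(len(audio)), audio)
```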
- 17. Method of at least one of the preceding items, further comprising the steps of
-
- determining a key of the first and/or second input track and
- a key matching processing based on the determined key, including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys.
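- Referring to item 17, once the keys of both input tracks are known, the required pitch shift can be computed as a semitone distance; the sketch below ignores major/minor modes.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def semitone_shift(key_src, key_dst):
    """Smallest pitch shift in semitones that moves key_src onto key_dst,
    e.g. semitone_shift("G", "A") == 2. A shift of s semitones corresponds
    to a frequency ratio of 2 ** (s / 12)."""
    diff = (NOTES.index(key_dst) - NOTES.index(key_src)) % 12
    return diff - 12 if diff > 6 else diff   # prefer the shorter direction
```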
- 18. Method of at least one of the preceding items, wherein the first input audio track and/or the second input audio track are received as a continuous stream, for example a data stream received via the Internet, or a real-time audio stream received from a live audio source or from a playback device in playback mode, and wherein playback of the first output signal and/or the second output signal is started while continuing to receive the continuous stream.
- 19. Method of at least one of the preceding items, wherein decomposing first and/or second input signal is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal.
- 20. Method of at least one of the preceding items, wherein the method steps, in particular the steps of providing the first and second input signals, decomposing the first input signal, starting playback of the first output signal and starting playback of the second output signal, are carried out in a continuous process, wherein a time shift between receiving the first input audio track or a first portion of a continuous stream of the first input audio track and starting playback of the first output signal is preferably less than 10 seconds, more preferably less than 2 seconds, and/or wherein a time shift between receiving the second input audio track or a first portion of a continuous stream of the second input audio track and starting playback of the second output signal is preferably less than 10 seconds, more preferably less than 2 seconds.
- 21. Method of at least one of the preceding items, wherein at least one, preferably all of the first and second input signals, the decomposed signals and the first and second output signals represent stereo signals, each comprising a left-channel signal portion and a right-channel signal portion, respectively.
- 22. Device for processing audio signals, comprising:
-
- a first input unit providing a first input signal of a first input audio track and a second input unit providing a second input signal of a second input audio track,
- a decomposition unit configured to decompose the first input audio signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal,
- a playback unit configured to start playback of a first output signal obtained from recombining at least the first decomposed signal at a first volume level with the second decomposed signal at a second volume level, such that the first output signal substantially equals the first input signal,
- a transition unit for performing a transition between playback of the first output signal and playback of a second output signal obtained from the second input signal, wherein the transition unit has a volume control section adapted for reducing the first volume level according to a first transition function and reducing the second volume level according to a second transition function different from said first transition function.
- 23. Device of
item 22, wherein the decomposition unit is configured to decompose the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal, -
- wherein the second output signal is obtained from recombining at least the third decomposed signal at a third volume level and the fourth decomposed signal at a fourth volume level,
- wherein the volume control section is adapted for increasing the third volume level according to a third transition function and increasing the fourth volume level according to a fourth transition function different from said third transition function, until the second output signal substantially equals the second input signal.
- 24. Device of
item 22 or item 23, wherein each of the transition functions assigns a predetermined volume level or a predetermined change in volume level -
- to each of a plurality of time frames within a transition time interval defined between a transition start time (T1) and a transition end time (T3), and/or
- to each of a plurality of controller positions within a controller range of a user operated controller defined between a controller first end position and a controller second end position.
- 25. Device of
item 24, -
- wherein the first transition function and the second transition function are defined such that the volume level is at a maximum at the transition start time (T1) and/or at the controller first end position, and at a minimum, in particular corresponding to substantially silence at the transition end time (T3) and/or at the controller second end position, and/or
- wherein the third transition function and the fourth transition function are defined such that the volume level is at a minimum, in particular corresponding to substantially silence at the transition start time (T1) and/or at the controller first end position, and at a maximum at the transition end time (T3) and/or at the controller second end position.
- 26. Device of at least one of
items 22 to 25, wherein at least one of the transition functions is a linear function or contains a linear portion. - 27. Device of at least one of
items 22 to 26, wherein at least one of the transition functions is a continuous function and/or a monotonic function. - 28. Device of at least one of
items 22 to 27, wherein the first transition function and the second transition function differ from each other with regard to slope and/or wherein the third transition function and the fourth transition function differ from each other with regard to slope. - 29. Device of at least one of
items 22 to 28, wherein the decomposition unit includes an AI system comprising a trained neural network. - 30. Device of at least one of
items 22 to 29, wherein the decomposition unit is configured to decompose the first audio signal and/or the second audio signal with regard to predetermined timbres, such as to obtain decomposed signals of different timbres, said timbres preferably being selected from the group consisting of: -
- a vocal timbre,
- a non-vocal timbre,
- a drum timbre,
- a non-drum timbre,
- a harmonic timbre,
- a non-harmonic timbre,
- any combination thereof.
- 31. Device of item 30, wherein the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or wherein at least at a transition reference time and/or a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
- 32. Device of item 30 or item 31, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or wherein at least at a transition reference time and/or at a controller reference position a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function.
- 33. Device of item 25 and at least one of items 30 to 32, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, and/or wherein a sum of the first transition function and the third transition function is substantially constant, preferably a maximum volume level, throughout the entire transition time interval and/or the entire controller range.
- 34. Device of item 25 and at least one of items 30 to 33, wherein the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or wherein a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) and/or between the controller first end position and the controller second end position.
- 35. Device of at least one of
items 22 to 34, further including an analyzing unit configured to analyze an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction. - 36. Device of at least one of
items 22 to 35, further including a user interface configured to accept a user input referring to a transition command, including at least one transition parameter, wherein the transition unit is configured to set at least one of the transition functions according to the transition parameter, wherein the transition parameter is preferably selected from the group consisting of: -
- a transition start time (T1) of a transition time interval of at least one of the transition functions,
- a transition end time (T3) of a transition time interval of at least one of the transition functions,
- a length of a transition time interval of at least one of the transition functions,
- a transition reference time (T2) within the transition time interval of at least one of the transition functions,
- a slope, shape or offset of at least one of the transition functions,
- an assignment or deassignment of a preset transition function to or from a selected one of the plurality of decomposed signals.
- 37. Device of item 36, wherein the device includes a display unit configured to display a graphical representation of the first input audio track and/or the second input audio track, wherein the user interface is configured to receive at least one transition parameter through a selection or marker applied by the user in relation to the graphical representation of the first input audio track and/or the second input audio track.
- 38. Device of item 36 or item 37, wherein the device includes a display unit configured to display a graphical representation of at least one of the decomposed signals, wherein the user interface is configured to allow a user to assign or deassign a preset transition function to or from a selected one of the plurality of decomposed signals.
- 39. Device of at least one of
items 22 to 38, further comprising a tempo matching unit configured to determine a tempo of the first and/or second input track, and to carry out a tempo matching processing based on the determined tempo, including a time stretching or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching tempos. - 40. Device of at least one of
items 22 to 39, further comprising a key matching unit configured to determine a key of the first and/or second input track, and to carry out a key matching processing based on the determined key, including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys. - It should be noted that methods and devices as described above as first to fifth aspects of the invention and in the claims may be understood as embodiments of methods and devices as described above in
items 1 to 40. In particular, a transition point as mentioned in the first to fifth aspects of the invention and in the claims may correspond to any of the transition start time, the transition end time and the transition reference time as described in the above items.
Claims (28)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2020/056124 WO2021175455A1 (en) | 2020-03-06 | 2020-03-06 | Method and device for decomposing and recombining of audio data and/or visualizing audio data |
PCT/EP2020/057330 WO2021175456A1 (en) | 2020-03-06 | 2020-03-17 | Method and device for decomposing, recombining and playing audio data |
PCT/EP2020/062151 WO2021175457A1 (en) | 2020-03-06 | 2020-04-30 | Live decomposition of mixed audio data |
PCT/EP2020/065995 WO2021175458A1 (en) | 2020-03-06 | 2020-06-09 | Playback transition from first to second audio track with transition functions of decomposed signals |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2020/065995 Continuation WO2021175458A1 (en) | 2020-03-06 | 2020-06-09 | Playback transition from first to second audio track with transition functions of decomposed signals |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210326102A1 true US20210326102A1 (en) | 2021-10-21 |
Family
ID=69846409
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/905,555 Pending US20230089356A1 (en) | Ai-based dj system and method for decomposing, mixing and playing of audio data |
US16/892,063 Active US11216244B2 (en) | 2020-03-06 | 2020-06-03 | Method and device for processing, playing and/or visualizing audio data, preferably based on AI, in particular decomposing and recombining of audio data in real-time |
US17/343,386 Pending US20210326102A1 (en) | 2020-03-06 | 2021-06-09 | Method and device for determining mixing parameters based on decomposed audio data |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/905,555 Pending US20230089356A1 (en) | Ai-based dj system and method for decomposing, mixing and playing of audio data |
US16/892,063 Active US11216244B2 (en) | 2020-03-06 | 2020-06-03 | Method and device for processing, playing and/or visualizing audio data, preferably based on AI, in particular decomposing and recombining of audio data in real-time |
Country Status (7)
Country | Link |
---|---|
US (3) | US20230089356A1 (en) |
EP (2) | EP4005243B1 (en) |
CA (1) | CA3170462A1 (en) |
DE (1) | DE202020005830U1 (en) |
ES (1) | ES2960983T3 (en) |
MX (1) | MX2022011059A (en) |
WO (5) | WO2021175455A1 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018129388A1 (en) * | 2017-01-09 | 2018-07-12 | Inmusic Brands, Inc. | Systems and methods for generating a visual color display of audio-file data |
CN110688082B (en) * | 2019-10-10 | 2021-08-03 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, device, equipment and storage medium for determining adjustment proportion information of volume |
US11475867B2 (en) * | 2019-12-27 | 2022-10-18 | Spotify Ab | Method, system, and computer-readable medium for creating song mashups |
EP4115630A1 (en) * | 2020-03-06 | 2023-01-11 | algoriddim GmbH | Method, device and software for controlling timing of audio data |
EP4115628A1 (en) | 2020-03-06 | 2023-01-11 | algoriddim GmbH | Playback transition from first to second audio track with transition functions of decomposed signals |
US12242532B2 (en) * | 2020-03-31 | 2025-03-04 | Aries Adaptive Media, LLC | Processes and systems for mixing audio tracks according to a template |
CA3214519A1 (en) | 2021-04-20 | 2022-10-27 | Jesse Dorogusker | Live playback streams |
CN114302309B (en) * | 2021-12-16 | 2024-06-25 | 合肥联宝信息技术有限公司 | Method and device for detecting audio collector |
CN114299976A (en) * | 2022-03-06 | 2022-04-08 | 荣耀终端有限公司 | Audio data processing method and electronic equipment |
US12236926B2 (en) * | 2022-04-26 | 2025-02-25 | Algoriddim Gmbh | System for selection and playback of song versions from vinyl type control interfaces |
US20230360618A1 (en) * | 2022-05-05 | 2023-11-09 | Lemon Inc. | Automatic and interactive mashup system |
WO2023217352A1 (en) | 2022-05-09 | 2023-11-16 | Algoriddim Gmbh | Reactive dj system for the playback and manipulation of music based on energy levels and musical features |
JP2024048970A (en) * | 2022-09-28 | 2024-04-09 | パナソニックオートモーティブシステムズ株式会社 | Signal processing device, signal processing method and program |
CN116185167A (en) * | 2022-10-20 | 2023-05-30 | 瑞声开泰声学科技(上海)有限公司 | Haptic feedback method, system and related equipment for music track-dividing matching vibration |
US20240233694A9 (en) * | 2022-10-20 | 2024-07-11 | Tuttii Inc. | System and method for enhanced audio data transmission and digital audio mashup automation |
EP4375984A1 (en) | 2022-11-22 | 2024-05-29 | algoriddim GmbH | Method and system for accelerated decomposing of audio data using intermediate data |
US12165622B2 (en) | 2023-02-03 | 2024-12-10 | Applied Insights, Llc | Audio infusion system and method |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6184898B1 (en) | 1998-03-26 | 2001-02-06 | Comparisonics Corporation | Waveform display utilizing frequency-based coloring and navigation |
US8311656B2 (en) * | 2006-07-13 | 2012-11-13 | Inmusic Brands, Inc. | Music and audio playback system |
US7525037B2 (en) * | 2007-06-25 | 2009-04-28 | Sony Ericsson Mobile Communications Ab | System and method for automatically beat mixing a plurality of songs using an electronic equipment |
US20120109348A1 (en) * | 2009-05-25 | 2012-05-03 | Pioneer Corporation | Cross fader unit, mixer and program |
US8910046B2 (en) * | 2010-07-15 | 2014-12-09 | Apple Inc. | Media-editing application with anchored timeline |
US20130094131A1 (en) * | 2011-10-18 | 2013-04-18 | Inmusic Brands, Inc. | Case for a tablet computer |
US20130290818A1 (en) * | 2012-04-27 | 2013-10-31 | Nokia Corporation | Method and apparatus for switching between presentations of two media items |
US8812144B2 (en) * | 2012-08-17 | 2014-08-19 | Be Labs, Llc | Music generator |
US9398390B2 (en) * | 2013-03-13 | 2016-07-19 | Beatport, LLC | DJ stem systems and methods |
EP2808870B1 (en) * | 2013-05-30 | 2016-03-16 | Spotify AB | Crowd-sourcing of remix rules for streamed music. |
US20150268924A1 (en) * | 2014-03-19 | 2015-09-24 | Hipolito Torrales, JR. | Method and system for selecting tracks on a digital file |
US10014002B2 (en) * | 2016-02-16 | 2018-07-03 | Red Pill VR, Inc. | Real-time audio source separation using deep neural networks |
US10002596B2 (en) * | 2016-06-30 | 2018-06-19 | Nokia Technologies Oy | Intelligent crossfade with separated instrument tracks |
US11024276B1 (en) * | 2017-09-27 | 2021-06-01 | Diana Dabby | Method of creating musical compositions and other symbolic sequences by artificial intelligence |
WO2019229199A1 (en) * | 2018-06-01 | 2019-12-05 | Sony Corporation | Adaptive remixing of audio content |
US10991385B2 (en) * | 2018-08-06 | 2021-04-27 | Spotify Ab | Singing voice separation with deep U-Net convolutional networks |
WO2021090495A1 (en) * | 2019-11-08 | 2021-05-14 | AlphaTheta株式会社 | Acoustic device, display control method, and display control program |
-
2020
- 2020-03-06 EP EP20712463.7A patent/EP4005243B1/en active Active
- 2020-03-06 MX MX2022011059A patent/MX2022011059A/en unknown
- 2020-03-06 CA CA3170462A patent/CA3170462A1/en active Pending
- 2020-03-06 WO PCT/EP2020/056124 patent/WO2021175455A1/en unknown
- 2020-03-06 US US17/905,555 patent/US20230089356A1/en active Pending
- 2020-03-06 ES ES20712463T patent/ES2960983T3/en active Active
- 2020-03-06 EP EP23192603.1A patent/EP4311268A3/en active Pending
- 2020-03-17 DE DE202020005830.0U patent/DE202020005830U1/en active Active
- 2020-03-17 WO PCT/EP2020/057330 patent/WO2021175456A1/en unknown
- 2020-04-30 WO PCT/EP2020/062151 patent/WO2021175457A1/en active Application Filing
- 2020-06-03 US US16/892,063 patent/US11216244B2/en active Active
- 2020-06-09 WO PCT/EP2020/065995 patent/WO2021175458A1/en unknown
- 2020-11-09 WO PCT/EP2020/081540 patent/WO2021175464A1/en unknown
-
2021
- 2021-06-09 US US17/343,386 patent/US20210326102A1/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170301372A1 (en) * | 2016-03-25 | 2017-10-19 | Spotify Ab | Transitions between media content items |
US20180308460A1 (en) * | 2017-04-21 | 2018-10-25 | Yamaha Corporation | Musical performance support device and program |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11232773B2 (en) * | 2019-05-07 | 2022-01-25 | Bellevue Investments Gmbh & Co. Kgaa | Method and system for AI controlled loop based song construction |
US20220284875A1 (en) * | 2020-03-06 | 2022-09-08 | Algoriddim Gmbh | Method, device and software for applying an audio effect |
US11462197B2 (en) * | 2020-03-06 | 2022-10-04 | Algoriddim Gmbh | Method, device and software for applying an audio effect |
US20230335091A1 (en) * | 2020-03-06 | 2023-10-19 | Algoriddim Gmbh | Method and device for decomposing, recombining and playing audio data |
US11604622B1 (en) * | 2020-06-01 | 2023-03-14 | Meta Platforms, Inc. | Selecting audio clips for inclusion in content items |
US20240249706A1 (en) * | 2021-05-27 | 2024-07-25 | Alphatheta Corporation | Sound device, program, and control method |
US20230260531A1 (en) * | 2022-02-16 | 2023-08-17 | Sony Group Corporation | Intelligent audio procesing |
US11740862B1 (en) * | 2022-11-22 | 2023-08-29 | Algoriddim Gmbh | Method and system for accelerated decomposing of audio data using intermediate data |
WO2025033121A1 (en) * | 2023-08-07 | 2025-02-13 | ヤマハ株式会社 | Signal generation method, display control method, and program |
Also Published As
Publication number | Publication date |
---|---|
ES2960983T3 (en) | 2024-03-07 |
US11216244B2 (en) | 2022-01-04 |
EP4311268A3 (en) | 2024-04-10 |
WO2021175458A1 (en) | 2021-09-10 |
EP4311268A2 (en) | 2024-01-24 |
WO2021175455A1 (en) | 2021-09-10 |
WO2021175456A1 (en) | 2021-09-10 |
EP4005243A1 (en) | 2022-06-01 |
US20230089356A1 (en) | 2023-03-23 |
DE202020005830U1 (en) | 2022-09-26 |
MX2022011059A (en) | 2022-09-19 |
EP4005243B1 (en) | 2023-08-23 |
US20210279030A1 (en) | 2021-09-09 |
WO2021175464A1 (en) | 2021-09-10 |
CA3170462A1 (en) | 2021-09-10 |
WO2021175457A1 (en) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210326102A1 (en) | Method and device for determining mixing parameters based on decomposed audio data | |
US11347475B2 (en) | Transition functions of decomposed signals | |
AU2022218554B2 (en) | Method and device for decomposing, recombining and playing audio data | |
US20230120140A1 (en) | Ai based remixing of music: timbre transformation and matching of mixed audio data | |
US11462197B2 (en) | Method, device and software for applying an audio effect | |
US8710343B2 (en) | Music composition automation including song structure | |
JP6056437B2 (en) | Sound data processing apparatus and program | |
JP6926354B1 (en) | AI-based DJ systems and methods for audio data decomposition, mixing, and playback | |
JP5879996B2 (en) | Sound signal generating apparatus and program | |
JP7136979B2 (en) | Methods, apparatus and software for applying audio effects | |
WO2021175461A1 (en) | Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal | |
US20230343314A1 (en) | System for selection and playback of song versions from vinyl type control interfaces | |
WO2023217352A1 (en) | Reactive dj system for the playback and manipulation of music based on energy levels and musical features | |
NZ791507A (en) | Method and device for decomposing, recombining and playing audio data | |
NZ791398B2 (en) | Method and device for decomposing, recombining and playing audio data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: ALGORIDDIM GMBH, GERMANY; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MORSY, KARIEM;TESSMANN, FEDERICO;TESCHNER, CHRISTOPH;REEL/FRAME:057201/0597; Effective date: 20210720 |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |