Reverse-engineering synthesizer patches from audio clips.
Software synthesizers have become popular tools for music production over the last few decades, thanks to their power and relative affordability. Synthesizers are typically built from oscillators, filters, envelopes, low-frequency oscillators (LFOs), and sometimes effects such as reverb or delay. They also offer a preset or patch feature, whereby the synthesizer's configuration can be saved and reloaded later. This project attempts to reverse-engineer synth patches: I would like to train a deep neural network that accepts audio samples as input and predicts synthesizer settings as output, with the goal of automatically recreating synthesizer sounds.
Upon having this idea, I dug around a bit and discovered that many smart people have already thought of this possibility, which is sometimes termed 'parameter inference'. Past research includes the application of genetic algorithms (Vinetics), effect approximation on spectrograms (SerumRNN), and patch creation with generative models (Sistema). Research closer to my idea includes InverSynth and Syntheon. Some approaches use variational autoencoders to derive latent representations of the audio, which are then passed through decoder feed-forward networks (source). One challenge is the complexity of popular software synths. Another observation I've made is that many current solutions are very complicated, which I find somewhat dissatisfying. I'm interested in training a relatively simple but sufficiently large network on pure audio input and parameter output.
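To make the "audio in, parameters out" idea concrete, here is a minimal sketch of the kind of simple network I have in mind. This is not any published architecture: the layer sizes, the mel-spectrogram front end, and `NUM_PARAMS` are all placeholders I chose for illustration.

```python
# Sketch: a small CNN over mel spectrograms regressing normalized synth parameters.
import torch
import torch.nn as nn
import torchaudio

NUM_PARAMS = 32          # hypothetical number of synth parameters, scaled to [0, 1]
SAMPLE_RATE = 44100

class PatchEstimator(nn.Module):
    def __init__(self, num_params=NUM_PARAMS):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=SAMPLE_RATE, n_fft=2048, hop_length=512, n_mels=128)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, num_params), nn.Sigmoid())  # parameters normalized to [0, 1]

    def forward(self, audio):                        # audio: (batch, num_samples)
        spec = self.melspec(audio).unsqueeze(1)      # (batch, 1, n_mels, frames)
        return self.head(self.encoder(spec.log1p()))

# Training step sketch: minimize MSE between predicted and true parameter vectors.
model = PatchEstimator()
audio = torch.randn(4, SAMPLE_RATE * 2)              # four two-second clips (dummy data)
target = torch.rand(4, NUM_PARAMS)                    # ground-truth normalized parameters
loss = nn.functional.mse_loss(model(audio), target)
loss.backward()
```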
A potentially major challenge in this project will be acquiring the data necessary for training a model. This was my biggest concern with this idea, and I've already explored several avenues. To generate the data, I will need a method for 1) obtaining large numbers of presets for a software synth, and 2) generating audio samples of these presets efficiently. For (1), I could pick a synth that has many user-made presets online. I could also generate presets by randomizing synth parameters. The caveat of such an approach is that randomizing some parameters, such as volume, may produce unusable (e.g., silent or clipping) patches.
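As a sketch of what random preset generation might look like, the snippet below assumes each parameter can be described by a (min, max) range and sampled independently, with a few sensitive parameters pinned instead of randomized. The parameter names and ranges here are hypothetical, not taken from any real synth.

```python
import json
import random

# Hypothetical parameter ranges; a real synth would define these.
PARAM_SPEC = {
    "osc1_tune":     (-24.0, 24.0),
    "filter_cutoff": (20.0, 20000.0),
    "filter_res":    (0.0, 1.0),
    "env_attack":    (0.0, 5.0),
}
# Parameters to pin rather than randomize, to avoid silent or clipped patches.
PINNED = {"master_volume": 0.8}

def random_patch(seed=None):
    rng = random.Random(seed)
    patch = {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_SPEC.items()}
    patch.update(PINNED)
    return patch

if __name__ == "__main__":
    print(json.dumps(random_patch(seed=0), indent=2))
```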
A larger issue is that many popular software synths are not very programmer friendly. For instance, Serum, a powerful commercial synth, stores its presets in files that cannot easily be decoded. By comparison, some synths use preset files that can be read as plain text. Furthermore, Serum is a very complex synth; as a 'wavetable synth', it can load many different complex waveforms from files, and I haven't found a way to represent these preset files in an easily readable manner. Alternatives I have discovered include TAL-Noisemaker and Helm. Both are free and somewhat simpler, and their preset files are just JSON. I may be able to randomly generate patches from these files, but it will take some time to work through the parameters.
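A rough sketch of that workflow: load an existing JSON preset, randomize its numeric fields, and write it back out. This assumes (hypothetically) that the preset is a JSON object whose numeric leaves are the parameter values; the field names in real TAL-Noisemaker or Helm presets will differ, and `KEEP_AS_IS` is a guess at parameters worth leaving untouched.

```python
import json
import random

KEEP_AS_IS = {"volume", "master_volume"}   # hypothetical names to leave untouched

def randomize(node, rng):
    """Recursively replace numeric leaves with random values in [0, 1]."""
    if isinstance(node, dict):
        return {k: (node[k] if k in KEEP_AS_IS else randomize(node[k], rng))
                for k in node}
    if isinstance(node, list):
        return [randomize(v, rng) for v in node]
    if isinstance(node, (int, float)) and not isinstance(node, bool):
        return rng.random()
    return node                            # strings, booleans, etc. left alone

with open("preset.json") as f:             # placeholder path
    preset = json.load(f)

mutated = randomize(preset, random.Random(42))
with open("preset_random.json", "w") as f:
    json.dump(mutated, f, indent=2)
```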
Generating audio files from presets is also a challenge. A synthesizer alone cannot export sound to a .wav file; typically, synthesizers are used within Digital Audio Workstations (DAWs). Most DAWs are designed to be operated manually and don't offer programmatic or scriptable control. An alternative I have found is DawDreamer, a DAW exposed as a Python library. The problem with DawDreamer is that it's just a bit janky, and its compatibility with loading synth plugins and patches is quite limited. However, I have so far been able to load TAL Noisemaker, load a preset, and export a .wav file, so that's a good start. A possible alternative is RenderMan.
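For reference, here is a minimal sketch of the DawDreamer flow I have working so far: build a render engine, load the plugin and a preset, schedule a MIDI note, and write the rendered audio to a .wav file. The plugin and preset paths are placeholders, and the exact calls may vary slightly between DawDreamer versions.

```python
import dawdreamer as daw
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100
BLOCK_SIZE = 512

engine = daw.RenderEngine(SAMPLE_RATE, BLOCK_SIZE)
synth = engine.make_plugin_processor("synth", "/path/to/TAL-NoiseMaker.vst3")
synth.load_preset("/path/to/preset.fxp")         # load a saved patch

# Play middle C (MIDI note 60) at velocity 100 for two seconds.
synth.add_midi_note(60, 100, 0.0, 2.0)

engine.load_graph([(synth, [])])                 # single-node graph: just the synth
engine.render(3.0)                               # render 3 seconds (note plus release tail)

audio = engine.get_audio()                       # float array of shape (2, num_samples)
wavfile.write("render.wav", SAMPLE_RATE, audio.transpose().astype(np.float32))
```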
WIP