We are at the final stage of packaging up the repository and expect everything to be done by the end of May. Once ready, a release tag will be added to the repository, with the external objects, example Max patches, and instructions to compile.
Latent terrain is a coordinates-to-latents mapping model for neural audio autoencoders. It can be used to build a mountainous, steep surface map for the autoencoder's latent space. A terrain takes coordinates in the control space as inputs and produces continuous latent vectors in real time.
- What's a Neural Audio Autoencoder and why Latent Terrain?
- Demos
- Compatibility & Installation
- Usage
- TODOs
- Build Instructions
- Acknowledgements
Hi, this is Shuoyang (Jasper). nn.terrain is part of my ongoing PhD work on Discovering Musical Affordances in Neural Audio Synthesis, part of which has been (and will be) on putting AI audio generators into the hands of composers and musicians.
Therefore, I would love to have you involved - if you have any feedback, a feature request, a demo / a device / or a ^@#*$- made with nn.terrain, I would love to hear about it. If you would like to collaborate on anything, please leave a message in this form: https://forms.office.com/e/EJ4WHfru1A
A neural audio autoencoder (such as RAVE) is an AI audio generation tool with two components: an encoder and a decoder.
- The **encoder** compresses a piece of audio signal into a sequence of latent vectors (a latent trajectory). This compression happens in the time domain, so the sampling rate goes from 44100Hz (the audio sampling rate) to 21.5Hz (the latent space sampling rate).
- The **decoder** takes the latent trajectory to produce a piece of audio signal. The decoder can also be used as a parametric synthesiser by manually navigating the latent space (i.e., a latent space walk).
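To make the numbers concrete: compressing from 44100Hz to 21.5Hz means the encoder downsamples time by a factor of about 2048. The factor below is an assumption for illustration; the exact value depends on the model.

```python
# Sketch of the time-domain compression a neural audio autoencoder performs.
sr_audio = 44100   # audio sampling rate (Hz)
hop = 2048         # assumed total downsampling factor of the encoder
sr_latent = sr_audio / hop
print(round(sr_latent, 1))  # -> 21.5: one latent vector per 2048 audio samples
```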
Latent terrain allows you to navigate the latent space of the generative AI as if walking on a terrain surface. It tailors the latent space to a low-dimensional (e.g., a 2D plane) control space. This terrain surface is nonlinear (i.e., able to produce complex sequential patterns), continuous (i.e., allowing smooth interpolations), and tailorable (i.e., you can DIY your own materials with interactive machine learning).
This repository is a set of Max externals to build, visualise, and program latent terrain:
Object | Description |
---|---|
`nn.terrain~` | The terrain itself: takes control-space coordinates as signal inputs, produces latent vectors, and handles training, checkpointing, and visualisation sampling. |
`nn.terrain.encode` | Encodes audio into latent trajectories for gathering training data. |
`nn.terrain.record` | Records paired coordinate and latent trajectories into data dictionaries. |
`nn.terrain.gui` | A 2D control-space interface for drawing and playing trajectories and displaying terrain visualisations. |
The projection from a latent space to a latent terrain is done by pairing latent trajectories with spatial trajectories on a 2D plane (or any low-dimensional space). Given examples of inputs (spatial trajectories) and their corresponding outputs (latent trajectories), the terrain can be trained very quickly (~15s) using supervised machine learning.
train.mp4
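The idea can be sketched outside Max. Below is a minimal numpy stand-in, not the actual `nn.terrain~` implementation: random Fourier features (Tancik et al., 2020) lift the 2D coordinates, and a least-squares fit stands in for the neural network that learns the coordinate-to-latent pairing. All data and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a spatial trajectory (2D coords) paired with latent vectors (8D).
coords = rng.uniform(-1, 1, size=(500, 2))   # control-space trajectory
latents = rng.normal(size=(500, 8))          # matching latent trajectory

# Random Fourier feature mapping; the Gaussian scale plays the role of
# nn.terrain~'s gauss_scale argument, 256 the role of its feature_size.
B = rng.normal(scale=1.0, size=(2, 128))     # 2 * 128 = 256 features
def fourier_features(v):
    proj = 2 * np.pi * v @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# Supervised fit of features -> latents (least squares stands in for the MLP).
X = fourier_features(coords)
W, *_ = np.linalg.lstsq(X, latents, rcond=None)

# Querying the trained terrain: one coordinate in, one latent vector out.
z = fourier_features(np.array([[0.3, -0.7]])) @ W
print(z.shape)  # (1, 8)
```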
Sound synthesis with latent terrain is similar to wave terrain synthesis, but operating in the latent space of an audio autoencoder. An audio fragment can be synthesised by tracing a path through the terrain surface.
play.mp4
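As a rough sketch of the idea (a made-up terrain function stands in for a trained model), traversing a path through the control space yields a latent trajectory that a decoder could then turn into audio:

```python
import numpy as np

# Hypothetical terrain: maps a 2D coordinate to an 8-D latent vector.
def terrain(x, y):
    k = np.arange(1, 9)
    return np.sin(k * x) * np.cos(k * y)

# A circular path through the control space, sampled at the latent
# rate (~21.5 Hz) for 4 seconds of audio.
t = np.linspace(0, 4, int(4 * 21.5))
path_x = np.cos(2 * np.pi * 0.25 * t)
path_y = np.sin(2 * np.pi * 0.25 * t)

trajectory = np.stack([terrain(x, y) for x, y in zip(path_x, path_y)])
print(trajectory.shape)  # (86, 8) -- one latent vector per latent frame
```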
A presentation at the IRCAM Forum Workshops 2025 can be found in this article.
This external works with nn_tilde v1.5.6 (torch v2.0.0/2.0.1). If you have an nn~ built from another torch version, you might have to build this yourself. See the Build Instructions documentation.
We only have the MaxMSP version at the moment, sorry. Windows and arm64 macOS are supported.
Uncompress the `.tar.gz` file into the `Packages` folder of your Max installation, which is usually `~/Documents/Max 9/Packages/`.
Reopen Max and you will find all nn.terrain objects. You might get a quarantine warning; proceeding will disable this warning. If the objects still have trouble opening, or don't work correctly with nn_tilde, you might want to build the externals yourself; see Build Instructions.
Uncompress the `.tar.gz` file into the `Packages` folder of your Max installation, which is usually `~/Documents/Max 9/Packages/`.
Copy all `.dll` files in the package next to the `Max.exe` executable (if you have already done this for nn_tilde, you don't need to do it again).
Here we briefly walk through the features and functionalities; detailed walkthroughs can be found in the `.maxhelp` help file for each object.
A terrain is built by pairing latent trajectories with coordinate trajectories, and then using a supervised machine learning algorithm to learn this pairing. Here are the steps:
First, we'll define the dimensionality of the latent space and the control space. These are set by the first two arguments of the object. For instance, `nn.terrain~ 2 8` will create a terrain that takes 2 input signals and produces an 8-dimensional latent vector.
Arguments of `nn.terrain~`:

Object | Arguments | Description |
---|---|---|
nn.terrain~ | control_dim | Number of input channels (the dimensionality of the control space). |
| latent_dim | Number of output channels; this should usually be the dimensionality of the autoencoder's latent space. |
| gauss_scale | (optional) Scale of the Gaussian random Fourier feature mapping (Tancik et al., 2020); higher values let the terrain vary faster across the control space. |
| network_channel | (optional) The number of neurons in each hidden layer. By default 128. |
| feature_size | (optional) The size of the random Fourier feature mapping; a higher feature size is suggested when using a high-dimensional control space. By default 256. |
| buffer_size | (optional) Inference of the terrain happens once per buffer; keeping it the same as the buffer size of nn~ is suggested. |
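Together these arguments describe a Fourier-feature network (Tancik et al., 2020). The forward pass below is a rough numpy sketch of the shapes only; the layer count, activation, and random weights are assumptions for illustration, not the actual `nn.terrain~` implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Shapes implied by `nn.terrain~ 2 8` with default attributes.
control_dim, latent_dim = 2, 8
feature_size, network_channel = 256, 128
gauss_scale = 1.0

B = rng.normal(scale=gauss_scale, size=(control_dim, feature_size // 2))
W1 = rng.normal(size=(feature_size, network_channel))
W2 = rng.normal(size=(network_channel, latent_dim))

def forward(v):
    proj = 2 * np.pi * v @ B
    h = np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)  # feature_size
    h = np.tanh(h @ W1)                                        # network_channel
    return h @ W2                                              # latent_dim

out = forward(rng.uniform(-1, 1, size=(4, control_dim)))
print(out.shape)  # (4, 8): 4 coordinates in, 4 latent vectors out
```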
- Gathering latent trajectories:
- Gathering spatial trajectories:
- Terrain training can be done within the `nn.terrain~` object:
  - Send the data dictionaries we got in the previous steps to `nn.terrain~`.
  - The terrain will be trained for 10 epochs once a `train` message is received (this number can be changed by the `training_epoch` attribute).
  - `route` the last outlet to `loss` and `epoch` to inspect the training loss.
Object | Training Attributes | Description |
---|---|---|
nn.terrain~ | training_batchsize | Batch size used to train the neural network in the terrain (i.e., how many coordinate-latent pairs per batch). Just a matter of memory consumption; won't affect the result too much. |
| training_epoch | How many epochs will be trained once a `train` message is received. |
| training_lr | Learning rate used when training. |
| training_worker | The same effect as PyTorch dataloader's `num_workers`; won't affect the result too much. |
- After you feel good about the training loss, the terrain is good to go.
- Use the `checkpoint` message to save the terrain to a `.pt` file. The saving name and path can be set in attributes.
- Load a terrain `.pt` file by giving its file name as an argument.
Since the control space is 2D, the latent space can be visualised by sampling the control space across a closed interval (i.e., width and height in this example). Use the `plot_interval` message to do this:
`plot_interval` for a 2D plane takes 6 arguments:
- lower and upper bound values of the x and y axes in the control space (these are usually the same as the `values_bound` attribute of `nn.terrain.gui`)
- resolution of the x and y axes (these are usually the same as the width and height of `nn.terrain.gui`)
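What `plot_interval` does can be pictured as evaluating the terrain over a regular grid. The sketch below uses a made-up terrain function, and the bounds and resolution values are only illustrative:

```python
import numpy as np

# Hypothetical terrain: maps a 2D coordinate to an 8-D latent vector.
def terrain(x, y):
    k = np.arange(1, 9)
    return np.sin(k * x) * np.cos(k * y)

x_lo, x_hi, y_lo, y_hi = -1.0, 1.0, -1.0, 1.0  # bounds (cf. values_bound)
x_res, y_res = 200, 100                        # resolution (cf. gui size)

xs = np.linspace(x_lo, x_hi, x_res)
ys = np.linspace(y_lo, y_hi, y_res)
samples = np.stack([terrain(x, y) for y in ys for x in xs])
image = samples.reshape(y_res, x_res, -1)  # one channel per latent dimension
print(image.shape)  # (100, 200, 8)
```

Plotting one channel of `image` as a greyscale map corresponds to the one-channel visualisation mode of `nn.terrain.gui`.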
You can create trajectories to navigate the terrain. This trajectory playback can be controlled by a signal input.
- Set the 'UI Tasks' (`task`) attribute of nn.terrain.gui to 'play'.
- This behaviour is similar to the `play~` object in Max.
Set the 'UI Tasks' (`task`) attribute of nn.terrain.gui to 'stylus' to use it as a trackpad. If you are using a tablet/stylus, it also supports pen pressure.
stylus.mp4
[todo] It also supports the point-by-point steering approach proposed by Vigliensoni and Fiebrink (2023).
- [✕︎] Load and run inference on scripted mapping models exported by TorchScript.
- [✔︎] Display terrain visualisation.
  - [✔︎] Greyscale (one-channel)
  - [✔︎] Multi-channel (yes, but no documentation atm)
- [✔︎] Interactive training of terrain models in Max MSP.
- [✔︎] Customised configuration of Fourier-CPPNs (Tancik et al., 2020).
- [✕︎] Example patches, tutorials...
- [✕︎] PureData
If the externals have trouble opening in Max, or don't work correctly with nn_tilde, you might want to build the externals yourself; see the Build Instructions documentation.
- Shuoyang Zheng, the author of this work, is supported by UK Research and Innovation [EP/S022694/1].
- This is built on top of acids-ircam's nn_tilde, with a lot of reused code, including the CMakeLists templates, `backend.cpp`, `circular_buffer.h`, and the model performing loop in `nn.terrain_tilde.cpp`.
. -
Caillon, A., Esling, P., 2022. Streamable Neural Audio Synthesis With Non-Causal Convolutions. https://doi.org/10.48550/arXiv.2204.07064
- Tancik, M., Srinivasan, P.P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J.T., Ng, R., 2020. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. NeurIPS.
- Vigliensoni, G., Fiebrink, R., 2023. Steering Latent Audio Models through Interactive Machine Learning, in: Proceedings of the 14th International Conference on Computational Creativity.