Latent Terrain: Coordinates-to-Latents Generator for Neural Audio Autoencoders

The repository is at the final stage of packaging; everything should be done by the end of May.
Once ready, a release tag will be added to the repository, including the external objects, example Max patches, and instructions to compile.

Latent terrain is a coordinates-to-latents mapping model for neural audio autoencoders. It can be used to build a mountainous, steep surface map for the autoencoder's latent space. A terrain produces continuous latent vectors in real time, taking coordinates in the control space as inputs.


Need help!

Hi, this is Shuoyang (Jasper). nn.terrain is part of my ongoing PhD work on Discovering Musical Affordances in Neural Audio Synthesis, and part of that work has been (and will be) on putting AI audio generators into the hands of composers and musicians.

Therefore, I would love to have you involved: if you have any feedback, a feature request, or a demo / a device / a ^@#*$- made with nn.terrain, I would love to hear about it. If you would like to collaborate on anything, please leave a message in this form: https://forms.office.com/e/EJ4WHfru1A

What's a Neural Audio Autoencoder and why Latent Terrain?

A neural audio autoencoder (such as RAVE) is an AI audio generation tool with two components: an encoder and a decoder.

  • The encoder compresses an audio signal into a sequence of latent vectors (a latent trajectory). This compression happens in the time domain, so the sampling rate goes from 44100Hz (audio sampling rate) down to 21.5Hz (latent space sampling rate).

  • The decoder takes a latent trajectory and produces an audio signal. The decoder can also be used as a parametric synthesiser by manually navigating the latent space (i.e., latent space walk).
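The time-domain compression described above can be checked with a short calculation. This is a minimal sketch assuming a RAVE-style compression ratio of 2048 audio samples per latent vector, which is what the quoted rates imply (44100 Hz / 2048 ≈ 21.5 Hz):

```python
# Sketch: how many latent vectors an encoder with a 2048-sample
# compression ratio produces for a clip. The ratio is an assumption
# derived from the rates quoted above (44100 Hz -> ~21.5 Hz).
SAMPLE_RATE = 44100
COMPRESSION = 2048  # audio samples per latent vector (assumed)

def latent_trajectory_length(n_samples: int) -> int:
    """Number of latent vectors the encoder emits for n_samples of audio."""
    return n_samples // COMPRESSION

# A 5-second clip at 44.1 kHz yields ~21.5 latent vectors per second:
n = latent_trajectory_length(5 * SAMPLE_RATE)
print(n)  # 107
```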

Latent terrain allows you to navigate the latent space of a generative AI as if walking on a terrain surface. It tailors the latent space to a low-dimensional control space (e.g., a 2D plane). This terrain surface is nonlinear (i.e., able to produce complex sequential patterns), continuous (i.e., allows for smooth interpolations), and tailorable (i.e., DIY your own materials with interactive machine learning).

This repository is a set of Max externals to build, visualise, and program latent terrain:

| Object | Description |
| --- | --- |
| nn.terrain~ | Load, build, train, and save a terrain. Perform the coordinates-to-latents mapping. |
| nn.terrain.encode | Encode audio buffers into latent trajectories using a pre-trained audio autoencoder, to be used as training data for nn.terrain~. |
| nn.terrain.record | Record latent trajectories to be used as training data for nn.terrain~. |
| nn.terrain.gui | Edit spatial trajectories to be used as training data for nn.terrain~. Visualise the terrain. Create and program trajectory playbacks. |

Demos

The projection from a latent space to a latent terrain is done by pairing latent trajectories with spatial trajectories on a 2D plane (or any low-dimensional space). After providing examples of inputs (spatial trajectories) and their corresponding outputs (latent trajectories), the terrain can be trained very quickly (~15s) using supervised machine learning.
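The pairing step amounts to building a supervised dataset of (coordinate, latent) examples. A minimal sketch of the data layout (shapes are illustrative, following the 2-in / 8-out example used elsewhere in this README, not the objects' internal format):

```python
import numpy as np

# Sketch: pair a drawn 2-D spatial trajectory with a recorded latent
# trajectory of the same length to form a supervised training set.
T = 100                                    # time steps in both trajectories
spatial = np.random.rand(T, 2)             # inputs: (x, y) coordinates
latents = np.random.randn(T, 8)            # targets: 8-D latent vectors

# One training example per time step: coordinates -> latent vector.
pairs = list(zip(spatial, latents))
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)  # 100 (2,) (8,)
```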

train.mp4

Sound synthesis with latent terrain is similar to wave terrain synthesis, but operates in the latent space of an audio autoencoder. An audio fragment can be synthesised by tracing a path across the terrain surface.

play.mp4

A presentation at the IRCAM Forum Workshops 2025 can be found in this article.

Compatibility and Installation

This external works with nn_tilde v1.5.6 (torch v2.0.0/2.0.1). If you have a nn~ built from another torch version, you might have to build this yourself. See the Build Instructions documentation.

Only a MaxMSP version is available at the moment, sorry. Windows and arm64 macOS are supported.

macOS

Uncompress the .tar.gz file into the Packages folder of your Max installation, which is usually ~/Documents/Max 9/Packages/.

Reopen Max and you will find all nn.terrain objects. You might get a quarantine warning; proceeding will disable it. If the externals still have trouble opening, or don't work correctly with nn_tilde, you might want to build them yourself; see Build Instructions.

Windows

Uncompress the .tar.gz file into the Packages folder of your Max installation, which is usually in ~/Documents/Max 9/Packages/.

Copy all .dll files in the package next to the `Max.exe` executable (if you have already done this for nn_tilde, you don't need to do it again).

Usage

Here we briefly walk through the features and functionality; detailed walkthroughs can be found in the .maxhelp help file for each object.

Building a customised terrain

A terrain is built by pairing latent trajectories with coordinate trajectories, then using a supervised machine learning algorithm to learn this pairing. Here are the steps:

Terrain parameters

First, we'll define the dimensionality of the latent space and control space. This can be set by the first two arguments of the object. For instance, nn.terrain~ 2 8 will create a terrain that takes 2 input signals and produces an 8-dimensional latent vector.

Arguments of nn.terrain~:

| Object | Arguments | Description |
| --- | --- | --- |
| nn.terrain~ | control_dim | Number of input channels. |
| | latent_dim | Number of output channels; this should usually be the dimensionality of the autoencoder's latent space. |
| | gauss_scale (optional) | Gaussian scale of the Fourier feature mapping. A higher Gaussian scale leads to a noisier terrain; a lower one leads to a smoother, less mountainous terrain. A float value between 0.05 and 0.3 is suggested. If the Gaussian scale is 0, the Fourier feature mapping layer is removed, resulting in a very smooth (low-frequency) terrain, useful for point-by-point mode. |
| | network_channel (optional) | The number of neurons in each hidden layer. 128 by default. |
| | feature_size (optional) | The size of the random Fourier feature mapping; a higher feature size is suggested when using a high-dimensional control space. 256 by default. |
| | buffer_size (optional) | Terrain inference happens once per buffer; keeping this the same as the buffer size of nn~ is suggested. |
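The gauss_scale and feature_size arguments correspond to the random Fourier feature mapping of Tancik et al. (2020), cited below. A minimal sketch of that mapping (an illustration of the technique, not the external's exact implementation):

```python
import numpy as np

def fourier_features(coords, feature_size=256, gauss_scale=0.1, seed=0):
    """Map low-dimensional coordinates to random Fourier features.

    coords: (N, control_dim) array of control-space coordinates.
    Returns an (N, 2 * feature_size) feature matrix. A larger
    gauss_scale samples higher frequencies, giving a noisier terrain;
    a gauss_scale near 0 gives a very smooth (low-frequency) terrain.
    """
    rng = np.random.default_rng(seed)
    control_dim = coords.shape[1]
    # Random projection matrix B with entries ~ N(0, gauss_scale^2)
    B = rng.normal(0.0, gauss_scale, size=(control_dim, feature_size))
    proj = 2.0 * np.pi * coords @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

coords = np.random.rand(4, 2)   # 4 points in a 2-D control space
feats = fourier_features(coords)
print(feats.shape)              # (4, 512)
```

The feature matrix is then fed to the hidden layers of the network, which is why a larger feature_size helps when the control space is high-dimensional.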

Training examples preparation

  • Gathering latent trajectories:

    • Create a nn.terrain.encode and put audio sample(s) in buffer~ or polybuffer~
    • Use the append message to add buffers to the encoder, then the encode message to convert them into latent trajectories stored in a dictionary file.
    • Additionally, see nn.terrain.encode's help file for how to encode samples from a playlist~ object.
  • Gathering spatial trajectories:

    • Create a nn.terrain.gui and set the UI Task (task) attribute to Dataset
    • Define the target length of trajectories (in ms) using a list message or an append message.
    • Draw lines as trajectories.

Training

  • Terrain training can be done within the nn.terrain~ object:
    • Send the data dictionaries from the previous step to nn.terrain~

    • The terrain will be trained for 10 epochs once a train message is received (this number can be changed by the training_epoch attribute).

    • Route the last outlet's loss and epoch messages to inspect the training loss.

| Object | Training Attributes | Description |
| --- | --- | --- |
| nn.terrain~ | training_batchsize | Batch size used to train the neural network in the terrain (i.e., how many coordinate-latent pairs per batch). Just a matter of memory consumption; won't affect the result too much. |
| | training_epoch | How many epochs will be trained once a train message is received. |
| | training_lr | Learning rate used when training. |
| | training_worker | The same effect as PyTorch dataloader's num_workers; won't affect the result too much. |
  • Once you are happy with the training loss, the terrain is ready to use.
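The supervised regression described above happens inside nn.terrain~; as an illustration of the same coordinates-to-latents idea, here is a hedged sketch that fits a linear readout on random Fourier features with least squares. The real object trains an MLP with SGD over training_epoch epochs; the closed-form readout below is a simplification, and all names and shapes are illustrative:

```python
import numpy as np

# Sketch: learn a coordinates -> latents mapping by least squares on
# random Fourier features. nn.terrain~ trains an MLP instead; this
# closed-form readout just illustrates the supervised pairing.
rng = np.random.default_rng(0)
T, control_dim, latent_dim = 200, 2, 8
coords = rng.random((T, control_dim))          # spatial trajectory
latents = rng.normal(size=(T, latent_dim))     # paired latent trajectory

B = rng.normal(0.0, 0.1, size=(control_dim, 256))   # gauss_scale = 0.1
def features(x):
    p = 2 * np.pi * x @ B
    return np.concatenate([np.sin(p), np.cos(p)], axis=1)

Phi = features(coords)                          # (200, 512)
W, *_ = np.linalg.lstsq(Phi, latents, rcond=None)

pred = features(coords) @ W                     # query the "terrain"
print(pred.shape)                               # (200, 8)
```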

Saving (Checkpoints)

  • Use the checkpoint message to save the terrain to a .pt file. The saving name and path can be set via attributes.
  • Load a terrain .pt file by giving its file name as an argument.

Visualising a terrain

Since the control space is 2D, the terrain can be visualised by sampling the control space across a closed interval (i.e., the width and height in this example). Use the plot_interval message to do this:

  • plot_interval for a 2D plane takes 6 arguments:
  • lower and upper bound values of the x and y axes in the control space (these are usually the same as the values_bound attribute of nn.terrain.gui)
  • resolution of the x and y axes (these are usually the same as the width and height of nn.terrain.gui)
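Sampling the control space over a closed interval can be sketched as follows. The terrain function here is a hypothetical stand-in for a trained mapping; the six values mirror plot_interval's arguments (x/y bounds plus x/y resolution):

```python
import numpy as np

# Sketch: sample a 2-D control space on a regular grid, as plot_interval
# does, and read one latent channel as a greyscale image.
def terrain(coords):
    # hypothetical mapping: (N, 2) coords -> (N, 8) latents
    return np.stack([np.sin(coords[:, 0] * k + coords[:, 1])
                     for k in range(1, 9)], axis=1)

x_lo, x_hi, y_lo, y_hi = -1.0, 1.0, -1.0, 1.0   # bounds (values_bound)
res_x, res_y = 64, 64                            # resolution (width, height)

xs = np.linspace(x_lo, x_hi, res_x)
ys = np.linspace(y_lo, y_hi, res_y)
grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)  # (4096, 2)

latents = terrain(grid)                          # (4096, 8)
image = latents[:, 0].reshape(res_y, res_x)      # channel 0 as greyscale
print(image.shape)                               # (64, 64)
```

Multi-channel visualisation works the same way, reading several latent channels instead of one.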

Programming trajectory playback

You can create trajectories to navigate the terrain. Trajectory playback can be controlled by a signal input.

  • Set the 'UI Tasks' (task) attribute of nn.terrain.gui to 'play'.
  • This behaviour is similar to the play~ object in Max.

Stylus mode

Set the 'UI Tasks' (task) attribute of nn.terrain.gui to 'stylus' to use it as a trackpad. If you are using a tablet/stylus, it also supports pen pressure.

stylus.mp4

Point-by-Point Steering

[todo] It also supports the point-by-point steering approach proposed by Vigliensoni and Fiebrink (2023).

TODOs

  • [✕︎] Load and run inference on a scripted mapping model exported by TorchScript.
  • [✔︎] Display terrain visualisation.
    • [✔︎] Greyscale (one-channel)
    • [✔︎] Multi-channel (yes but no documentation atm)
  • [✔︎] Interactive training of terrain models in Max MSP.
  • [✔︎] Customised configuration of Fourier-CPPNs (Tancik et al., 2020).
  • [✕︎] Example patches, tutorials...
  • [✕︎] PureData

Build Instructions

If the externals have trouble opening in Max, or don't work correctly with nn_tilde, you might want to build them yourself; see the Build Instructions documentation.

Acknowledgements

  • Shuoyang Zheng, the author of this work, is supported by UK Research and Innovation [EP/S022694/1].

  • This is built on top of acids-ircam's nn_tilde, with a lot of reused code, including the CMakeLists templates, backend.cpp, circular_buffer.h, and the model performing loop in nn.terrain_tilde.cpp.

  • Caillon, A., Esling, P., 2022. Streamable Neural Audio Synthesis With Non-Causal Convolutions. https://doi.org/10.48550/arXiv.2204.07064

  • Tancik, M., Srinivasan, P.P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J.T., Ng, R., 2020. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. NeurIPS.

  • Vigliensoni, G., Fiebrink, R., 2023. Steering latent audio models through interactive machine learning, in: Proceedings of the 14th International Conference on Computational Creativity.
