Blending samples from the Volca Sample with an autoencoder

Autoencoders are one of those ideas that are too tempting not to try on everything once you learn about them: train a neural network to reconstruct your data points after first squeezing them through a small number of dimensions, and you get a compression method tailored to your dataset, which you can then reuse to classify points or generate new ones.
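In code, the basic idea is small. Here's a minimal sketch in PyTorch -- the layer sizes and the latent dimension of 8 are placeholder choices for illustration, not necessarily what I used:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    """Squeeze inputs through a small latent vector, then reconstruct them."""
    def __init__(self, input_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Training just minimizes reconstruction error, e.g.:
#   loss = torch.nn.functional.mse_loss(model(batch), batch)
```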

One article I enjoyed -- "RAVE: A variational autoencoder for fast and high-quality neural audio synthesis" -- suggests using autoencoders for sound generation: with an autoencoder that compresses sounds down to very few dimensions, you can blend between sounds, or generate new ones, by altering the compressed representation before passing it to the decoder/decompressor.
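With the encoder/decoder split from the sketch above (a toy setup, much simpler than RAVE's convolutional variational architecture), blending reduces to linear interpolation between two latent codes:

```python
import torch

def blend(model: Autoencoder, a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Interpolate between two sounds in latent space: t=0 gives a, t=1 gives b."""
    with torch.no_grad():
        za, zb = model.encoder(a), model.encoder(b)
        return model.decoder((1 - t) * za + t * zb)
```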

For my first go at this, I'm ignoring most of the sensible advice signal processing and machine learning have accumulated over the past few decades and jamming the time-domain signal directly into a plain old MLP/dense autoencoder. (To deal with the fixed input dimension, every sound is first resampled to the same sample rate and cut or padded to the same length.)
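The preprocessing could look something like this with torchaudio; the target sample rate and length here are made-up values, since I haven't given the real ones:

```python
import torch
import torchaudio

TARGET_SR = 16_000    # assumed sample rate, for illustration
TARGET_LEN = 16_000   # assumed fixed length in samples (one second)

def preprocess(path: str) -> torch.Tensor:
    wav, sr = torchaudio.load(path)        # shape: (channels, samples)
    wav = wav.mean(dim=0)                  # mix down to mono
    if sr != TARGET_SR:
        wav = torchaudio.functional.resample(wav, sr, TARGET_SR)
    if wav.numel() >= TARGET_LEN:          # cut...
        return wav[:TARGET_LEN]
    return torch.nn.functional.pad(wav, (0, TARGET_LEN - wav.numel()))  # ...or zero-pad
```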

This works better than I expected, but is quite noisy, and sampling random latents (as in the clip) doesn't give results as weird as I'd hoped. It was a fun way of testing out the audio support in PyTorch, though -- now to actually read the articles that describe how to do this properly...
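For reference, sampling random latents just means decoding noise. A plain autoencoder puts no constraint on the latent distribution, so the sampling scale in this (hypothetical) helper is a guess; matching the empirical distribution of encoded training sounds would be more principled:

```python
import torch

def sample_random(model: Autoencoder, latent_dim: int = 8, scale: float = 1.0) -> torch.Tensor:
    """Decode a random latent vector into a (hopefully interesting) waveform."""
    with torch.no_grad():
        return model.decoder(scale * torch.randn(latent_dim))
```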

