An Introduction to Machine Learning with AWS DeepComposer
Associate IS Solutions Developer
9 December 2021 - 4 min read
Machine learning (ML) is an exciting branch of artificial intelligence (AI) that is rapidly developing and has the power to enhance the way humans go about their day-to-day lives.
For those curious about ML and AI, AWS has developed a series of hands-on educational tools designed to give developers a fun and creative way to get started with machine learning. These tools include AWS DeepRacer: a 3D race car simulator; AWS DeepLens: a deep learning enabled video camera; and AWS DeepComposer: a music composition tool that can transform a basic melody into a full composition through the power of Generative AI.
As a musician myself, I was eager to dive deep into DeepComposer and explore the creative possibilities that could be achieved with both a human and an AI involved in the composition process.
Machine learning approaches can be divided into three general categories: supervised learning (the computer is fed a set of questions with correct answers and trained to ‘predict’ the answers to new, unseen questions), reinforcement learning (think Pavlov’s dogs: the computer is rewarded or punished for its actions) and unsupervised learning (the computer identifies patterns within data sets on its own), which is the approach we are concerned with when using DeepComposer.
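To make the unsupervised case concrete: a classic example is clustering, where an algorithm groups data points without ever being shown the ‘right’ answers. The toy k-means sketch below (purely illustrative, and not related to DeepComposer’s internals) discovers two groups in one-dimensional data on its own:

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Toy k-means: group 1-D values into k clusters with no labels given."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious groups; the algorithm finds them without being told they exist.
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans_1d(data, k=2))  # centers settle roughly at 1.0 and 9.0
```

The same idea scales up: DeepComposer’s models look for structure in music data rather than in labelled question-and-answer pairs.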
Under the broad umbrella of unsupervised learning, we find Generative AI. Generative AI refers to computer programs that can use existing data in the form of text, images and audio files to create new, similar data. Instead of simply finding patterns in the data, Generative AI enables computers to attempt to recreate these patterns, thus generating similar content. There are many creative examples of Generative AI in action, including the ability to generate photographs of human faces that do not actually exist in the real world, random philosophical quote generators and even a Eurovision Song Contest entry.
But how does it work? Generative AI covers a wide category of algorithms, the three most popular being Generative Adversarial Networks (GANs), Variational Autoencoders and Autoregressive Models. At a high level, a GAN is a type of machine learning architecture in which two networks compete against each other to generate new content. After the input is fed into the program, one network (the Generator) tries to generate realistic content based on the data, while the other network (the Discriminator) tries to distinguish the generated content from the real data.
In the context of DeepComposer, we can think of the Generator as the orchestra that trains, practices and tries to produce polished music, while the Discriminator is the conductor who judges the quality of the music against the original repertoire. At first, the Generator produces random content samples resembling the input data, and the Discriminator looks for features such as tempo, pitch and velocity from the original data to decide whether the samples are a good match. The two networks are trained in alternating cycles: the Generator learns to produce more realistic data, while the Discriminator iteratively gets better at distinguishing the original data from the newly generated data.
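The alternating training described above can be sketched with a deliberately tiny, hypothetical example. Here the ‘music’ is just numbers clustered around 5.0, and the Generator and Discriminator are each a pair of scalar parameters rather than the deep networks DeepComposer actually uses; the point is only the turn-taking structure, not the model itself.

```python
import math
import random

random.seed(42)

def sigmoid(x):
    # Squash a score into a 0..1 "probability of being real".
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))

# Stand-in for real "music": numbers clustered around 5.0.
def real_sample():
    return 5.0 + random.gauss(0, 0.2)

w, b = 0.5, 0.0   # Generator: G(z) = w*z + b, turning noise z into a sample
a, c = 0.1, 0.0   # Discriminator: D(x) = sigmoid(a*x + c), scoring realism

lr = 0.01
for step in range(3000):
    # --- Discriminator's turn: push D(real) up and D(fake) down ---
    x_real = real_sample()
    z = random.uniform(-1, 1)
    x_fake = w * z + b
    d_real = sigmoid(a * x_real + c)
    d_fake = sigmoid(a * x_fake + c)
    a += lr * ((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * ((1 - d_real) - d_fake)

    # --- Generator's turn: adjust G so that D(G(z)) rises ---
    z = random.uniform(-1, 1)
    x_fake = w * z + b
    d_fake = sigmoid(a * x_fake + c)
    w += lr * (1 - d_fake) * a * z
    b += lr * (1 - d_fake) * a
```

Each pass through the loop is one ‘rehearsal’: the conductor sharpens its judgement, then the orchestra adjusts its playing in response. The real networks do the same dance over MIDI data with far richer models.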
DeepComposer uses the GAN technique to create realistic accompaniment tracks. In the DeepComposer Music Studio, we can provide our own melody (using the virtual keyboard) or choose from a set of sample melodies (ranging from ‘Ode to Joy’ to ‘We Wish You a Merry Christmas’).
Next, we need to choose a generative algorithm to train our machine learning model. In the Music Studio, we have two different architectures available to us: MuseGAN and U-Net. The U-Net algorithm has been very successful in the image translation domain, while MuseGAN has been specifically designed for generating music. U-Net is a somewhat simpler architecture and therefore slightly easier for beginners to understand. The U-Net algorithm is also slightly limited, as its training dataset is based on the music of Bach, whereas the MuseGAN algorithm supports symphony, jazz, pop or rock-based genre datasets. If we wanted to explore further, we could also create our own custom Generative AI architecture from scratch in Amazon SageMaker and use our own datasets to train the model. One DeepComposer user trained her model using Calypso and St Lucian folk music.
DeepComposer will then generate our composition as a series of accompaniment tracks. We can listen back to the composition and select different instrument types ranging from a Honky-tonk piano to a sawtooth synth lead to a sitar. We can also go back and tweak our melody or use the ‘Rhythm Assist’ feature to correct the timing of the notes.
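AWS does not document exactly how Rhythm Assist is implemented, but timing correction of this kind is commonly done by quantization: snapping each note’s start time to the nearest line of a rhythmic grid. A minimal sketch, using a hypothetical note format, might look like this:

```python
def quantize(notes, grid=0.25):
    """Snap each note's start time (in beats) to the nearest grid line,
    similar in spirit to a 'Rhythm Assist' style timing correction."""
    return [
        {**note, "start": round(note["start"] / grid) * grid}
        for note in notes
    ]

# Slightly rushed and dragged notes from a human performance...
melody = [
    {"pitch": 60, "start": 0.02},   # played a touch late
    {"pitch": 64, "start": 0.47},   # played a touch early
    {"pitch": 67, "start": 1.01},
]
print(quantize(melody))  # starts snap to 0.0, 0.5 and 1.0
```

Pitches are left untouched; only the timing is nudged onto the grid, which is why a sloppily played melody still sounds like the same melody afterwards.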
AWS runs a monthly competition, AWS DeepComposer Chartbusters, in which developers upload their DeepComposer creations and compete to win prizes. Users upload their compositions to SoundCloud, where a panel of human and AI judges evaluates entries for quality and creativity.
On the whole, I found AWS DeepComposer an interesting and exciting tool to play with. Despite my having little prior knowledge of the technicalities of machine learning, the tech was easy to use, and the concepts were explained in a way that was easy to digest as a beginner. While I do not feel that DeepComposer was designed to create innovative, ground-breaking music, or even a catchy earworm, I do feel it does the job of educating developers and making machine learning more accessible to beginners. I probably won’t be entering my newly generated compositions into the Eurovision Song Contest, but I’m certainly intrigued and eager to dive deeper into the world of machine learning. Perhaps I’ll start by training my own model with some Europop.