Google has developed a new AI model called MusicLM that generates music from text descriptions.
The model is trained on a dataset of 5,521 music-text pairs and learns to map sequences of audio tokens to semantic tokens in the captions.
MusicLM can generate 24 kHz musical audio from text descriptions and can transform a hummed melody into different musical styles.
Google's MusicLM model aims to lower the cost barriers artists face in the music industry.
According to ZDNet, finding a music producer and studio can be challenging, and Google's MusicLM might be a valuable tool for artists.
"What if you could just tell your computer to make the beat you envisioned for you at the touch of a button?" says ZDNet.
Using AI, MusicLM creates music from user text prompts.
Google's MusicLM uses a dataset called MusicCaps, which is composed of thousands of music-text pairs that Google collected from its AudioSet.
According to Ars Technica, AudioSet is a collection of over 2 million labeled 10-second sound clips pulled from YouTube videos.
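To make the dataset's structure concrete, here is a minimal Python sketch of how such music-text pairs could be represented and loaded. The field names (ytid, start_s, end_s, caption) are modeled on MusicCaps' published CSV and the file path is hypothetical; this is an illustration, not Google's actual data pipeline.

```python
from dataclasses import dataclass
from typing import List
import csv

@dataclass
class MusicTextPair:
    """One MusicCaps-style example: a 10-second YouTube clip plus a text caption."""
    ytid: str        # YouTube video ID the clip was pulled from
    start_s: float   # clip start time within the video, in seconds
    end_s: float     # clip end time, in seconds
    caption: str     # free-text description of the music

def load_pairs(csv_path: str) -> List[MusicTextPair]:
    """Read music-text pairs from a MusicCaps-style CSV (column names assumed)."""
    pairs = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append(
                MusicTextPair(
                    ytid=row["ytid"],
                    start_s=float(row["start_s"]),
                    end_s=float(row["end_s"]),
                    caption=row["caption"],
                )
            )
    return pairs

# Example: count pairs whose captions mention a violin.
# pairs = load_pairs("musiccaps.csv")  # hypothetical local path
# print(sum("violin" in p.caption.lower() for p in pairs))
```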
MusicLM takes the sequence of audio tokens and maps it to semantic tokens in the captions for training.
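That token-mapping step can be pictured as conditional next-token prediction: given tokens derived from the caption, the model learns to predict the sequence of audio tokens. The PyTorch sketch below is a toy stand-in with invented vocabulary sizes and a simple GRU backbone; it illustrates the general idea only and is not MusicLM's actual architecture.

```python
import torch
import torch.nn as nn

class ToyTokenMapper(nn.Module):
    """Toy conditional model: predicts the next audio token given caption-derived
    tokens plus the audio tokens seen so far. Illustration only, not MusicLM."""

    def __init__(self, text_vocab=1000, audio_vocab=1024, dim=256):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.audio_emb = nn.Embedding(audio_vocab, dim)
        self.backbone = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, audio_vocab)

    def forward(self, text_tokens, audio_tokens):
        # Concatenate caption tokens and audio tokens into one sequence,
        # then produce a distribution over the next audio token at each audio step.
        x = torch.cat([self.text_emb(text_tokens), self.audio_emb(audio_tokens)], dim=1)
        hidden, _ = self.backbone(x)
        return self.head(hidden[:, text_tokens.size(1):, :])  # logits at audio positions

# One training step with placeholder shapes (batch of 2, made-up lengths):
model = ToyTokenMapper()
text = torch.randint(0, 1000, (2, 16))    # tokenized caption
audio = torch.randint(0, 1024, (2, 50))   # discretized audio tokens
logits = model(text, audio[:, :-1])       # predict each next audio token
loss = nn.functional.cross_entropy(logits.reshape(-1, 1024), audio[:, 1:].reshape(-1))
loss.backward()
```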
Google's MusicLM has the ability to take a calm violin melody and back it with a distorted guitar riff or transform a hummed melody into a different musical style.
Researchers from Google say their model can output high-fidelity music that stays coherent over several minutes, but MusicLM isn't ready to produce a symphony anytime soon. They acknowledge that its limitations include a limited amount of training data and an inability to account for the stylistic preferences of different audiences.
Ars Technica warns that anyone looking for original compositions might be disappointed, as MusicLM doesn't synthesize genuinely new melodies but produces variations on existing ones.