Rapping is no easy feat. In-demand artists like Sean “Diddy” Combs and Jay Z bring home fortunes to the tune of $735 million and $550 million, respectively, for their stylish flow. That kind of dough can buy you ~3,600,000 bottles at any hot club. Or ~816 million hours of GPU server time from AWS. Depends on your priorities.
But Dr. Dre better watch out, because a bunch of Finnish nerds just developed a “novel deep neural network architecture” to give him a run for his money ($700 million, to be precise). The technique is aptly named Dope Learning and powers an online tool called DeepBeat. When a homogeneous bunch of European and presumably white dudes build an AI rapper, does this count as a new digital form of cultural appropriation? Debatable, but we can save that argument for another day.
Aside from broaching important cultural subjects, rap consists of intricate structures and complex rhyme patterns which require sophisticated language and lyrical skills to generate. Most of us couldn’t freestyle rap to save our lives. DeepBeat tackles the challenge by first looking at rap lyrics and predicting the next line. These predictions are then combined to merge lines from existing songs into new rhymes with meaning.
The team started with half a million lines of rap songs from over 100 artists. Lyrics creation was then modeled as an “information retrieval” problem, where the query is the first x lines of a song, and the answer is the most relevant follow-up lyric. This approach simplifies the challenge of measuring performance, because accuracy can be assessed against actual songs. A generative model that constructs new lyrics word by word would yield more creative output, but would also drive the complexity up significantly. Perhaps an aspiring academic should hustle on this front and produce a Dope Learning 2 paper.
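To make the retrieval framing concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the word-overlap score `jaccard`, the helper names, and the tiny pool size are hypothetical stand-ins, not DeepBeat's actual features or setup. The point is only the shape of the problem: the query is the lines so far, the answer is the best-scoring candidate, and accuracy is how often that candidate is the real next line.

```python
import random

def jaccard(a: str, b: str) -> float:
    """Toy relevance score: word overlap between two lines (illustrative only)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def pick_next_line(query_lines, candidates, score=jaccard):
    """Return the candidate that scores highest against the last query line."""
    last = query_lines[-1]
    return max(candidates, key=lambda c: score(last, c))

def retrieval_accuracy(songs, pool_size=3, seed=0):
    """Fraction of positions where the true next line beats random distractors."""
    rng = random.Random(seed)
    all_lines = [line for song in songs for line in song]
    hits = total = 0
    for song in songs:
        for i in range(1, len(song)):
            truth = song[i]
            others = [l for l in all_lines if l != truth]
            candidates = rng.sample(others, pool_size - 1) + [truth]
            rng.shuffle(candidates)
            hits += pick_next_line(song[:i], candidates) == truth
            total += 1
    return hits / total

# Made-up lines, not real lyrics.
query = ["I train my nets all night"]
pool = ["the cat sat down", "my nets train all day", "blue sky thinking"]
print(pick_next_line(query, pool))  # my nets train all day
```

Because the "right answer" is always a line from a real song, scoring the model needs no human judges, which is exactly the convenience the retrieval formulation buys.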
After mapping lines to a high-dimensional vector space, DeepBeat leverages a Ranking SVM to pick the most relevant next lyric. Document ranking algorithms like PageRank use a single static ranking, but accuracy can be dramatically improved by combining multiple features via machine learning algorithms. Recurrent neural networks (RNNs) are typically used for textual predictions, but since rap lines are relatively short and similar in length, a feedforward network is a simpler and still suitable architecture for this problem.
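For the curious, the core idea behind a ranking SVM fits in a few lines. The sketch below is a generic pairwise linear ranker trained with hinge loss and plain subgradient descent, not the implementation used for DeepBeat. The two features (a rhyme score and a word-overlap score) and all hyperparameters are assumptions made up for the example.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ranking_svm(pairs, dim, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Learn weights w so that w . better exceeds w . worse by a margin of 1.

    pairs: list of (better_features, worse_features) tuples.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = dot(w, diff)
            # L2 regularization shrinks the weights a little on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            # Hinge loss: only update when the pair is ranked wrongly or too closely.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def rank(w, candidates):
    """Sort candidate feature vectors from most to least relevant."""
    return sorted(candidates, key=lambda f: dot(w, f), reverse=True)

# Hypothetical features per candidate line: [rhyme_score, word_overlap].
training_pairs = [
    ([3.0, 0.2], [0.0, 0.3]),
    ([2.0, 0.1], [1.0, 0.4]),
    ([4.0, 0.5], [1.0, 0.5]),
]
w = train_ranking_svm(training_pairs, dim=2)
print(rank(w, [[1.0, 0.9], [3.0, 0.1]])[0])  # [3.0, 0.1]
```

The learned weights decide how much each feature counts, which is the advantage over a single static score: a strong rhyme can outweigh weak word overlap, or the other way around, depending on what the training pairs show.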
Now for a lesson in rhyme schemes. Rap lyrics rarely employ “perfect rhymes” such as “Hey I got GPUs, but Google’s got TPUs.” Alliteration involves repeating the opening sounds of adjacent words, such as “Turing Test”, but doesn’t require that the words start with the same letter, as in “Know Your Neural Networks.” Alliteration is a special subset of consonance, which is the repetition of consonants. Consonants can be matched throughout a word, as in “mode collapse just makes me gasp.” Assonance is the repetition of vowels and is by far the most popular rhyme scheme due to its versatility. Consider this sentence: “It is my greatest dream, to build a deep learning machine.” Finally, you can have multi-syllabic rhymes. I spent 15 minutes trying to come up with a deep-learning-themed multi-syllabic rhyme using only assonance but failed. Rapping is hard…
Dope Learning detects rhymes by translating words into phonetic representations and computing “rhyme density”, which represents the “average length of the longest rhyme per word.” The measure positively correlates with the complexity rating human rappers give their own lines. Song structure is another pattern the network must learn, as songs often alternate between verses and choruses. Any particular rhyme scheme can be maintained for multiple lines or even an entire verse or chorus.
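Here is a rough feel for how a rhyme density score could be computed. This is a heavily simplified sketch under loud assumptions: it uses the vowel letters in a word's spelling as a crude stand-in for a phonetic transcription, only compares single words (so it cannot see rhymes that span several words), and the `window` of 8 preceding words is an arbitrary choice. The exact definition lives in the paper.

```python
VOWELS = set("aeiou")

def vowel_seq(word):
    """Crude stand-in for phonemes: the vowel letters of the spelling, in order."""
    return [ch for ch in word.lower() if ch in VOWELS]

def common_suffix_len(a, b):
    """Length of the longest shared ending of two vowel sequences."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def rhyme_density(text, window=8):
    """Average, over words, of the longest vowel match with a recent earlier word."""
    seqs = [vowel_seq(w) for w in text.split()]
    seqs = [s for s in seqs if s]  # ignore tokens with no vowels
    if not seqs:
        return 0.0
    total = 0
    for i, seq in enumerate(seqs):
        best = 0
        for j in range(max(0, i - window), i):
            best = max(best, common_suffix_len(seq, seqs[j]))
        total += best
    return total / len(seqs)

print(rhyme_density("cat hat"))        # 0.5
print(rhyme_density("dream machine"))  # 0.0
```

The second result shows why a real system works on sounds rather than spelling: “dream” and “machine” share a vowel sound, but their letters give it away to nobody, so this toy version scores them zero.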
DeepBeat launched in November 2015 and attracted 42,000 users by June 2016. The tool enables users to select keywords that must appear in generated lyrics, get automated suggestions, and also write their own content. Performance was measured in three distinct ways. First, for selecting the next lyric, the algorithm was asked to pick out the true next line of a song from among 299 randomly selected lines. The result was an accuracy of 17%, or about 50x better than chance. Second, using the measure of rhyme density (mathematical details can be perused in the paper), DeepBeat outperformed human rappers by 21%. Finally, real user data from the DeepBeat online tool was analyzed, and the machine's choices correlated with the lines preferred by humans.
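The “50x better than chance” figure is easy to sanity-check. The snippet below assumes the true next line is mixed in with the 299 random lines for a pool of 300 candidates, so a random guess is right one time in 300. The function name is made up for this example.

```python
def lift_over_chance(accuracy, num_candidates):
    """How many times better than uniform random guessing an accuracy is."""
    chance = 1.0 / num_candidates
    return accuracy / chance

# Assumed pool: 299 random lines plus the one true next line.
print(round(lift_over_chance(0.17, 300)))  # 51
```

So 17% may sound modest, but against a baseline of roughly 0.33% it is a big jump.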
Dr. Dre, I hope you’re ready for your epic rap battle against MC DeepBeat.