How AI Transforms Music Creation

Technology companies have been experimenting with artificial intelligence (AI) in music production for decades, just as they have built software that lets a computer beat a chess grandmaster and worked on self-driving cars. The principle is the same in every case: the computer is fed millions of examples and left to discover the patterns itself. Significant breakthroughs have been achieved in recent years. As a result, a track created predominantly using AI is, to the layperson, indistinguishable from ‘real’ music, especially when combined with deepfakes. Look and listen to the following track.

I don’t think this track sounds too bad. It’s called ‘Peronu’ de la gară’ and was sung in Romanian by Lolita Cercel. But Lolita Cercel doesn’t exist: her appearance and voice, the melody and the arrangement were all generated using artificial intelligence. An exception? Of all the tracks uploaded daily to Deezer – a streaming service – 34% are now created entirely or partly using artificial intelligence, usually without listeners realising it. How does this work, and where is it heading? Those are the questions I’ll answer in this post.

What can computers do when it comes to music production?

For years, computers have been able, for example, to isolate the vocals and individual instruments in a piece of music and turn them into sheet music. It has also long been possible for them to ‘notate’ a hummed melody, complete with suggestions for accompanying chords. In these cases, the computer does not create anything ‘new’, which is why we speak of non-generative artificial intelligence.
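
As a minimal illustration of such non-generative analysis, here is a sketch that tracks the pitch of a hummed melody and maps it to note names using the librosa library. The file name is hypothetical, and a real transcription system would add note onsets, durations and quantisation to produce actual sheet music:

```python
# Minimal sketch: estimate the pitch contour of a hummed melody and map
# it to note names. The recording "hummed_melody.wav" is hypothetical.
import librosa

y, sr = librosa.load("hummed_melody.wav")

# Estimate the fundamental frequency frame by frame (NaN = unvoiced)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Keep only the voiced frames and convert their frequencies to note names
notes = [librosa.hz_to_note(f) for f in f0[voiced_flag]]
print(notes[:20])  # e.g. ['A3', 'A3', 'B3', ...]
```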

Holly Herndon has been producing music with computers for years. She works mainly with the voices of professional singers and (amateur) choirs, holds a PhD in ‘computer technology and music’, and has herself designed many of the AI applications she uses. She also performs live: she produces sounds and the computer turns them into a piece of music. Here you can see one such performance.

Since the emergence of generative artificial intelligence, a computer has indeed been able to create something ‘new’. The scale on which this happens has grown enormously over the past few years, and affordable programs have come onto the market that produce a ready-made track within minutes. One of the first successful attempts to create compositions using AI dates back to 2016. It was carried out by Sony Computer Science Laboratories Paris (Sony CSL Paris) with funding from the EU. The singer Benoît Carré wrote the lyrics, and the computer generated the melodies after being trained on just 45 Beatles songs. Listen to ‘Daddy’s Car’.

A year later, in 2017, Taryn Southern released the album ‘I am AI’, for which she deployed various AI applications. She wrote the lyrics; the melodies, instrumentation and backing vocals were generated using AI under her direction. To generate the melody of the track you can now listen to, the computer was trained on piano sonatas from the 19th century.

A lot has changed since 2017. With the software that has been available for several years now, a layperson can have the computer generate a complete piece of music, including lyrics, melody and the singer’s imaginary face. By far the most widely used application is SUNO.

The big question is: how does this work?

To explain how this works, I’ll start by looking at how a composer or lyricist goes about it. It’s likely that they listen to a lot of music and are skilled at recognising style, structure, word choice and ‘hooks’ (surprising fragments of text or combinations of sounds), and can use these creatively. If you ask this musician to write a song about some topic, there’s a good chance that ideas for this song will come to mind within moments.

Artificial intelligence does the same, albeit based on millions of songs. To achieve this, the ‘system’ has processed the combinations of sounds, chords, arrangements and instruments found within them; it possesses an almost unlimited vocabulary and distils atmosphere, timbre and style based on increasingly refined classes of characteristics.
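
As a rough, hand-crafted illustration of what such ‘classes of characteristics’ might look like, the sketch below distils tempo, harmony and timbre from a single audio file; the file name is hypothetical, and real systems learn far richer representations from millions of tracks:

```python
# Simplified sketch of stylistic features a system might distil from a track.
import numpy as np
import librosa

y, sr = librosa.load("some_track.mp3")  # hypothetical file

tempo, _ = librosa.beat.beat_track(y=y, sr=sr)      # rhythmic feel
chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # harmonic content
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # timbre / 'atmosphere'

# Averaging over time yields a compact fingerprint of the track's style,
# which can then be compared or clustered across a large catalogue.
fingerprint = np.concatenate(
    [np.atleast_1d(tempo), chroma.mean(axis=1), mfcc.mean(axis=1)]
)
print(fingerprint.shape)  # (1 + 12 + 13,) = (26,)
```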

I’m going to play a few examples. The first example is the fictional band ‘The Velvet Sundown’, which was streamed over a million times on Spotify in 2025. You’ll be listening to and watching the track ‘The corner bar’. 

After some time, Spotify also realised the deception and described the group The Velvet Sundown as “a synthetic music project, led by human creative direction, and composed, sung and visualised with the support of artificial intelligence”. Lawyers have no doubt spent a great deal of time deliberating over this wording.  AI-generated music videos resemble other music videos in many respects. The key difference is that the ‘musicians’ are usually deepfakes, actors or both, and share the common trait of being unable to sing or play any instrument.

A second example is ‘Walk my walk’, sung by a fictional singer called ‘Breaking Rust’. 

This singer gained 35,000 followers on Instagram. The lyrics, the music and the arrangement were all generated by SUNO.

Despite the impressive results of applying AI with virtually no human intervention, the term ‘artificial intelligence’ is misplaced. What we are really seeing is the ultra-fast processing of an enormous amount of diverse data, ranging from encyclopedias, websites and dictionaries to images, films and existing pieces of music. The quality of the result depends on the quantity of available data and its variety – not to mention talented programmers and energy-guzzling data centres.

How is this data processed? 

In data centres, millions of musical pieces are dissected down to their very core: combinations of melodies and chords, rhyme schemes and their meanings, artists’ personal details and the characteristics of their work are all stored. Suppose you ‘map’ this data and its relationships as points in a multidimensional figure. During training, each piece is then buried in noise, step by step: at first you can still hear the original patterns, but after enough steps nothing remains but noise, even though the underlying data has not been lost. This process is called diffusion.
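
A toy numerical sketch of this forward process, under the usual assumption of a Gaussian noise schedule (the sine wave stands in for a piece of music, which real systems represent far more richly):

```python
# Toy forward diffusion: a 'clean' signal is progressively mixed with
# Gaussian noise until only noise remains.
import numpy as np

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 8 * np.pi, 1024))  # stand-in for a clean signal

# Noise schedule: alpha_bar runs from ~1 (clean) down to ~0 (pure noise)
T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def diffuse(x0, t):
    """Return the noised signal at step t: sqrt(a)*x0 + sqrt(1-a)*noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

early, late = diffuse(x0, 50), diffuse(x0, 999)
# 'early' still shows the sine pattern; 'late' is indistinguishable from
# noise, yet the mapping is defined so a model can learn to reverse it.
```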

You can put this massive database to work for you using simple textual commands, known as ‘prompts’. Such a prompt might, for example, read:

“Funky synthpop, downbeat with a driving, steady bassline, a characteristic 80s vibe, with alternating uplifting and resigned lyrics about relationships, sung as a duet with a backing choir”

Using a ‘prompt’ and several pre-programmed criteria that pieces of music of a certain type must meet, the computer then reverses the diffusion process: starting from pure noise, increasingly distinct combinations of text and sound emerge. Because the starting noise is random, the result can be different every time.
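
The reverse step can be sketched in the same toy setting. The noise_model function below is a hypothetical placeholder for the trained, prompt-conditioned network that real systems use; only the surrounding sampling loop reflects the standard procedure:

```python
# Toy reverse diffusion (DDPM-style sampling loop). Not SUNO's actual code.
import numpy as np

rng = np.random.default_rng(1)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def noise_model(x, t, prompt_embedding):
    """Hypothetical placeholder for a trained, prompt-conditioned denoiser."""
    return np.zeros_like(x)  # a real model returns its estimate of the noise

def generate(prompt_embedding, shape=(1024,)):
    x = rng.standard_normal(shape)   # start from pure noise
    for t in reversed(range(T)):     # walk the diffusion process backwards
        eps = noise_model(x, t, prompt_embedding)
        # Remove the predicted noise and rescale...
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                    # ...then re-inject a little fresh noise
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Because the loop starts from random noise, two runs with the same prompt
# can yield different results, just as described above.
```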

In the following video, music critic, lecturer, producer, multi-instrumentalist and vlogger Rick Beato demonstrates how he used AI to produce a track for the listeners of his podcast:

What are people’s views on AI-produced music?

Over time, various views have emerged regarding the nature of music in general.  

1. Music is the result of a feeling – whether authentic or not – that the creator wishes to express; in the entertainment industry, this is primarily a feel-good sentiment. To the extent that listeners recognise this feeling, they may feel connected to it. Most songs fall into this category.

2. Music is an objectification of the creator’s lived experience and contains essential knowledge of reality, or as Langer puts it: ‘Music is lived experience that presents the morphology of felt life’. 

In the first description, the emphasis is on expression; in the second, on representation. In the first view, music is a means of evoking a mood. In the second view, music helps to answer the question of the meaning of events unfolding in the world.

From each of these perspectives, one can view AI-generated music differently. 

If music is primarily a means of evoking feelings (expression), then many listeners feel that how it is created does not matter much. Others feel deceived; they had already been looking for a chance to attend a live concert by their, as it turns out, non-existent idol.

The second perspective revolves around the experience of what is happening in the world and the singer’s view of it. Listeners may or may not identify with this. This is ‘in principle’ inconceivable with music generated by artificial intelligence. ‘In principle’, because a fascinating grey area is emerging, within which humans and machines collaborate.

Telisha Jones is an interesting case. She signed a $3 million contract to write lyrics for the (fictional) singer Xania Monet. The song she wrote, “How was I supposed to know?”, is the first AI-generated track to reach the Billboard radio charts. In just a few months, more than 3 million people have listened to it on Spotify. You can listen to it here:

Telisha Jones says she wrote the lyrics based on her own experiences, and uses SUNO to generate the vocals and melody, the arrangement and the instrumentation. In a television interview, Jones said she wants people to know that there is a real person behind Xania and that the lyrics express that person’s emotions. She remained vague about that person’s identity; my impression is that it is Jones herself. Listen to excerpts from an interview with Telisha Jones on CBS.

Whether Telisha Jones’s story is true is not that important. The key question is whether composers and lyricists can ‘collaborate’ with AI to create innovative, high-quality music. Earlier, I referred to a fascinating grey area. As far as I’m concerned, the creation of fictional performers who cannot play an instrument or sing themselves falls outside that area. Musicians who have SUNO supply lyrics or melodies complementary to their own work are inside it, provided there is no doubt about who contributed what.

Can AI-generated music be art?

Purists do not even consider this question because they believe that, in the case of AI, one cannot speak of music at all; they reserve the elegant German term ‘Ohrenkitzlung’ (ear-tickling) for AI-generated sounds. I have no objection to the term ‘music’, particularly if human and machine contribute at least equally to its creation. This also determines whether one can potentially speak of art. But what is art?

The most concise description is that art consists of creative expressions by people that demonstrate craftsmanship. It is often added that this requires recognition from expert art critics, the public, or both, and that this judgement must have sufficiently stood the test of time. In a few years’ time, it may turn out that Telisha Jones’s lyrics possess an exceptional poetic quality; they could then be considered a form of art.

Essentially, art is about the connection between something people have in their minds and the unique, skilful way in which they depict it, whether or not with technical aids. Many authors of ‘prompts’ devise, at most, a few characteristics of the piece of music that SUNO or another application is to create; they have not the slightest idea beforehand how the result will sound. That changes as soon as there is an interactive process between human and machine. We saw this, among others, with Holly Herndon, who used software she had partly designed herself. Her productions most closely resemble a co-production between human and machine and, thanks to her craftsmanship and consistent artistic achievements, have the potential to be called art.

I asked ChatGPT to draw up a step-by-step plan for this kind of co-production. You can download it here.

You can now watch and listen to Johnny Keeley, a performing musician and lyricist. At one point, he decided to have one of his lyrics set to music using SUNO. In the end, one of the generated versions won him over, and he used it in his own performances.

Whether Telisha Jones is within the grey area is, in my view, questionable.  In the case of Johnny Keeley, I would answer this question in the affirmative. 

What does AI-generated music mean for copyright?

According to the European Union Intellectual Property Office, copyright protection requires an original work that reflects the creator’s personality. The US Copyright Office also states that it will not grant copyright to “works that lack human authorship.” 

Streaming services such as Spotify and Deezer intend to label music created using AI in the future. Deezer is already excluding such tracks from the playlists it compiles. 

The question is whether such a black-and-white judgement is desirable. Consider Johnny Keeley. To make a reliable and fair judgement on the degree of authenticity of a song, it is necessary to know the extent of AI’s role in its creation. This matters to the bodies that must assess copyright claims, but also to listeners in general. For them, a star-rating system might be useful, possibly with separate ratings for lyrics, music and performance, as sketched below.
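
As a purely hypothetical sketch of what such a disclosure could look like in a streaming service’s metadata (no service offers anything like this today):

```python
# Hypothetical star-rating disclosure: separate human-contribution ratings
# for lyrics, music and performance, attached to a track as metadata.
from dataclasses import dataclass

@dataclass
class AIDisclosure:
    lyrics: int       # 0 = fully AI-generated ... 5 = fully human-made
    music: int
    performance: int

    def label(self) -> str:
        return (f"lyrics {self.lyrics}/5, music {self.music}/5, "
                f"performance {self.performance}/5")

# Johnny Keeley's case: human lyrics, AI melody/arrangement, human performance
keeley_track = AIDisclosure(lyrics=5, music=1, performance=5)
print(keeley_track.label())
```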

As long as such a system is lacking, selling music created with a substantial contribution from AI remains a risky business. It can result in claims of copyright infringement, violation of the right of publicity and misleading representation. Moreover, several legal cases are already under way in which artists argue that producers such as SUNO have wrongfully used their copyright-protected work as training material. Following one such case, SUNO reached a settlement of 500 million (!) dollars with the Warner Music Group, thereby both purchasing the authors’ rights and indemnifying SUNO users against legal action.

What are the potential consequences of the rise of AI-generated music for the music industry?

The rise of AI-generated music will have consequences for the music industry, music producers and musicians. Many users of streaming services and listeners to radio stations will judge music by how it sounds rather than by how it was created. Given the low costs involved, the music industry will therefore promote music largely produced by AI. I don’t see that as a problem, provided there is clear information about it and there are dedicated playlists and charts for this type of music.

Fortunately, there will still be people who want to hear and see their idols. The Taylor Swifts, Bruce Springsteens and Coldplays of this world will continue to fill stadiums. Singer-songwriters will retain a devoted audience too, and the same applies to everyone in between, depending on their authenticity and quality. 

However, a great deal will change in the back office. Songs will increasingly be co-productions of human and machine in terms of lyrics, melody and arrangement. This will make production more efficient and may also improve quality. On albums and in streamed music, you will often hear an artist’s voice whilst all the arrangements and accompaniment are produced using AI.

What if you want to build a career in music?

Go for it. First, set yourself apart through your craftsmanship as a songwriter, singer or musician – preferably all three – and show how good you are in live performances. Second, familiarise yourself with AI and be open about how you have used it. Remember that “The added value lies not in what AI creates, but in how humans use it”, a quote taken verbatim from ChatGPT.

To write this article, I drew on the rapidly growing body of publications in Medium, The Riff, Music for Thought, The Guardian and the Dutch, English and French versions of Wikipedia. ChatGPT helped me to gather and compare different forms of music generation using AI.
