How Voice Cloning is Transforming Podcasting

By: Arto Yeritsyan, CEO of Podcastle.ai

The potential of voice duplication and synthetic voices is huge for podcast creators, offering efficiency and lowering the barriers to production and distribution.

Podcasting has come a long way since the days of right-clicking and saving an MP3 file to your desktop. Big tech companies have made podcasting a major part of their content play, and video podcasting has been a huge step forward, with YouTube and Spotify getting in on the act.

The other big leap in podcasting is the arrival of generative AI technology that allows creators to produce audio content without saying a word. Synthetic voices make it possible to turn text into a lifelike audio recording that sounds just like its creator, as easily as typing into a document.

For podcasters and other creators, there are several benefits to using this technology. The first is efficiency. Rather than having to go back and re-record errors or promotional snippets, they can use an AI-powered voice to do it more quickly. They could even skip the recording process altogether and create voice-overs or introductions entirely with text-to-speech.

It also means they’re less dependent on having access to recording equipment and a suitable location. Imagine you’re a self-employed content creator who works remotely and needs to quickly put together some recordings, but you don’t have your microphone, pop shield, or soundproof room. No problem: simply log in to the software and access the synthetic version of your voice to start creating.

It’s clear that this technology has huge potential, but as we’ve seen with other generative AI technologies, it can also be abused. So how can we ensure that the technology is being used ethically?

How does voice duplication work?

Voice duplication can seem like magic at first, but like many AI systems it involves inputting data, learning from it, and then replicating. Analyzing sound waves can teach the AI about the flow, pitch, and timbre of your voice in an effort to recreate it as naturally as possible.
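To make the idea of analyzing sound waves a little more concrete, here is a minimal sketch of estimating one of those properties, pitch, from raw audio samples. It uses autocorrelation, a classic signal-processing technique, on a synthetic test tone. This is only an illustration: it is not how Podcastle or any particular voice cloning model works, and the function names (sine_wave, estimate_pitch) and the 80 to 400 Hz search range are assumptions chosen for the example.

```python
import math


def sine_wave(freq_hz, sample_rate, num_samples):
    """Generate a pure test tone as a list of samples in [-1, 1]."""
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(num_samples)
    ]


def estimate_pitch(samples, sample_rate, min_hz=80.0, max_hz=400.0):
    """Estimate the fundamental frequency (in Hz) using autocorrelation.

    The signal is compared with delayed copies of itself. The delay (lag)
    at which the copy lines up best corresponds to one period of the wave.
    The search range is an assumed, rough range for a speaking voice.
    """
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Sum of products between the signal and its delayed copy.
        score = sum(
            samples[i] * samples[i + lag]
            for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    tone = sine_wave(200.0, 8000, 400)  # 50 ms of a 200 Hz tone
    print(estimate_pitch(tone, 8000))
```

Run on the 200 Hz test tone, the estimate should come out close to 200 Hz. Real speech is far messier than a pure tone, and a production system would learn many more characteristics than pitch alone, so treat this purely as a picture of what "learning from sound waves" can mean at the simplest level.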

This process is complex, but doesn’t necessarily require huge amounts of data to be successful. The major challenge is realism: capturing a user’s natural intonation so that the result sounds accurate as well as spontaneous.

The potential problems that come with voice cloning relate to information security and illegal activities like phishing (the act of soliciting information via impersonation). In theory, someone could clone the voice of a famous person, or of someone they know, and use it to gain access to financial accounts or to deceive people in some other way.

This is why it’s essential that companies developing this technology consider the risks, just as they would with any other powerful technology. Taking Podcastle as an example, users who wish to clone their voice are required to perform a live reading of a script of around 70 specific phrases. This process can take up to 30 minutes and must be done by the person who wishes to have their voice duplicated.

These 70 recordings are then manually checked to confirm that they contain a single, consistent voice, and are then processed through an AI model. It’s therefore not possible to simply upload audio from a TV show, an interview, or a short clip of someone speaking and create a copy of their voice. Technologies that promise this can often seem convincing at first, but become fairly easy to spot as fake on closer inspection.

The future of AI-powered podcasting

AI-powered podcasting represents an incredibly productive step forward for podcast creators, allowing them to focus on what they do best and reducing the pain involved in the creative process. Just as audio recording and editing software meant no longer having to rely on physical media, AI-powered software means no longer having to rely on manual processes that slow things down.

The potential of voice duplication and synthetic voices is huge for all media, podcasts included, and ensuring that it’s used responsibly should be a key aim for all purveyors of the technology.

Copyright © 2024 MarTech Series. All Rights Reserved.