Latest AI Voice Model Considers Context, Not Just Sounds
WellSaid Labs, the leading AI text-to-speech technology company, has invented the most natural speech markup language available to content creators. WellSaid Labs’ entirely new respelling system lets content creators give the AI precise instructions, delivering more control over word pronunciation and emphasis. Because the more intuitive model captures the natural human performance in a voice actor’s delivery, it can more freely predict how the actual voice actor would have read the content, saving companies and content creators significant time.
Improved pronunciation, intonation, and user controls
Until now, the text-to-speech (TTS) industry has relied solely on a phonetic layer that dictates how words are pronounced. However, voice actors don’t read phonemes; they read graphemes. The WellSaid Labs model now reads graphemes too, while also keeping a pronunciation layer. Relying only on phonetic transcription can limit a model’s breadth of knowledge and therefore its ability to predict the pronunciation and delivery of new and unique words. It also makes it difficult to give users a consistent system for guiding a voice avatar to pronounce words according to their preferences, such as with the correct vowel sounds and syllabic emphasis. WellSaid Labs has made an enormous breakthrough in overcoming these limitations.
“Customer feedback on using our new voice model is incredible,” says Rhyan Johnson, WellSaid Labs Senior Voice Data Engineer. “Using our new respelling system, content creators love the fact that words are being pronounced the way they choose, with the right intonation or regional preference to meet their brand’s voice identity. You say tomato, I say tomahto. And, so do the WellSaid Labs’ voice avatars.”
New model focuses on improving the voice avatar’s correctness
“More words are pronounced correctly, more often. Sentence intonation is generally more natural, including questions, which are tough for other systems. We’ve also created our own text verbalization model to empower the AI to be smarter with non-standard words such as a dollar amount, a year, or a phone number. And it also does better with specialized text and speaking URLs, acronyms, or abbreviations,” explained Johnson.
WellSaid Labs’ Voice Avatars all come from real voice actors. Content creators now have even greater ability to ensure that pronunciation and tone are exactly what they want, whether for narration, promotional or conversational content, or a unique custom character. For example, users can now type $30M or the year 2022, and the system should interpret the text correctly as “thirty million dollars” or “twenty twenty two” instead of “dollar thirty M” or “two thousand and twenty two”. Other text verbalization support includes:
- Ordinals – 1st, 2nd, 10th, 30th
- Times – 10:34 am, 10PM
- Phone Numbers – (890) 345-1234, 1-888-CALL-NOW
- Number Ranges – 1-3 as ‘one to three’
- Percent, Number Signs – 12.3% as ‘twelve point three percent’, #1, #2 as ‘number 1, number 2’
- URLs and Email Addresses – wellsaidlabs.com, https://www.myurl.com/reference, firstname.lastname@example.org
- Acronyms – NASA, OPEC
- Initialisms – ESPN, NSA
- Abbreviations –
  - Measurements (1 ft., 2 in., 4 oz.)
  - Titles (Mr. as ‘Mister’, Sr. as ‘Senior’)
  - Others (St. as ‘street’, Apt. as ‘apartment’, etc. as ‘etcetera’)
- Generic Numbers and Symbols
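To make the idea of text verbalization concrete, here is a minimal Python sketch of rule-based expansion for three of the cases mentioned above: dollar amounts, percentages, and years. It is an illustration only and does not represent WellSaid Labs’ actual model; the function names (`small_number`, `verbalize`) and the regular-expression rules are assumptions made for this example, and the rules deliberately cover only small numbers and years whose last two digits are 10 or higher.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = {"K": "thousand", "M": "million", "B": "billion"}


def small_number(n: int) -> str:
    """Spell out an integer from 0 to 999 (hypothetical helper)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + small_number(rest)


def _currency(match: re.Match) -> str:
    amount = int(match.group(1))
    scale = match.group(2)
    words = small_number(amount)
    if scale:
        return f"{words} {SCALES[scale]} dollars"
    return f"{words} {'dollar' if amount == 1 else 'dollars'}"


def _percent(match: re.Match) -> str:
    words = small_number(int(match.group(1)))
    if match.group(2):
        # Digits after the decimal point are read one at a time.
        words += " point " + " ".join(ONES[int(d)] for d in match.group(2))
    return words + " percent"


def _year(match: re.Match) -> str:
    # Read the year as two pairs of digits, e.g. 20 | 22.
    return small_number(int(match.group(1))) + " " + small_number(int(match.group(2)))


def verbalize(text: str) -> str:
    """Expand a few kinds of non-standard words into spoken form."""
    text = re.sub(r"\$(\d{1,3})([KMB])?\b", _currency, text)
    text = re.sub(r"(\d{1,3})(?:\.(\d+))?%", _percent, text)
    text = re.sub(r"\b(1[1-9]|20)([1-9]\d)\b", _year, text)
    return text


print(verbalize("We raised $30M in the year 2022, up 12.3%."))
```

A production system would need many more rules, or a learned model as the company describes, to handle phone numbers, ranges, URLs, and ambiguous cases such as a four-digit number that is not a year.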
WellSaid Labs powers the synthetic media industry, serving more than 7,000 customers across dozens of industries, and offers a faster, more accurate way to turn words into voice. Customers value WellSaid Labs’ highly realistic human-voice capability and rely on its mission-critical enterprise infrastructure. And now they will have an even easier experience using WellSaid Labs’ voice avatars.
WellSaid Labs’ Voice Avatar library provides access to 50 AI voices that companies can use for their productions. Many WellSaid Labs customers also choose to create their own AI Voice Avatars to spec, capturing the likeness, style, and uniqueness of the voice needed to tell their stories in exactly the right way.