AI for Voice Transcription: Is It Here to Last?

AI is one of the driving forces behind what the World Economic Forum called “The Fourth Industrial Revolution”. Developments in this area are expected to help us further automate our workflows and simplify our daily tasks, making everything from food production chains to management and even medical procedures far more effective and agile. And, according to PwC, AI is expected to add up to 15.7 trillion dollars to the world economy by 2030.

AI is getting smarter faster than ever, with established players such as Google and Amazon developing and integrating AI into their products and operations, and a generation of startups from all around the globe developing and offering AI-based tools.

One of the main areas where AI is starting to be used is transcription services. Companies from all around the globe are training neural networks to recognize and transcribe speech, and these transcriptions can be up to 95% accurate. Of course, this percentage refers to ideal circumstances: a high-fidelity recording of slow speakers perfectly pronouncing common vocabulary.

It’s also worth noting that most AI-powered transcription tools are highly accessible, with the most expensive, high-accuracy packages priced in the hundreds of dollars. Low-tier options are available either for free or for very low per-minute fees.

So, if it’s accurate and affordable, should we be using AI for our transcription needs?

Natural Language Processing is one of the hardest areas of Artificial Intelligence, and further development is needed before machines can transcribe (and even translate) material with the same accuracy, sensitivity to nuance, and cultural awareness as a highly trained human.

In a 2016 Wired interview, Gerald Friedland, Director of UC Berkeley’s Audio and Multimedia lab, said that “depending who you ask, speech recognition is either solved or impossible… The truth is somewhere in between.”

Considering that the accuracy of AI transcription is increasing but still far from what you’d get with a human transcriptionist, relying on these services for fast transcriptions that you don’t have to check or revise is, for now, pretty much a dream. It’s bound to be possible in the near future, but we’re not there yet. Human transcription, or at least a “hybrid model” (50% automated, 50% human-powered), is advisable, especially if the material in question is sensitive or includes jargon that can easily be mishandled.

As noted in a recent Sydney Morning Herald article, while the best of these products might be able to perform equally well regardless of the speakers’ accents, automated transcriptions tend to be full of little errors. The main competitive advantages of AI transcription, at this time, are speed and price. But these are nullified if the product, the transcription itself, doesn’t meet our standards. The initial transcription might be cheap (or even free), but we should expect that it will require time and effort to correct, and these corrections might require that we return to the source material. It’s up to us to decide if it’s really worth it, or if we should delegate the entire process.

In the language industry, as in all industries, the greatest competitive advantage one can have is the right combination of cutting-edge technology and excellent human resources. Some language services providers are currently using AI to do the heavy lifting and counting on humans to review and fine-tune the transcriptions. In the end, you can get transcriptions that are both fast and highly accurate just by hiring an AI-assisted company or professional.

This “hybrid model” of 50% automated, 50% human-powered transcription services promises a minimum accuracy rate of 99%, or at most 10 errors per 1,000 words. And that’s for very low-quality recordings.
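To make the arithmetic behind these accuracy figures concrete, here is a minimal Python sketch. It assumes that "accuracy" simply means the share of words transcribed correctly, which is a simplification; the function names are illustrative and do not reflect any provider's published method.

```python
def accuracy_from_errors(errors: int, total_words: int) -> float:
    """Return accuracy as a percentage.

    Assumption: accuracy = correct words / total words.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * (total_words - errors) / total_words


def expected_errors(accuracy_percent: float, total_words: int) -> int:
    """Return roughly how many words would be wrong at a given accuracy."""
    return round(total_words * (100 - accuracy_percent) / 100)


# 10 errors in 1,000 words corresponds to the 99% figure.
print(accuracy_from_errors(10, 1000))

# At 95% accuracy, a 1,000-word transcript would contain about 50 errors.
print(expected_errors(95, 1000))
```

Seen this way, the gap between 95% and 99% is the difference between roughly 50 and roughly 10 corrections per 1,000 words.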

Human transcriptionists can adjust their hearing, replay complicated passages, and fill gaps in their understanding of the recording through context. They are also better than machines at recognizing homophones: words that are written differently and have different meanings, but that sound exactly the same.

Having a flawless transcription of the material at hand is especially important if the transcription isn’t the final product, but a necessary middle step between the source material and a translation. Human revision and editing prevents mistakes in the AI’s occasionally rough or erroneous transcription from being carried into the final product.

While AI-powered transcription tools will develop further over the next couple of years, and the advances so far are promising (the 95% accuracy rate on high-quality recordings is a far cry from the average 15% accuracy that we had a couple of years ago), for the time being these tools alone “don’t cut it”.

In conclusion, if we need transcription services for our company, perhaps we shouldn’t be purchasing software ourselves and delegating transcriptions to it completely. Instead, we should be hiring a company that uses AI to guarantee fast delivery and humans to guarantee accuracy.

Sean Patrick Hopwood

Sean Patrick Hopwood is the President of Day Translations, a global translation company.