Voice-recognition software has improved by leaps and bounds. But is it good enough for the legal field?
Odds are, you’ve noticed your iPhone isn’t half bad at taking verbal orders. Despite the occasional gaffe, Siri dictates messages and notes fairly well, and your commands appear on the screen with relative accuracy.
But just like Siri’s dictation skills, the voice-recognition software for transcribing audio files into written, digital documents is far from perfect. And while the results may satisfy some fields, law is certainly not one of them.
Why Automated Transcription Falls Short for Law
Voice-recognition technology fails the legal industry because attorneys build their cases on the written record of exactly what was said, not an approximation of it. Facts and accuracy are the currency of law, and for this reason inaccurate legal transcription can be costly. It can even damage a law firm’s reputation.
But accuracy is where human transcriptionists shine, because when it comes to language, our brains are built to process speech in ways our machine counterparts cannot. Here are some examples:
Software struggles with accents and regional dialects.
Americans have a remarkable ability to understand one another despite the many differences in how we pronounce certain words: Idaho children draw with “cray-awns” while their Rhode Island peers use “crans.” And Californians like “may-uh-naze” while Georgians call it “man-aze.”
These differences are often amusing, but rarely do we become confused by them. The same, however, cannot be said for software, which is easily thrown off by unfamiliar inflections and emphases.
Software often misses punctuation.
Speech-recognition technology often struggles to place commas, colons and other punctuation correctly. At best, this makes for sloppy transcription. At worst, it can fundamentally alter the meaning of what was said. Compare these two sentences:
The man, who seemed nervous, was most likely the suspect.
The man who seemed nervous was most likely the suspect.
The difference is slight, but notice the effect. In the first sentence, the commas mark the nervousness as incidental: there was one man, and he happened to seem nervous. In the second, the clause is restrictive: perhaps there were multiple men, and the nervous one was the suspect. Missed punctuation is a clarity killer.
Mumbling and indistinct speech are hard for AI to detect.
Humans are experts at deciphering indistinct speech. During police interviews and depositions in particular, when individuals are likely to be nervous, fast talking and stumbling over words are common. Software struggles to detect and accurately represent this indistinct speech, especially when the subject is whispering or speaking softly.
Machines cannot distinguish between homophones.
Homophones are words that sound the same but have different meanings: write vs. right, accept vs. except, by vs. buy. In everyday writing, using the wrong homophone often has a humorous effect; in a legal document, the effect can be disastrous. A diamond-heist thief swipes a carat, not a carrot.
Software cannot sift through background noise.
Poor recording quality is often a reality with audio and video, and software is not good at filtering out ambient noise and background chatter. A human listener, by contrast, can overcome these challenges with relative ease by listening carefully.
Software struggles to differentiate multiple speakers.
Perhaps speech-recognition software’s biggest shortcoming for legal transcription is its difficulty detecting multiple speakers. Humans can readily distinguish between speakers in a group even if they sound similar. But a machine, given an audio file, will struggle to tell two people apart (and the more speakers you add, the harder it becomes).
Is speech-to-text software right for you?
If your field is law, medicine or another technical trade, the answer is likely no. Automated transcription risks compromising the essential documents attorneys use to build cases, a risk that outweighs any convenience it provides.