By leaps and bounds, voice-recognition software has improved. But is it good enough for the legal field?
Odds are, you’ve noticed your iPhone isn’t half bad at taking verbal orders. Despite the occasional gaffe, Siri dictates messages and notes fairly well, and your commands appear on the screen with relative accuracy.
But just like Siri’s dictation skills, the voice-recognition software for transcribing audio files into written, digital documents is far from perfect. And while the results may satisfy some fields, law is certainly not one of them.
Why Automated Transcription Falls Short for Law
Inaccurate legal transcription can be costly. At worst, it can damage a law firm’s reputation. Attorneys cannot afford to base their cases on misunderstandings and software typos, big or small. And this is where the human transcriptionist takes the lead with a not-so-secret weapon. Thousands of years in the making, the human brain can transcribe accurately in ways our machine counterparts, well, can’t. Here are some scenarios.
Software struggles with accents and regional dialects.
Although Americans across the country pronounce words differently, we all have a remarkable ability to understand their meaning: Idaho children draw with “cray-awns” while their Rhode Island peers use “crans.” Californians spread “may-uh-naze” on their sandwiches while Georgians call it “man-aze.”
People are often amused by differences in speech, but they are rarely confused by meaning. The same, however, cannot be said for software, which is easily thrown off by unfamiliar inflections and emphases.
Software often misses punctuation.
Speech-recognition technology often struggles to differentiate between commas, colons and other punctuation. At best, this is sloppy transcription. At worst, it can fundamentally alter the meaning of what was said.
The man, who seemed nervous, was most likely the suspect.
The man who seemed nervous was most likely the suspect.
The difference is slight, but notice the effect. In the first sentence, the commas make the clause nonrestrictive: there was one man, and he happened to seem nervous. In the second, the clause is restrictive: perhaps there were multiple men, and the nervous one was the suspect. Missed punctuation is a clarity killer.
Mumbled and indistinct speech is hard for AI to detect.
Humans are experts at deciphering indistinct speech. During police interviews and depositions in particular, when individuals are likely to be quite nervous, fast talking and stumbled words are common. Software struggles to detect and accurately render such speech, especially when the subject is whispering or speaking softly.
Machines cannot distinguish between homophones.
Homophones are words that sound the same but have different meanings: write vs. right, accept vs. except, by vs. buy. In everyday writing, using the wrong homophone often has a humorous effect; in a legal document, the effect can be disastrous. A diamond-heist thief swipes a carat, not a carrot.
Software cannot sift through background noise.
Poor recording quality is often a reality with transcription, and software is not great at filtering out ambient noise and background chatter. A human listening carefully, however, overcomes these challenges with relative ease.
Software struggles to differentiate multiple speakers.
Perhaps speech-recognition software’s biggest shortcoming for legal transcription is its handling of multiple speakers. Humans can readily distinguish between speakers in a group even if they sound similar. But given an audio file, a machine will struggle to tell two people apart (and the more speakers you add, the harder it becomes).
Is speech-to-text software right for you?
If your field is law, medicine or another technical trade, the answer is a resounding no. Automated transcription risks compromising the essential documents attorneys use to build cases, a risk that outweighs any convenience it might provide.