Why speech recognition isn't a technological failure

Writer Robert Fortner doled out harsh criticisms of speech recognition yesterday, and today his faulty logic and false assertions are reverberating across the blogosphere. He argues that speech recognition "flat-lined in 2001," falling far short of human levels.

I have to respectfully and completely disagree. Using Nuance NaturallySpeaking 10.1 software on a fast PC, I achieve accuracy that's nothing short of astounding. Unless I'm drunk, it recognizes my speech with at least 98% accuracy, which happens to be the recognition rate of the human ear. And it has improved dramatically since 2001.

I must admit I've been working with NaturallySpeaking for the better part of a decade, and it's intimately familiar with the technology I write about and the words I use. It allows me to dictate thousands of words of text in the time it would've taken me to type hundreds. Fortner's claim that it doesn't work and is a failure is completely misinformed.

I did a quick test of some of the words he said sound so much alike that speech recognition can't handle them. Here's his flagship sentence:

Saying "recognize speech" makes a sound that can be indistinguishable from "wreck a nice beach."

I dictated that sentence into NaturallySpeaking and didn't correct anything. In fact, I dictated this entire post into NaturallySpeaking, and didn't correct a single thing. It even spelled Fortner's name right, and capitalized it. So there. 100% accuracy.

So as far as I'm concerned, his idea that speech recognition is dead can itself rest in peace. And yes, I did dictate that, and it didn't type "rest in peas." But it did substitute the word "bad" for "dead" and oops, just then it misunderstood the word "for," typing "of" — so that's about 99% accurate. Close enough.
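For anyone who wants to check that arithmetic, here is a minimal sketch in Python of how word-level accuracy is commonly scored: count the fewest word substitutions, insertions and deletions needed to turn the transcript into what was actually said, then divide by the number of words spoken. The function names are my own for illustration, not anything from NaturallySpeaking, and the 300-word post length in the usage note is a rough assumption rather than an exact count.

```python
def word_errors(reference, hypothesis):
    """Return (errors, reference_word_count) using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word dropped by the recognizer
            insertion = curr[j - 1] + 1  # extra word added by the recognizer
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Accuracy as 1 - (errors / words actually spoken)."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference must contain at least one word")
    return 1 - errors / total


spoken = "his idea that speech recognition is dead can itself rest in peace"
typed = "his idea that speech recognition is bad can itself rest in peace"
errors, total = word_errors(spoken, typed)
print(errors, total, round(word_accuracy(spoken, typed), 3))
```

On that one sentence alone, a single wrong word out of twelve looks bad. Spread two mistakes over a post of roughly 300 words, though, and 1 - 2/300 comes out a little over 99%, which matches the figure above.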

Read his post, and keep in mind what I've just dictated to you:

Via Posterous (Robert Fortner)