At what point will we be able to casually chat with our gadgets like the crew of the USS Enterprise does with its computer on Star Trek, or like Dave Bowman and Frank Poole do in 2001 before HAL went violently bonkers?
We're taking baby steps toward normalized machine-human relations with Apple's Siri, Ford's Sync, the ivee clock radio, Samsung's voice-controlled HDTVs and IBM's "Jeopardy"-champion Watson. Perhaps a further step will be taken by the long-rumored Siri-controlled Apple HDTV later this year.
But we're still a long way from considering colloquies with our appliances as normal as bar codes, Wi-Fi and touchscreens. The question is, just how long of a way? And just how conversational do we want our gadgets to become before paranoiac imaginings of malevolent self-awareness develop?
While Star Trek has always been an inspiration for engineers, a particular acknowledgement should be made to the late Majel Barrett.
Also known as Mrs. Gene Roddenberry, Ms. Barrett played Nurse Chapel on Star Trek: The Original Series and Deanna Troi's meddlesome mother Lwaxana in The Next Generation. More importantly, she supplied the voice of most of Star Trek's computers. Her comforting female tones are likely the inspiration behind all female-voiced computer interfaces.
If only current voice control systems spoke our language.
Ivee, The Clock That Listens
You may be more familiar with Ford Sync's voice control and Samsung's TV control than with ivee, but all three represent "dumb" voice control — they can understand your voice without any training, but all require you to speak their specific syntax.
Sync is the most expansive system of the three, with thousands of commands; Samsung's TV is a little less so. But the ivee FLEX IV2 ($70), a voice-controlled alarm clock, is a microcosm of current "dumb" voice-controlled devices.
Just say "Hello, ivee!" (Sync requires you to push a voice activation key on the steering wheel before issuing commands) and ivee responds to 43 voice commands — set the current time, wake-up times, type of alarm sound, turn on the radio, etc.
Like many voice-control systems, ivee doesn't always understand you, and it often responds to ambient conversation or even the radio.
Plus, I often had to scurry back to the manual to reference ivee's specific phraseology. I can't imagine the frustration of Ford Sync users who misspeak the exact phrase their car requires.
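The rigidity of this "dumb" style of voice control can be sketched in a few lines of code. The snippet below is purely illustrative — the phrases are hypothetical, not ivee's or Sync's actual command set — but it shows why an exact-phrase matcher fails the moment you paraphrase:

```python
# Toy model of "dumb" voice control: the device only acts on exact,
# pre-defined phrases, so any paraphrase falls through to an error.
# (Hypothetical command phrases -- not any real product's command set.)

COMMANDS = {
    "set alarm": "Alarm set.",
    "turn on the radio": "Radio on.",
    "what time is it": "It is 7:00 AM.",
}

def dumb_control(utterance: str) -> str:
    """Match the transcribed utterance against the fixed phrase list."""
    return COMMANDS.get(utterance.lower().strip(), "Command not recognized.")

print(dumb_control("Turn on the radio"))              # exact phrase: works
print(dumb_control("Could you switch the radio on?")) # paraphrase: fails
```

A natural-language system like Siri sits at the other extreme: instead of a lookup table, it parses intent, so "switch the radio on" and "turn on the radio" land on the same action.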
Can you imagine Scotty or Mr. Spock having to phrase their engineering or scientific questions in precise computer-defined terms?
Siri, Your "Intelligent Assistant"
Siri doesn't suffer from Sync's, Samsung's or ivee's semantic stupidity. Its colloquial comprehension skills are unique among consumer gadgets and a logical evolutionary step in the machine-human interface.
Siri, however, isn't self-sufficient or self-contained. It requires an Internet connection to reach its long-distance brain. As a result, Siri is slooow to perform local iPhone duties such as translating your request to "Call Bob" into actually dialing Bob's phone number, or playing a specific music track.
And like most voice control systems, Siri has problems comprehending in noisy environments.
Limitations aside, Siri's accomplishments and popularity have already inspired a flurry of interest in reviving voice control.
I say reviving because the whole idea of voice control is nearly as old as computers themselves. Most early attempts involved the system learning your personal vocal inflections — which took an annoying amount of time — before it could understand and act.
One of the earliest attempts was Butler-in-a-Box, a $1,500 whole-home voice-controlled system created in the mid-1980s by magician Gus Searcy.
The last iteration of Butler-in-a-Box got a little closer to a natural language interface. While not requiring specific commands, BiB still relied on a limited vocabulary, required extensive voice-comprehension training and could respond to only two learned voices.
ivee, Sync, Samsung HDTVs, Siri and Butler-in-a-Box, however, all lack the one key capability needed for true natural machine-human conversation:
Come Here, Watson, I Need You
The iPhone may be a "smart" phone, and Siri may have access to a lot of Wolfram Alpha info, but the combination is a kindergartener compared to IBM's Watson. (Full disclosure: my wife does public relations for IBM, but has nothing to do with the Watson work.)
Unlike Siri, Watson is capable of making logical and intuitive connections from several disparate pieces of information, a significant artificial intelligence/natural language breakthrough. In other words, Watson understands.
Just how much Watson understands is illustrated by its new job — providing diagnostic advice to the good human doctors at Memorial Sloan-Kettering Cancer Center. The folks at IBM must be beaming: our son, the doctor!
While clearly brainy and intuitive, Watson likely has an awful bedside manner. I'm not aware of any conversational skills Watson may enjoy. Just the facts, ma'am.
Perhaps Watson's programmers will provide Watson with a more conversational Siri-like front end. And, maybe at some point, we consumers will be able to dial in and talk to Watson, sort of like Alexander Graham Bell summoned an earlier Watson 135 years ago.
We need to combine all these capabilities — colloquial comprehension, local control and operational functionality, and intuition — to move closer to normalized device-people interaction.
The Last Question
While we strive for greater carbon-silicon communication, we don't desire silicon sentience unless it's restricted by some version of Isaac Asimov's ingenious Three Laws of Robotics. Otherwise we could end up with a "Daisy"-singing HAL from 2001, the voyeuristic Colossus of Colossus: The Forbin Project or the murderously misanthropic Skynet.
And perhaps voice is only an intermediate interface.
Perhaps the most fascinating computer communication evolution in science fiction is described in Asimov's 1956 short story, "The Last Question."
Over the course of several millennia, a series of humans query an increasingly sophisticated computer about how to end entropy. "The Last Question" is first asked via keyboard of a computer that could be an immediate descendant of Watson, moves to voice interrogation of global- to galaxy-wide systems, and ends...well, read it for yourself — with a familiar, natural vocal response.
When I first read "The Last Question" as a 13-year-old, it blew my adolescent mind. Given the start ivee, Sync, Samsung, Siri and Watson are giving us toward a more normalized machine-human interface, it may prove positively prescient.