rudimentary level. My I-Cybie robot dog has a microphone in each plastic ear to triangulate the source of a sound. When I clap my hands, the dog turns its head toward me. If I clap in a certain sequence or say one of a small vocabulary of command words, it does a trick, like any well-trained natural dog. Also like a real dog, it learns to respond to a name and it speaks dog language, barking to show that it understands a given command and whimpering when it is not getting enough attention.
Another level of speech interaction is found in computer dictation programs, where what you say into a microphone is turned into written words on the screen. To get a true sense of machine conversation, though, pick up the telephone and dial airline reservations or your bank. There is a good chance you’ll hear a synthesized voice welcome you and ask what you need. You respond verbally, and a dialogue ensues. The conversation might well have its moments of frustration when you and the machine misunderstand each other. Still, according to Julia Hirschberg, a computational linguist at Columbia University, such conversations represent significant progress since the late 1980s. Computers are now fast enough to hear and respond in real time, and although the process is not perfect, Hirschberg notes that “Speech recognition and understanding is ‘good enough’ for limited, goal-directed interactions.” (Italics in the original.)
To be judged good enough or better, a machine must pass three tests: it must recognize the words you say, regardless of accent and personal speaking style; it must generate words that you recognize, without machinelike overtones; and it must give sensible responses to your conversation. This last requirement is basically the Turing test, only with speech instead of written messages. If the machine converses so well on any conceivable subject that it cannot be distinguished from a person, it passes Turing’s criterion for artificial intelligence. Even in limited conversations, however, the computer must be able to recognize words spoken by people and to form its own words.
Speech recognition systems work by matching what a person says against a corpus; that is, a dataset of natural speech stored in the com-