THE FIVE SENSES, AND BEYOND
wash at 95 degrees, then spin at 1,400 revolutions and start in half an hour.” It currently responds to a few hundred German words but is expected eventually to handle several thousand, and in other languages as well.
Another example that shows how conversational machines function in practice is a telephone-based system for booking train travel, used since 2001 by Amtrak, the U.S. passenger rail system. Dial the Amtrak number, and a pleasantly crisp female voice says “Hi. This is Amtrak. I’m Julie.” Speaking in the first person and using casual speech such as “Here goes” and “No problem,” Julie offers schedules, ticket reservations, and train status. At each juncture where the caller must make a choice, the questions are crafted so that a yes or no will do, or Julie announces the words the customer can use and be understood, such as “Book that one” or “Change itinerary.”
Within the constraint of a limited vocabulary, Julie does well in recognizing words and responding suitably, as I found when I decided to test Julie by making a reservation. In several conversations, it never missed “New Orleans,” which has a variety of pronunciations. It misunderstood only when I departed from the list of approved words, and once when it interpreted my “19” as “90” (an understandable error that humans make too), and the system let me correct the error with little fuss. Surveys show that customers are substantially happier with Julie than with the touch-tone method Amtrak used previously, but the same surveys also show that many customers still hang up before completing the reservation process. Certainly no one yet has full confidence in Julie, competent as it sounds; the caller can always say “agent” to get connected to a human.
Other voice-based systems include a mock air-travel planning service based at Carnegie Mellon University that was designed as a test bed for the DARPA Communicator project. This ambitious effort had the goal of developing speech-based interfaces for battlefield use that would “support complex conversational interaction, where both user and the system can initiate interaction, provide information, ask for clarification, signal nonunderstanding, or interrupt the other participant.” When you dial the phone number, you are greeted by a