
Fig. 1. One of MiPad's industrial design templates.

One key feature of MiPad is a general-purpose "Command" field to which a user can issue naturally spoken commands such as "Schedule a meeting with Bill tomorrow at two o'clock." From the user's perspective, MiPad not only recognizes but understands the command, in that it executes the actions conveyed in the spoken utterance. In response to the above command, MiPad displays a "meeting arrangement" screen with the related fields (date, time, attendees, etc.) filled in appropriately based on the user's utterance. MiPad fully implements Personal Information Management (PIM) functions, including email, calendar, notes, tasks, and contact list, with a hardware prototype based on Compaq's iPAQ PDA (3800 series). All MiPad applications are configured in a client–server architecture as shown in Fig. 2. The client on the left side of Fig. 2 is MiPad powered by the Microsoft Windows CE operating system, which supports 1) sound capture, 2) front-end acoustic processing including noise reduction, channel normalization, feature compression, and error protection, 3) GUI processing, and 4) a fault-tolerant communication layer that allows the system to recover gracefully from network connection failures. Specifically, to reduce bandwidth requirements, the client compresses the wideband speech parameters down to a bandwidth of at most 4.8 kbps. Between 1.6 and 4.8 kbps, we observed virtually no increase in recognition error on the tasks tested. A wireless local area network (WLAN), which is currently used to simulate a third-generation (3G) wireless network, connects MiPad to a host machine (server) where the continuous speech recognition (CSR) and spoken language understanding (SLU) take place. The client takes approximately 450 KB of program space and an additional 200 KB of runtime heap, and consumes only about 35% of the CPU load on the iPAQ's 206-MHz StrongARM processor. At the server side, as shown on the right side of Fig. 2, MiPad applications communicate with the ASR and SLU engines for coordinated, context-sensitive Tap & Talk interaction. Noise-robustness processing also takes place at the server since this allows for easy updating.
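To make the client–server split concrete, the sketch below (ours, not from the paper; all class and function names such as FrontEnd, FeatureCodec, and RobustChannel are hypothetical) illustrates the client-side chain described above: front-end processing of each audio frame, compression of the resulting features to stay within the roughly 4.8-kbps uplink budget, and transmission over a link that reconnects after WLAN dropouts.

```python
# Illustrative sketch of the client-side processing chain described above.
# All names are hypothetical; the actual MiPad client is a Windows CE
# application and is not specified at this level of detail in the paper.

import socket
import struct

class FrontEnd:
    """Noise reduction, channel normalization, and feature extraction."""
    def process(self, pcm_frame: bytes) -> list[float]:
        # ... denoise, normalize the channel, compute cepstral features ...
        return [0.0] * 39  # placeholder 39-dimensional feature vector

class FeatureCodec:
    """Compresses feature vectors so the uplink stays at or below ~4.8 kbit/s."""
    def encode(self, features: list[float]) -> bytes:
        # Quantize and pack the features (half-precision here as a stand-in
        # for the paper's feature compression plus error protection).
        return struct.pack(f"{len(features)}e", *features)

class RobustChannel:
    """Fault-tolerant link that reconnects after WLAN dropouts."""
    def __init__(self, host: str, port: int):
        self.addr = (host, port)
        self.sock = None

    def send(self, payload: bytes) -> None:
        for _ in range(3):                      # a few retries before giving up
            try:
                if self.sock is None:
                    self.sock = socket.create_connection(self.addr, timeout=2.0)
                self.sock.sendall(struct.pack("!I", len(payload)) + payload)
                return
            except OSError:
                self.sock = None                # drop and reconnect on failure
        raise ConnectionError("server unreachable")

def stream_utterance(frames, channel: RobustChannel) -> None:
    """Run the front end on each frame and ship compressed features to the server."""
    front_end, codec = FrontEnd(), FeatureCodec()
    for pcm_frame in frames:
        channel.send(codec.encode(front_end.process(pcm_frame)))
```

The point of the sketch is the division of labor: only compact feature streams leave the device, while the compute-heavy CSR and SLU run on the server.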

We now describe the rationale behind MiPad's architecture. First, although customized system software and hardware have been reported [3], [16] to bring extra benefits and flexibility in tailoring applications to mobile environments, the MiPad project utilizes only off-the-shelf hardware and software. Given the rapid improvements in hardware and system software capabilities, we believe this is a reasonable approach. Second, although speaker-independent speech recognition has made significant strides during the past two decades, we have deliberately positioned MiPad as a personal device whose user profile can be exploited to enrich applications and complement technological shortcomings. For speech, this means we may use speaker-dependent recognition, thereby avoiding the challenges faced by other approaches [21], [24]. In addition to enabling higher recognition accuracy, user-specific information can also be stored locally, and speaker-specific processing can be carried out on the client device itself. This architecture allows us to create user-customized applications using generic servers, thereby improving overall scalability.
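As a rough illustration of how speaker-specific processing might be kept on the client while the servers remain generic, the hypothetical sketch below stores a per-user feature statistic locally and applies it as a normalization step before features are sent. The paper does not describe this mechanism at the code level; the file name, the use of cepstral-mean-style normalization, and the update rule are all assumptions.

```python
# Hypothetical sketch of client-side, speaker-specific processing: a per-user
# feature mean is kept on the device and applied to features before they are
# compressed and sent to a generic (user-agnostic) server.

import json
from pathlib import Path

PROFILE_PATH = Path("user_profile.json")  # stored locally on the PDA (assumed name)

def load_profile() -> dict:
    if PROFILE_PATH.exists():
        return json.loads(PROFILE_PATH.read_text())
    return {"mean": None, "count": 0}

def update_profile(profile: dict, features: list[float]) -> None:
    """Running per-user mean update (one simple form of speaker adaptation)."""
    n = profile["count"]
    if profile["mean"] is None:
        profile["mean"] = list(features)
    else:
        profile["mean"] = [(m * n + f) / (n + 1)
                           for m, f in zip(profile["mean"], features)]
    profile["count"] = n + 1
    PROFILE_PATH.write_text(json.dumps(profile))

def normalize(profile: dict, features: list[float]) -> list[float]:
    """Apply speaker-specific normalization before sending to the generic server."""
    if profile["mean"] is None:
        return features
    return [f - m for f, m in zip(features, profile["mean"])]
```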

The rest of the paper will describe details of MiPad with emphasis on the speech processing and the UI design considerations. Various portions of this paper have been presented at several conferences (ICSLP-2000 [1], [18], [28], ICASSP-2001 [5], [19], [25], Eurospeech-2001 [12], and the ASRU-2001 Workshop [6]). The purpose of this paper is to combine these earlier presentations on the largely isolated MiPad components into a single coherent paper so as to highlight the important roles of distributed speech processing in the MiPad design, and to report some more recent research results. The organization of this paper is as follows. In Sections II and III, we describe our re-
