
Auditory User Interfaces: Toward the Speaking Computer PDF

155 Pages·1997·8.844 MB·English

Preview Auditory User Interfaces: Toward the Speaking Computer

AUDITORY USER INTERFACES
Toward the Speaking Computer

by T. V. Raman
Adobe Systems Incorporated

SPRINGER-SCIENCE+BUSINESS MEDIA, LLC

ISBN 978-1-4613-7855-6
ISBN 978-1-4615-6225-2 (eBook)
DOI 10.1007/978-1-4615-6225-2

Library of Congress Cataloging-in-Publication Data
A C.I.P. Catalogue record for this book is available from the Library of Congress.

Copyright © 1997 Springer Science+Business Media New York
Originally published by Kluwer Academic Publishers in 1997
Softcover reprint of the hardcover 1st edition 1997

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Springer-Science+Business Media, LLC.

Printed on acid-free paper.

To my guiding eyes, ASTER

Contents

List of Figures
List of Tables
Foreword
Preface
Acknowledgements

1. SPEECH-ENABLED APPLICATIONS
  1.1 Introduction
  1.2 What Is UI?
  1.3 Alternative Modes of Interaction
  1.4 Retrofitting Spoken Interaction
  1.5 The Speech-enabling Approach
  1.6 Separating Computation From User Interface

2. NUTS AND BOLTS OF AUDITORY INTERFACES
  2.1 Introduction
  2.2 Speech Synthesis
  2.3 Speech Recognition
  2.4 Digital Signal Processing Techniques
  2.5 Auditory Displays And Audio Formatting
  2.6 Interactive User Interface Development

3. THE AUDIO DESKTOP
  3.1 Introduction
  3.2 The Visual Desktop
  3.3 Conversational Gestures
  3.4 Choosing Abstractions For The Audio Desktop

4. CONCRETE IMPLEMENTATION OF AN AUDIO DESKTOP
  4.1 Introduction
  4.2 Basic Services For Speech-enabling The Desktop
  4.3 The Emacspeak Desktop
  4.4 Speech-enabled Editing Tools
  4.5 Structured Editing And Templates
  4.6 Browsing Structured Information
  4.7 Information Management On The Audio Desktop
  4.8 Speech-enabled Messaging Tools
  4.9 Editing Program Source
  4.10 Software Development Environment
  4.11 Technique Used To Speech-enable Emacs
  4.12 Thanking The Emacs Community

5. SPEECH-ENABLING THE WWW
  5.1 Introduction
  5.2 Aural Information Access
  5.3 Web Surfing Without A Monitor

Bibliography
Index

List of Figures

1.1 Computing applications typically consist of obtaining user input, computing on this information and finally displaying the results. The first and third phase in this process constitute the user interface. As can be seen, it is possible to separate the user interface from the computational phase.

1.2 Calendars are displayed visually using a two dimensional layout that makes it easy to see the underlying structure. The calendar display consists of a set of characters on the screen; but the meaning of this display is as much in its visual layout as in the characters themselves. Merely speaking the text fails to convey meaning. We can see that January 1, 2000 is a Saturday; this information is missing when the visual display is spoken.

2.1 Sub-components of recorded prompts used by an IVR system at a bank. Different prompts can be generated by concatenating appropriate components.

2.2 Phonemes in American English. The various vowels and consonants making up standard American English are shown using a two-letter notation. Each phoneme is shown along with a word containing that phoneme.

2.3 Textual description of a nested exponent. Notice that when reading the prose making up this description, it is very difficult to perceive the underlying structure of the mathematical expression.

2.4 A call management system using word spotting. Users can express the same command in several ways. The recognition system looks for key phrases that determine the user command, thereby allowing for a flexible system.

2.5 Coarticulatory effects in continuous speech. Coarticulatory effects (or the lack thereof) are often a problem when trying to synthesize natural sounding speech. Not surprisingly, the presence of these same effects in human speech make the computer's task of recognizing continuous speech even harder.

2.6 Using spatial audio to encode information about incoming email. Auditory cues indicate the arrival of new mail. These auditory cues encode additional information such as urgency of the message using spatial audio.

3.1 Visual realization of conversational gestures - the building blocks for dialogues. User interface design tries to bridge the impedance mismatch in man-machine communication by inventing a basic set of conversational gestures that can be effectively generated and interpreted by both man and machine.

4.1 The Emacspeak desktop consists of a set of active buffer objects. This display shows a subset of currently active buffers on my desktop.

4.2 A sample directory listing. The visual interface exploits vertical alignment to implicitly encode the meaning of each field in the listing.

4.3 A listing of running processes. The task manager helps in tracking system resources. Processes can be killed or suspended from the task manager.

4.4 Commands available while searching. A set of highly context-specific conversational gestures.

4.5 Outline view of this section. It can be used to move quickly to different logical components of the document.

4.6 Result of folding the lexical analyzer in AsTeR. This is a document consisting of over 2,000 lines. Folding helps in organizing the code, obtaining quick overviews, as well as in efficient navigation.

4.7 Sample collection of dynamic macros available when editing C source. Standard C constructs can be generated with a few gestures.

4.8 A sample C program. It can be created with a few gestures when using dynamic macros.

4.9 A sample HTML page. Template-based authoring makes creating such documents easy.

4.10 Visual display of a structured data record. The data record is visually formatted to display each field name along with its value.

4.11 An expense report. Semantics of the various fields in each record is implicitly encoded in the visual layout.

4.12 Tracking an investment portfolio. Modifying entries can cause complex changes to the rest of the document.

4.13 A train schedule. We typically look for the information we want, rather than reading the entire timetable.

4.14 Commands in table browsing mode. The interface enables the user to locate the desired item of information without having to read the entire table.

4.15 A well-formatted display of the message headers presents a succinct overview of an email message in the visual interface. Speaking this visual display does not produce a pleasant spoken interface - the spoken summary needs to be composed directly from the underlying information making up the visual display.

4.16 Newsgroups with unread articles are displayed in a *Group* buffer. This buffer provides special commands for operating on newsgroups. The visual interface shows the name of the group preceded by the number of unread articles.

4.17 Unread articles are displayed in buffer *Group Summary*. This buffer is augmented with special commands for reading and responding to news postings. The visually formatted output succinctly conveys article attributes such as author and subject.

4.18 More than one opening delimiter can appear on a line. When typing the closing delimiter, Emacspeak speaks the line containing the matching delimiter. The spoken feedback is designed to accurately indicate which of the several open delimiters is being matched.

4.19 An example of comparing different versions of a file. Visual layout exploits changes in fonts to set apart the two versions. The reader's attention is drawn to specific differences by visual highlighting - here, specific differences are shown in a bold font. Visual interaction relies on the eye's ability to quickly navigate a two dimensional display. Directly speaking such displays is both tedious and unproductive.

4.20 Browsing the Java Development Kit (JDK 1.1) using a rich visual interface. Understanding large object oriented systems requires rich browsing tools. Emacspeak speech-enables a powerful object oriented browser to provide a pleasant software development environment.

4.21 Emacspeak is implemented as a series of modular layers. Low-level layers provide device-specific interfaces. Core services are implemented on a device-independent layer. Application-specific extensions rely on these core services.

4.22 Advice is a powerful technique for extending functionality of pre-existing functions without modifying their source code. Here, we show the calling sequence for a function f that has before, around, and after advice defined.

4.23 Example of advising a built-in Emacs command to speak. Here, command next-line is speech-enabled via an after advice that causes the current line to be spoken after every user invocation of this command.

5.1 HTML pages on the WWW of the 1990's abound in presentational markup. What does red text on a monochrome display mean? What does it mean to (er) blink aurally?

5.2 A sample aural style sheet fragment for producing audio formatted Webformation. Audio formatting conveys document structure implicitly in the aural rendering, allowing the listener to focus on the information content.

5.3 The HTML 3.2 specification fails to separate the underlying conversational gesture from its visual realization even more dramatically than GUI toolkits. In this example, it is impossible to decipher from the markup that the current dialogue expects the user to enter a name and age - in HTML 3.2, there is no association between an edit field and its label.

5.4 The AltaVista main page. This page presents a search dialogue using a visual interface. Emacspeak presents a speech-enabled version of this dialogue that is derived from the underlying HTML.
