Research on Effective Designs and Evaluation for Speech Interface Systems

This post is an excerpt from the draft abstract of my doctoral dissertation.
My public hearing for the dissertation will be held this month at Waseda University.
Although the thesis itself is written in Japanese, I would like to write about related topics here in English.

This thesis describes a systematic method that enables developers and designers to successfully build information and communication systems with speech technologies such as speech synthesis and speech recognition. As a result of this work, applications of speech technologies can become easy for everyone to use.

This work also describes four research projects, covering the development of speech applications and the evaluation of speech interfaces, all carried out according to the proposed methodology.

The first chapter proposes “the principles of interfaces,” which consist of (1) basic principles, (2) organization principles, and (3) adoption principles, and which are based on fundamental theories of human-machine interaction.
The central hypothesis of this work is that these principles are indispensable for carrying out projects that investigate various human interface systems using speech technologies.

The following four chapters demonstrate the effectiveness of the proposed principles, along with the effectiveness of speech technologies and the importance of designing and evaluating speech interfaces.

One of the application systems proposed here is the S-tgif multimodal drawing system, which combines isolated speech command recognition with a mouse and keyboard.
Using speech recognition with this application can reduce the average operation time or the number of command inputs.

Another proposed system is the AVM asynchronous voice messaging system.
The system displays voice messages in threads, in the same way as written messages. Users can manipulate voice messages just as if they were text messages. To quote or annotate a message, a user only has to play it back and barge in while it is playing. This user interface increases the usefulness of the voice-mail system.
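
To make the threading idea concrete, here is a minimal sketch of how such messages could be modeled. It is not the actual AVM implementation: the class `VoiceMessage`, the functions `barge_in` and `render_thread`, and every field name are hypothetical, chosen only for illustration. The assumption is that a reply recorded by barging in remembers the playback position of the parent message at which it started.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceMessage:
    """A voice message in a thread (hypothetical model, not AVM's own)."""
    msg_id: int
    author: str
    duration_sec: float
    parent_id: Optional[int] = None
    # Playback position in the parent at which this reply barged in (assumed).
    quote_offset_sec: Optional[float] = None
    replies: List["VoiceMessage"] = field(default_factory=list)


def barge_in(parent: VoiceMessage, msg_id: int, author: str,
             duration_sec: float, playback_pos_sec: float) -> VoiceMessage:
    """Attach a reply recorded while `parent` was playing."""
    if not 0.0 <= playback_pos_sec <= parent.duration_sec:
        raise ValueError("playback position is outside the parent message")
    reply = VoiceMessage(msg_id, author, duration_sec,
                         parent_id=parent.msg_id,
                         quote_offset_sec=playback_pos_sec)
    parent.replies.append(reply)
    return reply


def render_thread(msg: VoiceMessage, depth: int = 0) -> List[str]:
    """Return the thread as indented text lines, like a threaded mail view."""
    where = "" if msg.quote_offset_sec is None else f" (at {msg.quote_offset_sec:.1f}s)"
    lines = [f"{'  ' * depth}#{msg.msg_id} {author_label(msg)}{where}"]
    for reply in msg.replies:
        lines.extend(render_thread(reply, depth + 1))
    return lines


def author_label(msg: VoiceMessage) -> str:
    """Short label: author name and message length."""
    return f"{msg.author} [{msg.duration_sec:.1f}s]"
```

Storing the offset rather than a copy of the quoted audio is just one possible design; it keeps replies small and lets a player jump to the quoted position in the parent.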

The other two projects are related to the evaluation of speech interface systems.
One of them proposes an improved dual-task method for measuring the workload of spoken dialog tasks.
In this method, subjects play a game using a visual display and keyboard input. We evaluated the effectiveness of our method with a word shadowing task and a spoken dialog application.
Our method can measure relatively small workload differences, such as those in word shadowing tasks, which are difficult to measure with previous methods.
Our method can also identify the points in a dialog that cause some users significant difficulty.
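
As a rough illustration of the general dual-task idea (not the scoring actually used in the thesis, which this abstract does not specify), workload can be estimated from how much the score of the secondary task, here the game, drops while the subject performs the speech task. The function names, the score scale, and the threshold of 0.2 below are all assumptions made for this sketch.

```python
from statistics import mean


def secondary_task_decrement(baseline_scores, dual_scores):
    """Relative drop in game score under dual-task load (0.0 means no drop)."""
    baseline = mean(baseline_scores)
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - mean(dual_scores)) / baseline


def hardest_dialog_positions(baseline_scores, scores_by_position, threshold=0.2):
    """Return dialog positions whose decrement exceeds the assumed threshold."""
    return [
        position
        for position, scores in scores_by_position.items()
        if secondary_task_decrement(baseline_scores, scores) > threshold
    ]
```

For example, with a baseline game score of 100 and scores of 80 and 70 during the dialog, the decrement is 0.25. Computing the same value per dialog position is one simple way to flag the points where users struggle.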

The other project concerns ultra-fast speech in computer applications for people with visual disabilities.
When evaluating such speech, it is important to consider the listener's learning effect.
In this research, the learning effects of listening to ultra-fast speech were investigated with word familiarity controlled, considering (1) changes in the familiarity condition during the experiments and (2) whether or not listeners were given instructions about the familiarity. Experiments observing intelligibility and mental workload were performed using speech at approximately 21 morae per second. The results supported the hypothesis that intelligibility increases and mental workload decreases when the listener is aware of high word familiarity, because access to the mental lexicon is facilitated.

This thesis also presents various research projects related to the interface principles, which offer perspectives on human-machine interaction and applications of speech technology.

The most important contribution of this work, discussed in the last chapter, is the separation of general human interface principles from the utilization of speech technologies.
Well-organized principles naturally yield guidelines for designing and evaluating interface systems across various applications, modalities, and devices. Speech technology is one application of these principles.

Published by nishimotz