Overview: Simone - Spoken Interaction for Mobile Networked
Ecosystems - The cellular phone is rapidly transforming from a
mobile telecommunication device into a multi-faceted information
manager that supports both spoken communication among people and
the manipulation of an increasingly diverse set of data types
stored both locally and remotely. Although the essence of the
device has been to support spoken human-human communication, speech
and language technology will also prove to be a vital enabler of
graceful human-device interaction and of effective audio-visual
content creation and access.
The need for better
human-device interaction is clear. As personal devices continue to
shrink in size yet expand their capabilities, the conventional GUI
model will become increasingly cumbersome to use. A voice-based
interface will work seamlessly with small devices, and will allow
users to more easily invoke local applications or access remote
information.
Information devices of the future will
operate as audio-visual recorders for a variety of personal and
business uses. To make such recordings more useful, the data could
be annotated with additional information that allows them to be
indexed, searched, summarized, or even translated. Some of the
annotations could be meta-level descriptions, such as the sequence
of events that occurred during the recording (e.g., who was
speaking? when? where? what was the structure of the event?). Other
annotations could involve more detailed transcriptions of what was
said by different speakers.
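As an illustrative sketch only (the field names and the Python representation below are our own assumptions, not part of the proposal), a meta-level annotation of this kind might record who spoke, when, and where, alongside an optional transcription, making the recording searchable by speaker:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One meta-level annotation attached to a segment of a recording."""
    start: float        # segment start time, in seconds
    end: float          # segment end time, in seconds
    speaker: str        # who was speaking
    location: str = ""  # where the recording took place
    transcript: str = ""  # optional detailed transcription

def find_by_speaker(annotations, speaker):
    """Retrieve all segments of a recording spoken by one person."""
    return [a for a in annotations if a.speaker == speaker]

# Hypothetical example: annotating a short recorded meeting.
meeting = [
    Annotation(0.0, 12.5, "Alice", "Cambridge", "Let's review the agenda."),
    Annotation(12.5, 30.0, "Bob", "Cambridge", "First item: the demo."),
]
print([a.transcript for a in find_by_speaker(meeting, "Bob")])
```

A structure along these lines would let the device index and search recordings by speaker, time, or place, without committing to any particular recognition technology.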
In this project we propose to develop spoken language and
multimodal technology in support of a new generation of mobile user
interfaces and of content processing for mobile information
devices. We propose to perform research and collaborate with other
Nokia-MIT researchers to enable natural, spoken interaction for
controlling device applications, and to develop methods for
annotating content that an individual has recorded with their
mobile information device. Since we believe the technology should
ultimately operate in the user's language of preference, we propose
to focus on English- and Mandarin-based processing to demonstrate
the viability of multilingual interaction.
In this proposal we identify two areas where spoken language technology can potentially benefit Nokia: 1) simplifying the current user interface, especially by providing assistance to the end user, and 2) annotating and retrieving audio-visual content. We propose to explore these topics initially in English, and then to develop Mandarin capabilities in both areas.