ABSTRACT
In this paper we present a presentation training system that observes a presentation rehearsal and gives the speaker recommendations for improving delivery, such as speaking more slowly or making eye contact with the audience. Our system, "Presentation Sensei," is equipped with a microphone and camera and analyzes a presentation by combining speech and image processing techniques. Based on the results of this analysis, the system gives the speaker instant feedback on speaking rate, eye contact with the audience, and timing. It also alerts the speaker when any of these indices exceeds a predefined warning threshold. After the presentation, the system generates visual summaries of the analysis results for the speaker's self-examination. Our goal is not to improve the content of a presentation on a semantic level, but to improve its delivery by reducing inappropriate basic behavior patterns. We asked a few test users to try the system, and they found it very useful for improving their presentations. We also compared the system's output with the observations of a human evaluator; the results show that the system successfully detected some inappropriate behaviors. The contribution of this work is to introduce a practical recognition-based human training system and to show its feasibility despite the limitations of state-of-the-art speech and video recognition technologies.
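The instant-feedback mechanism the abstract describes — monitoring a few delivery indices and alerting when one crosses a predefined warning threshold — can be sketched as follows. This is a minimal illustration only; the metric names, units, and threshold values are assumptions, not taken from the paper.

```python
# Sketch of threshold-based instant feedback for presentation delivery.
# All thresholds below are illustrative assumptions, not the paper's values.
WARNING_THRESHOLDS = {
    "speaking_rate": 8.0,   # assumed upper bound, syllables per second
    "eye_contact": 0.3,     # assumed lower bound, fraction of recent
                            # video frames in which the speaker faces
                            # the audience
}

def check_delivery(speaking_rate: float, eye_contact_ratio: float) -> list[str]:
    """Return instant-feedback messages for any out-of-range index."""
    alerts = []
    if speaking_rate > WARNING_THRESHOLDS["speaking_rate"]:
        alerts.append("Speak more slowly")
    if eye_contact_ratio < WARNING_THRESHOLDS["eye_contact"]:
        alerts.append("Look at the audience")
    return alerts

# Example: too fast but with adequate eye contact triggers one alert.
print(check_delivery(9.5, 0.6))   # → ['Speak more slowly']
```

In a real-time setting, both indices would be computed over a short sliding window (speech recognition for the rate, head-pose tracking for eye contact) and `check_delivery` would run once per window.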