ABSTRACT
In this paper we present a presentation training system that observes a presentation rehearsal and gives the speaker recommendations for improving delivery, such as speaking more slowly or making eye contact with the audience. Our system, "Presentation Sensei," is equipped with a microphone and camera and analyzes a presentation by combining speech and image processing techniques. Based on the results of this analysis, the system gives the speaker instant feedback on speaking rate, eye contact with the audience, and timing. It also alerts the speaker when any of these indices exceeds a predefined warning threshold. After the presentation, the system generates visual summaries of the analysis results for the speaker's self-examination. Our goal is not to improve the content of a presentation on a semantic level, but to improve its delivery by reducing inappropriate basic behavior patterns. We asked a few test users to try the system, and they found it very useful for improving their presentations. We also compared the system's output with the observations of a human evaluator; the results show that the system successfully detected some inappropriate behaviors. The contribution of this work is to introduce a practical recognition-based human training system and to show its feasibility despite the limitations of state-of-the-art speech and video recognition technologies.
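The instant-feedback mechanism the abstract describes — monitoring a few delivery indices and alerting when one crosses a predefined warning threshold — can be sketched as follows. This is a minimal illustration only; the metric names, units, and threshold values are assumptions, not taken from the paper.

```python
# Sketch of threshold-based instant feedback for presentation delivery.
# All thresholds below are illustrative assumptions, not the paper's values.
WARNING_THRESHOLDS = {
    "speaking_rate": 8.0,   # assumed upper bound, syllables per second
    "eye_contact": 0.3,     # assumed lower bound, fraction of recent
                            # video frames in which the speaker faces
                            # the audience
}

def check_delivery(speaking_rate: float, eye_contact_ratio: float) -> list[str]:
    """Return instant-feedback messages for any out-of-range index."""
    alerts = []
    if speaking_rate > WARNING_THRESHOLDS["speaking_rate"]:
        alerts.append("Speak more slowly")
    if eye_contact_ratio < WARNING_THRESHOLDS["eye_contact"]:
        alerts.append("Look at the audience")
    return alerts

# Example: too fast but with adequate eye contact triggers one alert.
print(check_delivery(9.5, 0.6))   # → ['Speak more slowly']
```

In a real-time setting, both indices would be computed over a short sliding window (speech recognition for the rate, head-pose tracking for eye contact) and `check_delivery` would run once per window.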