DOI: 10.1145/1322192.1322256
research-article

Presentation sensei: a presentation training system using speech and image processing

Published: 12 November 2007

ABSTRACT

In this paper we present a presentation training system that observes a presentation rehearsal and provides the speaker with recommendations for improving delivery, such as speaking more slowly and looking at the audience. Our system, "Presentation Sensei," is equipped with a microphone and a camera and analyzes a presentation by combining speech and image processing techniques. Based on the results of this analysis, the system gives the speaker instant feedback on speaking rate, eye contact with the audience, and timing, and alerts the speaker when any of these indices exceeds a predefined warning threshold. After the presentation, the system generates visual summaries of the analysis results for the speaker's self-examination. Our goal is not to improve the content of a presentation on a semantic level, but to improve its delivery by reducing inappropriate basic behavior patterns. We asked a few test users to try the system, and they found it very useful for improving their presentations. We also compared the system's output with the observations of a human evaluator; the results show that the system successfully detected some inappropriate behaviors. The contribution of this work is to introduce a practical recognition-based human training system and to show its feasibility despite the limitations of state-of-the-art speech and video recognition technologies.
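The abstract describes feedback driven by predefined warning thresholds on indices such as speaking rate and eye contact. The following is a minimal sketch of that kind of threshold-based alerting; the function names, the threshold values, and the units are hypothetical illustrations, not the authors' actual parameters.

```python
# Hypothetical sketch of threshold-based feedback as described in the
# abstract; names, units, and threshold values are illustrative only.
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_speaking_rate: float = 8.0   # syllables per second (assumed unit)
    min_eye_contact: float = 0.5     # fraction of time facing the audience


def feedback(speaking_rate: float, eye_contact_ratio: float,
             th: Thresholds = Thresholds()) -> list[str]:
    """Return warning messages for each index that violates its threshold."""
    warnings = []
    if speaking_rate > th.max_speaking_rate:
        warnings.append("Speak more slowly.")
    if eye_contact_ratio < th.min_eye_contact:
        warnings.append("Look at the audience.")
    return warnings
```

For example, `feedback(9.5, 0.3)` would trigger both warnings, while `feedback(5.0, 0.8)` returns none. The real system derives these indices continuously from microphone and camera input rather than from precomputed scalars.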


Published in

ICMI '07: Proceedings of the 9th international conference on Multimodal interfaces
November 2007, 402 pages
ISBN: 9781595938176
DOI: 10.1145/1322192
Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%
