Abstract

It is a common experience in the modern world to be overwhelmed by the complexity of the technological artifacts around us and by the attention they demand. While technology provides valuable support and assistance, it also increases our preoccupation with technology itself and fragments our attention. As humans, we would rather attend to meaningful dialog and interaction with other humans than control the operations of the machines that serve us. This complexity and distraction, however, is a natural consequence of the flexibility and choice of functions and features that technology offers. Flexibility of choice and the availability of desirable functions are thus in conflict with ease of use and with our very ability to enjoy their benefits.

Author information

Corresponding authors

Correspondence to A. Waibel or R. Stiefelhagen.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Waibel, A. et al. (2010). Computers in the Human Interaction Loop. In: Nakashima, H., Aghajan, H., Augusto, J.C. (eds) Handbook of Ambient Intelligence and Smart Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-93808-0_40

  • DOI: https://doi.org/10.1007/978-0-387-93808-0_40

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-93807-3

  • Online ISBN: 978-0-387-93808-0

  • eBook Packages: Computer Science (R0)
