Computers in the Human Interaction Loop

Waibel, A.; Stiefelhagen, R.; Carlson, R.; Casas, J.; Kleindienst, J.; Lamel, L.; Lanz, O.; Mostefa, D.; Omologo, M.; Pianesi, F.; Polymenakos, L.; Potamianos, G.; Soldatos, J.; Sutschet, G.; Terken, J.

doi:10.1007/978-0-387-93808-0_40

Abstract

It is a common experience in our modern world for us humans to be overwhelmed by the complexities of the technological artifacts around us and by the attention they demand. While technology provides wonderful support and helpful assistance, it also causes an increased preoccupation with technology itself and a related fragmentation of attention. But as humans, we would rather attend to meaningful dialog and interaction with other humans than control the operations of the machines that serve us. Such complexity and distraction, however, are a natural consequence of the flexibility and choice of functions and features that technology has to offer. Thus, flexibility of choice and the availability of desirable functions are in conflict with ease of use and with our very ability to enjoy their benefits.

Author information

Authors and Affiliations

Universität Karlsruhe (TH), Interactive Systems Labs, Karlsruhe, Germany
A. Waibel & R. Stiefelhagen
Kungl Tekniska Högskolan, Centre for Speech Technology, Stockholm, Sweden
R. Carlson
Universitat Politecnica de Catalunya, Barcelona, Spain
J. Casas
IBM Research, Prague, Czech Republic
J. Kleindienst
LIMSI-CNRS, France
L. Lamel
Foundation Bruno Kessler, irst, Trento, Italy
O. Lanz, M. Omologo & F. Pianesi
ELDA, Paris, France
D. Mostefa
Athens Information Technology, Athens, Greece
L. Polymenakos & J. Soldatos
Institute of Computer Science, FORTH, Crete, Greece
G. Potamianos
Fraunhofer Institute IITB, Karlsruhe, Germany
G. Sutschet
Technische Universiteit Eindhoven, Netherlands
J. Terken

Corresponding authors

Correspondence to A. Waibel or R. Stiefelhagen.

Editor information

Editors and Affiliations

Future University Hakodate, Kameda-Nakano 116-2, Hakodate, Hokkaido, 041-8655, Japan
Hideyuki Nakashima
Department of Electrical Engineering, Stanford University, 350 Serra Mall, Stanford, CA, 94305-9515, USA
Hamid Aghajan
School of Computing & Mathematics, University of Ulster at Jordanstown, Shore Road, Newtownabbey, Co. Antrim, UK, BT37 0QB
Juan Carlos Augusto

About this chapter

Cite this chapter

Waibel, A. et al. (2010). Computers in the Human Interaction Loop. In: Nakashima, H., Aghajan, H., Augusto, J.C. (eds) Handbook of Ambient Intelligence and Smart Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-93808-0_40

DOI: https://doi.org/10.1007/978-0-387-93808-0_40
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-93807-3
Online ISBN: 978-0-387-93808-0
eBook Packages: Computer Science, Computer Science (R0)