Audio Lifelog Search System Using a Topic Model for Reducing Recognition Errors

Tezuka, Taro; Maeda, Akira

doi:10.1007/978-3-642-20152-3_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6588)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

A system that records daily conversations is one of the most useful types of lifelogs. It is, however, not widely used because speech recognizers have low precision when applied to conversational speech. To address this problem, we propose a method that uses a topic model to reduce the number of incorrectly recognized words. Specifically, we measure the relevance between each term and the other words in the conversation and remove terms whose relevance falls below a threshold. We implemented an audio lifelog search system using this method. Experiments showed that the method is effective in compensating for the recognition errors of speech recognizers: both precision and recall increased. The results indicate that our method can reduce errors in the index of a lifelog search system.
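
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the relevance measure, so everything specific here is an assumption: the toy topic-word table `TOPIC_WORD`, the smoothing constant `SMOOTH`, the uniform topic prior, and the choice to score a term by its probability under the topic mixture inferred from the other words. The function names are hypothetical and this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical toy topic model: P(word | topic). In practice these values
# would come from a trained topic model such as LDA (an assumption here).
TOPIC_WORD: Dict[str, Dict[str, float]] = {
    "travel": {"train": 0.4, "ticket": 0.3, "station": 0.25, "recipe": 0.05},
    "cooking": {"recipe": 0.5, "oven": 0.3, "salt": 0.15, "train": 0.05},
}
SMOOTH = 1e-3  # assumed probability for words a topic has never seen


def word_prob(word: str, topic: str) -> float:
    """P(word | topic), with a small floor for unseen words."""
    return TOPIC_WORD[topic].get(word, SMOOTH)


def topic_posterior(context: List[str]) -> Dict[str, float]:
    """Topic mixture implied by the context words (uniform prior assumed)."""
    scores = {}
    for topic in TOPIC_WORD:
        likelihood = 1.0
        for word in context:
            likelihood *= word_prob(word, topic)
        scores[topic] = likelihood
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def relevance(term: str, context: List[str]) -> float:
    """Probability of the term under the topic mixture of the other words."""
    posterior = topic_posterior(context)
    return sum(posterior[t] * word_prob(term, t) for t in posterior)


def filter_terms(words: List[str], threshold: float) -> List[str]:
    """Keep only the words whose relevance to the rest meets the threshold."""
    kept = []
    for i, word in enumerate(words):
        context = words[:i] + words[i + 1:]
        if relevance(word, context) >= threshold:
            kept.append(word)
    return kept


recognized = ["train", "ticket", "station", "oven"]
print(filter_terms(recognized, threshold=0.05))
# ['train', 'ticket', 'station']
```

In this toy run, "oven" plays the role of a misrecognized word: the other three words point strongly to the travel topic, under which "oven" is improbable, so it falls below the threshold and is dropped from the index. A real system would need a numerically safer computation (for example, log probabilities) for long conversations.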

Author information

Authors and Affiliations

College of Information Science and Engineering, Ritsumeikan University, Japan
Taro Tezuka & Akira Maeda

Editor information

Editors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Jeffrey Xu Yu
Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro (373-1 Guseong-don), 305-701, Yuseong-gu, Daejeon, Korea
Myoung Ho Kim
Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, Schützenbahn 70, 45117, Essen, Germany
Rainer Unland

About this paper

Cite this paper

Tezuka, T., Maeda, A. (2011). Audio Lifelog Search System Using a Topic Model for Reducing Recognition Errors. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20152-3_6

DOI: https://doi.org/10.1007/978-3-642-20152-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20151-6
Online ISBN: 978-3-642-20152-3
eBook Packages: Computer Science, Computer Science (R0)