Computer Speech & Language
Volume 20, Issue 4, October 2006, Pages 468-494
 
doi:10.1016/j.csl.2005.06.002
Copyright © 2005 Elsevier Ltd All rights reserved.

A study in machine learning from imbalanced data for sentence boundary detection in speech

Yang Liu (a, c; corresponding author), Nitesh V. Chawla (b), Mary P. Harper (c), Elizabeth Shriberg (a, d) and Andreas Stolcke (a, d)

a Speech Group, International Computer Science Institute, 1947 Center St., Ste 600, Berkeley, CA 94704, USA
b Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46530, USA
c Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
d SRI International, Menlo Park, CA 94025, USA

Received 4 August 2004; 
revised 14 March 2005; 
accepted 16 June 2005. 
Available online 13 July 2005.


Abstract

Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have constructed a hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information. Because there are more nonsentence boundaries than sentence boundaries in the data, the prosody model, which is implemented as a decision tree classifier, must be constructed to learn effectively from the imbalanced data distribution. To address this problem, we investigate a variety of sampling approaches and a bagging scheme. A pilot study was carried out to select methods to apply to the full NIST sentence boundary evaluation task across two corpora (conversational telephone speech and broadcast news speech), using both human transcriptions and recognition output. In the pilot study, when classification error rate is the performance measure, using the original training set achieves the best performance among the sampling conditions compared, and an ensemble of multiple classifiers trained on different downsampled training sets achieves slightly poorer performance but has the potential to reduce computational effort. However, when performance is measured using receiver operating characteristic (ROC) curves or the area under the ROC curve (AUC), the sampling approaches outperform the original training set. This observation is important if the sentence boundary detection output is used by downstream language processing modules. Bagging was found to significantly improve system performance for each of the sampling methods. The gain from these methods may be diminished when the prosody model is combined with the language model, which is a strong knowledge source for the sentence detection task. The patterns found in the pilot study were replicated in the full NIST evaluation task. The conclusions may depend on the task, the classifiers, and the knowledge combination approach.
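The resampling, bagging, and ensemble ideas in the abstract can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. Labels are assumed binary (1 = sentence boundary, 0 = nonboundary). Splitting the majority class into k disjoint shares, each paired with every minority example, is one plausible way to build "different downsampled training sets" (an assumption). Combining ensemble members by simple posterior averaging is also assumed. All function names are hypothetical, and the decision-tree training and the HMM combination with the language model are left out.

```python
import random
from typing import List, Sequence


def downsample_splits(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Build k training-index lists. Each keeps every minority example
    (label 1, a sentence boundary) plus a disjoint 1/k share of the
    majority examples (label 0). Disjoint shares are an assumption."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(majority)
    # Deal the shuffled majority indices round-robin into k shares.
    shares = [majority[j::k] for j in range(k)]
    return [sorted(minority + share) for share in shares]


def bootstrap_sample(indices: Sequence[int], seed: int = 0) -> List[int]:
    """One bagging replicate: draw len(indices) items with replacement."""
    rng = random.Random(seed)
    return [rng.choice(indices) for _ in range(len(indices))]


def average_posteriors(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Combine ensemble members by averaging P(boundary) per example (assumed)."""
    n_models = len(per_model)
    return [sum(column) / n_models for column in zip(*per_model)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic: the
    fraction of (positive, negative) pairs in which the positive scores
    higher, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 2 boundaries among 10 word positions.
    labels = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
    splits = downsample_splits(labels, k=4)
    print([len(s) for s in splits])  # [4, 4, 4, 4]
    print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

One design consideration: a classifier trained on a balanced subset tends to assign higher P(boundary) than the original class prior would justify, so a decision threshold tuned for error rate on the original distribution may not transfer. AUC is threshold-free, which is one reason the choice of metric can change which training condition looks best.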

Article Outline

1. Introduction
2. Sentence boundary detection task
2.1. Task representation
2.2. Data
2.3. Evaluation metrics
3. SU boundary detection approach
3.1. Description of the HMM approach
3.1.1. The prosody model
3.1.2. The language model (LM)
3.1.3. Model combination using HMM
3.2. Related work on sentence boundary detection
4. Addressing the imbalanced data set problem
4.1. Imbalanced data set problem
4.2. Sampling approaches
4.3. Bagging
5. Pilot study
5.1. Experimental setup
5.2. Sampling results
5.3. Bagging results
6. Evaluation on the NIST SU task
6.1. Experimental setup
6.2. Results
7. Conclusions
Acknowledgements
References





