Information Processing & Management
Volume 43, Issue 2, March 2007, Pages 379-392
Special issue on AIRS2005: Information Retrieval Research in Asia
 
doi:10.1016/j.ipm.2006.07.013
Copyright © 2006 Published by Elsevier Ltd.

A hybrid generative/discriminative approach to text classification with additional information

Akinori Fujino (corresponding author), Naonori Ueda and Kazumi Saito

NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan

Received 27 May 2006; accepted 25 July 2006. Available online 11 October 2006.


Abstract

This paper presents a classifier for text data samples, such as Web pages and technical papers, that consist of main text and additional components. We focus on multiclass, single-labeled text classification problems and design the classifier as a hybrid of probabilistic generative and discriminative approaches. Our formulation trains an individual generative model for each component and constructs the classifier by combining these trained models based on the maximum entropy principle. We use naive Bayes models as the component generative models for the main text and for additional components such as titles, links, and authors, so that the formulation can be applied to document and Web page classification problems. Our experimental results on four test collections confirmed that the hybrid approach effectively combined main text and additional components and thereby improved classification performance.
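The abstract describes training one naive Bayes model per component and then combining the trained models under the maximum entropy principle. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the combined class posterior takes the log-linear form R(k | x) proportional to exp(mu_k + sum_j lam_j * log p(x_j | k)), where lam_j is a per-component combination weight and mu_k a class bias. Maximum entropy estimation of such a log-linear model is equivalent to maximizing the conditional log-likelihood, which `fit_weights` does by plain gradient ascent with step halving. All function names, the toy data, the smoothing parameter `alpha`, the absence of regularization, and the use of a held-out split to fit the weights are hypothetical choices; the paper's actual estimation and hyperparameter tuning procedure (Appendix A) is not described in this abstract and may differ.

```python
import numpy as np


def train_nb(X, y, n_classes, alpha=1.0):
    """Multinomial naive Bayes for ONE component (e.g. main text or title).

    X: (N, V) word-count matrix, y: (N,) class labels.
    Returns log word probabilities log P(w | k), shape (K, V).
    alpha is a hypothetical Laplace-smoothing default.
    """
    y = np.asarray(y)
    counts = np.stack([X[y == k].sum(axis=0) for k in range(n_classes)]) + alpha
    return np.log(counts / counts.sum(axis=1, keepdims=True))


def component_loglik(X, log_theta):
    """log p(x_j | k) for every document and class, shape (N, K).

    The multinomial coefficient is dropped: it does not depend on k,
    so it cancels in the normalized posterior below.
    """
    return X @ log_theta.T


def log_posterior(logliks, lam, mu):
    """log R(k | x) with scores mu_k + sum_j lam_j * log p(x_j | k)."""
    s = mu[None, :] + sum(l * L for l, L in zip(lam, logliks))
    s = s - s.max(axis=1, keepdims=True)  # numerical stability
    return s - np.log(np.exp(s).sum(axis=1, keepdims=True))


def hybrid_posterior(logliks, lam, mu):
    return np.exp(log_posterior(logliks, lam, mu))


def cond_loglik(logliks, y, lam, mu):
    """Mean conditional log-likelihood of the true labels."""
    y = np.asarray(y)
    return log_posterior(logliks, lam, mu)[np.arange(len(y)), y].mean()


def fit_weights(logliks, y, n_classes, lr=1.0, n_iter=200):
    """Fit (lam, mu) by gradient ascent with step halving (no regularization)."""
    y = np.asarray(y)
    lam = np.ones(len(logliks))  # start from the plain product of NB models
    mu = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    obj = cond_loglik(logliks, y, lam, mu)
    for _ in range(n_iter):
        R = Y - hybrid_posterior(logliks, lam, mu)  # (N, K) residuals
        g_mu = R.mean(axis=0)
        g_lam = np.array([(R * L).sum() / len(y) for L in logliks])
        step = lr
        while step > 1e-8:
            new_lam, new_mu = lam + step * g_lam, mu + step * g_mu
            new_obj = cond_loglik(logliks, y, new_lam, new_mu)
            if new_obj >= obj:  # accept only non-decreasing steps
                lam, mu, obj = new_lam, new_mu, new_obj
                break
            step /= 2
        else:
            break  # no acceptable step found: stop early
    return lam, mu


# Hypothetical toy data: component 0 ~ "main text", component 1 ~ "title".
XA_tr = np.array([[3, 2, 0, 0], [2, 3, 0, 1], [0, 1, 3, 2], [1, 0, 2, 3]])
XB_tr = np.ones((4, 4), dtype=int)  # uninformative component
y_tr = np.array([0, 0, 1, 1])
XA_ho = np.array([[3, 1, 0, 0], [0, 0, 1, 3]])  # held-out split for the weights
XB_ho = np.ones((2, 4), dtype=int)
y_ho = np.array([0, 1])

thetas = [train_nb(X, y_tr, 2) for X in (XA_tr, XB_tr)]
ho_logliks = [component_loglik(X, t) for X, t in zip((XA_ho, XB_ho), thetas)]
lam, mu = fit_weights(ho_logliks, y_ho, 2)
pred = hybrid_posterior(ho_logliks, lam, mu).argmax(axis=1)
print(lam, pred)  # the informative component receives the larger weight
```

Fitting the weights on documents that were not used to train the naive Bayes models is a deliberate choice in this sketch: log-likelihoods computed on the training documents themselves tend to be overconfident, which would bias the combination weights.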

Keywords: Multiclass and single-labeled text classification; Multiple components; Maximum entropy principle; Naive Bayes model

Article Outline

1. Introduction
2. Conventional approaches
2.1. Generative approach
2.2. Discriminative approach
2.3. Hybrid approach for binary classification
3. Proposed method
3.1. Hybrid approach
3.1.1. Component generative models
3.1.2. Discriminative class posterior design
3.1.3. Another class posterior by ME
3.2. Application to text classification
4. Experiments
4.1. Test collections
4.2. Experimental settings
4.2.1. Evaluation methods
4.2.2. Evaluation measure
4.3. Experiment 1
4.3.1. Compared classifiers
4.3.2. Results
4.4. Experiment 2
4.4.1. Compared classifiers
4.4.2. Results
4.4.3. Analysis of combination weights
4.5. Experiment 3
4.5.1. Compared classifiers
4.5.2. Results
5. Related work
6. Conclusion
Appendix A. Hyperparameter tuning procedure
References


