High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristics

https://doi.org/10.1016/j.eswa.2011.07.116

Abstract

The High Relevance Keyword Extraction (HRKE) facility is introduced to Bayesian text classification to perform feature/keyword extraction during the classifying stage, without requiring extensive pre-classification processing. To perform keyword extraction, the HRKE facility uses the posterior probability values of keywords within the categories associated with a text document. The experimental results show that the HRKE facility ensures promising classification performance for a Bayesian classifier dealing with different text classification domains of varying characteristics. The method yields an effective and efficient Bayesian text classifier that handles different domains of varying characteristics with high accuracy, while maintaining the simplicity and low processing cost of the conventional Bayesian classification approach.

Introduction

Text document classification denotes the task of assigning raw text documents to one or more pre-defined categories. It is a direct application of machine learning: a set of labelled categories is declared as a way to represent the documents, and a statistical classifier is trained on a labelled training set. Classification is the process by which objects are recognized, differentiated, and understood, and implies that objects are grouped into categories, usually for specific purposes. Ideally, a category represents a relationship between the subject and the object of knowledge. Classification is fundamental to prediction, inference, and decision making. There are, however, a variety of ways to approach the classification task. An increasing number of supervised classification approaches have been developed for various types of classification tasks, such as rule induction (Apte et al., 1994, Provost, 1999), k-nearest neighbour classification (Han, Karypis, & Kumar, 1999), maximum entropy (Nigam, Lafferty, & McCallum, 1999), artificial neural networks (Diligenti et al., 2003a, Diligenti et al., 2003b), support vector machines (Isa et al., 2008a, Isa et al., 2008b, Joachims, 1998, Lin, 1999), and Bayesian classification (Domingos and Pazzani, 1997, Eyheramendy et al., 2003, Kim et al., 2002, McCallum and Nigam, 2003, O’Brien and Vogel, 2003, Provost, 1999, Rish, 2001). Besides the supervised approaches, unsupervised clustering approaches such as the self-organizing map (Adami et al., 2005, Hartley et al., 2006, Isa et al., 2009, Wang, 2001) have also been widely implemented to segment data into groups for further analysis and processing. Among these approaches, Bayesian classification has been widely implemented in many real-world applications due to its relatively simple training and classifying algorithms.

One of the outstanding features of Bayesian classification compared to other classification approaches is its simplicity in handling raw text data directly, without requiring any pre-process to transform the text into a suitable representation format (typically numerical), as required by most of the successful and highly accurate text classification approaches, such as the k-nearest neighbour (k-NN) and support vector machine (SVM) classifiers. As a trade-off for this simplicity, Bayesian classification has been reported as one of the poorest-performing classification approaches by many research groups through extensive experiments and evaluations (Brücher et al., 2002, Yang and Liu, 1999). To enhance the performance of the Bayesian classifier, researchers have proposed several pre-processes such as stop word elimination, word stemming, and feature selection methods (Al-Mubaid and Umair, 2006, Apte et al., 1994, Chen et al., 2009, Dhillon et al., 2002, Eyheramendy et al., 2003, Han et al., 1999, Isa et al., 2008a, Isa et al., 2008b, Joachims, 1998, Joachims, 1999, Kim et al., 2002, McCallum and Nigam, 2003, Ozgur et al., 2005, Yang and Pedersen, 1997, Yang and Liu, 1999). However, implementing these pre-processes sacrifices the simplicity and low cost of the Bayesian training and classifying algorithms, because the pre-processing stages consume considerable time, memory, and CPU resources, and also require extensive human expert interaction. The goal of this paper is to enhance the classification effectiveness of Bayesian text classification while maintaining the simplicity and low cost of its training and classification processes.
In this paper, we introduce the High Relevance Keyword Extraction (HRKE) facility which uses a simple algorithm to perform stop word elimination and feature selection during the classifying stage, without the need for any pre-process and additional human experts’ involvement.

Conventional Bayesian classification takes the entire body of a text document into account for training and classifying purposes. Because text documents contain irrelevant words, the accuracy of Bayesian classification is severely degraded by the presence of noisy or irrelevant features. Extensive research has addressed this problem by introducing pre-processes such as stop word elimination and feature selection, applied to the training and testing sets in order to eliminate irrelevant and low-informative features, that is, in the context of text classification, the low relevance keywords.

Stop word elimination is the procedure whereby common words that appear in many documents, such as “a”, “an”, “the”, “to”, “for”, and “be”, are eliminated from the documents in the dataset (Baeza-Yates & Ribeiro-Neto, 1999). To perform stop word elimination, each individual word from the text documents is matched against a list of stop words. Words that match any entry in the stop word list are discarded from both the training and classifying processes. A potential drawback of stop word elimination is that certain words considered stop words for one dataset (domain) can be highly informative features for another dataset (domain) (Takamura, 2003).
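As a minimal sketch of this procedure, stop word elimination reduces to a membership test per token. The stop word list below is a small illustrative set, not the one used by the authors:

```python
# Sketch of stop word elimination as a pre-process.
# STOP_WORDS is a small illustrative list, not the authors' actual list.
STOP_WORDS = {"a", "an", "the", "to", "for", "be", "is", "of", "and", "in"}

def remove_stop_words(document):
    """Tokenize on whitespace and drop any word found in the stop list."""
    return [w for w in document.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The classifier is trained for a specific domain"))
# ['classifier', 'trained', 'specific', 'domain']
```

A set lookup keeps the test O(1) per word, so the cost of this step is linear in document length.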

Besides the simple stop word elimination technique, several statistical methods for feature selection have been introduced as pre-processes for Bayesian text classification. These methods provide a measure of the usefulness of each individual word in the classification task. Some of the common statistical feature selection methods are document frequency thresholding, information gain, mutual information, the χ² statistic, and term strength. These feature selection methods are discussed and compared in Yang and Pedersen (1997). In addition, some feature selection methods have been designed specifically for Bayesian classification, such as the Multi-class Odds Ratio (MOR) and the Class Discrimination Measure (CDM), which have been experimentally shown to outperform other feature selection approaches (Chen et al., 2009).
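As an illustration of one such statistic, the χ² score for a term/category pair can be computed from a 2×2 contingency table of document counts. This is a generic textbook sketch, not the exact formulation evaluated in the cited studies:

```python
# Sketch of chi-square (χ²) feature scoring for one term/category pair,
# computed from a 2x2 contingency table of document counts.
def chi_square(n11, n10, n01, n00):
    """n11: docs in the category containing the term; n10: docs outside the
    category containing the term; n01: docs in the category lacking the term;
    n00: docs outside the category lacking the term."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

# A term occurring only inside the category scores high; a term spread
# evenly across categories scores zero (independence).
print(chi_square(10, 0, 0, 10))  # 20.0
print(chi_square(5, 5, 5, 5))    # 0.0
```

Terms are then ranked by score and only the top-scoring ones are retained as features.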

Notably, most feature selection methods are carried out as pre-processes prior to the classification process. These pre-processes consume additional time, memory, and CPU resources, degrading the efficiency of the classification task. The goal of the work presented in this paper is to introduce a keyword extraction facility that performs feature selection during the classifying stage of the classifier, without needing any extensive and costly pre-process. This keyword extraction facility implements a Bayesian probabilistic algorithm to determine the “importance” of keywords based on their posterior probability values with respect to each of the available categories, during the classifying stage performed by the Bayesian classifier. Only the “important” keywords are taken into consideration as features for the classification, and performing feature selection during the classification process in this way contributes to an effective and efficient classifier.

Section snippets

Bayesian classification approach

The conventional Bayesian classification approach performs its classification tasks starting with the initial step of analyzing a text document by extracting the words it contains to generate a list of words (Isa, Lee, & Kallimani, 2008). The list of words is constructed with the assumption that the input document consists of words w1, w2, w3, …, wn−1, wn, where the length of the document (in number of words) is n.
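A minimal sketch of this classifying step is given below, with illustrative training documents, Laplace smoothing, and uniform category priors omitted for brevity; the helper names and counts are assumptions for illustration, not the paper's implementation:

```python
import math
from collections import Counter

# Sketch of the conventional Bayesian (naive Bayes) classifying step:
# the document is reduced to its word list w1..wn, and each category's
# log-likelihood is accumulated from per-word probabilities
# (Laplace-smoothed). Training data below is illustrative only.

def train(docs_by_category):
    model = {}
    for cat, docs in docs_by_category.items():
        counts = Counter(w for d in docs for w in d.split())
        model[cat] = (counts, sum(counts.values()))
    return model

def classify(model, document, vocab_size):
    words = document.split()  # the word list w1, ..., wn
    best, best_score = None, -math.inf
    for cat, (counts, total) in model.items():
        score = sum(math.log((counts[w] + 1) / (total + vocab_size))
                    for w in words)
        if score > best_score:
            best, best_score = cat, score
    return best

model = train({"sport": ["match goal team", "team win match"],
               "tech": ["cpu memory code", "code compile cpu"]})
print(classify(model, "team match", vocab_size=8))  # "sport"
```

Note that every word in the document contributes to the score, which is exactly why irrelevant words degrade accuracy, as discussed above.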

Based on the list of words, the trained Bayesian classifier

High Relevance Keywords Extraction facility

Conventional pattern recognition systems consist of a data acquisition module, a feature selection and transformation mechanism, and a machine learning classification scheme, either supervised or unsupervised. The ordinary Bayesian classification approach takes the entire text body of a document into account to identify the right category of the document (Sahami, Dumais, Heckerman, & Horvitz, 1998). Improving on this classification approach, we have proposed, for our Bayesian text

Experiments and evaluations

The proposed High Relevance Keywords Extraction (HRKE) facility generally improves the classification accuracy of the ordinary Bayesian classifier, which takes the whole body of the text document into account in the classification process. However, whether a Bayesian text classifier equipped with the HRKE facility achieves optimal performance depends on the degree of relevance of the extracted keywords to the classification task, which is controlled by setting a threshold in the HRKE facility.
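The threshold idea can be sketched as follows: during classification, only words whose highest per-category posterior P(category | word) clears the threshold are passed to the Bayesian scoring step. The counts, helper names, and threshold value below are illustrative assumptions, not the paper's exact algorithm:

```python
# Hedged sketch of the HRKE idea: keep only words whose highest
# per-category posterior P(category | word) exceeds a threshold, so
# low-relevance words never reach the Bayesian scoring step.
# Counts, names, and threshold below are illustrative assumptions.

def word_posteriors(word, counts_by_cat):
    """P(category | word) estimated from per-category word counts."""
    totals = {c: counts.get(word, 0) for c, counts in counts_by_cat.items()}
    s = sum(totals.values())
    return {c: (t / s if s else 0.0) for c, t in totals.items()}

def extract_keywords(words, counts_by_cat, threshold=0.7):
    kept = []
    for w in words:
        post = word_posteriors(w, counts_by_cat)
        if max(post.values()) >= threshold:
            kept.append(w)  # "important": strongly tied to one category
    return kept

counts = {"sport": {"goal": 9, "team": 8, "the": 50},
          "tech":  {"code": 9, "cpu": 7,  "the": 50}}
print(extract_keywords(["the", "goal", "cpu"], counts, threshold=0.7))
# ['goal', 'cpu']
```

Words spread evenly across categories (like "the") fall below the threshold and are discarded automatically, which is consistent with the paper's claim that a separate stop word list is not needed. Raising the threshold extracts fewer, more discriminative keywords.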

In order to

Conclusion

A Bayesian text classification enhancing technique, the HRKE facility, is presented and described here. The enhancement, an improvement in classification accuracy, is achieved by applying a unique feature selection method based on the occurrence of keywords in documents from a specified category, comparing that occurrence across each of the competing categories. Besides the improvement in classification accuracy, the implementation of

References (33)

• G. Adami et al. Clustering documents into web directory for bootstrapping a supervised classification. Data and Knowledge Engineering (2005)
• D. Isa et al. Using the self-organizing map for clustering of text documents. Expert Systems With Applications (ESWA) (2009)
• H. Al-Mubaid et al. A new text categorization technique using distributional clustering and learning logic. IEEE Transactions on Knowledge and Data Engineering (2006)
• C. Apte et al. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS) (1994)
• R. Baeza-Yates et al. Modern information retrieval (1999)
• Brücher, H., Knolmayer, G., & Mittermayer, M. A. (2002). Document classification methods for organizing explicit...
• J.N. Chen et al. Feature selection for text classification with Naïve Bayes. Expert Systems With Applications (ESWA) (2009)
• Dhillon, I. S., Mallela, S., & Kumar, R. (2002). Enhanced word clustering for hierarchical text classification. In...
• Diligenti, M., Maggini, M., & Rigutini, L. (2003a). Automatic text categorization using neural network. In Proceedings...
• Diligenti, M., Maggini, M., & Rigutini, L. (2003b). Learning similarities for text documents using neural networks. In...
• P. Domingos et al. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning (1997)
• Eyheramendy, S., Genkin, A., Ju, W. H., Lewis, D., & Madigan, D. (2003). Sparse Bayesian classifiers for text...
• Han, E. H., Karypis, G., & Kumar, V. (1999). Text categorization using weighted adjusted k-nearest neighbor...
• Hartley, M., Isa, D., Kallimani, V. P., & Lee, L. H. (2006). A domain knowledge preserving in process engineering using...
• D. Isa et al. Polychotomiser for case-based reasoning beyond the traditional Bayesian classification approach. Journal of Computer and Information Science, Canadian Center of Science and Education (CCSE) (2008)
• D. Isa et al. Text document pre-processing with the Bayes formula for classification using the support vector machine. IEEE Transactions on Knowledge and Data Engineering (TKDE) (2008)