A mixed-integer programming approach to multi-class data classification problem
Introduction
Classification is a supervised learning strategy that focuses on building models able to assign new instances to one of a set of well-defined classes. Classification problems have been intensively studied by a diverse group of researchers, including statisticians, engineers, biologists, and computer scientists. A variety of methods for solving classification problems exist across disciplines. These methods include neural networks (NN), fuzzy logic, support vector machines (SVM), tolerant rough sets, principal component analysis (PCA), and linear programming [1].
A general neural-network model for fuzzy logic control and decision systems including the data classification problem is discussed in [19]. Among the classification methods fuzzy adaptive resonance theory (ART) is a fast and reliable analog pattern clustering system [8]. Rough set theory introduced by [21] is a mathematical tool to deal with vagueness and uncertainty in the areas of machine learning and pattern recognition. Two applications of logic for classification by using rough set approach are presented in [5]. Another important factor in data classification using rough sets is the tolerance relation among the objects for pattern classification [20]. A data classification method based on the tolerant rough set that combines the use of logic and the tolerance relation among the objects is presented in [16]. Furthermore, a fuzzy min–max classification neural network in which pattern classes are utilized as fuzzy sets is given in [24]. Filter-based greedy modular subspace techniques use principal component analysis that is instrumental in reducing the number of attributes in clustering [10]. Anthony analyzed the generalization properties of multi-class data classification techniques based on iterative linear partitioning [2].
In recent years, SVM has been considered one of the most efficient methods for two-class classification problems [27]. SVM is a new classification technique developed by Vapnik and his group [11]. An SVM generates a separating hyper-surface that maximizes the margin, which yields good generalization ability. However, the SVM has two important drawbacks. First, a combination of SVMs has to be used to solve multi-class classification problems. Because the SVM was originally a model for binary classification, the proposed combinations of SVMs do not improve performance. Second, approximation algorithms are used to reduce the computational time of SVMs when learning from large-scale data, but this computational improvement can degrade classification performance. To overcome these problems, many variants of SVM have been suggested, including SVM ensembles with bagging or boosting in place of a single SVM [17].
There have been some attempts to solve classification problems using mathematical programming. Most of these methods model data classification as linear programming (LP) problems that optimize a distance function [12]. In contrast to these LP models, mixed-integer linear programming (MILP) models that minimize the number of misclassifications on the design data set have also been studied. There have been several attempts to formulate these problems as mixed-integer programming problems [3], [14], [18], [25]. A heuristic extension of the linear programming approach that improves the performance of multi-class supervised classification was proposed in [1]. The logical analysis of data (LAD) is a methodology based on combinatorics, optimization, and Boolean algebra for extracting information from data, including classification [6]. This approach is very effective in binary classification; however, it loses accuracy and efficiency when there are more than two classes.
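To make the idea of minimizing misclassifications concrete, a generic two-class formulation can be sketched as follows. This is a textbook-style illustration, not the model of any cited work or of this paper: the symbols $w$, $b$, $z_i$, the margin $\varepsilon > 0$, the big-M constant $M$, and the index sets $C_1$, $C_2$ are all assumptions introduced here.

```latex
\begin{align*}
\min_{w,\,b,\,z}\quad & \sum_{i=1}^{n} z_i \\
\text{s.t.}\quad & w^{\top} x_i + b \ge \varepsilon - M z_i, && i \in C_1, \\
& w^{\top} x_i + b \le -\varepsilon + M z_i, && i \in C_2, \\
& z_i \in \{0, 1\}, && i = 1, \dots, n.
\end{align*}
```

Here $z_i = 1$ relaxes the separation constraint for point $x_i$, so the objective counts the points that the hyper-plane is allowed to misclassify.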
This paper presents an accurate and efficient mathematical programming method for multi-class data classification problems. We discuss our approach to the multi-class data classification problem in Section 2. The mixed-integer programming model is presented in Section 3. The application of the approach to a sample problem is illustrated in Section 4, and the results for the IRIS data set are given in Section 5. Finally, the paper concludes with a discussion of the results.
Section snippets
Data classification approach
The objective in data classification is to assign data points that are described by several attributes into a predefined number of classes. Fig. 1a shows the schematic representation of classification of multi-dimensional data using hyper-planes. Although the methods that are based on using hyper-planes to define the boundaries of classes can be efficient in classifying data into two sets, they are inaccurate and inefficient when data needs to be classified into more than two sets as shown in
Problem formulation for multi-class data classification
The data classification problem is considered in two parts: training and testing. The objective of the training part is to determine the characteristics of the data points that belong to a certain class and differentiate them from the data points that belong to other classes. After the distinguishing characteristics of the classes are determined, then the effectiveness of the classification must be tested.
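The testing part can be illustrated with a minimal sketch, assuming that training has already produced one or more axis-aligned hyper-boxes per class. The names `Box`, `contains`, `distance_to_box`, and `classify` are hypothetical, and the nearest-box rule for points that fall outside every box is an assumption made for illustration; the excerpt does not state the paper's own assignment rule.

```python
from typing import Dict, List, Sequence, Tuple

# A hyper-box is stored as (lower_bounds, upper_bounds), one bound per attribute.
Box = Tuple[Sequence[float], Sequence[float]]


def contains(box: Box, point: Sequence[float]) -> bool:
    """Return True if the point lies within the box on every attribute."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


def distance_to_box(box: Box, point: Sequence[float]) -> float:
    """Euclidean distance from the point to the box (0.0 if it is inside)."""
    lower, upper = box
    gaps = (max(lo - x, 0.0, x - hi) for lo, x, hi in zip(lower, point, upper))
    return sum(g * g for g in gaps) ** 0.5


def classify(boxes: Dict[str, List[Box]], point: Sequence[float]) -> str:
    """Assign the point to the class that owns the closest hyper-box."""
    best_class, best_dist = None, float("inf")
    for label, class_boxes in boxes.items():
        for box in class_boxes:
            d = distance_to_box(box, point)
            if d < best_dist:
                best_class, best_dist = label, d
    if best_class is None:
        raise ValueError("no hyper-boxes were provided")
    return best_class


# Hypothetical boxes for illustration only (not taken from the paper's figures).
boxes = {
    "A": [((0.0, 0.0), (1.0, 1.0)), ((3.0, 3.0), (4.0, 4.0))],
    "B": [((1.5, 0.0), (2.5, 1.0))],
}
print(classify(boxes, (0.5, 0.5)))  # point inside the first "A" box
print(classify(boxes, (2.6, 0.5)))  # point outside every box; the "B" box is nearest
```

Storing a list of boxes per class keeps the sketch compatible with classes that need more than one hyper-box.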
Illustrative example
We applied the mixed-integer programming method to a set of 16 data points in four different classes given in Fig. 2. The data points can be represented by two attributes, 1 and 2.
There are a total of 20 data points; 16 of these points were used in training and 4 were used in testing. The training problem classified the data into four classes using 5 hyper-boxes as shown in Fig. 3. It is interesting to note that Class1 requires two hyper-boxes while the other classes are represented with a
Evaluation of the method on IRIS data set
In this part of the study, the efficiency of the proposed model is tested on the well-known IRIS data set. The IRIS data published by [13] were selected because they have been widely used as examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in centimeters on 50 IRIS specimens from each of three species, Iris setosa, I. versicolor, and I. virginica. This data set is extensively studied in the pattern
Conclusions and future work
The proposed data classification method based on mixed-integer programming allows the use of hyper-boxes for defining class boundaries that enclose all or some of the points in that class. Traditional methods based on hyper-planes can be inaccurate and inefficient when classifying data into more than two classes. Consequently, using hyper-boxes for multi-class data classification problems can be very accurate because the boundaries of each class are well constructed.
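A small sketch shows why a single class may need more than one hyper-box. The points below are hypothetical and the split into two groups is chosen by hand; in the proposed method, the boxes are determined by the mixed-integer program, which this sketch does not implement. The helper names `bounding_box` and `enclosed` are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]


def bounding_box(points: Sequence[Point]) -> Box:
    """Tightest axis-aligned box that encloses all the given points."""
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper


def enclosed(box: Box, points: Sequence[Point]) -> List[Point]:
    """Points that fall inside the box on every attribute."""
    lower, upper = box
    return [
        p for p in points
        if all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))
    ]


# Hypothetical two-attribute data: class 1 forms two clusters around class 2.
class_1 = [(1.0, 1.0), (1.5, 1.2), (5.0, 5.0), (5.5, 5.3)]
class_2 = [(3.0, 3.0), (3.2, 3.4)]

# One box around all of class 1 also swallows the class 2 points.
single = bounding_box(class_1)
print(len(enclosed(single, class_2)))

# Two boxes, one per cluster (split chosen by hand), enclose no class 2 points.
left = bounding_box(class_1[:2])
right = bounding_box(class_1[2:])
print(len(enclosed(left, class_2)) + len(enclosed(right, class_2)))
```

With one box the class 2 points are wrongly enclosed, while the two smaller boxes separate the classes cleanly.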
In the training part
References (27)
et al. Mathematical programming based heuristics for improving LP-generated classifiers for the multi-class supervised classification problem. European Journal of Operational Research (2006)
On data classification by iterative linear partitioning. Discrete Applied Mathematics (2004)
et al. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing (1987)
et al. Modeling and integer programming techniques applied to propositional calculus. Computers & Operations Research (1990)
et al. A modular eigen subspace scheme for high-dimensional data classification. Future Generation Computer Systems (2004)
General mathematical programming formulations for the statistical classification problem. Operations Research Letters (1986)
Data classification based on tolerant rough set. Pattern Recognition (2001)
et al. Constructing support vector machine ensemble. Pattern Recognition (2003)
et al. Relation between MILP modeling and logical inference for chemical process synthesis. Computers and Chemical Engineering (1991)
et al. A comparison of a robust mixed-integer approach to existing methods for establishing classification rules for the discriminant problem. European Journal of Operational Research (1990)
Linear programming approaches for multi category support vector machines. European Journal of Operational Research
An experimental comparison of statistical and linear programming approaches to the discriminant problem. Decision Sciences
Cited by (53)
A hybrid approach based on mathematical modelling and improved online learning algorithm for data classification. Expert Systems with Applications (2023)
An Integer Programming Approach for the 2-class Single-group Classification Problem. Electronic Notes in Theoretical Computer Science (2019)
The Geodesic Classification Problem on Graphs. Electronic Notes in Theoretical Computer Science (2019)
Incremental conic functions algorithm for large scale classification problems. Digital Signal Processing: A Review Journal (2018)
Auction optimization using regression trees and linear models as integer programs. Artificial Intelligence (2017)
Citation excerpt: CCMs solve prediction problems. In [34], the authors build a mixed integer program for multi-class data classification. A comprehensive overview of optimization techniques used in learning is given in [35].