A mixed-integer programming approach to multi-class data classification problem

https://doi.org/10.1016/j.ejor.2005.04.049

Abstract

This paper presents a new data classification method based on mixed-integer programming. Traditional approaches that are based on partitioning the data set into two groups perform poorly on multi-class data classification problems. The proposed approach uses hyper-boxes to define boundaries of the classes that include all or some of the points in that class. A mixed-integer programming model is developed for representing the existence of hyper-boxes and their boundaries. In addition, the relationships among the discrete decisions in the model are represented using propositional logic and then converted to their equivalent integer constraints using Boolean algebra. The proposed approach for multi-class data classification is illustrated on an example problem, and its efficiency is tested on the well-known IRIS data set. The computational results on the illustrative example and the IRIS data set show that the proposed method is accurate and efficient for multi-class data classification problems.
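
As a standard illustration of the logic-to-constraint conversion mentioned above (a generic textbook example rather than a constraint taken from the paper's model): a propositional requirement that at least one of two hyper-boxes exists, Y1 ∨ Y2, is converted to the integer constraint y1 + y2 ≥ 1 over binary variables y1, y2 ∈ {0, 1}; an implication Y1 ⇒ Y2 becomes y1 ≤ y2; and an exclusive choice between the two boxes becomes y1 + y2 = 1.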

Introduction

Classification is a supervised learning strategy that emphasizes building models able to assign new instances to one of a set of well-defined classes. Classification problems have been studied intensively by a diverse group of researchers, including statisticians, engineers, biologists, and computer scientists. A variety of methods have been developed for solving classification problems in different disciplines, including neural networks (NN), fuzzy logic, support vector machines (SVM), tolerant rough sets, principal component analysis (PCA), and linear programming [1].

A general neural-network model for fuzzy logic control and decision systems, including the data classification problem, is discussed in [19]. Among the classification methods, fuzzy adaptive resonance theory (ART) is a fast and reliable analog pattern clustering system [8]. Rough set theory, introduced in [21], is a mathematical tool for dealing with vagueness and uncertainty in machine learning and pattern recognition. Two applications of logic for classification using the rough set approach are presented in [5]. Another important factor in data classification using rough sets is the tolerance relation among the objects for pattern classification [20]. A data classification method based on tolerant rough sets, combining the use of logic with the tolerance relation among the objects, is presented in [16]. Furthermore, a fuzzy min–max classification neural network in which pattern classes are treated as fuzzy sets is given in [24]. Filter-based greedy modular subspace techniques use principal component analysis, which is instrumental in reducing the number of attributes in clustering [10]. Anthony analyzed the generalization properties of multi-class data classification techniques based on iterative linear partitioning [2].

In recent years, SVM has been considered one of the most efficient methods for two-class classification problems [27]. SVM is a classification technique developed by Vapnik and his group [11]. It generates a separating hypersurface that maximizes the margin and yields good generalization ability. However, SVM has two important drawbacks. First, because it is originally a binary classifier, a combination of SVMs has to be used to solve multi-class classification problems, and the proposed schemes for combining SVMs do not always improve performance. Second, approximation algorithms are used to reduce the computational time of SVMs when learning from large-scale data, and this computational improvement can degrade classification performance. To overcome these problems, many variants of SVM have been suggested, including the use of SVM ensembles with bagging or boosting rather than a single SVM [17].

There have been several attempts to solve classification problems using mathematical programming. Most of these methods model data classification as a linear programming (LP) problem that optimizes a distance function [12]. In contrast to the LP formulations, mixed-integer linear programming (MILP) formulations that minimize the number of misclassifications on the design data set have also been studied, and several authors have formulated classification as a mixed-integer programming problem [3], [14], [18], [25]. A heuristic extension of the linear programming approach to improve the performance of multi-class supervised classification was proposed in [1]. The logical analysis of data (LAD) is a methodology based on combinatorics, optimization, and Boolean algebra for extracting information from data, including classification [6]. This approach is very effective for binary classification; however, its accuracy and efficiency suffer when there are more than two classes.

This paper presents an accurate and efficient mathematical programming method for multi-class data classification problems. We discuss our approach to the multi-class data classification problem in Section 2. The mixed-integer programming model is presented in Section 3. The application of the approach to a sample problem is illustrated in Section 4, and the results for the IRIS data set are given in Section 5. Finally, the paper concludes with a discussion of the results and future work.

Section snippets

Data classification approach

The objective in data classification is to assign data points that are described by several attributes to a predefined number of classes. Fig. 1a shows the schematic representation of the classification of multi-dimensional data using hyper-planes. Although methods that use hyper-planes to define the boundaries of classes can be efficient in classifying data into two sets, they are inaccurate and inefficient when the data needs to be classified into more than two sets, as shown in …
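
To make the hyper-box representation concrete, the following is a minimal Python sketch (illustrative only, not the authors' implementation; the `HyperBox` class, the `classify` helper, and the example bounds are hypothetical) of how a class boundary defined by a lower and an upper bound on each attribute encloses a point:

```python
import numpy as np

class HyperBox:
    """Illustrative hyper-box: a lower and an upper bound on every attribute,
    together with the class label the box represents."""
    def __init__(self, lower, upper, label):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.label = label

    def contains(self, x):
        # A point is enclosed if it satisfies the bounds on all attributes.
        x = np.asarray(x, dtype=float)
        return bool(np.all(self.lower <= x) and np.all(x <= self.upper))

def classify(x, boxes):
    """Return the label of an enclosing box, or None if no box encloses x."""
    for box in boxes:
        if box.contains(x):
            return box.label
    return None

boxes = [HyperBox([0, 0], [2, 2], "Class1"), HyperBox([3, 3], [5, 5], "Class2")]
print(classify([1.0, 1.5], boxes))  # -> Class1
print(classify([2.5, 2.5], boxes))  # -> None (handled by the testing rule)
```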

Problem formulation for multi-class data classification

The data classification problem is considered in two parts: training and testing. The objective of the training part is to determine the characteristics of the data points that belong to a certain class and to differentiate them from the data points that belong to other classes. After the distinguishing characteristics of the classes are determined, the effectiveness of the classification must be tested.
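
As a rough feel for how such a training problem can be posed as a MILP, the following sketch assumes the open-source PuLP modeling library and several simplifications that are not in the paper: one box per class, hypothetical two-attribute data, no separation constraints between boxes, and a small penalty on box size so the bounds stay tight.

```python
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum, value

# Hypothetical 2-attribute training data: point id -> (attribute values, class).
points = {
    "p1": ((1.0, 1.0), "A"), "p2": ((1.5, 2.0), "A"),
    "p3": ((4.0, 4.5), "B"), "p4": ((5.0, 4.0), "B"),
}
classes = {"A", "B"}
attrs = range(2)
M = 100.0  # big-M constant, larger than the attribute range

prob = LpProblem("hyperbox_training", LpMinimize)
lo = {(c, m): LpVariable(f"lo_{c}_{m}", lowBound=0, upBound=10)
      for c in classes for m in attrs}
up = {(c, m): LpVariable(f"up_{c}_{m}", lowBound=0, upBound=10)
      for c in classes for m in attrs}
miss = {i: LpVariable(f"miss_{i}", cat=LpBinary) for i in points}

# Primary goal: minimize misclassifications; secondary: keep the boxes tight.
prob += 100 * lpSum(miss.values()) + lpSum(
    up[(c, m)] - lo[(c, m)] for c in classes for m in attrs
)

for i, (x, c) in points.items():
    for m in attrs:
        # Unless point i is flagged as misclassified, it must lie within the
        # lower/upper bounds of its own class's box on every attribute.
        prob += lo[(c, m)] <= x[m] + M * miss[i]
        prob += up[(c, m)] >= x[m] - M * miss[i]

prob.solve()
print("misclassified points:", int(sum(value(v) for v in miss.values())))
for c in sorted(classes):
    print(c, [(value(lo[(c, m)]), value(up[(c, m)])) for m in attrs])
```

The paper's full model additionally allows more than one hyper-box per class and relates the binary existence and assignment decisions through propositional logic converted to integer constraints, as described in the abstract.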

Illustrative example

We applied the mixed-integer programming method to a set of 16 data points in four different classes, given in Fig. 2. Each data point is described by two attributes, 1 and 2.

There are a total of 20 data points; 16 of these points were used for training and 4 for testing. The training problem classified the data into four classes using five hyper-boxes, as shown in Fig. 3. It is interesting to note that Class1 requires two hyper-boxes, while the other classes are represented with a single hyper-box.
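
For a test point that is not enclosed by any hyper-box, a natural assignment rule (sketched below under the assumption of axis-aligned boxes; the paper's exact testing rule may differ in detail) is to compute its distance to each box and adopt the class of the nearest one:

```python
import numpy as np

def distance_to_box(x, lower, upper):
    """Distance from a point to the nearest face of a hyper-box
    (zero when the point lies inside the box)."""
    x = np.asarray(x, dtype=float)
    d = np.maximum(np.asarray(lower) - x, 0) + np.maximum(x - np.asarray(upper), 0)
    return float(np.linalg.norm(d))

def assign(x, boxes):
    """boxes: list of (lower, upper, label); return the label of the closest box."""
    return min(boxes, key=lambda b: distance_to_box(x, b[0], b[1]))[2]

# Hypothetical boxes with the same shape of data as the illustrative example.
boxes = [([1.0, 1.0], [1.5, 2.0], "Class1"), ([4.0, 4.0], [5.0, 4.5], "Class2")]
print(assign([2.0, 2.2], boxes))  # nearer to the first box -> Class1
```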

Evaluation of the method on IRIS data set

In this part of the study, the efficiency of the proposed model is tested on the well-known IRIS data set. The IRIS data, published in [13], were selected because they have been widely used as examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in centimeters on 50 IRIS specimens from each of three species, Iris setosa, I. versicolor, and I. virginica. This data set has been extensively studied in the pattern recognition literature.
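
As a self-contained illustration of this workflow on IRIS (assuming scikit-learn for data loading and splitting; the per-class bounding boxes below are a naive stand-in for the MILP-trained hyper-boxes, and the split does not match the paper's):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0,
                                          stratify=y)

# One axis-aligned bounding box per class: attribute-wise min/max of training points.
boxes = {c: (X_tr[y_tr == c].min(axis=0), X_tr[y_tr == c].max(axis=0))
         for c in np.unique(y_tr)}

def distance_to_box(x, lower, upper):
    # Zero inside the box; otherwise distance to the nearest face.
    d = np.maximum(lower - x, 0) + np.maximum(x - upper, 0)
    return np.linalg.norm(d)

pred = [min(boxes, key=lambda c: distance_to_box(x, *boxes[c])) for x in X_te]
print("test accuracy:", np.mean(np.array(pred) == y_te))
```

Points in the well-known overlap region between I. versicolor and I. virginica are where such a naive one-box-per-class construction is most likely to err, and where an optimized multi-box formulation is expected to help.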

Conclusions and future work

The proposed data classification method based on mixed-integer programming uses hyper-boxes to define boundaries of the classes that enclose all or some of the points in that class. Traditional methods based on hyper-planes can be inaccurate and inefficient in classifying data into more than two classes. Consequently, using hyper-boxes for multi-class data classification problems can be very accurate because the boundaries of each class are constructed precisely.

In the training part …

References (27)

  • Y. Yajima, Linear programming approaches for multicategory support vector machines, European Journal of Operational Research (2005).
  • S.M. Bajgier et al., An experimental comparison of statistical and linear programming approaches to the discriminant problem, Decision Sciences (1982).
  • B. Bay, The UCI KDD Archive, Department of Information and Computer Science, University of California, Irvine, CA, …