A mixed-integer programming approach to multi-class data classification problem

https://doi.org/10.1016/j.ejor.2005.04.049

Abstract

This paper presents a new data classification method based on mixed-integer programming. Traditional approaches that are based on partitioning the data set into two groups perform poorly on multi-class data classification problems. The proposed approach uses hyper-boxes to define boundaries of the classes that include all or some of the points in that class. A mixed-integer programming model is developed for representing the existence of hyper-boxes and their boundaries. In addition, the relationships among the discrete decisions in the model are represented using propositional logic and then converted to their equivalent integer constraints using Boolean algebra. The proposed approach for multi-class data classification is illustrated on an example problem, and its efficiency is tested on the well-known IRIS data set. The computational results on the illustrative example and the IRIS data set show that the proposed method is accurate and efficient for multi-class data classification problems.
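
As a standard illustration of the logic-to-constraint conversion mentioned above (a generic textbook example rather than a constraint taken from the paper's model): a propositional requirement that at least one of two hyper-boxes exists, Y1 ∨ Y2, is converted to the integer constraint y1 + y2 ≥ 1 over binary variables y1, y2 ∈ {0, 1}; an implication Y1 ⇒ Y2 becomes y1 ≤ y2; and an exclusive choice between the two boxes becomes y1 + y2 = 1.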

Introduction

Classification is a supervised learning strategy that emphasizes building models able to assign new instances to one of a set of well-defined classes. Classification problems have been studied intensively by a diverse group of researchers, including statisticians, engineers, biologists, and computer scientists. A variety of methods have been developed for solving classification problems in different disciplines, including neural networks (NN), fuzzy logic, support vector machines (SVM), tolerant rough sets, principal component analysis (PCA), and linear programming [1].

A general neural-network model for fuzzy logic control and decision systems, including the data classification problem, is discussed in [19]. Among the classification methods, fuzzy adaptive resonance theory (ART) is a fast and reliable analog pattern clustering system [8]. Rough set theory, introduced in [21], is a mathematical tool for dealing with vagueness and uncertainty in machine learning and pattern recognition. Two applications of logic for classification using the rough set approach are presented in [5]. Another important factor in data classification using rough sets is the tolerance relation among the objects for pattern classification [20]. A data classification method based on tolerant rough sets, combining the use of logic with the tolerance relation among the objects, is presented in [16]. Furthermore, a fuzzy min–max classification neural network in which pattern classes are treated as fuzzy sets is given in [24]. Filter-based greedy modular subspace techniques use principal component analysis, which is instrumental in reducing the number of attributes in clustering [10]. Anthony analyzed the generalization properties of multi-class data classification techniques based on iterative linear partitioning [2].

In recent years, SVM has been considered one of the most efficient methods for two-class classification problems [27]. SVM is a classification technique developed by Vapnik and his group [11]. It generates a separating hypersurface that maximizes the margin and yields good generalization ability. However, SVM has two important drawbacks. First, because it is originally a binary classifier, a combination of SVMs has to be used to solve multi-class classification problems, and the proposed schemes for combining SVMs do not always improve performance. Second, approximation algorithms are used to reduce the computational time of SVMs when learning from large-scale data, and this computational improvement can degrade classification performance. To overcome these problems, many variants of SVM have been suggested, including the use of SVM ensembles with bagging or boosting rather than a single SVM [17].

There have been several attempts to solve classification problems using mathematical programming. Most of these methods model data classification as a linear programming (LP) problem that optimizes a distance function [12]. In contrast to the LP formulations, mixed-integer linear programming (MILP) formulations that minimize the number of misclassifications on the design data set have also been studied, and several authors have formulated classification as a mixed-integer programming problem [3], [14], [18], [25]. A heuristic extension of the linear programming approach to improve the performance of multi-class supervised classification was proposed in [1]. The logical analysis of data (LAD) is a methodology based on combinatorics, optimization, and Boolean algebra for extracting information from data, including classification [6]. This approach is very effective for binary classification; however, its accuracy and efficiency suffer when there are more than two classes.

This paper presents an accurate and efficient mathematical programming method for multi-class data classification problems. We discuss our approach to the multi-class data classification problem in Section 2. The mixed-integer programming model is presented in Section 3. The application of the approach to a sample problem is illustrated in Section 4, and the results for the IRIS data set are given in Section 5. Finally, the paper concludes with a discussion of the results and future work.

Section snippets

Data classification approach

The objective in data classification is to assign data points that are described by several attributes to a predefined number of classes. Fig. 1a shows the schematic representation of the classification of multi-dimensional data using hyper-planes. Although methods that use hyper-planes to define the boundaries of classes can be efficient in classifying data into two sets, they are inaccurate and inefficient when the data needs to be classified into more than two sets, as shown in …
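
To make the hyper-box representation concrete, the following is a minimal Python sketch (illustrative only, not the authors' implementation; the `HyperBox` class, the `classify` helper, and the example bounds are hypothetical) of how a class boundary defined by a lower and an upper bound on each attribute encloses a point:

```python
import numpy as np

class HyperBox:
    """Illustrative hyper-box: a lower and an upper bound on every attribute,
    together with the class label the box represents."""
    def __init__(self, lower, upper, label):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.label = label

    def contains(self, x):
        # A point is enclosed if it satisfies the bounds on all attributes.
        x = np.asarray(x, dtype=float)
        return bool(np.all(self.lower <= x) and np.all(x <= self.upper))

def classify(x, boxes):
    """Return the label of an enclosing box, or None if no box encloses x."""
    for box in boxes:
        if box.contains(x):
            return box.label
    return None

boxes = [HyperBox([0, 0], [2, 2], "Class1"), HyperBox([3, 3], [5, 5], "Class2")]
print(classify([1.0, 1.5], boxes))  # -> Class1
print(classify([2.5, 2.5], boxes))  # -> None (handled by the testing rule)
```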

Problem formulation for multi-class data classification

The data classification problem is considered in two parts: training and testing. The objective of the training part is to determine the characteristics of the data points that belong to a certain class and to differentiate them from the data points that belong to other classes. After the distinguishing characteristics of the classes are determined, the effectiveness of the classification must be tested.
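
As a rough feel for how such a training problem can be posed as a MILP, the following sketch assumes the open-source PuLP modeling library and several simplifications that are not in the paper: one box per class, hypothetical two-attribute data, no separation constraints between boxes, and a small penalty on box size so the bounds stay tight.

```python
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum, value

# Hypothetical 2-attribute training data: point id -> (attribute values, class).
points = {
    "p1": ((1.0, 1.0), "A"), "p2": ((1.5, 2.0), "A"),
    "p3": ((4.0, 4.5), "B"), "p4": ((5.0, 4.0), "B"),
}
classes = {"A", "B"}
attrs = range(2)
M = 100.0  # big-M constant, larger than the attribute range

prob = LpProblem("hyperbox_training", LpMinimize)
lo = {(c, m): LpVariable(f"lo_{c}_{m}", lowBound=0, upBound=10)
      for c in classes for m in attrs}
up = {(c, m): LpVariable(f"up_{c}_{m}", lowBound=0, upBound=10)
      for c in classes for m in attrs}
miss = {i: LpVariable(f"miss_{i}", cat=LpBinary) for i in points}

# Primary goal: minimize misclassifications; secondary: keep the boxes tight.
prob += 100 * lpSum(miss.values()) + lpSum(
    up[(c, m)] - lo[(c, m)] for c in classes for m in attrs
)

for i, (x, c) in points.items():
    for m in attrs:
        # Unless point i is flagged as misclassified, it must lie within the
        # lower/upper bounds of its own class's box on every attribute.
        prob += lo[(c, m)] <= x[m] + M * miss[i]
        prob += up[(c, m)] >= x[m] - M * miss[i]

prob.solve()
print("misclassified points:", int(sum(value(v) for v in miss.values())))
for c in sorted(classes):
    print(c, [(value(lo[(c, m)]), value(up[(c, m)])) for m in attrs])
```

The paper's full model additionally allows more than one hyper-box per class and relates the binary existence and assignment decisions through propositional logic converted to integer constraints, as described in the abstract.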

Illustrative example

We applied the mixed-integer programming method to a set of 16 data points in four different classes, given in Fig. 2. Each data point is described by two attributes, 1 and 2.

There are a total of 20 data points; 16 of these points were used for training and 4 for testing. The training problem classified the data into four classes using five hyper-boxes, as shown in Fig. 3. It is interesting to note that Class1 requires two hyper-boxes, while the other classes are represented with a single hyper-box.
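
For a test point that is not enclosed by any hyper-box, a natural assignment rule (sketched below under the assumption of axis-aligned boxes; the paper's exact testing rule may differ in detail) is to compute its distance to each box and adopt the class of the nearest one:

```python
import numpy as np

def distance_to_box(x, lower, upper):
    """Distance from a point to the nearest face of a hyper-box
    (zero when the point lies inside the box)."""
    x = np.asarray(x, dtype=float)
    d = np.maximum(np.asarray(lower) - x, 0) + np.maximum(x - np.asarray(upper), 0)
    return float(np.linalg.norm(d))

def assign(x, boxes):
    """boxes: list of (lower, upper, label); return the label of the closest box."""
    return min(boxes, key=lambda b: distance_to_box(x, b[0], b[1]))[2]

# Hypothetical boxes with the same shape of data as the illustrative example.
boxes = [([1.0, 1.0], [1.5, 2.0], "Class1"), ([4.0, 4.0], [5.0, 4.5], "Class2")]
print(assign([2.0, 2.2], boxes))  # nearer to the first box -> Class1
```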

Evaluation of the method on IRIS data set

In this part of the study, the efficiency of the proposed model is tested on the well-known IRIS data set. The IRIS data, published in [13], were selected because they have been widely used as examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in centimeters on 50 IRIS specimens from each of three species, Iris setosa, I. versicolor, and I. virginica. This data set has been extensively studied in the pattern recognition literature.
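
As a self-contained illustration of this workflow on IRIS (assuming scikit-learn for data loading and splitting; the per-class bounding boxes below are a naive stand-in for the MILP-trained hyper-boxes, and the split does not match the paper's):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0,
                                          stratify=y)

# One axis-aligned bounding box per class: attribute-wise min/max of training points.
boxes = {c: (X_tr[y_tr == c].min(axis=0), X_tr[y_tr == c].max(axis=0))
         for c in np.unique(y_tr)}

def distance_to_box(x, lower, upper):
    # Zero inside the box; otherwise distance to the nearest face.
    d = np.maximum(lower - x, 0) + np.maximum(x - upper, 0)
    return np.linalg.norm(d)

pred = [min(boxes, key=lambda c: distance_to_box(x, *boxes[c])) for x in X_te]
print("test accuracy:", np.mean(np.array(pred) == y_te))
```

Points in the well-known overlap region between I. versicolor and I. virginica are where such a naive one-box-per-class construction is most likely to err, and where an optimized multi-box formulation is expected to help.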

Conclusions and future work

The proposed data classification method based on mixed-integer programming uses hyper-boxes to define boundaries of the classes that enclose all or some of the points in that class. Traditional methods based on hyper-planes can be inaccurate and inefficient in classifying data into more than two classes. Consequently, using hyper-boxes for multi-class data classification problems can be very accurate because the boundaries of each class are constructed precisely.

In the training part …

References (27)

  • Y. Yajima, Linear programming approaches for multicategory support vector machines, European Journal of Operational Research (2005).
  • S.M. Bajgier et al., An experimental comparison of statistical and linear programming approaches to the discriminant problem, Decision Sciences (1982).
  • B. Bay, The UCI KDD Archive, Department of Information and Computer Science, University of California, Irvine, CA, …