Evolutionary fuzzy k-nearest neighbors algorithm using interval-valued fuzzy sets
Introduction
The k-nearest neighbors classifier (kNN) [16] is one of the most popular supervised learning methods. It is a nonparametric method which does not rely on building a model during the training phase, and whose classification rule is based on a given similarity function between the training instances and the test instance to be classified. Since its definition, kNN has become one of most relevant algorithms in data mining [42], and it is an integral part of many applications of machine learning in various domains [35], [39].
In nearest neighbor classification, fuzzy sets can be used to model the degree of membership of each instance to the classes of the problem. This approach, known as the fuzzy k-nearest neighbor (fuzzy-kNN) classifier [31], has been shown to be an effective improvement of kNN.
This fuzzy approach overcomes a drawback of the kNN classifier, in which equal importance is given to every instance in the decision rule, regardless of its typicalness as a class prototype and its distance to the pattern to be classified. Fuzzy memberships enable fuzzy-kNN to achieve higher accuracy rates in most classification problems. This is also the reason why it has been the preferred choice in several applications in medicine [9], [12], economy [11], bioinformatics [30], industry [33] and many other fields.
The definition of fuzzy memberships is a fundamental issue in fuzzy-kNN. Although they can be set through expert knowledge, or by analyzing local data around each instance (as in [31] or [44]), there may be still a lack of knowledge associated with the assignation of a single value to the membership. This is caused by the necessity of fixing in advance two parameters: kInit in the definition of the initial membership values and m in the computation of the votes of the neighbors.
To overcome this difficulty, interval valued fuzzy sets (IVFSs) [4], [26], a particular case of type-2 fuzzy sets [5], [34], may be used. IVFSs allow membership values to be defined by using a lower and an upper bound. The interval based definition includes not only a greater degree of flexibility than just using a single value, but also enables us to measure the degree of ignorance with the length of the interval [8], [20]. Following this approach, IVFSs have been successfully applied in the development of fuzzy systems for classification [36], [37], [38]. In the case of nearest neighbor classification, this enables the representation of the uncertainty associated with the true class (or classes) to which every instance belongs, in the context of most standard, supervised classification problems.
The optimization capabilities of evolutionary algorithms can also help to overcome this issue. In recent years, they have become a very useful tool in the design of fuzzy learning systems. For example, genetic fuzzy systems [14], [15] show how the incorporation of evolutionary algorithms allows to enhance the performance of the learning model through parameter adjustment. Nearest neighbor classifiers’ performance is also prone to be improved by the use of evolutionary algorithms [10], [18].
Considering the aforementioned issue, in this paper we propose an evolutionary fuzzy k-nearest neighbors classifier using interval-valued fuzzy sets (EF-kNN-IVFS). On the one hand, it tackles the problem of setting up the parameters via the implementation of interval values to represent both the membership of each training instance to the classes and the votes cast by each neighbor in the decision rule. The introduction of intervals in this approach allows us to consider different values for the kInit and m parameters, obtaining as a result different degrees of membership per each training instance. On the other hand, the implementation of evolutionary in the model would enable us to optimize the selection of both parameters, thus improving the accuracy of the whole classifier algorithm. Specifically, it is proposed to use evolutionary algorithms to develop an automatic method, driven by the CHC evolutionary algorithm [21], for optimizing the procedure to build the intervals in the interval-valued model and, following a wrapper based approach, to adapt the intervals to the specific chosen data set.
The methodology developed in [19] for the field of fuzzy nearest neighbor classification is followed to carry out an experimental study to compare the EF-kNN-IVFS and various advanced fuzzy nearest neighbor classifiers. In this study, the classification accuracy is tested over several well-known classification problems. The results are contrasted using nonparametric statistical procedures, validating the conclusions drawn from them.
The rest of the paper is organized as follows. Section 2 describes the kNN and fuzzy-kNN classifiers, highlighting the enhancements to the former introduced by the latter. Section 3 presents the EF-kNN-IVFS model, as a natural extension of fuzzy-kNN. Section 4 is devoted to the experimental study and the analysis of its results. Finally, conclusions are drawn in Section 5.
Section snippets
kNN and fuzzy-kNN classifiers
The kNN and fuzzy-kNN classifiers require to measure the similarity of a new query instance (the new instance to be classified) to the instances stored in the training set. In the next step, a set of k nearest neighbors is found. Every neighbor casts a vote on the class to which the query instance should be assigned. Finally, a class is assigned to the query instance by combining these votes.
The above procedure can be formally described as follows: let X be a training set, composed of N
EF-kNN-IVFS: evolutionary fuzzy k-nearest neighbors classifier using interval-valued fuzzy sets
EF-kNN-IVFS is proposed to tackle the problem of membership assignation through the introduction of IVFS and evolutionary algorithms. As a consequence, the membership values of every instance in the training set are represented as an array of intervals, depicting a more flexible representation of the typicalness of the instances in every class of the problem. Intervals are also considered in the computation of the votes cast by each of the k nearest neighbors in the decision rule. Using this
Experimental study
An experimental study has been carried out to test the performance of EF-kNN-IVFS. The experiments will involve several well-known classification problems and various state of the art algorithms in fuzzy nearest neighbor classification, chosen according to the revision presented in [19]. Section 4.1 describes the experimental framework in which all the experiments have been carried out. Section 4.2 provides a description on the study performed for choosing the right order operator for comparing
Conclusion
In this paper we have proposed a new evolutionary interval-valued nearest neighbor classifier, EF-kNN-IVFS. IVFS are chosen as an appropriate tool for representing the instances’ memberships to the different classes of the problem. They also enable our classifier to represent several votes as a single interval, thus giving more flexibility to the decision rule computation, and ultimately, improving the generalization capabilities of the nearest neighbor rule. The evolutionary optimization
Acknowledgments
This work was supported by the Spanish Ministry of Science and Technology (Project TIN2014-57251-P) and by the Andalusian Government (Junta de Andalucía - Regional Projects P10-TIC-06858 and P11-TIC-7765).
References (44)
Indicator of inclusion grade for interval-valued fuzzy sets. Application to approximate reasoning based on interval-valued fuzzy sets
Int. J. Approx. Reason.
(2000)- et al.
A new approach to interval-valued Choquet integrals and the problem of ordering in interval-valued fuzzy sets applications
IEEE Trans. Fuzzy Syst.
(2013) - et al.
Fuzzy k-nearest neighbor classifiers for ventricular arrhythmia detection
Int. J. Biomed. Comput.
(1991) - et al.
Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study
IEEE Trans. Evol. Comput.
(2003) - et al.
Diagnosis of diabetes diseases using an artificial immune recognition system2 (AIRS2) with fuzzy k-nearest neighbor
J. Med. Syst.
(2012) - et al.
Genetic fuzzy systems: evolutionary tuning and learning of fuzzy knowledge bases
Advances in Fuzzy Systems—Applications and Theory
(2001) - et al.
Fuzzy nearest neighbor algorithms: taxonomy, experimental analysis and prospects
Inf. Sci.
(2014) - et al.
An introduction to bipolar representations of information and preference
Int. J. Intell. Syst.
(2008) - et al.
Real-coded genetic algorithms and interval-schemata
Proceedings of the Second Workshop on Foundations of Genetic Algorithms. Vail, Colorado, USA, July 26–29
(1992) - et al.
An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons
J. Mach. Learn. Res.
(2008)
Applied Fuzzy Arithmetic. An Introduction with Engineering Applications
A genetic tuning to improve the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets: degree of ignorance and lateral position
Int. J. Approx. Reason.
IVTURS: a linguistic fuzzy rule-based classification system based on a new interval-valued fuzzy reasoning method with tuning and rule selection
IEEE Trans. Fuzzy Syst.
Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework
J. Multiple Valued Logic Soft Comput.
KEEL: a software tool to assess evolutionary algorithms for data mining problems
Soft Comput.
Pruned fuzzy k-nearest neighbor classifier for beat classification
J. Biomed. Sci. Eng.
Interval type-2 fuzzy sets are generalization of interval-valued fuzzy sets: towards a wider view on their relationship
IEEE Trans. Fuzzy Syst.
Generation of linear orders for intervals by means of aggregation functions
Fuzzy Sets Syst.
Ignorance functions. An application to the calculation of the threshold in prostate ultrasound images
Fuzzy Sets Syst.
A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method
Knowl. Based Syst.
An interval type-2 fuzzy k-nearest neighbor
Proceedings of the 12th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’03), St. Louis, Missouri, USA, May 25–28 (2003)
Ten years of genetic fuzzy systems: current framework and new trends
Fuzzy Sets Syst.
Cited by (63)
Generated admissible orders for intervals by matrices and continuous functions
2024, Information SciencesA fast belief rule base generation and reduction method for classification problems
2023, International Journal of Approximate ReasoningInterval–valued fuzzy and intuitionistic fuzzy–KNN for imbalanced data classification
2021, Expert Systems with ApplicationsMetaheuristic Search Based Feature Selection Methods for Classification of Cancer
2021, Pattern RecognitionDigital inspection approach of overlapped peaks due to high counting rates in neutron spectroscopy
2021, Progress in Nuclear EnergyCitation Excerpt :As a result, fuzzy-KNN classifier is applied with many approaches for precision improvement. It is localized within best choices (Derrac et al., 2016). The fuzzy sets for KNN classifier may be applied in order to model degree of membership for every instance of the classes of neutron peaks.
Deep hybrid neural-like P systems for multiorgan segmentation in head and neck CT/MR images
2021, Expert Systems with Applications