Information Sciences

Volume 329, 1 February 2016, Pages 144-163
Evolutionary fuzzy k-nearest neighbors algorithm using interval-valued fuzzy sets

https://doi.org/10.1016/j.ins.2015.09.007

Highlights

  • EF-kNN-IVFS, a new fuzzy nearest neighbor classification algorithm based on interval-valued fuzzy sets and evolutionary algorithms, is presented.

  • Interval-valued fuzzy sets provide a way of representing several configurations for the parameters of fuzzy-kNN.

  • Those configurations are set up in an adaptive way: an evolutionary method (CHC) searches for the best possible configuration according to the training data available.

  • An extensive experimental study demonstrates the good behavior of EF-kNN-IVFS when compared with other state-of-the-art algorithms.

Abstract

One of the best-known and most effective methods in supervised classification is the k-nearest neighbors classifier. Several approaches have been proposed to enhance its precision, with the fuzzy k-nearest neighbors (fuzzy-kNN) classifier being among the most successful ones. However, despite its good behavior, fuzzy-kNN lacks a method for properly defining several mechanisms regarding the representation of the relationship between the instances and the classes of the classification problem. Such a method would be very desirable, since it would potentially lead to an improvement in the precision of the classifier.

In this work we present a new approach, the evolutionary fuzzy k-nearest neighbors classifier using interval-valued fuzzy sets (EF-kNN-IVFS), which incorporates interval-valued fuzzy sets for computing the memberships of training instances in fuzzy-kNN. It is based on the representation of multiple choices of two key parameters of fuzzy-kNN: one is applied in the definition of the membership function, and the other is used in the computation of the voting rule. Besides, evolutionary search techniques are incorporated into the model as a self-optimization procedure for setting up these parameters. An experimental study has been carried out to assess the capabilities of our approach. The study has been validated by using nonparametric statistical tests, and highlights the strong performance of EF-kNN-IVFS compared with several state-of-the-art techniques in fuzzy nearest neighbor classification.

Introduction

The k-nearest neighbors classifier (kNN) [16] is one of the most popular supervised learning methods. It is a nonparametric method which does not rely on building a model during the training phase, and whose classification rule is based on a given similarity function between the training instances and the test instance to be classified. Since its definition, kNN has become one of the most relevant algorithms in data mining [42], and it is an integral part of many applications of machine learning in various domains [35], [39].

In nearest neighbor classification, fuzzy sets can be used to model the degree of membership of each instance to the classes of the problem. This approach, known as the fuzzy k-nearest neighbor (fuzzy-kNN) classifier [31], has been shown to be an effective improvement of kNN.

This fuzzy approach overcomes a drawback of the kNN classifier, in which equal importance is given to every instance in the decision rule, regardless of its typicalness as a class prototype and its distance to the pattern to be classified. Fuzzy memberships enable fuzzy-kNN to achieve higher accuracy rates in most classification problems. This is also the reason why it has been the preferred choice in several applications in medicine [9], [12], economy [11], bioinformatics [30], industry [33] and many other fields.

The definition of fuzzy memberships is a fundamental issue in fuzzy-kNN. Although they can be set through expert knowledge, or by analyzing local data around each instance (as in [31] or [44]), there may still be a lack of knowledge associated with assigning a single value to the membership. This is caused by the need to fix two parameters in advance: kInit, in the definition of the initial membership values, and m, in the computation of the votes of the neighbors.
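The roles of these two parameters can be sketched with a minimal illustration of the classic fuzzy-kNN scheme of Keller et al. [31]: kInit controls how many neighbors shape each training instance's initial membership vector, and m controls the distance weighting of the votes. This is our own simplified sketch (function names, Euclidean distance, and the 0.51/0.49 split are taken from the standard formulation, not from the authors' code):

```python
import numpy as np

def init_memberships(X, y, n_classes, k_init=5):
    """Keller-style fuzzy membership initialization (sketch).

    Each training instance gets a membership vector over the classes,
    derived from the class distribution of its k_init nearest neighbors.
    """
    N = len(X)
    U = np.zeros((N, n_classes))
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the instance itself
        nn = np.argsort(d)[:k_init]
        counts = np.bincount(y[nn], minlength=n_classes)
        U[i] = 0.49 * counts / k_init       # neighborhood contribution
        U[i, y[i]] += 0.51                  # crisp class keeps the largest share
    return U

def fuzzy_knn_classify(x, X, U, k=3, m=2.0):
    """Classify query x by distance-weighted fuzzy voting (exponent 2/(m-1))."""
    d = np.linalg.norm(X - x, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))
    votes = (w[:, None] * U[nn]).sum(axis=0) / w.sum()
    return int(np.argmax(votes))
```

Note that both kInit and m must be fixed before training, which is precisely the difficulty the paper addresses.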

To overcome this difficulty, interval-valued fuzzy sets (IVFSs) [4], [26], a particular case of type-2 fuzzy sets [5], [34], may be used. IVFSs allow membership values to be defined by a lower and an upper bound. The interval-based definition not only offers greater flexibility than a single value, but also enables us to measure the degree of ignorance through the length of the interval [8], [20]. Following this approach, IVFSs have been successfully applied in the development of fuzzy systems for classification [36], [37], [38]. In the case of nearest neighbor classification, this enables the representation of the uncertainty associated with the true class (or classes) to which every instance belongs, in the context of most standard, supervised classification problems.
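One natural way such interval memberships can arise, sketched here for illustration (this is an assumed construction consistent with the idea of representing several kInit configurations, not a transcription of the paper's exact procedure): compute Keller-style memberships for several candidate kInit values and take the element-wise minimum and maximum as the lower and upper bounds of the interval.

```python
import numpy as np

def interval_memberships(X, y, n_classes, k_values=(3, 5, 7, 9)):
    """Interval-valued memberships (sketch): for each instance and class,
    the bounds over the memberships produced by several kInit values."""
    def crisp_init(k_init):
        U = np.zeros((len(X), n_classes))
        for i in range(len(X)):
            d = np.linalg.norm(X - X[i], axis=1)
            d[i] = np.inf                   # exclude the instance itself
            nn = np.argsort(d)[:k_init]
            counts = np.bincount(y[nn], minlength=n_classes)
            U[i] = 0.49 * counts / k_init
            U[i, y[i]] += 0.51
        return U
    stack = np.stack([crisp_init(k) for k in k_values])  # (n_k, N, C)
    return stack.min(axis=0), stack.max(axis=0)          # lower, upper bounds
```

The width of each resulting interval then reflects how sensitive an instance's membership is to the choice of kInit, i.e., the degree of ignorance mentioned above.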

The optimization capabilities of evolutionary algorithms can also help to overcome this issue. In recent years, they have become a very useful tool in the design of fuzzy learning systems. For example, genetic fuzzy systems [14], [15] show how the incorporation of evolutionary algorithms can enhance the performance of the learning model through parameter adjustment. The performance of nearest neighbor classifiers can also be improved by the use of evolutionary algorithms [10], [18].

Considering the aforementioned issue, in this paper we propose an evolutionary fuzzy k-nearest neighbors classifier using interval-valued fuzzy sets (EF-kNN-IVFS). On the one hand, it tackles the problem of setting up the parameters by using interval values to represent both the membership of each training instance to the classes and the votes cast by each neighbor in the decision rule. The introduction of intervals allows us to consider different values for the kInit and m parameters, obtaining as a result different degrees of membership for each training instance. On the other hand, the incorporation of evolutionary algorithms into the model enables us to optimize the selection of both parameters, thus improving the accuracy of the whole classifier. Specifically, we propose an automatic method, driven by the CHC evolutionary algorithm [21], for optimizing the procedure that builds the intervals in the interval-valued model and, following a wrapper-based approach, for adapting the intervals to the specific data set at hand.
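The wrapper idea behind this optimization can be illustrated with a toy stand-in (our simplification, not CHC itself): candidate parameter values are scored by leave-one-out accuracy on the training data, and the best-scoring candidate is kept. Plain kNN and an exhaustive loop are used here for brevity; CHC replaces the loop with an evolutionary search over a much larger space, and the paper optimizes fuzzy-kNN's interval construction rather than plain kNN's k.

```python
import numpy as np
from collections import Counter

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of plain kNN for a given k (wrapper fitness)."""
    hits = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # hold out the i-th instance
        nn = np.argsort(d)[:k]
        pred = Counter(y[nn].tolist()).most_common(1)[0][0]
        hits += int(pred == y[i])
    return hits / len(X)

def select_k(X, y, candidates=(1, 3, 5, 7)):
    """Wrapper-style selection: keep the candidate with the best
    leave-one-out score (CHC would search this space evolutionarily)."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))
```

The key property of the wrapper approach is that fitness is measured by the classifier's own predictive behavior on the available training data, so the selected configuration adapts to each data set.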

The methodology developed in [19] for the field of fuzzy nearest neighbor classification is followed to carry out an experimental study comparing EF-kNN-IVFS with various advanced fuzzy nearest neighbor classifiers. In this study, the classification accuracy is tested over several well-known classification problems. The results are contrasted using nonparametric statistical procedures, validating the conclusions drawn from them.

The rest of the paper is organized as follows. Section 2 describes the kNN and fuzzy-kNN classifiers, highlighting the enhancements to the former introduced by the latter. Section 3 presents the EF-kNN-IVFS model, as a natural extension of fuzzy-kNN. Section 4 is devoted to the experimental study and the analysis of its results. Finally, conclusions are drawn in Section 5.

kNN and fuzzy-kNN classifiers

The kNN and fuzzy-kNN classifiers require measuring the similarity of a new query instance (the instance to be classified) to the instances stored in the training set. In the next step, a set of k nearest neighbors is found. Every neighbor casts a vote on the class to which the query instance should be assigned. Finally, a class is assigned to the query instance by combining these votes.
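For the plain kNN case, this procedure can be sketched in a few lines (a minimal illustration assuming Euclidean distance and unweighted majority voting):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classic kNN: find the k training instances closest to the query x
    and return the majority class among them."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training instances
    nn = np.argsort(d)[:k]                    # indices of the k nearest neighbors
    return Counter(y_train[nn].tolist()).most_common(1)[0][0]
```

Fuzzy-kNN refines exactly two points of this sketch: each neighbor votes with its (fuzzy) class memberships instead of a crisp label, and the votes are weighted by distance.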

The above procedure can be formally described as follows: let X be a training set, composed of N

EF-kNN-IVFS: evolutionary fuzzy k-nearest neighbors classifier using interval-valued fuzzy sets

EF-kNN-IVFS is proposed to tackle the problem of membership assignation through the introduction of IVFSs and evolutionary algorithms. As a consequence, the membership values of every instance in the training set are represented as an array of intervals, providing a more flexible representation of the typicalness of the instances in every class of the problem. Intervals are also considered in the computation of the votes cast by each of the k nearest neighbors in the decision rule. Using this

Experimental study

An experimental study has been carried out to test the performance of EF-kNN-IVFS. The experiments involve several well-known classification problems and various state-of-the-art algorithms in fuzzy nearest neighbor classification, chosen according to the review presented in [19]. Section 4.1 describes the experimental framework in which all the experiments have been carried out. Section 4.2 provides a description of the study performed for choosing the right order operator for comparing

Conclusion

In this paper we have proposed a new evolutionary interval-valued nearest neighbor classifier, EF-kNN-IVFS. IVFSs are chosen as an appropriate tool for representing the instances’ memberships to the different classes of the problem. They also enable our classifier to represent several votes as a single interval, thus giving more flexibility to the decision rule computation, and ultimately, improving the generalization capabilities of the nearest neighbor rule. The evolutionary optimization

Acknowledgments

This work was supported by the Spanish Ministry of Science and Technology (Project TIN2014-57251-P) and by the Andalusian Government (Junta de Andalucía - Regional Projects P10-TIC-06858 and P11-TIC-7765).

References (44)

  • M. Hanss

    Applied Fuzzy Arithmetic. An Introduction with Engineering Applications

    (2005)
  • J.A. Sanz et al.

    A genetic tuning to improve the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets: degree of ignorance and lateral position

    Int. J. Approx. Reason.

    (2011)
  • J.A. Sanz et al.

    IVTURS: a linguistic fuzzy rule-based classification system based on a new interval-valued fuzzy reasoning method with tuning and rule selection

    IEEE Trans. Fuzzy Syst.

    (2013)
  • J. Alcalá-Fdez et al.

    Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework

    J. Multiple Valued Logic Soft Comput.

    (2011)
  • J. Alcalá-Fdez et al.

    KEEL: a software tool to assess evolutionary algorithms for data mining problems

    Soft Comput.

    (2008)
  • M. Arif et al.

    Pruned fuzzy k-nearest neighbor classifier for beat classification

    J. Biomed. Sci. Eng.

    (2010)
  • H. Bustince et al.

    Interval type-2 fuzzy sets are generalization of interval-valued fuzzy sets: towards a wider view on their relationship

    IEEE Trans. Fuzzy Syst.

    (2015)
  • H. Bustince et al.

    Generation of linear orders for intervals by means of aggregation functions

    Fuzzy Sets Syst.

    (2013)
  • H. Bustince et al.

    Ignorance functions. An application to the calculation of the threshold in prostate ultrasound images

    Fuzzy Sets Syst.

    (2010)
  • H.-L. Chen et al.

    A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method

    Knowl. Based Syst.

    (2011)
  • F. Chung-Hoon et al.

    An interval type-2 fuzzy k-nearest neighbor

    Proceedings of the 12th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’03), St. Louis, Missouri, USA, May 25–28

    (2003)
  • O. Cordón et al.

    Ten years of genetic fuzzy systems: current framework and new trends

    Fuzzy Sets Syst.

    (2004)