Elsevier

Applied Soft Computing

Volume 20, July 2014, Pages 103-111
Medical diagnosis of cardiovascular diseases using an interval-valued fuzzy rule-based classification system

https://doi.org/10.1016/j.asoc.2013.11.009

Highlights

  • The diagnosis of cardiovascular diseases (CVDs) is addressed using a linguistic fuzzy rule-based classification system.

  • Interval-valued fuzzy sets (IVFSs) are used to model the ignorance degree associated with the definition of membership functions.

  • The Kα operator is used to exploit the extra information provided by the IVFSs.

  • An evolutionary approach is applied to adapt the IVFSs to the CVD problem.

  • Both patients and health institutions benefit from the application of the new methodology.

Abstract

Objective

To develop a classifier that estimates a patient's risk of suffering from a cardiovascular disease within the next 10 years. The system has to provide both a diagnosis and an interpretable model explaining the decision, so that doctors can analyse the usefulness of the information given by the system.

Methods

Linguistic fuzzy rule-based classification systems are used, since they provide a good classification rate and a highly interpretable model. More specifically, a new methodology to combine fuzzy rule-based classification systems with interval-valued fuzzy sets is proposed, which is composed of three steps: (1) the modelling of the linguistic labels of the classifier using interval-valued fuzzy sets; (2) the use of the Kα operator in the inference process and (3) the application of a genetic tuning to find the best ignorance degree that each interval-valued fuzzy set represents as well as the best value for the parameter α of the Kα operator in each rule.

Results

The suitability of the new proposal for this medical diagnosis classification problem is shown by comparing its performance with that of two classical fuzzy classifiers and a previous interval-valued fuzzy rule-based classification system. The new method performs statistically better than the methods considered in the comparison. It improves both the total number of correctly diagnosed patients (by around 3% with respect to the classical fuzzy classifiers and around 1% with respect to the previous interval-valued fuzzy classifier) and the classifier's ability to correctly differentiate patients of the different risk categories.

Conclusion

The proposed methodology is a suitable tool for the medical diagnosis of cardiovascular diseases, since it obtains a good classification rate and provides an interpretable model that doctors can easily understand.

Introduction

Cardiovascular diseases (CVDs) affect the heart and are usually caused by some disorder that hinders blood flow. These diseases imply a high risk of suffering from severe conditions such as heart attacks or thrombosis, among others. They are the main health problem in the adult population and cause a high death rate in many developed countries [1]. Therefore, it is important to obtain an early diagnosis of the risk of suffering from such diseases, so that proper medical treatment can be started to reduce the chances of developing them.

In order to estimate this risk, Spanish doctors look up specific tables called REGICOR [2]. These tables consider different variables such as gender, age, presence or absence of diabetes, systolic and diastolic blood pressure, total cholesterol and HDL cholesterol values, among others. The value provided by this procedure quantifies the patient's risk of suffering from a CVD during the next ten years. Different categories of patients can therefore be established according to this value, and the problem of estimating a patient's risk category can be treated as a classification problem.

Fuzzy rule-based classification systems (FRBCSs) [3] are a useful tool for classification problems. These systems are widely used because of their good performance and their capability to build an interpretable model that uses linguistic terms familiar to the user in the problem domain. Moreover, they offer the possibility of mixing information coming from different sources, i.e. expert knowledge, mathematical models or empirical measures. For this reason, FRBCSs are suitable for medical diagnosis problems since, besides providing the patient's diagnosis, they let doctors know the reasoning behind the decision by looking at the rule or set of rules involved in the final classification.

FRBCSs use fuzzy logic [4] to model the linguistic terms used by the system. A key step for the subsequent success of fuzzy systems is the definition of membership functions that represent the problem information as well as possible. Sometimes it is really difficult to determine the membership functions, because the same concept can be defined in different ways by different people [5]. This problem led Zadeh to suggest the notion of type-2 fuzzy sets [6] as an extension of fuzzy sets [4]. A particular case of type-2 fuzzy sets are the Interval-Valued Fuzzy Sets (IVFSs) [7], which assign an interval instead of a number as the membership degree of each element to the set. IVFSs allow the system uncertainties to be modelled, while their computational effort is lower than that demanded by general type-2 fuzzy sets.
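To make the idea of an interval-valued membership degree concrete, the following Python sketch builds an IVFS from a triangular membership function. The construction is only an assumption for illustration: the lower bound is an ordinary triangular function, and the upper bound is the same triangle with its support stretched by a hypothetical parameter `widen`. The names `triangular` and `ivfs_membership` are not taken from the paper.

```python
def triangular(x, left, peak, right):
    """Ordinary triangular membership function with support [left, right]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def ivfs_membership(x, left, peak, right, widen=0.5):
    """Return the membership interval (lower, upper) of x.

    Assumption: the upper bound is a triangle with the same peak whose
    support is stretched by the factor (1 + widen) on each side, so
    widen = 0 collapses the interval to a single number.
    """
    lower = triangular(x, left, peak, right)
    upper = triangular(
        x,
        peak - (1.0 + widen) * (peak - left),
        peak,
        peak + (1.0 + widen) * (right - peak),
    )
    return lower, upper


# Hypothetical label defined on an arbitrary 0-20 scale.
print(ivfs_membership(5.0, 0.0, 10.0, 20.0))
```

Under this assumed construction, the width of the returned interval can be read as the ignorance attached to the membership degree: it is zero at the peak and grows towards the edges of the support.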

In Ref. [8], the authors proposed an interval-valued FRBCS (IV-FRBCS), that is, an FRBCS whose linguistic labels are modelled with IVFSs, which enhanced the performance of classical FRBCSs. In this manner, the inherent ignorance related to the definition of the membership functions was modelled by means of the IVFSs. Then, the shape of every IVFS considered in the system was optimized by using an evolutionary tuning approach. Furthermore, an interval-valued fuzzy reasoning method (IV-FRM) was proposed, in which the first two steps, namely the computation of the matching and the association degrees for each rule of the FRBCS, used IVFSs. In order to apply the remainder of the method as in the classical FRM [9], the association degree was reduced to a single number. To compute it, the values of the lower and upper bounds of the interval were averaged, which may prevent the system from making the most of the interval information.

In this paper, in order to handle the interval information in the IV-FRM, we introduce the Kα operator defined by Atanassov [10] to compute the association degree. In this manner, the information given by the IVFSs is exploited, since values other than the average can be obtained. As a result of introducing the Kα operator, the values of the α parameters need to be found. To do so, we propose an evolutionary tuning that computes the best α value for each rule involved in the inference process and therefore provides the system with a new mechanism to take advantage of the extra information given by the IVFSs. In the experimental study, we will show that our new IV-FRBCS improves on the behaviour of the previous approaches when predicting the risk of suffering from a CVD and hence can help primary care doctors. The new FRBCS will only use as inputs the physical values that can be measured directly by the doctor, i.e. gender, age, smoking condition, blood pressure and body mass index. The objective of the system is to provide doctors with a quick and reliable estimation of the patient's risk category, so that they can make better decisions, such as referring the patient to a secondary health centre (hospital) or starting an appropriate treatment according to the patient's risk category if necessary.
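The role of the Kα operator described above can be sketched in a few lines of Python. The sketch assumes the usual form of the operator, Kα([a, b]) = a + α(b − a) with α in [0, 1], so that α = 0.5 recovers the averaging used in the previous IV-FRM. The function `winning_class`, the rule labels "A" and "B" and the interval values are hypothetical, and choosing the class of the rule with the highest degree is a simplifying assumption rather than the paper's full reasoning method.

```python
def k_alpha(interval, alpha):
    """Assumed form of the K_alpha operator: a point inside [lower, upper]."""
    lower, upper = interval
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if lower > upper:
        raise ValueError("interval bounds are out of order")
    return lower + alpha * (upper - lower)


def winning_class(rules, alphas):
    """Pick the class of the rule with the largest K_alpha value.

    rules  -- list of (class_label, (lower, upper)) association degrees
    alphas -- one alpha per rule, in the same order
    """
    scored = [(k_alpha(iv, a), label) for (label, iv), a in zip(rules, alphas)]
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical interval-valued association degrees of two rules.
rules = [("A", (0.25, 0.75)), ("B", (0.375, 0.5))]
print(winning_class(rules, [0.5, 0.5]))  # averaging: A (0.5 vs 0.4375)
print(winning_class(rules, [0.0, 1.0]))  # per-rule alphas: B (0.25 vs 0.5)
```

The two calls show why a per-rule α matters in this toy setting: the same intervals lead to a different decision once each rule is allowed to pick its own point inside its interval.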

In order to show the validity of our proposal, in the experimental study we will use two well-known FRBCSs, namely the method of Chi et al. [11] and the Fuzzy Hybrid Genetics-Based Machine Learning (FH-GBML) algorithm [12]. We will study the behaviour of our new methodology with respect to both the initial FRBCSs and the previous IV-FRBCS [8]. To this end, we will consider the standard classification accuracy as well as the classification rate for each of the three CVD risk categories into which the patients can be classified. The paper is organized as follows: the problem of the CVDs is presented in Section 2. Next, the basic concepts of IVFSs and FRBCSs, along with the description of the previous proposal to combine FRBCSs with IVFSs, are given in Section 3. Then, in Section 4, we describe in detail both our new proposal to introduce the Kα operator in the IV-FRM and the genetic tuning of the parameter α. Section 5 shows the experimental framework along with the analysis of the obtained results. Finally, the main conclusions of this paper are drawn in Section 6.
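The two evaluation measures mentioned above, the overall classification accuracy and the classification rate of each risk category, can be computed as in the following Python sketch. The category names "low", "medium" and "high" and the sample labels are hypothetical placeholders, and `accuracy_and_class_rates` is an illustrative helper rather than code from the paper.

```python
from collections import Counter


def accuracy_and_class_rates(y_true, y_pred):
    """Return (overall accuracy, {category: fraction correctly classified})."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    accuracy = sum(correct.values()) / len(y_true)
    # Rate per category: correctly classified patients of that category
    # divided by all patients that truly belong to it.
    rates = {label: correct[label] / n for label, n in totals.items()}
    return accuracy, rates


# Hypothetical labels for six patients.
y_true = ["low", "low", "medium", "high", "high", "high"]
y_pred = ["low", "medium", "medium", "high", "high", "low"]
acc, rates = accuracy_and_class_rates(y_true, y_pred)
print(acc, rates)
```

Reporting the per-category rates next to the accuracy makes it visible when a classifier reaches a high overall score while misclassifying most patients of one category.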

Section snippets

Problem description

CVDs affect different parts of the body, mainly the heart and the arteries of the brain, heart and legs. Most of these diseases are induced by the decrease of either the calibre or the diameter of the arteries. The lack of blood supply damages not only the heart but also the legs and the brain, which can lead to health disorders implying an increased risk of suffering from heart attacks, thrombosis or rupture of blood vessels, among others.

Among adult population, CVDs are the main

Interval-valued fuzzy rule-based classification systems

In this section, we first provide some preliminary concepts on both IVFSs and FRBCSs. Then, we describe the previous model that employs IVFSs to represent the linguistic labels of FRBCSs [8], which is the base for our new proposal.

On the use of the Kα operator in the interval-valued fuzzy reasoning method

This section defines our new proposal. It involves both the introduction of the Kα operator in the extended FRM with IVFSs recalled in Section 3.3 and the description of the genetic optimization process for the value of the parameter α.
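The idea of searching for one α per rule can be illustrated with the Python sketch below. It is not the genetic algorithm used in the paper: a deliberately simple elitist search with Gaussian mutation stands in for it, only the α values are tuned, and the toy data, the winning-rule inference and all function names are assumptions made for illustration.

```python
import random


def k_alpha(interval, alpha):
    """Assumed form of the K_alpha operator: a point inside [lower, upper]."""
    lower, upper = interval
    return lower + alpha * (upper - lower)


def classify(example, alphas):
    """Winning-rule inference over a list of (label, interval) pairs,
    one per rule, holding that rule's interval-valued association degree."""
    scores = [k_alpha(iv, a) for (_, iv), a in zip(example, alphas)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return example[best][0]


def accuracy(data, alphas):
    """Fraction of (example, true_label) pairs classified correctly."""
    hits = sum(classify(ex, alphas) == label for ex, label in data)
    return hits / len(data)


def tune_alphas(data, n_rules, generations=200, sigma=0.2, seed=0):
    """Elitist search over one alpha per rule, each clipped to [0, 1]."""
    rng = random.Random(seed)
    best = [0.5] * n_rules  # start from the averaging behaviour
    best_fit = accuracy(data, best)
    for _ in range(generations):
        child = [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in best]
        fit = accuracy(data, child)
        if fit >= best_fit:  # ties are accepted so the search keeps moving
            best, best_fit = child, fit
    return best, best_fit


# Hypothetical association-degree intervals of two rules for two patients.
data = [
    ([("low", (0.25, 0.75)), ("high", (0.375, 0.5))], "high"),
    ([("low", (0.5, 0.75)), ("high", (0.0, 0.25))], "low"),
]
alphas, fitness = tune_alphas(data, n_rules=2)
print(alphas, fitness)
```

Because the search starts from α = 0.5 for every rule and never accepts a worse candidate, the tuned configuration is at least as accurate on the tuning data as plain averaging. Whether that gain carries over to unseen patients has to be checked on separate test data.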

As we have described in Section 3.3, in Ref. [8] the first two steps of the FRM are extended in order to be able to work with IVFSs. These two steps are the computation of the matching and the association degrees (see Eqs. (8), (9), respectively). In Eq. (9)

Experimental study

The experimental study aims to show the global improvement and the advantages of applying our approach for both the patient and the health institution. To do so, we analyse the improvements achieved by our new methodology with respect to both the initial IV-FRBCSs and the original FRBCS considered in this work.

We first describe the experimental framework and then, we analyse the achieved results on predicting the category of risk of the patients by studying the

Conclusion

In this paper, we have introduced the Kα operator in the IV-FRM of a previous IV-FRBCS [8] to provide the system with a mechanism to handle the extra information given by the IVFSs. In this manner, the performance of the previous IV-FRBCS is improved, since there is interval information that was not properly exploited when taking the mean of the values associated with the lower and the upper bounds. Furthermore, we have proposed a genetic tuning method that simultaneously modifies both the

Acknowledgments

This work was partially supported by the Spanish Ministry of Science and Technology under project TIN2010-15055 and the Research Services of the Universidad Publica de Navarra.

References (43)

  • G. Gowrison et al., Minimal complexity attack classification intrusion detection system, Applied Soft Computing (2013)

  • O. Cordón et al., Ten years of genetic fuzzy systems: current framework and new trends, Fuzzy Sets and Systems (2004)

  • S.H. Ling et al., Natural occurrence of nocturnal hypoglycemia detection using hybrid particle swarm optimized fuzzy reasoning model, Artificial Intelligence in Medicine (2012)

  • J. Yang et al., Channel selection and classification of electroencephalogram signals: An artificial neural network and genetic algorithm-based approach, Artificial Intelligence in Medicine (2012)

  • A. Fernández et al., On the influence of an adaptive inference system in fuzzy rule based classification systems for imbalanced data-sets, Expert Systems with Applications (2009)

  • J. Sanz et al., A genetic tuning to improve the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets: Degree of ignorance and lateral position, International Journal of Approximate Reasoning (2011)

  • L. Eshelman, Foundations of Genetic Algorithms (1991)

  • M.C. Lee et al., Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction, Artificial Intelligence in Medicine (2010)

  • J.A. Sáez et al., Tackling the problem of classification with noisy data using multiple classifier systems: analysis of the performance and robustness, Information Sciences (2013)

  • M. Galar et al., An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recognition (2011)

  • J. Sala et al., Improvement in survival after myocardial infarction between 1978–85 and 1986–88 in the REGICOR study, European Heart Journal (1995)