Elsevier

Information Sciences

Volume 181, Issue 6, 15 March 2011, Pages 1080-1096
The superiority of three-way decisions in probabilistic rough set models

https://doi.org/10.1016/j.ins.2010.11.019

Abstract

Three-way decisions provide a means for trading off different types of classification error in order to obtain a minimum cost ternary classifier. This paper compares probabilistic three-way decisions, probabilistic two-way decisions, and qualitative three-way decisions of the standard rough set model. It is shown that, under certain conditions when considering the costs of different types of misclassification, probabilistic three-way decisions are superior to the other two.

Introduction

In the classical view of concepts, a concept is interpreted by a pair of intension and extension [26], [35]. The extension consists of the instances to which the concept applies; the intension is a set of singly necessary and jointly sufficient conditions that describe the instances of the concept. The intension of a concept may be formally defined by a logical formula in a logic language. The theory of rough sets provides a method for approximating a concept whose extension is a subset of a finite and nonempty universal set and whose intension may not necessarily be expressed by a logic formula with respect to a particular logic language [12], [13], [15]. The building blocks of the rough set approximations or classifications are equivalence classes, representing elementary concepts whose intensions can be expressed by logic formulas in the logic language [13], [32]. For a given concept, its rough set approximation or classification is three pair-wise disjoint regions that are definable by logic formulas, namely, the positive, the boundary and the negative regions of the concept.

Researchers have questioned the rigidity of Pawlak rough set approximations, namely, the requirement that “the classification must be fully correct or certain” [42]. For example, an equivalence class is in the positive region if and only if it is fully contained in the set. To resolve this problem, probabilistic rough set models have been proposed and studied as generalizations of Pawlak rough sets [16], [19], [21], [28], [29], [31], [34], [37], [38], [43]. Mathematically, one may introduce a probability function on a σ-algebra of subsets of a universal set to construct a probabilistic approximation space, within which relationships between concepts can be defined in probabilistic terms [16], [28], [29], [37], [38]. Operationally, one may estimate the conditional probability of a set given an equivalence class based on the cardinality of sets [14], [15], [24], [25], [34], [42], [43]. With probabilistic information, an equivalence class is in the probabilistic positive region if and only if an element in the equivalence class has a high probability (i.e., greater than or equal to a threshold) of being in the set. Thus, probabilistic rough set classification is neither fully correct nor certain, but operates within a certain tolerance level of error [36].
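The contrast between the qualitative and the probabilistic positive regions can be illustrated with a minimal Python sketch. It assumes that the conditional probability is estimated from set cardinalities, as described above, and that a lower threshold is used symmetrically for the negative region. The function names, the threshold names `alpha` and `beta`, and the toy data are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def conditional_probability(target, block):
    """Estimate Pr(target | block) as |target ∩ block| / |block|."""
    return Fraction(len(target & block), len(block))


def three_regions(partition, target, alpha=1, beta=0):
    """Split the universe into positive, boundary and negative regions.

    With alpha=1 and beta=0 a class is positive only if it is fully
    contained in the target (the qualitative Pawlak case). Thresholds
    with 0 <= beta < alpha <= 1 tolerate some error (assumed form).
    """
    if not 0 <= beta < alpha <= 1:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    pos, bnd, neg = set(), set(), set()
    for block in partition:
        p = conditional_probability(target, block)
        if p >= alpha:
            pos |= block
        elif p <= beta:
            neg |= block
        else:
            bnd |= block
    return pos, bnd, neg


# Hypothetical toy data: four equivalence classes over U = {1, ..., 13}.
partition = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9}, {10, 11, 12, 13}]
target = {1, 2, 3, 5, 7, 8, 9, 10}

pawlak = three_regions(partition, target)
probabilistic = three_regions(
    partition, target, alpha=Fraction(3, 4), beta=Fraction(1, 4)
)
```

In this toy example only the class {7, 8, 9} is fully contained in the target, so the Pawlak positive region is small and most objects fall in the boundary. Relaxing the thresholds moves {1, 2, 3, 4} into the positive region and {10, 11, 12, 13} into the negative region, at the price of some classification error.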

Several claims have been made and accepted by rough set researchers without careful articulation based on conclusive evidence. The first claim concerns the superiority of probabilistic models. Probabilistic rough sets are accepted merely because they are defined by using probabilistic information and, hence, are more general and flexible. Justification for this superiority is lacking, as are systematic investigations of it. The second claim is that the introduction of probability enables the models to treat the universe of objects as samples from a much larger universe [42]. However, there is hardly any study on this topic. The third claim is that probabilistic models are insensitive to noise. Again, there is no study of what exactly noise is and how the models handle it. In summary, although these claims are intuitively appealing and plausible, they are neither fully substantiated nor supported by sufficient evidence. For the future development and application of probabilistic rough set models, one must establish the validity of these claims.

The main objective of this paper is to investigate the first claim. We try to answer an important question: when is a probabilistic model superior? We provide an answer to this question by making use of results from the decision-theoretic rough set models [31], [34], [37], [38]. A probabilistic rough set model represents a compromise between two extreme classification models: the Pawlak ternary classification model and the widely used probabilistic binary classification model. A binary classifier is based on a two-way decision of acceptance and rejection and may produce two types of error, namely, incorrect acceptance and incorrect rejection. A ternary classifier is based on a three-way decision of acceptance, rejection and deferment. Two additional types of error, namely, deferment of positive and deferment of negative, are introduced. The main differences between the three models stem from the trade-offs among different types of error. When the costs of different types of misclassification are considered, under certain conditions probabilistic three-way decisions are superior to the other two. Although the degrees of classification accuracy and the rates of different types of error of a probabilistic rough set model lie between those of the other two models at both the micro (i.e., individual rules) and the macro (i.e., entire system) levels [40], the associated costs of a three-way probabilistic rough set model are always less than those of the other two models.
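The cost trade-off among the three models can be sketched in Python as follows. This is only an illustration under stated assumptions: correct acceptance and correct rejection are taken to cost nothing, each action is scored by its expected cost given the probability `p` that an object of the equivalence class is positive, and the cost values are invented for the example. The function names and the dictionary keys are hypothetical and do not reproduce the paper's notation.

```python
def expected_costs(p, costs):
    """Expected cost of each action when Pr(positive | class) = p.

    Assumes correct acceptance and correct rejection cost nothing.
    """
    return {
        "accept": (1 - p) * costs["incorrect_acceptance"],
        "reject": p * costs["incorrect_rejection"],
        "defer": p * costs["deferment_of_positive"]
        + (1 - p) * costs["deferment_of_negative"],
    }


def decide(p, costs, actions):
    """Pick the allowed action with the smallest expected cost."""
    risks = expected_costs(p, costs)
    best = min(actions, key=lambda action: risks[action])
    return best, risks[best]


def pawlak_decide(p, costs):
    """Qualitative rule: accept or reject only when fully certain."""
    risks = expected_costs(p, costs)
    action = "accept" if p == 1 else "reject" if p == 0 else "defer"
    return action, risks[action]


TWO_WAY = ("accept", "reject")
THREE_WAY = ("accept", "reject", "defer")

# Hypothetical costs: a wrong decision is four times as costly as a deferment.
costs = {
    "incorrect_acceptance": 4.0,
    "incorrect_rejection": 4.0,
    "deferment_of_positive": 1.0,
    "deferment_of_negative": 1.0,
}

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(
        p,
        decide(p, costs, TWO_WAY),
        decide(p, costs, THREE_WAY),
        pawlak_decide(p, costs),
    )
```

Because the three-way rule minimizes over a superset of the actions available to the other two rules, its expected cost in this sketch can never exceed theirs. With these invented costs it is strictly lower than the two-way cost at `p = 0.5`, where deferment is cheapest, and strictly lower than the Pawlak cost at `p = 0.9`, where acceptance is cheapest.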

The rest of the paper is organized as follows. In Section 2, we review the main ideas of rough set approximations and the associated three-way decisions. In Section 3, we introduce the notions of probabilistic two-way and three-way decisions for binary and ternary classifications, respectively. In Section 4, we provide a detailed analysis of the three models: the Pawlak model, a probabilistic two-way model and a probabilistic three-way model. The results show that, under certain conditions, a probabilistic rough set model, namely, the three-way model, is superior to the other two models. Some concluding remarks are given in Section 5.

Section snippets

Rough set approximations and three-way decisions

Suppose U is a finite and nonempty universe of objects. Consider an equivalence relation E ⊆ U × U, representing relationships between objects in U. Practically, an equivalence relation can be defined based on a set of attributes in an information table so that two objects are equivalent if and only if they have the same value on every attribute [13]. The equivalence relation induces a partition of the universe, denoted by U/E. The equivalence class containing an object x is given by [x]_E = {y | y ∈ U, x
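The attribute-based construction of the partition U/E described in this snippet can be sketched in Python. The sketch assumes an information table stored as a dictionary from object names to attribute values. The function name `partition_by_attributes` and the toy table are hypothetical.

```python
from collections import defaultdict


def partition_by_attributes(table, attributes):
    """Group objects that have the same value on every listed attribute."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj)
    return list(blocks.values())


# Hypothetical information table with two attributes.
table = {
    "x1": {"colour": "red", "size": "small"},
    "x2": {"colour": "red", "size": "small"},
    "x3": {"colour": "blue", "size": "small"},
    "x4": {"colour": "blue", "size": "large"},
}

fine = partition_by_attributes(table, ["colour", "size"])
coarse = partition_by_attributes(table, ["size"])
```

Using fewer attributes yields a coarser partition: with both attributes the objects x1 and x2 form one class and x3 and x4 are singletons, while with `size` alone x1, x2 and x3 become indistinguishable.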

Probabilistic two-way and three-way decisions

Bayesian decision theory deals with making decisions with minimum risk based on observed evidence. For approximating a concept, represented by a subset of a universe of objects, one can apply the Bayesian decision procedure to derive a two-way decision model (i.e., a binary classifier) and a three-way decision model (i.e., a ternary classifier [10]). The former is widely used in many classification methods [2], and the latter is known as the decision-theoretic rough set model [33], [37],

Comparisons of the three models

According to the discussion of the last section, conditions (c1) and (c2) imply that a three-way decision model is different from both the Pawlak model and a two-way model. In this section, we provide a detailed comparison of the three models, in terms of error rates and costs, to further substantiate the claim that a probabilistic three-way model is indeed superior at both micro and macro levels [40].

For notational simplicity, we sometimes use the same symbols for all three models by

Conclusion

The superiority of probabilistic rough set models was assumed and has been accepted by many researchers. This paper provides arguments and justification for the claim of the superiority by comparing three models, the Pawlak rough set model, a probabilistic rough set model (called a three-way model), and a two-way probabilistic model. At both micro and macro levels, it is shown that a three-way model provides a good compromise between the two extreme models. The Pawlak model offers insufficient

Acknowledgements

This work is partially supported by a Discovery Grant from NSERC Canada. The author is grateful for constructive comments from three reviewers.

References (43)

  • S.K.M. Wong et al.

    Comparison of the probabilistic approximate classification and the fuzzy set model

    Fuzzy Sets and Systems

    (1987)
  • Y.Y. Yao

    Probabilistic rough set approximations

    International Journal of Approximate Reasoning

    (2008)
  • Y.Y. Yao

    Three-way decisions with probabilistic rough sets

    Information Sciences

    (2010)
  • Y.Y. Yao et al.

    A decision theoretic framework for approximating concepts

    International Journal of Man-Machine Studies

    (1992)
  • Y.Y. Yao et al.

    Attribute reduction in decision-theoretic rough set models

    Information Sciences

    (2008)
  • W. Ziarko

    Variable precision rough set model

    Journal of Computer and System Sciences

    (1993)
  • W. Ziarko

    Probabilistic approach to rough sets

    International Journal of Approximate Reasoning

    (2008)
  • M.M.E. Abd El-Monsef et al.

    Decision analysis via granulation based on general binary relation

    International Journal of Mathematics and Mathematical Sciences

    (2007)
  • R.O. Duda et al.

    Pattern Classification and Scene Analysis

    (1973)
  • T. Fawcett, ROC Graphs: Notes and Practical Considerations for Researchers,...
  • S. Greco et al.

    Parameterized rough set model using rough membership and Bayesian confirmation measures

    International Journal of Approximate Reasoning

    (2007)