1 Introduction

Rough set theory is an extension of set theory for the study of intelligent systems characterized by insufficient and incomplete information (Pawlak, 1991). It has attracted the attention of many researchers, who have studied its theories and applications in recent years (Dai and Xu, 2012; 2013; Dai et al., 2012; 2013a; 2013c; Hu, 2015; Lin et al., 2015; Zhang X et al., 2015; Zhang XH et al., 2016).

The classical rough set model is not appropriate for handling interval-valued data. However, in real applications, many data are interval-valued (Billard et al., 2008; Hedjazi et al., 2011; Dai et al., 2012; 2013b). Thus, dealing with interval-valued information systems has become an interesting problem in rough set theory (Leung et al., 2008; Qian et al., 2008; Yang et al., 2009). Dai (2008) investigated the algebraic structures of interval-set-valued rough sets generated from an approximation space. Recently, Dai et al. (2012; 2013b) studied the uncertainty measurement issue in interval-valued information systems. Leung et al. (2008) presented a rough set approach based on the misclassification rate for interval-valued information systems. Qian et al. (2008) proposed a dominance relation for ordered interval information systems. Yang et al. (2009) investigated a dominance relation in incomplete interval-valued information systems. However, there have been few studies dealing with the attribute reduction issue in interval-valued information systems. Dai et al. (2013b) considered this issue from the viewpoint of uncertainty measurement. In this study, we aim to introduce a heuristic approach for attribute reduction in interval-valued information systems based on information theory.

2 Basic concepts

2.1 Similarity degree between interval values

Unlike classical real values, two interval values are difficult to compare using traditional methods. Motivated by this fact, some research efforts have been directed toward finding efficient methods to measure or rank two interval values, mainly in the fuzzy set community (Bustince et al., 2006; Zhang and Fu, 2006; Galar et al., 2011). Galar et al. (2011) defined the similarity between two intervals as an interval. Zhang and Fu (2006) defined the similarity between two intervals as a real number. In this study, motivated by the similarity measure for interval-valued fuzzy sets proposed by Zhang and Fu (2006), we give a similarity measure for general interval-valued data.

Definition 1 Let \(U = \left\{ {{u_1},\,{u_2},\, \ldots ,\,{u_n}} \right\}\) be the universe of the interval values, with \({u_i} = \left[ {u_i^- ,u_i^+} \right],\;i = 1,2, \ldots ,n\). Let \({m^-} = {\min\nolimits_{{u_i} \in U}}\{ u_i^- \}\) and \({m^+} = {\max\nolimits_{{u_i} \in U}}\{ u_i^+ \}\). The relative bound difference similarity degree between \(u_i\) and \(u_j\) is defined as

$${v_{ij}} = 1 - {1 \over 2}{{\left| {u_i^ - - u_j^ - } \right| + \left| {u_i^ + - u_j^ + } \right|} \over {{m^ + } - {m^ - }}}.$$
((1))

Proposition 1 The relative bound difference similarity degree \(v_{ij}\) has the following properties:

  1. \(0 \leq {v_{ij}} \leq 1\);

  2. \({v_{ij}} = 1\) if and only if \({u_i} = {u_j}\);

  3. \({v_{ij}} = {v_{ji}}\).

Proof The properties follow directly from Definition 1.

Example 1 Assume U = {u1, u2}, u1 = [3, 4], u2 = [2, 5]. We have

$$\begin{array}{*{20}c} {{m^ - } = \min \{ 3,2\} = 2,\,{m^ + } = \max \{ 4,\,5\} = 5,} \\ {\left| {u_1^ - - u_2^ - } \right| = \left| {3 - 2} \right| = 1,\,\left| {u_1^ + - u_2^ + } \right| = \left| {4 - 5} \right| = 1,} \\ {{v_{12}} = 1 - {1 \over 2} \times {{1 + 1} \over {5 - 2}} = {2 \over 3}.} \\ \end{array} $$
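
The computation in Eq. (1) and Example 1 is easy to express in code. The following is a minimal Python sketch; the function name and the representation of intervals as (lower, upper) pairs are illustrative choices of ours, not part of the original formulation.

```python
def rbd_similarity(u_i, u_j, m_minus, m_plus):
    """Relative bound difference similarity degree of Eq. (1).

    u_i and u_j are intervals given as (lower, upper) pairs; m_minus and
    m_plus are the minimum lower bound and maximum upper bound over the
    whole universe U for the attribute in question.
    """
    diff = abs(u_i[0] - u_j[0]) + abs(u_i[1] - u_j[1])
    return 1.0 - 0.5 * diff / (m_plus - m_minus)

# Example 1: U = {[3, 4], [2, 5]}, so m^- = 2 and m^+ = 5
print(rbd_similarity((3, 4), (2, 5), 2, 5))   # 0.666..., i.e., 2/3
```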

2.2 Similarity classes and generalized decisions in interval-valued information systems

Let IVIS = (U, A) denote an interval-valued information system, where \(U = \left\{ {{u_1},\,{u_2},\, \ldots ,\,{u_n}} \right\}\) is a non-empty finite set called the universe of discourse, and \(A = \left\{ {{a_1},\,{a_2},\, \ldots ,\,{a_m}} \right\}\) is a non-empty finite set of m attributes called conditional attributes.

Definition 2 Assume that IVIS = (U, A) is an interval-valued information system. For a given similarity rate α ∈ [0, 1] and an attribute subset B ⊆ A, the α-similarity class of an object \(u_i \in U\) is denoted as

$$S_B^\alpha ({u_i}) = \left\{ {\left. {{u_j}} \right|\,v_{ij}^\kappa > \,\alpha ,\,\forall {a_\kappa } \in B,\,{u_j} \in U\,} \right\},$$
((2))

where \(v_{ij}^\kappa \) represents the similarity degree of \(u_i\) and \(u_j\) at the κth attribute.

Remark 1 \(S_B^\alpha \) denotes the family set \(\left\{ {S_B^\alpha \left( {{u_i}} \right)\,\left| {\,{u_i} \in U} \right.} \right\}\).

\(S_B^\alpha \left( {{u_i}} \right)\) is the maximal set of objects that are possibly indiscernible from object \(u_i\) by attribute set B under similarity rate α. In other words, \(S_B^\alpha \left( {{u_i}} \right)\) is the α-similarity class of \(u_i\).

Proposition 2 Given an interval-valued information system IVIS and an attribute subset B ⊆ A, \(S_B^\alpha \) has the following properties for any \(u_i \in U\):

  1. \(S_B^\alpha \left( {{u_i}} \right) = \bigcap\limits_{b \in B} {S_{\{ b\} }^\alpha \left( {{u_i}} \right)} \);

  2. if \(C \subseteq B\), then \(S_B^\alpha \left( {{u_i}} \right) \subseteq \,S_C^\alpha \left( {{u_i}} \right)\).

Proof

  1. By definition, we have \(S_B^\alpha ({u_i}) = \left\{ {{u_j}\vert v_{ij}^b > \alpha ,\;\forall b \in B,\;{u_j} \in U} \right\} = \bigcap\limits_{b \in B} {S_{\{ b\}}^\alpha ({u_i})}\).

  2. By definition, we have \(S_B^\alpha ({u_i}) = \left\{ {\left. {{u_j}} \right\vert v_{ij}^b > \alpha ,\;\forall b \in B,\;{u_j} \in U} \right\}\) and \(S_C^\alpha ({u_i}) = \left\{ {\left. {{u_j}} \right\vert v_{ij}^b > \alpha ,\;\forall b \in C,\;{u_j} \in U} \right\}\). For any \({u_j} \in S_B^\alpha ({u_i})\), we know \(v_{ij}^b > \alpha \) holds for any \(b \in B\). Since \(C \subseteq B\), we know \(v_{ij}^b > \alpha \) holds for any \(b \in C\). Hence, \({u_j} \in S_C^\alpha \left( {{u_i}} \right)\).
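
For illustration, the intersection form in Proposition 2(1) suggests a direct way to compute α-similarity classes. The sketch below assumes the data layout data[j][k] = interval of object u_j on attribute a_k and reuses the rbd_similarity function sketched after Example 1; both the layout and the names are assumptions made for the example.

```python
def alpha_similarity_class(i, B, data, alpha):
    """S_B^alpha(u_i): indices j with v_ij^k > alpha for every attribute k in B.

    data[j][k] is the interval (lower, upper) of object u_j on attribute a_k;
    B is a collection of attribute indices.
    """
    n = len(data)
    result = set(range(n))                  # for the empty subset, the class is all of U
    for k in B:
        # global bounds m^- and m^+ of attribute a_k over the universe
        m_minus = min(data[j][k][0] for j in range(n))
        m_plus = max(data[j][k][1] for j in range(n))
        # single-attribute class S_{a_k}^alpha(u_i)
        class_k = {j for j in range(n)
                   if rbd_similarity(data[i][k], data[j][k], m_minus, m_plus) > alpha}
        result &= class_k                   # Proposition 2(1): intersect over b in B
    return result
```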

Let IVDS = (U, A ∪ {d}) denote an interval-valued decision system where d is the decision attribute, also called the class label.

Definition 3 Assume that IVDS = (U, A ∪ {d}) is an interval-valued decision system. The decision class of an object \(u_i \in U\) is denoted as

$$D\left( {{u_i}} \right) = \left\{ {\left. {{u_j}\,} \right|\,d\left( {{u_i}} \right) = d\left( {{u_j}} \right),\,{u_j} \in U} \right\}\,.$$
((3))

Remark 2 We use \(U/d = \left\{ {\left. {{D_i}} \right\vert \;d({u_x}) = d({u_y})} \right.\), \(\left. {\forall \;{u_x},\;{u_y} \in {D_i}} \right\}\) to denote the partition of U based on the decision attribute d; i.e., \(D(u_i)\) represents the set of objects that have the same decision attribute value as \(u_i\).

Similar to the generalized decision proposed by Kryszkiewicz (1998) for incomplete decision systems, the α-generalized decision can be defined as follows:

Definition 4 For an attribute subset B ⊆ A, the α-generalized decision of an object \(u_i \in U\) is denoted as

$$\partial_B^\alpha ({u_i}) = \left\{ {\left. {d({u_j})} \right\vert v_{ij}^\kappa > \alpha ,\;\forall \;{a_\kappa} \in B,\;{u_j} \in U} \right\}{.}$$
((4))

Remark 3 \(\partial_B^\alpha\) denotes the family set \(\{ \partial_B^\alpha ({u_i})\vert {u_i} \in U\}\).

Let \({\rm{IVDS}} = \left( {U,\;A \cup \{ d\}} \right)\) be an interval-valued decision system. If \(S_A^\alpha ({u_i}) \subseteq D({u_i})\) for any object \(u_i \in U\), then the interval-valued decision system \(\left( {U,\;A \cup \{ d\}} \right)\) is called a consistent (deterministic, definite) interval-valued decision system. Otherwise, it is called an inconsistent (non-deterministic, non-definite) interval-valued decision system.

Proposition 3 Given a consistent interval-valued decision system \({\rm{IVDS}} = \left( {U,\;A \cup \{ d\}} \right)\) and an attribute set B ⊆ A, the following conditions are equivalent for all objects \(u_i \in U\):

  1. \(S_B^\alpha ({u_i}) \subseteq \;D({u_i})\);

  2. \(\left\vert {\partial_B^\alpha \left( {{u_i}} \right)} \right\vert = 1\).

Proof Suppose \(S_B^\alpha ({u_i}) \subseteq \;D({u_i})\). For any \({u_j} \in S_B^\alpha \left( {{u_i}} \right)\), we have \(u_j \in D(u_i)\). It follows that \(d(u_j) = d(u_i)\). In other words, \(d(u_j) = d(u_i)\) holds for any \(u_j\) satisfying \(v_{ij}^b > \alpha\), ∀ \(b \in B\). Hence, we know \(\left\vert {\partial_B^\alpha \left( {{u_i}} \right)} \right\vert = 1\) by Definition 4.

Conversely, suppose \(\left\vert {\partial_B^\alpha \left( {{u_i}} \right)} \right\vert = 1\). For any \({u_j} \in S_B^\alpha \left( {{u_i}} \right)\), we have \(d(u_j) = d(u_i)\), since \(u_i \in S_B^\alpha ({u_i})\) and \(\partial_B^\alpha ({u_i})\) contains only one decision value. This means \(u_j \in D(u_i)\). Thus, we have \(S_B^\alpha \left( {{u_i}} \right) \subseteq D\left( {{u_i}} \right)\).
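
Definition 4 and Proposition 3 translate into a simple consistency test: an IVDS is consistent exactly when every α-generalized decision over the full attribute set is a singleton. The sketch below reuses the alpha_similarity_class helper sketched above; the list d of decision labels, indexed like data, is an assumption for the example.

```python
def generalized_decision(i, B, data, d, alpha):
    """The alpha-generalized decision of u_i: decision values over S_B^alpha(u_i)."""
    return {d[j] for j in alpha_similarity_class(i, B, data, alpha)}

def is_consistent(data, d, alpha):
    """Check consistency: |partial_A^alpha(u_i)| = 1 for every object u_i
    (equivalently, S_A^alpha(u_i) is a subset of D(u_i); Proposition 3)."""
    A = range(len(data[0]))                 # all conditional attributes
    return all(len(generalized_decision(i, A, data, d, alpha)) == 1
               for i in range(len(data)))
```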

3 Information entropy and conditional entropy for interval-valued information systems

Let us introduce an information measure for the discernibility power of the α-similarity classes \(S_B^\alpha \left( {{u_i}} \right)\); it reduces to Shannon's entropy when \(S_B^\alpha\) forms a partition of U.

Definition 5 Given \({\rm{IVDS}} = (U,A \cup \{ d\} )\) and BA, the information entropy of B is defined as follows:

$$\begin{array}{*{20}c}{{H_{{\rm{SIM}}}}(B)} & { = - \sum\limits_{i = 1}^{\vert U\vert} {p\left( {S_B^\alpha ({u_i})} \right)} {\rm{log}}\;p\left( {S_B^\alpha ({u_i})} \right)} \\ {} & { = - \sum\limits_{i = 1}^{\vert U\vert} {{{\vert S_B^\alpha ({u_i})\vert} \over {\vert U\vert}}{\rm{log}}{{\vert S_B^\alpha ({u_i})\vert} \over {\vert U\vert}},\quad}} \end{array}$$
((5))

where |·| denotes the number of elements in the set.

At the same time, the conditional entropy of B to d is defined as follows:

$$\begin{array}{*{20}c}{{H_{{\rm{SIM}}}}(d{\vert}B) = - \sum\limits_{i = 1}^{\vert U\vert} {p\left( {S_B^\alpha ({u_i})} \right)} \quad \quad \quad \quad \quad \quad \quad \quad} \\ { \cdot \sum\limits_{j = 1}^{\vert U/d\vert} {p\left( {{D_j}{\vert}S_B^\alpha ({u_i})} \right)} {\rm{log}}\;p\left( {{D_j}{\vert}S_B^\alpha ({u_i})} \right)} \\ { = - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha ({u_i}) \cap {D_j}\vert} \over {\vert U\vert}}}} {\rm{log}}{{\vert S_B^\alpha ({u_i}) \cap {D_j}\vert} \over {\vert S_B^\alpha ({u_i})\vert}}{.}\quad \quad \quad} \end{array}$$
((6))

The joint entropy of B and d is defined as follows:

$$\begin{array}{*{20}c}{{H_{{\rm{SIM}}}}(d \cup B)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ { = - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {p\left( {{D_j} \cap S_B^\alpha ({u_i})} \right)}} {\rm{log}}\;p\left( {{D_j} \cap S_B^\alpha ({u_i})} \right)} \\ { = - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha ({u_i}) \cap {D_j}\vert} \over {\vert U\vert}}}} {\rm{log}}{{\vert S_B^\alpha ({u_i}) \cap {D_j}\vert} \over {\vert U\vert}}{.}\quad \;\quad \;} \end{array}$$
((7))

Note that the corresponding term is defined to be 0 when \(\vert S_B^\alpha ({u_i}) \cap {D_j}\vert = 0\), following the convention \(0\,{\rm{log}}\,0 = 0\).
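
As a minimal sketch, the entropies of Eqs. (5) and (6) can be computed as follows, again reusing the alpha_similarity_class helper from Section 2.2 and assuming base-2 logarithms and α < 1 (so that every similarity class contains the object itself and is non-empty); terms with an empty intersection are skipped, which implements the 0·log 0 = 0 convention. The joint entropy of Eq. (7) is obtained via the chain rule of Proposition 4.

```python
import math

def entropy_B(B, data, alpha):
    """H_SIM(B) of Eq. (5)."""
    n = len(data)
    return -sum((len(alpha_similarity_class(i, B, data, alpha)) / n)
                * math.log2(len(alpha_similarity_class(i, B, data, alpha)) / n)
                for i in range(n))

def conditional_entropy(B, data, d, alpha):
    """H_SIM(d|B) of Eq. (6); empty intersections contribute 0."""
    n = len(data)
    decision_classes = {}
    for j in range(n):
        decision_classes.setdefault(d[j], set()).add(j)   # D_1, ..., D_|U/d|
    h = 0.0
    for i in range(n):
        s = alpha_similarity_class(i, B, data, alpha)
        for D in decision_classes.values():
            inter = len(s & D)
            if inter > 0:
                h -= (inter / n) * math.log2(inter / len(s))
    return h

def joint_entropy(B, data, d, alpha):
    """H_SIM(d ∪ B) of Eq. (7), computed as H_SIM(d|B) + H_SIM(B) (Proposition 4)."""
    return conditional_entropy(B, data, d, alpha) + entropy_B(B, data, alpha)
```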

Proposition 4 Let \({\rm{IVDS}} = (U,A \cup \{ d\} )\) be an interval-valued decision system. For the attribute subset B ⊆ A, we have

  1. \({H_{{\rm{SIM}}}}(B) \geq 0\);

  2. \({H_{{\rm{SIM}}}}(d \cup B) = \max \left\{ {{H_{{\rm{SIM}}}}(d),{H_{{\rm{SIM}}}}(B)} \right\}\);

  3. \({H_{{\rm{SIM}}}}(d\vert B) = 0\) and \({H_{{\rm{SIM}}}}(d \cup B) = {H_{{\rm{SIM}}}}(B)\) if and only if \(S_B^\alpha ({u_i}) \subseteq D({u_i})\), ∀ \(u_i \in U\);

  4. \({H_{{\rm{SIM}}}}(d \cup B) = {H_{{\rm{SIM}}}}(d\vert B) + {H_{{\rm{SIM}}}}(B)\).

Proof We prove property 4; the other properties follow directly from the definitions. For property 4, we have

$$\begin{array}{*{20}c}{} & {{H_{{\rm{SIM}}}}(d\vert B) + {H_{{\rm{SIM}}}}(B)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}}} {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert S_B^\alpha \left( {{u_i}} \right)\vert}}\quad \quad \quad \quad} \\ {} & { - \sum\limits_{i = 1}^{\vert U\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}} {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}}} \left( {{\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}} \right.\quad \quad \quad \quad} \\ {} & {\left. { - {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}} \right)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {} & { - \sum\limits_{i = 1}^{\vert U\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}} {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}}} {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}\quad \quad \quad \quad \;} \\ {} & { + \sum\limits_{i = 1}^{\vert U\vert} {\left( {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}} - {{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}} \right)} \quad \quad \quad \quad \quad} \\ {} & { \cdot {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}}} {\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right) \cap {D_j}\vert} \over {\vert U\vert}}\quad \quad \quad \quad} \\ {} & { + \sum\limits_{i = 1}^{\vert U\vert} {\left( {{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}} - {{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}} \right){\rm{log}}{{\vert S_B^\alpha \left( {{u_i}} \right)\vert} \over {\vert U\vert}}} \quad \quad \quad} \\ = & {{H_{{\rm{SIM}}}}(d \cup B){.}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \end{array}$$

Note that

$${H_{{\rm{SIM}}}}(d \cup B) \neq {H_{{\rm{SIM}}}}(B\vert d) + {H_{{\rm{SIM}}}}(d){.}$$

4 Attribute reduction framework for interval-valued information systems and interval-valued decision systems based on information entropies

One significant problem in rough set theory is the search for particular subsets of attributes that provide the same classification information as the whole attribute set. This task is called attribute reduction.

4.1 Information theory view for attribute reduction of interval-valued information systems

Definition 6 Assume that IVIS = (U, A) is an interval-valued information system. Given an attribute subset B ⊆ A and α ∈ [0, 1], B is a reduct of IVIS if and only if

  1. \(S_B^\alpha = S_A^\alpha\);

  2. \(\forall b \in B,\;S_{B - \{ b\}}^\alpha \neq S_A^\alpha\).

Definition 7 Let \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) be an interval-valued decision system. Given an attribute subset B ⊆ A and α ∈ [0, 1], B is a relative reduct of IVDS if and only if

  1. \(\partial_B^\alpha = \partial_A^\alpha\);

  2. \(\forall b \in B,\;\partial_{B - \{ b\}}^\alpha \neq \partial_A^\alpha\).

To construct our information theory based attribute reduction methods, we provide an information theory view for attribute reduction in interval-valued information systems.

Theorem 1 Assume that IVIS = (U, A) is an interval-valued information system and that B ⊆ A. Then \(S_B^\alpha = S_A^\alpha\) and HSIM(B) = HSIM(A) are equivalent.

Proof Suppose \(S_B^\alpha = S_A^\alpha\). Then \(S_B^\alpha ({u_i}) = S_A^\alpha ({u_i})\) for every \(u_i \in U\), and thus

$$\begin{array}{*{20}c}{{H_{{\rm{SIM}}}}(B)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;} \\ { = - \sum\limits_{i = 1}^{\vert U\vert} {{{\vert S_B^\alpha ({u_i})\vert} \over {\vert U\vert}}} {\rm{log}}{{\vert S_B^\alpha ({u_i})\vert} \over {\vert U\vert}}\quad \quad \quad \quad \quad \;} \\ { = - \sum\limits_{i = 1}^{\vert U\vert} {{{\vert S_A^\alpha ({u_i})\vert} \over {\vert U\vert}}} {\rm{log}}{{\vert S_A^\alpha ({u_i})\vert} \over {\vert U\vert}} = {H_{{\rm{SIM}}}}(A){.}} \end{array}$$

Theorem 2 Let IVIS = (U, A) be an interval-valued information system. If C ⊆ A is redundant, then \({H_{{\rm{SIM}}}}(A) - {H_{{\rm{SIM}}}}(A - C) = 0\).

Proof If C is redundant in IVIS, then we know \(S_{A - C}^\alpha = S_A^\alpha\) by Definition 6. It is easy to obtain \({H_{{\rm{SIM}}}}(A - C) = {H_{{\rm{SIM}}}}(A)\) by Theorem 1.

Theorem 3 Let IVIS = (U, A) be an interval-valued information system. Then CA is indispensable if and only if \({H_{{\rm{SIM}}}}(A) - {H_{{\rm{SIM}}}}(A - C) > 0\).

Proof Suppose the attribute subset C is indispensable. Then we have \(S_{A - C}^\alpha \neq S_A^\alpha\). By Theorem 1, we have \({H_{{\rm{SIM}}}}(A - C) \neq {H_{{\rm{SIM}}}}(A)\). Since \({H_{{\rm{SIM}}}}(A - C) \leq {H_{{\rm{SIM}}}}(A)\), it follows that \({H_{{\rm{SIM}}}}(A) - {H_{{\rm{SIM}}}}(A - C) > 0\).

Definition 8 Assume that IVIS = (U, A) is an interval-valued information system. The significance of an attribute \(a_i \in B\) relative to B is defined as follows:

$${\rm{Sig}}({a_i},B) = {H_{{\rm{SIM}}}}(B) - {H_{{\rm{SIM}}}}(B - \left\{ {{a_i}} \right\}){.}$$
((8))

Definition 9 If \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) is an interval-valued decision system, the significance of an attribute \(a_i \in B\) relative to B is defined as

$${\rm{Sig}}({a_i},B,d) = {H_{{\rm{SIM}}}}(d\vert B - \left\{ {{a_i}} \right\}) - {H_{{\rm{SIM}}}}(d\vert B){.}$$
((9))

From the above theorems, we obtain the information theory view for attribute reduction in interval-valued information systems.

Theorem 4 Assume that IVIS = (U, A) is an interval-valued information system. The attribute subset B ⊆ A is a reduct of A if and only if

  1. HSIM(B) = HSIM(A);

  2. ∀ b ∈ B, Sig(b, B) > 0.

Theorem 5 Given \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) and B ⊆ A, if IVDS is consistent, then \(\partial_B^\alpha = \partial_A^\alpha\) and \({H_{{\rm{SIM}}}}(d\vert B) = {H_{{\rm{SIM}}}}(d\vert A)\) are equivalent.

Proof Assume \(\partial_B^\alpha = \partial_A^\alpha\). Since \({\rm{IVDS}} = (U,\;A \cup \{ d\} )\) is consistent, we have \(S_A^\alpha ({u_i}) \subseteq D({u_i})\) and hence \(\left\vert {\partial_A^\alpha ({u_i})} \right\vert = 1\) for all objects \(u_i \in U\). Therefore, \(\left\vert {\partial_B^\alpha ({u_i})} \right\vert = 1\), and by Proposition 3, \(S_B^\alpha ({u_i}) \subseteq D({u_i})\) for all objects \(u_i \in U\). By Proposition 4, we have \({H_{{\rm{SIM}}}}(d\vert B) = 0\). Since \({H_{{\rm{SIM}}}}(d\vert A) = 0\), we have \({H_{{\rm{SIM}}}}(d\vert B) = {H_{{\rm{SIM}}}}(d\vert A)\).

Conversely, suppose \({H_{{\rm{SIM}}}}(d\vert B) = {H_{{\rm{SIM}}}}(d\vert A)\). Since IVDS is consistent, \({H_{{\rm{SIM}}}}(d\vert A) = 0\), and thus \({H_{{\rm{SIM}}}}(d\vert B) = 0\). Hence, \(S_B^\alpha ({u_i}) \subseteq D({u_i})\) for all objects \(u_i \in U\) by Proposition 4. Since \(S_A^\alpha ({u_i}) \subseteq D({u_i})\) also holds, Proposition 3 gives \(\left\vert {\partial_B^\alpha ({u_i})} \right\vert = \left\vert {\partial_A^\alpha ({u_i})} \right\vert = 1\) with \(d({u_i}) \in \partial_B^\alpha ({u_i}) \cap \partial_A^\alpha ({u_i})\) for every \(u_i \in U\), i.e., \(\partial_B^\alpha = \partial_A^\alpha\). This completes the proof.

Theorem 6 Let \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) be a consistent interval-valued decision system. Then CA is dispensable if and only if \({H_{{\rm{SIM}}}}(d\vert A - C) = 0\).

Proof Suppose that \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) is a consistent interval-valued decision system and that C is a dispensable attribute subset. Then we have \(\partial_{A - C}^\alpha = \partial_A^\alpha\) by Definition 7. According to Theorem 5, we obtain \({H_{{\rm{SIM}}}}(d\vert A - C) = {H_{{\rm{SIM}}}}(d\vert A)\). Since IVDS is consistent, we have \({H_{{\rm{SIM}}}}(d\vert A) = 0\). Therefore, it follows that \({H_{{\rm{SIM}}}}(d\vert A - C) = 0\).

Theorem 7 Let \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) be a consistent interval-valued decision system. Then CA is indispensable if and only if \({H_{{\rm{SIM}}}}(d\vert A - C) > 0\).

Proof Assume that \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) is a consistent interval-valued decision system and that the attribute subset C is indispensable. Then we have \(\partial_{A - C}^\alpha \neq \partial_A^\alpha\). By Theorem 5, it follows that \({H_{{\rm{SIM}}}}(d\vert A - C) \neq {H_{{\rm{SIM}}}}(d\vert A)\). Moreover, \({H_{{\rm{SIM}}}}(d\vert A - C) \geq {H_{{\rm{SIM}}}}(d\vert A)\) and \({H_{{\rm{SIM}}}}(d\vert A) = 0\), according to Proposition 4. Consequently, we have \({H_{{\rm{SIM}}}}(d\vert A - C) > 0\).

From the above theorems, we obtain the information theory view for attribute reduction in consistent interval-valued decision systems.

Theorem 8 Assume that \({\rm{IVDS}} = (U,A \cup \;\{ d\} )\) is a consistent interval-valued decision system. The attribute subset B ⊆ A is a reduct of IVDS if and only if

  1. \({H_{{\rm{SIM}}}}(d\vert B) = {H_{{\rm{SIM}}}}(d\vert A)\);

  2. ∀ b ∈ B, Sig(b, B, d) > 0.

4.2 Attribute reduction algorithms for IVIS and IVDS

The attribute reduction algorithms for IVIS and IVDS are given in Algorithms 1 and 2, respectively.
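
Since Algorithms 1 and 2 are only referenced above, the following is a hedged, Algorithm-2-style sketch of the greedy forward-selection loop for an IVDS, built on the conditional_entropy helper sketched in Section 3. At each step it adds the attribute with the largest significance of Eq. (9) (equivalently, the smallest resulting conditional entropy) and stops when the conditional entropy of the selected subset reaches that of the full attribute set; this is our reading of the heuristic, not the authors' exact pseudocode.

```python
def heuristic_reduct_ivds(data, d, alpha, eps=1e-12):
    """Greedy attribute reduction for an interval-valued decision system (a sketch)."""
    m = len(data[0])                                    # number of conditional attributes
    A = list(range(m))
    target = conditional_entropy(A, data, d, alpha)     # H_SIM(d|A); 0 if IVDS is consistent
    B, remaining = [], set(A)
    while conditional_entropy(B, data, d, alpha) > target + eps and remaining:
        # equivalent to maximizing Sig(a, B ∪ {a}, d) of Eq. (9)
        best = min(remaining,
                   key=lambda a: conditional_entropy(B + [a], data, d, alpha))
        B.append(best)
        remaining.discard(best)
    # backward check: drop any attribute whose removal keeps H_SIM(d|B) at the target
    for a in list(B):
        rest = [x for x in B if x != a]
        if rest and conditional_entropy(rest, data, d, alpha) <= target + eps:
            B = rest
    return B
```

On the worked example below, this loop would first select c (the smallest single-attribute conditional entropy) and then a, terminating with B = {a, c}, in line with the hand computation in Example 2.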

Now we provide an example to illustrate the proposed method. Assume that an IVDS is as listed in Table 1.

Table 1 An example with interval values for attribute reduction

Example 2 Suppose the similarity rate α = 0.8 and A = {a, b, c}. We confirm that the IVDS shown in Table 1 is consistent when α = 0.8. Let us compute \(S_A^{0{.}8}\left( {{u_1}} \right)\) in detail. By Definition 1, we have

$$\begin{array}{*{20}c}{{v_a}\left( {{u_1},{u_1}} \right) = 1{.}0000,} & {{v_a}\left( {{u_1},{u_2}} \right) = 0{.}6448,} \\ {{v_a}\left( {{u_1},{u_3}} \right) = 0{.}9425,} & {{v_a}\left( {{u_1},{u_4}} \right) = 0{.}8207,} \\ {{v_a}\left( {{u_1},{u_5}} \right) = 0{.}5655,} & {{v_a}\left( {{u_1},{u_6}} \right) = 0{.}9701,} \\ {{v_a}\left( {{u_1},{u_7}} \right) = 0{.}8253,} & {{v_a}\left( {{u_1},{u_8}} \right) = 0{.}7345{.}} \end{array}$$

Then, we have

$$S_a^{0{.}8}\left( {{u_1}} \right) = \left\{ {{u_1},{u_3},{u_4},{u_6},{u_7}} \right\}{.}$$

Similarly, we have

$$\begin{array}{*{20}c}{S_b^{0{.}8}\left( {{u_1}} \right) = \left\{ {{u_1},{u_3},{u_6},{u_7}} \right\},} \\ {S_c^{0{.}8}\left( {{u_1}} \right) = \left\{ {{u_1},{u_6},{u_8}} \right\}{.}\quad \;} \end{array}$$

Thus, we obtain

$$\begin{array}{*{20}c}{S_A^{0{.}8}\left( {{u_1}} \right)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ { = S_a^{0{.}8}\left( {{u_1}} \right) \cap S_b^{0{.}8}\left( {{u_1}} \right) \cap S_c^{0{.}8}\left( {{u_1}} \right)\quad \quad \quad \quad \quad \quad \quad \;\quad \quad \quad} \\ { = \left\{ {{u_1},{u_3},{u_4},{u_6},{u_7}} \right\} \cap \left\{ {{u_1},{u_3},{u_6},{u_7}} \right\} \cap \left\{ {{u_1},{u_6},{u_8}} \right\}} \\ { = \left\{ {{u_1},{u_6}} \right\}{.}\;\;\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \end{array}$$

In the same way, we can compute \(S_A^{0{.}8}\left( {{u_i}} \right)\) for all objects u i U:

$$\begin{array}{*{20}c}{S_A^{0{.}8}\left( {{u_1}} \right) = \left\{ {{u_1},{u_6}} \right\},} & {S_A^{0{.}8}\left( {{u_5}} \right) = \left\{ {{u_2},{u_5}} \right\},} \\ {S_A^{0{.}8}\left( {{u_2}} \right) = \left\{ {{u_2},{u_5}} \right\},} & {S_A^{0{.}8}\left( {{u_6}} \right) = \left\{ {{u_1},{u_6}} \right\},} \\ {S_A^{0{.}8}\left( {{u_3}} \right) = \left\{ {{u_3},{u_6}} \right\},} & {S_A^{0{.}8}\left( {{u_7}} \right) = \left\{ {{u_7}} \right\},\quad \quad} \\ {S_A^{0{.}8}\left( {{u_4}} \right) = \left\{ {{u_4},{u_8}} \right\},} & {S_A^{0{.}8}\left( {{u_8}} \right) = \left\{ {{u_4},{u_8}} \right\}{.}} \end{array}$$

By decision attribute d, we have the partition

$$U/d = \left\{ {\left\{ {{u_1},{u_6}} \right\},\left\{ {{u_3}} \right\},\left\{ {{u_2},{u_5},{u_7}} \right\},\left\{ {{u_4},{u_8}} \right\}} \right\}{.}$$

The decision classes based on the decision attribute are also listed as follows:

$$\begin{array}{*{20}c}{{D_1} = \left\{ {{u_1},{u_6}} \right\},\quad \quad} & {{D_3} = \left\{ {{u_3}} \right\},\quad \quad} \\ {{D_2} = \left\{ {{u_2},{u_5},{u_7}} \right\},} & {{D_4} = \left\{ {{u_4},{u_8}} \right\}{.}} \end{array}$$

First, the attribute reduction set B is initialized to the empty set, i.e., B = ∅, and the conditional entropy of the decision attribute given each single attribute a, b, or c is computed.

By definition, we have

$$\begin{array}{*{20}c}{} & {{H_{{\rm{SIM}}}}(d\vert a)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {p\left( {S_a^\alpha ({u_i})} \right)} \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {} & { \cdot \sum\limits_{j = 1}^{\vert U/d\vert} {p\left( {{D_j}\vert S_a^\alpha ({u_i})} \right)} \;{\rm{log}}\;p\left( {{D_j}\vert S_a^\alpha ({u_i})} \right)\quad \quad \quad} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {\sum\limits_{j = 1}^{\vert U/d\vert} {{{\vert S_a^\alpha ({u_i})\; \cap \;{D_j}\vert} \over {\vert U\vert}}}} {\rm{log}}{{\vert S_a^\alpha ({u_i})\; \cap \;{D_j}\vert} \over {\vert S_a^\alpha ({u_i})\vert}}\quad \quad} \\ = & { - \sum\limits_{i = 1}^{\vert U\vert} {\left[ {{{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_1}\vert} \over {\vert U\vert}}{\rm{log}}{{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_1}\vert} \over {\vert S_a^{0{.}8}({u_i})\vert}}} \right.}} \\ {} & { + {{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_2}\vert} \over {\vert U\vert}}{\rm{log}}{{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_2}\vert} \over {\vert S_a^{0{.}8}({u_i})\vert}}\quad \;\quad \quad \;\quad} \\ {} & { + {{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_3}\vert} \over {\vert U\vert}}{\rm{log}}{{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_3}\vert} \over {\vert S_a^{0{.}8}({u_i})\vert}}\quad \;\quad \;\quad \quad} \\ {} & {\left. { + {{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_4}\vert} \over {\vert U\vert}}{\rm{log}}{{\vert S_a^{0{.}8}({u_i})\; \cap \;{D_4}\vert} \over {\vert S_a^{0{.}8}({u_i})\vert}}} \right]\quad \quad \quad \;\quad} \\ = & {3{.}9512{.}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \end{array}$$

Hence, we obtain

$$\begin{array}{*{20}c}{{H_{{\rm{SIM}}}}(d\vert a) = 3{.}9512,} \\ {{H_{{\rm{SIM}}}}(d\vert b) = 2{.}9387,} \\ {{H_{{\rm{SIM}}}}(d\vert c) = 2{.}8113{.}} \end{array}$$

Then, we find that HSIM(d | c) is the minimum, and attribute c is added to the attribute reduction set B, i.e., B = {c}. The next step is to compute the conditional entropies of d given {c, a} and {c, b}:

$${H_{{\rm{SIM}}}}(d\vert a \cup c) = 0,\quad {H_{{\rm{SIM}}}}(d\vert b \cup c) = 0{.}$$

Since \({H_{{\rm{SIM}}}}(d\vert a \cup c) = {H_{{\rm{SIM}}}}(d\vert b \cup c)\), we choose the attribute explored first, i.e., attribute a, and add it to B. The algorithm terminates since the conditional entropy is now equal to zero. Hence, we obtain the reduct B = {a, c}.

5 Experiments

To test the effectiveness of the proposed algorithm, experiments on three real-world datasets are performed. All values of conditional attributes in the datasets are interval values.

5.1 Fish dataset (or ecotoxicology dataset)

The Fish dataset was introduced to test the effectiveness of attribute reduction for symbolic interval data (Hedjazi et al., 2011). This dataset is composed of observations of abnormal levels of mercury contamination in some Amerindian areas, taken from several studies conducted in French Guiana by researchers from the LEESA Laboratory.

There are 13 interval conditional attributes: length, weight, muscle, intestine, stomach, gills, liver, kidneys, liver/muscle, kidneys/muscle, gills/muscle, intestine/muscle, and stomach/muscle, labeled a1–a13. In addition, a reference classification with respect to the fish diet is taken as the decision attribute, with the classes carnivorous, detritivorous, omnivorous, and herbivorous.

The conditional entropy for each conditional attribute is illustrated in Fig. 1. We find that the conditional entropy of attribute a9, which represents the liver/muscle ratio, is the smallest. This means that liver/muscle is more significant than the other attributes.

Fig. 1 Conditional entropy for each single attribute of the Fish dataset

5.2 Face Recognition dataset

The Face Recognition dataset focuses on face recognition. Each interval value represents the measurement of a local feature in a face image. For each face image, the localization of salient features such as the nose, mouth, and eyes is obtained using morphological operators. Distances are measured between specific points delimiting each feature's boundary, and several of these distances are described as interval values.

The dataset contains 9 men with 3 sequences for each, giving a total of 27 observations (Dai et al., 2013b). The decision attribute identifies which person it is. There are 6 conditional attributes, including the length spanned by the eyes, the length between the eyes, the length from the outer right eye to the upper middle lip at the point between the nose and mouth, and so on. We can represent the six conditional attributes as a1a6.

By using our reduction algorithm, the set of attribute reductions B is first initialized to the empty set, and the conditional entropy for each single conditional attribute is computed, as shown in Fig. 2. We find that the conditional entropy of attribute a3 (which represents the length from the outer right eye to the upper middle lip at the point between the nose and the mouth) is the smallest. This means that this length is more significant than the other attributes.

Fig. 2 Conditional entropy for each single attribute of the Face Recognition dataset

5.3 Car dataset

The Car dataset contains 33 car models described by 7 interval variables, 2 categorical multivalued variables, and 1 nominal variable. This dataset has been used in research on clustering for interval values (Hedjazi et al., 2011; Dai et al., 2013b).

In this study, we consider only the 7 interval conditional attributes, namely price, engine capacity, top speed, step, length, width, and height, denoted as a1–a7. The nominal variable 'car category' is taken as the decision attribute, which has been used as the a priori classification.

In Fig. 3, we list the conditional entropy of each attribute a1–a7 with respect to the decision attribute d. It is easy to see that HSIM(d | a1) is the minimum, so we can conclude that the attribute price provides the most information for classification.

Fig. 3 Conditional entropy for each single attribute of the Car dataset

5.4 Comparison of performance

To verify the effectiveness of the proposed approach, attribute selection experiments based on the proposed uncertainty measurements are conducted on the above datasets.

Besides the similarity measure constructed in Definition 1, we consider two other similarity measures for comparison. One is constructed by Dai et al. (2012; 2013b) based on possible degree. The similarity is called possible degree similarity. Let \(A = \left[ {{a^-},{a^+}} \right]\) and \(B = \left[ {{b^-},{b^+}} \right]\) be two interval values. The possible degree similarity between the two interval values is defined as

$${S_{AB}} = 1 - \left\vert {{P_{(A \geq B)}} - {P_{(B \geq A)}}} \right\vert ,$$
((10))

where

$$\begin{array}{*{20}c}{{P_{(A \geq B)}}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ { = \min \left\{ {1,\max \left\{ {{{{a^+} - {b^-}} \over {({a^+} - {a^-}) + ({b^+} - {b^-})}},0} \right\}} \right\}{.}} \end{array}$$
((11))

\({P_{(A \geq B)}}\) and \({P_{(B \geq A)}}\) are the possible degree of A relative to B and the possible degree of B relative to A, respectively.
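
A small Python sketch of the possible degree similarity of Eqs. (10) and (11) follows, with intervals again given as (lower, upper) pairs; the handling of two degenerate (zero-width) intervals is our own assumed convention, since Eq. (11) is undefined in that case.

```python
def possible_degree(a, b):
    """P_(A >= B) of Eq. (11) for intervals a = (a-, a+) and b = (b-, b+)."""
    width = (a[1] - a[0]) + (b[1] - b[0])
    if width == 0:                      # two point intervals: assumed convention
        return 1.0 if a[0] >= b[0] else 0.0
    return min(1.0, max((a[1] - b[0]) / width, 0.0))

def pd_similarity(a, b):
    """Possible degree similarity S_AB of Eq. (10)."""
    return 1.0 - abs(possible_degree(a, b) - possible_degree(b, a))
```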

The other similarity measure can be adapted from that used by Dai and Tian (2013) to handle set-valued data. For interval values \(A = \left[ {{a^-},{a^+}} \right]\) and \(B = \left[ {{b^-},{b^+}} \right]\), the intersection-union similarity is defined as

$$S(A,B) = {{\vert A \cap B\vert} \over {\vert A \cup B\vert}} = {{\left\vert {\left[ {{a^-},{a^+}} \right] \cap \left[ {{b^-},{b^+}} \right]} \right\vert} \over {\left\vert {\left[ {{a^-},{a^+}} \right] \cup \left[ {{b^-},{b^+}} \right]} \right\vert}}{.}$$
((12))

It is easy to prove that the similarity has the three properties in Proposition 1.
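
Reading |·| as interval length, the intersection-union similarity of Eq. (12) can be sketched as follows; the value returned for two identical point intervals (the 0/0 case) is an assumed convention.

```python
def iu_similarity(a, b):
    """Intersection-union similarity of Eq. (12), with |.| taken as interval length."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter       # length of the union of the two intervals
    return inter / union if union > 0 else 1.0
```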

Very few classifiers can be used to address interval data; therefore, to compare the performance of classification based on the selected attributes, Dai et al. (2013b) extended the classical k-nearest neighbor (KNN) classifier and probabilistic neural network (PNN) classifier to handle interval-valued data by redefining the distance between two objects:

Definition 10 (Dai et al., 2013b) Suppose that X and Y are two objects in an interval-valued information system, and that \(u_i^\kappa\) and \(u_j^\kappa\) are their interval values at the κth attribute. The distance between X and Y is defined as follows:

$${\rm{Dis}}(X,Y) = \sqrt {\sum\limits_{\kappa = 1}^m {{{\left( {{P_{(u_i^\kappa \geq u_j^\kappa )}} - {P_{(u_j^\kappa \geq u_i^\kappa )}}} \right)}^2}}} ,$$
((13))

where m is the number of conditional attributes and \({P_{(u_i^\kappa \geq u_j^\kappa )}}\) is the possible degree between two interval values.
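
The distance of Eq. (13) is straightforward to code on top of the possible_degree function sketched above; objects are assumed to be sequences of (lower, upper) intervals, one per conditional attribute.

```python
import math

def interval_distance(x, y):
    """Dis(X, Y) of Eq. (13) between two interval-valued objects."""
    return math.sqrt(sum(
        (possible_degree(x[k], y[k]) - possible_degree(y[k], x[k])) ** 2
        for k in range(len(x))))
```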

Due to the limited number of objects in the dataset, we use a leave-one-out cross-validation approach to evaluate the classification performances.
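
As a hedged illustration of the evaluation protocol, the sketch below runs leave-one-out cross-validation with a 1-nearest-neighbour rule (the k = 1 special case of the extended KNN) using the interval_distance function above; it is a simplified stand-in for the classifiers used in the experiments, not the authors' exact implementation.

```python
def loocv_accuracy_1nn(objects, labels):
    """Leave-one-out accuracy of a 1-NN rule with the distance of Eq. (13)."""
    n = len(objects)
    correct = 0
    for i in range(n):
        # nearest neighbour among all other objects
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: interval_distance(objects[i], objects[j]))
        correct += labels[nearest] == labels[i]
    return correct / n
```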

In the experiments, the relative bound difference similarity, possible degree similarity, and intersection-union similarity are used in the proposed attribute reduction framework, denoted as RBD, PD, and IU, respectively. We also compare these three methods with the attribute selection method based on uncertainty measurement for interval-valued information systems constructed by Dai et al. (2013b), called uncertainty measurement attribute reduction (UMAR).

The results are shown in Tables 2–4. The parameter α represents the similarity threshold (Section 2.2), used to construct similarity relations and similarity classes. Tables 2–4 show the accuracy rates on the Fish, Face Recognition, and Car datasets, respectively, obtained by the extended KNN classifier and the extended PNN classifier. From the results, we find that RBD clearly outperforms the existing method UMAR. Moreover, PD is slightly better than UMAR, and IU performs as well as UMAR. The results indicate that the proposed attribute reduction method is feasible and effective. Applying different similarity measures to the proposed algorithm leads to different results. Among RBD, IU, and PD, RBD outperforms IU and PD, and in most cases RBD obtains the best results. PD performs slightly better than IU. The reason may lie in the different abilities of the three measures to describe the similarity between two interval values.

Table 2 Performance on the Fish dataset by KNN and PNN
Table 3 Performance on the Face Recognition dataset by KNN and PNN
Table 4 Performance on the Car dataset by KNN and PNN

6 Conclusions

The classical rough set model is not appropriate for handling interval-valued data. In this paper, we present a new framework for attribute reduction in interval-valued information systems from the viewpoint of information theory. Some information theory concepts, including entropy, conditional entropy, and joint entropy, are defined in interval-valued information systems. Based on these concepts, we provide an information theory view for attribute reduction in interval-valued information systems. Consequently, attribute reduction algorithms are proposed. To test the proposed algorithms, experiments on three datasets are conducted. Experiments show that the proposed framework is effective for attribute reduction in interval-valued information systems.

In the future, we plan to investigate other similarity measures and introduce them into our information theory framework of attribute reduction for interval-valued information systems.