1 Introduction

The field of neuroinformatics mainly covers the collection, organization and analysis of neuroscience data, computational models of such data, and the development of analytical tools. As an interdisciplinary subject combining information science and neuroscience, neuroinformatics plays a vital role in research in both fields [1]. Functional magnetic resonance imaging (fMRI) is one of the most significant approaches to acquiring neuroinformatics data. It has been widely used in human behavioral experiments and in pathology because of its noninvasiveness, repeatability and other advantages [2,3,4,5,6,7,8]. fMRI yields high-resolution three-dimensional images of the brain through the BOLD (blood oxygen level dependent) effect and can dynamically reflect changes in brain activity. However, fMRI research faces problems such as very large data volumes and considerable interference noise. Therefore, how to effectively reduce fMRI data and extract its features has become a core issue of the research.

Rough set theory provides a new method that can extract an attribute reduction set from fMRI data and derive feature rules from the reduced set. Rough set theory is a mathematical framework for dealing with uncertain data, proposed by the Polish mathematician Pawlak in the early 1980s [9]. Its main idea is to derive classification and decision rules through knowledge reduction while keeping the classification ability unchanged [10, 11]. The ultimate goal of rough set theory is to generate final rules from an information (decision) system. There are two purposes for deriving a feature rule: first, the rules should be used for the classification of database objects, that is, to predict the categories of unlabeled objects; second, the rules should be used to develop a mathematical model in the research field, and this knowledge should be presented in a form that people can understand. The main steps in processing data with rough set theory are as follows: (1) mapping information from the original database to a decision table; (2) data preprocessing; (3) computing attribute reductions; (4) deriving rules from the reductions; (5) filtering the rules. One of the most critical tasks is attribute reduction. In general, the real objects that form the elements of a decision table often produce a large amount of data, and from a computational point of view not all of these data are valuable. Therefore, it is meaningful to be able to extract the most valuable information from a large decision table effectively.
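As a rough illustration of step (1), the following sketch maps ROI-averaged signals into a discretized decision table; it assumes NumPy and pandas are available, and the ROI names, discretization into three levels, and decision labels are purely illustrative choices, not the paper's actual preprocessing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# hypothetical ROI-averaged BOLD values for six scans (rows = objects)
raw = pd.DataFrame(rng.normal(size=(6, 3)), columns=["ROI_1", "ROI_2", "ROI_3"])

# rough set reduction works on discrete values, so each condition attribute
# is discretized here into three levels (0 = low, 1 = mid, 2 = high)
decision_table = raw.apply(lambda col: pd.cut(col, bins=3, labels=[0, 1, 2]))

# decision attribute D, e.g. 0 = eyes closed, 1 = eyes open
decision_table["D"] = [0, 0, 0, 1, 1, 1]
print(decision_table)
```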

In this paper, a K-means algorithm based on rough set optimization is proposed and applied to the classification of fMRI data. First, rough set theory is used to perform attribute reduction on the training set of fMRI data. Then, the best attribute reduction is obtained by calculating the significance of each attribute, and the attributes in the best reduction are regarded as important attributes. Finally, the important attributes are treated as data features, and the test fMRI data are classified with the K-means algorithm. Several fMRI data experiments are used to demonstrate the effectiveness of the proposed method.

2 Knowledge of Rough Sets

Definition 1:

Suppose \( P \subseteq R \) and \( P \ne \varnothing \). The intersection of all equivalence relations in P, \( \cap P \), is denoted ind(P) and is called the indiscernibility relation over P, and \( [{\text{x}}]_{{ind({\text{P}})}} = \bigcap\limits_{R \in P} {[{\text{x}}]_{R} } \), where \( [{\text{x}}]_{R} \) is the indiscernibility (equivalence) class of x under R.

Definition 2:

The four-tuple S = (U, A, V, f) is a knowledge representation system, where U is the universe, a non-empty finite set of objects, \( U = \{ X_{1} ,X_{2} , \cdots ,X_{n} \} \); A is a non-empty finite set of attributes, \( A = \{ \text{a}_{1} ,{\text{a}}_{2} , \cdots ,{\text{a}}_{n} \} \), with \( A = C \cup D \), where C is the condition attribute set and D is the decision attribute set; \( V = \cup \,V_{a} \), where \( V_{a} \) is the value range of attribute \( a \in A \); and \( f:U \times A \to V \) is an information function that assigns a value to every attribute of every object.

Definition 3

(Pawlak approximation space): Let U be a finite nonempty universe of discourse. Let R be an equivalence relation over U. Then the ordered pair <U, R> is a Pawlak approximation space.

Approximation operators: Let <U, R> be a Pawlak approximation space and let \( U/R = \{ W_{1} ,W_{2} , \ldots ,W_{m} \} \) be the partition of U whose blocks W1, W2, …, Wm are the equivalence classes of R. With each subset \( X \subseteq U \), we associate two subsets:

$$ \underline{R} ({\text{X}}) = \cup \{ {\text{W}}_{i} \in {\text{U}}/{\text{R}}:{\text{W}}_{i} \subseteq {\text{X}}\} $$
(1)
$$ \overline{\text{R}} ({\text{X}}) = \cup \{ {\text{W}}_{i} \in {\text{U}}/{\text{R}}:{\text{W}}_{i} \cap {\text{X}} \ne {\varnothing }\} $$
(2)

which are called the R-lower and R-upper approximations of X, respectively.

$$ {\text{The R-boundary region of X:}}\quad \quad bn_{R} ({\text{X}}) = \overline{R} ({\text{X}}) - \underline{R} ({\text{X}}) $$
(3)
$$ {\text{The R-positive region of X:}}\quad \quad pos_{R} ({\text{X}}) = \underline{R} ({\text{X}}) $$
(4)
$$ {\text{The R-negative region of X:}}\quad \quad neg_{R} ({\text{X}}) = {\text{U}} - \overline{R} ({\text{X}}) $$
(5)

Their relationship can be explained by Fig. 1.

Fig. 1. Sketch map of the concept of rough sets

Positive region of an attribute set: Suppose P and Q are two knowledge (equivalence relation) sets over the universe U. The P-positive region of Q is defined as follows:

$$ Pos_{P} ({\text{Q}}) = \bigcup\limits_{X \in U/Q} {Pos_{P} ({\text{X}}) = \{ {\text{x}}|{\text{x}} \in {\text{U}} \wedge [{\text{x}}]_{p} \subseteq [{\text{x}}]_{Q} \} } $$
(6)
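To make Eqs. (1)–(6) concrete, the following minimal Python sketch computes the approximations and regions on a toy decision table; the universe, attribute values and decision used here are illustrative examples, not part of the experimental data.

```python
# Toy universe with two condition attributes (first two tuple entries)
# and one decision attribute (last entry); all values are illustrative.
U = ["x1", "x2", "x3", "x4", "x5", "x6"]
table = {
    "x1": (0, 1, 0), "x2": (0, 1, 0), "x3": (1, 0, 1),
    "x4": (1, 0, 0), "x5": (1, 1, 1), "x6": (1, 1, 1),
}

def partition(objs, key):
    """Equivalence classes of objs induced by the key function (U/R)."""
    classes = {}
    for x in objs:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())

def lower(classes, X):   # Eq. (1): union of classes contained in X
    return set().union(*([W for W in classes if W <= X] or [set()]))

def upper(classes, X):   # Eq. (2): union of classes intersecting X
    return set().union(*([W for W in classes if W & X] or [set()]))

cond = lambda x: table[x][:2]             # condition attributes C
classes_C = partition(U, cond)
X = {x for x in U if table[x][2] == 1}    # one decision class of D

low, up = lower(classes_C, X), upper(classes_C, X)
print("boundary region:", up - low)       # Eq. (3)
print("positive region:", low)            # Eq. (4)
print("negative region:", set(U) - up)    # Eq. (5)

# Eq. (6): positive region of the decision D with respect to C
classes_D = partition(U, lambda x: table[x][2])
pos_C_D = set().union(*[lower(classes_C, Y) for Y in classes_D])
print("POS_C(D):", pos_C_D)
```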

3 Experimental Methods

3.1 Data Acquisition and Data Preprocessing

(1) Data of different eye states

This study includes the following two conditions: (1) Eyes-open condition. Subjects were asked to keep their eyes open, fixate on the cross at the top of the scanner, and blink as little as possible. (2) Eyes-closed condition. Subjects were asked to keep their eyes closed so as to minimize visual input. During data acquisition, subjects were required to stay awake and alert.

The BOLD fMRI data were acquired on a 3.0 T Siemens Trio scanner at East China Normal University using a gradient-echo EPI sequence with 36 slices providing whole-brain coverage, 230 volumes, a TR of 2 s, and a 64 × 64 acquisition matrix. The in-plane resolution was 3.5 mm × 3.5 mm, and the slice thickness was 3.5 mm.

(2) Data of Alzheimer's disease and healthy controls

The 30 Alzheimer's disease data sets used in the current study were provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI is a public-private partnership led by Michael W. Weiner, MD, and was launched in 2003. ADNI's initial goal was to investigate whether a combination of positron emission tomography, magnetic resonance imaging and other biological, neuropsychological and clinical measures can predict the progression of mild cognitive impairment and early Alzheimer's disease.

The acquisition parameters were: a 3.0 T Philips scanner with a gradient-echo EPI sequence with 48 slices providing whole-brain coverage, 140 volumes, a TR of 3 s, and a 64 × 64 acquisition matrix. The in-plane resolution was 3.31 mm × 3.31 mm, and the slice thickness was 3.31 mm.

The data of the 30 healthy subjects used in the current study were provided by a public neuroimaging database (http://www.nitrc.org/projects/fcon1000/); the data were released by Professor Yufeng Zang in NIfTI format.

The acquisition parameters were: a 3.0 T Philips scanner with a gradient-echo EPI sequence with 33 slices providing whole-brain coverage, 215 volumes, a TR of 2 s, and a 64 × 64 acquisition matrix. The in-plane resolution was 3.13 mm × 3.13 mm, and the slice thickness was 3.6 mm.

3.2 Attribute Importance Calculation

Suppose that S = <U, A, V, f> is a knowledge representation system with \( A = C \cup D \) and \( C \cap D = \varnothing \), where C is the condition attribute set and D is the decision attribute set. If \( U/C = \{ X_{1} ,X_{2} , \cdots ,X_{n} \} \) and \( U/D = \{ Y_{1} ,Y_{2} , \cdots ,Y_{m} \} \), the degree of dependence of D on C can be computed as [12]:

$$ k_{C} ({\text{D}}) = \frac{1}{\left| U \right|}\sum\limits_{i = 1}^{m} {\left| {\underline{C} ({\text{Y}}_{i} )} \right|} = \frac{1}{\left| U \right|}\sum\limits_{i = 1}^{m} {\left| {pos_{C} ({\text{Y}}_{i} )} \right|} $$
(7)

Thus, the significance of an attribute c ∈ C with respect to the condition attribute set C can be calculated as follows:

$$ sig_{{C - \{ {\text{c}}\} }}^{D} ({\text{c}}) = k_{C} ({\text{D}}) - k_{{C - \{ {\text{c}}\} }} ({\text{D}}) $$
(8)
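The following hedged sketch computes the dependency degree of Eq. (7) and the attribute significance of Eq. (8) on a toy decision table; the table, attribute names and values are illustrative placeholders rather than the fMRI decision table used in the experiments.

```python
def dependency(table, cond_attrs, objects):
    """k_C(D) = |POS_C(D)| / |U| (Eq. 7)."""
    # equivalence classes induced by the condition attributes
    classes = {}
    for x in objects:
        key = tuple(table[x][a] for a in cond_attrs)
        classes.setdefault(key, set()).add(x)
    # a class lies in the positive region if all its objects share one decision
    pos = sum(len(W) for W in classes.values()
              if len({table[x]["d"] for x in W}) == 1)
    return pos / len(objects)

def significance(table, cond_attrs, objects, c):
    """sig_{C-{c}}^D(c) = k_C(D) - k_{C-{c}}(D) (Eq. 8)."""
    reduced = [a for a in cond_attrs if a != c]
    return (dependency(table, cond_attrs, objects)
            - dependency(table, reduced, objects))

# toy decision table: two condition attributes a1, a2 and a decision d
table = {
    "x1": {"a1": 0, "a2": 1, "d": 0}, "x2": {"a1": 0, "a2": 1, "d": 0},
    "x3": {"a1": 1, "a2": 0, "d": 1}, "x4": {"a1": 1, "a2": 0, "d": 0},
    "x5": {"a1": 1, "a2": 1, "d": 1}, "x6": {"a1": 1, "a2": 1, "d": 1},
}
U = list(table)
C = ["a1", "a2"]
print("k_C(D) =", dependency(table, C, U))
for c in C:
    print(f"sig({c}) =", significance(table, C, U, c))
```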

3.3 Best Attribute Reduction

The attribute significance of each condition attribute was calculated for each subject, the significances of the same condition attribute were summed over all subjects, and a table of attributes ordered from the largest to the smallest summed significance was obtained. These summed significances were then used to evaluate candidate attribute combinations for the reduction. Under the premise that the decision-making ability of the attribute table remains unchanged, the combination with the fewest attributes and the highest significance was selected as the best attribute reduction; a sketch of this selection is given below.
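A possible implementation of this selection is sketched below; it assumes per-subject significances have already been computed, uses simulated discretized data for illustration, and substitutes a greedy forward search over the significance ranking for the exhaustive evaluation of attribute combinations.

```python
import numpy as np

rng = np.random.default_rng(1)

def dependency(X, y):
    """k_C(D): fraction of objects whose condition-attribute class is pure."""
    if X.shape[1] == 0:
        return 0.0
    pure = 0
    for key in np.unique(X, axis=0):
        members = np.all(X == key, axis=1)
        if len(np.unique(y[members])) == 1:
            pure += members.sum()
    return pure / len(y)

# sig_per_subject[s, a]: significance of attribute a on subject s (illustrative)
sig_per_subject = rng.random((30, 5))
ranking = np.argsort(-sig_per_subject.sum(axis=0))   # descending summed significance

# pooled, discretized training data (illustrative)
X = rng.integers(0, 3, size=(200, 5))
y = rng.integers(0, 2, size=200)

target = dependency(X, y)                 # decision ability of the full attribute set
chosen, best = [], list(range(X.shape[1]))
for a in ranking:
    chosen.append(int(a))
    if dependency(X[:, chosen], y) >= target:   # decision ability preserved
        best = sorted(chosen)
        break
print("best attribute reduction:", best)
```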

3.4 K-means Algorithm

Step 1: From the N data objects, select K objects as the initial cluster centers;

Step 2: According to the mean (central object) of each cluster, calculate the distance between every object and each center, and assign every object to the cluster with the minimum distance;

Step 3: Recalculate the mean (central object) of each cluster;

Step 4: Calculate the standard criterion function; when a termination condition is satisfied, for example when the function has converged, the algorithm stops; otherwise, return to Step 2. A minimal sketch of these steps follows.
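A minimal NumPy sketch of Steps 1–4 might look as follows; the data, the random initialization and the convergence test on the cluster centers are illustrative choices rather than the exact settings used in the experiments.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose K of the N objects as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every object to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each cluster mean (keep the old center if a cluster empties)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: stop once the centers (and hence the criterion) stop changing
        if np.allclose(new_centers, centers, atol=tol):
            break
        centers = new_centers
    return labels, centers

# two illustrative, well-separated clusters in 3 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(4.0, 1.0, (50, 3))])
labels, centers = kmeans(X, k=2)
print(np.bincount(labels))
```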

The attribute significance was calculated from the training set, and the best attribute reduction of the rough set was selected according to it. The attributes of the best reduction were defined as important attributes, and the remaining attributes were defined as non-important attributes. The test sets of the two data sets were then evaluated separately, and the K-means algorithm was used to classify the data described by the important and by the non-important attributes. With K = 2 and randomly generated initial centroids, the classification result was compared with the original data labels to obtain the classification accuracy n. Because the correspondence between the labels produced by K-means and the original labels is arbitrary, only accuracies greater than 0.5 are meaningful; if n is less than 0.5, then 1 − n is taken as the final classification accuracy. That is, the classification accuracy is at least 0.5 and at most 1. A sketch of this evaluation protocol is given below.
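The evaluation protocol could be sketched as follows, assuming scikit-learn is available; the data, the labels and the indices of the "important" columns are hypothetical placeholders, not the reductions reported in Sect. 4.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))     # e.g. 50 ROI-averaged features per scan
y = np.repeat([0, 1], 50)          # hypothetical true labels (two conditions)
important = [2, 3, 9, 14]          # hypothetical indices of the best reduction

def clustering_accuracy(features, y_true):
    pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    n = (pred == y_true).mean()
    return max(n, 1 - n)           # cluster labels are arbitrary, so accuracy >= 0.5

print("important attributes:", clustering_accuracy(X[:, important], y))
print("all attributes:      ", clustering_accuracy(X, y))
```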

4 Result Analyses

4.1 Reductions in fMRI Data for Different Eye States

We randomly selected 30 subjects, a total of 13,800 samples, as the training set, and used the 4,600 samples of the remaining 10 subjects as the test set. After the attribute reductions were acquired, the best attribute reduction was obtained by calculating the attribute significance. The best reduction of the fMRI data for the different eye states consists of the 20 brain regions listed in Table 1. Ranked in descending order of attribute significance, they are: \( \left\{ {49,4,3,10,15,40,9,47,21,46,16,42,41,50,13,22,11,44,39,35} \right\} \).

Table 1. The best reduction in fMRI data for different eye states

4.2 Reductions in fMRI Data for Alzheimer’s Disease and Healthy Controls

We randomly selected 20 subjects, a total of 7,100 samples, as the training set, and used the 3,550 samples of the remaining 10 subjects as the test set. After the attribute reductions were acquired, the best attribute reduction was obtained by calculating the attribute significance. The best reduction of the fMRI data for Alzheimer's disease and healthy controls contains the 17 brain regions listed in Table 2. Ranked in descending order of attribute significance, they are: \( \left\{ {19,3,16,21,50,20,23,2,48,49,43,8,40,14,35,42,10} \right\} \).

Table 2. The best reduction in fMRI data for Alzheimer's disease and healthy controls

4.3 Data Atlas

In order to reflect the differences between important and non-important attributes more intuitively, the mean values of the different brain regions were obtained for each subject. The data maps are produced as follows (a plotting sketch is given after the figure captions):

(1) Different eye state data, average data map (rows 1 to 230: eyes closed; rows 231 to 460: eyes open) (Fig. 2).

Fig. 2. Data atlas of important and non-important brain regions for the different eye state data

(2) Alzheimer's disease and normal control data, average data map (rows 1 to 140: Alzheimer's disease data; rows 141 to 355: normal control data) (Fig. 3).

Fig. 3. Data atlas of important and non-important brain regions for the Alzheimer's disease and normal control data
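A minimal sketch of how such an average data map could be drawn with matplotlib is given below; the values are simulated in place of the real per-region means, and the group sizes simply mirror the eyes-closed/eyes-open layout described above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
closed = rng.normal(0.0, 1.0, (230, 20))   # rows 1-230: eyes-closed volumes
opened = rng.normal(0.8, 1.0, (230, 20))   # rows 231-460: eyes-open volumes
data = np.vstack([closed, opened])         # mean signal per volume per brain region

plt.imshow(data, aspect="auto", cmap="viridis")
plt.colorbar(label="mean signal of brain region")
plt.xlabel("brain region (important attributes)")
plt.ylabel("volume index")
plt.title("Average data map (illustrative)")
plt.show()
```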

The two data sets clearly show that, within the same group, the important attributes separate the different categories much more distinctly, so classification based on the important attributes can be expected to perform better. By contrast, the non-important attribute data are usually more disordered and less distinguishable, so using them for classification is often ineffective and may even impair the result.

4.4 Clustering Algorithm Based on Rough Set Optimization

The attributes of the best attribute reduction are taken as the important attributes, and the attributes removed by the reduction are taken as the non-important attributes. The K-means algorithm with K = 2 is then used to classify the raw data, the data containing only the important attributes, and the data containing only the non-important attributes, respectively. The results are shown in Table 3:

Table 3. Classification accuracy of K-means algorithms with different data

It can be seen that classification using the important attributes obtained by rough set attribute reduction achieves better results, whereas classification using the removed non-important attributes is less accurate than using the raw data and far less accurate than using the important attributes.

5 Discussion

In this paper, a K-means algorithm based on rough set optimization is proposed and applied to the classification of fMRI data. First, the concepts of important and non-important attributes are put forward. Rough set theory is applied to the training data set for attribute reduction, and the best attribute reduction is selected according to attribute significance. The attributes of the best reduction are regarded as important attributes, and the remaining attributes as non-important attributes. The important attributes are the most discriminative part of the original attribute set, while the remaining attributes often play only a very weak role in classification and may even harm the classification result. The experiments show that classification using the important attributes is more accurate than classification using the original data, whereas classification using the non-important attributes is less accurate than using the original data and far less accurate than using the important attributes. Therefore, using the important attributes for classification can greatly reduce the computational workload while achieving better classification results. In future work, the experiment can be improved by optimizing the attribute reduction algorithm, choosing a more reasonable attribute selection strategy, and dividing the brain into finer regions, in order to obtain more reliable results.