CCAR: An efficient method for mining class association rules with itemset constraints
Introduction
The problem of mining class association rules (CARs) is to find the complete set of CARs in a dataset that satisfy user-specified minimum support and minimum confidence thresholds. Numerous approaches have been proposed to solve this problem. Examples include the Apriori-based algorithm CBA (Liu et al., 1998), the FP-tree-based algorithm CMAR (Li et al., 2001), mining CARs based on the vertical dataset layout (Zhao et al., 2009), the use of an equivalence class rule tree (Vo and Le, 2009), the lattice-based approach for mining CARs (Nguyen et al., 2012), the use of a modified ECR tree with Obidset (Nguyen et al., 2013), and parallel mining of CARs on multi-core processor architectures (Nguyen et al., 2014).
Mining CARs to discover associations between itemsets and class labels is popular and useful in practice, especially in mining medical data. However, end users often consider only a subset of CARs, for instance, those that contain at least one itemset from a user-defined set of itemsets. Itemset constraints reduce the number of CARs obtained and shrink the search space, which improves the performance of the mining process. Constrained CARs also help to discover rules that are interesting or useful to a particular end user. For example, in cancer treatment applications, biologists often focus on rules involving new drugs to understand the effectiveness of new treatment strategies. Thus, the present study considers constraints in the form of Boolean expressions over the presence of itemsets in the antecedents of classification rules.
The main contributions of this paper are as follows. Firstly, a tree structure named the Constraint Class Rule tree (CCR-tree) is proposed for efficiently mining CARs with itemset constraints. At the first level, the tree contains both constrained nodes, which include constrained itemsets, and frequent nodes, which include frequent 1-itemsets. At the following levels, the tree contains constrained nodes only. Using this tree structure, only nodes that contain constrained itemsets are generated. Secondly, two theorems for quickly pruning infrequent itemsets are derived. Finally, an efficient algorithm for mining CARs with itemset constraints is developed.
Unlike the existing pre- and post-processing approaches, the proposed method does not generate all rules, which significantly reduces both mining time and memory consumption. The experimental results show that the proposed algorithm achieves speedups of up to 3× and 12× over the pre- and post-processing methods, respectively.
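To make the simplest form of itemset constraint concrete (a rule is kept if its antecedent contains at least one itemset from a user-defined set), the following is a minimal Python sketch. It is not the CCAR algorithm: it filters an already-mined rule list, which corresponds to the post-processing baseline. The names `satisfies_constraint` and `filter_rules`, the (attribute, value) encoding of items, and the toy rules are assumptions made for illustration only.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Assumed encoding: an item is an (attribute, value) pair,
# an itemset is a frozenset of items, a rule is (antecedent, class label).
Item = Tuple[str, str]
Itemset = FrozenSet[Item]
Rule = Tuple[Itemset, str]


def satisfies_constraint(antecedent: Itemset,
                         constraint_itemsets: Iterable[Itemset]) -> bool:
    """Return True if the antecedent contains at least one constrained itemset."""
    return any(c <= antecedent for c in constraint_itemsets)


def filter_rules(rules: Iterable[Rule],
                 constraint_itemsets: Iterable[Itemset]) -> List[Rule]:
    """Post-processing style filter: keep only rules whose antecedent satisfies the constraint."""
    constraints = list(constraint_itemsets)
    return [rule for rule in rules if satisfies_constraint(rule[0], constraints)]


# Hypothetical toy rules, not taken from the paper's datasets.
rules = [
    (frozenset({("drug", "new"), ("age", "adult")}), "improved"),
    (frozenset({("age", "adult")}), "stable"),
]
constraints = [frozenset({("drug", "new")})]

kept = filter_rules(rules, constraints)
print(len(kept))  # prints 1
```

A filter of this kind only runs after every rule has been generated, which is the cost that a constraint-aware search structure is meant to avoid.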
The rest of this paper is organized as follows. In Section 2, some preliminary concepts of CAR mining are briefly given. Work related to mining association rules with itemset constraints and mining class association rules with itemset constraints is introduced in Section 3. The primary contributions are presented in Section 4, in which the CCR-tree structure is presented and two theorems for eliminating infrequent itemsets are provided. The proposed algorithm, Constraint Class Association Rule (CCAR), for efficiently mining CARs with itemset constraints is also described in this section. The experimental results are presented in Section 5. Section 6 describes a real-life application of the proposed method in the HIV/AIDS domain. Finally, conclusions and future work are discussed in Section 7.
Preliminary concepts
Let D be a dataset with n attributes {A1, A2,...,An} and |D| records (objects) where each record has an object identifier (OID). Let C={c1,c2,...,ck} be a list of class labels. A specific value of an attribute Ai and class C is denoted by lower-case letters aim and cj, respectively.
Definition 1. An item is described as an attribute and a specific value for that attribute, denoted by 〈(Ai,aim)〉, e.g., 〈(A1,a11)〉, 〈(A1,a12)〉, 〈(A2,a21)〉, etc.
Definition 2. An itemset is a set of items, e.g., 〈(A1,a11),(A2,a21)〉, 〈(A1,a11),(A3,
Mining association rules with itemset constraints
The problem of mining association rules with itemset constraints has been widely researched in the literature. Since the introduction of mining association rules with itemset constraints (Srikant et al., 1997), three main strategies have been proposed. The first group, post-processing methods, first mines frequent itemsets by using an algorithm such as Apriori (Agrawal and Srikant, 1994) or FP-Growth (Han et al., 2000) and then filters out the ones that do not satisfy the itemset constraints in
Tree structure
This study proposes the CCR-tree structure. In the CCR-tree, each node contains one itemset along with the following information:
- (1) $(Obidset_1, Obidset_2, \ldots, Obidset_k)$: each $Obidset_i$ is the set of identifiers of the objects that contain both the itemset and class $c_i$. Note that $k$ is the number of classes in the dataset.
- (2) pos: stores the position of the class with the maximum cardinality of $Obidset_i$, i.e., $pos = \arg\max_{i \in [1,k]} |Obidset_i|$.
- (3) total: stores the sum of the cardinalities of all $Obidset_i$, i.e., $total = \sum_{i=1}^{k} |Obidset_i|$.
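The node fields listed above can be sketched as a small Python data structure. This is an illustrative sketch under stated assumptions, not the authors' C# implementation: the class name `CCRNode`, the helper `join`, and the toy object identifiers are hypothetical. Reading the support as the size of the Obidset at `pos` and the confidence as that size divided by `total`, and combining two nodes by intersecting their Obidsets class by class, follow the usual vertical-layout convention and are assumptions here rather than statements from this excerpt. Class positions are 0-based in the code, whereas the text indexes classes from 1.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

Item = Tuple[str, str]  # assumed encoding: (attribute, value)


@dataclass
class CCRNode:
    """Hypothetical node layout: one itemset plus one Obidset per class."""
    itemset: FrozenSet[Item]
    obidsets: List[Set[int]]          # obidsets[i]: OIDs containing the itemset and class i
    pos: int = field(init=False)      # 0-based index of the largest Obidset
    total: int = field(init=False)    # sum of all Obidset cardinalities

    def __post_init__(self) -> None:
        sizes = [len(o) for o in self.obidsets]
        # Ties are resolved in favour of the first class (an arbitrary choice).
        self.pos = max(range(len(sizes)), key=sizes.__getitem__)
        self.total = sum(sizes)

    def support(self) -> int:
        """Assumed: support count of the rule itemset -> class at pos."""
        return len(self.obidsets[self.pos])

    def confidence(self) -> float:
        """Assumed: confidence of the rule itemset -> class at pos."""
        return self.support() / self.total if self.total else 0.0


def join(a: CCRNode, b: CCRNode) -> CCRNode:
    """Assumed combination step: union the itemsets, intersect Obidsets per class."""
    return CCRNode(a.itemset | b.itemset,
                   [x & y for x, y in zip(a.obidsets, b.obidsets)])


# Hypothetical toy data with k = 2 classes.
a = CCRNode(frozenset({("A1", "a11")}), [{1, 2, 3}, {4}])
b = CCRNode(frozenset({("A2", "a21")}), [{1, 3}, {4, 5}])
ab = join(a, b)
print(a.pos, a.total, a.confidence())  # prints 0 4 0.75
print(ab.pos, ab.total, ab.support())  # prints 0 3 2
```

Storing `pos` and `total` on the node means that, under these assumptions, the support and confidence of the candidate rule can be read off without rescanning the dataset.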
Experiments
All experiments were conducted on a computer with an Intel Core i5-540M CPU at 2.53 GHz and 4 GB of RAM running Windows 7 Enterprise (32-bit) SP1. The experimental datasets were obtained from the UCI Machine Learning Repository (http://mlearn.ics.uci.edu). The algorithms were coded in C# using MS Visual Studio.NET 2010 Express.
An application of the proposed method in the HIV/AIDS domain
Data mining has practical applications in many areas such as business, retail, banking, education, healthcare, science, engineering, etc. In the engineering domain, applications of data mining are becoming popular. Kamsu-Foguem et al. (2013) applied sequential rule mining to the production process for quality improvement. Their study reported some interesting results for the drill production process. Sequential rule mining was also used in intelligent tutoring agents (Faghihi et al., 2012,
Conclusions and future work
This study has proposed an efficient method for mining CARs with itemset constraints. Unlike post-processing and pre-processing approaches, our approach generates only rules that satisfy the itemset constraints. The framework of the proposed algorithm is based on a novel tree structure which includes only nodes containing constrained itemsets and two theorems for quickly pruning infrequent itemsets. To validate the efficiency of the proposed method, a series of experiments was conducted on four
Acknowledgments
This work was funded by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) under Grant no. 102.01-2012.17. The authors would like to thank the Ho Chi Minh City Provincial AIDS Committee (PAC), which provided the real VCT dataset used in this study.
References (32)
- et al. An efficient method for mining frequent itemsets with double constraints. Eng. Appl. Artif. Intell. (2014)
- et al. A computational model for causal learning in cognitive agents. Knowl.-Based Syst. (2012)
- et al. CMRules: mining sequential rules common to several sequences. Knowl.-Based Syst. (2012)
- et al. Mining association rules for the quality improvement of the production process. Expert Syst. Appl. (2013)
- et al. A review of data mining applications for quality improvement in manufacturing industry. Expert Syst. Appl. (2011)
- et al. Application of association rules in Iranian Railways (RAI) accident data analysis. Saf. Sci. (2010)
- et al. Efficient strategies for parallel mining class association rules. Expert Syst. Appl. (2014)
- et al. Classification based on association rules: a lattice-based approach. Expert Syst. Appl. (2012)
- et al. CAR-Miner: an efficient algorithm for mining class-association rules. Expert Syst. Appl. (2013)
- et al. Learning task models in ill-defined domain using an hybrid knowledge discovery framework. Knowl.-Based Syst. (2011)
- Generating knowledge in maintenance from experience feedback. Knowl.-Based Syst.
- The role of HIV counseling and testing in the developing world. AIDS Educ. Prev.
- Efficacy of risk-reduction counseling to prevent human immunodeficiency virus and sexually transmitted diseases: a randomized controlled trial. J. Am. Med. Assoc.