Abstract
Discriminative subgraph mining from a large collection of graph objects is a crucial problem for graph classification. Several main-memory-based approaches have been proposed to mine discriminative subgraphs, but they lack scalability and are not suitable for large-scale graph databases. Based on the MapReduce model, we propose an efficient method, MRGAGC, for discriminative subgraph mining. MRGAGC employs an iterative MapReduce framework: each map step applies evolutionary computation with three evolutionary strategies to generate a set of locally optimal discriminative subgraphs, and the reduce step aggregates all the discriminative subgraphs and outputs the result. The iteration terminates once the stopping threshold is reached. Finally, we employ subgraph coverage rules to build graph classifiers from the discriminative subgraphs mined by MRGAGC. Extensive experimental results on both real and synthetic datasets show that MRGAGC outperforms the other approaches in terms of both classification accuracy and runtime efficiency.
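The iterative map/reduce loop described in the abstract can be illustrated with a small, single-process Python sketch. It is not the MRGAGC implementation: the abstract does not specify the three evolutionary strategies, the scoring function, or the stopping rule, so everything below is an assumption made for illustration. Graphs are modelled as sets of labeled edges, "subgraph" containment is plain subset testing (no isomorphism check), the score is positive support minus negative support, and a single edge-extension mutation stands in for the evolutionary strategies. The names `score`, `map_step`, `reduce_step`, and `mine` are hypothetical.

```python
import random
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]        # edge between two uniquely labeled nodes (toy model)
Graph = FrozenSet[Edge]       # a graph or pattern is just a set of edges
Labeled = Tuple[Graph, int]   # (graph, class label: 1 = positive, 0 = negative)


def score(pattern: Graph, data: List[Labeled]) -> float:
    """Assumed discriminative score: support in positives minus support in negatives."""
    def freq(graphs: List[Graph]) -> float:
        return sum(pattern <= g for g in graphs) / len(graphs) if graphs else 0.0
    pos = [g for g, y in data if y == 1]
    neg = [g for g, y in data if y == 0]
    return freq(pos) - freq(neg)


def rank(patterns, data, keep):
    """Keep the `keep` best patterns; ties are broken deterministically."""
    return sorted(patterns, key=lambda p: (-score(p, data), sorted(p)))[:keep]


def map_step(partition, seeds, rng, generations=5, keep=3):
    """Local evolutionary search on one partition (edge extension only)."""
    pool = set(seeds) | {frozenset([e]) for g, _ in partition for e in g}
    for _ in range(generations):
        children = set()
        for p in sorted(pool, key=sorted):
            # Positive graphs that contain p and still offer an edge to add.
            hosts = [g for g, y in partition if y == 1 and p <= g and g - p]
            if hosts:
                host = rng.choice(hosts)
                children.add(p | {rng.choice(sorted(host - p))})
        pool = set(rank(pool | children, partition, keep))
    return pool


def reduce_step(local_pools, data, keep=3):
    """Merge local candidates and re-rank them.

    Simplification: re-scoring uses the full dataset, which a real
    reducer would not hold in memory.
    """
    return rank(set().union(*local_pools), data, keep)


def mine(data, n_parts=2, max_iters=10, eps=1e-9, seed=0):
    """Iterate map and reduce until the best score stops improving."""
    rng = random.Random(seed)
    parts = [data[i::n_parts] for i in range(n_parts)]
    best, best_score = [], float("-inf")
    for _ in range(max_iters):
        pools = [map_step(part, best, rng) for part in parts]
        best = reduce_step(pools + [set(best)], data)  # never lose earlier winners
        top = score(best[0], data) if best else 0.0
        if top - best_score <= eps:                    # assumed stopping rule
            break
        best_score = top
    return best


if __name__ == "__main__":
    toy = [
        (frozenset({("A", "B"), ("B", "C")}), 1),
        (frozenset({("A", "B"), ("B", "C"), ("C", "D")}), 1),
        (frozenset({("A", "B"), ("C", "D")}), 0),
        (frozenset({("B", "C"), ("C", "D")}), 0),
    ]
    print(mine(toy)[0])
```

On the toy data, the two-edge pattern A-B, B-C occurs in both positive graphs and in neither negative graph, so it ranks first. Feeding the previous round's winners back into each map step as seeds is one simple way to make the loop iterative; a distributed version would ship those seeds to the mappers between jobs.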
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Z., Zhao, Y., Wang, G., Cheng, Y. (2015). Large-Scale Graph Classification Based on Evolutionary Computation with MapReduce. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_19
DOI: https://doi.org/10.1007/978-3-319-25255-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)