Abstract
Graphs are a ubiquitous model for representing objects and their relations. However, the complex combination of structure and content, coupled with the massive volume, high streaming rate, and uncertainty inherent in the data, raises several challenges that call for smarter and faster graph analysis. With the advent of complex networks such as the World Wide Web, social networks, knowledge graphs, genome and scientific databases, the Internet of Things, and medical and government records, novel graph computations are also emerging, including graph pattern matching and mining, similarity search, keyword search, and graph query-by-example. These workloads require both the topology and the content of the network; hence, they differ from classical graph computations such as shortest path, reachability, and minimum cut, which depend only on the structure of the network. In this chapter, we describe these emerging graph querying and mining problems, their applications, and the techniques for solving them. We emphasize the current challenges and highlight some future research directions.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Khan, A., Ranu, S. (2017). Big-Graphs: Querying, Mining, and Beyond. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_16
DOI: https://doi.org/10.1007/978-3-319-49340-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)