ABSTRACT
Heterogeneous information networks (HINs) represent different types of entities and the relationships between them. Exploring and mining HINs relies on metapath queries, which identify pairs of entities connected by relationships of diverse semantics. Although evaluating metapath query workloads in real time on large, web-scale HINs is computationally demanding, current approaches do not exploit the interrelationships among the queries. In this paper, we present Atrapos, a new approach for the real-time evaluation of metapath query workloads that combines efficient sparse matrix multiplication with intermediate result caching. Atrapos selects which intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that Atrapos accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.
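The abstract names two mechanisms: evaluating a metapath as a chain of sparse matrix multiplications, and caching intermediate results for sub-metapaths that recur across a query workload. The Python sketch below illustrates both on a toy network. It is not Atrapos itself. The abstract does not specify the Overlap Tree or the caching policy, so here a plain frequency counter stands in for frequent sub-metapath detection and a least-recently-used bound stands in for the policy. The sketch also assumes single-letter entity types (A = author, P = paper, V = venue), and `MetapathEvaluator`, `min_freq`, and `capacity` are hypothetical names.

```python
from collections import Counter, OrderedDict


def spmm(a, b):
    """Sparse product a @ b, row by row (Gustavson-style).
    Matrices are dicts of dicts {row: {col: value}}; absent entries are zero."""
    result = {}
    for i, row in a.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            result[i] = acc
    return result


def transpose(m):
    t = {}
    for i, row in m.items():
        for j, v in row.items():
            t.setdefault(j, {})[i] = v
    return t


class MetapathEvaluator:
    """Hypothetical evaluator: a metapath such as "APVPA" (one letter per type)
    is the product of its relations' adjacency matrices; results of frequent
    sub-metapaths are kept in a bounded LRU cache."""

    def __init__(self, adjacency, capacity=8, min_freq=2):
        self.adjacency = adjacency  # {"AP": matrix, "PA": matrix, ...}
        self.capacity = capacity    # max number of cached matrices
        self.min_freq = min_freq    # occurrences before a sub-metapath is cached
        self.freq = Counter()       # stand-in for frequent sub-metapath detection
        self.cache = OrderedDict()  # sub-metapath -> result matrix, in LRU order
        self.hits = 0               # evaluations that reused a cached matrix

    def _chain(self, factors):
        result = factors[0]
        for m in factors[1:]:
            result = spmm(result, m)
        return result

    def _relations(self, mp, start, stop):
        # Adjacency matrices of relations mp[k] -> mp[k+1] for start <= k < stop.
        return [self.adjacency[mp[k:k + 2]] for k in range(start, stop)]

    def _longest_cached(self, mp):
        # (i, j) of the longest cached sub-metapath mp[i:j], or None.
        for length in range(len(mp), 2, -1):
            for i in range(len(mp) - length + 1):
                if mp[i:i + length] in self.cache:
                    return i, i + length
        return None

    def _evaluate(self, mp):
        span = self._longest_cached(mp)
        if span is None:
            return self._chain(self._relations(mp, 0, len(mp) - 1))
        i, j = span
        self.hits += 1
        self.cache.move_to_end(mp[i:j])  # mark as most recently used
        # Relations i .. j-2 are replaced by the cached product for mp[i:j].
        factors = (self._relations(mp, 0, i) + [self.cache[mp[i:j]]]
                   + self._relations(mp, j - 1, len(mp) - 1))
        return self._chain(factors)

    def _put(self, key, matrix):
        self.cache[key] = matrix
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def query(self, mp):
        if len(mp) < 2:
            raise ValueError("a metapath needs at least one relation")
        # Count every sub-metapath that spans at least two relations.
        subs = {mp[i:j] for i in range(len(mp)) for j in range(i + 3, len(mp) + 1)}
        self.freq.update(subs)
        result = self._evaluate(mp)
        frequent = [s for s in subs
                    if self.freq[s] >= self.min_freq and s not in self.cache]
        if frequent:
            # Cache the longest newly frequent sub-metapath (ties broken lexicographically).
            best = max(frequent, key=lambda s: (len(s), s))
            self._put(best, result if best == mp else self._evaluate(best))
        return result


# Toy HIN: authors a1, a2; papers p1..p3; a single venue v1.
AP = {"a1": {"p1": 1, "p2": 1}, "a2": {"p2": 1, "p3": 1}}
PV = {"p1": {"v1": 1}, "p2": {"v1": 1}, "p3": {"v1": 1}}
ev = MetapathEvaluator({"AP": AP, "PA": transpose(AP), "PV": PV, "VP": transpose(PV)})

for mp in ["APV", "APV", "APVPA"]:  # the third query reuses the cached APV
    result = ev.query(mp)
print(result["a1"], ev.hits)  # {'a1': 4, 'a2': 4} 1
```

In this workload, APV is cached once it has been seen twice, so the APVPA query computes APV · VP · PA instead of multiplying all four adjacency matrices. A production engine would likely also choose the multiplication order by estimated cost rather than always proceeding left to right.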
Index Terms
- Atrapos: Real-time Evaluation of Metapath Query Workloads