DOI: 10.1145/3543507.3583322
research-article
Open Access

Atrapos: Real-time Evaluation of Metapath Query Workloads

Published: 30 April 2023

ABSTRACT

Heterogeneous information networks (HINs) represent different types of entities and the relationships between them. Exploring and mining HINs relies on metapath queries, which identify pairs of entities connected by relationships of diverse semantics. The real-time evaluation of metapath query workloads on large, web-scale HINs is computationally very demanding, yet current approaches do not exploit interrelationships among the queries. In this paper, we present Atrapos, a new approach for the real-time evaluation of metapath query workloads that combines efficient sparse matrix multiplication with intermediate result caching. Atrapos selects which intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that Atrapos accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios.
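The abstract describes evaluating a metapath as a chain of sparse matrix multiplications and reusing the products of sub-metapaths shared by several queries. The Python sketch below illustrates that idea under stated assumptions; it is not the Atrapos implementation. It assumes the common formulation in which a metapath's result (its commuting matrix) is the product of the adjacency matrices of its consecutive relations, so entry (i, j) counts the path instances linking entities i and j. In place of the paper's Overlap Tree and caching policy, it simply counts contiguous sub-metapaths (at least two relations long) over a workload known in advance, caches every sub-metapath that appears in two or more queries, and never evicts anything. The toy author-paper-venue schema and all function names (`spmm`, `evaluate`, `frequent_sub_metapaths`, and so on) are hypothetical.

```python
# Hypothetical sketch: frequency counting over a fixed workload stands in for
# the paper's Overlap Tree, and the cache is unbounded (no eviction policy).
from collections import Counter, defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Set, Tuple

Sparse = Dict[str, Dict[str, int]]  # row id -> {column id: path count}
Metapath = Tuple[str, ...]          # sequence of entity types, e.g. ("A", "P", "V")


def from_edges(edges: Iterable[Tuple[str, str]]) -> Sparse:
    """Build a 0/1 adjacency matrix from (source, target) pairs."""
    m: Sparse = defaultdict(dict)
    for src, dst in edges:
        m[src][dst] = 1
    return dict(m)


def transpose(m: Sparse) -> Sparse:
    t: Sparse = defaultdict(dict)
    for i, row in m.items():
        for j, v in row.items():
            t[j][i] = v
    return dict(t)


def spmm(a: Sparse, b: Sparse) -> Sparse:
    """Sparse product: out[i][j] = sum over k of a[i][k] * b[k][j]."""
    out: Sparse = {}
    for i, row in a.items():
        acc: Dict[str, int] = defaultdict(int)
        for k, v in row.items():
            for j, w in b.get(k, {}).items():
                acc[j] += v * w
        if acc:
            out[i] = dict(acc)
    return out


def sub_metapaths(path: Metapath, min_len: int = 3) -> List[Metapath]:
    """Contiguous sub-metapaths with at least min_len types, in a fixed order."""
    n = len(path)
    return [path[i:j] for i in range(n) for j in range(i + min_len, n + 1)]


def frequent_sub_metapaths(workload: List[Metapath], min_queries: int = 2) -> Set[Metapath]:
    """Sub-metapaths occurring in at least min_queries distinct queries."""
    counts = Counter(s for q in workload for s in set(sub_metapaths(q)))
    return {s for s, c in counts.items() if c >= min_queries}


def evaluate(path: Metapath, adj: Dict[Metapath, Sparse],
             cache: Dict[Metapath, Sparse], frequent: Set[Metapath]) -> Sparse:
    """Commuting matrix of path; results for frequent sub-metapaths are cached."""
    if path in cache:
        return cache[path]
    if len(path) == 2:
        return adj[path]  # a single relation: its adjacency matrix
    candidates = [s for s in sub_metapaths(path) if s in frequent and s != path]
    if candidates:
        # Split around the longest frequent proper sub-metapath so it is reused.
        best = max(candidates, key=len)
        i = next(k for k in range(len(path) - len(best) + 1)
                 if path[k:k + len(best)] == best)
        pieces = [path[:i + 1], best, path[i + len(best) - 1:]]
    else:
        # Otherwise peel off the last relation (left-to-right evaluation).
        pieces = [path[:-1], path[-2:]]
    # Single-type pieces (empty left/right remainders) act as the identity.
    mats = [evaluate(p, adj, cache, frequent) for p in pieces if len(p) >= 2]
    result = reduce(spmm, mats)
    if path in frequent:
        cache[path] = result
    return result


# Toy HIN: authors (A) write papers (P); papers are published in venues (V).
writes = from_edges([("a1", "p1"), ("a1", "p2"), ("a2", "p2"), ("a2", "p3")])
published_in = from_edges([("p1", "v1"), ("p2", "v1"), ("p3", "v2")])
adj = {("A", "P"): writes, ("P", "A"): transpose(writes),
       ("P", "V"): published_in, ("V", "P"): transpose(published_in)}

workload = [("A", "P", "V", "P", "A"), ("A", "P", "V"), ("V", "P", "A")]
frequent = frequent_sub_metapaths(workload)  # contains APV and VPA
cache: Dict[Metapath, Sparse] = {}
results = [evaluate(q, adj, cache, frequent) for q in workload]
print(results[0])  # → {'a1': {'a1': 4, 'a2': 2}, 'a2': {'a1': 2, 'a2': 2}}
```

Because matrix multiplication is associative, splitting a query around a cached sub-metapath changes only the cost, not the result. In this toy run, the first query computes and caches APV and VPA, and the other two queries are answered directly from the cache. A real-time system such as the one the abstract describes would also have to detect overlaps as queries arrive, bound the cache with an eviction policy, and choose a cost-aware multiplication order, none of which this sketch attempts.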


Published in
WWW '23: Proceedings of the ACM Web Conference 2023
April 2023
4293 pages
ISBN: 9781450394161
DOI: 10.1145/3543507

Copyright © 2023 Owner/Author

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher
Association for Computing Machinery
New York, NY, United States


Qualifiers
• research-article
• Research
• Refereed limited

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Article Metrics
• Downloads (last 12 months): 193
• Downloads (last 6 weeks): 25
