ABSTRACT
The increasing availability and usage of Knowledge Graphs (KGs) on the Web call for scalable and general-purpose solutions to store this type of data structure. We propose Trident, a novel storage architecture for very large KGs on centralized systems. Trident uses several interlinked data structures to provide fast access to nodes and edges, and it adapts the physical storage to the topology of the graph to reduce the memory footprint. In contrast to architectures designed for a single task, our approach offers an interface with a few low-level, general-purpose primitives that can be used to implement tasks such as SPARQL query answering, reasoning, or graph analytics. Our experiments show that Trident can handle graphs with 10^11 edges using inexpensive hardware, delivering competitive performance on multiple workloads.