Abstract
The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable query performance and has largely disregarded updates. In addition to incremental bulk loading, applications also require online updates with flexible control over multi-user isolation levels and data consistency. The challenge lies in meeting these requirements while retaining the capability for fast querying.
This paper presents a comprehensive solution based on an extended deferred-indexing method with integrated versioning. The version store enables time-travel queries that are processed efficiently without adversely affecting queries on the current data. For flexible consistency, transactional concurrency control is provided with a choice of either snapshot isolation or full serializability. All methods are integrated into an extension of the RDF-3X system. Measurements of multi-user workloads on real-life data, as well as synthetic stress-test loads, demonstrate very good performance for both queries and updates.
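The combination of deferred indexing with a version store that serves time-travel queries and snapshot reads can be illustrated with a small Python sketch. It is a hypothetical toy, not the x-RDF-3X data structure. It assumes that each triple version carries an insert timestamp and an optional delete timestamp, that new triples first land in a small pending buffer that is merged into a sorted main index in bulk, and that every insert or delete commits immediately with a monotonically increasing timestamp. The names `VersionedTripleStore`, `merge_threshold`, `begin_read`, and `scan` are illustrative only. Write-write conflict detection, which snapshot isolation also needs, and serializability checks are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy model, not the x-RDF-3X storage layout.
Triple = Tuple[str, str, str]


@dataclass
class Version:
    triple: Triple
    created: int                   # commit timestamp of the insert
    deleted: Optional[int] = None  # commit timestamp of the delete; None while live


class VersionedTripleStore:
    """Sorted main index plus a deferred buffer of recent inserts."""

    def __init__(self, merge_threshold: int = 4) -> None:
        self.main: List[Version] = []     # kept sorted by (triple, created)
        self.pending: List[Version] = []  # recent inserts, merged in bulk later
        self.merge_threshold = merge_threshold
        self.clock = 0                    # last committed timestamp

    def _versions(self) -> List[Version]:
        return self.main + self.pending

    def _maybe_merge(self) -> None:
        # Deferred indexing: merge the buffer only once it is large enough.
        if len(self.pending) >= self.merge_threshold:
            self.main = sorted(self.main + self.pending,
                               key=lambda v: (v.triple, v.created))
            self.pending = []

    def insert(self, triple: Triple) -> int:
        self.clock += 1  # each update commits immediately (simplification)
        self.pending.append(Version(triple, self.clock))
        self._maybe_merge()
        return self.clock

    def delete(self, triple: Triple) -> int:
        self.clock += 1
        for v in self._versions():
            if v.triple == triple and v.deleted is None:
                v.deleted = self.clock  # keep the old version for time travel
        return self.clock

    def begin_read(self) -> int:
        # A reader's snapshot timestamp: later commits stay invisible to it.
        return self.clock

    def scan(self, as_of: Optional[int] = None) -> List[Triple]:
        t = self.clock if as_of is None else as_of
        return sorted(v.triple for v in self._versions()
                      if v.created <= t and (v.deleted is None or v.deleted > t))


store = VersionedTripleStore(merge_threshold=2)
store.insert(("alice", "knows", "bob"))
store.insert(("bob", "knows", "carol"))   # triggers a bulk merge into main
snap = store.begin_read()
store.delete(("alice", "knows", "bob"))
store.insert(("carol", "knows", "dave"))
print(store.scan())            # [('bob', 'knows', 'carol'), ('carol', 'knows', 'dave')]
print(store.scan(as_of=snap))  # [('alice', 'knows', 'bob'), ('bob', 'knows', 'carol')]
```

Because deletes only stamp a version instead of removing it, `scan(as_of=...)` can answer time-travel queries from the same structure, and a reader that fixes its timestamp with `begin_read` is unaffected by later commits, which is the read side of snapshot isolation.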
x-RDF-3X: fast querying, high update rates, and consistency for RDF databases