Efficient SPARQL Query Processing in MapReduce through Data Partitioning and Indexing

Nie, Zhi; Du, Fang; Chen, Yueguo; Du, Xiaoyong; Xu, Linhao

doi:10.1007/978-3-642-29253-8_58


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Processing SPARQL queries on a single node does not scale, given the rapid growth of RDF knowledge bases. This calls for scalable solutions for SPARQL query processing over Web-scale RDF data. There have been attempts to apply SPARQL query processing techniques in MapReduce environments. However, no study has been conducted on finding optimal partitioning and indexing schemes for distributing RDF data in MapReduce. In this paper, we investigate an RDF data partitioning technique that provides effective indexing schemes to support efficient SPARQL query processing in MapReduce. Our extensive experiments over a huge real-life RDF dataset demonstrate the performance of the proposed partitioning and indexing schemes for efficient SPARQL query processing.

Author information

Authors and Affiliations

Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China
Zhi Nie, Fang Du, Yueguo Chen & Xiaoyong Du
School of Information, Renmin University of China, Beijing, China
Zhi Nie, Fang Du & Xiaoyong Du
IBM Research China, Beijing, China
Linhao Xu

Editor information

Editors and Affiliations

School of Computer Science, The University of Adelaide, Australia
Quan Z. Sheng
College of Information Science and Engineering, Northeastern University, 110819, Shenyang, China
Guoren Wang
Aarhus University, Denmark
Christian S. Jensen
Center for Applied Informatics, Victoria University, PO Box 14428, 8001, VIC, Australia
Guandong Xu

Cite this paper

Nie, Z., Du, F., Chen, Y., Du, X., Xu, L. (2012). Efficient SPARQL Query Processing in MapReduce through Data Partitioning and Indexing. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_58

DOI: https://doi.org/10.1007/978-3-642-29253-8_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science, Computer Science (R0)
