Efficient SPARQL Query Processing in MapReduce through Data Partitioning and Indexing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Processing SPARQL queries on a single node does not scale, given the rapid growth of RDF knowledge bases. This calls for scalable SPARQL query processing over Web-scale RDF data. There have been attempts to apply SPARQL query processing techniques in MapReduce environments. However, no study has been conducted on finding optimal partitioning and indexing schemes for distributing RDF data in MapReduce. In this paper, we investigate an RDF data partitioning technique that provides effective indexing schemes to support efficient SPARQL query processing in MapReduce. Our extensive experiments over a large real-life RDF dataset demonstrate the performance of the proposed partitioning and indexing schemes.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nie, Z., Du, F., Chen, Y., Du, X., Xu, L. (2012). Efficient SPARQL Query Processing in MapReduce through Data Partitioning and Indexing. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_58

  • DOI: https://doi.org/10.1007/978-3-642-29253-8_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29252-1

  • Online ISBN: 978-3-642-29253-8

  • eBook Packages: Computer Science, Computer Science (R0)
