Storing and Querying Semantic Data in the Cloud

Janke, Daniel; Staab, Steffen

doi:10.1007/978-3-030-00338-8_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11078)

Included in the following conference series:

Reasoning Web International Summer School

Abstract

In recent years, huge RDF graphs with trillions of triples have been created. To process this amount of data, scalable RDF stores are used, in which graph data is distributed over compute and storage nodes so that query processing effort and memory needs can be scaled across nodes. The main challenges in developing such RDF stores in the cloud are: (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failures of compute and storage nodes. In this manuscript, we give an overview of how scalable RDF stores in the cloud address these challenges.

Notes

1. https://spark.apache.org/.
2. https://hbase.apache.org/.
3. http://www.sparsity-technologies.com/.
4. http://titan.thinkaurelius.com/.
5. In the context of relational or NoSQL databases, graph covers are called sharding and the graph chunks shards. In the literature, there exist definitions of sharding that allow for data replication whereas others do not allow it.
6. We adapted the definition of an RDF molecule in [38] to allow for paths with a length \(\ge 1\).
7. The term anchor vertex was taken from [79].
8. \(\mathrm {dom}(\mu )\) refers to the set of variables of this binding.
9. \(\mu _{|_W}\) means that the domain of \(\mu \) is restricted to the variables in W.
10. https://aws.amazon.com/neptune/.
11. https://hadoop.apache.org/.
12. https://pig.apache.org/.
13. https://spark.apache.org/graphx/.
14. https://aws.amazon.com/de/dynamodb/.
15. https://cassandra.apache.org/.
16. https://accumulo.apache.org/.
17. https://impala.apache.org/.
18. https://www.couchbase.com/.
19. https://www.mongodb.com/.
20. http://lod-cloud.net/.
21. If the hash cover is only computed on the predicate, the resulting graph cover would be similar to the vertical graph split.
22. This idea is named differently in the literature. For instance, in Trinity.RDF [144] it is called graph exploration.
23. http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/.
24. http://ldbcouncil.org/developer/spb.
25. https://graphql.org/.
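
The remark in note 21 can be illustrated with a small sketch. It is not taken from the chapter: the function names hash_cover and vertical_split, the use of CRC32 as the hash function, the number of chunks, and the example triples are all assumptions chosen for illustration. The sketch hashes each triple on one of its components to pick a chunk, and then checks that hashing on the predicate keeps all triples with the same predicate together, which is what makes the result resemble a vertical graph split.

```python
import zlib
from collections import defaultdict


def hash_cover(triples, num_chunks, key):
    """Assign each triple to a chunk by hashing the component chosen by `key`.

    Hypothetical helper: CRC32 is used only because it is stable across runs.
    """
    chunks = defaultdict(set)
    for triple in triples:
        chunk_id = zlib.crc32(key(triple).encode("utf-8")) % num_chunks
        chunks[chunk_id].add(triple)
    return dict(chunks)


def vertical_split(triples):
    """Group triples by predicate, one group per distinct predicate."""
    groups = defaultdict(set)
    for s, p, o in triples:
        groups[p].add((s, p, o))
    return dict(groups)


# Illustrative data only; these IRIs do not come from the chapter.
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:carol", "ex:worksAt", "ex:acme"),
}

by_subject = hash_cover(triples, num_chunks=4, key=lambda t: t[0])
by_predicate = hash_cover(triples, num_chunks=4, key=lambda t: t[1])
groups = vertical_split(triples)

# Each predicate group ends up entirely inside a single chunk of the
# predicate-based hash cover.
for group in groups.values():
    assert any(group <= chunk for chunk in by_predicate.values())
```

The two covers are similar rather than identical in general: with fewer chunks than predicates, or with hash collisions, several predicate groups can share one chunk, whereas a vertical split keeps one group per predicate.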

References

Largetriplestores. https://www.w3.org/wiki/LargeTripleStores. Accessed 10 July 2018
The bigdata\(\textregistered \) RDF Database. http://www.bigdata.com/whitepapers/bigdata_architecture_whitepaper.pdf. Accessed 29 Oct 2014
Abadi, D.J., Marcus, A., Madden, S.R., Hollenbach, K.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 411–422. VLDB Endowment (2007). http://dl.acm.org/citation.cfm?id=1325851.1325900
Abbassi, S., Faiz, R.: RDF-4X: a scalable solution for RDF quads store in the cloud. In: Proceedings of the 8th International Conference on Management of Digital EcoSystems, MEDES, pp. 231–236. ACM, New York (2016). https://doi.org/10.1145/3012071.3012104
Abdelaziz, I., Harbi, R., Salihoglu, S., Kalnis, P.: Combining vertex-centric graph processing with SPARQL for large-scale RDF data analytics. IEEE Trans. Parallel Distrib. Syst. 28(12), 3374–3388 (2017). https://doi.org/10.1109/TPDS.2017.2720174
Aberer, K., Cudré-Mauroux, P., Hauswirth, M., Van Pelt, T.: GridVine: building internet-scale semantic overlay networks. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 107–121. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30475-3_9
Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: an adaptive query processing engine for SPARQL endpoints. In: Aroyo, L. (ed.) ISWC 2011. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_2
Akar, Z., Halaç, T.G., Ekinci, E.E., Dikenelli, O.: Querying the web of interlinked datasets using VOID descriptions. In: WWW 2012 Workshop on Linked Data on the Web, Lyon, France, 16 April 2012. http://ceur-ws.org/Vol-937/ldow2012-paper-06.pdf
Al-Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Adaptive partitioning for very large RDF data. CoRR abs/1505.0 (2015). http://arxiv.org/abs/1505.02728
Al-Harbi, R., Ebrahim, Y., Kalnis, P.: PHD-store: an adaptive SPARQL engine with dynamic partitioning for distributed RDF repositories. CoRR abs/1405.4 (2014). http://arxiv.org/abs/1405.4979
Alexander, K., Cyganiak, R., Hausenblas, M., Zhao, J.: Describing linked datasets with the VoID vocabulary. W3C Interest Group Note, W3C (2011). http://www.w3.org/TR/2011/NOTE-void-20110303/
Ali, L., Janson, T., Lausen, G.: 3rdf: storing and querying RDF data on top of the 3nuts overlay network. In: 2011 22nd International Workshop on Database and Expert Systems Applications, pp. 257–261 (2011). https://doi.org/10.1109/DEXA.2011.1
Ali, L., Janson, T., Schindelhauer, C.: Towards load balancing and parallelizing of RDF query processing in P2P based distributed RDF data stores. In: 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 307–311 (2014). https://doi.org/10.1109/PDP.2014.79
Ali, L., Janson, T., Lausen, G., Schindelhauer, C.: Effects of network structure improvement on distributed RDF querying. In: Hameurlain, A., Rahayu, W., Taniar, D. (eds.) Globe 2013. LNCS, vol. 8059, pp. 63–74. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40053-7_6
Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: Mika, P. (ed.) ISWC 2014. LNCS, vol. 8796, pp. 197–212. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_13
Arenas, M., Pérez, J.: Federation and navigation in SPARQL 1.1. In: Eiter, T., Krennwallner, T. (eds.) Reasoning Web 2012. LNCS, vol. 7487, pp. 78–111. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33158-9_3
Basca, C., Bernstein, A.: Distributed SPARQL throughput increase: on the effectiveness of workload-driven RDF partitioning. In: ISWC 2013 (2013)
Basca, C., Bernstein, A.: Querying a messy web of data with AVALANCHE. Web Semant.: Sci. Serv. Agents World Wide Web 26 (2014). http://www.websemanticsjournal.org/index.php/ps/article/view/361
Battré, D., Heine, F., Höing, A., Kao, O.: On triple dissemination, forward-chaining, and load balancing in DHT based RDF stores. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005–2006. LNCS, vol. 4125, pp. 343–354. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71661-7_33
Beame, P., Koutris, P., Suciu, D.: Skew in parallel query processing. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 212–223. ACM, New York (2014). https://doi.org/10.1145/2594538.2594558
Bizer, C., Schultz, A.: The Berlin SPARQL benchmark. Int. J. Semant. Web Inf. Syst. 5(2), 1–24 (2009). https://doi.org/10.4018/jswis.2009040101
Böhm, C., Hefenbrock, D., Naumann, F.: Scalable peer-to-peer-based RDF management. In: Proceedings of the 8th International Conference on Semantic Systems, I-SEMANTICS 2012, pp. 165–168. ACM, New York (2012). https://doi.org/10.1145/2362499.2362523
Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: COSI: cloud oriented subgraph identification in massive social networks. In: Advances in Social Networks Analysis and Mining (ASONAM), pp. 248–255 (2010). https://doi.org/10.1109/ASONAM.2010.80
Bugiotti, F., Camacho-Rodríguez, J., Goasdoué, F., Kaoudi, Z., Manolescu, I., Zampetakis, S.: SPARQL query processing in the cloud. In: Harth, A., Hose, K., Schenkel, R. (eds.) Linked Data Management. Emerging Directions in Database Systems and Applications. Chapman and Hall/CRC (2014)
Cai, M., Frank, M.: RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th International Conference on World Wide Web, pp. 650–657 (2004). http://dl.acm.org/citation.cfm?id=988760
Charalambidis, A., Troumpoukis, A., Konstantopoulos, S.: SemaGrow: optimizing federated SPARQL queries. In: Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS 2015, pp. 121–128. ACM, New York (2015). https://doi.org/10.1145/2814864.2814886
Cheng, L., Kotoulas, S.: Scale-out processing of large RDF datasets. IEEE Trans. Big Data 1(4), 138–150 (2015). https://doi.org/10.1109/TBDATA.2015.2505719
Chu, S., Balazinska, M., Suciu, D.: From theory to practice: efficient join query evaluation in a parallel database system. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD 2015, pp. 63–78. ACM, New York (2015). https://doi.org/10.1145/2723372.2750545
Cossu, M., Färber, M., Lausen, G.: PRoST: distributed execution of SPARQL queries using mixed partitioning strategies. In: Proceedings of the 21st International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, 26–29 March 2018, pp. 469–472 (2018). https://doi.org/10.5441/002/edbt.2018.49
Crespo, A., Garcia-Molina, H.: Semantic overlay networks for P2P systems. In: Moro, G., Bergamaschi, S., Aberer, K. (eds.) AP2PC 2004. LNCS (LNAI), vol. 3601, pp. 1–13. Springer, Heidelberg (2005). https://doi.org/10.1007/11574781_1
Cudre-Mauroux, P., Agarwal, S., Aberer, K.: GridVine: an infrastructure for peer information management. IEEE Internet Comput. 11(5), 36–44 (2007). https://doi.org/10.1109/MIC.2007.108
Cudré-Mauroux, P., et al.: NoSQL databases for RDF: an empirical evaluation. In: Alani, H. (ed.) ISWC 2013. LNCS, vol. 8219, pp. 310–325. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_20
Curé, O., Naacke, H., Baazizi, M.A., Amann, B.: On the evaluation of RDF distribution algorithms implemented over apache spark. In: Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems (ISWC 2015), pp. 16–31 (2015)
DeCandia, G., et al.: Dynamo: Amazon’s highly available key-value store. In: Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP 2007, pp. 205–220. ACM, New York (2007). https://doi.org/10.1145/1294261.1294281
Della Valle, E., Turati, A., Ghioni, A.: PAGE: a distributed infrastructure for fostering rdf-based interoperability. In: Eliassen, F., Montresor, A. (eds.) DAIS 2006. LNCS, vol. 4025, pp. 347–353. Springer, Heidelberg (2006). https://doi.org/10.1007/11773887_27
DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD 1984, pp. 1–8. ACM, New York (1984). https://doi.org/10.1145/602259.602261
Dhraief, H., Kemper, A., Nejdl, W., Wiesner, C.: Processing and optimization of complex queries in schema-based P2P-networks. In: Ng, W.S., Ooi, B.-C., Ouksel, A.M., Sartori, C. (eds.) DBISP2P 2004. LNCS, vol. 3367, pp. 31–45. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31838-5_3
Ding, L., Peng, Y., da Silva, P.P., McGuinness, D.L.: Tracking RDF graph provenance using RDF molecules. Technical report, UMBC (2005). https://ebiquity.umbc.edu/paper/html/id/240/Tracking-RDF-Graph-Provenance-using-RDF-Molecules
Du, F., Bian, H., Chen, Y., Du, X.: Efficient SPARQL query evaluation in a database cluster. In: IEEE International Congress on Big Data, pp. 165–172 (2013). https://doi.org/10.1109/BigData.Congress.2013.30
Erling, O., Mikhailov, I.: Towards web scale RDF. In: 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2008) (2008)
Erling, O., Mikhailov, I.: Virtuoso: RDF support in a native RDBMS. In: de Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management, pp. 501–519. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-04329-1_21
Farhan Husain, M., McGlothlin, J., Masud, M.M., Khan, L., Thuraisingham, B.: Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. Data Eng. 23(9), 1312–1327 (2011). https://doi.org/10.1109/TKDE.2011.103
Galarraga, L., Hose, K., Schenkel, R.: Partout: a distributed engine for efficient RDF processing. CoRR abs/1212.5 (2012). http://arxiv.org/abs/1212.5636
Goasdoué, F., Kaoudi, Z., Manolescu, I., Quiané-Ruiz, J.A., Zampetakis, S.: CliqueSquare: flat plans for massively parallel RDF queries. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 771–782 (2015). https://doi.org/10.1109/ICDE.2015.7113332
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI 2014, pp. 599–613. USENIX Association, Berkeley (2014). http://dl.acm.org/citation.cfm?id=2685048.2685096
Goodman, E.L., Grunwald, D.: Using vertex-centric programming platforms to implement SPARQL queries on large graphs. In: Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms, \(\text{IA}^3\) 2014, pp. 25–32. IEEE Press, Piscataway (2014). https://doi.org/10.1109/IA3.2014.10
Görlitz, O., Thimm, M., Staab, S.: SPLODGE: Systematic generation of SPARQL benchmark queries for linked open data. Semant. Web-ISWC 2012, 116–132 (2012). https://doi.org/10.1007/978-3-642-35176-1_8
Görlitz, O., Staab, S.: SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. In: Proceedings of the Second International Conference on Consuming Linked Data, COLD 2011, vol. 782, pp. 13–24. CEUR-WS.org, Aachen (2010). http://dl.acm.org/citation.cfm?id=2887352.2887354
Graux, D., Jachiet, L., Genevès, P., Layaïda, N.: A Multi-Criteria Experimental Ranking of Distributed SPARQL Evaluators (2016). https://hal.inria.fr/hal-01381781
Graux, D., Jachiet, L., Genevès, P., Layaïda, N.: SPARQLGX: efficient distributed evaluation of SPARQL with apache spark. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9982, pp. 80–87. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46547-0_9
Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for owl knowledge base systems. Web Semant.: Sci. Serv. Agents World Wide Web, 3(2–3) (2005). http://www.websemanticsjournal.org/index.php/ps/article/view/70
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing. In: SIGMOD, pp. 289–300 (2014). https://doi.org/10.1145/2588555.2610511
Gutierrez, C., Hurtado, C., Mendelzon, A.O.: Foundations of semantic web databases. In: PODS, pp. 95–106. ACM (2004). https://doi.org/10.1145/1055558.1055573
Haas, L.M., Kossmann, D., Wimmers, E.L., Yang, J.: Optimizing queries across diverse data sources. In: VLDB 1997, Athens, Greece, pp. 276–285. Morgan Kaufmann Publishers Inc., San Francisco (1997)
Hammoud, M., Rabbou, D.A., Nouri, R., Beheshti, S.M.R., Sakr, S.: DREAM: distributed RDF engine with adaptive query planner and minimal communication. Proc. VLDB Endow. 8(6), 654–665 (2015). https://doi.org/10.14778/2735703.2735705
Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N.: Evaluating SPARQL queries on massive RDF datasets. PVLDB, 8(12), 1848–1851 (2015). http://www.vldb.org/pvldb/vol8/p1848-harbi.pdf
Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDB J. 25(3), 355–380 (2016). https://doi.org/10.1007/s00778-016-0420-y
Harris, S., Lamb, N., Shadbolt, N.: 4store: the design and implementation of a clustered RDF store. In: Scalable Semantic Web Knowledge Base Systems - SSWS 2009, pp. 94–109 (2009)
Harth, A., Decker, S.: Optimized index structures for querying RDF from the web. In: Proceedings of LA-WEB 2005, p. 71. IEEE (2005). https://doi.org/10.1109/LAWEB.2005.25
Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Aberer, K. (ed.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 211–224. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_16
Hong, S., Depner, S., Manhardt, T., Van Der Lugt, J., Verstraaten, M., Chafi, H.: PGX.D: a fast distributed graph processing engine. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pp. 58:1–58:12. ACM, New York (2015). https://doi.org/10.1145/2807591.2807620
Hose, K., Schenkel, R.: WARP: workload-aware replication and partitioning for RDF. In: Data Engineering Workshops (ICDEW), pp. 1–6 (2013). https://doi.org/10.1109/ICDEW.2013.6547414
Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)
Janke, D., Staab, S., Thimm, M.: Impact analysis of data placement strategies on query efforts in distributed RDF stores. J. Web Semant. (2018). https://doi.org/10.1016/j.websem.2018.02.002, http://www.websemanticsjournal.org/index.php/ps/article/view/516
Jones, N.D.: An introduction to partial evaluation. ACM Comput. Surv. 28(3), 480–503 (1996). https://doi.org/10.1145/243439.243447
Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system implementation and observations. In: 2009 Ninth IEEE International Conference on Data Mining, pp. 229–238 (2009). https://doi.org/10.1109/ICDM.2009.14
Kaoudi, Z., Koubarakis, M., Kyzirakos, K., Miliaraki, I., Magiridou, M., Papadakis-Pesaresi, A.: Atlas: storing, updating and querying RDF(S) data on top of DHTs. Web Semant.: Sci. Serv. Agents World Wide Web 8(4) (2010). http://www.websemanticsjournal.org/index.php/ps/article/view/250
Karnstedt, M., et al.: UniStore: querying a DHT-based universal storage. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 1503–1504 (2007). https://doi.org/10.1109/ICDE.2007.369054
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998). https://doi.org/10.1137/S1064827595287997
Khadilkar, V., Kantarcioglu, M., Thuraisingham, B.M., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. Technical report, Department of Computer Science at the University of Texas at Dallas (2012)
Khadilkar, V., Kantarcioglu, M., Thuraisingham, B.M., Castagna, P.: Jena-HBase: a distributed, scalable and efficient RDF triple store. In: Proceedings of the ISWC 2012 Posters & Demonstrations Track, Boston, USA, 11–15 November 2012. http://ceur-ws.org/Vol-914/paper_14.pdf
Kim, H., Ravindra, P., Anyanwu, K.: From SPARQL to MapReduce: the journey using a nested TripleGroup Algebra. PVLDB 4(12), 1426–1429 (2011). http://www.vldb.org/pvldb/vol4/p1426-kim.pdf
Kokkinidis, G., Christophides, V.: Semantic query routing and processing in P2P database systems: the ICS-FORTH SQPeer middleware. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 486–495. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30192-9_48
Kotsev, V., Kiryakov, A., Fundulaki, I., Alexiev, V.: LDBC semantic publishing benchmark (SPB) - v2.0 first public draft release. Technical report, The Linked Data Benchmark Council (2014). https://github.com/ldbc/ldbc_spb_bm_2.0/blob/master/doc/LDBC_SPB_v2.0.docx?raw=true
Ladwig, G., Harth, A.: CumulusRDF: linked data management on nested key-value stores. In: Proceedings of the 7th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2011) at the 10th International Semantic Web Conference (ISWC 2011) (2011)
Ladwig, G., Tran, T.: SIHJoin: querying remote and local linked data. In: Antoniou, G., et al. (eds.) ESWC 2011. LNCS, vol. 6643, pp. 139–153. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21034-1_10
Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010). https://doi.org/10.1145/1773912.1773922
Le-Phuoc, D., Nguyen Mau Quoc, H., Le Van, C., Hauswirth, M.: Elastic and scalable processing of linked stream data in the cloud. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 280–297. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41335-3_18
Lee, K., Liu, L.: Efficient data partitioning model for heterogeneous graphs in the cloud. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 46:1–46:12. ACM (2013). https://doi.org/10.1145/2503210.2503302
Lee, K., Liu, L.: Scaling queries over Big RDF graphs with semantic hash partitioning. PVLDB 6(14), 1894–1905 (2013). https://doi.org/10.14778/2556549.2556571
Lee, K., Liu, L., Tang, Y., Zhang, Q., Zhou, Y.: Efficient and customizable data partitioning framework for distributed big RDF data processing in the cloud. In: IEEE CLOUD 2013, pp. 327–334 (2013). https://doi.org/10.1109/CLOUD.2013.63
Liarou, E., Idreos, S., Koubarakis, M.: Evaluating conjunctive triple pattern queries over large structured overlay networks. In: Cruz, I., et al. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 399–413. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078_29
Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: an adaptive query processor for joining federated SPARQL endpoints. In: Meersman, R. (ed.) OTM 2011. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25106-1_28
Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 135–146. ACM, New York (2010). https://doi.org/10.1145/1807167.1807184
Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013). https://doi.org/10.1016/j.physrep.2013.08.002
Mansour, E., Abdelaziz, I., Ouzzani, M., Aboulnaga, A., Kalnis, P.: A demonstration of Lusail: querying linked data at scale. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD 2017, pp. 1603–1606. ACM, New York (2017). https://doi.org/10.1145/3035918.3058731
Matono, A., Pahlevi, S.M., Kojima, I.: RDFCube: a P2P-based three-dimensional index for structural joins on distributed triple stores. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005-2006. LNCS, vol. 4125, pp. 323–330. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71661-7_31
McMurry, J., et al.: Report on the scalability of semantic web integration in biomedbridges (2015). https://doi.org/10.5281/zenodo.14071
Mishra, P., Eich, M.H.: Join processing in relational databases. ACM Comput. Surv. 24(1), 63–113 (1992). https://doi.org/10.1145/128762.128764
Montoya, G., Skaf-Molli, H., Hose, K.: The Odyssey approach for optimizing federated SPARQL queries. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 471–489. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_28
Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M.-E.: Federated SPARQL queries processing with replicated fragments. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 36–51. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_3
Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M.E.: Decomposing federated queries in presence of replicated fragments. Web Semant.: Sci. Serv. Agents World Wide Web 42(1) (2017). http://www.websemanticsjournal.org/index.php/ps/article/view/486
Montoya, G., Vidal, M.E., Acosta, M.: A heuristic-based approach for planning federated SPARQL queries. In: Proceedings of the Third International Conference on Consuming Linked Data, COLD 2012, vol. 905, pp. 63–74. CEUR-WS.org, Aachen (2012). http://dl.acm.org/citation.cfm?id=2887367.2887373
Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, A.-C.: DBpedia SPARQL benchmark – performance assessment with real queries on real data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 454–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_29
Mutharaju, R., Sakr, S., Sala, A., Hitzler, P.: D-SPARQ: distributed, scalable and efficient RDF query engine. In: ISWC (Posters & Demos) 2013, pp. 261–264 (2013)
Naacke, H., Amann, B., Curé, O.: SPARQL graph pattern processing with apache spark. In: Proceedings of the Fifth International Workshop on Graph Data-Management Experiences and Systems, GRADES 2017, pp. 1:1–1:7. ACM, New York (2017). https://doi.org/10.1145/3078447.3078448
Nejdl, W., et al.: Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks. In: Proceedings of the 12th International Conference on World Wide Web, WWW 2003, pp. 536–543. ACM, New York (2003). https://doi.org/10.1145/775152.775229
Norvig, P.: The semantic web and the semantics of the web: where does meaning come from? In: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, p. 1. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva (2016)
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig Latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 1099–1110. ACM, New York (2008). https://doi.org/10.1145/1376616.1376726
Oren, E., Kotoulas, S., Anadiotis, G., Siebes, R., ten Teije, A., van Harmelen, F.: Marvin: distributed reasoning over large-scale Semantic Web data. Web Semant.: Sci. Serv. Agents World Wide Web 7(4) (2009). http://www.websemanticsjournal.org/index.php/ps/article/view/173
Osorio, M., Aranda, C.B.: Storage balancing in P2P based distributed RDF data stores. In: Proceedings of the Workshop on Decentralizing the Semantic Web 2017, Co-located with 16th International Semantic Web Conference (ISWC 2017) (2017). http://ceur-ws.org/Vol-1934/contribution-04.pdf
Owens, A., Seaborne, A., Gibbins, N., schraefel, M.: Clustered TDB: A Clustered Triple Store for Jena (2008). http://eprints.soton.ac.uk/266974/
Papailiou, N., Tsoumakos, D., Konstantinou, I., Karras, P., Koziris, N.: H2RDF+: an efficient data management system for big RDF graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, pp. 909–912. ACM, New York (2014). https://doi.org/10.1145/2588555.2594535
Peng, P., Zou, L., Chen, L., Zhao, D.: Query workload-based RDF graph fragmentation and allocation. In: Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, 15–16 March 2016, pp. 377–388 (2016). https://doi.org/10.5441/002/edbt.2016.35
Peng, P., Zou, L., Özsu, M.T., Chen, L., Zhao, D.: Processing SPARQL queries over distributed RDF graphs. VLDB J. 25(2), 243–268 (2016). https://doi.org/10.1007/s00778-015-0415-0
Penteado, R.R.M., Scroeder, R., Hara, C.S.: Exploring controlled RDF distribution. In: 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), pp. 160–167 (2016). https://doi.org/10.1109/CloudCom.2016.0038
Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009). https://doi.org/10.1145/1567274.1567278
Potter, A., Motik, B., Horrocks, I.: Querying distributed RDF graphs: the effects of partitioning. In: Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2014), pp. 29–44 (2014)
Potter, A., Motik, B., Nenov, Y., Horrocks, I.: Distributed RDF query answering with dynamic data exchange. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 480–497. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_29
Prud’hommeaux, E., Harris, S., Seaborne, A.: SPARQL 1.1 Query Language. W3C Recommendation, W3C (2013). http://www.w3.org/TR/sparql11-query/
Przyjaciel-Zablocki, M., Schätzle, A., Lausen, G.: TriAL-QL: distributed processing of navigational queries. In: Proceedings of the 18th International Workshop on Web and Databases, WebDB 2015, pp. 48–54. ACM, New York (2015). https://doi.org/10.1145/2767109.2767115
Przyjaciel-Zablocki, M., Schätzle, A., Lausen, G.: Querying semantic knowledge bases with SQL-on-Hadoop. In: Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR 2017, pp. 4:1–4:10. ACM, New York (2017). https://doi.org/10.1145/3070607.3070610
Pujol, J.M., Erramilli, V., Rodriguez, P.: Divide and conquer: partitioning online social networks. CoRR abs/0905.4 (2009). http://arxiv.org/abs/0905.4918
Punnoose, R., Crainiceanu, A., Rapp, D.: Rya: a scalable RDF triple store for the clouds. In: 1st International Workshop on Cloud Intelligence, pp. 4:1–4:8. ACM (2012). https://doi.org/10.1145/2347673.2347677
Quilitz, B., Leser, U.: Querying distributed RDF data sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68234-9_39
Rohloff, K., Schantz, R.E.: High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store. In: Programming Support Innovations for Emerging Distributed Applications, PSI EtA 2010, pp. 4:1–4:5. ACM, New York (2010). https://doi.org/10.1145/1940747.1940751
Russell, J.: Getting Started with Impala: Interactive SQL for Apache Hadoop. O’Reilly Media (2014). http://shop.oreilly.com/product/0636920033936.do
Sakr, S., Wylot, M., Mutharaju, R., Le Phuoc, D., Fundulaki, I.: Linked Data: Storing, Querying, and Reasoning. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73515-3
Saleem, M., Mehmood, Q., Ngonga Ngomo, A.-C.: FEASIBLE: a feature-based SPARQL benchmark generation framework. In: Arenas, M. (ed.) ISWC 2015. LNCS, vol. 9366, pp. 52–69. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_4
Saleem, M., Ngonga Ngomo, A.-C., Xavier Parreira, J., Deus, H.F., Hauswirth, M.: DAW: duplicate-aware federated query processing over the web of data. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 574–590. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41335-3_36
Schätzle, A., Przyjaciel-Zablocki, M., Berberich, T., Lausen, G.: S2X: graph-parallel querying of RDF with GraphX. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds.) Big-O(Q)/DMAH 2015. LNCS, vol. 9579, pp. 155–168. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41576-5_12
Schätzle, A., Przyjaciel-Zablocki, M., Lausen, G.: PigSPARQL: mapping SPARQL to Pig Latin. In: Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011, pp. 4:1–4:8. ACM, New York (2011). https://doi.org/10.1145/1999299.1999303
Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on Hadoop. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 164–179. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_11
Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on Spark. PVLDB 9(10), 804–815 (2016). http://www.vldb.org/pvldb/vol9/p804-schaetzle.pdf
Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: a benchmark suite for federated semantic data query processing. In: Aroyo, L. (ed.) ISWC 2011. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_37
Schmidt, M., Hornung, T., Meier, M., Pinkel, C., Lausen, G.: SP\(^2\)Bench: a SPARQL performance benchmark. In: de Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management: A Model-Based Perspective, pp. 371–393. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-04329-1_16
Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: optimization techniques for federated query processing on linked data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_38
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10 (2010). https://doi.org/10.1109/MSST.2010.5496972
Stein, R., Zacharias, V.: RDF on cloud number nine. In: Ceri, S., Valle, E.D., Hendler, J., Huang, Z. (eds.) Proceedings of the 4th Workshop on New Forms of Reasoning for the Semantic Web: Scalable & Dynamic. CEUR Workshop Proceedings (2010)
Stutz, P., Verman, M., Fischer, L., Bernstein, A.: TripleRush: a fast and scalable triple store. In: 9th International Workshop on Scalable Semantic Web Knowledge Base Systems. CEUR Workshop Proceedings, Aachen (2013). http://ceur-ws.org
Stutz, P., Bernstein, A., Cohen, W.: Signal/collect: graph algorithms for the (semantic) web. In: ISWC 2010. LNCS, vol. 6496, pp. 764–780. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17746-0_48
Stutz, P., Paudel, B., Verman, M., Bernstein, A.: Random walk TripleRush: asynchronous graph querying and sampling. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 1034–1044. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva (2015). https://doi.org/10.1145/2736277.2741687
Wang, R., Chiu, K.: Optimizing distributed RDF triplestores via a locally indexed graph partitioning. In: 2012 41st International Conference on Parallel Processing (ICPP), pp. 259–268 (2012). https://doi.org/10.1109/ICPP.2012.47
Wang, X., Tiropanis, T., Davis, H.C.: LHD: optimising linked data query processing using parallelisation. In: Proceedings of the WWW 2013 Workshop on Linked Data on the Web, Rio de Janeiro, Brazil, 14 May 2013. http://ceur-ws.org/Vol-996/papers/ldow2013-paper-06.pdf
White, T.: Hadoop: The Definitive Guide, 4th edn. O’Reilly, Beijing (2015). https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781491901687/
Wilschut, A.N., Apers, P.M.G.: Dataflow query execution in a parallel main-memory environment. Distrib. Parallel Databases 1(1), 103–128 (1993). https://doi.org/10.1007/BF01277522
Wu, B., Zhou, Y., Yuan, P., Liu, L., Jin, H.: Scalable SPARQL querying using path partitioning. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 795–806 (2015). https://doi.org/10.1109/ICDE.2015.7113334
Wu, B., Zhou, Y., Yuan, P., Jin, H., Liu, L.: SemStore: a semantic-preserving distributed RDF triple store. In: CIKM 2014 (2014)
Wylot, M., Cudré-Mauroux, P.: DiploCloud: efficient and scalable management of RDF data in the cloud. IEEE Trans. Knowl. Data Eng. 28(3), 659–674 (2016). https://doi.org/10.1109/TKDE.2015.2499202
Xu, Z., Chen, W., Gai, L., Wang, T.: SparkRDF: in-memory distributed RDF management framework for large-scale social data. In: Dong, X.L., Yu, X., Li, J., Sun, Y. (eds.) WAIM 2015. LNCS, vol. 9098, pp. 337–349. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21042-1_27
Yang, S., Yan, X., Zong, B., Khan, A.: Towards effective partition management for large graphs. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, pp. 517–528. ACM, New York (2012). https://doi.org/10.1145/2213836.2213895
Yang, T., Chen, J., Wang, X., Chen, Y., Du, X.: Efficient SPARQL query evaluation via automatic data partitioning. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7826, pp. 244–258. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37450-0_18
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, p. 10. USENIX Association, Berkeley (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113
Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013). https://doi.org/10.14778/2535570.2488333
Zhang, X., Chen, L., Tong, Y., Wang, M.: EAGRE: towards scalable I/O efficient SPARQL query evaluation on the cloud. In: ICDE 2013, pp. 565–576 (2013). https://doi.org/10.1109/ICDE.2013.6544856
Zhang, X., Chen, L., Wang, M.: Towards efficient join processing over large RDF graph using MapReduce. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 250–259. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31235-9_16

Author information

Authors and Affiliations

Institute for Web Science and Technologies, Universität Koblenz-Landau, Koblenz, Germany
Daniel Janke & Steffen Staab
Web and Internet Science Group, University of Southampton, Southampton, UK
Steffen Staab

Authors

Daniel Janke
Steffen Staab

Corresponding author

Correspondence to Daniel Janke.

Editor information

Editors and Affiliations

University of Bari Aldo Moro, Bari, Italy
Claudia d’Amato
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Martin Theobald

About this chapter

Cite this chapter

Janke, D., Staab, S. (2018). Storing and Querying Semantic Data in the Cloud. In: d’Amato, C., Theobald, M. (eds) Reasoning Web. Learning, Uncertainty, Streaming, and Scalability. Reasoning Web 2018. Lecture Notes in Computer Science, vol 11078. Springer, Cham. https://doi.org/10.1007/978-3-030-00338-8_7

DOI: https://doi.org/10.1007/978-3-030-00338-8_7
Published: 30 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00337-1
Online ISBN: 978-3-030-00338-8
eBook Packages: Computer Science, Computer Science (R0)