
Large-Scale Processing Systems of Structured Data

  • Chapter in Big Data 2.0 Processing Systems

Abstract

In practice, it has been acknowledged that the Hadoop framework is not an adequate choice for supporting interactive queries that aim to achieve response times of milliseconds or a few seconds. In addition, many programmers are unfamiliar with the Hadoop framework and would prefer to use SQL as a high-level declarative language for implementing their jobs, delegating all optimization details of the execution process to the underlying engine. This chapter provides an overview of various systems that have been introduced to support the SQL flavor on top of Hadoop-like infrastructure while delivering competitive, scalable performance when processing large-scale structured data.
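The contrast drawn above, between hand-coding a job against the MapReduce model and simply stating it in SQL, can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions: the `sales` table, the sample records and all function names are hypothetical, and Python's built-in `sqlite3` module only stands in for a SQL-on-Hadoop engine. It does not model distribution, fault tolerance or any particular system covered in the chapter.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical sample "sales" records: (region, amount).
RECORDS = [
    ("north", 10), ("south", 5), ("north", 7),
    ("east", 3), ("south", 2), ("north", 1),
]


def map_phase(record):
    """Map: emit a (key, value) pair for each input record."""
    region, amount = record
    yield region, amount


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the grouped values for one key."""
    return key, sum(values)


def mapreduce_total_by_region(records):
    """Hand-written pipeline: the programmer wires every phase explicitly."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(pairs)
    return sorted(reduce_phase(k, vs) for k, vs in grouped.items())


def sql_total_by_region(records):
    """Declarative version: the engine decides how to scan, group and sum."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    print(mapreduce_total_by_region(RECORDS))
    print(sql_total_by_region(RECORDS))
```

Both functions return the same per-region totals. The first spells out the map, shuffle and reduce phases by hand, while the second only states the desired result and leaves the grouping and aggregation strategy to the query engine, which is the division of labor that SQL-on-Hadoop systems try to preserve at cluster scale.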

References

  1. C. Lynch, Big data: how do your data grow? Nature 455(7209), 28–29 (2008)

    Google Scholar 

  2. Large synoptic survey. http://www.lsst.org/

  3. H. Chen, R.H.L. Chiang, V.C. Storey, Business intelligence and analytics: from big data to big impact. MIS Q. 36(4), 1165–1188 (2012)

    Google Scholar 

  4. T. Hey, S. Tansley, K. Tolle (eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft Research, Redmond, 2009)

    Google Scholar 

  5. G. Bell, J. Gray, A.S. Szalay, Petascale computational systems. IEEE Comput. 39(1), 110–112 (2006)

    Google Scholar 

  6. J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A.H. Byers, Big data: the next frontier for innovation, competition, and productivity. Technical Report 1999-66, May 2011

    Google Scholar 

  7. A. McAfee, E. Brynjolfsson, T.H. Davenport, D.J. Patil, D. Barton, Big data. The management revolution. Harvard Bus. Rev. 90(10), 61–67 (2012)

    Google Scholar 

  8. R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener. Comput. Syst. 25(6), 599–616 (2009)

    Google Scholar 

  9. L.M. Vaquero, L. Rodero-Merino, J. Caceres, M. Lindner, A break in the clouds: towards a cloud definition. ACM SIGCOMM Comput. Commun. Rev. 39(1), 50–55 (2008)

    Google Scholar 

  10. D.C. Plummer, T.J. Bittman, T. Austin, D.W. Cearley, D.M. Smith, Cloud computing: defining and describing an emerging phenomenon. Gartner (2008)

    Google Scholar 

  11. J. Staten, S. Yates, F.E. Gillett, W. Saleh, R.A. Dines, Is cloud computing ready for the enterprise. Forrester Research (2008)

    Google Scholar 

  12. M. Armbrust, O. Fox, R. Griffith, A.D. Joseph, Y. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., Above the clouds: a Berkeley view of cloud computing (2009)

    Google Scholar 

  13. S. Madden, From databases to big data. IEEE Internet Comput. 3, 4–6 (2012)

    Google Scholar 

  14. S. Sakr, Cloud-hosted databases: technologies, challenges and opportunities. Clust. Comput. 17(2), 487–502 (2014)

    Google Scholar 

  15. S. Sakr, A. Liu, D.M. Batista, M. Alomari, A survey of large scale data management approaches in cloud environments. IEEE Commun. Surv. Tutor. 13(3), 311–336 (2011)

    Google Scholar 

  16. S. LaValle, E. Lesser, R. Shockley, M.S. Hopkins, N. Kruschwitz, Big data, analytics and the path from insights to value. MIT Sloan Manag. Rev. 52(2), 21 (2011)

    Google Scholar 

  17. X. Wu, X. Zhu, G.-Q. Wu, W. Ding, Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)

    Google Scholar 

  18. D.J. DeWitt, J. Gray, Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)

    Google Scholar 

  19. A. Pavlo, E. Paulson, A. Rasin, D.J. Abadi, D.J. DeWitt, S. Madden, M. Stonebraker, A comparison of approaches to large-scale data analysis, in SIGMOD (2009), pp. 165–178

    Google Scholar 

  20. J. Dean, S. Ghemawa, MapReduce: simplified data processing on large clusters, in OSDI, 2004

    Google Scholar 

  21. D. Agrawal, S. Das, A. El Abbadi, Big data and cloud computing: current state and future opportunities, in Proceedings of the 14th International Conference on Extending Database Technology (ACM, New York, 2011), pp. 530–533

    Google Scholar 

  22. S. Sakr, A. Liu, A.G. Fayoumi, The family of MapReduce and large-scale data processing systems. ACM Comput. Surv. 46(1), 1–44 (2013)

    Google Scholar 

  23. H. Yang, A. Dasdan, R. Hsiao, D. Parker, Map-reduce-merge: simplified relational data processing on large clusters, in SIGMOD, 2007

    Google Scholar 

  24. M. Stonebraker, The case for shared nothing. IEEE Database Eng. Bull. 9(1), 4–9 (1986)

    Google Scholar 

  25. T. White, Hadoop: The Definitive Guide (O’Reilly Media, Sebastopol, 2012)

    Google Scholar 

  26. D. Jiang, A.K.H. Tung, G. Chen, MAP-JOIN-REDUCE: toward scalable and efficient data analysis on large clusters. IEEE TKDE 23(9), 1299–1311 (2011)

    Google Scholar 

  27. Y. Bu, B. Howe, M. Balazinska, M.D. Ernst, The HaLoop approach to large-scale iterative data analysis. VLDB J. 21(2), 169–190 (2012)

    Google Scholar 

  28. Y. Zhang, Q. Gao, L. Gao, C. Wang, iMapReduce: a distributed computing framework for iterative computation. J. Grid Comput. 10(1), 47–68 (2012)

    Google Scholar 

  29. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, G. Fox, Twister: a runtime for iterative MapReduce, in HPDC, 2010

    Google Scholar 

  30. T. Nykiel, M. Potamias, C. Mishra, G. Kollios, N. Koudas, MRShare: sharing across multiple queries in MapReduce. Proc. VLDB Endowment 3(1), 494–505 (2010)

    MATH  Google Scholar 

  31. I. Elghandour, A. Aboulnaga, ReStore: reusing results of MapReduce jobs. Proc. VLDB Endowment 5(6), 586–597 (2012)

    Google Scholar 

  32. I. Elghandour, A. Aboulnaga, ReStore: reusing results of MapReduce jobs in Pig, in SIGMOD, 2012

    Google Scholar 

  33. J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, J. Schad, Hadoop++: making a yellow elephant run like a cheetah (without it even noticing). Proc. VLDB Endowment 3(1), 518–529 (2010)

    Google Scholar 

  34. A. Floratou, J.M. Patel, E.J. Shekita, S. Tata, Column-oriented storage techniques for MapReduce. Proc. VLDB Endowment 4(7), 419–429 (2011)

    Google Scholar 

  35. Y. Lin et al., Llama: leveraging columnar storage for scalable join processing in the MapReduce framework, in SIGMOD, 2011

    Google Scholar 

  36. T. Kaldewey, E.J. Shekita, S. Tata, Clydesdale: structured data processing on MapReduce, in EDBT (2012), pp. 15–25

    Google Scholar 

  37. A. Balmin, T. Kaldewey, S. Tata, Clydesdale: structured data processing on Hadoop, in SIGMOD Conference (2012), pp. 705–708

    Google Scholar 

  38. M. Zukowski, P.A. Boncz, N. Nes, S. Héman, MonetDB/X100 - a DBMS in the CPU cache. IEEE Data Eng. Bull. 28(2), 17–22 (2005)

    Google Scholar 

  39. Y. He, R. Lee, Y. Huai, Z. Shao, N. Jain, X. Zhang, Z. Xu, RCFile: a fast and space-efficient data placement structure in MapReduce-based warehouse systems, in ICDE (2011), pp. 1199–1208

    Google Scholar 

  40. A. Jindal, J.-A. Quiane-Ruiz, J. Dittrich, Trojan data layouts: right shoes for a running elephant, in SoCC, 2011

    Google Scholar 

  41. M.Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson, CoHadoop: flexible data placement and its exploitation in Hadoop. Proc. VLDB Endowment 4(9), 575–585 (2011)

    Google Scholar 

  42. Y. Huai, A. Chauhan, A. Gates, G. Hagleitner, E.N. Hanson, O. O’Malley, J. Pandey, Y. Yuan, R. Lee, X. Zhang, Major technical advancements in Apache Hive, in SIGMOD, 2014

    Google Scholar 

  43. G. Malewicz, M.H. Austern, A.J.C. Bik, J.C. Dehnert, I. Horn, N. Leiser, G. Czajkowski, Pregel: a system for large-scale graph processing, in SIGMOD, 2010

    Google Scholar 

  44. M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker, I. Stoica, Spark: cluster computing with working sets, in HotCloud, 2010

    Google Scholar 

  45. M. Odersky, L. Spoon, B. Venners, Programming in Scala: A Comprehensive Step-by-Step Guide (Artima Inc., Walnut Creek, 2011)

    Google Scholar 

  46. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A.D. Joseph, R.H. Katz, S. Shenker, I. Stoica, Mesos: a platform for fine-grained resource sharing in the data center, in NSDI, 2011

    Google Scholar 

  47. M. Zaharia, D. Borthakur, J.S. Sarma, K. Elmeleegy, S. Shenker, I. Stoica, Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, in EuroSys (2010), pp. 265–278

    Google Scholar 

  48. K. Shvachko, H. Kuang, S. Radia, R. Chansler, The Hadoop distributed file system, in MSST, 2010

    Google Scholar 

  49. M. Armbrust, R.S. Xin, C. Lian, Y. Huai, D. Liu, J.K. Bradley, X. Meng, T. Kaftan, M.J. Franklin, A. Ghodsi, M. Zaharia, Spark SQL: relational data processing in Spark, in SIGMOD, 2015

    Google Scholar 

  50. E.R. Sparks, A. Talwalkar, V. Smith, J. Kottalam, X. Pan, J.E. Gonzalez, M.J. Franklin, M.I. Jordan, T. Kraska, MLI: an API for distributed machine learning, in ICDM, 2013

    Google Scholar 

  51. J.E. Gonzalez, R.S. Xin, A. Dave, D. Crankshaw, M.J. Franklin, I. Stoica, GraphX: graph processing in a distributed dataflow framework, in OSDI, 2014

    Google Scholar 

  52. A. Alexandrov, R. Bergmann, S. Ewen, J.-C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, F. Naumann, M. Peters, A. Rheinländer, M.J. Sax, S. Schelter, M. Höger, K. Tzoumas, D. Warneke, The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)

    Google Scholar 

  53. A. Alexandrov, D. Battré, S. Ewen, M. Heimel, F. Hueske, O. Kao, V. Markl, E. Nijkamp, D. Warneke, Massively parallel data analysis with PACTs on nephele. Proc. VLDB Endowment 3(2), 1625–1628 (2010)

    Google Scholar 

  54. D. Battré et al., Nephele/PACTs: a programming model and execution framework for web-scale analytical processing, in SoCC, 2010

    Google Scholar 

  55. P.G. Selinger, M.M. Astrahan, D.D. Chamberlin, R.A. Lorie, T.G. Price, Access path selection in a relational database management system, in SIGMOD, 1979

    Google Scholar 

  56. A. Heise, A. Rheinlnder, M. Leich, U. Leser, F. Naumann, Meteor/Sopremo: an extensible query language and operator model, in VLDB Workshops, 2012

    Google Scholar 

  57. V.R. Borkar, M.J. Carey, R. Grover, N. Onose, R. Vernica, Hyracks: a flexible and extensible foundation for data-intensive computing, in ICDE, 2011

    Google Scholar 

  58. A. Behm, V.R. Borkar, M.J. Carey, R. Grover, C. Li, N. Onose, R. Vernica, A. Deutsch, Y. Papakonstantinou, V.J. Tsotras, ASTERIX: towards a scalable, semistructured data platform for evolving-world models. Distrib. Parallel Databases 29(3), 185–216 (2011)

    Google Scholar 

  59. V. Borkar, S. Alsubaiee, Y. Altowim, H. Altwaijry, A. Behm, Y. Bu, M. Carey, R. Grover, Z. Heilbron, Y.-S. Kim, C. Li, P. Pirzadeh, N. Onose, R. Vernica, J. Wen, ASTERIX: an open source system for “Big Data” management and analysis. Proc. VLDB Endowment 5(2), 1898–1901 (2012)

    Google Scholar 

  60. S. Alsubaiee, Y. Altowim, H. Altwaijry, A. Behm, V.R. Borkar, Y. Bu, M.J. Carey, I. Cetindil, M. Cheelangi, K. Faraaz, E. Gabrielova, R. Grover, Z. Heilbron, Y.-S. Kim, C. Li, G. Li, J.M. Ok, N. Onose, P. Pirzadeh, V.J. Tsotras, R. Vernica, J. Wen, T. Westmann, AsterixDB: a scalable, open source BDMS. Proc. VLDB Endowment 7(14), 1905–1916 (2014)

    Google Scholar 

  61. Y. Bu, V.R. Borkar, J. Jia, M.J. Carey, T. Condie, Pregelix: big(ger) graph analytics on a dataflow engine. Proc. VLDB Endowment 8(2), 161–172 (2014)

    Google Scholar 

  62. A. Pavlo, E. Paulson, A. Rasin, D.J. Abadi, D.J. DeWitt, S. Madden, M. Stonebraker, A comparison of approaches to large-scale data analysis, in SIGMOD, 2009

    Google Scholar 

  63. A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J.S. Sarma, R. Murthy, H. Liu, Data warehousing and analytics infrastructure at Facebook, in SIGMOD, 2010

    Google Scholar 

  64. A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J.S. Sarma, R. Murthy, H. Liu, Data warehousing and analytics infrastructure at Facebook, in SIGMOD Conference (2010), pp. 1013–1020

    Google Scholar 

  65. B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A.C. Murthy, C. Curino, Apache Tez: a unifying framework for modeling and building data processing applications, in SIGMOD, 2015

    Google Scholar 

  66. V.K. Vavilapalli, A.C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, E. Baldeschwieler, Apache Hadoop YARN: yet another resource negotiator, in SOCC, 2013

    Google Scholar 

  67. M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, M. Yoder, Impala: a modern, open-source SQL engine for Hadoop, in CIDR, 2015

    Google Scholar 

  68. S. Wanderman-Milne, N. Li, Runtime code generation in Cloudera Impala. IEEE Data Eng. Bull. 37(1), 31–37 (2014)

    Google Scholar 

  69. A. Abouzeid, K. Bajda-Pawlikowski, D.J. Abadi, A. Rasin, A. Silberschatz, HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endowment 2(1), 922–933 (2009)

    Google Scholar 

  70. M. Stonebraker, D. Abadi, D. DeWitt, S. Madden, E. Paulson, A. Pavlo, A. Rasin, MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1), 64–71 (2010)

    Google Scholar 

  71. H. Choi, J. Son, H. Yang, H. Ryu, B. Lim, S. Kim, Y.D. Chung, Tajo: a distributed data warehouse system on large clusters, in ICDE, 2013

    Google Scholar 

  72. S. Melnik, A. Gubarev, J.J. Long, G. Romer, S. Shivakumar, M. Tolton, T. Vassilakis, Dremel: interactive analysis of web-scale datasets, Proc. VLDB Endowment 3(1), 330–339 (2010)

    Google Scholar 

  73. D.J. DeWitt, A. Halverson, R.V. Nehme, S. Shankar, J. Aguilar-Saborit, A. Avanes, M. Flasza, J. Gramling, Split query processing in Polybase, in SIGMOD, 2013

    Google Scholar 

  74. V.R. Gankidi, N. Teletia, J.M. Patel, A. Halverson, D.J. DeWitt, Indexing HDFS data in PDW: splitting the data from the index. Proc. VLDB Endowment 7(13), 1520–1528 (2014)

    Google Scholar 

  75. S. Sakr, E. Pardede (eds.), Graph Data Management: Techniques and Applications (IGI Global, Hershey, 2011)

    Google Scholar 

  76. S. Sakr, Processing large-scale graph data: a guide to current technology, in IBM DeveloperWorks (2013), p. 15

    Google Scholar 

  77. A. Khan, S. Elnikety, Systems for big-graphs. Proc. VLDB Endowment 7(13), 1709–1710 (2014)

    Google Scholar 

  78. R. Chen, X. Weng, B. He, M. Yang, Large graph processing in the cloud, in SIGMOD, 2010

    Google Scholar 

  79. U. Kang, C.E. Tsourakakis, C. Faloutsos, PEGASUS: a peta-scale graph mining system, in ICDM, 2009

    Google Scholar 

  80. U. Kang, H. Tong, J. Sun, C.-Y. Lin, C. Faloutsos, GBASE: a scalable and general graph management system, in KDD, 2011

    Google Scholar 

  81. U. Kang, C.E. Tsourakakis, C. Faloutsos, PEGASUS: mining peta-scale graphs. Knowl. Inf. Syst. 27(2), 303–325 (2011)

    Google Scholar 

  82. U. Kang, B. Meeder, C. Faloutsos, Spectral analysis for billion-scale graphs: discoveries and implementation, in PAKDD, 2011

    Google Scholar 

  83. Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, P. Kalnis, Mizan: a system for dynamic load balancing in large-scale graph processing, in EuroSys, 2013

    Google Scholar 

  84. S. Salihoglu, J. Widom, GPS: a graph processing system, in SSDBM, 2013

    Google Scholar 

  85. J.E. Gonzalez, Y. Low, H. Gu, D. Bickson, C. Guestrin, PowerGraph: distributed graph-parallel computation on natural graphs, in OSDI, 2012

    Google Scholar 

  86. A. Kyrola, G.E. Blelloch, C. Guestrin, GraphChi: large-scale graph computation on just a PC, in OSDI, 2012

    Google Scholar 

  87. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J.M. Hellerstein, Distributed GraphLab: a framework for machine learning in the cloud. Proc. VLDB Endowment 5(8), 716–727 (2012)

    Google Scholar 

  88. B. Shao, H. Wang, Y. Li, Trinity: a distributed graph engine on a memory cloud, in SIGMOD, 2013

    Google Scholar 

  89. G. Wang, W. Xie, A. Demers, J. Gehrke, Asynchronous large-scale graph processing made easy, in CIDR, 2013

    Google Scholar 

  90. P. Stutz, A. Bernstein, W.W. Cohen, Signal/collect: graph algorithms for the (semantic) web, in International Semantic Web Conference (1), 2010

    Google Scholar 

  91. L.G. Valiant, A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)

    Google Scholar 

  92. W.D. Clinger, Foundations of actor semantics. Technical report, Cambridge (1981)

    Google Scholar 

  93. Y. Tian, A. Balmin, S.A. Corsten, S. Tatikonda, J. McPherson, From “think like a vertex” to “think like a graph”. Proc. VLDB Endowment 7(3), 193–204 (2013)

    Google Scholar 

  94. A. Dave, A. Jindal, L.E. Li, R. Xin, J. Gonzalez, M. Zaharia, GraphFrames: an integrated API for mixing graph and relational queries, in Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (ACM, New York, 2016), p. 2

    Google Scholar 

  95. M. Junghanns, A. Petermann, K. Gómez, E. Rahm, Gradoop: scalable graph data management and analytics with Hadoop (2015). Preprint. arXiv:1506.00548

    Google Scholar 

  96. M. Kricke, E. Peukert, E. Rahm, Graph data transformations in Gradoop, in BTW 2019, 2019

    Google Scholar 

  97. N. Francis, A. Green, P. Guagliardo, L. Libkin, T. Lindaaker, V. Marsault, S. Plantikow, M. Rydberg, P. Selmer, A, Taylor, Cypher: an evolving query language for property graphs, in Proceedings of the 2018 International Conference on Management of Data (ACM, New York, 2018), pp. 1433–1445

    Google Scholar 

  98. M. Junghanns, M. Kießling, A. Averbuch, A. Petermann, E. Rahm, Cypher-based graph pattern matching in Gradoop, in Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems (ACM, New York, 2017), p. 3

    Google Scholar 

  99. M. Junghanns, M. Kießling, N. Teichmann, K. Gómez, A. Petermann, E. Rahm, Declarative and distributed graph analytics with Gradoop. Proc. VLDB Endowment 11(12), 2006–2009 (2018)

    Google Scholar 

  100. W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, H. Yu, TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC, in KDD, 2013

    Google Scholar 

  101. D. Yan, J. Cheng, Y. Lu, W. Ng, Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endowment 7(14), 1981–1992 (2014)

    Google Scholar 

  102. World Wide Web Consortium. RDF 1.1 Primer (2014)

    Google Scholar 

  103. F. Manola, E. Miller. RDF Primer, February 2004. http://www.w3.org/TR/2004/REC-rdf-primer-20040210/

  104. E. Prud’hommeaux, A. Seaborne, SPARQL Query Language for RDF, W3C Recommendation, January 2008. http://www.w3.org/TR/rdf-sparql-query/

  105. Z. Kaoudi, I. Manolescu, RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)

    Google Scholar 

  106. M. Wylot, M. Hauswirth, P. Cudré-Mauroux, S. Sakr, RDF data storage and query processing schemes: a survey. ACM Comput. Surv. 51(4), 84:1–84:36 (2018)

    Google Scholar 

  107. V. Khadilkar, M. Kantarcioglu, B.M. Thuraisingham, P. Castagna, Jena-HBase: a distributed, scalable and efficient RDF triple store, in Proceedings of the ISWC 2012 Posters & Demonstrations Track, Boston, 11–15 November 2012

    Google Scholar 

  108. R. Punnoose, A. Crainiceanu, D. Rapp, SPARQL in the cloud using Rya. Inf. Syst. 48, 181–195 (2015)

    Google Scholar 

  109. A. Aranda-Andújar, F. Bugiotti, J. Camacho-Rodríguez, D. Colazzo, F. Goasdoué, Z. Kaoudi, I. Manolescu, AMADA: web data repositories in the amazon cloud, in 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, 29 October–02 November 2012, pp. 2749–2751

    Google Scholar 

  110. G. Ladwig, A. Harth, Cumulusrdf: linked data management on nested key-value stores, in The 7th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2011), vol. 30 (2011)

    Google Scholar 

  111. A. Lakshman, P. Malik, Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)

    Google Scholar 

  112. R. Mutharaju, S. Sakr, A. Sala, P. Hitzler, D-SPARQ: distributed, scalable and efficient RDF query engine, in Proceedings of the ISWC 2013 Posters & Demonstrations Track, Sydney, 23 October 2013, pp. 261–264

    Google Scholar 

  113. J. Huang, D.J. Abadi, K. Ren, Scalable SPARQL querying of large RDF graphs. Proc. VLDB Endowment 4(11), 1123–1134 (2011)

    Google Scholar 

  114. N. Papailiou, I. Konstantinou, D. Tsoumakos, P. Karras, N. Koziris, H2RDF+: high-performance distributed joins over large-scale RDF graphs, in 2013 IEEE International Conference on Big Data (IEEE, Piscataway, 2013), pp. 255–263

    Google Scholar 

  115. J. Huang, D.J. Abadi, K. Ren, Scalable SPARQL querying of large RDF graphs. Proc. VLDB Endowment 4(11), 1123–1134 (2011)

    Google Scholar 

  116. A. Abouzied, K. Bajda-Pawlikowski, J. Huang, D.J. Abadi, A. Silberschatz, HadoopDB in action: building real world applications, in Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, 6–10 June 2010, pp. 1111–1114

    Google Scholar 

  117. T. Neumann, G. Weikum, RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endowment 1(1), 647–659 (2008)

    Google Scholar 

  118. F. Goasdoué, Z. Kaoudi, I. Manolescu, J.-A. Quiané-Ruiz, S. Zampetakis, CliqueSquare: flat plans for massively parallel RDF queries, in 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, 13–17 April 2015, pp. 771–782

    Google Scholar 

  119. B. Djahandideh, F. Goasdoué, Z. Kaoudi, I. Manolescu, J.-A. Quiané-Ruiz, S. Zampetakis, CliqueSquare in action: flat plans for massively parallel RDF queries, in 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, 13–17 April 2015, pp. 1432–1435

    Google Scholar 

  120. A. Schätzle, M. Przyjaciel-Zablocki, T. Hornung, G. Lausen, PigSPARQL: a SPARQL query processing baseline for big data, in Proceedings of the ISWC 2013 Posters & Demonstrations Track, Sydney, 23 October 2013, pp. 241–244

    Google Scholar 

  121. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: a not-so-foreign language for data processing, in Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, 10–12 June 2008, pp. 1099–1110

    Google Scholar 

  122. P. Ravindra, H. Kim, K. Anyanwu, An intermediate algebra for optimizing RDF graph pattern matching on MapReduce, in The Semantic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC 2011. Proceedings, Part II, Heraklion, Crete, 29 May–2 June 2011, pp. 46–61

    Google Scholar 

  123. H. Kim, P. Ravindra, K. Anyanwu, Optimizing RDF(S) queries on cloud platforms, in 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, 13–17 May 2013, Companion Volume (2013), pp. 261–264

    Google Scholar 

  124. A. Schätzle, M. Przyjaciel-Zablocki, S. Skilevic, G. Lausen, S2RDF: RDF querying with SPARQL on Spark. CoRR (2015), abs/1512.07021

    Google Scholar 

  125. D.J. Abadi, A. Marcus, S.R. Madden, K. Hollenbach, Scalable semantic web data management using vertical partitioning, in Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB Endowment (2007), pp. 411–422

    Google Scholar 

  126. P. Valduriez, Join indices. ACM Trans. Database Syst. 12(2), 218–246 (1987)

    Google Scholar 

  127. P.A. Bernstein, D.-M.W. Chiu, Using semi-joins to solve relational queries. J. ACM 28(1), 25–40 (1981)

    MATH  Google Scholar 

  128. X. Chen, H. Chen, N. Zhang, S. Zhang, SparkRDF: elastic discreted RDF graph processing engine with distributed memory, in Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, 21 October 2014, pp. 261–264

    Google Scholar 

  129. X. Chen, H. Chen, N. Zhang, S. Zhang, SparkRDF: elastic discreted RDF graph processing engine with distributed memory, in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2015, Volume I, Singapore, 6–9 December 2015, pp. 292–300

    Google Scholar 

  130. A. Schätzle, M. Przyjaciel-Zablocki, T. Berberich, G. Lausen, S2X: graph-parallel querying of RDF with GraphX, in 1st International Workshop on Big-Graphs Online Querying (Big-O(Q), 2015

    Google Scholar 

  131. E.L. Goodman, D. Grunwald, Using vertex-centric programming platforms to implement SPARQL queries on large graphs, in Proceedings of the 4th Workshop on Irregular Applications: Architectures and Algorithms, IA3 ’14 (IEEE Press, Piscataway, 2014), pp. 25–32

    Google Scholar 

  132. H. Naacke, O. Curé, B. Amann, SPARQL query processing with Apache Spark (2016). CoRR, abs/1604.08903

    Google Scholar 

  133. K. Zeng, J. Yang, H. Wang, B. Shao, Z. Wang, A distributed graph engine for web scale RDF data, in Proceedings of the 39th International Conference on Very Large Data Bases. VLDB Endowment (2013), pp. 265–276

    Google Scholar 

  134. P. Stutz, M. Verman, L. Fischer, A. Bernstein, TripleRush: a fast and scalable triple store, in SSWS@ ISWC (2013), pp. 50–65

    Google Scholar 

  135. P. Stutz, B. Paudel, M. Verman, A. Bernstein, Random walk TripleRush: asynchronous graph querying and sampling, in Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, 18–22 May 2015, pp. 1034–1044

    Google Scholar 

  136. P. Stutz, A. Bernstein, W. Cohen, Signal/collect: graph algorithms for the (semantic) web, in International Semantic Web Conference (Springer, Berlin, 2010), pp. 764–780

    Google Scholar 

  137. R. Harbi, I. Abdelaziz, P. Kalnis, N. Mamoulis, Evaluating SPARQL queries on massive RDF datasets. Proc. VLDB Endowment 8(12), 1848–1851 (2015)

    Google Scholar 

  138. R. Al-Harbi, I. Abdelaziz, P. Kalnis, N. Mamoulis, Y. Ebrahim, M. Sahli, Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDB J. 25(3), 355–380 (2016)

    Google Scholar 

  139. S. Gurajada, S. Seufert, I. Miliaraki, M. Theobald, TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing, in International Conference on Management of Data, SIGMOD 2014, Snowbird, 22–27 June 2014, pp. 289–300

    Google Scholar 

  140. L. Galárraga, K. Hose, R. Schenkel, Partout: a distributed engine for efficient RDF processing, in 23rd International World Wide Web Conference, WWW ’14, Seoul, 7–11 April 2014, Companion Volume, pp. 267–268

    Google Scholar 

  141. T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)

    Google Scholar 

  142. M. Hammoud, D.A. Rabbou, R. Nouri, S.-M.-R. Beheshti, S. Sakr, DREAM: distributed RDF engine with adaptive query planner and minimal communication. Proc. VLDB Endowment 8(6), 654–665 (2015)

    Google Scholar 

  143. A. Hasan, M. Hammoud, R. Nouri, S. Sakr, DREAM in action: a distributed and adaptive RDF system on the cloud, in Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, 11–15 April 2016, Companion Volume, pp. 191–194

    Google Scholar 

  144. L. Cheng, S. Kotoulas, Scale-out processing of large RDF datasets. IEEE Trans. Big Data 1(4), 138–150 (2015)

    Google Scholar 

  145. M. Wylot, P. Cudré-Mauroux, DiploCloud: efficient and scalable management of RDF data in the cloud. IEEE Trans. Knowl. Data Eng. 28(3), 659–674 (2016)

    Google Scholar 

  146. P. Zikopoulos, C. Eaton et al., Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data (McGraw-Hill Osborne Media, New York, 2011)

    Google Scholar 

  147. K. Ashton et al., That ‘Internet of things’ thing. RFID J. 22(7), 97–114 (2009)

    Google Scholar 

  148. N. Marz, J. Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems (Manning Publications Co., Shelter Island, 2015)

    Google Scholar 

  149. T. Condie, N. Conway, P. Alvaro, J.M. Hellerstein, K. Elmeleegy, R. Sears, MapReduce online, in NSDI, 2010

    Google Scholar 

  150. T. Condie, N. Conway, P. Alvaro, J.M. Hellerstein, J. Gerth, J. Talbot, K. Elmeleegy, R. Sears, Online aggregation and continuous query support in MapReduce, in SIGMOD, 2010

    Google Scholar 

  151. D. Logothetis, K. Yocum, Ad-hoc data processing in the cloud. Proc. VLDB Endowment 1(2), 1472–1475 (2008)

    Google Scholar 

  152. P. Bhatotia, A. Wieder, R. Rodrigues, U.A. Acar, R. Pasquini, Incoop: MapReduce for incremental computations, in SOCC, 2011

    Google Scholar 

  153. A.M. Aly, A. Sallam, B.M. Gnanasekaran, L.-V. Nguyen-Dinh, W.G. Aref, M. Ouzzaniy, A. Ghafoor, M3: stream processing on main-memory MapReduce, in ICDE, 2012

    Google Scholar 

  154. V. Kumar, H. Andrade, B. Gedik, K.-L. Wu, DEDUCE: at the intersection of MapReduce and stream processing, in EDBT (2010), pp. 657–662

    Google Scholar 

  155. S. Sakr, An introduction to InfoSphere Streams: a platform for analyzing big data in motion. IBM DeveloperWorks, 2013. http://www.ibm.com/developerworks/library/bd-streamsintro/index.html

  156. S. Loesing, M. Hentschel, T. Kraska, D. Kossmann, Stormy: an elastic and highly available streaming service in the cloud, in EDBT/ICDT Workshops, 2012

    Google Scholar 

  157. H. Balakrishnan, M. Frans Kaashoek, D.R. Karger, R. Morris, I. Stoica, Looking up data in p2p systems. Commun. ACM 46(2), 43–48 (2003)

    Google Scholar 

  158. L. Neumeyer, B. Robbins, A. Nair, A. Kesari, S4: distributed stream computing platform, in ICDMW, 2010

    Google Scholar 

  159. B. Gedik, H. Andrade, K.-L. Wu, P.S. Yu, M. Doo, SPADE: the system S declarative stream processing engine, in SIGMOD, 2008

    Google Scholar 

  160. M. Armbrust, T. Das, J. Torres, B. Yavuz, S. Zhu, R. Xin, A. Ghodsi, I. Stoica, M. Zaharia, Structured streaming: a declarative API for real-time applications in Apache Spark, in SIGMOD, 2018

    Google Scholar 

  161. J. Kreps, N. Narkhede, J. Rao et al., Kafka: a distributed messaging system for log processing, in Proceedings of the NetDB, 2011

    Google Scholar 

  162. S. Kulkarni, N. Bhagat, M. Fu, V. Kedigehalli, C. Kellogg, S. Mittal, J.M. Patel, K. Ramasamy, S. Taneja, Twitter Heron: stream processing at scale, in SIGMOD, 2015

    Google Scholar 

  163. G. De Francisci Morales, A. Bifet, Samoa: scalable advanced massive online analysis. J. Mach. Learn. Res. 16(1), 149–153 (2015)

  164. A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, U. Srivastava, Building a high-level dataflow system on top of MapReduce: the Pig experience. Proc. VLDB Endowment 2(2), 1414–1425 (2009)

  165. A. Gates, Programming Pig (O’Reilly Media, Sebastopol, 2011)

  166. C. Chambers, A. Raniwala, F. Perry, S. Adams, R.R. Henry, R. Bradshaw, N. Weizenbaum, FlumeJava: easy, efficient data-parallel pipelines, in PLDI, 2010

  167. D. Wu, L. Zhu, X. Xu, S. Sakr, D. Sun, Q. Lu, A pipeline framework for heterogeneous execution environment of big data processing. IEEE Softw. 33, 60–67 (2016)

  168. R. Elshawi, S. Sakr, D. Talia, P. Trunfio, Big data systems meet machine learning challenges: towards big data science as a service. Big Data Res. 14, 1–11 (2018)

  169. D. Michie, D.J. Spiegelhalter, C.C. Taylor et al., Machine Learning, Neural and Statistical Classification, vol. 13 (Ellis Horwood, London, 1994)

  170. S. Owen, Mahout in Action (Manning Publications Co., Shelter Island, 2012)

  171. X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D.B. Tsai, M. Amde, S. Owen et al., MLlib: machine learning in Apache Spark. J. Mach. Learn. Res. 17(1), 1235–1241 (2016)

  172. M. Stonebraker, P. Brown, A. Poliakov, S. Raman, The architecture of SciDB, in International Conference on Scientific and Statistical Database Management (Springer, Berlin, 2011), pp. 1–16

  173. X. Li, B. Cui, Y. Chen, W. Wu, C. Zhang, MLog: towards declarative in-database machine learning. Proc. VLDB Endowment 10(12), 1933–1936 (2017)

  174. P.G. Brown, Overview of SciDB: large scale array storage, processing and analysis, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (ACM, New York, 2010), pp. 963–968

  175. J.M. Hellerstein, C. Ré, F. Schoppmann, D.Z. Wang, E. Fratkin, A. Gorajek, K.S. Ng, C. Welton, X. Feng, K. Li et al., The MADlib analytics library: or MAD skills, the SQL. Proc. VLDB Endowment 5(12), 1700–1711 (2012)

  176. S. Das, Y. Sismanis, K.S. Beyer, R. Gemulla, P.J. Haas, J. McPherson, Ricardo: integrating R and Hadoop, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (ACM, New York, 2010), pp. 987–998

  177. S. Venkataraman, Z. Yang, D. Liu, E. Liang, H. Falaki, X. Meng, R. Xin, A. Ghodsi, M. Franklin, I. Stoica et al., SparkR: scaling R programs with Spark, in Proceedings of the 2016 International Conference on Management of Data (ACM, New York, 2016), pp. 1099–1104

  178. S. Leo, G. Zanetti, Pydoop: a Python MapReduce and HDFS API for Hadoop, in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (ACM, New York, 2010), pp. 819–825

  179. AzureML Team, AzureML: anatomy of a machine learning service, in Conference on Predictive APIs and Apps (2016), pp. 1–13

  180. B. Huang, S. Babu, J. Yang, Cumulon: optimizing statistical data analysis in the cloud, in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (ACM, New York, 2013), pp. 1–12

  181. M. Boehm, M.W. Dusenberry, D. Eriksson, A.V. Evfimievski, F.M. Manshadi, N. Pansare, B. Reinwald, F.R. Reiss, P. Sen, A.C. Surve et al., SystemML: declarative machine learning on Spark. Proc. VLDB Endowment 9(13), 1425–1436 (2016)

  182. S. Schelter, A. Palumbo, S. Quinn, S. Marthi, A. Musselman, Samsara: declarative machine learning on distributed dataflow systems, in NIPS Workshop ML Systems, 2016

  183. T. Kraska, A. Talwalkar, J.C. Duchi, R. Griffith, M.J. Franklin, M.I. Jordan, MLbase: a distributed machine-learning system, in CIDR, 2013

  184. M. Weimer, T. Condie, R. Ramakrishnan et al., Machine learning in ScalOps, a higher order cloud computing language, in NIPS 2011 Workshop on Parallel and Large-Scale Machine Learning (BigLearn), vol. 9 (2011), pp. 389–396

  185. V. Borkar, M. Carey, R. Grover, N. Onose, R. Vernica, Hyracks: a flexible and extensible foundation for data-intensive computing, in 2011 IEEE 27th International Conference on Data Engineering (IEEE, Piscataway, 2011), pp. 1151–1162

  186. E.R. Sparks, S. Venkataraman, T. Kaftan, M.J. Franklin, B. Recht, KeystoneML: optimizing pipelines for large-scale advanced analytics, in 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (IEEE, Piscataway, 2017), pp. 535–546

  187. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)

  188. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016)

  189. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105

  190. R. Collobert et al., Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)

  191. G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

  192. Y. Bengio et al., Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

  193. M. Abadi et al., TensorFlow: a system for large-scale machine learning, in OSDI, vol. 16 (2016), pp. 265–283

  194. D. Baylor, E. Breck, H.-T. Cheng, N. Fiedel, C.Y. Foo, Z. Haque, S. Haykal, M. Ispir, V. Jain, L. Koc et al., TFX: a TensorFlow-based production-scale machine learning platform, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, 2017), pp. 1387–1395

  195. J. Bergstra et al., Theano: a CPU and GPU math compiler in Python, in Proceedings of 9th Python in Science Conference, vol. 1, 2010

  196. T. Chen et al., MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems (2015). Preprint. arXiv:1512.01274

  197. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch (2017)

  198. S. Tokui, K. Oono, S. Hido, J. Clayton, Chainer: a next-generation open source framework for deep learning, in NIPS Workshops, 2015

  199. S. Lohr, The age of big data. New York Times, 11, 2012

  200. V. Mayer-Schönberger, K. Cukier, Big Data: A Revolution that Will Transform How We Live, Work, and Think (Houghton Mifflin Harcourt, Boston, 2013)

  201. H.E. Schaffer, X as a service, cloud computing, and the need for good judgment. IT Prof. 11(5), 4–5 (2009)

  202. D. Delen, H. Demirkan, Data, information and analytics as services. Decis. Support Syst. 55(1), 359–363 (2013)

  203. M. Baker, Data science: industry allure. Nature 520, 253–255 (2015)

  204. F. Provost, T. Fawcett, Data science and its relationship to big data and data-driven decision making. Big Data 1(1), 51–59 (2013)

  205. A. Labrinidis, H.V. Jagadish, Challenges and opportunities with big data. Proc. VLDB Endowment 5(12), 2032–2033 (2012)

  206. H.V. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J.M. Patel, R. Ramakrishnan, C. Shahabi, Big data and its technical challenges. Commun. ACM 57(7), 86–94 (2014)

  207. D. Abadi, S. Babu, F. Ozcan, I. Pandis, SQL-on-Hadoop systems. Proc. VLDB Endowment 8(12), 2050–2061 (2015)

  208. S. Sakr, S. Elnikety, Y. He, G-SPARQL: a hybrid engine for querying large attributed graphs, in CIKM (2012), pp. 335–344

  209. Y. Guo, A.L. Varbanescu, A. Iosup, C. Martella, T.L. Willke, Benchmarking graph-processing platforms: a vision, in ICPE, 2014

  210. A. Barnawi, O. Batarfi, S.-M.-R. Beheshti, R. El Shawi, A.G. Fayoumi, R. Nouri, S. Sakr, On characterizing the performance of distributed graph computation platforms, in TPCTC, 2014

  211. O. Batarfi, R. El Shawi, A.G. Fayoumi, R. Nouri, S.-M.-R. Beheshti, A. Barnawi, S. Sakr, Large scale graph processing systems: survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)

  212. M. Han, K. Daudjee, K. Ammar, M. Tamer Özsu, X. Wang, T. Jin, An experimental comparison of Pregel-like graph processing systems. Proc. VLDB Endowment 7(12), 1047–1058 (2014)

  213. Y. Lu, J. Cheng, D. Yan, H. Wu, Large-scale distributed graph computing systems: an experimental evaluation. Proc. VLDB Endowment 8(3), 281–292 (2014)

  214. Y. Guo, M. Biczak, A.L. Varbanescu, A. Iosup, C. Martella, T.L. Willke, How well do graph-processing platforms perform? An empirical performance evaluation and analysis, in IPDPS, 2014

  215. M. Li, J. Tan, Y. Wang, L. Zhang, V. Salapura, SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark, in Proceedings of the 12th ACM International Conference on Computing Frontiers, CF’15, Ischia, 18–21 May 2015, pp. 53:1–53:8

  216. M. Capota, T. Hegeman, A. Iosup, A. Prat-Pérez, O. Erling, P.A. Boncz, Graphalytics: a big data benchmark for graph-processing platforms, in Proceedings of the Third International Workshop on Graph Data Management Experiences and Systems, GRADES 2015, Melbourne, 31 May–4 June 2015, pp. 7:1–7:6

  217. O. Batarfi, R. El Shawi, A.G. Fayoumi, R. Nouri, A. Barnawi, S. Sakr et al., Large scale graph processing systems: survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)

  218. V. Aluko, S. Sakr, Big SQL systems: an experimental evaluation. Clust. Comput. 22(4), 1347–1377 (2019)

  219. N. Mahmoud, Y. Essam, R. El Shawi, S. Sakr, DLBench: an experimental evaluation of deep learning frameworks, in 2019 IEEE International Congress on Big Data, BigData Congress 2019, Milan, 8–13 July 2019, pp. 149–156

  220. E. Shahverdi, A. Awad, S. Sakr, Big stream processing systems: an experimental evaluation, in 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW) (IEEE, Piscataway, 2019), pp. 53–60

  221. I. Gog, M. Schwarzkopf, N. Crooks, M.P. Grosvenor, A. Clement, S. Hand, Musketeer: all for one, one for all in data processing systems, in EuroSys (2015), pp. 2:1–2:16

  222. D. Agrawal, M. Lamine Ba, L. Berti-Equille, S. Chawla, A. Elmagarmid, H. Hammady, Y. Idris, Z. Kaoudi, Z. Khayyat, S. Kruse, M. Ouzzani, P. Papotti, J.-A. Quiané-Ruiz, N. Tang, M.J. Zaki, Rheem: enabling multi-platform task execution, in SIGMOD Conference, 2016

  223. N. Huijboom, T. Van den Broek, Open data: an international comparison of strategies. Eur. J. ePractice 12(1), 4–16 (2011)

  224. M. Balazinska, B. Howe, D. Suciu, Data markets in the cloud: an opportunity for the database community. Proc. VLDB Endowment 4(12), 1482–1485 (2011)

  225. R. El Shawi, M. Maher, S. Sakr, Automated machine learning: state-of-the-art and open challenges (2019). CoRR, abs/1906.02287

  226. H. Miao, A. Li, L.S. Davis, A. Deshpande, ModelHub: deep learning lifecycle management, in 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (IEEE, Piscataway, 2017), pp. 1393–1394

  227. M. Vartak, H. Subramanyam, W.-E. Lee, S. Viswanathan, S. Husnoo, S. Madden, M. Zaharia, ModelDB: a system for machine learning model management, in Proceedings of the Workshop on Human-In-the-Loop Data Analytics (ACM, New York, 2016), p. 14

  228. P. Bailis, K. Olukotun, C. Ré, M. Zaharia, Infrastructure for usable machine learning: the Stanford DAWN Project (2017). Preprint. arXiv:1705.07538

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sakr, S. (2020). Large-Scale Processing Systems of Structured Data. In: Big Data 2.0 Processing Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-44187-6_3

  • DOI: https://doi.org/10.1007/978-3-030-44187-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44186-9

  • Online ISBN: 978-3-030-44187-6

  • eBook Packages: Computer Science, Computer Science (R0)