Large-Scale Processing Systems of Structured Data

Sakr, Sherif

doi:10.1007/978-3-030-44187-6_3

Abstract

In practice, the Hadoop framework is widely acknowledged to be a poor fit for interactive queries that aim for response times of milliseconds to a few seconds. In addition, many programmers may be unfamiliar with the Hadoop framework and would prefer to use SQL as a high-level declarative language to implement their jobs, delegating all optimization details of the execution process to the underlying engine. This chapter provides an overview of systems that have been introduced to support SQL-like querying on top of Hadoop-like infrastructure and to deliver competitive, scalable performance when processing large-scale structured data.

I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016)
MATH Google Scholar
A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105
Google Scholar
R. Collobert et al., Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
MATH Google Scholar
G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Google Scholar
Y. Bengio et al., Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
MATH Google Scholar
M. Abadi et al., TensorFlow: a system for large-scale machine learning, in OSDI, vol. 16 (2016), pp. 265–283
Google Scholar
D. Baylor, E. Breck, H.-T. Cheng, N. Fiedel, C.Y. Foo, Z. Haque, S. Haykal, M. Ispir, V. Jain, L. Koc et al., TFX: a TensorFlow-based production-scale machine learning platform, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, 2017), pp. 1387–1395
Google Scholar
J. Bergstra et al., Theano: a CPU and GPU math compiler in Python, in Proceedings of 9th Python in Science Conference, vol. 1, 2010
Google Scholar
T. Chen et al., MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems (2015). Preprint. arXiv:1512.01274
Google Scholar
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch (2017)
Google Scholar
S. Tokui, K. Oono, S. Hido, J. Clayton, Chainer: a next-generation open source framework for deep learning, in NIPS Workshops, 2015
Google Scholar
S. Lohr, The age of big data. New York Times, 11, 2012
Google Scholar
V. Mayer-Schönberger, K. Cukier, Big Data: A Revolution that Will Transform How We Live, Work, and Think (Houghton Mifflin Harcourt, Boston, 2013)
Google Scholar
H.E. Schaffer, X as a service, cloud computing, and the need for good judgment. IT Prof. 11(5), 4–5 (2009)
Google Scholar
D. Delen, H. Demirkan, Data, information and analytics as services. Decis. Support Syst. 55(1), 359–363 (2013)
Google Scholar
M. Baker, Data science: industry allure. Nature 520, 253–255 (2015)
Google Scholar
F. Provost, T. Fawcett, Data science and its relationship to big data and data-driven decision making. Big Data 1(1), 51–59 (2013)
Google Scholar
A. Labrinidis, H.V. Jagadish, Challenges and opportunities with big data. Proc. VLDB Endowment 5(12), 2032–2033 (2012)
Google Scholar
H.V. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J.M. Patel, R. Ramakrishnan, C. Shahabi, Big data and its technical challenges. Commun. ACM 57(7), 86–94 (2014)
Google Scholar
D. Abadi, S. Babu, F. Ozcan, I. Pandis, SQL-on-Hadoop systems. Proc. VLDB Endowment 8(12), 2050–2061 (2015)
Google Scholar
S. Sakr, S. Elnikety, Y. He, G-SPARQL: a hybrid engine for querying large attributed graphs, in CIKM (2012), pp. 335–344
Google Scholar
Y. Guo, A.L. Varbanescu, A. Iosup, C. Martella, T.L. Willke, Benchmarking graph-processing platforms: a vision, in ICPE, 2014
Google Scholar
A. Barnawi, O. Batarfi, S.-M.-R. Beheshti, R. El Shawi, A.G. Fayoumi, R. Nouri, S. Sakr, On characterizing the performance of distributed graph computation platforms, in TPCTC, 2014
Google Scholar
O. Batarfi, R. El Shawi, A.G. Fayoumi, R. Nouri, S.-M.-R. Beheshti, A. Barnawi, S. Sakr, Large scale graph processing systems: survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)
Google Scholar
M. Han, K. Daudjee, K. Ammar, M. Tamer Özsu, X. Wang, T. Jin, An experimental comparison of Pregel-like graph processing systems. Proc. VLDB Endowment 7(12), 1047–1058 (2014)
Google Scholar
Y. Lu, J. Cheng, D. Yan, H. Wu, Large-scale distributed graph computing systems: an experimental evaluation. Proc. VLDB Endowment 8(3), 281–292 (2014)
Google Scholar
Y. Guo, M. Biczak, A.L. Varbanescu, A. Iosup, C. Martella, T.L. Willke, How well do graph-processing platforms perform? An empirical performance evaluation and analysis, in IPDPS, 2014
Google Scholar
M. Li, J. Tan, Y. Wang, L. Zhang, V. Salapura, SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark, in Proceedings of the 12th ACM International Conference on Computing Frontiers, CF’15, Ischia, 18–21 May 2015, pp. 53:1–53:8
Google Scholar
M. Capota, T. Hegeman, A. Iosup, A. Prat-Pérez, O. Erling, P.A. Boncz, Graphalytics: a big data benchmark for graph-processing platforms, in Proceedings of the Third International Workshop on Graph Data Management Experiences and Systems, GRADES 2015, Melbourne, 31 May–4 June 2015, pp. 7:1–7:6
Google Scholar
O. Batarfi, R. El Shawi, A.G. Fayoumi, R. Nouri, A. Barnawi, S. Sakr et al., Large scale graph processing systems: survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)
Google Scholar
V. Aluko, S. Sakr, Big SQL systems: an experimental evaluation. Clust. Comput. 22(4), 1347–1377 (2019)
Google Scholar
N. Mahmoud, Y. Essam, R. El Shawi, S. Sakr, DLBench: an experimental evaluation of deep learning frameworks, in 2019 IEEE International Congress on Big Data, BigData Congress 2019, Milan, 8–13 July 2019, pp. 149–156
Google Scholar
E. Shahverdi, A. Awad, S. Sakr, Big stream processing systems: an experimental evaluation, in 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW) (IEEE, Piscataway, 2019), pp. 53–60
Google Scholar
I. Gog, M. Schwarzkopf, N. Crooks, M.P. Grosvenor, A. Clement, S. Hand, Musketeer: all for one, one for all in data processing systems, in EuroSys (2015), pp. 2:1–2:16
Google Scholar
D. Agrawal, M. Lamine Ba, L. Berti-Equille, S. Chawla, A. Elmagarmid, H. Hammady, Y. Idris, Z. Kaoudi, Z. Khayyat, S. Kruse, M. Ouzzani, P. Papotti, J.-A. Quian-Ruiz, N. Tang, M.J. Zaki, Rheem: enabling multi-platform task execution, in SIGMOD Conference, 2016
Google Scholar
N. Huijboom, T. Van den Broek, Open data: an international comparison of strategies. Eur. J. ePractice 12(1), 4–16 (2011)
Google Scholar
M. Balazinska, B. Howe, D. Suciu, Data markets in the cloud: an opportunity for the database community. Proc. VLDB Endowment 4(12), 1482–1485 (2011)
Google Scholar
R. El Shawi, M. Maher, S. Sakr, Automated machine learning: state-of-the-art and open challenges (2019). CoRR, abs/1906.02287
Google Scholar
H. Miao, A. Li, L.S. Davis, A. Deshpande, ModelHub: deep learning lifecycle management, in 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (IEEE, Piscataway, 2017), pp. 1393–1394
Google Scholar
M. Vartak, H. Subramanyam, W.-E. Lee, S. Viswanathan, S. Husnoo, S. Madden, M. Zaharia, Model DB: a system for machine learning model management, in Proceedings of the Workshop on Human-In-the-Loop Data Analytics (ACM, New York, 2016), p. 14
Google Scholar
P. Bailis, K. Olukotun, C. Ré, M. Zaharia, Infrastructure for usable machine learning: the Stanford DAWN Project (2017). Preprint. arXiv:1705.07538
Google Scholar

Author information

Authors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr

About this chapter

Cite this chapter

Sakr, S. (2020). Large-Scale Processing Systems of Structured Data. In: Big Data 2.0 Processing Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-44187-6_3

DOI: https://doi.org/10.1007/978-3-030-44187-6_3
Published: 10 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44186-9
Online ISBN: 978-3-030-44187-6
eBook Packages: Computer Science, Computer Science (R0)
