Load Balancing and Fault Tolerance Mechanisms for Scalable and Reliable Big Data Analytics

Sukhija, Nitin; Morari, Alessandro; Banicescu, Ioana

doi:10.1007/978-3-319-44881-7_10

Chapter
First Online: 28 October 2016

Part of the book series: Computer Communications and Networks (CCN)

Abstract

Data collection and analysis are rapidly changing the way scientific, national security, and business communities operate. Data analytics applications, especially those involving graph analytics, have received increased attention over the years. With this growing interest in graph processing, the diversity of graph datasets and graph processing algorithms has also increased. There has been a similar explosion in the design and development of big data platforms to manage, store, process, and analyze large-scale graph datasets. Although these platforms have been widely successful, choosing a platform for deploying big data applications remains difficult because there is no comprehensive understanding of their performance and design tradeoffs in handling both real-world workloads and resource failures. In this chapter, we survey the load balancing and fault tolerance strategies employed by the most dominant graph database platforms.


Author information

Authors and Affiliations

Department of Computer Science, Slippery Rock University of Pennsylvania, 275 Advanced Technology & Science Hall, Slippery Rock University, Slippery Rock, Pennsylvania, 16057, USA
Nitin Sukhija
Pacific Northwest National Laboratory, Richland, USA
Alessandro Morari
Mississippi State University, Starkville, USA
Ioana Banicescu

Corresponding author

Correspondence to Nitin Sukhija.

Editor information

Editors and Affiliations

University Politehnica of Bucharest, Bucharest, Romania
Florin Pop
Cracow University of Technology, Cracow, Poland
Joanna Kołodziej
Second University of Naples, Naples, Caserta, Italy
Beniamino Di Martino

About this chapter

Cite this chapter

Sukhija, N., Morari, A., Banicescu, I. (2016). Load Balancing and Fault Tolerance Mechanisms for Scalable and Reliable Big Data Analytics. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_10

DOI: https://doi.org/10.1007/978-3-319-44881-7_10
Published: 28 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science, Computer Science (R0)
