Abstract
Data collection and analysis are rapidly changing the way scientific, national security, and business communities operate. Data analytics applications, especially those involving graph analytics, have received increasing attention over the years. With this growing interest in graph processing, the diversity of graph datasets and graph processing algorithms has also increased. There has been a similar explosion in the design and development of big data platforms to manage, store, process, and analyze large-scale graph datasets. Although these platforms have achieved undeniable success, it is currently difficult to choose a platform for deploying big data applications. This is because their performance and design tradeoffs are not comprehensively understood, both under real-world workloads and in the presence of resource failures. In this chapter, we survey the load balancing and fault tolerance strategies employed by the most dominant graph database platforms.
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Sukhija, N., Morari, A., Banicescu, I. (2016). Load Balancing and Fault Tolerance Mechanisms for Scalable and Reliable Big Data Analytics. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_10
DOI: https://doi.org/10.1007/978-3-319-44881-7_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science, Computer Science (R0)