Abstract
The efficiency of storage systems is a key factor in ensuring the sustainability of data centers that provide cloud services. Proper management of storage infrastructures can ensure the best trade-off between costs, reliability and quality of service, enabling the provider to be competitive in the market. The heterogeneity of nodes and the need for frequent expansion and reconfiguration of the subsystems have fostered the development of efficient approaches that replace traditional data replication with more advanced techniques, such as those that leverage erasure codes. In this paper we use an ad hoc discrete event simulation approach to study the performance of replication and erasure coding under different parametric configurations, aiming to minimize overheads while obtaining the desired reliability. The approach is demonstrated with a practical application to the erasure coding plugins of the increasingly popular CEPH distributed file system.
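To make the overhead versus reliability trade-off concrete, the following is a minimal sketch and not the simulation model used in the paper. It assumes an ideal (MDS) erasure code with k data chunks and m coding chunks, independent chunk failures with probability p, and no repair; the function names are hypothetical. Replication with r copies is treated as the special case k = 1, m = r - 1.

```python
from math import comb


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for k data + m coding chunks."""
    return (k + m) / k


def loss_probability(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k + m chunks are lost.

    Assumption (illustrative only): each chunk fails independently with
    probability p and nothing is repaired in the meantime.
    """
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


def replication_as_code(copies: int) -> tuple[int, int]:
    """Express r-way replication as the (k, m) pair (1, r - 1)."""
    return 1, copies - 1


if __name__ == "__main__":
    p = 0.01  # hypothetical per-chunk failure probability
    configs = {
        "3-way replication": replication_as_code(3),
        "erasure code k=4, m=2": (4, 2),
    }
    for name, (k, m) in configs.items():
        print(name, storage_overhead(k, m), loss_probability(k, m, p))
```

Under these assumptions both configurations tolerate the loss of two chunks, but the erasure-coded one stores 1.5 bytes per user byte instead of 3. A discrete event simulation is needed once repair times, correlated failures and node heterogeneity are taken into account.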
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Manini, D., Gribaudo, M., Iacono, M. (2016). Modeling Replication and Erasure Coding in Large Scale Distributed Storage Systems Based on CEPH. In: Caporarello, L., Cesaroni, F., Giesecke, R., Missikoff, M. (eds) Digitally Supported Innovation. Lecture Notes in Information Systems and Organisation, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-319-40265-9_20
DOI: https://doi.org/10.1007/978-3-319-40265-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40264-2
Online ISBN: 978-3-319-40265-9
eBook Packages: Business and Management (R0)