Big Data Storage and Data Models

Chapter in: Handbook of Big Data Technologies

Abstract

Data and storage models are the foundation of big data ecosystem stacks. A storage model captures the physical aspects and features of data storage, while a data model captures the logical representation and structures used for data processing and management. Understanding storage and data models together is essential for understanding the big data ecosystems built on top of them. In this chapter, we investigate and compare the key storage and data models across the spectrum of big data frameworks.
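The split between the two layers can be illustrated with a minimal, hypothetical Python sketch (not taken from the chapter). A toy ObjectStore stands in for a storage model: a flat namespace that maps keys to opaque bytes. A toy DocumentCollection stands in for a data model: it exposes JSON-like documents that applications query by field. The class names, the "collection/id" key scheme, and the choice of JSON serialization are all illustrative assumptions, not the API of any real system.

```python
import json
from typing import Dict, List, Optional


class ObjectStore:
    """Toy storage model (hypothetical): a flat key -> bytes namespace.

    It has no notion of what the bytes mean; it only stores and returns them.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)


class DocumentCollection:
    """Toy data model (hypothetical): a named collection of JSON-like documents.

    Applications work with dicts and field-based queries; physical
    persistence is delegated to the underlying ObjectStore.
    """

    def __init__(self, name: str, store: ObjectStore) -> None:
        self.name = name
        self.store = store
        self._ids: List[str] = []  # document ids in insertion order

    def _key(self, doc_id: str) -> str:
        # Assumed key scheme: "<collection>/<id>" in the flat namespace.
        return f"{self.name}/{doc_id}"

    def insert(self, doc_id: str, doc: dict) -> None:
        # Logical -> physical: encode the document as UTF-8 JSON bytes.
        self.store.put(self._key(doc_id), json.dumps(doc).encode("utf-8"))
        if doc_id not in self._ids:
            self._ids.append(doc_id)

    def find(self, field: str, value: object) -> List[dict]:
        # Physical -> logical: decode each stored blob, then filter by field.
        matches = []
        for doc_id in self._ids:
            raw = self.store.get(self._key(doc_id))
            if raw is None:
                continue
            doc = json.loads(raw.decode("utf-8"))
            if doc.get(field) == value:
                matches.append(doc)
        return matches


store = ObjectStore()
users = DocumentCollection("users", store)
users.insert("u1", {"name": "Ada", "city": "London"})
users.insert("u2", {"name": "Alan", "city": "Manchester"})
print(users.find("city", "London"))  # [{'name': 'Ada', 'city': 'London'}]
print(store.get("users/u1"))         # b'{"name": "Ada", "city": "London"}'
```

Because the document layer only calls put and get, the same collection could in principle be backed by a different storage model (for example a file system or a block device) without changing how applications query documents, which reflects the separation between storage and data models described above.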

Author information

Corresponding author

Correspondence to Dongyao Wu.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Wu, D., Sakr, S., Zhu, L. (2017). Big Data Storage and Data Models. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_1

  • DOI: https://doi.org/10.1007/978-3-319-49340-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49339-8

  • Online ISBN: 978-3-319-49340-4

  • eBook Packages: Computer Science, Computer Science (R0)