ABSTRACT
The current exponential growth of data calls for massive-scale storage and processing capabilities. Such large volumes of data tend to rule out centralized storage and processing, making extensive and flexible data partitioning unavoidable. This is acknowledged by several major Internet players, which are embracing the Cloud computing model and offering first-generation remote storage services with simple processing capabilities.
In this position paper we present preliminary ideas for the architecture of a flexible, efficient, and dependable fully decentralized object store, able to manage very large sets of variable-size objects and to coordinate in-place processing. Our target is large local-area computing facilities composed of tens of thousands of nodes under the same administrative domain. The system should be capable of leveraging massive replication of data to balance read scalability and fault tolerance.