research-article

Using Paxos to build a scalable, consistent, and highly available datastore

Published: 01 January 2011

Abstract

Spinnaker is an experimental datastore designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads. This paper describes Spinnaker's Paxos-based replication protocol. The use of Paxos ensures that a data partition in Spinnaker remains available for reads and writes as long as a majority of its replicas are alive. Unlike traditional master-slave replication, this holds regardless of the order in which failures occur. We show that Paxos replication can be competitive with alternatives that provide weaker consistency guarantees: compared to an eventually consistent datastore, Spinnaker can be as fast as, or even faster than, that datastore on reads, and only 5% to 10% slower on writes.
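
The partitioning and availability rules in the abstract can be made concrete with a small model. The Python sketch below assumes a sorted list of split keys that cuts the key space into ranges, and it places each range on three nodes. The consecutive-node placement is a hypothetical choice for illustration, not necessarily Spinnaker's scheme. A range is treated as available for reads and writes exactly when a strict majority of its replicas is alive. The read-routing rule (strong reads served by the leader, timeline reads by any live replica) is also an assumption, and all names (`Partition`, `build_partitions`, `lookup`, `is_available`, `route_read`) are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass

REPLICATION_FACTOR = 3  # from the abstract: 3-way replication


@dataclass
class Partition:
    """A key range [low, high) and the nodes holding its replicas."""
    low: str
    high: str | None  # None means the range is unbounded above
    replicas: list[str]


def build_partitions(split_keys: list[str], nodes: list[str]) -> list[Partition]:
    """Cut the key space at each split key and give every range 3 replicas.

    Hypothetical placement: range i is stored on nodes i, i+1, i+2 (mod n).
    """
    if len(nodes) < REPLICATION_FACTOR:
        raise ValueError("3-way replication needs at least 3 nodes")
    lows = [""] + sorted(split_keys)  # "" sorts before every other string
    parts = []
    for i, low in enumerate(lows):
        high = lows[i + 1] if i + 1 < len(lows) else None
        replicas = [nodes[(i + k) % len(nodes)] for k in range(REPLICATION_FACTOR)]
        parts.append(Partition(low, high, replicas))
    return parts


def lookup(parts: list[Partition], key: str) -> Partition:
    """Return the range containing key via binary search on lower bounds."""
    lows = [p.low for p in parts]
    return parts[bisect_right(lows, key) - 1]


def is_available(part: Partition, alive: set[str]) -> bool:
    """Majority rule: the range serves reads and writes iff more than half its replicas are up."""
    live = sum(1 for r in part.replicas if r in alive)
    return live > len(part.replicas) // 2


def route_read(part: Partition, alive: set[str], leader: str, consistency: str) -> str:
    """Hypothetical routing: strong reads go to the leader, timeline reads to any live replica."""
    if consistency == "strong":
        # A leader can only make progress while a majority of the range's replicas is alive.
        if leader not in alive or not is_available(part, alive):
            raise RuntimeError("strong read unavailable: no leader backed by a live majority")
        return leader
    if consistency == "timeline":
        # May return data that lags the leader; no quorum is consulted here.
        for r in part.replicas:
            if r in alive:
                return r
        raise RuntimeError("timeline read unavailable: no live replica")
    raise ValueError(f"unknown consistency level: {consistency!r}")


if __name__ == "__main__":
    parts = build_partitions(["g", "m", "t"], ["n1", "n2", "n3", "n4", "n5"])
    p = lookup(parts, "kiwi")  # lands in ["g", "m") on n2, n3, n4
    print(p)
    print(is_available(p, {"n2", "n4"}), is_available(p, {"n3", "n5"}))
    print(route_read(p, {"n3", "n4"}, leader="n2", consistency="timeline"))
```

Because `is_available` looks only at the current set of live replicas, its answer does not depend on the order in which nodes failed. This is the property the abstract contrasts with traditional master-slave replication.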



Published in: Proceedings of the VLDB Endowment, Volume 4, Issue 4, January 2011 (59 pages)
Publisher: VLDB Endowment
