Abstract
Spinnaker is an experimental datastore that is designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads. This paper describes Spinnaker's Paxos-based replication protocol. The use of Paxos ensures that a data partition in Spinnaker will be available for reads and writes as long as a majority of its replicas are alive. Unlike traditional master-slave replication, this is true regardless of the failure sequence that occurs. We show that Paxos replication can be competitive with alternatives that provide weaker consistency guarantees. Compared to an eventually consistent datastore, we show that Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.
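The majority condition stated in the abstract can be illustrated with a minimal sketch. This is not Spinnaker's code: the node names and the helper functions `majority` and `partition_available` are hypothetical, and the sketch models only the availability rule (a majority of the three replicas must be alive), not the Paxos protocol itself.

```python
from itertools import combinations

REPLICATION_FACTOR = 3  # from the abstract: 3-way replication


def majority(n):
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def partition_available(alive, replicas):
    """True if a majority of the partition's replicas are alive (hypothetical helper)."""
    return len(alive & replicas) >= majority(len(replicas))


# Hypothetical node names for one partition's replica group.
replicas = {"a", "b", "c"}
assert len(replicas) == REPLICATION_FACTOR

# Enumerate every possible set of failed nodes and record whether the
# partition would still accept reads and writes under the majority rule.
availability = {
    frozenset(failed): partition_available(replicas - set(failed), replicas)
    for k in range(len(replicas) + 1)
    for failed in combinations(sorted(replicas), k)
}

for failed, ok in sorted(availability.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(failed), "available" if ok else "unavailable")
```

Under this rule the outcome depends only on which replicas are currently alive, not on the order in which the others failed, which is the property the abstract contrasts with master-slave replication.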
Index Terms
- Using Paxos to build a scalable, consistent, and highly available datastore