Using Paxos to build a scalable, consistent, and highly available datastore

Authors:
Jun Rao, LinkedIn Corporation
Eugene J. Shekita, IBM Almaden Research Center
Sandeep Tata, IBM Almaden Research Center

Proceedings of the VLDB Endowment, Volume 4, Issue 4, pp. 243–254. https://doi.org/10.14778/1938545.1938549

Published: 01 January 2011

Abstract

Spinnaker is an experimental datastore that is designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads. This paper describes Spinnaker's Paxos-based replication protocol. The use of Paxos ensures that a data partition in Spinnaker will be available for reads and writes as long as a majority of its replicas are alive. Unlike traditional master-slave replication, this is true regardless of the failure sequence that occurs. We show that Paxos replication can be competitive with alternatives that provide weaker consistency guarantees. Compared to an eventually consistent datastore, we show that Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.
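
The availability condition stated in the abstract (a partition stays available while a majority of its replicas are alive) can be illustrated with a minimal sketch. This is not Spinnaker's code or its replication protocol; the function names `majority` and `partition_available` and the node names `A`, `B`, `C` are hypothetical, chosen only to show the majority arithmetic for 3-way replication.

```python
def majority(n_replicas):
    """Smallest number of replicas that forms a strict majority."""
    return n_replicas // 2 + 1


def partition_available(alive, replicas):
    """Return True while a majority of the partition's replicas are alive.

    `alive` is the set of nodes currently up; `replicas` is the set of
    nodes holding a copy of the partition (both are assumptions of this
    sketch, not names from the paper).
    """
    return len(alive & replicas) >= majority(len(replicas))


# Hypothetical node names; 3-way replication as described in the abstract.
replicas = {"A", "B", "C"}

one_failure = partition_available({"A", "B"}, replicas)   # 2 of 3 alive
two_failures = partition_available({"C"}, replicas)       # 1 of 3 alive
print(one_failure, two_failures)
```

With three replicas the majority is two, so the partition tolerates any single replica failure but not two simultaneous failures.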

Published in

Proceedings of the VLDB Endowment, Volume 4, Issue 4
January 2011
59 pages
ISSN: 2150-8097
Publisher: VLDB Endowment