ABSTRACT
Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high-throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.
Index Terms
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases