Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Authors:
Sheng Li (Intel Labs)
Hyeontaek Lim (Carnegie Mellon University)
Victor W. Lee (Intel Labs)
Jung Ho Ahn (Seoul National University)
Anuj Kalia (Carnegie Mellon University)
Michael Kaminsky (Intel Labs)
David G. Andersen (Carnegie Mellon University)
O. Seongil (Seoul National University)
Sukhan Lee (Seoul National University)
Pradeep Dubey (Intel Labs)

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, Pages 476–488. https://doi.org/10.1145/2749469.2750416

Published: 13 June 2015


ABSTRACT

Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented datacenter infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of datacenters. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, and concurrency control. Two recent research thrusts have focused upon improving key-value performance. Hardware-centric research has started to explore specialized platforms including FPGAs for KVSs; results demonstrated an order of magnitude increase in throughput and energy efficiency over stock memcached. Software-centric research revisited the KVS application to address fundamental software bottlenecks and to exploit the full potential of modern commodity hardware; these efforts too showed orders of magnitude improvement over stock memcached.

We aim to architect high-performance and efficient KVS platforms, and start with a rigorous architectural characterization across system stacks over a collection of representative KVS implementations. Our detailed full-system characterization not only identifies the critical hardware/software ingredients for high-performance KVSs, but also leads to guided optimizations atop a recent design to achieve a record-setting throughput of 120 million requests per second (MRPS) on a single commodity server. Our implementation delivers 9.2X the performance (RPS) and 2.8X the system energy efficiency (RPS/watt) of the best-published FPGA-based claims. We craft a set of design principles for future platform architectures and, via detailed simulations, demonstrate that a single server built following these principles can achieve a billion RPS.
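
To put the abstract's headline figures in proportion, the short sketch below works through the arithmetic they imply. Only the four input numbers come from the abstract; the function and variable names are illustrative. The relative power figure additionally assumes that the 9.2X and 2.8X ratios refer to the same FPGA baseline, which the abstract does not state.

```python
# Figures quoted in the abstract.
MEASURED_MRPS = 120.0     # measured throughput on one commodity server
SPEEDUP_VS_FPGA = 9.2     # throughput relative to best-published FPGA claims
EFFICIENCY_VS_FPGA = 2.8  # RPS/watt relative to best-published FPGA claims
TARGET_MRPS = 1000.0      # one billion requests per second


def implied_baseline_mrps(measured: float, speedup: float) -> float:
    """Throughput of the FPGA baseline implied by the stated speedup."""
    return measured / speedup


def remaining_scale_factor(target: float, measured: float) -> float:
    """How many times the measured throughput must grow to reach the target."""
    return target / measured


def relative_power(speedup: float, efficiency_gain: float) -> float:
    """Power draw relative to the baseline.

    Efficiency is RPS/watt, so power ratio = throughput ratio / efficiency
    ratio. ASSUMPTION: both ratios use the same baseline system.
    """
    return speedup / efficiency_gain


baseline = implied_baseline_mrps(MEASURED_MRPS, SPEEDUP_VS_FPGA)
scale = remaining_scale_factor(TARGET_MRPS, MEASURED_MRPS)
power = relative_power(SPEEDUP_VS_FPGA, EFFICIENCY_VS_FPGA)

print(f"Implied FPGA baseline: {baseline:.1f} MRPS")
print(f"Scale-up needed for 1 billion RPS: {scale:.2f}x")
print(f"Power relative to baseline (same-baseline assumption): {power:.2f}x")
```

Read this way, the billion-RPS target corresponds to roughly an eightfold increase over the measured 120 MRPS, which the abstract attributes to simulated future platform designs rather than to the measured commodity server.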



Published in
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015, 768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469
General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S (ISCA'15)
June 2015, 745 pages
ISSN: 0163-5964
DOI: 10.1145/2872887
Editor: Doug DeGroot

Copyright © 2015 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 543 of 3,203 submissions, 17%