DOI: 10.1145/2485922.2485926
research-article

Thin servers with smart pipes: designing SoC accelerators for memcached

Published: 23 June 2013

ABSTRACT

Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures. We undertake a detailed characterization of memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully designed benchmarks that exercise the range of memcached behavior. We discover that, regardless of CPU microarchitecture, memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find performance is typically limited by the per-packet processing overheads in the NIC and OS kernel: long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks.

Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture, Thin Servers with Smart Pipes (TSSP), for cost-effective high-performance memcached deployment. TSSP couples an embedded-class low-power core to a memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data lookup. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.
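The division of labor described above can be illustrated with a small software model. This is only a sketch and not the authors' hardware design: the class `ThinServerModel`, its method names, and its counters are hypothetical, and answering GET misses on the fast path is an assumption made here for simplicity. It uses the memcached ASCII protocol, routes `get` requests to a modeled accelerator path, and hands everything else to a modeled general-purpose core.

```python
# Toy software model of a GET fast path. All names are hypothetical and
# this is not the hardware design from the paper.
from dataclasses import dataclass, field


@dataclass
class ThinServerModel:
    store: dict = field(default_factory=dict)  # key -> (flags, value bytes)
    fast_path_count: int = 0  # requests answered by the modeled accelerator
    slow_path_count: int = 0  # requests handed to the modeled core

    def handle(self, request: bytes) -> bytes:
        """Dispatch one memcached ASCII-protocol request."""
        line, _, rest = request.partition(b"\r\n")
        parts = line.split()
        if len(parts) == 2 and parts[0] == b"get":
            # Single-key GET: served without involving the core.
            self.fast_path_count += 1
            return self._accelerator_get(parts[1])
        # Everything else falls back to software on the core.
        self.slow_path_count += 1
        return self._core_handle(parts, rest)

    def _accelerator_get(self, key: bytes) -> bytes:
        """Look the key up and format the reply (assumed fast path)."""
        entry = self.store.get(key)
        if entry is None:
            return b"END\r\n"  # miss: empty result set
        flags, value = entry
        header = b"VALUE %s %d %d\r\n" % (key, flags, len(value))
        return header + value + b"\r\nEND\r\n"

    def _core_handle(self, parts: list, rest: bytes) -> bytes:
        """Handle a minimal subset of non-GET commands in software."""
        if len(parts) == 5 and parts[0] == b"set":
            key, flags, nbytes = parts[1], int(parts[2]), int(parts[4])
            self.store[key] = (flags, rest[:nbytes])
            return b"STORED\r\n"
        return b"ERROR\r\n"


if __name__ == "__main__":
    server = ThinServerModel()
    print(server.handle(b"set k 0 0 5\r\nhello\r\n"))
    print(server.handle(b"get k\r\n"))
    print(server.fast_path_count, server.slow_path_count)
```

In this model the counters make the split visible: a read-heavy request mix would be served almost entirely by the fast path, which is the kind of workload where keeping the core out of the loop would matter most.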


  • Published in

    ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
    June 2013
    686 pages
    ISBN: 9781450320795
    DOI: 10.1145/2485922
    • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
      ISCA '13
      June 2013
      666 pages
      ISSN: 0163-5964
      DOI: 10.1145/2508148

    Copyright © 2013 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Acceptance Rates

    ISCA '13 Paper Acceptance Rate: 56 of 288 submissions, 19%
    Overall Acceptance Rate: 543 of 3,203 submissions, 17%
