ABSTRACT
We introduce "asynchronized concurrency (ASCY)," a paradigm consisting of four complementary programming patterns. ASCY calls for the design of concurrent search data structures (CSDSs) to resemble that of their sequential counterparts. We argue that ASCY leads to implementations that are portably scalable: they scale across different types of hardware platforms, including single- and multi-socket ones, for various classes of workloads, such as read-only and read-write, and according to different performance metrics, including throughput, latency, and energy. We substantiate our thesis through the most exhaustive evaluation of CSDSs to date, involving 6 platforms, 22 state-of-the-art CSDS algorithms, 10 re-engineered state-of-the-art CSDS algorithms following the ASCY patterns, and 2 new CSDS algorithms designed with ASCY in mind. We observe throughput improvements of up to 30% in the re-engineered algorithms, while our new algorithms outperform the state-of-the-art alternatives.
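The idea that a concurrent search data structure should "resemble its sequential counterpart" is easiest to see on a linked-list set. The C sketch below is our own illustration, not code from the paper: it adapts the well-known lazy linked list of Heller et al. so that a search is the plain sequential loop (no locks, stores, waiting, or retries), the parse phase of an update is an unsynchronized traversal, unsuccessful updates return without acquiring any lock, and a successful update locks only the two adjacent nodes and then performs the pointer store a sequential insert or delete would perform (a remove additionally sets a deletion mark). All names (`lazy_list_t`, `list_insert`, `parse`, and so on) are hypothetical. The sketch assumes POSIX threads and C11 atomics, requires keys to lie strictly between `INTPTR_MIN` and `INTPTR_MAX` (the sentinel values), and deliberately leaks removed nodes because safe memory reclamation is out of scope.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lazy-list node. next and marked are atomic because searches read them without locks. */
typedef struct node {
    intptr_t key;                  /* never modified after the node is published */
    _Atomic(struct node *) next;   /* written only while holding this node's lock */
    atomic_bool marked;            /* set (never cleared) when the node is logically deleted */
    pthread_mutex_t lock;
} node_t;
typedef struct { node_t *head; } lazy_list_t;

static node_t *node_new(intptr_t key, node_t *next) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = key;
    atomic_init(&n->next, next);
    atomic_init(&n->marked, false);
    pthread_mutex_init(&n->lock, NULL);
    return n;
}

/* Head and tail sentinels; user keys must lie strictly between INTPTR_MIN and INTPTR_MAX. */
void list_init(lazy_list_t *l) {
    l->head = node_new(INTPTR_MIN, node_new(INTPTR_MAX, NULL));
}

/* Search: the sequential loop plus one flag check; no locks, stores, waiting, or retries. */
bool list_contains(lazy_list_t *l, intptr_t key) {
    assert(key > INTPTR_MIN && key < INTPTR_MAX);
    node_t *curr = l->head;
    while (curr->key < key)
        curr = atomic_load(&curr->next);
    return curr->key == key && !atomic_load(&curr->marked);
}

/* Parse phase of an update: an unsynchronized traversal, again as in sequential code. */
static void parse(lazy_list_t *l, intptr_t key, node_t **pred, node_t **curr) {
    node_t *p = l->head;
    node_t *c = atomic_load(&p->next);
    for (; c->key < key; c = atomic_load(&c->next))
        p = c;
    *pred = p;
    *curr = c;
}

/* With pred and curr locked: are both still live and still adjacent? */
static bool validate(node_t *pred, node_t *curr) {
    return !atomic_load(&pred->marked) && !atomic_load(&curr->marked) &&
           atomic_load(&pred->next) == curr;
}

bool list_insert(lazy_list_t *l, intptr_t key) {
    assert(key > INTPTR_MIN && key < INTPTR_MAX);
    for (;;) {
        node_t *pred, *curr;
        parse(l, key, &pred, &curr);
        if (curr->key == key && !atomic_load(&curr->marked))
            return false;                   /* unsuccessful update: no locks, no stores */
        pthread_mutex_lock(&pred->lock);    /* locks taken in list order: no deadlock */
        pthread_mutex_lock(&curr->lock);
        /* A marked curr with this key fails validation, so on success curr->key != key. */
        bool ok = validate(pred, curr);
        if (ok)                             /* one pointer store links the node, as in sequential code */
            atomic_store(&pred->next, node_new(key, curr));
        pthread_mutex_unlock(&curr->lock);
        pthread_mutex_unlock(&pred->lock);
        if (ok)
            return true;                    /* otherwise a concurrent update interfered: retry */
    }
}

bool list_remove(lazy_list_t *l, intptr_t key) {
    assert(key > INTPTR_MIN && key < INTPTR_MAX);
    for (;;) {
        node_t *pred, *curr;
        parse(l, key, &pred, &curr);
        if (curr->key != key || atomic_load(&curr->marked))
            return false;                   /* unsuccessful update: no locks, no stores */
        pthread_mutex_lock(&pred->lock);
        pthread_mutex_lock(&curr->lock);
        bool ok = validate(pred, curr);
        if (ok) {
            atomic_store(&curr->marked, true);                    /* logical deletion */
            atomic_store(&pred->next, atomic_load(&curr->next));  /* physical unlink */
            /* curr is leaked on purpose: freeing it safely needs a reclamation scheme. */
        }
        pthread_mutex_unlock(&curr->lock);
        pthread_mutex_unlock(&pred->lock);
        if (ok)
            return true;
    }
}
```

For example, after `list_init(&l)`, a call `list_insert(&l, 5)` returns true, a second identical call returns false without taking any lock, and `list_contains(&l, 5)` then returns true. Because searches and unsuccessful updates in this sketch never write shared memory, they do not invalidate cache lines held by other cores; this is one intuition, not a result quoted from the paper, for why such sequential-looking designs can scale across platforms.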
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
ASPLOS '15