ABSTRACT
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for software, therefore, has to address them.
We investigate the mixed-size behaviour of ARMv8 and IBM POWER architectures and implementations: by experiment, by developing semantic models, by testing the correspondence between these, and by discussion with ARM and IBM staff. This turns out to be surprisingly subtle, and on the way we have to revisit the fundamental concepts of coherence and sequential consistency, which change in this setting. In particular, we show that adding a memory barrier between every pair of instructions does not restore sequential consistency. We go on to extend the C/C++11 model to support non-atomic mixed-size memory accesses.
This is a necessary step towards semantics for real-world shared-memory concurrent code, beyond litmus tests.
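To make the notion of a mixed-size access concrete, the following is a minimal single-threaded sketch in Python; it is an illustration only and is not one of the models developed in the paper. The class `TracedMemory` and the store labels `a` and `b` are hypothetical names introduced here. The sketch records which store last wrote each byte, so that a 4-byte load overlapping two 2-byte stores can be seen to read data from more than one store, which is the case earlier semantics did not consider.

```python
import struct


class TracedMemory:
    """Byte-addressed memory that records which store last wrote each byte.

    Hypothetical helper for illustration; it models no real architecture.
    """

    def __init__(self, size):
        self.data = bytearray(size)
        self.writer = [None] * size  # store label per byte; None = initial state

    def store(self, label, addr, fmt, value):
        """Write `value` at `addr` using struct format `fmt`, tagged with `label`."""
        struct.pack_into(fmt, self.data, addr, value)
        for i in range(addr, addr + struct.calcsize(fmt)):
            self.writer[i] = label

    def load(self, addr, fmt):
        """Return the loaded value and the label of the store behind each byte read."""
        (value,) = struct.unpack_from(fmt, self.data, addr)
        sources = self.writer[addr:addr + struct.calcsize(fmt)]
        return value, sources


mem = TracedMemory(4)
mem.store("a", 0, "<H", 0x1111)     # 2-byte little-endian store to bytes 0..1
mem.store("b", 2, "<H", 0x2222)     # 2-byte little-endian store to bytes 2..3
value, sources = mem.load(0, "<I")  # 4-byte load covering both stores
print(hex(value), sources)          # prints: 0x22221111 ['a', 'a', 'b', 'b']
```

The explicit little-endian formats (`<H`, `<I`) keep the result independent of the host machine. The sketch is sequential, so it shows only that a single load can be satisfied by several stores; it says nothing about which combinations a relaxed concurrent architecture allows.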