research-article

Open Access

Automated Classification of Data Races Under Both Strong and Weak Memory Models

Authors:
Baris Kasikci, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Cristian Zamfir, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
George Candea, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

ACM Transactions on Programming Languages and Systems, Volume 37, Issue 3, Article No. 8, pp. 1–44. https://doi.org/10.1145/2734118

Published: 22 May 2015

Abstract

Data races are one of the main causes of concurrency problems in multithreaded programs. Whether all data races are bad, or some are harmful and others are harmless, is still the subject of vigorous scientific debate [Narayanasamy et al. 2007; Boehm 2012]. What is clear, however, is that today's code has many data races [Kasikci et al. 2012; Jin et al. 2012; Erickson et al. 2010], and fixing data races without introducing bugs is time consuming [Godefroid and Nagappan 2008]. Therefore, it is important to efficiently identify data races in code and understand their consequences to prioritize their resolution.

We present Portend⁺, a tool that not only detects races but also automatically classifies them based on their potential consequences: Could they lead to crashes or hangs? Could their effects be visible outside the program? Do they appear to be harmless? How do their effects change under weak memory models? Our proposed technique achieves high accuracy by efficiently analyzing multiple paths and multiple thread schedules in combination, and by performing symbolic comparison between program outputs.

We ran Portend⁺ on seven real-world applications: it detected 93 true data races and correctly classified 92 of them, with no human effort. Six of them were harmful races. Portend⁺'s classification accuracy is up to 89% higher than that of existing tools, and it produces easy-to-understand evidence of the consequences of “harmful” races, thus both proving their harmfulness and making debugging easier. We envision Portend⁺ being used for testing and debugging, as well as for automatically triaging bug reports.
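
The idea of classifying a race by comparing program outputs across thread schedules can be illustrated with a toy model. The sketch below is not Portend⁺'s algorithm: it uses concrete values rather than symbolic execution, it explores only one program path, and every name in it (`interleavings`, `racy_increment`, `outputs_over_schedules`, `classify`) is hypothetical. It only shows that an unsynchronized counter increment, split into a load and a store, yields different final outputs under different schedules.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def racy_increment(tid):
    """Model an unsynchronized `counter += 1` as two steps: a load, then a store."""
    def load(state):
        state["tmp", tid] = state["counter"]      # read shared value into a thread-local slot

    def store(state):
        state["counter"] = state["tmp", tid] + 1  # write back a possibly stale value

    return [load, store]


def outputs_over_schedules(thread_a, thread_b):
    """Run every schedule from a fresh state and collect the distinct final outputs."""
    outputs = set()
    for schedule in interleavings(thread_a, thread_b):
        state = {"counter": 0}
        for step in schedule:
            step(state)
        outputs.add(state["counter"])
    return outputs


def classify(outputs):
    """Toy labels for illustration only; these are not the paper's categories."""
    return "output differs" if len(outputs) > 1 else "no difference observed"


outputs = outputs_over_schedules(racy_increment("A"), racy_increment("B"))
print(sorted(outputs), classify(outputs))
```

With two threads of two steps each there are six schedules. Those in which both loads precede both stores lose one update, so the set of observed outputs contains more than one value and the toy classifier reports a difference.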

References

Sarita V. Adve and Mark D. Hill. 1990. Weak ordering – a new definition. Computer Architecture News 18, 2, 2–14.
Associated Press. 2004. GE Acknowledges Blackout Bug. Retrieved April 2, 2015, from http://www.securityfocus.com/news/8032.
Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal Musuvathi. 2010. On the verification problem for weak memory models. In Proceedings of the Symposium on Principles of Programming Languages.
Amittai Aviram, Shu-Chun Weng, Sen Hu, and Bryan Ford. 2010. Efficient system-enforced deterministic parallelism. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Domagoj Babic and Alan J. Hu. 2008. Calysto: Scalable and precise extended static checking. In Proceedings of the 30th International Conference on Software Engineering.
Tom Bergan, Joseph Devietti, and Luis Ceze. 2011. The deterministic execution hammer: How well does it actually pound nails? In Proceedings of the Workshop on Determinism and Correctness in Parallel Programming.
Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan Grossman. 2010. CoreDet: A compiler and runtime system for deterministic multithreaded execution. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
Robert L. Bocchino Jr., Vikram S. Adve, Danny Dig, Sarita V. Adve, Stephen Heumann, Rakesh Komuravelli, Jeffrey Overbey, Patrick Simmons, Hyojin Sung, and Mohsen Vakilian. 2009. A type and effect system for deterministic parallel Java. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA’09).
Hans-J. Boehm. 2007. Reordering constraints for pthread-style locks. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’07).
Hans-J. Boehm. 2011. How to miscompile programs with “benign” data races. In Proceedings of the USENIX Workshop on Hot Topics in Parallelism.
Hans-J. Boehm. 2012. Position paper: Nondeterminism is unavoidable, but data races are pure evil. In Proceedings of the ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability (RACES’12).
Hans-J. Boehm and Sarita V. Adve. 2012. You don’t know jack about shared variables or memory models. Communications of the ACM 55, 2, 48–54.
Michael D. Bond, Katherine E. Coons, and Kathryn S. McKinley. 2010. PACER: Proportional detection of data races. In Proceedings of the International Conference on Programming Language Design and Implementation.
Stefan Bucur, Vlad Ureche, Cristian Zamfir, and George Candea. 2011. Parallel symbolic execution for automated real-world software testing. In Proceedings of the ACM EuroSys European Conference on Computer Systems.
Sebastian Burckhardt, Rajeev Alur, and Milo M. K. Martin. 2006. Bounded model checking of concurrent data types on relaxed memory models: A case study. In Proceedings of the International Conference on Computer Aided Verification.
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the Symposium on Operating Systems Design and Implementation.
George Candea, Stefan Bucur, Vitaly Chipounov, Vova Kuznetsov, and Cristian Zamfir. 2010. Automated software reliability services: Using reliability tools should be as easy as Webmail. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Luis Ceze, James Tuck, Pablo Montesinos, and Josep Torrellas. 2007. BulkSC: Bulk enforcement of sequential consistency. In Proceedings of the International Symposium on Computer Architecture.
Vitaly Chipounov and George Candea. 2011. Enabling sophisticated analyses of x86 binaries with RevGen. In Proceedings of the IEEE/IFIP 41st International Conference on Dependable Systems and Networks.
Heming Cui, Jingyue Wu, Chia Che Tsai, and Junfeng Yang. 2010. Stable deterministic multithreading through schedule memoization. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. 2009. DMP: Deterministic shared memory multiprocessing. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
Michel Dubois, Christoph Scheurich, and Faye Briggs. 1986. Memory access buffering in multiprocessors. In Proceedings of the 13th Annual International Symposium on Computer Architecture.
Dawson Engler and Ken Ashcraft. 2003. RacerX: Effective, static detection of race conditions and deadlocks. In Proceedings of the Symposium on Operating Systems Principles.
John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, and Kirk Olynyk. 2010. Effective data-race detection for the kernel. In Proceedings of the Symposium on Operating System Design and Implementation (OSDI’10).
Brad Fitzpatrick. 2013. Memcached Home Page. Retrieved April 2, 2015, from http://memcached.org.
Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: Efficient and precise dynamic race detection. In Proceedings of the International Conference on Programming Language Design and Implementation.
Cormac Flanagan and Stephen N. Freund. 2010. Adversarial memory for detecting destructive races. In Proceedings of the International Conference on Programming Language Design and Implementation.
Pedro Fonseca, Cheng Li, and Rodrigo Rodrigues. 2011. Finding complex concurrency bugs in large multi-threaded applications. In Proceedings of the ACM EuroSys European Conference on Computer Systems.
Vijay Ganesh and David L. Dill. 2007. A decision procedure for bit-vectors and arrays. In Proceedings of the International Conference on Computer Aided Verification.
Jeff Gilchrist. 2013. Parallel BZIP2 (PBZIP2). Retrieved April 2, 2015, from http://compression.ca/pbzip2.
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt. 2009. Debugging in the (very) large: Ten years of implementation and experience. In Proceedings of the Symposium on Operating Systems Principles.
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed automated random testing. In Proceedings of the International Conference on Programming Language Design and Implementation.
Patrice Godefroid, Michael Y. Levin, and David Molnar. 2008. Automated whitebox fuzz testing. In Proceedings of the Network and Distributed System Security Symposium.
Patrice Godefroid and Nachiappan Nagappan. 2008. Concurrency at Microsoft – an exploratory survey. In Proceedings of the International Conference on Computer Aided Verification.
Steven Hand. 2012. An experiment in determinism. Communications of the ACM 55, 5, 110.
Helgrind. 2012. Helgrind Home Page. Retrieved April 2, 2015, from http://valgrind.org/docs/manual/hg-manual.html.
Intel Corp. 2012. Parallel Inspector. Retrieved April 2, 2015, from https://software.intel.com/en-us/intel-inspector-xe.
ISO14882. 2011. ISO/IEC 14882:2011: Information Technology – Programming languages – C++. International Organization for Standardization, London, UK.
ISO9899. 2011. ISO/IEC 9899:2011: Information Technology – Programming Languages – C. International Organization for Standardization, London, UK.
Ali Jannesari and Walter F. Tichy. 2010. Identifying ad-hoc synchronization for enhanced race detection. In Proceedings of the International Parallel and Distributed Processing Symposium.
Guoliang Jin, Wei Zhang, Dongdong Deng, Ben Liblit, and Shan Lu. 2012. Automated concurrency-bug fixing. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Vineet Kahlon, Franjo Ivančić, and Aarti Gupta. 2005. Reasoning about threads communicating via locks. In Proceedings of the International Conference on Computer Aided Verification.
Baris Kasikci, Cristian Zamfir, and George Candea. 2012. Data races vs. data race bugs: Telling the difference with Portend. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
Leslie Lamport. 1978. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21, 7, 558–565.
Chris Lattner. 2012. “libc++” C++ Standard Library. Retrieved April 2, 2015, from http://libcxx.llvm.org/.
Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization.
Henry Ledgard. 1983. Reference Manual for the ADA Programming Language. Springer-Verlag, New York, NY.
Nancy G. Leveson and Clark S. Turner. 1993. An investigation of the Therac-25 accidents. IEEE Computer 26, 7, 18–41.
Tongping Liu, Charlie Curtsinger, and Emery D. Berger. 2011. Dthreads: Efficient deterministic multithreading. In Proceedings of the Symposium on Operating Systems Principles.
Shan Lu, Joseph Tucek, Feng Qin, and Yuanyuan Zhou. 2006. AVIO: Detecting atomicity violations via access interleaving invariants. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
Jeremy Manson, William Pugh, and Sarita V. Adve. 2005. The Java memory model. In Proceedings of the Symposium on Principles of Programming Languages.
Daniel Marino, Madanlal Musuvathi, and Satish Narayanasamy. 2009. LiteRace: Effective sampling for lightweight data-race detection. In Proceedings of the International Conference on Programming Language Design and Implementation.
Cal McPherson. 2012. Ctrace Home Page. Retrieved April 2, 2015, from http://ctrace.sourceforge.net.
John Mellor-Crummey. 1991. On-the-fly detection of data races for programs with nested fork-join parallelism. In Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing’91).
Memcached. 2009. Issue 127: INCR/DECR Operations Are Not Thread Safe. Retrieved April 2, 2015, from http://code.google.com/p/memcached/issues/detail?id_127.
Sang L. Min and Jong-Deok Choi. 1991. An efficient cache-based access anomaly detection scheme. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
Madanlal Musuvathi, Sebastian Burckhardt, Pravesh Kothari, and Santosh Nagarakatte. 2010. A randomized scheduler with probabilistic guarantees of finding bugs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gérard Basler, Piramanayagam Arumuga Nainar, and Iulian Neamtiu. 2008. Finding and reproducing heisenbugs in concurrent programs. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Satish Narayanasamy, Zhenghao Wang, Jordan Tigani, Andrew Edwards, and Brad Calder. 2007. Automatically classifying benign and harmful data races using replay analysis. In Proceedings of the International Conference on Programming Language Design and Implementation.
Adrian Nistor, Darko Marinov, and Josep Torrellas. 2009. Light64: Lightweight hardware support for data race detection during systematic testing of parallel programs. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO-42).
Robert O’Callahan and Jong-Deok Choi. 2003. Hybrid dynamic data race detection. In Proceedings of the Symposium on Principles and Practice of Parallel Computing.
Milos Prvulovic and Josep Torrellas. 2003. ReEnact: Using thread-level speculation mechanisms to debug data races in multithreaded codes. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA’03).
Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, and Thomas Anderson. 1997. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems 15, 4, 391–411.
Edith Schonberg. 2004. On-the-fly detection of access anomalies (with retrospective). ACM SIGPLAN Notices 39, 4, 313–327.
Koushik Sen. 2008. Race directed random testing of concurrent programs. In Proceedings of the International Conference on Programming Language Design and Implementation.
Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A concolic unit testing engine for C. In Proceedings of the Symposium on the Foundations of Software Engineering.
Konstantin Serebryany and Timur Iskhodzhanov. 2009. ThreadSanitizer – data race detection in practice. In Proceedings of the Workshop on Binary Instrumentation and Applications.
Richard L. Sites (Ed.). 1992. Alpha Architecture Reference Manual. Digital Press.
Yannis Smaragdakis, Jacob Evans, Caitlin Sadowski, Jaeheon Yi, and Cormac Flanagan. 2012. Sound predictive race detection in polynomial time. ACM SIGPLAN Notices 47, 1, 387–400.
SQLite. 2013. SQLite Home Page. Retrieved April 2, 2015, from http://www.sqlite.org/.
William Thies, Michal Karczmarek, and Saman P. Amarasinghe. 2002. StreamIt: A language for streaming applications. In Proceedings of the 11th International Conference on Compiler Construction (CC’02).
Chen Tian, Vijay Nagarajan, Rajiv Gupta, and Sriraman Tallam. 2008. Dynamic recognition of synchronization operations for improved data race detection. In Proceedings of the International Symposium on Software Testing and Analysis.
Kaushik Veeraraghavan, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2011. Detecting and surviving data races using complementary schedules. In Proceedings of the Symposium on Operating Systems Principles.
Jan Wen Voung, Ranjit Jhala, and Sorin Lerner. 2007. RELAY: Static race detection on millions of lines of code. In Proceedings of the Symposium on the Foundations of Software Engineering.
David L. Weaver and Tom Germond (Eds.). 1994. The SPARC Architecture Manual, Version 9. Prentice Hall.
Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the International Symposium on Computer Architecture.
Jingyue Wu, Heming Cui, and Junfeng Yang. 2010. Bypassing races in live applications with execution filters. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Weiwei Xiong, Soyeon Park, Jiaqi Zhang, Yuanyuan Zhou, and Zhiqiang Ma. 2010. Ad-hoc synchronization considered harmful. In Proceedings of the Symposium on Operating Systems Design and Implementation.
Yu Yang, Xiaofang Chen, Ganesh Gopalakrishnan, and Robert M. Kirby. 2007. Distributed dynamic partial order reduction based verification of threaded software. In Proceedings of the International SPIN Workshop.
Yuan Yu, Tom Rodeheffer, and Wei Chen. 2005. RaceTrack: Efficient detection of data race conditions via adaptive tracking. In Proceedings of the Symposium on Operating Systems Principles.
Cristian Zamfir and George Candea. 2010. Execution synthesis: A technique for automated debugging. In Proceedings of the ACM EuroSys European Conference on Computer Systems.
Jiaqi Zhang, Weiwei Xiong, Yang Liu, Soyeon Park, Yuanyuan Zhou, and Zhiqiang Ma. 2011. ATDetector: Improving the accuracy of a commercial data race detector by identifying address transfer. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture.

Index Terms

Automated Classification of Data Races Under Both Strong and Weak Memory Models
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems

Published in
ACM Transactions on Programming Languages and Systems Volume 37, Issue 3
June 2015
134 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/2785583
Editor:
Jens Palsberg
University of California, Los Angeles, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 May 2015
- Accepted: 1 February 2015
- Revised: 1 December 2014
- Received: 1 February 2013
Author Tags
Data races
concurrency
symbolic execution
triage
Qualifiers
- research-article
- Research
- Refereed