DOI: 10.1145/3180155.3180199
research-article
Open Access

Inferring and asserting distributed system invariants

Published: 27 May 2018

ABSTRACT

Distributed systems are difficult to debug and understand. A key reason for this is distributed state, which is not easily accessible and must be pieced together from the states of the individual nodes in the system.
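Piecing node states together is only sound if the chosen local states could actually have coexisted. A standard test for this, based on vector clocks, is sketched below in Go. It assumes every local state is tagged with the vector clock of that node's most recent event; the `LocalState` type and the `consistentCut` function are hypothetical illustrations, not part of Dinv.

```go
package main

import "fmt"

// VectorClock holds one logical counter per node: entry k counts the
// events of node k that the owning local state has observed, either
// directly or transitively through received messages.
type VectorClock []int

// LocalState is a hypothetical record of one node's state at some point in
// its execution, tagged with the vector clock of its most recent event.
type LocalState struct {
	Node  int
	Clock VectorClock
}

// consistentCut reports whether states (one per node, with states[i]
// belonging to node i) form a consistent global state. The check: no node j
// may have observed more of node i's events than node i's own state
// contains, i.e. states[j].Clock[i] <= states[i].Clock[i] for all i, j.
func consistentCut(states []LocalState) bool {
	for i := range states {
		own := states[i].Clock[i]
		for j := range states {
			if states[j].Clock[i] > own {
				// j's state depends on an event of node i that lies after
				// i's state in this cut (e.g. a receive without its send).
				return false
			}
		}
	}
	return true
}

func main() {
	// Node 0's 2nd event sends a message; node 1 receives it as its 1st event.
	beforeSend := LocalState{Node: 0, Clock: VectorClock{1, 0}}
	afterSend := LocalState{Node: 0, Clock: VectorClock{2, 0}}
	afterRecv := LocalState{Node: 1, Clock: VectorClock{2, 1}}

	fmt.Println(consistentCut([]LocalState{afterSend, afterRecv}))  // true
	fmt.Println(consistentCut([]LocalState{beforeSend, afterRecv})) // false
}
```

Only local states that pass a check like this can be combined into a global snapshot on which cross-node relations between variables are meaningful to evaluate.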

We propose Dinv, an automatic approach to help developers of distributed systems uncover the runtime distributed state properties of their systems. Dinv uses static and dynamic program analyses to infer relations between variables at different nodes. For example, in a leader election algorithm, Dinv can relate the variable leader at different nodes to derive the invariant $\forall$ nodes $i, j$: $\mathit{leader}_i = \mathit{leader}_j$. This can increase the developer's confidence in the correctness of their system. The developer can also use Dinv to convert an inferred invariant into a distributed runtime assertion on distributed state.

We applied Dinv to several popular distributed systems, such as etcd Raft, Hashicorp Serf, and Taipei-Torrent, which have between 1.7K and 144K LOC and are widely used. Dinv derived useful invariants for these systems, including invariants that capture the correctness of distributed routing strategies, leadership, and key hash distribution. We also used Dinv to assert correctness of the inferred etcd Raft invariants at runtime, using these asserts to detect injected silent bugs.
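A runtime assertion on distributed state can be pictured as a global predicate evaluated over a snapshot of the nodes' states. The Go sketch below is hypothetical: `NodeState`, `Assertion`, and `checkAll` are illustrative names, snapshot collection is assumed to happen elsewhere (for example at a coordinator that gathers a consistent cut), and the example predicate is Raft's Election Safety property (at most one leader per term), chosen because it is a well-known Raft guarantee, not because it is one of the invariants Dinv reports.

```go
package main

import "fmt"

// NodeState is a hypothetical per-node report gathered for an assertion.
type NodeState struct {
	ID       string
	Term     int
	IsLeader bool
}

// Assertion is a hypothetical global predicate over one consistent snapshot.
// It returns nil if the predicate holds, or an error describing the violation.
type Assertion func(snapshot []NodeState) error

// atMostOneLeaderPerTerm encodes Raft's Election Safety property.
func atMostOneLeaderPerTerm(snapshot []NodeState) error {
	leaders := map[int]string{} // term -> first node seen claiming leadership
	for _, s := range snapshot {
		if !s.IsLeader {
			continue
		}
		if other, ok := leaders[s.Term]; ok {
			return fmt.Errorf("term %d has two leaders: %s and %s", s.Term, other, s.ID)
		}
		leaders[s.Term] = s.ID
	}
	return nil
}

// checkAll evaluates every assertion against the snapshot and collects violations.
func checkAll(snapshot []NodeState, asserts ...Assertion) []error {
	var violations []error
	for _, a := range asserts {
		if err := a(snapshot); err != nil {
			violations = append(violations, err)
		}
	}
	return violations
}

func main() {
	// A stale leader from an older term (n3, term 4) does not violate the property.
	healthy := []NodeState{{"n1", 5, true}, {"n2", 5, false}, {"n3", 4, true}}
	// Two leaders in the same term, as an injected bug might produce.
	buggy := []NodeState{{"n1", 5, true}, {"n2", 5, true}, {"n3", 5, false}}

	fmt.Println(len(checkAll(healthy, atMostOneLeaderPerTerm))) // 0
	for _, err := range checkAll(buggy, atMostOneLeaderPerTerm) {
		fmt.Println("assertion failed:", err) // assertion failed: term 5 has two leaders: n1 and n2
	}
}
```

Because each assertion is just a function over a snapshot, an inferred invariant could in principle be turned into such a function and checked periodically; gathering snapshots without stalling the system is the harder part and is omitted here.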


Published in

ICSE '18: Proceedings of the 40th International Conference on Software Engineering
May 2018
1307 pages
ISBN: 9781450356381
DOI: 10.1145/3180155
Conference Chair: Michel Chaudron
General Chair: Ivica Crnkovic
Program Chairs: Marsha Chechik, Mark Harman

              Copyright © 2018 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States


Acceptance Rates

Overall acceptance rate: 276 of 1,856 submissions, 15%
