research-article

ExAIS: executable AI semantics

Published: 05 July 2022

ABSTRACT

Neural networks can be regarded as a new programming paradigm, i.e., instead of building ever-more complex programs through (often informal) logical reasoning in the programmers' minds, complex 'AI' systems are built by optimising generic neural network models with big data. In this new paradigm, AI frameworks such as TensorFlow and PyTorch play a role that is as essential as that of the compiler for traditional programs. It is known that the lack of a proper semantics for programming languages (such as C), i.e., a correctness specification for compilers, has contributed to many problematic program behaviours and security issues. While it is in general hard to have a correctness specification for compilers due to the high complexity of programming languages and their rapid evolution, we have a unique opportunity to do it right this time for neural networks (which have a limited set of functions, most of which have stable semantics). In this work, we report our effort on providing a correctness specification of neural network frameworks such as TensorFlow. We specify the semantics of almost all TensorFlow layers in the logic programming language Prolog. We demonstrate the usefulness of the semantics through two applications. One is a fuzzing engine for TensorFlow, which features a strong oracle and a systematic way of generating valid neural networks. The other is a model validation approach which enables consistent bug reporting for TensorFlow models.
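To illustrate the idea of an executable semantics serving as a strong test oracle, here is a minimal, hypothetical sketch in plain Python (the paper itself uses Prolog): a framework-independent reference definition of a Dense (fully connected) layer, plus a differential check against a framework's output. The names `dense_semantics` and `oracle_check` are illustrative and not taken from ExAIS.

```python
def dense_semantics(inputs, weights, bias):
    """Reference meaning of a fully connected layer: y = x * W + b,
    computed without any AI framework."""
    out = []
    for row in inputs:
        out.append([
            sum(x * w for x, w in zip(row, col)) + b
            for col, b in zip(zip(*weights), bias)  # iterate over output units
        ])
    return out

def oracle_check(framework_output, inputs, weights, bias, tol=1e-6):
    """Differential oracle: report disagreement when the framework
    under test deviates from the reference semantics beyond `tol`."""
    expected = dense_semantics(inputs, weights, bias)
    return all(
        abs(a - b) <= tol
        for erow, frow in zip(expected, framework_output)
        for a, b in zip(erow, frow)
    )

# Example: x = [[1, 2]], identity weights, bias [0.5, 0.5] -> [[1.5, 2.5]]
x = [[1.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, 0.5]
print(dense_semantics(x, W, b))                      # [[1.5, 2.5]]
print(oracle_check([[1.5, 2.5]], x, W, b))           # True: framework agrees
print(oracle_check([[1.5, 3.0]], x, W, b))           # False: a bug is flagged
```

A fuzzer built on this pattern can generate random valid layer configurations and inputs, run both the framework and the reference semantics, and flag any divergence as a candidate bug.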


Published in

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022, 2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions, 15%
