ABSTRACT
Neural networks can be regarded as a new programming paradigm: instead of building ever more complex programs through (often informal) logical reasoning in the programmer's mind, complex 'AI' systems are built by optimising generic neural network models with big data. In this new paradigm, AI frameworks such as TensorFlow and PyTorch play a key role, as essential as that of the compiler for traditional programs. It is known that the lack of a proper semantics for programming languages (such as C), i.e., a correctness specification for compilers, has contributed to many problematic program behaviours and security issues. While it is in general hard to write a correctness specification for compilers, owing to the high complexity of programming languages and their rapid evolution, we have a unique opportunity to do it right this time for neural networks, which have a limited set of functions, most of them with stable semantics. In this work, we report our effort to provide a correctness specification of neural network frameworks such as TensorFlow. We specify the semantics of almost all TensorFlow layers in the logic programming language Prolog. We demonstrate the usefulness of the semantics through two applications. One is a fuzzing engine for TensorFlow, which features a strong oracle and a systematic way of generating valid neural networks. The other is a model validation approach which enables consistent bug reporting for TensorFlow models.
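To illustrate how an executable semantics can act as a test oracle, the following minimal sketch compares a reference definition of a dense layer against an implementation under test on randomly generated inputs. It is written in Python for brevity and is not the Prolog specification described above. All names (`dense_reference`, `dense_under_test`, `random_dense_case`, `find_disagreements`) are hypothetical, and `dense_under_test` is only a stand-in for a real framework call.

```python
import math
import random


def dense_reference(inputs, weights, bias):
    """Reference semantics: output[j] = sum_i inputs[i] * weights[i][j] + bias[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + bias[j]
        for j in range(len(bias))
    ]


def dense_under_test(inputs, weights, bias):
    """Hypothetical stand-in for the framework implementation being tested."""
    outputs = list(bias)
    for i, x in enumerate(inputs):
        for j in range(len(bias)):
            outputs[j] += x * weights[i][j]
    return outputs


def random_dense_case(rng, max_size=5):
    """Generate a valid test case: the weight matrix shape matches inputs and bias."""
    n_in = rng.randint(1, max_size)
    n_out = rng.randint(1, max_size)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return inputs, weights, bias


def find_disagreements(layer, trials=100, seed=0, tol=1e-9):
    """Return every generated case where `layer` deviates from the reference."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        case = random_dense_case(rng)
        expected = dense_reference(*case)
        actual = layer(*case)
        same_shape = len(expected) == len(actual)
        if not same_shape or any(
            not math.isclose(e, a, rel_tol=tol, abs_tol=tol)
            for e, a in zip(expected, actual)
        ):
            failures.append(case)
    return failures
```

Calling `find_disagreements(dense_under_test)` returns the list of inputs on which the two definitions differ. Because the reference computes the expected output directly, the check does not depend on a second framework or on metamorphic relations, which is the sense in which such an oracle is strong.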
Index Terms
- ExAIS: executable AI semantics