ExAIS: executable AI semantics

Authors:
Richard Schumi, Singapore Management University, Singapore
Jun Sun, Singapore Management University, Singapore

ICSE '22: Proceedings of the 44th International Conference on Software Engineering, May 2022, Pages 859–870. https://doi.org/10.1145/3510003.3510112

Published: 05 July 2022

ABSTRACT

Neural networks can be regarded as a new programming paradigm: instead of building ever more complex programs through (often informal) logical reasoning in the programmer's mind, complex 'AI' systems are built by optimising generic neural network models with big data. In this new paradigm, AI frameworks such as TensorFlow and PyTorch play a key role, one as essential as that of the compiler for traditional programs. It is known that the lack of a proper semantics for programming languages (such as C), i.e., a correctness specification for compilers, has contributed to many problematic program behaviours and security issues. While it is in general hard to have a correctness specification for compilers due to the high complexity of programming languages and their rapid evolution, we have a unique opportunity to do it right this time for neural networks (which have a limited set of functions, most of which have stable semantics). In this work, we report our effort on providing a correctness specification of neural network frameworks such as TensorFlow. We specify the semantics of almost all TensorFlow layers in the logic programming language Prolog. We demonstrate the usefulness of the semantics through two applications. One is a fuzzing engine for TensorFlow, which features a strong oracle and a systematic way of generating valid neural networks. The other is a model validation approach which enables consistent bug reporting for TensorFlow models.
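
The idea of an executable layer semantics acting as a test oracle can be illustrated with a minimal sketch. The code below is not the authors' Prolog specification. It is a hypothetical plain-Python reference for a single dense layer without activation (output = input · kernel + bias), and all function names (`dense_reference`, `agrees`, `check_against_oracle`) are assumptions introduced only for this illustration.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dense_reference(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical reference semantics of a dense layer without activation.

    Computes y[j] = sum_i x[i] * weights[i][j] + bias[j], where weights has
    shape (len(x), units) and bias has length units.
    """
    if len(weights) != len(x):
        raise ValueError("number of weight rows must match input length")
    units = len(bias)
    if any(len(row) != units for row in weights):
        raise ValueError("every weight row must have one entry per unit")
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(units)
    ]


def agrees(expected: Vector, actual: Vector, tol: float = 1e-6) -> bool:
    """Compare two outputs element-wise with a small numeric tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=tol, abs_tol=tol)
        for e, a in zip(expected, actual)
    )


def check_against_oracle(
    layer_under_test: Callable[[Vector, Matrix, Vector], Vector],
    x: Vector,
    weights: Matrix,
    bias: Vector,
) -> bool:
    """Return True if the implementation under test matches the reference."""
    expected = dense_reference(x, weights, bias)
    actual = layer_under_test(x, weights, bias)
    return agrees(expected, actual)


def buggy_dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Deliberately faulty implementation that forgets to add the bias."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x)))
        for j in range(len(bias))
    ]
```

In a fuzzing setup along these lines, randomly generated inputs and weights would be fed to both the framework layer and the reference, and any disagreement would be reported as a candidate bug. Here `check_against_oracle(buggy_dense, ...)` would flag the missing bias, while passing `dense_reference` itself would trivially agree.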

Published in
ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022, 2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003
General Chair: Matthew B Dwyer (University of Virginia)
Program Chairs: Daniela Damian (University of Victoria, Canada), Andreas Zeller (CISPA, Germany)
Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
AI frameworks
AI libraries
AI model generation
deep learning models
model validation
semantics
specification
test case generation
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
