Abstract
In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. However, the lack of static typing makes it harder to fix type errors, detect bugs early, and understand code. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but a large number of programs are still not annotated by developers. Annotation generation tools can use type inference techniques to produce the missing annotations. However, existing work overlooks several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation. It therefore remains unclear how far we have come and how far we can go.
In this paper, we comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, using a comprehensive set of metrics and categories, we find that existing tools differ in effectiveness and cannot achieve both high accuracy and high coverage. Then, we summarize six patterns that characterize the limitations of type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and help determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations of, and improvement directions for, type annotation generation, which can inspire future work.
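To make the optional type annotations of PEP 484 concrete, the following minimal sketch contrasts an unannotated function with an annotated equivalent. The function names and the sample data are hypothetical and chosen only for illustration; they do not come from the paper or its dataset. The annotated signature is the kind of output an annotation generation tool would aim to produce.

```python
from typing import Dict, List, Optional, get_type_hints


def find_user(users, name):
    """Unannotated: parameter and return types must be guessed from the body."""
    for user in users:
        if user["name"] == name:
            return user
    return None


def find_user_annotated(
    users: List[Dict[str, str]], name: str
) -> Optional[Dict[str, str]]:
    """Same logic with PEP 484 annotations (hypothetical example)."""
    for user in users:
        if user["name"] == name:
            return user
    return None


# Annotations are optional metadata: both functions behave identically at runtime.
sample = [{"name": "ada"}, {"name": "guido"}]  # illustrative data only
unannotated_hints = get_type_hints(find_user)
annotated_hints = get_type_hints(find_user_annotated)
print(unannotated_hints)
print(sorted(annotated_hints))
```

The annotations do not change runtime behavior; they are consumed by static type checkers and by readers of the code.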
Index Terms
- Generating Python Type Annotations from Type Inference: How Far Are We?