research-article

Beyond duplicates: towards understanding and predicting link types in issue tracking systems

Authors:
Clara Marie Lüders, University of Hamburg, Hamburg, Germany
Abir Bouraffa, University of Hamburg, Hamburg, Germany
Walid Maalej, University of Hamburg, Hamburg, Germany

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories, May 2022, Pages 48–60, https://doi.org/10.1145/3524842.3528457

Published: 17 October 2022


ABSTRACT

Software projects use Issue Tracking Systems (ITS) like JIRA to track issues and organize the workflows around them. Issues are often interconnected via different links such as the default JIRA link types Duplicate, Relate, Block, or Subtask. While previous research has mostly focused on analyzing and predicting duplication links, this work aims at understanding the various other link types, their prevalence, and their characteristics, with the goal of more reliable link type prediction. For this, we studied 607,208 links connecting 698,790 issues in 15 public JIRA repositories. Besides the default types, the custom types Depend, Incorporate, Split, and Cause were also common. We manually grouped all 75 link types used in the repositories into five general categories: General Relation, Duplication, Composition, Temporal / Causal, and Workflow. Comparing the structures of the corresponding graphs, we observed several trends. For instance, Duplication links tend to represent simpler issue graphs, often with two components, and Composition links show the highest share of hierarchical tree structures (97.7%). Surprisingly, General Relation links have a significantly higher transitivity score than Duplication and Temporal / Causal links.
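The graph measures named above (components, tree structures, transitivity) can be illustrated with a minimal Python sketch. It assumes links are available as (issue, issue, category) triples; the toy links, the issue keys, and the function names are hypothetical and not taken from the paper, and the tree test used here (a connected component with exactly n - 1 edges) is one plausible reading of "hierarchical tree structure", not necessarily the authors' definition.

```python
from collections import defaultdict
from itertools import combinations


def build_graphs(links):
    """Group (issue_a, issue_b, category) triples into one undirected
    adjacency map per link category."""
    graphs = defaultdict(lambda: defaultdict(set))
    for a, b, category in links:
        if a == b:  # ignore self-links
            continue
        graphs[category][a].add(b)
        graphs[category][b].add(a)
    return graphs


def components(adj):
    """Return the connected components of an adjacency map as node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(comp)
    return result


def tree_share(adj):
    """Fraction of components that are trees (n nodes, n - 1 edges)."""
    comps = components(adj)
    if not comps:
        return 0.0
    trees = 0
    for comp in comps:
        edge_count = sum(len(adj[n]) for n in comp) // 2
        if edge_count == len(comp) - 1:
            trees += 1
    return trees / len(comps)


def transitivity(adj):
    """Global transitivity: 3 * triangles / connected triples."""
    closed = 0   # each triangle is counted once per corner, i.e. 3 times
    triples = 0  # paths of length two, counted at their centre node
    for node, neighbours in adj.items():
        k = len(neighbours)
        triples += k * (k - 1) // 2
        for u, v in combinations(neighbours, 2):
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy links, for illustration only.
toy_links = [
    ("A-1", "A-2", "Duplication"),
    ("A-3", "A-4", "Duplication"),
    ("B-1", "B-2", "Composition"),
    ("B-1", "B-3", "Composition"),
    ("B-1", "B-4", "Composition"),
    ("C-1", "C-2", "General Relation"),
    ("C-2", "C-3", "General Relation"),
    ("C-1", "C-3", "General Relation"),
]

for category, adj in build_graphs(toy_links).items():
    print(category, len(components(adj)), tree_share(adj), transitivity(adj))
```

In this toy data the Composition star is a tree with zero transitivity, while the General Relation triangle is not a tree and has transitivity 1. The values only show how the measures behave and say nothing about the studied repositories.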

Motivated by the differences between the link types and by their popularity, we evaluated the robustness of two state-of-the-art duplicate detection approaches from the literature on the JIRA dataset. We found that current deep-learning approaches confuse Duplication links with other link types in almost all repositories. On average, the classification accuracy dropped by 6% for one approach and 12% for the other. Extending the training sets with other link types seems to partly solve this issue. We discuss our findings and their implications for research and practice.
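The kind of robustness check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the detector is a hypothetical token-overlap (Jaccard) stand-in rather than either of the deep-learning approaches evaluated in the paper, and the issue titles, the threshold, and all function names are invented for the example. It compares accuracy on a test set of duplicates and unlinked pairs with accuracy on the same set extended by pairs that are linked but not duplicates.

```python
def jaccard(text_a, text_b):
    """Token-set overlap between two issue titles."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def predict_duplicate(pair, threshold=0.5):
    """Stand-in detector (assumption): similar titles are called duplicates."""
    return jaccard(*pair) >= threshold


def accuracy(examples, threshold=0.5):
    """Share of (pair, is_duplicate) examples classified correctly."""
    correct = sum(
        predict_duplicate(pair, threshold) == is_duplicate
        for pair, is_duplicate in examples
    )
    return correct / len(examples)


# Hypothetical toy pairs, for illustration only.
duplicates = [
    (("login page crashes on submit", "login page crashes on submit button"), True),
    (("export to csv fails", "csv export fails"), True),
]
unlinked = [
    (("login page crashes on submit", "update license header"), False),
    (("export to csv fails", "add dark mode"), False),
]
# Pairs connected by a non-duplicate link (e.g. Relate or Block): textually
# close, but the correct label for a duplicate detector is False.
other_links = [
    (("export to csv fails", "export to csv fails for large files"), False),
    (("add dark mode", "add dark mode toggle"), False),
]

baseline_test = duplicates + unlinked
extended_test = baseline_test + other_links

print("baseline accuracy:", accuracy(baseline_test))
print("accuracy with other link types:", accuracy(extended_test))
```

The toy data is constructed so that the stand-in detector mistakes related issues for duplicates. It shows the shape of the comparison only and does not reproduce the reported 6% and 12% drops.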


Published in
MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022
815 pages
ISBN: 9781450393034
DOI: 10.1145/3524842
General Chair: David Lo, Singapore Management University, Singapore
Program Chairs: Shane McIntosh, University of Waterloo, Canada; Nicole Novielli, University of Bari, Italy
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States