Who should comment on this pull request? Analyzing attributes for more accurate commenter recommendation in pull-based development

https://doi.org/10.1016/j.infsof.2016.10.006

Abstract

Context: Pull-based software development helps developers make contributions flexibly and efficiently. Commenters freely discuss code changes and provide suggestions, while core members decide whether pull requests are merged. Both commenters and core members act as reviewers in the evaluation of pull requests. Since some popular projects receive many pull requests, commenters may not notice new pull requests in time, and may even overlook appropriate pull requests.

Objective: Our objective in this paper is to analyze the attributes that affect the precision and recall of commenter prediction, and to choose appropriate attributes to build a commenter recommendation approach.

Method: We collect 19,543 pull requests, 206,664 comments and 4,817 commenters from 8 popular projects in GitHub. We build approaches based on different attributes, including activeness, text similarity, file similarity and social relation. We also build composite approaches, including time-based text similarity, time-based file similarity and time-based social relation; the time-based social relation approach is the state-of-the-art approach proposed by Yu et al. We then compare the precision and recall of these approaches.

Results: We find that for 8 projects, the activeness based approach achieves the top-3 precision of 0.276, 0.386, 0.389, 0.516, 0.322, 0.572, 0.428, 0.402, and achieves the top-3 recall of 0.475, 0.593, 0.613, 0.66, 0.644, 0.791, 0.714, 0.65, which outperforms approaches based on text similarity, file similarity or social relation by a substantial margin. Moreover, the activeness based approach achieves better precision and recall than composite approaches. In comparison with the state-of-the-art approach, the activeness based approach improves the top-3 precision by 178.788%, 30.41%, 25.08%, 41.76%, 49.07%, 32.71%, 25.15%, 78.67%, and improves the top-3 recall by 196.875%, 36.32%, 29.05%, 46.02%, 43.43%, 27.79%, 25.483%, 79.06% for 8 projects.

Conclusion: Activeness is the most important attribute in commenter prediction. The activeness based approach can be used to improve commenter recommendation in code review.

Introduction

Pull-based software development is an emerging paradigm for distributed software development [1], [2]. Developers pull code changes from other repositories, or from different branches of the same repository, and merge them locally, rather than pushing changes to a central repository. Various open source software hosting sites, notably GitHub, support pull-based development and allow developers to make contributions flexibly and efficiently. In GitHub, developers can fork repositories and make changes without asking for permission. When developers want to merge their changes into the repositories they forked from, they submit pull requests. Pull-based software development separates making a modification from integrating the change, and makes contributing to others’ repositories much easier than it has ever been [3].

As shown in Fig. 1, the pull request process in GitHub mainly involves three roles, namely contributors, core members and commenters. Firstly, contributors modify code to fix bugs or improve features. When a set of changes is ready, contributors create and submit pull requests to the code review platform. Secondly, core members are trusted members of the community. Only experienced and excellent developers are chosen as core members, and they are granted the privilege of committing code directly to project repositories [4]. Core members evaluate submitted code and decide whether to merge the code changes into the repositories. Thirdly, any developer can leave comments on pull requests and become a commenter [5], [6]. Commenters freely discuss whether the code style meets the standard [7], whether the repositories require modification, or whether the submitted code is of good quality. These comments provide suggestions for the evaluation of pull requests [8]. Core members and commenters are both reviewers, but they play different roles in the code review process.

In GitHub, popular projects receive many pull requests, which can overwhelm core members [4]. Previous works [9], [10] propose methods to recommend suitable core members for new pull requests. Besides, core members solicit the community’s opinions about the merging decision [1], [4]. Developers in the community leave comments on pull requests and assist core members in making decisions, so the code review process benefits from the wisdom of the crowd. However, because popular projects receive many pull requests, commenters may not notice new pull requests in time, or may even ignore appropriate pull requests [5], [6]. An automatic commenter recommendation approach could therefore remind commenters of appropriate pull requests and encourage them to provide opinions.

We compare the importance of different attributes for recommending commenters in GitHub. Given a new pull request from a contributor and a candidate commenter, we mainly consider 4 kinds of attributes: activeness is based on the candidate’s recent activities; text similarity measures whether the candidate leaves comments on pull requests whose descriptions are similar to the new pull request; file similarity describes the distance between the modified code files of the new pull request and those of pull requests the candidate commented on; social relation measures whether the candidate prefers to comment on pull requests submitted by this contributor. We take a further step and consider composite attributes, including time-based text similarity, time-based file similarity and time-based social relation. The time-based social relation approach is the state-of-the-art approach [5], [6]: Yu et al. use social relations and build comment networks to predict appropriate commenters for incoming pull requests [5], [6]. We give detailed definitions of these attributes and build corresponding approaches (Section 3).
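To make the four attributes concrete, the following sketch shows one plausible way each could be scored for a candidate. The function names, the exponential decay constant, and the Jaccard-style similarity measures are illustrative assumptions, not the paper’s exact definitions (those are given in Section 3).

```python
from math import exp

# Illustrative attribute scores for one candidate commenter; names and
# formulas are assumptions, not the paper's exact definitions.

def activeness(comment_times, now, decay=0.1):
    """Recent activity: comments closer to `now` contribute more."""
    return sum(exp(-decay * (now - t)) for t in comment_times)

def text_similarity(new_pr_words, commented_pr_words):
    """Word overlap between the new PR description and a PR the
    candidate commented on (Jaccard, as a stand-in measure)."""
    a, b = set(new_pr_words), set(commented_pr_words)
    return len(a & b) / len(a | b) if a | b else 0.0

def file_similarity(new_pr_files, commented_pr_files):
    """Overlap between sets of modified file paths."""
    a, b = set(new_pr_files), set(commented_pr_files)
    return len(a & b) / len(a | b) if a | b else 0.0

def social_relation(comment_counts, contributor):
    """Share of the candidate's comments that went to this
    contributor's pull requests."""
    total = sum(comment_counts.values())
    return comment_counts.get(contributor, 0) / total if total else 0.0
```

A time-based composite attribute would simply weight the similarity or relation terms by the same kind of decay factor used in `activeness`.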

Approaches based on different attributes achieve varying precision and recall, and good attribute selection is critical for accurate recommendation. The goal of this work is to analyze the attributes that affect the performance of commenter prediction, and to choose appropriate attributes to build a commenter recommendation approach. We identify the attribute with the best precision and recall in commenter recommendation, and further explore whether combining attributes improves precision and recall.

In this paper, we collect 19,543 pull requests, 206,664 comments and 4,817 commenters from 8 popular projects in GitHub (Section 2). We measure the precision and recall of approaches based on different attributes (Section 4). The experimental results show that (1) for the 8 projects, the activeness based approach achieves the top-3 precision of 0.276, 0.386, 0.389, 0.516, 0.322, 0.572, 0.428, 0.402, and achieves the top-3 recall of 0.475, 0.593, 0.613, 0.66, 0.644, 0.791, 0.714, 0.65. The activeness based approach outperforms approaches based on text similarity, file similarity or social relation by a substantial margin. Activeness is the most important attribute in commenter recommendation. (2) The activeness based approach achieves better precision and recall than the composite approaches. In comparison with the state-of-the-art approach [5], the activeness based approach improves the top-3 precision by 178.788%, 30.41%, 25.08%, 41.76%, 49.07%, 32.71%, 25.15%, 78.67%, and improves the top-3 recall by 196.875%, 36.32%, 29.05%, 46.02%, 43.43%, 27.79%, 25.483%, 79.06% for the 8 projects.

The main contributions of this paper are as follows:

  • We propose approaches based on different attributes to solve the commenter recommendation problem, including activeness, text similarity, file similarity, time-based text similarity and time-based file similarity.

  • We experiment on a broad range of datasets containing a total of 19,543 pull requests and 206,664 comments to compare the importance of attributes in commenter recommendation. Experiment results show that activeness is the most important attribute. Moreover, the activeness based approach outperforms the state-of-the-art approach [5] by a substantial margin.

Section snippets

Background and data collection

Before diving into commenter recommendation, we begin by providing background information about the contribution evaluation process in GitHub. Then, we introduce how our datasets are collected, and report statistics of our datasets.

Attributes in commenter recommendation

Fig. 4 presents the overall framework of the commenter recommendation approach. When a contributor submits a new pull request, the recommendation approach first analyzes historical data and finds developers who have left comments before. These former commenters become candidates for recommendation. Secondly, every commenter’s previous comments and corresponding pull requests are extracted from the historical data. Thirdly, the recommendation approach extracts attribute values from previous
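The candidate-extraction and ranking steps described above can be sketched as a small pipeline. The data layout (`history` as a list of pull-request records with a `"commenters"` field) and the pluggable `score` function are assumptions for illustration; the paper’s actual framework scores candidates with the attributes of Section 3.

```python
# Minimal sketch of the recommendation pipeline of Fig. 4, assuming one
# attribute score per candidate. Data layout is illustrative.

def recommend(new_pr, history, score, k=3):
    """Rank former commenters by an attribute score; return the top k."""
    # Step 1: former commenters in the historical data become candidates.
    candidates = {c for pr in history for c in pr["commenters"]}
    # Steps 2-3: score each candidate against the new pull request
    # using their previous comments, then rank.
    ranked = sorted(candidates,
                    key=lambda c: score(c, new_pr, history),
                    reverse=True)
    return ranked[:k]
```

For example, scoring a candidate by their raw comment count reduces this pipeline to a pure activeness-style ranking.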

Experiments and results

In this section, we evaluate the precision and recall of approaches based on different attributes. The experimental environment is a 64-bit Windows Server 2012 machine with an Intel(R) Xeon(R) 1.90 GHz processor and 24 GB RAM. In this environment, we compute the results of all approaches. Since these approaches do not use techniques like machine learning, they do not cost much time or many resources. We first present our experiment setup and evaluation metrics (Sections 4.1 and 4.2). We then present our
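Top-k precision and recall, the metrics reported throughout the paper, are commonly computed as follows; this is a sketch of the standard definitions, which may differ in detail from the paper’s formulas in Section 4.2.

```python
# Standard top-k precision/recall for a recommendation list; a sketch,
# possibly differing in detail from the paper's Section 4.2 definitions.

def topk_precision(recommended, actual, k=3):
    """Fraction of the first k recommendations who actually commented."""
    hits = len(set(recommended[:k]) & set(actual))
    return hits / k

def topk_recall(recommended, actual, k=3):
    """Fraction of actual commenters covered by the first k recommendations."""
    hits = len(set(recommended[:k]) & set(actual))
    return hits / len(actual) if actual else 0.0
```

Per-project scores such as the top-3 values reported in the abstract would then be averages of these quantities over all test pull requests.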

Threats to validity

Threats to internal validity relate to experimenter bias and errors. We use a set of scripts to download and process the large GitHub dataset. We have checked these scripts and fixed the errors that we found. However, a few errors may remain.

Threats to external validity relate to the generalizability of our study. Firstly, our experimental results are limited to 8 popular projects. We find that the activeness based approach has higher precision and recall than the CN-based approach, which are based on

The usefulness of comments

Some commenters may discuss trivial, tangential, or unrelated issues, and their comments are useless for code review [30]. In this subsection, we randomly select 400 comments and manually analyze their usefulness. Table 1 shows that our datasets contain 206,664 comments; a sample of 400 from a population of 206,664 yields a 95% confidence level with a 4.9% margin of error.
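The sampling claim above can be checked with the usual margin-of-error formula for a proportion at 95% confidence (z = 1.96, worst-case p = 0.5), optionally with the finite-population correction:

```python
from math import sqrt

# Margin of error for a sample of 400 comments from 206,664,
# at 95% confidence (z = 1.96), worst-case proportion p = 0.5.
n, N, z, p = 400, 206_664, 1.96, 0.5
margin = z * sqrt(p * (1 - p) / n)      # 1.96 * 0.025 = 0.049, i.e. 4.9%

# The finite-population correction barely changes it at this N:
fpc = sqrt((N - n) / (N - 1))
margin_fpc = margin * fpc
```

This reproduces the 4.9% figure quoted in the text.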

Previous work [30] proposes an approach for assessing discussion usefulness. A comment is considered useful if it is directly

Related work

Studies on Code Review. Several previous studies have explored the review process of code contributions. Nurolahzade et al. discovered that core members were often overwhelmed by the many patches they had to review [33]. Rigby et al. observed that if modified code was not reviewed immediately, it was likely never to be reviewed [34]. Rigby et al. further examined broadcast-based peer review in open source software projects [8]. Rigby et al. found that code reviews were expensive, because they needed

Conclusion

In this paper, we build approaches based on different attributes, including activeness, text similarity, file similarity and social relation. We also build composite approaches, which combine activeness with other attributes. To investigate the importance of the attributes, we perform experiments on 8 GitHub projects, which include 19,543 pull requests, 206,664 comments and 4,817 commenters. The experimental results show that the activeness based approach outperforms approaches based

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant No. 61300006, the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2015ZX-24, and the Beijing Natural Science Foundation under Grant No. 4163074.

References (46)

  • E. Cohen et al.

    Maintaining time-decaying stream aggregates

    J. Algorithms

    (2006)
  • Y. Yu et al.

    Reviewer recommendation for pull-requests in github: what can we learn from code review and bug assignment?

    Inf. Softw. Technol.

    (2016)
  • G. Gousios et al.

    An exploratory study of the pull-based software development model

    Proceedings of the 36th ICSE, Hyderabad, India

    (2014)
  • J. Tsay et al.

    Influence of social and technical factors for evaluating contribution in github

    Proceedings of the 36th ICSE, Hyderabad, India

    (2014)
  • E. Kalliamvakou et al.

    The promises and perils of mining github

    Proceedings of MSR, Hyderabad, India

    (2014)
  • G. Gousios et al.

    Work practices and challenges in pull-based development: the integrators perspective

    Proceedings of the 37th ICSE, Florence, Italy

    (2015)
  • Y. Yu et al.

    Who should review this pull-request: reviewer recommendation to expedite crowd collaboration

    Proceedings of the 21st APSEC, Jeju, Korea

    (2014)
  • Y. Yu et al.

    Reviewer recommender of pull-requests in github

    Proceedings of the 30th ICSME, Victoria, Canada

    (2014)
  • V.J. Hellendoorn et al.

    Will they like this? Evaluating code contributions with language models

    Proceedings of the 12th MSR, Florence, Italy

    (2015)
  • P.C. Rigby et al.

    Understanding broadcast based peer review on open source software projects

    Proceedings of the 33rd ICSE, Honolulu, USA

    (2011)
  • M.L. de Lima et al.

    Developers assignment for analyzing pull requests

    Proceedings of SAC, Salamanca, Spain

    (2015)
  • J. Jiang et al.

    Coredevrec: automatic core member recommendation for contribution evaluation

    J. Comput. Sci. Technol.

    (2015)
  • B. Vasilescu et al.

    Quality and productivity outcomes relating to continuous integration in github

    Proceedings of FSE, Bergamo, Italy

    (2015)
  • Z. Wang et al.

    Role distribution and transformation in open source software project teams

    Proceedings of APSEC, New Delhi, India

    (2015)
  • L. Dabbish et al.

    Social coding in github: transparency and collaboration in an open software repository

    Proceedings of CSCW, Washington, USA

    (2012)
  • C. Bird et al.

    Open borders? Immigration in open source projects

    Proceedings of the 4th MSR, Minneapolis, USA

    (2007)
  • G. Robles et al.

    Contributor turnover in libre software projects

    Open Source Systems

    (2006)
  • M. Foucault et al.

    Developer turnover in open-source software

    Proceedings of FSE, Bergamo, Italy

    (2015)
  • A. Rastogi et al.

    What community contribution pattern says about stability of software project?

    Proceedings of APSEC, Jeju, Korea

    (2014)
  • J. Anvik et al.

    Who should fix this bug?

    Proceedings of the 28th ICSE, Shanghai, China

    (2006)
  • D. Matter et al.

    Assigning bug reports using a vocabulary-based expertise model of developers

    Proceedings of the 6th MSR, Vancouver, Canada

    (2009)
  • X. Xia et al.

    Accurate developer recommendation for bug resolution

    Proceedings of the 20th WCRE, Koblenz, Germany

    (2013)
  • I.S. Dhillon et al.

    Concept decompositions for large sparse text data using clustering

    Mach. Learn.

    (2001)