Abstract
Open bug repositories are valuable in software engineering because they give developers and end users a platform for reporting bugs. Along with a summary and a description, reporters are expected to assign a component name. Reporters who lack knowledge of a project's internal structure often assign the wrong component, and such incorrect assignments may delay bug-related activities. A prerequisite for developing any automated component prediction system is extracting relevant features from bug reports. Because bug reports are written in natural language, a vector space of relevant features must be built before they can be used for training. This work examines TF-IDF weighting and topic modeling with respect to their ability to select relevant terms from bug reports. It presents a comparative analysis of these two preprocessing techniques in combination with three classifiers (Naive Bayes, SVM, and C4.5) in the context of correct component prediction.
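To make the idea of a vector space of weighted terms concrete, the following is a minimal sketch of TF-IDF weighting over bug report text. It is not the authors' pipeline: the three sample reports, the tokenizer, and the function names are hypothetical, and the weighting uses one common variant (relative term frequency multiplied by log(N / document frequency)).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, list of sparse TF-IDF vectors), one dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many reports does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({t: (c / length) * idf[t] for t, c in counts.items()})
    return sorted(df), vectors


# Hypothetical bug report summaries, invented for illustration only.
reports = [
    "Crash in parser when reading empty config file",
    "Parser fails on nested config sections",
    "UI button label truncated in settings dialog",
]

vocab, vectors = tfidf_vectors(reports)

# Print the highest-weighted terms of each report.
for i, vec in enumerate(vectors):
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(i, [term for term, _ in top])
```

Terms that occur in only one report receive the largest weights, while terms shared across reports are down-weighted. A classifier such as Naive Bayes, SVM, or C4.5 would then be trained on these vectors, with the component name as the label.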
Copyright information
© 2016 Springer Science+Business Media Singapore
Cite this paper
Jangra, M., Singh, S.K. (2016). Evaluating Topic Modeling as Pre-processing for Component Prediction in Bug Reports. In: Choudhary, R., Mandal, J., Auluck, N., Nagarajaram, H. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 452. Springer, Singapore. https://doi.org/10.1007/978-981-10-1023-1_46
DOI: https://doi.org/10.1007/978-981-10-1023-1_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1021-7
Online ISBN: 978-981-10-1023-1
eBook Packages: Engineering, Engineering (R0)