ABSTRACT
Software support tickets contain short, noisy text written by customers, who often refer to software products by various surface forms and informal abbreviations. Automatically identifying software mentions in support tickets and determining their official names and versions is helpful for many downstream applications, e.g., routing tickets to the right expert groups for support. In this work, we study the problem of software product name extraction and linking from support tickets. We first annotate and analyze sampled tickets to understand their language patterns. Next, we design features for the extraction and linking models using local, contextual, and external information sources. In experiments, we show that linear models with the proposed features deliver better and more consistent results than state-of-the-art baseline models, even when labels are sparse.
Index Terms
- Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets