research-article

Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets

Authors:
Jianglei Han, SAP Leonardo ML & Nanyang Technological University, Singapore
Ka Hian Goh, SAP Leonardo ML & Nanyang Technological University, Singapore
Aixin Sun, Nanyang Technological University, Singapore
Mohammad Akbari, SAP Leonardo ML & University College London, London, United Kingdom

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, Pages 2263–2271. https://doi.org/10.1145/3269206.3272026

Published: 17 October 2018

ABSTRACT

Software support tickets contain short and noisy text written by customers. Software products are often referred to by various surface forms and informal abbreviations. Automatically identifying software mentions in support tickets and determining their official names and versions is helpful for many downstream applications, e.g., routing support tickets to the right expert groups. In this work, we study the problem of software product name extraction and linking from support tickets. We first annotate and analyze sampled tickets to understand the language patterns. Next, we design features using local, contextual, and external information sources for the extraction and linking models. In experiments, we show that linear models with the proposed features deliver better and more consistent results than state-of-the-art baseline models, even on a dataset with sparse labels.
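
To make the linking task concrete, the following is a minimal sketch of mapping an informal mention to an official product name. It is not the method of the paper: the catalog entries, the function names (`acronym`, `link_score`, `link`), the acronym rule, and the 0.6 threshold are all assumptions chosen for illustration, and plain character similarity from Python's standard `difflib` stands in for the learned linking model with local, contextual, and external features.

```python
from difflib import SequenceMatcher

# Hypothetical catalog of official product names (illustrative only,
# not taken from the paper or from any real product list).
CATALOG = [
    "Business Warehouse",
    "Enterprise Resource Planning",
    "Customer Relationship Management",
]


def acronym(name: str) -> str:
    """Initial letter of each word, lowercased: 'Business Warehouse' -> 'bw'."""
    return "".join(word[0] for word in name.split()).lower()


def link_score(mention: str, official: str) -> float:
    """Score in [0, 1]: 1.0 for an exact acronym match, else character similarity."""
    m = mention.strip().lower()
    if m == acronym(official):
        return 1.0
    return SequenceMatcher(None, m, official.lower()).ratio()


def link(mention: str, catalog=CATALOG, threshold: float = 0.6):
    """Return the best-scoring official name, or None if nothing clears the threshold."""
    best = max(catalog, key=lambda name: link_score(mention, name))
    return best if link_score(mention, best) >= threshold else None


if __name__ == "__main__":
    for mention in ["BW", "crm", "busines warehouse", "printer"]:
        print(mention, "->", link(mention))
```

A surface-form score of this kind could serve as one local feature among many; on its own it cannot resolve ambiguous abbreviations or versions, which is where contextual and external information would be needed.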

Published in
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206
General Chair: Alfredo Cuzzocrea (University of Trieste, Italy)
Program Chairs: James Allan (University of Massachusetts, USA), Norman Paton (University of Manchester, United Kingdom), Divesh Srivastava (AT&T Labs Research, USA), Rakesh Agrawal (Data Insights Lab, USA), Andrei Broder (Google Research, USA), Mohammed Zaki (Rensselaer Polytechnic Institute, USA), Selcuk Candan (Arizona State University, USA), Alexandros Labrinidis (University of Pittsburgh, USA), Assaf Schuster (Technion, Israel), Haixun Wang (Google Research, USA)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity extraction
entity linking
support ticket
Qualifiers
- research-article
Acceptance Rates
CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%