ABSTRACT
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification has become a core task in CQA: it aims to find a similar question in the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there can be different ways to ask the same question, and different questions can share similar expressions. To alleviate this problem, it is natural to use the existing answers to enrich the archived questions. Traditional methods typically take a one-side usage, which leverages the answer as an expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions could be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching over matching model, namely Match², which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.
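The two-side idea of comparing matching patterns over a shared answer can be illustrated with a toy sketch. This is not the Match² model itself: exact token overlap stands in for a learned matching function, Jensen-Shannon divergence is an assumed choice for comparing the two patterns, and the function names `matching_pattern`, `js_divergence`, and `pattern_similarity` are hypothetical.

```python
import math


def matching_pattern(question, answer):
    """Distribution over answer tokens: how strongly the question matches each one.

    Assumption: exact token overlap is a stand-in for a learned matching score.
    """
    q_tokens = set(question.lower().split())
    a_tokens = answer.lower().split()
    # 1.0 where the answer token also appears in the question, plus a small
    # smoothing constant so the pattern is a valid distribution even with no overlap.
    scores = [(1.0 if tok in q_tokens else 0.0) + 1e-3 for tok in a_tokens]
    total = sum(scores)
    return [s / total for s in scores]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pattern_similarity(q1, q2, answer):
    """Similarity in [0, 1] between the matching patterns of q1 and q2 over one answer."""
    p1 = matching_pattern(q1, answer)
    p2 = matching_pattern(q2, answer)
    # JS divergence is bounded above by log(2), so this rescales to [0, 1].
    return 1.0 - js_divergence(p1, p2) / math.log(2)


answer = "restart the router then reset the wifi password in settings"
q_new = "how do i reset my wifi password"
q_similar = "how to change the wifi password"
q_different = "why should i restart my router"

sim_similar = pattern_similarity(q_new, q_similar, answer)
sim_different = pattern_similarity(q_new, q_different, answer)
print(sim_similar > sim_different)
```

In this toy setting, the two password questions concentrate their matches on the same part of the answer, while the router question matches a different part, so the first pair scores higher.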
Match²: A Matching over Matching Model for Similar Question Identification