DOI: 10.1145/3397271.3401143
research-article

Match²: A Matching over Matching Model for Similar Question Identification

Published: 25 July 2020

ABSTRACT

Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification has become a core task in CQA. It aims to find a similar question in the archived repository whenever a new question is asked. However, properly measuring the similarity between two questions has long been a challenge because of the inherent variation of natural language: there can be different ways to ask the same question, and different questions can share similar expressions. To alleviate this problem, it is natural to use the existing answers to enrich the archived questions. Traditional methods typically adopt a one-side usage, which leverages the answer as an expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation, since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions are likely to be addressed by similar parts of the answer, while different questions may not be. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. Based on this idea, we propose a novel matching over matching model, namely Match², which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.
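The two-side idea (compare how each question matches the same answer, rather than folding the answer into one question) can be illustrated with a minimal, non-neural sketch. This is not the Match² architecture, whose layers the abstract does not describe. The following choices are all assumptions made for illustration: tokens are given as precomputed embedding vectors, a "matching pattern" is a softmax over answer positions of each answer token's best cosine match against the question, and two patterns are compared with 1 minus the base-2 Jensen-Shannon divergence. The function names and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def matching_pattern(question: List[Vector], answer: List[Vector],
                     temperature: float = 0.1) -> List[float]:
    """Hypothetical matching pattern: a distribution over answer positions.

    Each answer token is scored by its best cosine match against any
    question token; the scores are then softmax-normalised.
    """
    scores = [max(cosine(q, a) for q in question) for a in answer]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        # y_i >= x_i / 2 > 0 whenever x_i > 0, so the log is always defined.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def match_over_match(q1: List[Vector], q2: List[Vector],
                     answer: List[Vector]) -> float:
    """Similarity of two questions via their matching patterns over one answer."""
    p1 = matching_pattern(q1, answer)
    p2 = matching_pattern(q2, answer)
    return 1.0 - js_divergence(p1, p2)


if __name__ == "__main__":
    # Toy 2-d "embeddings" (made up for illustration, not real data).
    answer = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    q1 = [[1.0, 0.0]]  # focuses on the first answer token
    q2 = [[1.0, 0.1]]  # close paraphrase of q1
    q3 = [[0.0, 1.0]]  # addressed by a different part of the answer
    # q1 vs q2 should score higher than q1 vs q3.
    print(match_over_match(q1, q2, answer), match_over_match(q1, q3, answer))
```

Because both questions are matched against the same answer, a long or noisy answer affects the two patterns in the same way, which is the intuition behind the two-side usage; a learned model would replace the fixed cosine scoring and divergence with trainable components.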

    Published in

      ACM Conferences
      SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2020
      2548 pages
      ISBN:9781450380164
      DOI:10.1145/3397271

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%