DOI: 10.1145/3447548.3467147

Pre-trained Language Model based Ranking in Baidu Search

Published: 14 August 2021

ABSTRACT

As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information needs. Recently, neural rankers fine-tuned from pre-trained language models (PLMs) have established state-of-the-art ranking effectiveness. However, directly applying these PLM-based rankers to a large-scale web search system is nontrivial due to the following challenges: (1) the prohibitive computational cost of massive neural PLMs, especially over long web documents, prevents their deployment in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and ad-hoc retrieval scenarios, which require comprehensive relevance modeling, is another major barrier to improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, so the compatibility of an individually fine-tuned ranking model is critical for a cooperative ranking system. In this work, we contribute a series of successfully applied techniques for tackling these issues when deploying the state-of-the-art Chinese pre-trained language model, ERNIE, in Baidu's online search engine. We first describe a novel practice that cost-efficiently summarizes each web document and contextualizes the resulting summary with the query using a cheap yet powerful Pyramid-ERNIE architecture. We then introduce an innovative paradigm that carefully exploits large-scale noisy and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored to the online ranking system, which aims to stabilize ranking signals across the various online components. Extensive offline and online experiments show that the proposed techniques significantly boost the search engine's performance.
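
The abstract does not spell out Pyramid-ERNIE's internals, so the sketch below is only an illustration of the general "summarize, then contextualize" trade-off it hints at: the query and a short document summary are encoded separately by a few lower transformer layers, and only the upper layers attend across both sequences. The class name (PyramidStyleRanker), layer counts, and dimensions are hypothetical placeholders rather than Baidu's implementation, and the summary-extraction step itself is omitted.

```python
# Illustrative sketch only (assumed architecture, not the paper's exact model).
# Lower layers encode query and document summary independently; upper layers
# perform joint cross-attention over the concatenation. Positional encodings
# are omitted for brevity.
import torch
import torch.nn as nn


def make_layer(d_model: int, n_heads: int) -> nn.TransformerEncoderLayer:
    return nn.TransformerEncoderLayer(
        d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)


class PyramidStyleRanker(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_heads=4,
                 n_lower_layers=3, n_upper_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Lower layers: query and summary are encoded separately (cheaper, and
        # the summary-side representation could in principle be cached offline).
        self.query_lower = nn.TransformerEncoder(make_layer(d_model, n_heads), n_lower_layers)
        self.doc_lower = nn.TransformerEncoder(make_layer(d_model, n_heads), n_lower_layers)
        # Upper layers: full attention across the concatenated sequences.
        self.upper = nn.TransformerEncoder(make_layer(d_model, n_heads), n_upper_layers)
        self.score = nn.Linear(d_model, 1)  # pointwise relevance head

    def forward(self, query_ids: torch.Tensor, summary_ids: torch.Tensor) -> torch.Tensor:
        q = self.query_lower(self.embed(query_ids))
        d = self.doc_lower(self.embed(summary_ids))
        joint = self.upper(torch.cat([q, d], dim=1))
        # Pool on the first query token's final state, [CLS]-style.
        return self.score(joint[:, 0]).squeeze(-1)


if __name__ == "__main__":
    model = PyramidStyleRanker()
    query = torch.randint(0, 30000, (2, 8))     # batch of 2 queries, 8 tokens each
    summary = torch.randint(0, 30000, (2, 64))  # summarized documents, 64 tokens each
    print(model(query, summary).shape)          # torch.Size([2])
```

In a split design of this kind, the expensive joint attention runs only over a short query plus an even shorter summary rather than the full web document, which is one plausible way to reconcile PLM-quality relevance modeling with the low-latency budget the abstract describes.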


Supplemental Material

kdd_baidu_rank.mp4 (mp4, 220.8 MB)


Published in

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021, 4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548

        Copyright © 2021 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 14 August 2021


        Qualifiers

        • research-article

        Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

