Pre-trained Language Model based Ranking in Baidu Search

ABSTRACT
As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information needs. Recently, neural rankers fine-tuned from pre-trained language models (PLMs) have established state-of-the-art ranking effectiveness. However, directly applying these PLM-based rankers to a large-scale web search system is nontrivial for the following reasons: (1) the prohibitive computational cost of massive neural PLMs, especially on the long texts of web documents, precludes their deployment in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and ad-hoc retrieval scenarios, which demand comprehensive relevance modeling, is another major barrier to improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, so the compatibility of an individually fine-tuned ranking model with the other components is critical to a cooperative ranking system. In this work, we contribute a series of successfully deployed techniques for tackling these issues when bringing the state-of-the-art Chinese pre-trained language model, ERNIE, into the online search engine system. We first describe a novel practice that cost-efficiently summarizes each web document and contextualizes the resulting summary with the query using a cheap yet powerful Pyramid-ERNIE architecture. We then present an innovative paradigm that carefully exploits large-scale noisy and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored to the online ranking system, aiming to stabilize ranking signals across its various components. Extensive offline and online experiments show that the proposed techniques significantly boost the search engine's performance.
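To make the "cheap yet powerful" idea concrete, below is a minimal PyTorch sketch of a pyramid-style ranker. The abstract does not specify Pyramid-ERNIE's exact design, so the two-stream lower encoder, the layer counts, and names such as PyramidRanker are illustrative assumptions: the lower layers encode the query and the document summary independently, and only a short upper stack performs joint query-document interaction.

```python
# Hypothetical sketch of a pyramid-style ranking encoder (not the paper's
# actual Pyramid-ERNIE): separate lower stacks for query and summary,
# followed by a short joint interaction stack. Sizes are illustrative.
import torch
import torch.nn as nn

class PyramidRanker(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_heads=4,
                 n_lower=2, n_upper=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Pyramid base: independent stacks, so the (long) summary encoding
        # does not depend on the query and could be precomputed offline.
        self.query_stack = nn.TransformerEncoder(make_layer(), n_lower)
        self.doc_stack = nn.TransformerEncoder(make_layer(), n_lower)
        # Pyramid tip: joint query-document contextualization.
        self.joint_stack = nn.TransformerEncoder(make_layer(), n_upper)
        self.score = nn.Linear(d_model, 1)  # pointwise relevance head

    def forward(self, query_ids, doc_ids):
        q = self.query_stack(self.embed(query_ids))
        d = self.doc_stack(self.embed(doc_ids))
        joint = self.joint_stack(torch.cat([q, d], dim=1))
        return self.score(joint[:, 0])  # read score off the first position

ranker = PyramidRanker()
q = torch.randint(0, 30000, (2, 8))   # batch of 2 queries, 8 tokens each
d = torch.randint(0, 30000, (2, 64))  # summaries truncated to 64 tokens
print(ranker(q, d).shape)             # torch.Size([2, 1])
```

The presumed cost saving is that only the small query stack and the short joint tip must run per query at serving time; the expensive encoding of the document summary is query-independent, which is one plausible reading of how such an architecture meets strict online latency budgets.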