ABSTRACT
We introduce a novel dataset of real multi-destination trips booked through Booking.com's online travel platform. The dataset consists of 1.5 million reservations representing 359,000 unique journeys made across 39,000 destinations. As such, the data is particularly well suited to modeling sequential recommendation and retrieval problems in a high-cardinality target space. To preserve user privacy and protect business-sensitive statistics, the data is fully anonymized, sampled, and limited to five user origin markets. Even so, the dataset is representative of general travel purchase behavior and therefore presents a uniquely valuable resource for machine learning and information retrieval researchers. This work provides an overview of the dataset and reports several benchmark results for relevant recommendation problems, obtained as part of the recently held Booking.com data challenge at the WSDM WebTour workshop.