Vexation-Aware Active Learning for On-Menu Restaurant Dish Availability

Authors:
Jean-François Kagy, Google Research, New York, NY, USA
Flip Korn, Google Research, New York, NY, USA
Afshin Rostamizadeh, Google Research, New York, NY, USA
Chris Welty, Google Research, New York, NY, USA

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022, Pages 3116–3126. https://doi.org/10.1145/3534678.3539152

Published: 14 August 2022

ABSTRACT

Here we leverage the power of the crowd: online users who are willing to answer questions about dish availability at restaurants they have visited. While motivated users are happy to contribute knowledge, they are much less likely to respond to "silly" or embarrassing questions (e.g., "Does Pizza Hut serve pizza?" or "Does Mike's Vegan Restaurant serve steak?").

In this paper, we study the problem of Vexation-Aware Active Learning (VAAL), where judiciously selected questions are targeted towards improving restaurant-dish model prediction, subject to a limit on the percentage of "unsure" answers or "dismissals" (e.g., swiping the app closed), which together measure user vexation. We formalize the selection problem as an integer program and solve it efficiently using a distributed solution that scales linearly with the number of candidate questions. Since our algorithm relies on an accurate estimation of the unsure-dismiss rate (UDR), we present a regression model that provides high-quality results compared to baselines including collaborative filtering. Finally, we demonstrate in a live system that our proposed VAAL strategy performs competitively against classical (margin-based) active learning approaches while reducing the UDR for the questions being asked.
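To make the selection problem described above concrete, the following is a minimal sketch and not the authors' formulation or their distributed solver, neither of which is given in the abstract. It assumes each candidate question i has a margin-based utility u_i and a predicted UDR r_i, and that we want k questions maximizing the total utility while keeping the mean predicted UDR at or below a threshold. The function names, the fixed batch size k, and the Lagrangian bisection heuristic are all assumptions introduced for illustration.

```python
from typing import List, Sequence


def margin_uncertainty(p: float) -> float:
    """Margin-based informativeness: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_questions(utility: Sequence[float], udr: Sequence[float],
                     k: int, max_udr: float, iters: int = 60) -> List[int]:
    """Pick k candidate indices with high total utility and mean UDR <= max_udr.

    Hypothetical heuristic: rank by utility - lam * udr and bisect on lam
    until the top-k set satisfies the UDR budget.
    """
    n = len(utility)
    if len(udr) != n or not 0 < k <= n:
        raise ValueError("need matching inputs and 0 < k <= number of candidates")

    def top_k(lam: float) -> List[int]:
        # Highest penalised score first; ties go to the lower predicted UDR.
        order = sorted(range(n),
                       key=lambda i: (-(utility[i] - lam * udr[i]), udr[i]))
        return order[:k]

    def feasible(idx: List[int]) -> bool:
        return sum(udr[i] for i in idx) <= max_udr * k + 1e-12

    chosen = top_k(0.0)
    if feasible(chosen):
        return sorted(chosen)  # the UDR constraint is not binding

    # If even the k lowest-UDR candidates break the budget, nothing works.
    safest = sorted(range(n), key=lambda i: udr[i])[:k]
    if not feasible(safest):
        raise ValueError("no selection of size k meets the UDR limit")

    lo, hi = 0.0, 1.0
    while not feasible(top_k(hi)):  # grow the penalty until feasible
        hi *= 2.0
        if hi > 1e12:
            return sorted(safest)
    for _ in range(iters):  # invariant: top_k(hi) is always feasible
        mid = (lo + hi) / 2.0
        if feasible(top_k(mid)):
            hi = mid
        else:
            lo = mid
    return sorted(top_k(hi))
```

On toy data with utilities [0.9, 0.8, 0.7, 0.2, 0.1] and predicted UDRs [0.9, 0.8, 0.1, 0.1, 0.05], asking for k = 2 with max_udr = 0.4 returns indices [2, 3]: the two most informative questions are skipped because they are the ones most likely to be dismissed. A per-candidate score like this only needs a sort and a scalar search, which is one plausible reason a penalized ranking could scale to many candidates, but this remains a heuristic and carries no optimality guarantee for the integer program.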

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Active learning settings
2. Information systems
  1. World Wide Web
    1. Web applications
      1. Crowdsourcing

Published in
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678
General Chairs: Aidong Zhang (University of Virginia), Huzefa Rangwala (Amazon/George Mason University)
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active learning
crowdsourcing
user-generated content
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%