A Bayesian Approach for Quantifying Data Scarcity when Modeling Human Behavior via Inverse Reinforcement Learning

Abstract
Computational models that formalize complex human behaviors enable the study and understanding of those behaviors. However, collecting the behavior data required to estimate the parameters of such models is often tedious and resource-intensive. Estimating dataset size as part of data collection planning (also known as Sample Size Determination) is therefore important to reduce the time and effort of behavior data collection while maintaining an accurate estimate of model parameters. In this article, we present a sample size determination method based on Uncertainty Quantification (UQ) for a specific Inverse Reinforcement Learning (IRL) model of human behavior, in two cases: (1) pre-hoc experiment design, conducted in the planning stage before any data are collected, to guide the estimation of how many samples to collect; and (2) post-hoc dataset analysis, performed after data are collected, to decide whether the existing dataset has sufficient samples or more data are needed. We validate our approach in experiments with a realistic model of the behaviors of people with Multiple Sclerosis (MS) and illustrate how to pick a reasonable sample size target. Our work enables model designers to perform a deeper, principled investigation of the effects of dataset size on IRL model parameters.
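The pre-hoc case described above can be illustrated with a minimal simulation sketch. This is not the article's method or its MS behavior model: it assumes a toy behavior model with a single Bernoulli "action preference" parameter, a Beta(1, 1) prior, and posterior standard deviation as the uncertainty measure; the function names (`posterior_sd`, `samples_needed`) and the target thresholds are hypothetical choices for illustration only.

```python
import math
import numpy as np

def posterior_sd(a, b):
    """Closed-form standard deviation of a Beta(a, b) posterior."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def samples_needed(target_sd, theta_true=0.7, max_n=2000, n_sims=100, seed=0):
    """Pre-hoc sample size estimate by simulation (hedged sketch).

    Repeatedly simulate datasets of increasing size n from an assumed
    data-generating parameter theta_true, and return the smallest n whose
    average posterior uncertainty falls below target_sd.
    """
    rng = np.random.default_rng(seed)
    for n in range(10, max_n + 1, 10):
        # Each simulated dataset is summarized by its success count k;
        # the Beta(1, 1) prior updates to Beta(1 + k, 1 + n - k).
        sds = [posterior_sd(1 + k, 1 + n - k)
               for k in rng.binomial(n, theta_true, size=n_sims)]
        if float(np.mean(sds)) <= target_sd:
            return n
    return max_n

if __name__ == "__main__":
    # Tighter uncertainty targets demand more samples.
    print(samples_needed(0.04), samples_needed(0.02))
```

The same loop structure carries over to the post-hoc case: instead of simulating datasets, one computes the posterior uncertainty on growing prefixes of the collected dataset and checks whether it has already dropped below the target.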