A Bayesian Approach for Quantifying Data Scarcity when Modeling Human Behavior via Inverse Reinforcement Learning

Abstract
Computational models that formalize complex human behaviors enable the study and understanding of those behaviors. However, collecting the behavior data required to estimate the parameters of such models is often tedious and resource-intensive. Estimating dataset size as part of data collection planning (also known as Sample Size Determination) is therefore important to reduce the time and effort of behavior data collection while maintaining an accurate estimate of model parameters. In this article, we present a sample size determination method based on Uncertainty Quantification (UQ) for a specific Inverse Reinforcement Learning (IRL) model of human behavior, in two cases: (1) pre-hoc experiment design, conducted in the planning stage before any data are collected, to guide the estimation of how many samples to collect; and (2) post-hoc dataset analysis, performed after data are collected, to decide whether the existing dataset has sufficient samples or more data are needed. We validate our approach in experiments with a realistic model of the behaviors of people with Multiple Sclerosis (MS) and illustrate how to pick a reasonable sample size target. Our work enables model designers to perform a deeper, principled investigation of the effects of dataset size on IRL model parameters.
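The pre-hoc case described above can be illustrated with a minimal simulation sketch. This is not the article's method or its MS behavior model: it assumes a toy behavior model with a single Bernoulli "action preference" parameter, a Beta(1, 1) prior, and posterior standard deviation as the uncertainty measure; the function names (`posterior_sd`, `samples_needed`) and the target thresholds are hypothetical choices for illustration only.

```python
import math
import numpy as np

def posterior_sd(a, b):
    """Closed-form standard deviation of a Beta(a, b) posterior."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def samples_needed(target_sd, theta_true=0.7, max_n=2000, n_sims=100, seed=0):
    """Pre-hoc sample size estimate by simulation (hedged sketch).

    Repeatedly simulate datasets of increasing size n from an assumed
    data-generating parameter theta_true, and return the smallest n whose
    average posterior uncertainty falls below target_sd.
    """
    rng = np.random.default_rng(seed)
    for n in range(10, max_n + 1, 10):
        # Each simulated dataset is summarized by its success count k;
        # the Beta(1, 1) prior updates to Beta(1 + k, 1 + n - k).
        sds = [posterior_sd(1 + k, 1 + n - k)
               for k in rng.binomial(n, theta_true, size=n_sims)]
        if float(np.mean(sds)) <= target_sd:
            return n
    return max_n

if __name__ == "__main__":
    # Tighter uncertainty targets demand more samples.
    print(samples_needed(0.04), samples_needed(0.02))
```

The same loop structure carries over to the post-hoc case: instead of simulating datasets, one computes the posterior uncertainty on growing prefixes of the collected dataset and checks whether it has already dropped below the target.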