Abstract
This paper describes PyOED, a highly extensible scientific package for developing and testing model-constrained optimal experimental design (OED) methods for inverse problems. PyOED aims to be a comprehensive Python toolkit for model-constrained OED. The package targets scientists and researchers who want to understand the details of OED formulations and approaches, and it enables them to experiment with standard and novel OED techniques on a wide range of test problems (e.g., simulation models). OED, inverse problems (e.g., Bayesian inversion), and data assimilation (DA) are closely related research fields whose formulations overlap significantly. PyOED is therefore continuously being expanded with Bayesian inversion, DA, and OED methods as well as new scientific simulation models, observation error models, and observation operators. These components are designed to be interchangeable, so that OED methods can be tested in settings of varying complexity. The PyOED core is written entirely in Python and relies on the language's object-oriented capabilities; the current version of PyOED is meant to be extensible rather than scalable. Specifically, PyOED is developed to “enable rapid development and benchmarking of OED methods with minimal coding effort and to maximize code reutilization.” This paper briefly describes the PyOED layout and philosophy and presents a set of test cases and tutorials that demonstrate the potential of the package.
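To make the notion of model-constrained OED concrete, the following is a minimal sketch of an A-optimal sensor selection for a linear-Gaussian inverse problem, written with plain NumPy. It does not use or reproduce the PyOED API; the function names `posterior_covariance` and `a_optimal_design`, the brute-force search, and the toy forward operator are all assumptions made for illustration only.

```python
from itertools import combinations

import numpy as np


def posterior_covariance(F, prior_cov, noise_var, design):
    """Posterior covariance for a linear forward operator F, a Gaussian prior,
    uncorrelated observation noise, and a binary design (1 = sensor active)."""
    W = np.diag(np.asarray(design, dtype=float))
    # Posterior precision = prior precision + design-weighted data precision.
    precision = np.linalg.inv(prior_cov) + F.T @ W @ F / noise_var
    return np.linalg.inv(precision)


def a_optimal_design(F, prior_cov, noise_var, budget):
    """Brute-force search (hypothetical helper) for the binary design with
    exactly `budget` active sensors that minimizes the posterior trace."""
    n_obs = F.shape[0]
    best_design, best_value = None, np.inf
    for active in combinations(range(n_obs), budget):
        design = np.zeros(n_obs, dtype=int)
        design[list(active)] = 1
        value = np.trace(posterior_covariance(F, prior_cov, noise_var, design))
        if value < best_value:
            best_design, best_value = design, value
    return best_design, best_value


# Toy problem (illustrative values): three candidate sensors, two parameters.
F = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
prior_cov = np.eye(2)
design, value = a_optimal_design(F, prior_cov, noise_var=1.0, budget=1)
print("selected design:", design, "posterior trace:", value)
```

The exhaustive search is only feasible for a handful of candidate sensors; it is used here because it keeps the optimality criterion visible, not because it reflects how large-scale OED problems are solved in practice.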
Index Terms
- PyOED: An Extensible Suite for Data Assimilation and Model-Constrained Optimal Design of Experiments