PyOED: An Extensible Suite for Data Assimilation and Model-Constrained Optimal Design of Experiments

Online AM: 20 March 2024

Abstract

This paper describes PyOED, a highly extensible scientific package for developing and testing model-constrained optimal experimental design (OED) methods for inverse problems. PyOED aims to be a comprehensive Python toolkit for model-constrained OED. The package targets scientists and researchers interested in understanding the details of OED formulations and approaches, and it is meant to let them experiment with standard and innovative OED technologies on a wide range of test problems (e.g., simulation models). OED, inverse problems (e.g., Bayesian inversion), and data assimilation (DA) are closely related research fields whose formulations overlap significantly. PyOED is therefore continuously being expanded with Bayesian inversion, DA, and OED methods as well as new scientific simulation models, observation error models, and observation operators. These components are added so that they can be combined in different permutations, which enables testing OED methods in settings of varying complexity. The PyOED core is written entirely in Python and relies on the language's object-oriented capabilities; however, the current version of PyOED is meant to be extensible rather than scalable. Specifically, PyOED is developed to “enable rapid development and benchmarking of OED methods with minimal coding effort and to maximize code reutilization.” This paper briefly describes the PyOED layout and philosophy and provides a set of exemplary test cases and tutorials that demonstrate the potential of the package.
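
To make the overlap between Bayesian inversion and OED concrete, the following is a minimal sketch of an A-optimal sensor selection for a two-parameter linear-Gaussian inverse problem. It does not use the PyOED API: the function names, the candidate sensor rows in `SENSORS`, and the noise and prior variances are all hypothetical, and the exhaustive search is used only because the toy problem is tiny. Plain Python is used to keep the sketch dependency-free.

```python
from itertools import combinations


def posterior_trace(design, sensors, noise_var, prior_var):
    """Trace of the posterior covariance for a 2-parameter linear-Gaussian problem.

    Posterior precision = prior precision
                          + sum over active sensors of (h h^T) / noise_var,
    where h is the sensor's row of the linear forward operator.
    """
    # Entries of the symmetric 2x2 precision matrix [[a, b], [b, d]],
    # initialized with an isotropic Gaussian prior (assumption).
    a = 1.0 / prior_var
    d = 1.0 / prior_var
    b = 0.0
    for active, (h1, h2) in zip(design, sensors):
        if active:
            a += h1 * h1 / noise_var
            b += h1 * h2 / noise_var
            d += h2 * h2 / noise_var
    # For a symmetric 2x2 matrix, trace(inverse) = (a + d) / (a*d - b*b).
    return (a + d) / (a * d - b * b)


def a_optimal_design(sensors, budget, noise_var=0.1, prior_var=1.0):
    """Enumerate all binary designs with exactly `budget` active sensors
    and return the one minimizing the posterior covariance trace."""
    best_design, best_value = None, float("inf")
    for chosen in combinations(range(len(sensors)), budget):
        design = tuple(1 if i in chosen else 0 for i in range(len(sensors)))
        value = posterior_trace(design, sensors, noise_var, prior_var)
        if value < best_value:
            best_design, best_value = design, value
    return best_design, best_value


# Hypothetical candidate sensors: each tuple is one row of the forward operator.
SENSORS = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.5, 0.5)]

design, value = a_optimal_design(SENSORS, budget=2)
print(design, value)
```

In this toy setting the search picks the two sensors that observe the two parameters independently, since nearly collinear sensor rows add little information about the second parameter. Exhaustive enumeration grows combinatorially with the number of candidates, so it is only meant to illustrate the criterion, not a practical solution strategy.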

Published in

ACM Transactions on Mathematical Software, Just Accepted
ISSN: 0098-3500
EISSN: 1557-7295

Copyright © 2024 Copyright held by the owner/author(s).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Online AM: 20 March 2024
• Accepted: 4 March 2024
• Revised: 19 December 2023
• Received: 6 March 2023

Qualifiers

• research-article