
On the estimation of partially observed continuous-time Markov chains

  • Original paper
  • Computational Statistics

Abstract

Motivated by the increasing use of discrete-state Markov processes across applied disciplines, a Metropolis–Hastings sampling algorithm is proposed for a partially observed process. Current approaches, both classical and Bayesian, have relied on imputing the missing parts of the process and working with a complete likelihood. However, from a Bayesian perspective, the use of latent variables is not necessary, and exploiting the observed likelihood function, combined with a suitable Markov chain Monte Carlo method, results in an accurate and efficient approach. A comprehensive comparison using simulated and real data sets demonstrates the advantages of our approach over alternatives available in the literature.


References

  • Al-Mohy AH, Higham NJ (2011) Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J Sci Comput 33(2):488–511


  • Amoros R, King R, Toyoda H, Kumada T, Johnson PJ, Bird TG (2019) A continuous-time hidden Markov model for cancer surveillance using serum biomarkers with application to hepatocellular carcinoma. Metron 77(2):67–86


  • Bladt M, Sorensen M (2005) Statistical inference for discretely observed Markov jump processes. J Roy Stat Soc B 67:395–410


  • dos Reis G, Smith G (2018) Robust and consistent estimation of generators in credit risk. Quant Financ 18:983–1001


  • Fearnhead P, Sherlock C (2006) An exact Gibbs sampler for the Markov-modulated Poisson process. J R Stat Soc: Ser B (Stat Methodol) 68:767–784


  • Fukaya K, Royle JA (2013) Markov models for community dynamics allowing for observation error. Ecology 94:2670–2677


  • Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472


  • Georgoulas A, Hillston J, Sanguinetti G (2017) Unbiased Bayesian inference for population Markov jump processes via random truncations. Stat Comput 27(4):991–1002


  • Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361


  • Goulet V, Dutang C, Maechler M, Firth D, Shapira M, Stadelmann M (2021) Package ‘expm’

  • Grimmett GR, Stirzaker DR (1982) Probability and Random Processes. Oxford University Press

  • Higham NJ (2005) The scaling and squaring method for the matrix exponential revisited. SIAM J Matrix Anal Appl 26:1179–1193


  • Inamura Y (2006) Estimating continuous time transition matrices from discretely observed data. Bank of Japan, No.06-E07

  • Israel RB, Rosenthal JS, Wei JS (2001) Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math Financ 11:245–265


  • Norris JR (1998) Markov Chains. Cambridge University Press

  • Pardoux E (2008) Markov processes and applications. Algorithms, networks, genome and finance. Wiley

  • Pfeuffer M, Möstel L, Fischer M (2019) An extended likelihood framework for modelling discretely observed credit rating transitions. Quant Financ 19:93–104


  • Pfeuffer M (2017) ctmcd: An R Package for Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data. R J 9:127–141


  • Plummer M (2003) JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing, vol 124, No. 125.10, pp 1–10

  • Rao V, Teh YW (2013) Fast MCMC sampling for Markov jump processes and extensions. J Mach Learn Res 14(1):3295–3320


  • Sherlock C, Fearnhead P, Roberts GO (2010) The random walk Metropolis: linking theory and practice through a case study. Stat Sci 25(2):172–190


  • Sauer M, Stannat W (2016) Reliability of signal transmission in stochastic nerve axon equations. J Comput Neurosci 40:103–111


  • Van Kampen NG (2007) Stochastic processes in physics and chemistry. North–Holland

  • Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, van Mulbregt P (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17(3):261–272

  • Zhao T, Wang Z, Cumberworth A, Gsponer J, de Freitas N, Bouchard-Côté A (2016) Bayesian analysis of continuous time Markov chains with application to phylogenetic modelling. Bayesian Anal 11(4):1203–1237


Download references

Acknowledgements

The authors are grateful for the support of CONTEX project 2018-9 and PAPIIT-UNAM project IG100221. The authors are also grateful for the comments and suggestions of a referee, which have greatly improved the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alan Riva-Palacio.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendices

Appendix A: Metropolis–Hastings algorithm (5) details

The Metropolis–Hastings moves used in our work are the following:

  • For \(\lambda \in \Lambda = \left\{ \lambda _1,\ldots ,\lambda _m\right\} \) let \(g\,:\, {\mathbb {R}}\rightarrow {\mathbb {R}}^+\) be given by \(g(x)=e^x\) and propose moves from \(\lambda \) to \({\tilde{\lambda }}\) by

    $$\begin{aligned} {\tilde{\lambda }}= g( g^{-1}(\lambda ) +Z ) \end{aligned}$$

    with \(Z\sim \text {Normal}(0,1)\) so \({\tilde{\lambda }}\) has a LogNormal distribution. The corresponding transition kernel is given by \(q({\tilde{\lambda }}\,|\,\lambda )\propto e^{-0.5(\log (\lambda )-\log ({\tilde{\lambda }}))^2}{\tilde{\lambda }}^{-1}\).

  • For transition matrix P with no zeros in off-diagonal entries and

    \(\pmb {p}\in \left\{ \left( p_{j,1},\ldots ,p_{j,j-1},p_{j,j+1},\ldots ,p_{j,m} \right) \, : \, 1 \le j \le m \right\} \), let \(c>0\) and propose

    $$\begin{aligned} \tilde{\pmb {p}}|\pmb {p}\sim \text {Dirichlet}(\pmb {1}+c\pmb {p}) \end{aligned}$$

    with \(\pmb {1}=(1,\ldots ,1)\) so \(\tilde{\pmb {p}}\) has mode \(\pmb {p}\).

  • For transition matrix P with zeros in off-diagonal entries and

    \(p\in \left\{ p_{j,k} \, : \, 1 \le j\le m,\,1 \le k\le m,\, j\ne k\right\} \), let \(g\,:\, {\mathbb {R}}\rightarrow {\mathbb {R}}^+\) be a standard logistic function \(g(x)=e^x/(1+e^x)\) and propose moves from p to \({\tilde{p}}\) by

    $$\begin{aligned} {\tilde{p}}= g( g^{-1}(p) +Z ) \end{aligned}$$

    with \(Z\sim \text {Normal}(0,1)\) so \({\tilde{p}}\) has a LogitNormal distribution. Denote by \(\pmb {w}\) the vector \(\pmb {p}\) with the entry p changed to \({\tilde{p}}\). Finally, we normalize:

    $$\begin{aligned} \tilde{\pmb {p}} = \pmb {w}/\sum _{k=1}^m w_k. \end{aligned}$$

    The transformation \( f(x_1,\ldots ,x_m) = \left( \frac{x_1}{\sum _{i=1}^m x_i} , \ldots ,\frac{x_{m-1}}{\sum _{i=1}^m x_i},\sum _{i=1}^m x_i \right) \) is known to have Jacobian \(\left( \sum _{i=1}^m x_i \right) ^{-m+1}\). So the corresponding transition kernel is given by \(q(\tilde{\pmb {p}}\,|\,\pmb {p})\propto \exp \left( -0.5(\text {logit}(p_k)-\text {logit}({\tilde{p}}))^2\right) \frac{ \left( {\tilde{p}} + \sum _{i\ne k}p_i \right) ^{m-1}}{{\tilde{p}}(1-{\tilde{p}})}\). Note that we do not marginalize the random variable \(S=\sum _{i=1}^m w_i\) so we have to draw simulations of the auxiliary variable \({\tilde{p}}\) to evaluate the above conditional density.
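The first move above (the log-scale random walk for the rates) can be sketched as follows. This is a minimal illustration, not the paper's code: the function names and the stand-in log posterior are ours, and the proposal standard deviation 0.05 corresponds to the variance 0.0025 used in the second simulation study.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_rate(lam, sd=0.05):
    """Log-scale random-walk proposal: lam_tilde = exp(log(lam) + Z),
    Z ~ Normal(0, sd^2), so lam_tilde is LogNormal given lam."""
    return np.exp(np.log(lam) + sd * rng.normal())

def log_q(lam_to, lam_from, sd=0.05):
    """Log transition kernel q(lam_to | lam_from), up to an additive constant:
    a LogNormal density; the -log(lam_to) term is the Jacobian of exp."""
    return -0.5 * ((np.log(lam_to) - np.log(lam_from)) / sd) ** 2 - np.log(lam_to)

def mh_step(lam, log_post, sd=0.05):
    """One Metropolis-Hastings step for a single rate parameter.
    log_post is any (unnormalized) log posterior for the rate."""
    lam_new = propose_rate(lam, sd)
    log_alpha = (log_post(lam_new) - log_post(lam)
                 + log_q(lam, lam_new, sd) - log_q(lam_new, lam, sd))
    return lam_new if np.log(rng.uniform()) < log_alpha else lam
```

The Jacobian factor \({\tilde{\lambda }}^{-1}\) in the kernel above appears in `log_q` as the `-np.log(lam_to)` term.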

Appendix B: Second simulation study details

For a generator matrix

$$\begin{aligned} G = \begin{pmatrix} -\lambda _1 & \lambda _1 \\ \lambda _2 & - \lambda _2 \end{pmatrix} \end{aligned}$$

with \(\lambda _1,\lambda _2>0\), we have explicitly that

$$\begin{aligned} Q(t) = \exp (tG) = \begin{pmatrix} \frac{\lambda _2}{\lambda _1 + \lambda _2} + \frac{\lambda _1}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) t} & \frac{\lambda _1}{\lambda _1 + \lambda _2} - \frac{\lambda _1}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) t} \\ & & \\ \frac{\lambda _2}{\lambda _1 + \lambda _2} - \frac{\lambda _2}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) t} & \frac{\lambda _1}{\lambda _1 + \lambda _2} + \frac{\lambda _2}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) t} \end{pmatrix}. \end{aligned}$$
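This closed form can be checked numerically against a general matrix exponential, e.g. with SciPy (cited in the references); the parameter values below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

lam1, lam2, t = 2.0, 1.0, 0.7
G = np.array([[-lam1, lam1],
              [lam2, -lam2]])

s = lam1 + lam2
e = np.exp(-s * t)
# Closed-form Q(t) for the two-state generator above
Q_closed = np.array([
    [lam2 / s + lam1 / s * e, lam1 / s - lam1 / s * e],
    [lam2 / s - lam2 / s * e, lam1 / s + lam2 / s * e],
])

Q_expm = expm(t * G)  # general matrix exponential agrees with the closed form
assert np.allclose(Q_closed, Q_expm)
```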

For discrete observations of the Markov chain associated with G over a time mesh \(\Delta k\), with \(\Delta \in {\mathbb {R}}^+\) and \(k \in \{1,\ldots , n\}\) for some \(n\in {\mathbb {N}}\), let \(n^{eq}_i\) be the number of transitions starting in state i for which the state remains the same, and \(n^{ch}_i\) the number of transitions starting in state i for which the state changes, along the time mesh, with \(i\in \{1,2\}\). The likelihood is readily seen to be

$$\begin{aligned} {\mathcal {L}}(\lambda )&= \left( \frac{\lambda _2}{\lambda _1 + \lambda _2} + \frac{\lambda _1}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) \Delta } \right) ^{n^{eq}_1} \left( \frac{\lambda _1}{\lambda _1 + \lambda _2} + \frac{\lambda _2}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) \Delta } \right) ^{n^{eq}_2} \\&\times \left( \frac{\lambda _1}{\lambda _1 + \lambda _2} - \frac{\lambda _1}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) \Delta } \right) ^{n^{ch}_1} \left( \frac{\lambda _2}{\lambda _1 + \lambda _2} - \frac{\lambda _2}{\lambda _1 + \lambda _2}e^{-(\lambda _1 + \lambda _2) \Delta } \right) ^{n^{ch}_2}. \end{aligned}$$
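Evaluating this likelihood on the log scale is straightforward; a direct sketch (the counts passed in the test are illustrative, not the paper's data):

```python
import numpy as np

def log_lik(lam1, lam2, delta, n_eq1, n_ch1, n_eq2, n_ch2):
    """Observed log-likelihood for the two-state CTMC on a mesh of width delta,
    term by term as in the displayed likelihood."""
    s = lam1 + lam2
    e = np.exp(-s * delta)
    return (n_eq1 * np.log(lam2 / s + lam1 / s * e)
            + n_eq2 * np.log(lam1 / s + lam2 / s * e)
            + n_ch1 * np.log(lam1 / s - lam1 / s * e)
            + n_ch2 * np.log(lam2 / s - lam2 / s * e))
```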

Simulations for the above CTMC with \((\lambda _1,\lambda _2)=(1,1)\) and \((\lambda _1,\lambda _2)=(2,1)\) were drawn. The prior distributions were \(\lambda \sim \text {Gamma}(1,1)\) and \(\left\{ p_{j,k} \right\} _{j\ne k}\sim \text {Dirichlet}(1)\) for all the experiments. A CTMC was generated until 1000 transitions were obtained, and we considered the discrete observations given by the times \(\Delta k\) with \(\Delta =1\) and \(1\le k \le \min \left\{ n \, ;\ X(n\Delta )\text { was fully observed}\right\} \).
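The simulation scheme just described can be sketched as exact (Gillespie-style, as in Gillespie 1977) simulation of the two-state chain followed by reading the state off on the mesh. The helper names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ctmc(rates, n_jumps, x0=0):
    """Exact simulation of a two-state CTMC: the holding time in state i is
    Exponential(rates[i]); with only two states, every jump switches state."""
    times, states = [0.0], [x0]
    for _ in range(n_jumps):
        times.append(times[-1] + rng.exponential(1.0 / rates[states[-1]]))
        states.append(1 - states[-1])
    return np.array(times), np.array(states)

def observe_on_mesh(times, states, delta, n):
    """State of the (piecewise-constant) chain at times delta*k, k = 1..n."""
    obs_t = delta * np.arange(1, n + 1)
    idx = np.searchsorted(times, obs_t, side="right") - 1
    return states[idx]
```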

In Figs. 6 and 7 we compare the true posterior distributions given the simulated data with the realizations of \(\lambda _1\) and \(\lambda _2\) from the Gibbs sampler 1; in Figs. 8 and 9 we do the same for the Gibbs sampler 2, and in Figs. 10 and 11 for the Metropolis–Hastings sampler. The priors were chosen as indicated above for all three samplers. The variance of the LogNormal proposals for \(\lambda _i\) in the Metropolis–Hastings sampler was set to 0.0025. We ran 50,000 iterations of each sampler after having diagnosed stationarity via the potential scale reduction factor of Gelman and Rubin (1992). We highlight that the run time of the Metropolis–Hastings algorithm is significantly shorter than that of the Gibbs samplers, as shown in Tables 1 and 2. This is due to the computational cost of simulating the conditioned trajectories between the discretely observed times of the CTMC.
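The stationarity diagnostic can be computed as follows; this is a minimal version of the Gelman and Rubin (1992) statistic, omitting the degrees-of-freedom correction of the original paper.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m, n) array of m chains of
    length n: sqrt of pooled-variance estimate over within-chain variance."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_plus / W)
```

Values close to 1 indicate that the chains have mixed; a common rule of thumb is to require the factor to fall below roughly 1.1.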

Fig. 6

Marginal and bivariate histograms of \(\pmb {\lambda }\) for the Gibbs sampler 1, run for 50,000 iterations after diagnosing stationarity, compared with the true posterior distribution corresponding to \(\pmb {\lambda }=(1,1)\)

Fig. 7

Marginal and bivariate histograms of \(\pmb {\lambda }\) for the Gibbs sampler 1, run for 50,000 iterations after diagnosing stationarity, compared with the true posterior distribution corresponding to \(\pmb {\lambda }=(2,1)\)

Fig. 8

Marginal and bivariate histograms of \(\pmb {\lambda }\) for the Gibbs sampler 2, run for 50,000 iterations after diagnosing stationarity, compared with the true posterior distribution corresponding to \(\pmb {\lambda }=(1,1)\)

Fig. 9

Marginal and bivariate histograms of \(\pmb {\lambda }\) for the Gibbs sampler 2, run for 50,000 iterations after diagnosing stationarity, compared with the true posterior distribution corresponding to \(\pmb {\lambda }=(2,1)\)

Fig. 10

Marginal and bivariate histograms of \(\pmb {\lambda }\) for the Metropolis–Hastings sampler, run for 50,000 iterations after diagnosing stationarity, compared with the true posterior distribution corresponding to \(\pmb {\lambda }=(1,1)\)

Fig. 11

Marginal and bivariate histograms of \(\pmb {\lambda }\) for the Metropolis–Hastings sampler, run for 50,000 iterations after diagnosing stationarity, compared with the true posterior distribution corresponding to \(\pmb {\lambda }=(2,1)\)

Appendix C: Further comparisons for the third simulation study

In this appendix we present further mean generator comparisons between the Gibbs and Metropolis–Hastings samplers for discretely observed CTMCs. We also present comparisons of generators fitted with the EM algorithm and with our Metropolis–Hastings approach.

1.1 C.1: Gibbs and Metropolis–Hastings comparisons

In Figs. 12, 13, 14 and 15 we present the mean generator comparisons for the Gibbs and Metropolis–Hastings samplers, with simulation studies determined, respectively, by \(\Delta = 4, 2, 0.5, 0.25\).

As mentioned in the main document, the entry-wise distance of the Gibbs mean generator from the true generator tends to be greater than that of the Metropolis–Hastings mean generator; this is particularly evident in the mean generators for \(\Delta =0.5\), as shown in Table 1 of the main document in terms of Frobenius distances.
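The Frobenius distances referred to above are straightforward to compute; a short sketch with hypothetical generator matrices of our own choosing:

```python
import numpy as np

def frobenius_distance(G_est, G_true):
    """Frobenius norm of the entry-wise difference between two generators."""
    return np.linalg.norm(np.asarray(G_est) - np.asarray(G_true), ord="fro")

# Hypothetical example: true two-state generator vs. a posterior-mean estimate
G_true = np.array([[-1.0, 1.0], [1.0, -1.0]])
G_est = np.array([[-1.1, 1.1], [0.95, -0.95]])
```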

Fig. 12

Comparison of estimated generator matrices for Metropolis–Hastings and Gibbs chains for \(\Delta = 4\) with \(\lambda = 0.25\) (first row), \(\lambda =0.5\) (second row) and \(\lambda = 1\) (third row)

Fig. 13

Comparison of estimated generator matrices for Metropolis–Hastings and Gibbs chains for \(\Delta = 2\) with \(\lambda = 2\) (first row), \(\lambda =1\) (second row) and \(\lambda = 0.5\) (third row)

Fig. 14

Comparison of estimated generator matrices for Metropolis–Hastings and Gibbs chains for \(\Delta = 0.5\) with \(\lambda = 8\) (first row), \(\lambda =4\) (second row) and \(\lambda = 2\) (third row)

Fig. 15

Comparison of estimated generator matrices for Metropolis–Hastings and Gibbs chains for \(\Delta = 0.25\) with \(\lambda = 4\) (first row), \(\lambda =8\) (second row) and \(\lambda = 16\) (third row)

1.2 C.2: EM and Metropolis–Hastings comparison

We use the EM algorithm from the ctmcd R package (see Pfeuffer 2017) for comparison with the proposed Metropolis–Hastings algorithm in the context of the third simulation study.

Fig. 16

Comparison of estimated generator matrices for Metropolis–Hastings and EM algorithms for \(\Delta = 4\) with \(\lambda = 1\) (first row), \(\lambda =0.5\) (second row) and \(\lambda = 0.25\) (third row)

Fig. 17

Comparison of estimated generator matrices for Metropolis–Hastings and EM algorithms for \(\Delta = 2\) with \(\lambda = 2\) (first row), \(\lambda =1\) (second row) and \(\lambda = 0.5\) (third row)

Fig. 18

Comparison of estimated generator matrices for Metropolis–Hastings and EM algorithms for \(\Delta = 1\) with \(\lambda = 4\) (first row), \(\lambda =2\) (second row) and \(\lambda = 1\) (third row)

Fig. 19

Comparison of estimated generator matrices for Metropolis–Hastings and EM algorithms for \(\Delta = 0.5\) with \(\lambda = 8\) (first row), \(\lambda =4\) (second row) and \(\lambda = 2\) (third row)

Fig. 20

Comparison of estimated generator matrices for Metropolis–Hastings and EM algorithms for \(\Delta = 0.25\) with \(\lambda = 16\) (first row), \(\lambda =8\) (second row) and \(\lambda = 4\) (third row)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Riva-Palacio, A., Mena, R.H. & Walker, S.G. On the estimation of partially observed continuous-time Markov chains. Comput Stat 38, 1357–1389 (2023). https://doi.org/10.1007/s00180-022-01273-w
