MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data

Data Mining and Knowledge Discovery

Abstract

Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding factors explaining the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of browsing a website or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes of heterogeneous sequence data. Each hypothesis is derived from existing literature, theory, or intuition and represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city and given some data, a hypothesis assuming tourists to be more likely to move towards points of interest than locals can be shown to be more plausible than a hypothesis assuming the opposite. Our approach incorporates such hypotheses as Bayesian priors in a generative mixed transition Markov chain model, and compares their plausibility utilizing Bayes factors. We discuss analytical and approximate inference methods for calculating the marginal likelihoods for Bayes factors, give guidance on interpreting the results, and illustrate our approach with several experiments on synthetic and empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind of analysis for studying sequential data in many application areas.

Notes

  1. Note that this is a slightly simplified version of the original Trial Roulette method from the HypTrails paper (Singer et al. 2015) in two respects. First, we do not distribute chips but multiply by a concentration factor, which is effectively equivalent and easier to compute. Second, we assume in this paper the same weight in each row of the Markov chain, which makes formulating hypotheses and interpreting results easier. However, these simplifications are not required and reverting them is straightforward.

  2. http://dmir.org/mixedtrails

  3. The scripts for generating the synthetic data are included in the code; the Wikispeedia data set (cf. Sect. 4.3) is accessible online, and the Flickr data (cf. Sect. 4.4) is available on request via e-mail to Martin Becker.

  4. The Wikipedia articles are available at schools-wikipedia.org (version 2007).

  5. https://snap.stanford.edu/data/wikispeedia.html

  6. Differing from our approach, West et al. (2009) use the similarity between the clicked article and the target concept cos(it), but report that, as the game progresses, the similarity between the current and the clicked (next) article behaves qualitatively similarly. Thus, we use the latter approach since we can only use information from already visited states, not future states.

  7. https://www.flickr.com/.

  8. Note that our approach can also be applied to very different settings in a straightforward manner.
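Note 1 describes turning a hypothesis into Dirichlet parameters by multiplying row-normalized beliefs by a concentration factor. The following Python sketch illustrates one way this could look. The function name `elicit_dirichlet_prior`, the parameter names, and the uniform pseudo-count `base` added to every entry are assumptions made for this illustration; they are not taken from the article.

```python
def elicit_dirichlet_prior(beliefs, kappa, base=1.0):
    """Turn a row-wise belief matrix into Dirichlet parameters.

    Each row is normalized to sum to 1 (same weight in every row),
    scaled by the concentration factor `kappa`, and shifted by a
    uniform pseudo-count `base` (an assumption of this sketch).
    """
    alphas = []
    for row in beliefs:
        total = sum(row)
        if total > 0:
            normalized = [b / total for b in row]
        else:
            # A row without any belief stays uninformative.
            normalized = [0.0 for _ in row]
        alphas.append([kappa * b + base for b in normalized])
    return alphas


# Hypothetical beliefs for three states; rows are source states.
phi = [
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
alpha = elicit_dirichlet_prior(phi, kappa=4.0)
```

A larger `kappa` expresses a stronger belief in the hypothesis, while `kappa = 0` leaves only the uniform pseudo-counts.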

References

  • Asahara A, Maruyama K, Sato A, Seto K (2011) Pedestrian-movement prediction based on mixed Markov-chain model. In: Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems. ACM, pp 25–33

  • Baccigalupo C, Plaza E (2006) Case-based sequential ordering of songs for playlist recommendation. In: European conference on case-based reasoning. Springer, pp 286–300

  • Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Becker M, Singer P, Lemmerich F, Hotho A, Helic D, Strohmaier M (2015) Photowalking the city: comparing hypotheses about urban photo trails on Flickr. In: Liu TY, Scollon CN, Zhu W (eds) Social informatics. Springer, pp 227–244

  • Becker M, Mewes H, Hotho A, Dimitrov D, Lemmerich F, Strohmaier M (2016) Sparktrails: a MapReduce implementation of HypTrails for comparing hypotheses about human trails. In: Bourdeau J, Hendler J, Nkambou R, Horrocks I, Zhao BY (eds) Proceedings of the 25th international conference companion on world wide web. WWW’16 Companion, Canada. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 17–18

  • Benavoli A, Mangili F, Corani G, Zaffalon M, Ruggeri F (2014) A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In: Proceedings of the 31st international conference on machine learning, ICML’14, Beijing, China, June 2014. JMLR.org, pp 1026–1034

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 57:289–300

  • Blackstone A (2012) Sociological inquiry principles: qualitative and quantitative methods. Flat World Knowledge, Irvington, NY, USA

  • Blei DM, Moreno PJ (2001) Topic segmentation with an aspect hidden Markov model. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 343–348

  • Brumby DP, Howes A (2004) Good enough but I’ll just check: web-page search as attentional refocusing. In: Lovett MC, Schunn CD, Lebiere C, Munro P (eds) Sixth international conference on cognitive modeling: ICCM - 2004. Psychology Press, pp 46–51

  • Catledge LD, Pitkow JE (1995) Characterizing browsing strategies in the world-wide web. Comput Netw ISDN Syst 27(6):1065–1073

  • Chalmers M, Rodden K, Brodbeck D (1998) The order of things: activity-centred information access. Comput Netw ISDN Syst 30(1):359–367

  • Chi EH, Pirolli PLT, Chen K, Pitkow J (2001) Using information scent to model user information needs and actions on the web. In: Conference on human factors in computing systems. ACM, pp 490–497

  • Chib S (1995) Marginal likelihood from the Gibbs output. J Am Stat Assoc 90(432):1313–1321

  • De Choudhury M, Feldman M, Amer-Yahia S, Golbandi N, Lempel R, Yu C (2010) Automatic construction of travel itineraries using social breadcrumbs. In: Proceedings of the 21st ACM conference on hypertext and hypermedia, HT’10, Toronto, Ontario, Canada. ACM, New York, NY, USA, pp 35–44

  • Dimitrov D, Singer P, Lemmerich F, Strohmaier M (2017) What makes a link successful on wikipedia? In: Proceedings of the 26th International Conference on World Wide Web. WWW ’17, Perth, Australia. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 917–926

  • Figueiredo F, Ribeiro B, Almeida JM, Andrade N, Faloutsos C (2016a) Mining online music listening trajectories. In: Proceedings of the 17th ISMIR conference, New York City, USA, August 7–11, 2016

  • Figueiredo F, Ribeiro B, Almeida JM, Faloutsos C (2016b) Tribeflow: mining & predicting user trajectories. In: Proceedings of the 25th international conference on world wide web. WWW ’16, Canada. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 695–706

  • Fox EB, Sudderth EB, Jordan MI, Willsky AS (2010) Bayesian nonparametric methods for learning Markov switching processes. IEEE Signal Process Mag 27(6):43–54

  • Frühwirth-Schnatter S, Kaufmann S (2008) Model-based clustering of multiple time series. J Bus Econ Stat 26(1):78–89

  • Gabriel KR, Neumann J (1962) A Markov chain model for daily rainfall occurrence at Tel Aviv. Q J R Meteorol Soc 88(375):90–95

  • Gambs S, Killijian M-O, del Prado Cortez MN (2010) Show me how you move and I will tell you who you are. In: Proceedings of the 3rd ACM SIGSPATIAL international workshop on security and privacy in GIS and LBS, SPRINGL ’10, ACM, New York, NY, USA, pp 34–41

  • Gelman A, Hill J, Yajima M (2012) Why we (usually) don’t have to worry about multiple comparisons. J Res Educ Eff 5(2):189–211

  • Ghahramani Z, Jordan MI, Smyth P (1997) Factorial hidden Markov models. Mach Learn 29(2–3):245–273

  • Goldwater S, Griffiths T (2007) A fully Bayesian approach to unsupervised part-of-speech tagging. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, June 2007. Association for Computational Linguistics, pp 744–751

  • Gonzalez MC, Hidalgo CA, Barabasi A-L (2008) Understanding individual human mobility patterns. Nature 453(7196):779–782

  • Goodman SN (1998) Multiple comparisons, explained. Am J Epidemiol 147(9):807–812

  • Gupta R, Kumar R, Vassilvitskii S (2016) On mixtures of Markov chains. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29. Curran Associates, Inc., pp 3441–3449

  • Hamilton JD (1990) Analysis of time series subject to changes in regime. J Econom 45(1–2):39–70

  • Hayes B et al (2013) First links in the Markov chain. Am Sci 101(2):92–97

  • Herr N (2008) The Sourcebook for Teaching Science, Grades 6-12: Strategies, Activities, and Instructional Resources, Wiley

  • Huberman BA, Pirolli PLT, Pitkow JE, Lukose RM (1998) Strong regularities in world wide web surfing. Science 280(5360):95–97

  • Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90(430):773–795

  • Kemeny JG, Snell JL et al (1960) Finite Markov chains, vol 356. van Nostrand, Princeton

  • Kruschke JK (2013) Bayesian estimation supersedes the t test. J Exp Psychol Gen 142(2):573

  • Kruschke J (2015) Doing Bayesian data analysis, 2nd edn. Academic Press, Boston

  • Laxman S, Tankasali V, White RW (2008) Stream prediction using a generative model based on frequent episodes in event sequences. In: International conference on knowledge discovery and data mining. ACM, pp 453–461

  • Lemmerich F, Becker M, Singer P, Helic D, Hotho A, Strohmaier M (2016) Mining subgroups with exceptional transition behavior. In: KDD ’16: proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM

  • Markov AA (2006) An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. Sci Context 19(04):591–600. Originally published in 1913

  • Matsubara Y, Sakurai Y, Faloutsos C (2014) Autoplait: automatic mining of co-evolving time sequences. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, pp 193–204

  • Murphy KP (2002) Dynamic Bayesian networks: representation, inference and learning. PhD thesis, University of California, Berkeley

  • Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C (2012) A tale of many cities: universal patterns in human urban mobility. PLoS ONE 7(5):1–10

  • Noulas A, Scellato S, Lathia N, Mascolo C (2012) Mining user mobility features for next place prediction in location-based services. In: Proceedings of the 2012 IEEE 12th international conference on data mining, ICDM ’12. IEEE Computer Society, Washington, DC, USA, pp 1038–1043

  • Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web. Stanford InfoLab

  • Pirolli PLT, Card SK (1999) Information foraging. Psychol Rev 106(4):643–675

  • Ponte JM, Croft WB (1997) Text segmentation by topic. In: International conference on theory and practice of digital libraries. Springer, pp 113–125

  • Poulsen CS (1990) Mixed Markov and latent Markov modelling applied to brand choice behaviour. Int J Res Mark 7(1):5–19

  • Rabiner LR, Juang B-H (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4–16

  • Rendle S, Freudenthaler C, Schmidt-Thieme L (2010) Factorizing personalized Markov chains for next-basket recommendation. In: Proceedings of the 19th International Conference on World Wide Web. WWW ’10, Raleigh, North Carolina, USA. ACM, New York, NY, USA, pp 811–820

  • Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16(2):225–237

  • Shannon CE (2001) A mathematical theory of communication. ACM SIGMOBILE Mob Comput Commun Rev 5(1):3–55

  • Singer P, Helic D, Taraghi B, Strohmaier M (2014) Detecting memory and structure in human navigation patterns using Markov chain models of varying order. PLoS ONE 9(7):e102070

  • Singer P, Helic D, Hotho A, Strohmaier M (2015) Hyptrails: a Bayesian approach for comparing hypotheses about human trails on the web. In: Proceedings of the 24th International Conference on World Wide Web. WWW ’15, Florence, Italy. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 1003–1013

  • Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE (1985) Fluorescence detection in automated DNA sequence analysis. Nature 321(6071):674–679

  • Smith RL, Tawn JA, Coles SG (1997) Markov chain models for threshold exceedances. Biometrika 84(2):249–268

  • Strelioff CC, Crutchfield JP, Hübler AW (2007) Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling. Phys Rev E 76(1):011106

  • Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical dirichlet processes. J Am Stat Assoc 101(476):1566–1581

  • Trochim W (2001) Research methods knowledge base, 2nd edn. Atomic Dog Publishing, Cincinnati, OH, USA

  • Van Mulbregt P, Carp I, Gillick L, Lowe S, Yamron J (1998) Text segmentation and topic tracking on broadcast news via a hidden Markov model approach. In: ICSLP

  • Vanpaemel W (2010) Prior sensitivity in theory testing: an apologia for the bayes factor. J Math Psychol 54(6):491–498

  • Walk S, Singer P, Strohmaier M (2014) Sequential action patterns in collaborative ontology-engineering projects: a case-study in the biomedical domain. In: International conference on information & knowledge management. ACM

  • Wallach HM (2006) Topic modeling: beyond bag-of-words. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 977–984

  • West R, Leskovec J (2012) Human wayfinding in information networks. In: Proceedings of the 21st international conference on world wide web. ACM, pp 619–628

  • West R, Pineau J, Precup D (2009) Wikispeedia: an online game for inferring semantic distances between concepts. In: Proceedings of the 21st international joint conference on artificial intelligence, pp 1598–1603

  • Wetzels R, Tutschkow D, Dolan C, van der Sluis S, Dutilh G, Wagenmakers E-J (2016) A bayesian test for the hot hand phenomenon. J Math Psychol 72:200–209

  • White RW, Huang J (2010) Assessing the scenic route: measuring the value of search trails in web logs. In: Conference on research and development in information retrieval. ACM, pp 587–594

  • Yang J, McAuley J, Leskovec J, LePendu P, Shah N (2014) Finding progression stages in time-evolving event sequences. In: Proceedings of the 23rd international conference on World wide web. ACM, pp 783–794

Acknowledgements

This work was partially funded by the BMBF project Kallimachos and the DFG German Science Fund research projects PoSTs II and p2map.

Author information

Corresponding author

Correspondence to Martin Becker.

Additional information

Responsible editors: Kurt Driessens, Dragi Kocev, Marko Robnik-Šikonja, Myra Spiliopoulou

Appendices

Appendix A: Derivation of the marginal likelihood of MTMC

Given the generative process from Sect. 3.2 and by exploiting the fact that the transition probabilities \(\varvec{\theta }_g\) for each group g as well as the group assignment probabilities \(\gamma _{g|t_k}\) for each transition \(t_k\) are independent, we can write the marginal likelihood of MTMC as follows:

$$\begin{aligned} \Pr (D|H)&= \int \underbrace{\Pr (D | \varvec{\theta },\varvec{\gamma })}_{\text {likelihood}} ~ \underbrace{\Pr (\varvec{\theta }|\varvec{\alpha })}_{\text {prior}} ~ d\varvec{\theta }\nonumber \\&= \int \underbrace{\prod _{t_k \in D} \sum _{g \in G} \gamma _{g|t_k} \cdot \theta _{i_k,j_k|g}}_{\Pr (D | \varvec{\theta },\varvec{\gamma })} \cdot \underbrace{\prod _{g \in G} \Pr (\varvec{\theta }_g|\varvec{\alpha }_g)}_{\Pr (\varvec{\theta }|\varvec{\alpha })} \cdot \prod _{g \in G} d\varvec{\theta }_g \end{aligned}$$
(6)

To solve this integral we take a similar path as in the homogeneous case (cf. Singer et al. 2015). Thus, we need to get the grouping out of the integral. First, we focus on the likelihood \(\Pr (D | \varvec{\theta },\varvec{\gamma })\) where we extend the multiplication over all transitions resulting in an outer sum over all possible group assignments:

$$\begin{aligned} \Pr (D|\varvec{\theta }, \varvec{\gamma })&= \prod _{t_k \in D} \sum _{g \in G} \gamma _{g|t_k} \cdot \theta _{i_k,j_k|g}\nonumber \\&= \sum _{\begin{array}{c} \omega \in {\varOmega }\\ {\varOmega }= \{\{(t_1, g_1), \ldots , (t_m, g_m)\} | (g_1, \ldots , g_m) \in G^{|D|}\} \end{array}} \prod _{(t_k, g_k) \in \omega } \gamma _{g_k|t_k} \cdot \theta _{i_k,j_k|g_k} \nonumber \\&= \sum _{\omega \in {\varOmega }} \underbrace{\prod _{(t_k, g_k) \in \omega } \gamma _{g_k|t_k}}_{p_\omega } \prod _{(t_k, g_k) \in \omega } \theta _{i_k,j_k|g_k} \nonumber \\&= \sum _{\omega \in {\varOmega }} p_\omega \prod _{g \in G} \prod _{s_i,s_j \in S} \theta _{i,j|g}^{n_{i,j|g,\omega }} \end{aligned}$$
(7)

Here, each \(\omega \) represents a single, fixed group assignment of the set of transitions in D where the set of all possible group assignments \(\omega \) is defined as \({\varOmega }= \{\{(t_1, g_1), \ldots , (t_m, g_m)\} | (g_1, \ldots , g_m) \in G^{|D|}\}\). Furthermore, \(p_\omega \) represents the probability of the respective group assignment \(\omega \in {\varOmega }\). Finally, \(n_{i,j|g,\omega }\) denotes the number of transitions from state \(s_i\) to state \(s_j\) given the group g and the group assignment \(\omega \). What we observe is that, given a specific group assignment \(\omega \), the likelihood is the same as the likelihood in Singer et al. (2015).
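The expansion in Eq. (7), which rewrites a product of sums as a sum over all group assignments \(\omega \in {\varOmega }\), can be checked numerically on a toy example. The following Python sketch is only an illustration: the function names, the data layout (`gamma[k][g]`, `theta[g][i][j]`) and all numbers are assumptions chosen for this example.

```python
import itertools
import math


def likelihood_product_of_sums(transitions, gamma, theta):
    """First line of Eq. (7): product over transitions of a sum over groups."""
    result = 1.0
    for k, (i, j) in enumerate(transitions):
        result *= sum(gamma[k][g] * theta[g][i][j] for g in range(len(theta)))
    return result


def likelihood_sum_over_assignments(transitions, gamma, theta):
    """Last line of Eq. (7): sum over all group assignments omega."""
    total = 0.0
    for omega in itertools.product(range(len(theta)), repeat=len(transitions)):
        # p_omega: probability of this particular assignment of groups.
        p_omega = math.prod(gamma[k][g] for k, g in enumerate(omega))
        # Product of the transition probabilities under the assigned groups.
        theta_term = math.prod(
            theta[g][i][j] for (i, j), g in zip(transitions, omega)
        )
        total += p_omega * theta_term
    return total


# Toy setting: two states, two groups, three observed transitions (i, j).
transitions = [(0, 1), (1, 1), (1, 0)]
gamma = [[0.7, 0.3], [0.5, 0.5], [1.0, 0.0]]   # gamma[k][g]
theta = [
    [[0.9, 0.1], [0.5, 0.5]],                  # group 0
    [[0.2, 0.8], [0.3, 0.7]],                  # group 1
]
lhs = likelihood_product_of_sums(transitions, gamma, theta)
rhs = likelihood_sum_over_assignments(transitions, gamma, theta)
```

Both values agree up to floating-point error. The second form enumerates \(|G|^{|D|}\) assignments, so it is only practical for very small examples.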

We now substitute the likelihood \(\Pr (D|\varvec{\theta }, \varvec{\gamma })\) in Eq. (6) with this reformulated likelihood (Eq. 7) and write the priors for the group dependent transition probabilities \(\Pr (\varvec{\theta }_g|\varvec{\alpha }_g)\) based on the multivariate beta function. Then, we can calculate the marginal likelihood \(\Pr (D|H)\) by taking advantage of the independence of the transition probabilities \(\varvec{\theta }_g\) between groups \(g \in G\) and source states \(s \in S\) as well as the independence of group assignment probabilities \(\gamma _{g_k|t_k}\) between transitions \(t_k \in D\):

$$\begin{aligned} \Pr (D|H)&= \int \underbrace{\sum _{\omega \in {\varOmega }} p_\omega \prod _{g \in G} \prod _{s_i,s_j \in S} \theta _{i,j|g}^{n_{i,j|g,\omega }} }_{\Pr (D|\varvec{\theta }, \varvec{\gamma })} \prod _{g \in G} \underbrace{\prod _{s_i \in S} \frac{1}{B(\varvec{\alpha }_{s_i|g})} \prod _{s_j \in S} \theta _{i,j|g}^{\alpha _{i,j|g} -1 }}_{\Pr (\varvec{\theta }_g|\varvec{\alpha }_g)} \prod _{g \in G} d\varvec{\theta }_g \\&= \sum _{\omega \in {\varOmega }} p_\omega \prod _{g \in G} \prod _{s_i \in S} \frac{1}{B(\varvec{\alpha }_{s_i|g})} \int \prod _{s_j \in S} \theta _{i,j|g}^{n_{i,j|g,\omega } + \alpha _{i,j|g} - 1} d\varvec{\theta }_g \\&= \sum _{\omega \in {\varOmega }} p_\omega \prod _{g \in G} \underbrace{\prod _{s_i \in S} \frac{ B(\varvec{n}_{s_i|g,\omega } + \varvec{\alpha }_{s_i|g})}{B(\varvec{\alpha }_{s_i|g})}}_{\Pr (D_{g|\omega }|\varvec{\alpha }_g)} \end{aligned}$$

This concludes the derivation of the marginal likelihood formula in Eq. (5).
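For very small data sets, the final formula can be evaluated directly by enumerating all group assignments. The following Python sketch does this in log space and then compares two hypotheses via a log Bayes factor. It is a brute-force illustration under assumed names and data layouts (`gamma[k][g]`, `alpha[g][i][j]`), not the authors' implementation; its cost grows as \(|G|^{|D|}\).

```python
import itertools
import math


def log_beta(values):
    """Logarithm of the multivariate beta function B(values)."""
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def log_marginal_likelihood(transitions, gamma, alpha):
    """log Pr(D|H) by exhaustive enumeration of all group assignments.

    Assumes every row of gamma sums to 1, so at least one assignment
    has non-zero probability.
    """
    n_groups, n_states = len(alpha), len(alpha[0])
    log_terms = []
    for omega in itertools.product(range(n_groups), repeat=len(transitions)):
        p_omega = math.prod(gamma[k][g] for k, g in enumerate(omega))
        if p_omega == 0.0:
            continue  # impossible assignments contribute nothing
        # Count transitions n[g][i][j] under this assignment.
        counts = [[[0] * n_states for _ in range(n_states)]
                  for _ in range(n_groups)]
        for (i, j), g in zip(transitions, omega):
            counts[g][i][j] += 1
        log_term = math.log(p_omega)
        for g in range(n_groups):
            for i in range(n_states):
                posterior = [n + a for n, a in zip(counts[g][i], alpha[g][i])]
                log_term += log_beta(posterior) - log_beta(alpha[g][i])
        log_terms.append(log_term)
    # Log-sum-exp over all assignments for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


# Toy data: five transitions from state 0 to state 1, one group.
data = [(0, 1)] * 5
gamma_single = [[1.0]] * 5
alpha_h1 = [[[1.0, 6.0], [1.0, 1.0]]]   # hypothesis: 0 -> 1 is likely
alpha_h2 = [[[6.0, 1.0], [1.0, 1.0]]]   # hypothesis: 0 -> 0 is likely
log_bayes_factor = (log_marginal_likelihood(data, gamma_single, alpha_h1)
                    - log_marginal_likelihood(data, gamma_single, alpha_h2))
```

A positive `log_bayes_factor` indicates that the data are more plausible under the first hypothesis than under the second.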

Appendix B: Notation overview

The following overview lists all important notation used throughout the article.

  • \(S\): Set of all states \(S = \{s_1, \ldots , s_n\}\)

  • \(D\): Set of observed transitions \(D = \{t_1, \ldots , t_m\}\)

  • \(G\): Set of all groups \(G = \{g_1, \ldots , g_o\}\)

  • \(src_k,dst_k\): The source state \(src_k\) and the destination state \(dst_k\) of transition \(t_k\)

  • \(i_k,j_k\): The index of the source state \(i_k\) and the destination state \(j_k\) of transition \(t_k\)

  • \(\gamma _{g|t}\): Probability for transition t to belong to group g

  • \(\varvec{\gamma }_t\): Group assignment probabilities for a single transition \(\varvec{\gamma }_t = \{ \gamma _{g|t} | g \in G \}\)

  • \(\varvec{\gamma }\): Group assignment probabilities for all transitions \(\varvec{\gamma }= \{ \varvec{\gamma }_t | t \in D \}\)

  • \(\theta _{i,j|g}\): Probability of a transition from state \(s_i\) to state \(s_j\) for group g

  • \(\varvec{\theta }_{s_i|g}\): Transition probabilities from state \(s_i\) to all other states in group g, i.e., \(\varvec{\theta }_{s_i|g} = (\theta _{i,1|g}, \ldots , \theta _{i,n|g})\)

  • \(\varvec{\theta }_g\): Transition probabilities between states for group g, i.e., \(\varvec{\theta }_g = \{ \varvec{\theta }_{s_i|g} ~|~ s_i \in S \}\)

  • \(\varvec{\theta }\): Transition probabilities for all groups \(\varvec{\theta }= \{ \varvec{\theta }_g | g \in G \}\)

  • \(\varvec{\phi }\): Belief in transition probabilities (as defined by a hypothesis)

  • \(\phi _{i,j|g}\): Belief (from a hypothesis) in the probability of a transition from state \(s_i\) to state \(s_j\) for group g

  • \(\alpha _{i,j|g}\): Dirichlet parameter (\(\in \mathbb {N}\)) for the transition from state \(s_i\) to state \(s_j\) in group g

  • \(\varvec{\alpha }_{s_i|g}\): Dirichlet parameters for state \(s_i\) in group g, i.e., \(\varvec{\alpha }_{s_i|g} = (\alpha _{i,1|g}, \ldots , \alpha _{i,n|g})\)

  • \(\varvec{\alpha }_g\): Dirichlet parameters for the transitions in group g, i.e., \(\varvec{\alpha }_g = \{ \varvec{\alpha }_{s_i|g} ~|~ s_i \in S \}\)

  • \(\varvec{\alpha }\): Dirichlet parameters for all groups \(\varvec{\alpha }= \{ \varvec{\alpha }_g | g \in G \}\)

  • \({\varOmega }\): The set of all group assignments \({\varOmega }= \{\{(t_1, g_1), \ldots , (t_m, g_m)\} | (g_1, \ldots , g_m) \in G^{|D|}\}\)

  • \(\omega \): A fixed group assignment \(\omega \in {\varOmega }\) for each transition in transition dataset D

  • \(p_\omega \): The probability for group assignment \(\omega \in {\varOmega }\)

  • \(n_{i,j|g,\omega }\): The number of transitions in dataset D from state \(s_i\) to state \(s_j\) given group \(g \in G\) and group assignment \(\omega \in {\varOmega }\)

  • \(\varvec{n}_{g,\omega }\): The matrix \(\varvec{n}_{g,\omega } = (n_{i,j|g,\omega })\) holds the number of transitions in dataset D between all states given group \(g \in G\) and group assignment \(\omega \in {\varOmega }\)

Cite this article

Becker, M., Lemmerich, F., Singer, P. et al. MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data. Data Min Knowl Disc 31, 1359–1390 (2017). https://doi.org/10.1007/s10618-017-0518-x
