Abstract
Two closely related problems are considered: the stability of the solution to the topic modeling problem and the uniqueness of stochastic matrix factorization. A theorem describing an analytical method for determining whether the solution to a given stochastic matrix factorization problem is stable is formulated and proved. The practical usefulness of this theorem is investigated by applying it to real-life data.
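The uniqueness question can be illustrated with a small numerical sketch. It does not implement the theorem from the paper; it only shows how one matrix may admit two different stochastic factorizations. The notation F = ΦΘ with column-stochastic factors is an assumption borrowed from common topic-modeling convention, and the matrices `Phi`, `Theta`, and the mixing matrix `S` are made-up illustrative values.

```python
import numpy as np

def is_stochastic(M, tol=1e-12):
    """True if M is nonnegative and every column sums to 1."""
    return bool(np.all(M >= -tol) and np.allclose(M.sum(axis=0), 1.0))

# Hypothetical factors: 3 words, 2 topics, 3 documents.
Phi = np.array([[0.6, 0.1],
                [0.3, 0.2],
                [0.1, 0.7]])          # word-in-topic probabilities
Theta = np.array([[0.2, 0.5, 0.7],
                  [0.8, 0.5, 0.3]])   # topic-in-document probabilities
F = Phi @ Theta                       # matrix being factorized

# Mixing matrix whose columns sum to 1 (illustrative choice).
S = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# Alternative factors: Phi2 = Phi S, Theta2 = S^{-1} Theta.
Phi2 = Phi @ S
Theta2 = np.linalg.solve(S, Theta)

# Both pairs are stochastic and reproduce the same F,
# so this particular factorization is not unique.
same_product = np.allclose(F, Phi2 @ Theta2)
factors_differ = not np.allclose(Phi, Phi2)
print(is_stochastic(Phi2), is_stochastic(Theta2), same_product, factors_differ)
```

In this sketch the alternative factors stay nonnegative only because every entry of `Theta` is at least 0.1; with zeros in `Theta` the same `S` would produce negative entries, which hints at why sparsity of the factors matters for uniqueness.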
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 17-07-01536.
Additional information
Translated by A. Klimontovich
Cite this article
Derbanosov, R.Y., Irkhin, I.A. Issues of Stability and Uniqueness of Stochastic Matrix Factorization. Comput. Math. and Math. Phys. 60, 370–378 (2020). https://doi.org/10.1134/S0965542520030082
DOI: https://doi.org/10.1134/S0965542520030082