Abstract
Organizing webpages into hot topics is a key step in understanding trends in multi-modal web data. To address this problem, Poisson Deconvolution (PD), a state-of-the-art method, was recently proposed to rank the interestingness of web topics on a similarity graph. However, PD optimized by expectation-maximization does not scale efficiently to large data sets. In this paper, we develop Stochastic Poisson Deconvolution (SPD) to handle large-scale web data sets. Experiments on two public data sets and one large-scale synthetic data set demonstrate the efficacy of the proposed approach in comparison with state-of-the-art methods.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under grants 61332016, 61472389, 61672069, 61872333, 61650202 and U1636214, and in part by the Key Research Program of Frontier Sciences, CAS, under grant QYZDJ-SSW-SYS013.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, J., Pang, J., Su, L., Liu, Y., Huang, Q. (2019). Accelerating Topic Detection on Web for a Large-Scale Data Set via Stochastic Poisson Deconvolution. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_49
DOI: https://doi.org/10.1007/978-3-030-05710-7_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05709-1
Online ISBN: 978-3-030-05710-7
eBook Packages: Computer Science (R0)