ABSTRACT
With the rapid growth of Web 2.0, content sharing services such as Flickr, YouTube, Blogger, and TripAdvisor have become extremely popular over the last decade. On these websites, users create and share with one another various kinds of resources, such as photos, videos, and travel blogs. This user-generated content varies greatly in quality, which calls for a principled method to identify authorities, i.e., users who create high-quality resources, among a massive number of content contributors. Because most previous studies infer only a user's global authoritativeness, they cannot differentiate a user's authoritativeness across different aspects of life (topics).
In this paper, we propose a novel model, Topic-specific Authority Analysis (TAA), which addresses the limitations of previous approaches by identifying authorities specific to one or more given query topics on a content sharing service. The model jointly leverages usage data collected from the sharing log and the favorite log. The parameters of TAA are learned from a constructed training dataset, for which we design a novel logistic likelihood function. To perform Bayesian inference for TAA with this logistic likelihood, we extend standard Gibbs sampling by introducing auxiliary variables. Thorough experiments on two real-world datasets demonstrate the effectiveness of TAA in topic-specific authority identification, as well as the generalizability of the TAA generative model.
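The abstract does not state the exact form of TAA's logistic likelihood or which auxiliary variables it introduces. Purely to illustrate why auxiliary variables help Gibbs sampling under a logistic likelihood, the sketch below applies one well-known scheme, Pólya-Gamma data augmentation (Polson, Scott, and Windle, JASA 2013), to plain Bayesian logistic regression, not to TAA itself. Conditional on one auxiliary variable ω per observation, the logistic likelihood becomes Gaussian in the weights, so every Gibbs step is a conjugate draw. All names (`sample_pg1`, `gibbs_logistic`, `prior_var`, the 50-term truncation) are hypothetical choices for this sketch.

```python
# Illustrative sketch only: Polya-Gamma Gibbs sampling for ordinary Bayesian
# logistic regression. This is NOT the TAA sampler; it only shows the
# auxiliary-variable idea in its simplest setting.
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_pg1(c, rng, terms=50):
    """Approximate draw from the Polya-Gamma distribution PG(1, c).

    Truncates the infinite sum representation
    omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Exp(1).
    Truncation slightly underestimates omega; exact samplers exist but are longer.
    """
    shift = c * c / (4.0 * math.pi ** 2)
    total = sum(rng.expovariate(1.0) / ((k - 0.5) ** 2 + shift) for k in range(1, terms + 1))
    return total / (2.0 * math.pi ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gibbs_logistic(X, y, iters=300, burn_in=100, prior_var=10.0, seed=0):
    """Posterior-mean estimate of logistic-regression weights (y in {0, 1}).

    Prior: beta ~ N(0, prior_var * I). Each sweep alternates
      omega_i | beta  ~ PG(1, x_i . beta)
      beta | omega    ~ N(P^{-1} h, P^{-1}),
    with P = X^T diag(omega) X + I / prior_var and h = X^T (y - 1/2).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    h = [sum(X[i][j] * (y[i] - 0.5) for i in range(n)) for j in range(d)]
    beta = [0.0] * d
    sums = [0.0] * d
    for it in range(iters):
        # Auxiliary step: one Polya-Gamma variable per observation.
        omega = [sample_pg1(dot(X[i], beta), rng) for i in range(n)]
        # Given omega, the likelihood is Gaussian in beta: build its precision.
        prec = [[sum(omega[i] * X[i][a] * X[i][b] for i in range(n))
                 + (1.0 / prior_var if a == b else 0.0)
                 for b in range(d)] for a in range(d)]
        low = cholesky(prec)
        mean = solve_upper_t(low, solve_lower(low, h))
        # L^{-T} z has covariance (L L^T)^{-1} = P^{-1}.
        noise = solve_upper_t(low, [rng.gauss(0.0, 1.0) for _ in range(d)])
        beta = [m + e for m, e in zip(mean, noise)]
        if it >= burn_in:
            sums = [s + b for s, b in zip(sums, beta)]
    return [s / (iters - burn_in) for s in sums]
```

For example, `gibbs_logistic(X, y)` with rows of `X` whose first entry is `1.0` (an intercept) returns approximate posterior means of the weights. The 50-term truncation trades exactness for brevity; an exact Pólya-Gamma sampler would be preferable for real use.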
Paper title: Who are experts specializing in landscape photography?: analyzing topic-specific authority on content sharing services
Recommendations
Image Restoration Using Gibbs Priors: Boundary Modeling, Treatment of Blurring, and Selection of Hyperparameter
The authors propose a Bayesian model for the restoration of images based on counts of emitted photons. The model treats blurring within the context of an incomplete data problem and utilizes a Gibbs prior to model the spatial correlation of neighboring ...
Attribute weighting for averaged one-dependence estimators
Averaged one-dependence estimators (AODE) is a type of supervised learning algorithm that relaxes the conditional independence assumption that governs standard naïve Bayes learning algorithms. AODE has demonstrated reasonable improvement in terms of ...
Variational Bayesian extreme learning machine
Extreme learning machine (ELM) randomly generates parameters of hidden nodes and then analytically determines the output weights with fast learning speed. The ill-posed problem of parameter matrix of hidden nodes directly causes unstable performance, ...