ABSTRACT
User data is becoming increasingly available in multiple domains, ranging from phone usage traces to data on the social Web. Analyzing user data appeals to scientists who work on population studies, recommendations, and large-scale data analytics. We argue that interactive analysis is needed to understand the multiple facets of user data and to address different analytics scenarios. Since user data is often sparse and noisy, we propose to produce labeled groups that describe users with common properties, and we develop IUGA, an interactive framework based on group discovery primitives for exploring the user space. At each step of IUGA, an analyst visualizes group members, may take an action on the group (add or remove members), and chooses an operation (exploit or explore) to discover more groups and hence more users. Each discovery operation returns the k most relevant and diverse groups. We formulate group exploitation and exploration as optimization problems and devise greedy algorithms to enable efficient group discovery. Finally, we design a principled validation methodology and run extensive experiments on large datasets that confirm the effectiveness of IUGA across different user space analysis scenarios.
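The abstract says each exploit or explore step returns the k most relevant and diverse groups, found by greedy algorithms. The Python sketch below illustrates that idea under stated assumptions. The `Group` representation, the Jaccard-based relevance for exploit and explore, the trade-off weight `lam`, and all helper names are hypothetical and do not come from IUGA. The selection rule is the standard maximal-marginal-relevance (MMR) greedy, which is not necessarily the objective the paper optimizes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Group:
    """Hypothetical labeled user group: the label is a set of attribute=value
    descriptors, and the members are the ids of the users matching it."""
    label: FrozenSet[str]
    members: FrozenSet[int]


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity of two member sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(seed: Group, cand: Group, mode: str) -> float:
    """Assumed relevance: exploit rewards member overlap with the seed group,
    while explore rewards groups that share few users with it."""
    overlap = jaccard(seed.members, cand.members)
    if mode == "exploit":
        return overlap
    if mode == "explore":
        return 1.0 - overlap
    raise ValueError(f"unknown mode: {mode!r}")


def mmr_score(cand: Group, seed: Group, selected: List[Group],
              mode: str, lam: float) -> float:
    """MMR score: weighted relevance minus the candidate's highest member
    similarity to any group already selected (0.0 if none is selected yet)."""
    redundancy = max((jaccard(cand.members, s.members) for s in selected),
                     default=0.0)
    return lam * relevance(seed, cand, mode) - (1 - lam) * redundancy


def greedy_top_k(seed: Group, candidates: Sequence[Group], k: int,
                 mode: str = "exploit", lam: float = 0.5) -> List[Group]:
    """Greedily add the highest-scoring remaining candidate until k groups
    are selected or no candidates are left; the seed itself is skipped."""
    pool = [c for c in candidates if c != seed]
    selected: List[Group] = []
    while pool and len(selected) < k:
        best = max(pool, key=lambda c: mmr_score(c, seed, selected, mode, lam))
        selected.append(best)
        pool.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    seed = Group(frozenset({"gender=F"}), frozenset({1, 2, 3, 4}))
    cands = [
        Group(frozenset({"gender=F", "age=young"}), frozenset({1, 2, 3})),
        Group(frozenset({"gender=F", "city=Paris"}), frozenset({1, 2, 3, 4, 5})),
        Group(frozenset({"age=old"}), frozenset({7, 8, 9})),
    ]
    # Exploit picks the Paris group first, then the young group.
    print([sorted(g.label) for g in greedy_top_k(seed, cands, k=2)])
    print([sorted(g.label) for g in greedy_top_k(seed, cands, k=1, mode="explore")])
```

Raising `lam` favors relevance to the seed group. Lowering it penalizes candidates whose members overlap groups already returned, which is one simple way to keep the k results diverse.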
Index Terms
- Interactive User Group Analysis