Community detection using hierarchical clustering based on edge-weighted similarity in cloud environment
Introduction
Social networks now help users not only expand their interpersonal circles but also obtain the latest information. A community consists of users with common interests or common characteristics, and these shared interests and characteristics play an important role in public opinion investigation, product recommendation, public sentiment analysis, information flow control, and other fields. However, as the number of users grows, the “information overload” problem becomes more pronounced. There is an urgent need to identify the communities of social networks in order to support better product design, accurate information recommendation and improved public services (Buccafurri et al., 2014, Buccafurri et al., 2015). How to detect communities in social networks has therefore become a challenging problem.
Some researchers have addressed this problem by considering user interests, since user-generated content effectively reflects user opinions and interests. However, because such content is sparse and lacks contextual information, user interest detection (UID) faces new challenges. Some researchers detected user interests from the text features and the semantic information of the content (Liu et al., 2010, Tang et al., 2012), but enriching the semantic information requires the help of an external database. Others tried to extract the most representative words from a short text to realize UID (Sun, 2012). In addition, some researchers have studied hierarchical clustering algorithms for social networks (Kim and Shim, 2013, Zhang et al., 2015). However, few studies address community detection (CD) by considering both the interests of users and the interaction behaviors between them.
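Extracting representative words from short texts, as in the UID work cited above, is commonly done with TF-IDF weighting, which this paper's conclusion also names as a basis for its tag selection. The following minimal Python sketch scores the words of each tokenized short text by term frequency times inverse document frequency, using the plain `log(N / df)` variant. The function name `tfidf_top_words` and the toy tokens are hypothetical, and the paper's exact weighting and preprocessing are not specified here.

```python
import math
from collections import Counter


def tfidf_top_words(docs, k=3):
    """Return the k highest-scoring TF-IDF words of each tokenized short text."""
    n_docs = len(docs)
    # Document frequency: in how many texts each word appears at least once.
    df = Counter(word for doc in docs for word in set(doc))
    top_words = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so the output is deterministic.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        top_words.append([word for word, _ in ranked[:k]])
    return top_words


docs = [
    ["football", "match", "goal", "goal"],
    ["movie", "actor", "match"],
    ["football", "coach", "league"],
]
print(tfidf_top_words(docs, k=2))
# prints [['goal', 'football'], ['actor', 'movie'], ['coach', 'league']]
```

With this variant a word that occurs in every text gets weight zero; smoothed IDF variants avoid that at the cost of slightly less contrast between common and rare words.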
To address the problems mentioned above, this paper presents a UID algorithm based on semantic information and an improved PageRank algorithm. Moreover, a CD algorithm, PCD, that considers both network topology and user interests is proposed. The main contributions of this paper are summarized as follows:
- A UID algorithm based on semantic information and an improved PageRank algorithm is presented, in which the PageRank algorithm is improved to calculate the tag weights for target users.
- A new social network with undirected, weighted edges is built to identify overlapping communities; its edge weights are calculated by considering both the user interests and the direction of the edges in the initial social network.
- The performance of the PCD algorithm and several typical existing CD algorithms is evaluated through extensive experiments. The results indicate that PCD can effectively detect the communities of social networks and can also identify overlapping communities.
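The improved PageRank used to weight tags is not detailed in this section. As a rough illustration only, the Python sketch below runs a standard personalized PageRank by power iteration on a hypothetical weighted tag graph, where the edge weights could be tag co-occurrence counts and the personalization vector could be how often the target user carries each tag. Both choices, and the name `weighted_pagerank`, are assumptions rather than the authors' method; the personalization values must have a positive sum.

```python
def weighted_pagerank(edges, personalization, damping=0.85, tol=1e-10, max_iter=200):
    """Personalized PageRank on a weighted, directed tag graph via power iteration.

    edges: dict mapping (src, dst) -> positive weight (e.g. a co-occurrence count).
    personalization: dict mapping tag -> non-negative preference; must have a positive sum.
    Returns a dict mapping every tag to its rank; the ranks sum to 1.
    """
    nodes = set(personalization)
    for src, dst in edges:
        nodes.update((src, dst))
    nodes = sorted(nodes)
    total_pref = sum(personalization.values())
    pref = {n: personalization.get(n, 0.0) / total_pref for n in nodes}
    # Total outgoing weight of each tag, used to split its rank among successors.
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w

    rank = dict(pref)
    for _ in range(max_iter):
        # Rank held by dangling tags (no out-edges) is redistributed by preference.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        new_rank = {n: ((1 - damping) + damping * dangling) * pref[n] for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy undirected co-occurrence graph, stored as edges in both directions.
edges = {("football", "sports"): 2, ("sports", "football"): 2,
         ("sports", "music"): 1, ("music", "sports"): 1}
user_tag_counts = {"football": 3, "sports": 1, "music": 1}
tag_weights = weighted_pagerank(edges, user_tag_counts)
print(sorted(tag_weights, key=tag_weights.get, reverse=True))
# prints ['sports', 'football', 'music']
```

The personalization vector keeps the ranking anchored to the target user's own tags, while the graph term lets a tag that links many of them (here `sports`) gain weight even if the user rarely uses it directly.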
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes our CD methodology in detail. Section 4 introduces the datasets, the evaluation methodology and the experimental procedure. Section 5 presents and discusses the experimental results. Finally, Section 6 concludes the paper.
Related works
With the development of social networks, many studies have recently addressed UID and CD problems. This section briefly reviews these works.
Methodology
In this section, a method to detect user interests based on the semantic information and the improved PageRank algorithm is proposed. Furthermore, a method to detect communities based on the network topology and the user interests is presented.
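A CD method that combines topology and interests can be sketched in two steps, in line with the paper's title: build an undirected graph whose edge weights mix interest similarity with the direction of the original follow edges, then hierarchically cluster the edges by an edge-weighted similarity, so that a user whose edges fall into several clusters belongs to several overlapping communities. The Python sketch below is hypothetical throughout: the weight formula `alpha * cosine + (1 - alpha) * reciprocity` (reciprocity 1.0 for mutual and 0.5 for one-way follows), the edge similarity (the inclusive-neighbourhood Jaccard index used in link-community clustering, scaled by the mean weight of the two edges) and the cut `threshold` are illustrative choices, not the authors' definitions. Cutting a single-linkage dendrogram at `threshold` is equivalent to taking connected components of edge pairs whose similarity is at least `threshold`, which is what the union-find computes.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length interest vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_weighted_graph(follows, interests, alpha=0.5):
    """Map directed follow pairs to undirected edges weighted by interest and direction."""
    follow_set = set(follows)
    weights = {}
    for u, v in follow_set:
        edge = tuple(sorted((u, v)))
        if u == v or edge in weights:
            continue
        # Hypothetical direction term: 1.0 for mutual follows, 0.5 for one-way follows.
        reciprocity = 1.0 if (v, u) in follow_set else 0.5
        weights[edge] = alpha * cosine(interests[u], interests[v]) + (1 - alpha) * reciprocity
    return weights


def link_communities(weights, threshold):
    """Single-linkage edge clustering cut at `threshold`; returns (possibly overlapping) node sets."""
    neighbours = {}
    for u, v in weights:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    parent = {edge: edge for edge in weights}

    def find(edge):
        # Union-find root lookup with path halving.
        while parent[edge] != edge:
            parent[edge] = parent[parent[edge]]
            edge = parent[edge]
        return edge

    for k, nbrs in neighbours.items():
        # Only pairs of edges that share node k are compared, as in link clustering.
        for i, j in combinations(sorted(nbrs), 2):
            inc_i, inc_j = neighbours[i] | {i}, neighbours[j] | {j}
            jaccard = len(inc_i & inc_j) / len(inc_i | inc_j)
            e1, e2 = tuple(sorted((i, k))), tuple(sorted((j, k)))
            similarity = jaccard * (weights[e1] + weights[e2]) / 2
            if similarity >= threshold:
                parent[find(e1)] = find(e2)

    clusters = {}
    for edge in weights:
        clusters.setdefault(find(edge), set()).update(edge)
    return sorted(clusters.values(), key=sorted)


if __name__ == "__main__":
    # Two mutually following triangles (a, b, c) and (c, d, e) that share user c.
    pairs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
    follows = pairs + [(v, u) for u, v in pairs]
    interests = {"a": [1, 0], "b": [1, 0], "c": [1, 1], "d": [0, 1], "e": [0, 1]}
    graph = build_weighted_graph(follows, interests)
    print(link_communities(graph, threshold=0.5))
```

On the toy graph, user `c` ends up in both communities, which is the overlapping behaviour that clustering edges rather than nodes is meant to produce; raising `threshold` walks down the dendrogram towards smaller clusters.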
Datasets
In the experiments, all data are derived from Sina microblog (https://weibo.com/). A focused crawler algorithm was developed to collect user tags. The experimental dataset consists of text data, including user IDs, tags, followers, critics and so on, for 100,000 users. To demonstrate the effectiveness of the proposed algorithm, experiments are carried out on 200 users randomly selected from this dataset. The size of the candidate tag set is 60. The user semantic model is constructed by selecting the top 500 topics
Methodology and metrics
The experiments in this section mainly include two parts. In the first part, the quality of users is examined through verification experiments: statistics on the number of users in the datasets are obtained with the statistical methodology of Li et al. (2018), Romeo et al. (2017) and Serban et al. (2018) to illustrate the sparsity of keywords and personal tags. In the second part, the results of PCD on the Hadoop platform are presented to illustrate the feasibility of the Hadoop platform. Furthermore, the execution
Conclusion and future work
In this paper, a CD algorithm based on network topology and user interests is presented. The work consists of two main parts. In the first part, a focused crawler algorithm is used to acquire personal tags from the tags posted by other users. Then, tags are selected from the tag set based on the TF-IDF weighting scheme, the semantic extension of tags and the user semantic model. In addition, the tag vector of user interests is derived with the respective tag weight calculated by the
Acknowledgments
The work was supported by the National Natural Science Foundation (NSF) under grants (No. 61672397, No. 61873341, No. 61472294), the Application Foundation Frontier Project of WuHan (No. 2018010401011290), the Open Foundation of State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System (GZ2018KF002), and the Open Foundation of Key Laboratory of Industrial Wireless Network and Networked Control of Ministry of Education. Any opinions, findings, and conclusions are those of the
References (43)
- Discovering missing me edges across social networks. Information Sciences (2015).
- Community detection in graphs. Physics Reports (2010).
- Hashtag recommendation for multimodal microblog posts. Neurocomputing (2018).
- Query performance prediction for microblog search. Information Processing & Management (2017).
- Constrained common cluster based model for community detection in temporal and multiplex networks. Neurocomputing (2018).
- Exploring coherent topics by topic modeling with term weighting. Information Processing & Management (2018).
- Personality-based refinement for sentiment classification in microblog. Knowledge-Based Systems (2017).
- Personalized recommendation of popular blog articles for mobile applications. Information Sciences (2011).
- Combining tag correlation and user social relation for microblog recommendation. Information Sciences (2017).
- Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods. Physica A: Statistical Mechanics and its Applications (2018).
- Community detection method based on mixed-norm sparse subspace clustering. Neurocomputing.
- A community detection method based on local similarity and degree clustering information. Physica A: Statistical Mechanics and its Applications.
- Phase transition of surprise optimization in community detection. Physica A: Statistical Mechanics and its Applications.
- Friend recommendation with content spread enhancement in social networks. Information Sciences.
- Real-time recommendation for microblogs. Information Sciences.
- Link communities reveal multiscale complexity in networks. Nature.
- Research-paper recommender systems: A literature survey. International Journal on Digital Libraries.
- Collaborative personalized tweet recommendation.
- Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America.
- Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America.
Cited by (50)
- Community detection algorithm for social network based on node intimacy and graph embedding model. Engineering Applications of Artificial Intelligence (2024).
- Talent recommendation based on attentive deep neural network and implicit relationships of resumes. Information Processing and Management (2023).
- Community detection in error-prone environments based on particle cooperation and competition with distance dynamics. Physica A: Statistical Mechanics and its Applications (2022).
- A portable clustering algorithm based on compact neighbors for face tagging. Neural Networks (2022).
- A novel attributed community detection by integration of feature weighting and node centrality. Online Social Networks and Media (2022).
- Information matching model and multi-angle tracking algorithm for loan loss-linking customers based on the family mobile social-contact big data network. Information Processing and Management (2022). Citation excerpt: The numbers of common friends, Jaccard similarity and cosine similarity were used to characterize the familiarity between users, thereafter, user familiarity and preference similarity were combined to calculate the similarity between individuals (Qiao et al., 2018). To measure the social network similarity of individuals, scholars established an initial social network of follower relationships, transformed the social network into a new social network with non-directional and weighted edges, and estimated edge weights to measure similarity between the edges of social networks by direction vectors and interest vectors in the initial social network (Li et al., 2019). However, the measure of closeness proposed here is used to measure the degree of closeness between loan customers and their family members.