Elsevier

Neurocomputing

Volume 289, 10 May 2018, Pages 195-219

Multi-view community detection with heterogeneous information from social media data

https://doi.org/10.1016/j.neucom.2018.02.023

Highlights

  • Heterogeneous information views in social media are combined for community detection.

  • Experimental evaluation showed the benefits of integrating diverse sources.

  • Each source had a particular effect on the quality of the detected communities.

  • The nature of social interactions affects the relevance of the information sources.

  • Symmetrisation strategies also showed differentiated effects on community quality.

Abstract

Since their beginnings, social networks have affected the way people communicate and interact with each other. The continuous growth and pervasive use of social media offer interesting research opportunities for analysing the behaviour and interactions of users. Nowadays, interactions are no longer limited to social relations, but also include reading and writing activities. Thus, multiple and complementary information sources are available for characterising users and their activities. One task that could benefit from the integration of those multiple sources is community detection. However, most techniques disregard the effect of information aggregation and continue to focus on only one aspect: the topological structure of networks. This paper focuses on how to integrate social and content-based information originating in social networks to improve the quality of the detected communities. A technique for integrating both the multiple information sources and the semantics conveyed by asymmetric relations is proposed and extensively evaluated on two real-world datasets. Experimental evaluation confirmed the differentiated impact that each information source has on the quality of the detected communities, and shed some light on how to improve such quality by combining both social and content-based information.

Introduction

Social networking sites such as MySpace, Facebook, or Twitter attract millions of users, who every day publish an enormous amount of content in the form of pictures, tweets, comments and posts. Social networks can be defined as a set of socially-relevant nodes connected by one or more relations. Nodes in such networks are not limited to people; they can also represent other entities such as Web pages, journal articles or geographical places, amongst other possibilities. Users of networking sites are required to create profiles in which they can describe themselves by sharing their age, location, interests and picture, amongst other things. Generally, social networks allow users to create and read content, and to establish social connections with other users. The nature and semantics of these connections might differ from site to site: for example, followee relations in Twitter, or friendship relations in Facebook. Although the technological features of the different social networking sites are similar, the cultures that emerge around them are diverse [3]. Most sites encourage the maintenance of pre-existing social networks, whilst others help strangers to create new connections based on shared interests. In this context, understanding users’ needs arises as a critical issue [9]. Users’ needs could be regarded as users’ desire to obtain information, which could be further specified as long-term (interests) or instant (intents) needs. Nonetheless, needs are often latent, so inferring them from the observed data might be challenging.

Social networks affect the way people communicate and interact. The pervasive use of social media offers research opportunities for analysing the behaviour of users when interacting with their friends [32], and how such interactions evolve over time [43], in terms of patterns of appearing and disappearing relationships. Unlike social connections formed by people in the physical world, social media users have greater freedom to connect with a wider spectrum of people for distinct reasons. The low cost of link formation might lead to networks with relationships of heterogeneous nature, origin and strength. For example, in Twitter, a user might follow others because they publish interesting information, they have the same interests, they are celebrities or popular individuals in the micro-blogging community, or only because they share some common friends, amongst other possible explanations. As a result, the network topology could contain casual links, which could hinder the use of algorithms based solely on topology. Hence, the nature of structural information must be carefully analysed in conjunction with other sources of information or data views to effectively assess the significance and importance of relations. In addition to social information indicating friendship or simpler user interaction, there are other information sources that might implicitly define connections between users in social media: for example, whether two users use the same terms or hashtags, or post on the same topics. It is worth noting that the content users consume or post might depend, for example, on their mood and environment [9]. Since users’ needs are implicit, comprehensive research is needed to discover the mapping between the heterogeneous, and possibly multimedia, information in social networks and users’ needs, and how such mapping can be enriched with contextual information.
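Content-based connections of this kind can be derived without any explicit social link. The following is a minimal sketch, assuming each user is summarised by the set of hashtags they posted: two users are connected when the Jaccard similarity of their hashtag sets exceeds a threshold, and that similarity becomes the edge weight. The function `hashtag_view`, its `min_similarity` parameter and the sample data are hypothetical; the paper's own construction of content views is described in Section 3.

```python
from itertools import combinations


def hashtag_view(user_hashtags, min_similarity=0.0):
    """Build an undirected, weighted content view from users' hashtag sets.

    Hypothetical helper: two users are linked when the Jaccard similarity of
    the hashtags they used exceeds min_similarity; the similarity is the weight.
    Returns {(u, v): weight} with u < v.
    """
    edges = {}
    for u, v in combinations(sorted(user_hashtags), 2):
        tags_u, tags_v = user_hashtags[u], user_hashtags[v]
        union = tags_u | tags_v
        if not union:
            continue  # neither user posted hashtags: no evidence either way
        similarity = len(tags_u & tags_v) / len(union)
        if similarity > min_similarity:
            edges[(u, v)] = similarity
    return edges


# Hypothetical sample data: users mapped to the hashtags they posted.
posts = {
    "ana": {"#ml", "#nlp", "#python"},
    "ben": {"#nlp", "#python"},
    "cai": {"#football"},
}
print(hashtag_view(posts))  # only ana and ben share hashtags (2 of 3)
```

The same pattern applies to other content signals, such as shared terms or topics, by replacing the hashtag sets with the corresponding feature sets.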

One fundamental problem in social networks is the identification of groups of users when group membership is not explicitly available. A group, or community, can be defined as a set of elements (users, posts or other elements) that interact more frequently with, or are more similar to, other community members than to outsiders. Community detection has proven to be valuable in diverse domains such as biology, social sciences and bibliometrics. For example, community detection techniques can be used for identifying groups of users with similar purchase history, enabling the creation of more efficient recommendation systems that could better guide customers and enhance business opportunities, as in Amazon [16]; for detecting topics in collaborative systems [25]; for identifying real-world landmarks in Flickr by clustering photos [26]; for detecting events on Twitter streams [1]; for matching high-quality answers to questions in the context of a question answering system [11]; or for solving the influence maximisation problem in Foursquare [19].
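One widely used way to quantify "more connected to members than to outsiders" is Newman-Girvan modularity, Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the weight of the edges inside community c, d_c is the summed degree of its nodes and m is the total edge weight. The sketch below is only an illustration of that definition for an undirected weighted graph; the quality metrics actually used in the paper are given in Section 4.2 and may differ. The function name `modularity` and the example graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected weighted graph.

    edges: dict mapping (u, v) pairs to positive weights, each edge listed once.
    community: dict mapping every node to a community label.
    """
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    internal = defaultdict(float)  # L_c: weight of edges inside community c
    degree = defaultdict(float)    # d_c: summed weighted degree of community c
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    # Q = sum over communities of L_c / m - (d_c / 2m)^2
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (3, 4).
edges = {(1, 2): 1, (2, 3): 1, (1, 3): 1, (4, 5): 1, (5, 6): 1, (4, 6): 1, (3, 4): 1}
partition = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(round(modularity(edges, partition), 4))  # 0.3571 (= 5/14)
```

Placing every node in a single community yields Q = 0, so positive values indicate more internal weight than expected under the degree-preserving null model.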

Several techniques for community detection can be found in the literature. However, most of them focus on only one data view, even though neither social relations nor content alone can accurately indicate community membership. For example, in Twitter, social relations might be extremely sparse, and two users might belong to the same community even if they are not explicitly socially related. Conversely, social media content might be too topically diverse and noisy for extracting valuable topic-based relationships. Combining multiple data views, as required by social media data, poses new challenges: for instance, how to integrate the different views by adequately assessing their importance in the social network, or how to determine whether such integration actually improves the quality of the detected communities.
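A simple way to integrate views while expressing their relative importance, assumed here for illustration rather than taken from the paper, is a weighted linear combination of the views' edge weights. Because views are measured on different scales (interaction counts versus similarity scores, say), the sketch first rescales each view by its maximum weight. The function `combine_views`, the view names and the importance coefficients are hypothetical.

```python
from collections import defaultdict


def combine_views(views, importance):
    """Merge several undirected weighted views into a single weighted graph.

    Hypothetical helper.
    views: dict name -> {(u, v): weight}, each pair stored with u < v.
    importance: dict name -> non-negative coefficient for that view
                (views missing from it are ignored).
    Each view is divided by its maximum weight so that views on different
    scales contribute comparably before the weighted sum.
    """
    combined = defaultdict(float)
    for name, edges in views.items():
        alpha = importance.get(name, 0.0)
        if alpha == 0 or not edges:
            continue
        top = max(edges.values())
        for pair, w in edges.items():
            combined[pair] += alpha * w / top
    return dict(combined)


views = {
    "social": {("ana", "ben"): 4.0, ("ben", "cai"): 1.0},   # e.g. interaction counts
    "hashtags": {("ana", "ben"): 0.5, ("ana", "cai"): 0.25},  # e.g. Jaccard scores
}
print(combine_views(views, {"social": 0.6, "hashtags": 0.4}))
```

Setting a coefficient to zero drops a view entirely, which makes it easy to compare single-view and multi-view graphs with the same community detection algorithm.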

Considering the increasing amount of information available in social networks and the necessity of integrating heterogeneous data, this paper focuses on the needs and challenges of combining multiple information sources for performing community detection. This work studies how to integrate multiple social and content-based views or information sources, aiming to improve the quality of the detected communities. The final goal of the paper is to provide some insights on how to select the views relevant to the task at hand according to the characteristics of the network under analysis. It is worth noting that the selection of the views to integrate depends on the elements available in the social network under analysis, such as the characteristics and semantics of social relations, the semantics of the messages users exchange, or the content of such messages, amongst others. Moreover, several alternatives are proposed for integrating the semantics conveyed by the edge directionality embedded in the selected views. Finally, an extensive experimental evaluation of the benefits of combining the different views on diverse social networking sites is performed.
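The alternatives actually proposed for handling edge directionality are described in Section 3. As a hedged illustration of what such alternatives can look like, three standard ways of turning a directed adjacency matrix A (A[u][v] = 1 when u follows v) into a symmetric one are A + Aᵀ (ignore direction, counting reciprocal links twice), A Aᵀ (bibliographic coupling: users who follow the same accounts) and Aᵀ A (co-citation: users followed by the same accounts). The sketch computes them directly from follow pairs; the function `symmetrise` and the strategy labels are hypothetical, and they may not coincide with the alternatives evaluated in the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def symmetrise(follows, strategy="union"):
    """Turn a directed follow graph into undirected weighted edges.

    Hypothetical helper.
    follows: set of (follower, followee) pairs, i.e. A[u][v] = 1 if u follows v.
    strategy:
      "union"      -> A + A^T: keep every link, weight 2 when it is reciprocal
      "coupling"   -> A A^T:  weight = number of accounts both users follow
      "cocitation" -> A^T A:  weight = number of followers both users share
    Returns {(a, b): weight} with a < b.
    """
    weights = Counter()
    if strategy == "union":
        for u, v in follows:
            if u != v:
                weights[tuple(sorted((u, v)))] += 1
    elif strategy in ("coupling", "cocitation"):
        groups = defaultdict(set)
        for u, v in follows:
            # coupling: group followers by the account they follow;
            # co-citation: group followees by the user who follows them
            key, member = (v, u) if strategy == "coupling" else (u, v)
            groups[key].add(member)
        for members in groups.values():
            # every pair inside a group shares one common neighbour
            for a, b in combinations(sorted(members), 2):
                weights[(a, b)] += 1
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return dict(weights)


follows = {("ana", "cai"), ("ben", "cai"), ("ana", "ben"), ("ben", "ana")}
for name in ("union", "coupling", "cocitation"):
    print(name, symmetrise(follows, name))
```

The union strategy preserves the strength of reciprocal relations, whereas coupling and co-citation can link users who are not directly connected at all, which is one way of coping with sparse social relations.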

The rest of this paper is organised as follows. Section 2 discusses related research. Section 3 defines the nature of the diverse views to consider in the analysis, and a technique for combining them, as well as exploiting the semantics of edge directionality. Section 4 describes the experimental evaluation performed over real-world datasets. Finally, Section 5 summarises the conclusions drawn from this study and presents future lines of work.


Related work

Generally, social networks are analysed by means of graphs, composed of a set of nodes or vertices connected by links or edges. Edges can be directed (such as the followee/follower relation on Twitter) or undirected (such as the friendship relation on Facebook). Communities refer to potentially overlapping groups of nodes that have dense connections within the community, but sparse connections with nodes of other communities. Communities can be defined globally or locally, depending on

Community detection based on heterogeneous social information

The first step in applying a community detection algorithm is to define the information that is going to be available to the algorithm, i.e. the information upon which the underlying graph structure will be built. When analysing social media, multiple and diverse graphs can be defined. Nodes can represent not only real people, but also diverse entities such as Web pages, journal articles, countries, neighbourhoods, or positions, amongst others [21]. For example, if the goal of the community

Experimental evaluation

This section presents the experimental evaluation performed to assess the effectiveness of the proposed alternatives for leveraging heterogeneous information provided by social media data, and is organised as follows. Section 4.1 presents the data collections used for evaluating the effectiveness of the presented technique. Section 4.2 presents implementation details and the metrics used for evaluating the different alternatives. Finally, Section 4.4 presents the results derived from the

Conclusions

This work aimed at integrating multiple information sources for performing community detection in social networks. The proposed technique tackled the problem of how to combine several information sources for effectively finding high-quality community partitions. Moreover, it proposed several alternatives for adequately considering the semantics conveyed by directed relations.

Experimental evaluation conducted on two real-world social media datasets demonstrated that the different information

Antonela Tommasel has been a member of ISISTAN Research Institute (CONICET-UNICEN) since 2011. She received her Bachelor's degree in Software Engineering from UNICEN University (Argentina) in November 2012 and a PhD in Computer Science from the same institution in December 2017. She is also a teaching assistant at the same university. Her research interests include recommender systems, text mining, user modelling and social web.

References (45)

  • P.M. Comar et al.

    A framework for joint community detection across multiple related networks

    Neurocomputing

    (2012)
  • G.W. Corder et al.

    Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach

    (2009)
  • P. Cui et al.

    Social-sensed multimedia computing

    IEEE MultiMedia

    (2016)
  • R.D. Silva et al.

    Measuring quality of similarity functions in approximate data matching

    J. Inform.

    (2007)
  • H. Fang et al.

    Community-based question answering via heterogeneous social network learning

  • S. Fortunato

    Community detection in graphs

    Phys. Rep.

    (2010)
  • M.S. Granovetter

    The strength of weak ties

    Am. J. Sociol.

    (1973)
  • R. Guimerà et al.

    Module identification in bipartite and directed networks

    Phys. Rev. E

    (2007)
  • P.K. Reddy et al.

    A Graph Based Approach to Extract a Neighborhood Customer Community for Collaborative Filtering

    (2002)
  • J.G. Lee et al.

    Faving reciprocity in content sharing communities: a comparative analysis of Flickr and Twitter

  • J. Leskovec et al.

    Empirical comparison of algorithms for network community detection

    Proceedings of the 19th International Conference on World Wide Web

    (2010)
  • S. Lin et al.

    Understanding Community Effects on Information Diffusion

    (2015)
  • Cited by (13)

    • Parallel multi-objective evolutionary optimization based dynamic community detection in software ecosystem

      2022, Knowledge-Based Systems
      Citation Excerpt :

      Further, many scholars adopt multi-view learning and graph representation learning techniques for dynamic community detection, among which also adopt historical information. In multi-view learning, information from different angles is collected [28]. Zhou et al. [29] transformed a dynamic community detection problem into a multi-objective optimization problem, and proposed a multi-objective discrete bat algorithm to capture the structure information of a network.

    • LILPA: A label importance based label propagation algorithm for community detection with application to core drug discovery

      2020, Neurocomputing
      Citation Excerpt :

      With the study of complex networks, it is found that community is one of the most important properties of networks. In general, a community in a network consists of a cohesive group of nodes that are relatively densely connected to each other but sparsely connected to other groups [3,4]. There are overlapping nodes shared by several overlapping communities in some networks [5].

    • Social influence based community detection in event-based social networks

      2020, Information Processing and Management
      Citation Excerpt :

      The UID algorithm first builds a new social network with undirected and weighted edges by considering user interests and edges’ direction in initial network, and then utilizes the hierarchical clustering algorithm for detecting communities. Recently, some works focus on the community detection problem in heterogeneous social networks (Huang et al., 2018; Tommasel & Godoy, 2018). Huang et al. (2018) first attempt to mining the overlapping communities in large-scale heterogeneous networks.

    Daniela Godoy is a researcher at CONICET and a member of ISISTAN Research Institute, Tandil, Argentina. She is also a full-time professor in the Department of Computer Science at UNCPBA, Tandil, Argentina. She obtained her Master’s degree in Systems Engineering (2001) and her PhD in Computer Science (2005) at the same university. Her research interests include intelligent agents, user profiling and text mining.
