Knowledge acquisition from social platforms based on network distributions fitting

https://doi.org/10.1016/j.chb.2014.12.015

Abstract

The uniqueness of online social networks makes it possible to implement new methods that increase the quality and effectiveness of research processes. While surveys are one of the most important research tools, the representativeness of online samples is often a challenge and the results are hard to generalize. This paper proposes and analyzes a survey approach in which representativeness is targeted at the distributions of network measures. Its main goal is to achieve sample representativeness not only in terms of demographic attributes, but also in terms of how closely the sample follows the distributions of network measures in the full network. The approach has many application areas, including online research, sampling a network to evaluate collaborative learning processes, and selecting candidates for training based on their ability to distribute information within a social network.

Introduction

Social networking sites serve as a research environment that provides opportunities to analyze real-world behavior (Abbasi, Chai, Liu, & Sagoo, 2012) as well as online activities (Gjoka et al., 2009; Utz and Beukeboom, 2011), with applications in collaborative learning (Kwon, Liu, & Johnson, 2014), computer-mediated educational environments (Rummel & Spada, 2005) and knowledge management (Ordóñez de Pablos, 2004). Because network structures are complex, analyses are usually performed on samples, that is, smaller structures that share similar properties and distributions with the full network (Ebbes, Huang, Rangaswamy, & Thadakamalla, 2008). Recent studies in this field have focused on new algorithms (Lee et al., 2006; Stumpf et al., 2005) and various areas of application (Gjoka et al., 2009; Lakhina et al., 2003; Rusmevichientong et al., 2001). The knowledge gathered from social network analysis can be extended using either typical surveys or new approaches based on adaptive surveys that optimize survey costs, quality and response rates. Research in this area is still in its early stages and adaptive methods are rarely implemented (Schouten, Calinescu, & Luiten, 2011). Another motivation for further research on sampling methods is to increase the representativeness of survey data. The majority of studies on social media focus on social network sites such as Facebook, and many of these studies use (online) surveys (Back et al., 2010; Utz and Krämer, 2009). The participants are usually students or self-selected. A problem with this approach is the representativeness of the sample: young, highly educated individuals or highly motivated users are usually overrepresented. Similar issues were identified in the field of knowledge management and collaborative learning when building groups with a specific profile (Dascalua, Bodea, Lytras, Ordoñez de Pablos, & Burlacua, 2014).
Although it is possible to extract behavioral data from social media and use them as the basis of the analysis (Liu, 2007; Thelwall, 2008), social scientists are often interested in the subjective experience of social media users, such as motivations for and gratifications of social media use, and in the evaluation of competences and knowledge resources within the network (Colomo-Palacios et al., 2014b; Ordóñez de Pablos, 2004; Różewski and Ciszczyk, 2009). Surveys are still the most suitable tool for evaluating these aspects. In this paper, a new method for judging and enhancing the representativeness of an online sample is presented. The authors argue that network measures such as centrality or degree can serve as a basis for determining how well an online sample represents the entire population.

Some users occupy a very central position within an online social network and have many more inbound and outbound connections than other users. By comparing the network profile of the sample with that of the overall population, the representativeness of the online sample can be determined. Moreover, it is possible to develop algorithms that suggest which users should be approached to enhance the representativeness of a given sample, so that the results have greater potential in community building, information dissemination, and collaborative learning (Cowan & Jonard, 2004). The approach presented below selects an adequate set of candidates in each step of a multistage process to improve the representativeness of the sample in terms of network measures. Depending on the research goal and the area of application, different network characteristics might be considered. To identify opinion leaders, the best candidates for leadership in collaborative learning, or knowledge brokers, it is usually necessary to evaluate centrality measures (Boari & Riboldazzi, 2014). However, occupying a bridge position is more important when the focus is on advertising, diffusing innovation, or spreading knowledge among network nodes. From the perspective of collaborative learning, it is important to select nodes with specific characteristics for future activity within the network, since representative selection can affect the future spread of knowledge within it.
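The comparison of a sample's network profile with that of the population can be illustrated with a minimal sketch. It is not the authors' implementation: the toy edge list, the degree bins, the function names, and the use of total variation distance as the gap measure are all assumptions made for illustration, and the graph is treated as undirected for simplicity.

```python
from collections import Counter

def degrees_from_edges(edges):
    """Count the degree of every node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def binned_shares(values, bin_edges):
    """Share of values falling into each half-open bin [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical toy network (assumption, not data from the paper).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (4, 5), (6, 7), (7, 8)]
degree = degrees_from_edges(edges)

# Assumed degree bins: exactly 1, exactly 2, and 3 or more.
BIN_EDGES = [1, 2, 3, float("inf")]

# Degrees are always measured in the full network, also for sampled nodes.
population_profile = binned_shares(degree.values(), BIN_EDGES)
sample_nodes = [1, 2, 6, 8]  # e.g. the users who answered a survey
sample_profile = binned_shares([degree[n] for n in sample_nodes], BIN_EDGES)

gap = total_variation(population_profile, sample_profile)
print(population_profile, sample_profile, gap)
```

A gap of 0 would mean the sample reproduces the population's binned degree distribution exactly; larger values indicate that some degree classes are over- or underrepresented among respondents.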

While the structure of connections within the social network influences collaborative learning processes, there is also a clear need to access information about participants and their potential for learning and for sharing information with other participants. Collaborative learning and group-based learning are closely related to dynamic social systems (Strijbos, 2001) in which the members of the community interact and share experiences with one another (Chiu, 2008). During the learning process, members of the community evaluate each other's ideas and engage in monitoring the tasks and progress of other participants (Chiu, 2000). The key problems here are quantifying the relevant user features, selecting users with specific characteristics, and splitting users into optimal groups (Long & Qing-hong, 2014) in order to boost the sharing of knowledge in organizations (Lytras, Tennyson, & Ordóñez de Pablos, 2008). During collaborative learning processes, building teams and increasing their potential by acquiring additional representatives with specific knowledge or competences can be very important, not only in terms of knowledge itself, but also in terms of network characteristics. Because the ability to obtain knowledge from all nodes of a network can be limited, sampling methods can be applied to acquire the desired information. The proposed method can be adapted to different research goals by using weighted sampling. As online surveys are usually based on voluntary participation and response rates may be low, the obtained sample may differ in its characteristics from a random sample. The proposed method makes it possible to direct the selection process towards the expected characteristics of the sample.

Section snippets

Conventional and adaptive network sampling

Research related to network sampling is based on various techniques using both conventional and adaptive approaches. Sampling design is treated as conventional when it does not use acquired data in the sampling process. The first group of methods in this class is based on random-node selection focused on uniform or proportional-to-node degree probabilities (Maiya & Berger-Wolf, 2010), random edge selection (Ahmed, Neville, & Kompella, 2011) and the egocentric method (Ma, Gustafson, Moitra, &
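The conventional designs named in this snippet (uniform random-node selection, selection with probability proportional to node degree, and random edge selection) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the cited authors' algorithms: the adjacency dictionary `adj`, the function names, and the sequential weighted draw used to sample without replacement are hypothetical choices. None of the functions uses data acquired during sampling, which is what makes the designs conventional in the sense described above.

```python
import random

def uniform_node_sample(adj, k, rng):
    """Pick k distinct nodes, each node equally likely."""
    return rng.sample(sorted(adj), k)

def degree_proportional_sample(adj, k, rng):
    """Pick k distinct nodes by repeated draws weighted by node degree.

    Sequential drawing without replacement only approximates strict
    proportional-to-degree inclusion probabilities.
    """
    remaining = {node: len(nbrs) for node, nbrs in adj.items() if nbrs}
    chosen = []
    while remaining and len(chosen) < k:
        nodes = list(remaining)
        weights = [remaining[n] for n in nodes]
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen

def random_edge_sample(adj, k, rng):
    """Pick k distinct undirected edges and return them with their endpoints."""
    edges = sorted({tuple(sorted((u, v))) for u, nbrs in adj.items() for v in nbrs})
    picked = rng.sample(edges, k)
    nodes = sorted({n for edge in picked for n in edge})
    return picked, nodes

# Hypothetical undirected toy network (assumption, not data from the paper).
adj = {
    1: {2, 3, 4, 5}, 2: {1, 3}, 3: {1, 2}, 4: {1, 5},
    5: {1, 4}, 6: {7}, 7: {6, 8}, 8: {7},
}
rng = random.Random(42)  # fixed seed so repeated runs give the same draws

uniform_nodes = uniform_node_sample(adj, 3, rng)
degree_nodes = degree_proportional_sample(adj, 3, rng)
sampled_edges, edge_nodes = random_edge_sample(adj, 3, rng)
print(uniform_nodes, degree_nodes, sampled_edges, edge_nodes)
```

Degree-proportional and edge-based selection both tend to favor well-connected nodes, which is one reason the resulting sample's measure distributions can drift away from those of the full network.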

Conceptual framework

In this part a balanced adaptive distribution fitting approach based on a set of network measure distributions is proposed. Its main goal is to build representative survey responses based on a selected set of participants in terms of distance from the whole network distribution. The function minimizing a distance from the vector of network distributions is proposed, and the network members are selected to fit the reference distributions for the whole network, which are known in advance. In Fig.
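The idea of selecting members so that the sample fits known reference distributions can be sketched with a greedy procedure. This excerpt does not specify the distance function, so the sketch assumes a weighted sum of Kullback-Leibler divergences, one per network measure; the one-node-at-a-time greedy rule, the additive smoothing, and all names (`node_bins`, `reference`, `weights`, `greedy_fit`) are likewise assumptions and should not be read as the authors' algorithm.

```python
import math

def binned_distribution(bins, n_bins, eps=1e-9):
    """Smoothed share of items per bin; eps keeps every share positive."""
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    total = len(bins)
    return [(c + eps) / (total + eps * n_bins) for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def total_distance(sample, node_bins, reference, weights, n_bins):
    """Weighted sum over measures of D(reference || sample distribution)."""
    distance = 0.0
    for measure, ref in reference.items():
        sample_bins = [node_bins[n][measure] for n in sample]
        fitted = binned_distribution(sample_bins, n_bins)
        distance += weights[measure] * kl_divergence(ref, fitted)
    return distance

def greedy_fit(candidates, node_bins, reference, weights, n_bins, size):
    """Repeatedly add the candidate that brings the sample closest to the reference."""
    sample, pool = [], list(candidates)
    while pool and len(sample) < size:
        best = min(
            pool,
            key=lambda n: total_distance(sample + [n], node_bins, reference, weights, n_bins),
        )
        sample.append(best)
        pool.remove(best)
    return sample

# Hypothetical network of 12 members with one binned measure (assumption):
# members 0-5 are in degree bin 0, members 6-9 in bin 1, members 10-11 in bin 2.
N_BINS = 3
node_bins = {n: {"degree": 0 if n < 6 else 1 if n < 10 else 2} for n in range(12)}
reference = {
    "degree": binned_distribution([node_bins[n]["degree"] for n in node_bins], N_BINS)
}
weights = {"degree": 1.0}  # with several measures, weights set their relative importance

fitted_sample = greedy_fit(range(12), node_bins, reference, weights, N_BINS, size=6)
naive_sample = list(range(6))  # a biased sample drawn from one degree class only

fitted_distance = total_distance(fitted_sample, node_bins, reference, weights, N_BINS)
naive_distance = total_distance(naive_sample, node_bins, reference, weights, N_BINS)
print(fitted_sample, fitted_distance, naive_distance)
```

In this toy setting the greedy sample reproduces the reference bin proportions, while the biased sample is far from them. Adding further entries to `reference` and `weights` (for example a centrality measure) would fit several distributions at once, at the cost of possible trade-offs between them.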

Empirical research

The new approach is demonstrated by presenting the results of a survey performed within an online social network based on a graphical virtual world with both entertainment and educational purposes. An online survey covering motivations, self-disclosure and self-presentation was conducted among portal users and filled in by 373 of them, while 9631 users logged into the system in the examined period and were identified by their unique user_ID. The structural measures computed for the full network

Discussion and summary

Growing engagement in social network systems and the move from traditional environments to online systems create a new space for both theoretical and empirical studies. Together with technological development, the need also grows for new methods that make research processes more efficient and increase their quality. While adaptive survey methodologies were the subject of earlier research, they are not frequently applied to online research. An alternative to available solutions was presented in

Acknowledgments

The work was partially supported by a fellowship co-financed by the European Union within the European Social Fund, by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 316097 [ENGINE], and by the National Science Centre, decision no. DEC-2013/09/B/ST6/02317.

References (58)

  • M.M. Chiu (2004). Adapting teacher interventions to student needs during cooperative learning. American Educational Research Journal.
  • M.M. Chiu (2008). Flowing toward correct contributions during groups’ mathematics problem solving: A statistical discourse analysis. Journal of the Learning Sciences.
  • R. Colomo-Palacios et al. (2014). Providing knowledge recommendations: An approach for informal electronic mentoring. Interactive Learning Environments.
  • R. Colomo-Palacios et al. (2014). I-COMPETERE: Using applied intelligence in search of competency gaps in software project managers. Information Systems Frontiers.
  • R. Colomo-Palacios et al. (2014). SABUMO-dTest: Design and evaluation of an intelligent collaborative distributed testing framework. Computer Science and Information Systems.
  • Couper, M., & Groves, R. (2009). Moving from prespecified to adaptive survey design. Presented at the Modernization of...
  • M.I. Dascalua et al. (2014). Improving e-learning communities through optimal composition of multidisciplinary learning groups. Computers in Human Behavior.
  • Ebbes, P., Huang, Z., Rangaswamy, A., & Thadakamalla, H.P. (2008). Sampling Large-scale Social Networks: Insights from...
  • O. Frank et al. (1994). Estimating the size of hidden populations using snowball sampling. Journal of Official Statistics.
  • Gjoka, M., Kurant, M., Butts, C.T., & Markopoulou, A. (2009). Practical recommendations on crawling online social...
  • B.S. Greenberg et al. (1990). Developing an optimal call scheduling strategy for a telephone survey. Journal of Official Statistics.
  • R.M. Groves (1989). Survey errors and survey costs.
  • R.M. Groves et al. (2006). Responsive design for household surveys: tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society, Series A.
  • M.S. Handcock et al. (2010). Modeling social networks from sampled data. Annals of Applied Statistics.
  • D.D. Heckathorn (2007). Extensions of respondent-driven sampling: Analyzing continuous variables and controlling for differential recruitment. Sociological Methodology.
  • J. Jankowski et al. The multidimensional study of viral campaigns as branching processes.
  • J. Jankowski et al. Compensatory seeding in networks with varying availability of nodes.
  • S. Kullback et al. (1951). On information and sufficiency. Annals of Mathematical Statistics.
  • A. Lakhina et al. Sampling biases in IP topology measurements.