Abstract
In this work, we study the privacy risk posed by profile matching across online social networks (OSNs), in which anonymous profiles of OSN users are matched to their real identities using auxiliary information about them. We consider different attributes that users share publicly. These attributes include both strong identifiers, such as the username, and weak identifiers, such as interests or the sentiment variation between different posts of a user on different platforms. We study how different combinations of these attributes affect profile matching in order to show the full extent of the privacy threat. The proposed framework relies mainly on machine learning techniques and optimization algorithms. We evaluate the proposed framework on three datasets (Twitter–Foursquare, Google+–Twitter, and Flickr) and show that profiles of the same users in different OSNs can be matched with high probability by using the publicly shared attributes and/or the underlying graph structure of the OSNs. We also show that the proposed framework achieves notably higher precision than state-of-the-art approaches that rely on machine learning techniques. We believe that this work is a valuable step toward building a tool that helps OSN users understand the privacy risks of what they share publicly.
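The abstract does not say how individual attributes are compared, so the following is only a minimal sketch under stated assumptions of how a candidate pair of profiles from two OSNs might be turned into a feature vector for a machine learning classifier. It assumes a normalized Levenshtein similarity for usernames (a strong identifier), cosine similarity between per-profile interest vectors (for example, topic weights from a topic model such as LDA, which is an assumption here), and the difference in the spread of per-post sentiment scores (a weak identifier). The dictionary keys `username`, `interests`, and `sentiments` and all function names are hypothetical.

```python
import math
from statistics import pstdev


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def username_similarity(u1: str, u2: str) -> float:
    """Hypothetical strong-identifier feature: 1 - edit distance / longer length, case-insensitive."""
    u1, u2 = u1.lower(), u2.lower()
    longest = max(len(u1), len(u2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(u1, u2) / longest


def cosine_similarity(x: list[float], y: list[float]) -> float:
    """Cosine similarity of two interest vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)


def sentiment_variation(scores: list[float]) -> float:
    """Population standard deviation of per-post sentiment scores (0.0 for fewer than two posts)."""
    return pstdev(scores) if len(scores) > 1 else 0.0


def pair_features(p1: dict, p2: dict) -> list[float]:
    """Hypothetical feature vector for a candidate pair of profiles from two OSNs."""
    return [
        username_similarity(p1["username"], p2["username"]),
        cosine_similarity(p1["interests"], p2["interests"]),
        # Absolute difference in sentiment spread: smaller means more alike.
        abs(sentiment_variation(p1["sentiments"]) - sentiment_variation(p2["sentiments"])),
    ]


if __name__ == "__main__":
    a = {"username": "alice_smith", "interests": [0.7, 0.2, 0.1], "sentiments": [0.5, -0.5]}
    b = {"username": "AliceSmith", "interests": [0.7, 0.2, 0.1], "sentiments": [1.0, 0.0]}
    print(pair_features(a, b))
```

In such a setup, vectors like these would be computed for labeled pairs (profiles known to belong to the same user, or not) and used to train a classifier whose scores feed the matching step; that training step is not shown.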
Notes
1. Such profiles are required to construct the ground truth for training.
2. Sets \(\mathrm {A_e}\) and \(\mathrm {T_e}\) do not include any users from sets \(\mathrm {A_t}\) and \(\mathrm {T_t}\).
3. The US Social Security name database includes the year of birth, gender, and corresponding name for babies born in the United States.
4. The case where the OSNs differ in size can be handled similarly (by padding the smaller OSN with dummy users to equalize the sizes).
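Note 4 mentions padding the smaller OSN with dummy users so that both sides have equal size. A minimal sketch of that idea, assuming a precomputed similarity matrix `sim` (rows are users of one OSN, columns users of the other) and a dummy similarity of 0: pad the matrix to a square, then choose the one-to-one matching that maximizes total similarity. The exhaustive search over permutations below is an illustrative stand-in with factorial cost, not the framework's actual optimizer; at realistic scale an efficient solver such as the Hungarian method would be used (for example SciPy's `scipy.optimize.linear_sum_assignment(sim, maximize=True)`, which also accepts rectangular matrices directly). Function names are hypothetical.

```python
from itertools import permutations


def pad_square(sim: list[list[float]], fill: float = 0.0) -> list[list[float]]:
    """Pad a rows x cols similarity matrix to n x n by adding dummy rows/columns filled with `fill`."""
    rows = len(sim)
    cols = len(sim[0]) if rows else 0
    n = max(rows, cols)
    padded = [list(r) + [fill] * (n - cols) for r in sim]
    padded += [[fill] * n for _ in range(n - rows)]
    return padded


def best_matching(sim: list[list[float]]) -> tuple[dict, float]:
    """Exhaustively find the one-to-one matching that maximizes total similarity.

    Returns (pairs, total): pairs maps each real row user i to a real column
    user j, or to None when i was matched to a dummy user. O(n!) time, so
    this is only suitable for tiny illustrative inputs.
    """
    rows = len(sim)
    cols = len(sim[0]) if rows else 0
    padded = pad_square(sim)
    n = len(padded)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(padded[i][perm[i]] for i in range(n))
        if total > best_total:  # strict '>' keeps the first optimum found
            best_perm, best_total = perm, total
    pairs = {i: (best_perm[i] if best_perm[i] < cols else None) for i in range(rows)}
    return pairs, best_total


if __name__ == "__main__":
    # Three users in one OSN, two in the other: one user must go to a dummy.
    sim = [[0.9, 0.1],
           [0.2, 0.8],
           [0.3, 0.4]]
    print(best_matching(sim))
```

For this input, the third user is assigned to the dummy column (printed as `None`), because matching it to either real column would displace a higher-similarity pair.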
Acknowledgments
We thank Volkan Küçük for collecting \(\mathrm {D1}\) and \(\mathrm {D2}\) and for his help in the initial phases of this work.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Halimi, A., Ayday, E. (2020). Profile Matching Across Online Social Networks. In: Meng, W., Gollmann, D., Jensen, C.D., Zhou, J. (eds) Information and Communications Security. ICICS 2020. Lecture Notes in Computer Science, vol 12282. Springer, Cham. https://doi.org/10.1007/978-3-030-61078-4_4
DOI: https://doi.org/10.1007/978-3-030-61078-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61077-7
Online ISBN: 978-3-030-61078-4
eBook Packages: Computer Science, Computer Science (R0)