Abstract
Data from social network websites are an excellent source of information for studying human behavior and interactions. Such data are typically analyzed in de-identified form, which provides a level of privacy protection. However, because de-identified data cannot be linked to other data, they are limited in their ability to answer broad and critically important questions about our complex society. In this study, we investigate the properties of information related to privacy and, using these properties, present a novel model of data access for studying personal data, called decoupled data access. “Decoupling” refers to separating the identifying information from the sensitive data that needs protection. We suggest that decoupled data access can provide flexible data integration with error management while offering the same level of privacy protection as de-identified data. We further test the ability of different mechanisms to hinder inference of identity when names are revealed for data integration. Our results show that chaffing, not specifying the universe around the data, and revealing names in isolation can protect the real identities behind both common and rare names.
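As a rough illustration of the decoupling and chaffing ideas named in the abstract, the Python sketch below splits records into an identifying view and a sensitive view, then pads the identifying view with fake rows. It is not the chapter's implementation: the function names (`decouple`, `add_chaff`), the field names, and the random-key scheme are all hypothetical, and the link table is assumed to be held by a trusted party rather than shown to the analyst.

```python
import random
import secrets


def decouple(records, id_fields):
    """Split records into an identifying view and a sensitive view.

    Assumption: the returned `link` dict (id_key -> data_key) stays with a
    trusted party, so neither view alone ties a name to sensitive values.
    """
    id_view, sensitive_view, link = [], [], {}
    for rec in records:
        id_key = secrets.token_hex(8)    # random key, carries no meaning
        data_key = secrets.token_hex(8)  # independent key for the other view
        id_view.append({"id_key": id_key, **{f: rec[f] for f in id_fields}})
        sensitive_view.append(
            {"data_key": data_key,
             **{f: v for f, v in rec.items() if f not in id_fields}}
        )
        link[id_key] = data_key
    # Shuffle each view separately so row order does not align across views.
    random.shuffle(id_view)
    random.shuffle(sensitive_view)
    return id_view, sensitive_view, link


def add_chaff(id_view, fake_rows):
    """Mix fake identifying rows (chaff) into the real ones.

    Chaff rows get fresh keys that are absent from the link table, so a
    viewer cannot tell from the list alone which names are real.
    """
    chaffed = list(id_view)
    chaffed += [{"id_key": secrets.token_hex(8), **row} for row in fake_rows]
    random.shuffle(chaffed)
    return chaffed
```

In this sketch, a person reviewing names for record linkage would see only the chaffed identifying view, while the sensitive attributes would be reattached through the privately held link table after linkage decisions are made.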
Acknowledgement
We thank everyone who participated in the survey. We also thank Mike Reiter and Fred Brooks for their insightful comments, and Gautam Sanka, Ian Sang-Jun Kim, and Ren Bauer for their assistance with the experiment. This research was supported in part by funding from the NC Department of Health and Human Services and by NSF award no. CNS-0915364. The authors gratefully acknowledge their support.
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kum, HC., Ahalt, S., Pathak, D. (2013). Privacy-Preserving Data Integration Using Decoupled Data. In: Altshuler, Y., Elovici, Y., Cremers, A., Aharony, N., Pentland, A. (eds) Security and Privacy in Social Networks. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4139-7_11
DOI: https://doi.org/10.1007/978-1-4614-4139-7_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4138-0
Online ISBN: 978-1-4614-4139-7
eBook Packages: Computer Science (R0)