Privacy-Preserving Data Integration Using Decoupled Data

  • Chapter
  • Book: Security and Privacy in Social Networks

Abstract

Data from social network websites are an excellent source of information for studying human behavior and interactions. Typically, when analyzing such data, the default mode of access is de-identified data, which provides a level of privacy protection. However, due to its inability to link to other data, de-identified data has limitations with regard to answering broad and critically important questions about our complex society. In this study, we investigate the properties of information related to privacy, and we present a novel model of data access called decoupled data access for studying personal data using these properties. “Decoupling” refers to separating out the identifying information from the sensitive data that needs protection. We suggest that decoupled data access can provide flexible data integration with error management while providing the same level of privacy protection as de-identified data. We further test the ability of different mechanisms to hinder inference of identity when names are revealed for data integration. Our results show that through chaffing, not specifying the universe around the data, and revealing names in isolation, the real identities of names for both common and rare names can be protected.
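The abstract does not spell out how decoupling or chaffing is implemented, so the following is only a minimal sketch under stated assumptions, not the chapter's method. It assumes records are flat dictionaries, that the caller knows which fields are identifying, and that "chaff" means fabricated identifying rows with no sensitive counterpart. The function name `decouple`, the `_key` field, and the sample records are all hypothetical.

```python
import random


def decouple(records, id_fields, chaff=(), seed=None):
    """Split records into an identifying table and a sensitive table.

    Returns (identifying, sensitive, link). `link` maps a key in the
    identifying table to the key of the matching sensitive row; in this
    sketch it is the only way to re-join the two tables.
    """
    # NOTE: `random` keeps the example reproducible but is not
    # cryptographically secure; a real system would likely use `secrets`.
    rng = random.Random(seed)
    id_fields = set(id_fields)
    identifying, sensitive, link = [], [], {}

    for rec in records:
        id_key, data_key = rng.getrandbits(64), rng.getrandbits(64)
        identifying.append(
            {"_key": id_key, **{f: v for f, v in rec.items() if f in id_fields}}
        )
        sensitive.append(
            {"_key": data_key, **{f: v for f, v in rec.items() if f not in id_fields}}
        )
        link[id_key] = data_key

    # Chaff (assumed meaning): fake identifying rows with no sensitive row.
    for fake in chaff:
        identifying.append({"_key": rng.getrandbits(64), **fake})

    # Shuffle each table independently so row position reveals no linkage.
    rng.shuffle(identifying)
    rng.shuffle(sensitive)
    return identifying, sensitive, link


# Hypothetical sample data, invented for illustration only.
records = [
    {"name": "Ann Lee", "dob": "1980-01-01", "diagnosis": "asthma"},
    {"name": "Raj Patel", "dob": "1975-06-30", "diagnosis": "diabetes"},
]
identifying, sensitive, link = decouple(
    records,
    id_fields=["name", "dob"],
    chaff=[{"name": "Maria Gomez", "dob": "1990-03-12"}],
    seed=0,
)
print(len(identifying), len(sensitive), len(link))  # 3 2 2
```

In this sketch, someone who sees only the identifying table sees names without any sensitive attributes and cannot tell which rows are chaff, while someone who sees only the sensitive table sees no names. Keeping `link` separate (for example, under stricter access control) is the design choice that makes re-joining possible for authorized integration and error correction.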

Acknowledgement

We thank everyone who participated in the survey. We also thank Mike Reiter and Fred Brooks for their insightful comments, and Gautam Sanka, Ian Sang-Jun Kim, and Ren Bauer for their assistance with the experiment. This research was supported in part by funding from the NC Department of Health and Human Services and by NSF award no. CNS-0915364. The authors gratefully acknowledge their support.

Author information

Corresponding author

Correspondence to Hye-Chung Kum.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kum, HC., Ahalt, S., Pathak, D. (2013). Privacy-Preserving Data Integration Using Decoupled Data. In: Altshuler, Y., Elovici, Y., Cremers, A., Aharony, N., Pentland, A. (eds) Security and Privacy in Social Networks. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4139-7_11

  • DOI: https://doi.org/10.1007/978-1-4614-4139-7_11

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-4138-0

  • Online ISBN: 978-1-4614-4139-7

  • eBook Packages: Computer Science, Computer Science (R0)
