DOI: 10.1145/2505515.2505636
research-article

Disinformation techniques for entity resolution

Published: 27 October 2013

ABSTRACT

We study the problem of disinformation. We assume that an "agent" has some sensitive information that an "adversary" is trying to obtain. For example, a camera company (the agent) may be secretly developing a new camera model, and a user (the adversary) may want to learn the model's detailed specifications in advance. The agent's goal is to disseminate false information that "dilutes" what the adversary knows. We model the adversary as an Entity Resolution (ER) process that pieces together the available information. We formalize the problem of finding the disinformation with the highest benefit under a limited budget for creating it, and we propose efficient algorithms for solving this problem. We then evaluate our disinformation planning algorithms on real and synthetic data and compare the robustness of existing ER algorithms. More generally, our disinformation techniques can serve as a framework for testing the robustness of ER.
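The abstract describes the adversary model only at a high level. The following Python sketch is a hypothetical toy, not the authors' formalization or algorithms. It assumes that records are sets of (attribute, value) pairs, that the adversary's ER process merges any two records sharing at least `k` pairs (taking the transitive closure), and that the adversary's knowledge of the target is the merged record containing a known seed record. The names `matches`, `profile_of`, `precision`, `plan_disinformation`, and `record`, the precision-based benefit measure, the greedy benefit-per-cost selection under a budget, and the example data are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical model: a record is a frozenset of (attribute, value) pairs.


def record(*pairs):
    """Build a record from (attribute, value) pairs."""
    return frozenset(pairs)


def matches(r1, r2, k=2):
    """Hypothetical match rule: records match if they share >= k attribute-value pairs."""
    return len(r1 & r2) >= k


def profile_of(records, seed_idx, k=2):
    """Toy ER: cluster records by the transitive closure of `matches` (union-find),
    then return the merged record (union of pairs) of the cluster holding records[seed_idx]."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if matches(records[i], records[j], k):
            parent[find(i)] = find(j)

    root = find(seed_idx)
    merged = set()
    for i, rec in enumerate(records):
        if find(i) == root:
            merged |= rec
    return frozenset(merged)


def precision(records, seed_idx, truth, k=2):
    """Assumed benefit measure: share of the adversary's merged profile that is true."""
    profile = profile_of(records, seed_idx, k)
    return len(profile & truth) / len(profile)


def plan_disinformation(records, seed_idx, truth, candidates, budget, k=2):
    """Greedy heuristic (not an optimal planner): repeatedly add the affordable
    candidate (record, cost) with the largest precision drop per unit cost.
    Costs are assumed to be positive."""
    current, remaining = list(records), list(candidates)
    chosen, spent = [], 0
    while True:
        base = precision(current, seed_idx, truth, k)
        best = None  # (gain per unit cost, index into remaining)
        for idx, (rec, cost) in enumerate(remaining):
            if spent + cost > budget:
                continue  # candidate would exceed the budget
            gain = (base - precision(current + [rec], seed_idx, truth, k)) / cost
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, idx)
        if best is None:
            return chosen, spent
        rec, cost = remaining.pop(best[1])
        current.append(rec)
        chosen.append(rec)
        spent += cost


# Hypothetical example: leaked specs of a camera model "X1".
truth = record(("brand", "Acme"), ("model", "X1"), ("sensor", "24MP"), ("zoom", "5x"))
leaked = [
    record(("brand", "Acme"), ("model", "X1"), ("sensor", "24MP")),
    record(("model", "X1"), ("sensor", "24MP"), ("zoom", "5x")),
]
candidates = [
    (record(("model", "X1"), ("zoom", "5x"), ("sensor", "12MP"), ("price", "$300")), 1),
    (record(("brand", "Acme"), ("model", "X1"), ("weight", "900g")), 1),
    (record(("brand", "Other"), ("lens", "50mm")), 1),  # matches nothing
    (record(("model", "X1"), ("sensor", "24MP"), ("lens", "f/1.8"),
            ("video", "8K"), ("battery", "3000mAh")), 3),  # exceeds the budget
]

if __name__ == "__main__":
    chosen, spent = plan_disinformation(leaked, 0, truth, candidates, budget=2)
    print(spent, precision(leaked + chosen, 0, truth))
```

In this made-up example, the two leaked records merge into a profile that is entirely true (precision 1.0). With a budget of 2, the greedy pass picks the first two candidates, each of which merges into the target's cluster, and lowers the adversary's precision to 4/7; the third candidate matches nothing, and the fourth exceeds the budget. Greedy selection is used only to keep the sketch short and is not guaranteed to find the highest-benefit plan.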


          • Published in

            ACM Conferences
            CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
            October 2013
            2612 pages
            ISBN:9781450322638
            DOI:10.1145/2505515

            Copyright © 2013 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%. Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%.
