ABSTRACT
We study the problem of disinformation. We assume that an "agent" has some sensitive information that an "adversary" is trying to obtain. For example, a camera company (the agent) may be secretly developing a new camera model, and a user (the adversary) may want to learn the model's detailed specifications in advance. The agent's goal is to disseminate false information that "dilutes" what the adversary knows. We model the adversary as an Entity Resolution (ER) process that pieces together the available information. We formalize the problem of finding the disinformation with the highest benefit given a limited budget for creating it, and we propose efficient algorithms for solving this problem. We then evaluate our disinformation planning algorithms on real and synthetic data and compare the robustness of existing ER algorithms. More generally, our disinformation techniques can serve as a framework for testing ER robustness.
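The abstract does not specify the ER model, the benefit measure, or the planning algorithms, so the following is only a toy sketch under loudly stated assumptions. ER is modeled as pairwise matching (two records match if they share at least `min_shared` attribute values) followed by transitive closure. "Dilution" is a hypothetical benefit: the fraction of the adversary's resolved target cluster that does not describe the true entity. Planning uses a plain greedy benefit-per-cost heuristic under a budget. The functions `resolve`, `dilution`, and `plan`, the cost values, and the camera records are all illustrative inventions, not the paper's definitions or algorithms.

```python
from itertools import combinations

# Hypothetical record: (record_id, entity_label, attribute_values).
# entity_label is ground truth known only to the agent; the adversary's ER sees only attributes.
Record = tuple[str, str, frozenset[str]]


def resolve(records: list[Record], min_shared: int = 2) -> list[set[str]]:
    """Toy ER (an assumption): records match if they share >= min_shared attribute
    values; clusters are the transitive closure of matches, computed with union-find."""
    parent = {rid: rid for rid, _, _ in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, _, fa), (b, _, fb) in combinations(records, 2):
        if len(fa & fb) >= min_shared:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for rid, _, _ in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return list(clusters.values())


def dilution(records: list[Record], target_rid: str, target_entity: str) -> float:
    """Hypothetical benefit: fraction of the cluster containing target_rid
    whose records are NOT about target_entity."""
    labels = {rid: ent for rid, ent, _ in records}
    cluster = next(c for c in resolve(records) if target_rid in c)
    wrong = sum(1 for rid in cluster if labels[rid] != target_entity)
    return wrong / len(cluster)


def plan(records, candidates, costs, budget, target_rid, target_entity) -> list[Record]:
    """Greedy heuristic: repeatedly add the affordable candidate disinformation record
    with the best marginal dilution gain per unit cost (costs assumed positive).
    Stops when no affordable candidate improves dilution; not guaranteed optimal."""
    chosen: list[Record] = []
    spent = 0.0
    current = dilution(records, target_rid, target_entity)
    remaining = list(candidates)
    while True:
        best, best_ratio = None, 0.0
        for cand in remaining:
            cost = costs[cand[0]]
            if spent + cost > budget:
                continue  # unaffordable under the remaining budget
            gain = dilution(records + chosen + [cand], target_rid, target_entity) - current
            if gain / cost > best_ratio:
                best, best_ratio = cand, gain / cost
        if best is None:
            return chosen
        chosen.append(best)
        spent += costs[best[0]]
        current = dilution(records + chosen, target_rid, target_entity)
        remaining.remove(best)


# Illustrative data: r1/r2 describe the secret camera "X100"; r3/r4 describe another model.
records = [
    ("r1", "X100", frozenset({"24MP", "4K", "WiFi"})),
    ("r2", "X100", frozenset({"24MP", "4K", "GPS"})),
    ("r3", "Z9", frozenset({"45MP", "8K", "GPS"})),
    ("r4", "Z9", frozenset({"45MP", "8K", "WiFi"})),
]
# Candidate false records the agent could publish, each with a hypothetical cost.
candidates = [
    ("d1", "fake", frozenset({"24MP", "4K", "45MP", "8K"})),  # bridges both clusters
    ("d2", "fake", frozenset({"24MP", "WiFi"})),              # joins only the target cluster
    ("d3", "fake", frozenset({"45MP", "GPS"})),               # touches only the Z9 cluster
]
costs = {"d1": 1.0, "d2": 1.0, "d3": 0.5}

chosen = plan(records, candidates, costs, budget=1.5, target_rid="r1", target_entity="X100")
print([c[0] for c in chosen], round(dilution(records + chosen, "r1", "X100"), 3))
```

With a budget of 1.5 this selects `d1` and then `d3`, printing `['d1', 'd3'] 0.667`. Note that `d3` has zero gain on its own and becomes useful only after `d1` has merged the two clusters, which is the kind of interaction that makes budgeted disinformation planning harder than scoring records independently; the greedy rule here is a simple heuristic, not a claim about how the paper solves the problem.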