How to Accurately and Privately Identify Anomalies

ABSTRACT
Identifying anomalies in data is central to the advancement of science, national security, and finance. However, privacy concerns restrict our ability to analyze data. Can we lift these restrictions and accurately identify anomalies without hurting the privacy of those who contribute their data? We address this question for the most practically relevant case, where a record is considered anomalous relative to the other records. We make four contributions. First, we introduce the notion of sensitive privacy, which conceptualizes what it means to privately identify anomalies. Sensitive privacy generalizes the important concept of differential privacy and is amenable to analysis. Importantly, sensitive privacy admits algorithmic constructions that provide strong and practically meaningful privacy and utility guarantees. Second, we show that differential privacy is inherently incapable of accurately and privately identifying anomalies; in this sense, our generalization is necessary. Third, we provide a general compiler that takes as input a differentially private mechanism (which has poor utility for anomaly identification) and transforms it into a sensitively private one. This compiler, which is mostly of theoretical importance, is shown to output a mechanism whose utility greatly improves over that of the input mechanism. As our fourth contribution, we propose mechanisms for a popular definition of anomaly (the (β, r)-anomaly) that (i) are guaranteed to be sensitively private, (ii) come with provable utility guarantees, and (iii) are empirically shown to perform with high accuracy over a range of datasets and evaluation criteria.
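The (β, r)-anomaly in the fourth contribution is the classic distance-based outlier: a record is anomalous if fewer than β other records lie within distance r of it. The sketch below illustrates this test, together with a naive Laplace-noised neighbor count of the kind of plain differentially private baseline whose utility the abstract argues is poor; the function names and the NumPy representation of records are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def is_beta_r_anomaly(x, data, beta, r):
    """A record x is a (beta, r)-anomaly if fewer than beta *other*
    records of the dataset lie within distance r of it."""
    dists = np.linalg.norm(data - x, axis=1)
    neighbors = int(np.sum(dists <= r)) - 1  # subtract x's match with itself
    return neighbors < beta

def noisy_neighbor_count(x, data, r, epsilon, rng=None):
    """Naive epsilon-differentially-private neighbor count via the
    Laplace mechanism (a counting query has sensitivity 1)."""
    rng = rng or np.random.default_rng()
    dists = np.linalg.norm(data - x, axis=1)
    true_count = int(np.sum(dists <= r)) - 1
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Three clustered records and one isolated record.
data = np.array([[0.0], [0.1], [0.2], [5.0]])
print(is_beta_r_anomaly(data[3], data, beta=2, r=1.0))  # → True
print(is_beta_r_anomaly(data[0], data, beta=2, r=1.0))  # → False
```

Thresholding the noisy count at β yields a differentially private identifier; the abstract's second contribution argues that the noise required to hide any single record also obscures precisely the rare records one wants to flag, which is the failure mode sensitive privacy is designed to avoid.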