DOI: 10.1145/3319535.3363209
research-article
How to Accurately and Privately Identify Anomalies

Published: 06 November 2019

ABSTRACT

Identifying anomalies in data is central to the advancement of science, national security, and finance. However, privacy concerns restrict our ability to analyze data. Can we lift these restrictions and accurately identify anomalies without compromising the privacy of those who contribute their data? We address this question for the most practically relevant case, where a record is considered anomalous relative to other records. We make four contributions. First, we introduce the notion of sensitive privacy, which conceptualizes what it means to privately identify anomalies. Sensitive privacy generalizes the important concept of differential privacy and is amenable to analysis. Importantly, sensitive privacy admits algorithmic constructions that provide strong and practically meaningful privacy and utility guarantees. Second, we show that differential privacy is inherently incapable of accurately and privately identifying anomalies; in this sense, our generalization is necessary. Third, we provide a general compiler that takes as input a differentially private mechanism (which has poor utility for anomaly identification) and transforms it into a sensitively private one. This compiler, which is mostly of theoretical importance, is shown to output a mechanism whose utility greatly improves over that of the input mechanism. Fourth, we propose mechanisms for a popular definition of anomaly, the (β,r)-anomaly, that (i) are guaranteed to be sensitively private, (ii) come with provable utility guarantees, and (iii) are empirically shown to be overwhelmingly accurate over a range of datasets and evaluation criteria.
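The abstract names the (β,r)-anomaly without defining it. As a rough, non-private illustration only, the sketch below assumes the common distance-based reading: a record is a (β,r)-anomaly if at most β other records lie within distance r of it. The function name `is_beta_r_anomaly`, the use of Euclidean distance, and the choice to exclude the record itself from the count are all assumptions made for this example; this is not the paper's sensitively private mechanism.

```python
import math


def is_beta_r_anomaly(records, index, beta, r):
    """Return True if records[index] has at most `beta` other records
    within Euclidean distance `r` (assumed reading of a (beta, r)-anomaly)."""
    target = records[index]
    # Count the other records that fall inside the radius-r ball around target.
    neighbors = sum(
        1
        for j, other in enumerate(records)
        if j != index and math.dist(target, other) <= r
    )
    return neighbors <= beta


# Hypothetical toy data: three clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
flags = [is_beta_r_anomaly(points, i, beta=0, r=1.0) for i in range(len(points))]
```

With these toy values only the isolated point is flagged. A private mechanism would have to release such labels in a randomized way, which this plain check does not attempt.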

Supplemental Material

p719-asif.webm (webm, 73.7 MB)

Published in

CCS '19: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security
November 2019
2755 pages
ISBN: 9781450367479
DOI: 10.1145/3319535

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CCS '19 Paper Acceptance Rate: 149 of 934 submissions, 16%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
