DOI: 10.1145/2996758.2996766
research-article
Public Access

Differentially Private Online Active Learning with Applications to Anomaly Detection

Published: 28 October 2016

ABSTRACT

In settings where data instances are generated sequentially or in a streaming fashion, online learning algorithms can learn predictors using incremental training algorithms such as stochastic gradient descent. In some security applications, such as training anomaly detectors, the data streams may consist of private information or transactions, and the output of the learning algorithms may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first is to use a randomized active learning heuristic to screen out uninformative data points in the stream. The second is to use mini-batching to improve classifier performance. Experimental results show how these two strategies can trade off privacy, label complexity, and generalization performance.
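
The two strategies in the abstract can be illustrated with a short sketch. It is a minimal, hypothetical example and not the paper's algorithm. It assumes logistic loss with labels in {-1, +1}, feature vectors scaled so that ||x|| <= 1, and the noisy mini-batch update of Song, Chaudhuri, and Sarwate's differentially private SGD, in which the averaged gradient is perturbed by Z/b with Z drawn from a density proportional to exp(-(ε/2)||Z||). The margin-based screening rule, its use of randomized response, and the names `rr_select`, `private_sgd_step`, and `train_stream` are illustrative assumptions. The sketch does not account for the privacy cost of the selection step.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_grad(w, x, y):
    # Gradient of log(1 + exp(-y * w.x)) for y in {-1, +1}.
    # Its L2 norm is at most ||x|| <= 1 (assumed feature scaling).
    coef = -y * sigmoid(-y * dot(w, x))
    return [coef * xi for xi in x]


def sample_dp_noise(dim, epsilon, rng):
    # Draw Z with density proportional to exp(-(epsilon / 2) * ||Z||):
    # the norm is Gamma(shape=dim, scale=2/epsilon) and the direction is
    # uniform on the sphere (a normalized Gaussian vector).
    norm = rng.gammavariate(dim, 2.0 / epsilon)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(v * v for v in direction))
    return [norm * v / length for v in direction]


def rr_select(w, x, threshold, eps_select, rng):
    # Hypothetical screening rule: a point is "informative" if it lies
    # within `threshold` of the decision boundary. The bit is released via
    # randomized response: kept with probability e^eps / (1 + e^eps).
    informative = abs(dot(w, x)) < threshold
    keep_prob = 1.0 / (1.0 + math.exp(-eps_select))
    return informative if rng.random() < keep_prob else not informative


def private_sgd_step(w, batch, lr, lam, epsilon, rng):
    # w <- w - lr * (lam * w + (1/b) * sum of gradients + Z / b)
    b = len(batch)
    grad = [lam * wi for wi in w]
    for x, y in batch:
        g = logistic_grad(w, x, y)
        grad = [gi + gj / b for gi, gj in zip(grad, g)]
    noise = sample_dp_noise(len(w), epsilon, rng)
    return [wi - lr * (gi + zi / b) for wi, gi, zi in zip(w, grad, noise)]


def train_stream(stream, dim, batch_size, lr, lam, epsilon,
                 threshold, eps_select, seed=0):
    # Single pass over the stream: each example lands in at most one batch.
    rng = random.Random(seed)
    w = [0.0] * dim
    batch, queried = [], 0
    for x, y in stream:
        if not rr_select(w, x, threshold, eps_select, rng):
            continue  # screened out: the label y is never requested
        queried += 1
        batch.append((x, y))
        if len(batch) == batch_size:
            w = private_sgd_step(w, batch, lr, lam, epsilon, rng)
            batch = []
    return w, queried  # a trailing partial batch is discarded
```

Because the noise enters as Z/b while the averaged gradient does not shrink with b, larger mini-batches lower the noise per update at the same ε, at the cost of fewer updates per pass over the stream. In this sketch, `threshold` controls how many points count as informative, and `eps_select` controls how often that decision is flipped at random; `queried` is the resulting label count.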

          Published in

          AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security
          October 2016
          144 pages
          ISBN: 9781450345736
          DOI: 10.1145/2996758

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          AISec '16 paper acceptance rate: 12 of 38 submissions (32%). Overall acceptance rate: 94 of 231 submissions (41%).
