research-article

Differentially Private Online Active Learning with Applications to Anomaly Detection

Authors:
Mohsen Ghassemi (Rutgers, The State University of New Jersey, Piscataway, NJ, USA)
Anand D. Sarwate (Rutgers, The State University of New Jersey, Piscataway, NJ, USA)
Rebecca N. Wright (Rutgers, The State University of New Jersey, Piscataway, NJ, USA)

AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, October 2016, pages 117–128
https://doi.org/10.1145/2996758.2996766

Published: 28 October 2016

ABSTRACT

In settings where data instances are generated sequentially or in a streaming fashion, online learning algorithms can learn predictors incrementally, using training algorithms such as stochastic gradient descent (SGD). In some security applications, such as training anomaly detectors, the data streams may contain private information or transactions, and the output of the learning algorithm may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first uses a randomized active learning heuristic to screen out uninformative data points in the stream. The second uses mini-batching to improve classifier performance. Experimental results show how these two strategies trade off privacy, label complexity, and generalization performance.
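The abstract combines three ingredients: a single pass of SGD over the stream, a randomized rule that decides which points are worth labeling, and noisy mini-batch updates. The Python sketch below shows one way these could fit together under stated assumptions; it is not the authors' algorithm. The logistic loss, the margin threshold that defines an "informative" point, the randomized-response query rule, and all names (`private_noise`, `query_label`, `logistic_grad`, `private_online_active_sgd`, `epsilon_g`, `epsilon_q`, `threshold`, `lr`) are illustrative assumptions. The gradient noise has density proportional to exp(-(ε/Δ)‖z‖₂), which makes the release of a vector with L2 sensitivity Δ ε-differentially private. Averaging gradients over a batch of size b shrinks Δ from 2 to 2/b, which is one way to see why mini-batching can reduce the noise added per update.

```python
import numpy as np


def private_noise(d, sensitivity, epsilon, rng):
    """Sample z in R^d with density proportional to exp(-(epsilon / sensitivity) * ||z||_2).

    Adding z to a vector whose L2 sensitivity is `sensitivity` is
    epsilon-differentially private. ||z|| ~ Gamma(d, scale=sensitivity / epsilon),
    and the direction of z is uniform on the unit sphere.
    """
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=sensitivity / epsilon)
    return radius * direction


def query_label(w, x, threshold, epsilon_q, rng):
    """Hypothetical randomized active-learning filter (an assumption, not the paper's rule).

    The private bit is "x lies within `threshold` of the decision boundary".
    Randomized response reports it truthfully with probability
    e^eps / (1 + e^eps), which is epsilon_q-differentially private for that bit.
    """
    informative = abs(np.dot(w, x)) < threshold
    p_truth = 1.0 / (1.0 + np.exp(-epsilon_q))  # equals e^eps / (1 + e^eps), overflow-safe
    return bool(informative) if rng.random() < p_truth else not informative


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y * w.x)) for y in {-1, +1}; its norm is at most ||x||."""
    margin = np.clip(y * np.dot(w, x), -500.0, 500.0)  # avoid overflow in exp
    return -y * x / (1.0 + np.exp(margin))


def private_online_active_sgd(stream, d, batch_size, epsilon_g, epsilon_q,
                              threshold=0.5, lr=0.1, seed=0):
    """Single pass over (x, y) pairs; returns the final weights and the number of labels queried."""
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    batch, queried = [], 0
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        x = x / max(1.0, np.linalg.norm(x))  # enforce ||x|| <= 1 so each gradient has norm <= 1
        if not query_label(w, x, threshold, epsilon_q, rng):
            continue  # screened out: the label y is never used
        queried += 1
        batch.append((x, y))
        if len(batch) == batch_size:
            g = np.mean([logistic_grad(w, xb, yb) for xb, yb in batch], axis=0)
            # Replacing one example moves the batch average by at most 2 / batch_size in L2.
            g = g + private_noise(d, 2.0 / batch_size, epsilon_g, rng)
            w = w - lr * g
            batch = []
    # Any partial final batch is discarded in this sketch.
    return w, queried
```

Assuming each stream element is seen once and lands in at most one batch, basic composition suggests roughly (epsilon_q + epsilon_g)-differential privacy per element; treat this accounting as an assumption rather than a result from the paper. The returned count `queried` is the number of labels requested, a rough proxy for label complexity.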

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Active learning settings
      2. Online learning settings
2. Security and privacy
  1. Formal methods and theory of security
    1. Formal security models

Published in
AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security
October 2016
144 pages
ISBN: 9781450345736
DOI: 10.1145/2996758
Program Chairs: David Mandell Freeman (LinkedIn Corporation, USA), Aikaterini Mitrokotsa (Chalmers University of Technology, Sweden), Arunesh Sinha (University of Michigan, USA)
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 October 2016
Author Tags
active learning
anomaly detection
differential privacy
online learning
stochastic gradient descent

Acceptance Rates
AISec '16 paper acceptance rate: 12 of 38 submissions, 32%
Overall acceptance rate: 94 of 231 submissions, 41%
Article Metrics
- Total citations: 10
- Total downloads: 731
- Downloads (last 12 months): 50
- Downloads (last 6 weeks): 4