The best privacy defense is a good privacy offense: obfuscating a search engine user’s profile

Published in: Data Mining and Knowledge Discovery

Abstract

User privacy on the internet is an important and unsolved problem: so far, no sufficient and comprehensive solution has been proposed that helps users protect their privacy while data about them are collected and assembled by numerous service providers. Existing solutions focus on the providers' side, storing encrypted or transformed data that can still be used for analysis. This has a major flaw: it relies on the service providers themselves, and the user has no way of actively protecting his or her privacy. In this work, we suggest a new approach that empowers users to employ the same tool the other side uses, namely data mining, to produce data that obfuscate the user's profile. We apply this approach to search engine queries, using the search engine's feedback in the form of personalized advertisements in an algorithm similar to reinforcement learning to generate new queries that potentially confuse the search engine. We evaluated the approach on a real-world data set. Although evaluation is hard, our results indicate that it is possible to influence the profile the search engine builds of the user. This shows that it is feasible to defend a user's privacy from a new and more practical perspective.
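
To make the mechanism concrete, here is a minimal sketch of such an ad-feedback obfuscation loop, written as an epsilon-greedy, bandit-style learner. It is not the authors' implementation: the helper functions (sample_query, issue_query, ad_overlap) and all parameter values are illustrative assumptions.

```python
import random
from collections import defaultdict

EPSILON = 0.1  # exploration rate (assumed value)


def choose_category(q_values, categories):
    """Epsilon-greedy choice of the next interest category to fake."""
    if random.random() < EPSILON:
        return random.choice(categories)  # explore a random category
    return max(categories, key=lambda c: q_values[c])  # exploit the best one


def obfuscation_loop(categories, sample_query, issue_query, ad_overlap, steps=100):
    """Issue fake queries and adapt to the engine's ad feedback.

    sample_query(c)     -> a query string drawn from category c (assumed helper)
    issue_query(q)      -> ad categories returned for query q (assumed helper)
    ad_overlap(ads, c)  -> reward in [0, 1]: how strongly the ads reflect c
    """
    q_values = defaultdict(float)  # estimated value of faking each category
    counts = defaultdict(int)
    for _ in range(steps):
        c = choose_category(q_values, categories)
        ads = issue_query(sample_query(c))
        # High reward means the engine's ads now reflect the fake interest,
        # i.e., the obfuscating queries have influenced the profile.
        reward = ad_overlap(ads, c)
        counts[c] += 1
        # Incremental running-average update of the action value.
        q_values[c] += (reward - q_values[c]) / counts[c]
    return q_values
```

The point the sketch illustrates is the feedback loop described above: the personalized advertisements returned by the engine serve as the reward signal telling the obfuscator which fake interests the engine has actually adopted into the profile.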

Notes

  1. While AOL retracted the data, several pages still provide access to the data and keep analyzing it, e.g., see http://www.aolstalker.com/.

  2. This resembles the expected value of the distance between the user interest category \(\kappa _i\) and the search engine's assignment of the user to an interest category, with the difference that the categories do not exclude each other, so the probabilities do not sum to one (see the formula following these notes).

  3. In the terminology of Ceci and Malerba (2007), we are thus using a so-called proper training set, not a hierarchical training set. Another notable difference from standard hierarchical text categorization is that our training set consists of queries, not full documents.

  4. The implementation is available upon request.

  5. Detailed results and statistics on the results are given in the supplementary material.

  6. This could only happen if the same action with regard to the user interest category were chosen, which is not the case: that action is almost never selected, as it is almost never rated with high scores (details not shown here).
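
A hedged formal reading of note 2 (the symbols \(p_j\) and \(d\) are illustrative notation, not the paper's): writing \(p_j\) for the weight with which the search engine assigns the user to interest category \(j\), and \(d(\kappa_i, j)\) for the distance between the user interest category and category \(j\), the quantity resembles the expected distance

\[ \mathbb{E}\big[d(\kappa_i, \cdot)\big] \approx \sum_j p_j \, d(\kappa_i, j), \]

except that, because the categories do not exclude each other, \(\sum_j p_j\) is not required to equal 1.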

References

  • Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM, New York, pp 439–450

  • Aldeen YAAS, Salleh M, Razzaque MA (2015) A comprehensive review on privacy preserving data mining. SpringerPlus 4(1):694

  • Barreno M, Nelson B, Joseph AD, Tygar J (2010) The security of machine learning. Mach Learn 81(2):121–148

  • Barreno M, Nelson B, Sears R, Joseph AD, Tygar JD (2006) Can machine learning be secure? In: Proceedings of the 2006 ACM symposium on information, computer and communications security. ACM, New York, pp 16–25

  • Beato F, Conti M, Preneel B (2013) Friend in the Middle (FiM): tackling de-anonymization in social networks. In: IEEE international conference on pervasive computing and communications workshops (PERCOM Workshops), pp 279–284

  • Biggio B, Nelson B, Laskov P (2012) Poisoning attacks against support vector machines. In: Proceedings of the 29th international conference on machine learning (ICML-12), pp 1807–1814

  • Bilenko M, Richardson M (2011) Predictive client-side profiles for personalized advertising. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 413–421

  • Ceci M, Malerba D (2007) Classifying web documents in a hierarchy of categories: a comprehensive study. J Intell Inf Syst 28(1):37–78

  • Eckersley P (2010) How unique is your web browser? In: Atallah MJ, Hopper NJ (eds) Privacy enhancing technologies: 10th international symposium, PETS 2010, Berlin, Germany, July 21–23, 2010, proceedings. Springer, Berlin, pp 1–18

  • Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869

  • Gervais A, Shokri R, Singla A, Capkun S, Lenders V (2014) Quantifying web-search privacy. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, CCS ’14. ACM, New York, pp 966–977

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newslett 11(1):10–18

  • Howe DC, Nissenbaum H (2009) TrackMeNot: resisting surveillance in web search. In: Kerr I, Steeves V, Lucock C (eds) Lessons from the identity trail: anonymity, privacy, and identity in a networked society, vol 23. Oxford University Press, Oxford, pp 417–436

  • Huang L, Joseph AD, Nelson B, Rubinstein BI, Tygar J (2011) Adversarial machine learning. In: Proceedings of the 4th ACM workshop on security and artificial intelligence. ACM, New York, pp 43–58

  • Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Third IEEE international conference on data mining, pp 99–106

  • Klivans AR, Long PM, Servedio RA (2009) Learning halfspaces with malicious noise. J Mach Learn Res 10:2715–2740

  • Lowd D, Meek C (2005) Adversarial learning. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, pp 641–647

  • Nikiforakis N, Joosen W, Livshits B (2015) PriVaricator: deceiving fingerprinters with little white lies. In: Proceedings of the 24th international conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp 820–830

  • Nikiforakis N, Kapravelos A, Joosen W, Kruegel C, Piessens F, Vigna G (2013) Cookieless monster: exploring the ecosystem of web-based device fingerprinting. In: IEEE symposium on security and privacy (SP), pp 541–555

  • Pedreschi D, Bonchi F, Turini F, Verykios VS, Atzori M, Malin B, Moelans B, Saygin Y (2008) Privacy protection: regulations and technologies, opportunities and threats. In: Giannotti F, Pedreschi D (eds) Mobility, data mining and privacy: geographic knowledge discovery. Springer, Berlin, pp 101–119

  • Purcell K, Brenner J, Rainie L (2012) Search engine use 2012. Technical report, Pew Internet and American Life Project, Washington, DC

  • Rebollo-Monedero D, Forné J, Domingo-Ferrer J (2012) Query profile obfuscation by means of optimal query exchange between users. IEEE Trans Dependable Secure Comput 9(5):641–654

  • Sánchez D, Castellà-Roca J, Viejo A (2013) Knowledge-based scheme to create privacy-preserving but semantically-related queries for web search engines. Inf Sci 218:17–30

  • Skarkala ME, Maragoudakis M, Gritzalis S, Mitrou L, Toivonen H, Moen P (2012) Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of the 2012 international conference on advances in social networks analysis and mining (ASONAM 2012), ASONAM ’12. IEEE Computer Society, Washington, DC, pp 423–428

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge

  • Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. ACM SIGMOD Rec 33(1):50–57

  • Viejo A, Sánchez D (2014) Profiling social networks to provide useful and privacy-preserving web search. J Assoc Inf Sci Technol 65(12):2444–2458

  • Wiering M, van Otterlo M (eds) (2012) Reinforcement learning: state-of-the-art. Adaptation, learning, and optimization, vol 12. Springer, Berlin

  • Xu L, Jiang C, Wang J, Yuan J, Ren Y (2014) Information security in big data: privacy and data mining. IEEE Access 2:1149–1176

Acknowledgements

The authors thank Nicolas Krauter for his help with the initial implementation.

Author information

Corresponding author

Correspondence to Jörg Wicker.

Additional information

Responsible editors: Kurt Driessens, Dragi Kocev, Marko Robnik Šikonja, Myra Spiliopoulou

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3827 KB)

About this article

Cite this article

Wicker, J., Kramer, S. The best privacy defense is a good privacy offense: obfuscating a search engine user’s profile. Data Min Knowl Disc 31, 1419–1443 (2017). https://doi.org/10.1007/s10618-017-0524-z
