
Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7814)

Abstract

Search log k-anonymization is based on the elimination of infrequent queries under exact (or nearly exact) matching conditions, which usually results in substantial data loss and impaired utility. We present a more flexible, semantic approach to k-anonymity that consists of three steps: query concept mining, automatic query expansion, and affinity assessment of expanded queries. Based on the observation that many infrequent queries can be seen as refinements of a more general frequent query, we first model query concepts as probabilistically weighted n-grams and extract them from the search log data. Then, after expanding the original log queries with their weighted concepts, we find all the k-affine expanded queries under a given affinity threshold Θ, modeled as a generalized k-core of the graph of Θ-affine queries. Experimenting with the AOL data set, we show that this approach achieves levels of privacy comparable to those of plain k-anonymity while greatly reducing data loss.
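
The last step of the abstract (building the graph of Θ-affine queries and keeping its k-core) can be illustrated with a minimal Python sketch. Several things here are assumptions that the abstract does not specify: cosine similarity stands in for the affinity measure, plain vertex degree stands in for the vertex property of the generalized core, a query survives when it has at least k surviving neighbours, and the toy queries and weights are invented for illustration. It is not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors (dicts).

    Assumption: cosine is used only as a stand-in affinity measure.
    """
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affinity_graph(expanded, theta):
    """Link every pair of expanded queries whose affinity is at least theta."""
    graph = {q: set() for q in expanded}
    for q1, q2 in combinations(expanded, 2):
        if cosine(expanded[q1], expanded[q2]) >= theta:
            graph[q1].add(q2)
            graph[q2].add(q1)
    return graph


def k_core(graph, k):
    """Repeatedly drop vertices with fewer than k surviving neighbours.

    Assumption: degree is the vertex property; a generalized core may use
    a different property function.
    """
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(graph[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


# Hypothetical expanded queries: query text -> weighted concepts.
expanded = {
    "cheap flights rome": {"flights": 0.6, "rome": 0.4},
    "flights to rome": {"flights": 0.5, "rome": 0.5},
    "rome flights deals": {"flights": 0.7, "rome": 0.3},
    "rare disease clinic": {"disease": 0.8, "clinic": 0.2},
}
graph = affinity_graph(expanded, theta=0.8)
kept = k_core(graph, k=2)
```

In this toy log the three flight queries are mutually affine and survive, while the isolated query is dropped. Under exact matching all four would be eliminated as infrequent, which is the data loss the semantic approach aims to reduce.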

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carpineto, C., Romano, G. (2013). Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_10

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science, Computer Science (R0)
