Abstract
Search log k-anonymization typically works by eliminating infrequent queries under exact (or nearly exact) matching conditions, which usually causes substantial data loss and impairs utility. We present a more flexible, semantic approach to k-anonymity that consists of three steps: query concept mining, automatic query expansion, and affinity assessment of the expanded queries. Based on the observation that many infrequent queries can be seen as refinements of a more general frequent query, we first model query concepts as probabilistically weighted n-grams and extract them from the search log data. Then, after expanding the original log queries with their weighted concepts, we find all the k-affine expanded queries under a given affinity threshold Θ, modeled as a generalized k-core of the graph of Θ-affine queries. Experimenting with the AOL data set, we show that this approach achieves levels of privacy comparable to those of plain k-anonymity while substantially reducing data loss.
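The last step of the pipeline can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: each expanded query is represented as a dictionary mapping concepts to weights, affinity is measured by cosine similarity, and the ordinary degree-based k-core stands in for the generalized k-core used in the paper. The names `affinity`, `affinity_graph`, `k_core`, and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def affinity(a, b):
    """Cosine similarity of two concept->weight dicts (an assumed affinity measure)."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affinity_graph(expanded, theta):
    """Link every pair of expanded queries whose affinity is at least theta."""
    graph = {q: set() for q in expanded}
    for q1, q2 in combinations(expanded, 2):
        if affinity(expanded[q1], expanded[q2]) >= theta:
            graph[q1].add(q2)
            graph[q2].add(q1)
    return graph


def k_core(graph, k):
    """Return the queries left after repeatedly removing nodes with fewer than k neighbours."""
    core = {q: set(nbrs) for q, nbrs in graph.items()}
    queue = [q for q, nbrs in core.items() if len(nbrs) < k]
    while queue:
        q = queue.pop()
        if q not in core:  # already removed via an earlier queue entry
            continue
        for n in core.pop(q):
            core[n].discard(q)
            if len(core[n]) < k:
                queue.append(n)
    return set(core)


# Toy expanded queries (hypothetical concepts and weights).
expanded = {
    "cheap flights": {"flights": 1.0},
    "cheap flights rome": {"flights": 0.8, "rome": 0.6},
    "flights to rome": {"flights": 0.7, "rome": 0.7},
    "rare disease clinic": {"clinic": 1.0},
}

graph = affinity_graph(expanded, theta=0.6)
kept = k_core(graph, k=2)
print(sorted(kept))
```

In this toy log the three flight queries are mutually affine and survive for k = 2, while the isolated clinic query is suppressed. Under exact matching all four queries would be distinct and therefore infrequent, which is the kind of loss the semantic approach aims to avoid.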
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carpineto, C., Romano, G. (2013). Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_10
DOI: https://doi.org/10.1007/978-3-642-36973-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science, Computer Science (R0)