
Measuring Actual Privacy of Obfuscated Queries in Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2025)

Abstract

Privacy is a fundamental right that can be threatened by Information Retrieval (IR) models when they are trained on, or applied to, sensitive data and personal user information. Although mechanisms have been proposed to protect user privacy, their effectiveness is typically assessed by studying the relation between performance and the parameters of the mechanism, such as the privacy budget in Differential Privacy (DP). This often creates a disconnect between formal privacy and the privacy experienced by the user, the actual privacy. In this paper, we present the Query Inference for Privacy and Utility (QuIPU) framework, a novel evaluation paradigm that assesses actual privacy based on the risk that an “honest-but-curious” IR system can infer the original query from the obfuscated queries it receives. QuIPU represents the first attempt at measuring actual privacy for IR tasks beyond the comparison of formal privacy parameters. Our analysis shows that formal privacy parameters do not imply actual privacy: for the same privacy parameter values, two systems can provide not only different utility but also different actual privacy. A proper way of assessing this risk is therefore needed, and QuIPU is designed to provide it.
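To make the idea of an inference risk concrete, the following is a minimal sketch of how an “honest-but-curious” system might try to match obfuscated queries against a query log using embedding similarity. It is not the QuIPU implementation: the function names, the use of cosine similarity, the top-k success criterion, and the random toy embeddings are all assumptions made for illustration only.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def inference_success_rate(obf_emb, log_emb, true_idx, k=1):
    """Hypothetical risk score (an assumption, not the paper's metric).

    Returns the fraction of obfuscated queries whose original query is
    among the k log queries most similar to the obfuscated embedding.
    """
    sims = cosine_similarity_matrix(obf_emb, log_emb)
    # Indices of the k most similar log queries for each obfuscated query.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = (top_k == np.asarray(true_idx)[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data: random vectors stand in for real text embeddings.
rng = np.random.default_rng(0)
log_emb = rng.normal(size=(50, 16))      # embeddings of logged queries
true_idx = np.arange(10)                 # originals are the first 10 log entries

# "Weak" obfuscation: embeddings barely move away from the originals.
weak_obf = log_emb[true_idx] + 0.01 * rng.normal(size=(10, 16))
# "Strong" obfuscation: embeddings unrelated to the originals.
strong_obf = rng.normal(size=(10, 16))

print("weak obfuscation risk:  ", inference_success_rate(weak_obf, log_emb, true_idx))
print("strong obfuscation risk:", inference_success_rate(strong_obf, log_emb, true_idx))
```

Under these assumptions, a higher score means the original query is easier to recover. Two mechanisms run with the same formal parameter could, in principle, score differently on a measure of this kind.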


Notes

  1. Remark on the notation: \(\mathcal {T}\left( \mathcal {Q}_{\text {obf}}\right) \) and \(\mathcal {T}\left( \mathcal {Q}_{\text {logs}}\right) \) denote the sets of text embeddings, while \(\mathcal {T}(q'_i)\) and \(\mathcal {T}(q_i)\) denote the vector embeddings of the individual queries.

  2. https://github.com/Kekkodf/QuIPU_Framework.

  3. https://ir-datasets.com/aol-ia.html.

  4. The DP configurations with \(\varepsilon >1\) deviate from the “theoretically safe” privacy setup, i.e., the regime that gives strong assurance about the formal privacy introduced; see the DP definition [21].
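For context on the last note, the standard \(\varepsilon \)-DP guarantee is given here in its usual textbook form, not quoted from the paper. The symbols \(\mathcal {M}\) (the mechanism), \(D, D'\) (neighbouring inputs) and \(S\) (a set of outputs) are introduced only for this illustration:

\[
\Pr \left[ \mathcal {M}(D) \in S\right] \le e^{\varepsilon }\, \Pr \left[ \mathcal {M}(D') \in S\right] \quad \text {for all neighbouring } D, D' \text { and all } S .
\]

The bound loosens exponentially as \(\varepsilon \) grows: \(e^{1} \approx 2.72\), while \(e^{10} \approx 2.2 \times 10^{4}\). For large \(\varepsilon \) the formal guarantee therefore constrains an observer very little.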

References

  1. Ahmad, W.U., Chang, K., Wang, H.: Intent-aware query obfuscation for privacy protection in personalized web search. In: Collins-Thompson, K., Mei, Q., Davison, B.D., Liu, Y., Yilmaz, E. (eds.) The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, pp. 285–294. ACM (2018). https://doi.org/10.1145/3209978.3209983

  2. Anderson, C.: The long tail. Mann, Ivanov & Ferber, Effective Business Model on the Internet-Moscow (2012)


  3. Arampatzis, A., Drosatos, G., Efraimidis, P.: A versatile tool for privacy-enhanced web search. In: Serdyukov, P., et al. (eds.) Advances in Information Retrieval - 35th European Conference on IR Research, ECIR 2013, Moscow, Russia, March 24-27, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7814, pp. 368–379. Springer (2013). https://doi.org/10.1007/978-3-642-36973-5_31

  4. Bavadekar, S., et al.: Google COVID-19 search trends symptoms dataset: Anonymization process description (version 1.0). CoRR abs/2009.01265 (2020). https://arxiv.org/abs/2009.01265

  5. Blanco-Justicia, A., Sánchez, D., Domingo-Ferrer, J., Muralidhar, K.: A critical review on the use (and misuse) of differential privacy in machine learning. ACM Comput. Surv. 55(8), 160:1–160:16 (2023). https://doi.org/10.1145/3547139

  6. Bo, H., Ding, S.H.H., Fung, B.C.M., Iqbal, F.: ER-AE: differentially private text generation for authorship anonymization. In: Toutanova, K., et al. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, pp. 3997–4007. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.naacl-main.314

  7. Carvalho, R.S., Vasiloudis, T., Feyisetan, O., Wang, K.: TEM: high utility metric differential privacy on text. In: Shekhar, S., Zhou, Z., Chiang, Y., Stiglic, G. (eds.) Proceedings of the 2023 SIAM International Conference on Data Mining, SDM 2023, Minneapolis-St. Paul Twin Cities, MN, USA, April 27-29, 2023, pp. 883–890. SIAM (2023). https://doi.org/10.1137/1.9781611977653.CH99

  8. Chatzikokolakis, K., Andrés, M.E., Bordenabe, N.E., Palamidessi, C.: Broadening the scope of differential privacy using metrics. In: Cristofaro, E.D., Wright, M.K. (eds.) Privacy Enhancing Technologies - 13th International Symposium, PETS 2013, Bloomington, IN, USA, July 10-12, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7981, pp. 82–102. Springer (2013). https://doi.org/10.1007/978-3-642-39077-7_5

  9. Chau, M., Fang, X., Sheng, O.R.L.: Analysis of the query logs of a web site search engine. J. Assoc. Inf. Sci. Technol. 56(13), 1363–1376 (2005). https://doi.org/10.1002/ASI.20210

  10. Chen, S., et al.: A customized text sanitization mechanism with differential privacy. In: Rogers, A., Boyd-Graber, J., Okazaki, N. (eds.) Findings of the Association for Computational Linguistics: ACL 2023, pp. 5747–5758. Association for Computational Linguistics, Toronto, Canada (2023). https://doi.org/10.18653/v1/2023.findings-acl.355, https://aclanthology.org/2023.findings-acl.355

  11. Clauß, S., Schiffner, S.: Structuring anonymity metrics. In: Juels, A., Winslett, M., Goto, A. (eds.) Proceedings of the 2006 Workshop on Digital Identity Management, Alexandria, VA, USA, November 3, 2006, pp. 55–62. ACM (2006). https://doi.org/10.1145/1179529.1179539

  12. Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. In: Chan, C.Y., Lu, J., Nørvåg, K., Tanin, E. (eds.) Workshops Proceedings of the 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013, pp. 88–93. IEEE Computer Society (2013). https://doi.org/10.1109/ICDEW.2013.6547433

  13. Craswell, N., Mitra, B., Yilmaz, E., Campos, D.: Overview of the TREC 2020 deep learning track. CoRR abs/2102.07662 (2021). https://arxiv.org/abs/2102.07662

  14. Craswell, N., Mitra, B., Yilmaz, E., Campos, D., Voorhees, E.M.: Overview of the TREC 2019 deep learning track. CoRR abs/2003.07820 (2020). https://arxiv.org/abs/2003.07820

  15. Damie, M., Hahn, F., Peter, A.: A highly accurate query-recovery attack against searchable encryption using non-indexed documents. In: Bailey, M.D., Greenstadt, R. (eds.) 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, pp. 143–160. USENIX Association (2021). https://www.usenix.org/conference/usenixsecurity21/presentation/damie

  16. De Faveri, F.L., Faggioli, G., Ferro, N.: py-PANTERA: a Python PAckage for Natural language obfuscaTion Enforcing pRivacy & Anonymization. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24), October 21-25, 2024, Boise, ID, USA. p. 6. ACM (2024). https://doi.org/10.1145/3627673.3679173

  17. De Faveri, F.L., Faggioli, G., Ferro, N.: Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy. CoRR abs/2405.09306 (2024). https://doi.org/10.48550/ARXIV.2405.09306

  18. Domingo-Ferrer, J., Sánchez, D., Blanco-Justicia, A.: The limits of differential privacy (and its misuse in data release and machine learning). Commun. ACM 64(7), 33–35 (2021). https://doi.org/10.1145/3433638

  19. Duncan, G., Keller-McNulty, S., Stokes, L.: Disclosure risk vs. data utility: the RU confidentiality map. A Los Alamos National Laboratory Technical Report LA-UR-01-6428, 1–30 (2001)


  20. Dwork, C., Kohli, N., Mulligan, D.K.: Differential privacy in practice: expose your epsilons! J. Priv. Confidentiality 9(2) (2019). https://doi.org/10.29012/JPC.689

  21. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) Theory of Cryptography, pp. 265–284. Springer, Berlin, Heidelberg (2006)


  22. Faggioli, G., Ferro, N.: Query obfuscation for information retrieval through differential privacy. In: Goharian, N., Tonellotto, N., He, Y., Lipani, A., McDonald, G., Macdonald, C., Ounis, I. (eds.) Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part I. Lecture Notes in Computer Science, vol. 14608, pp. 278–294. Springer (2024). https://doi.org/10.1007/978-3-031-56027-9_17

  23. De Faveri, F.L., Faggioli, G., Ferro, N.: Beyond the parameters: Measuring actual privacy in obfuscated texts. In: Roitero, K., Viviani, M., Maddalena, E., Mizzaro, S. (eds.) Proceedings of the 14th Italian Information Retrieval Workshop, Udine, Italy, September 5-6, 2024. CEUR Workshop Proceedings, vol. 3802, pp. 53–57. CEUR-WS.org (2024). https://ceur-ws.org/Vol-3802/paper5.pdf

  24. Feyisetan, O., Balle, B., Drake, T., Diethe, T.: Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations. In: Caverlee, J., Hu, X.B., Lalmas, M., Wang, W. (eds.) Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 178–186. ACM (2020). https://doi.org/10.1145/3336191.3371856

  25. Feyisetan, O., Kasiviswanathan, S.: Private release of text embedding vectors. In: Pruksachatkun, Y., et al. (eds.) Proceedings of the First Workshop on Trustworthy Natural Language Processing, pp. 15–27. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.trustnlp-1.3, https://aclanthology.org/2021.trustnlp-1.3

  26. Fröbe, M., Schmidt, E.O., Hagen, M.: Efficient query obfuscation with keyqueries. In: He, J., et al. (eds.) WI-IAT ’21: IEEE/WIC/ACM International Conference on Web Intelligence, Melbourne VIC Australia, December 14–17, 2021, pp. 154–161. ACM (2021). https://doi.org/10.1145/3486622.3493950

  27. Habernal, I.: When differential privacy meets NLP: the devil is in the detail. In: Moens, M., Huang, X., Specia, L., Yih, S.W. (eds.) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pp. 1522–1528. Association for Computational Linguistics (2021). https://doi.org/10.18653/V1/2021.EMNLP-MAIN.114

  28. Hsu, J., et al.: Differential privacy: an economic method for choosing epsilon. In: IEEE 27th Computer Security Foundations Symposium, CSF 2014, Vienna, Austria, 19-22 July, 2014, pp. 398–410. IEEE Computer Society (2014). https://doi.org/10.1109/CSF.2014.35

  29. Izacard, G., et al.: Unsupervised dense information retrieval with contrastive learning. Trans. Mach. Learn. Res. 2022 (2022). https://openreview.net/forum?id=jKN1pXi7b0

  30. Jansen, B.J., Spink, A., Saracevic, T.: Real life, real users, and real needs: a study and analysis of user queries on the web. Inf. Process. Manag. 36(2), 207–227 (2000). https://doi.org/10.1016/S0306-4573(99)00056-4

  31. Kang, Y., Liu, Y., Niu, B., Tong, X., Zhang, L., Wang, W.: Input perturbation: a new paradigm between central and local differential privacy. CoRR abs/2002.08570 (2020). https://arxiv.org/abs/2002.08570

  32. Klymenko, O., Meisenbacher, S., Matthes, F.: Differential privacy in natural language processing the story so far. In: Feyisetan, O., Ghanavati, S., Thaine, P., Habernal, I., Mireshghallah, F. (eds.) Proceedings of the Fourth Workshop on Privacy in Natural Language Processing, pp. 1–11. Association for Computational Linguistics, Seattle, United States (2022). https://doi.org/10.18653/v1/2022.privatenlp-1.1, https://aclanthology.org/2022.privatenlp-1.1

  33. Kohli, N., Laskowski, P.: Epsilon voting: mechanism design for parameter selection in differential privacy. In: 2018 IEEE Symposium on Privacy-Aware Computing, PAC 2018, Washington, DC, USA, September 26-28, 2018, pp. 19–30. IEEE (2018). https://doi.org/10.1109/PAC.2018.00009

  34. Lee, J., Clifton, C.: How much is enough? Choosing \(\epsilon \) for differential privacy. In: Lai, X., Zhou, J., Li, H. (eds.) Information Security, 14th International Conference, ISC 2011, Xi’an, China, October 26-29, 2011. Proceedings. Lecture Notes in Computer Science, vol. 7001, pp. 325–340. Springer (2011). https://doi.org/10.1007/978-3-642-24861-0_22

  35. Mattern, J., Weggenmann, B., Kerschbaum, F.: The limits of word level differential privacy. In: Carpuat, M., de Marneffe, M., Ruíz, I.V.M. (eds.) Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, United States, July 10-15, 2022, pp. 867–881. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.findings-naacl.65

  36. Meisenbacher, S.J., Nandakumar, N., Klymenko, A., Matthes, F.: A comparative analysis of word-level metric differential privacy: benchmarking the privacy-utility trade-off. In: Calzolari, N., Kan, M., Hoste, V., Lenci, A., Sakti, S., Xue, N. (eds.) Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy, pp. 174–185. ELRA and ICCL (2024). https://aclanthology.org/2024.lrec-main.16

  37. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995). https://doi.org/10.1145/219717.219748

  38. Moffat, A., Zobel, J.: Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. 27(1), 2:1–2:27 (2008). https://doi.org/10.1145/1416950.1416952

  39. Moore, T., Clayton, R.: Evil searching: compromise and recompromise of internet hosts for phishing. In: Dingledine, R., Golle, P. (eds.) Financial Cryptography and Data Security, 13th International Conference, FC 2009, Accra Beach, Barbados, February 23-26, 2009. Revised Selected Papers. Lecture Notes in Computer Science, vol. 5628, pp. 256–272. Springer (2009). https://doi.org/10.1007/978-3-642-03549-4_16

  40. National Institute of Standards and Technology: Information security. Tech. Rep. National Institute of Standards and Technology Special Publication 800-60, Volume 1 Revision 1, August, 2008, U.S. Department of Commerce, Washington, D.C. (2008). https://doi.org/10.6028/NIST.SP.800-60v1r1

  41. Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., Deng, L.: MS MARCO: a human generated machine reading comprehension dataset. In: Besold, T.R., Bordes, A., d’Avila Garcez, A.S., Wayne, G. (eds.) Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 9, 2016. CEUR Workshop Proceedings, vol. 1773. CEUR-WS.org (2016). https://ceur-ws.org/Vol-1773/CoCoNIPS_2016_paper9.pdf

  42. Rao, R.S., Pais, A.R.: Jail-Phish: an improved search engine based phishing detection system. Comput. Secur. 83, 246–267 (2019). https://doi.org/10.1016/J.COSE.2019.02.011

  43. Rényi, A.: On measures of entropy and information. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics, vol. 4, pp. 547–562. University of California Press (1961)


  44. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108 (2019). http://arxiv.org/abs/1910.01108

  45. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pp. 3–18. IEEE Computer Society (2017). https://doi.org/10.1109/SP.2017.41

  46. Silvestri, F.: Mining query logs: turning search usage data into knowledge. Found. Trends Inf. Retr. 4(1-2), 1–174 (2010). https://doi.org/10.1561/1500000013

  47. Sousa, S., Kern, R.: How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing. Artif. Intell. Rev. 56(2), 1427–1492 (2023). https://doi.org/10.1007/S10462-022-10204-6

  48. Truex, S., Liu, L., Gursoy, M.E., Wei, W., Yu, L.: Effects of differential privacy and data skewness on membership inference vulnerability. In: First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019, Los Angeles, CA, USA, December 12-14, 2019, pp. 82–91. IEEE (2019). https://doi.org/10.1109/TPS-ISA48467.2019.00019

  49. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp. 5998–6008 (2017)


  50. Voorhees, E.M.: Overview of the TREC 2004 robust track. In: Voorhees, E.M., Buckland, L.P. (eds.) Proceedings of the Thirteenth Text REtrieval Conference, TREC 2004, Gaithersburg, Maryland, USA, November 16-19, 2004. NIST Special Publication, vol. 500-261. National Institute of Standards and Technology (NIST) (2004). http://trec.nist.gov/pubs/trec13/papers/ROBUST.OVERVIEW.pdf

  51. Wagner, I., Eckhoff, D.: Technical privacy metrics: a systematic survey. ACM Comput. Surv. 51(3), 57:1–57:38 (2018). https://doi.org/10.1145/3168389

  52. Xu, Z., Aggarwal, A., Feyisetan, O., Teissier, N.: A differentially private text perturbation method using regularized mahalanobis metric. In: Proceedings of the Second Workshop on Privacy in NLP. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.privatenlp-1.2

  53. Xu, Z., Aggarwal, A., Feyisetan, O., Teissier, N.: On a utilitarian approach to privacy preserving text generation. CoRR abs/2104.11838 (2021). https://doi.org/10.48550/ARXIV.2104.11838

  54. Yue, X., Du, M., Wang, T., Li, Y., Sun, H., Chow, S.S.M.: Differential privacy for text analytics via natural text sanitization. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 3853–3866. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.findings-acl.337, https://aclanthology.org/2021.findings-acl.337

  55. Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: Bertscore: evaluating text generation with BERT. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net (2020). https://openreview.net/forum?id=SkeHuCVFDr

  56. Zhao, Y., Chen, J.: A survey on differential privacy for unstructured data content. ACM Comput. Surv. 54(10s), 207:1–207:28 (2022). https://doi.org/10.1145/3490237

  57. Zimmerman, S., Thorpe, A., Fox, C., Kruschwitz, U.: Investigating the interplay between searchers’ privacy concerns and their search behavior. In: Piwowarski, B., Chevalier, M., Gaussier, É., Maarek, Y., Nie, J., Scholer, F. (eds.) Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, pp. 953–956. ACM (2019). https://doi.org/10.1145/3331184.3331280

Author information

Corresponding author

Correspondence to Francesco Luigi De Faveri.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

De Faveri, F.L., Faggioli, G., Ferro, N. (2025). Measuring Actual Privacy of Obfuscated Queries in Information Retrieval. In: Hauff, C., et al. Advances in Information Retrieval. ECIR 2025. Lecture Notes in Computer Science, vol 15572. Springer, Cham. https://doi.org/10.1007/978-3-031-88708-6_4

  • DOI: https://doi.org/10.1007/978-3-031-88708-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-88707-9

  • Online ISBN: 978-3-031-88708-6

  • eBook Packages: Computer Science, Computer Science (R0)
