Privacy-preserving indexing of documents on the network

The VLDB Journal

Abstract

With the ubiquitous collection of data and the creation of large distributed repositories, enabling search over this data while respecting access control is critical. A related problem is ensuring the privacy of content owners while still maintaining an efficient index of distributed content. We address the problem of providing privacy-preserving search over distributed access-controlled content. Indexed documents can be easily reconstructed from the conventional (inverted) indexes used in search. Currently, avoiding breaches of access control through the index requires the index hosting site to be fully secured and trusted by all participating content providers. This level of trust is impractical in the increasingly common case where multiple competing organizations or individuals wish to selectively share content. We propose a solution that eliminates the need for such a trusted authority. The solution builds a centralized privacy-preserving index in conjunction with a distributed access-control-enforcing search protocol. Two alternative methods to build the centralized index are proposed, allowing trade-offs between efficiency and security. The new index provides strong and quantifiable privacy guarantees that hold even if the entire index is made public. Experiments on a real-life dataset validate the performance of the scheme. The appeal of our solution is twofold: (a) content providers maintain complete control in defining access groups and ensuring compliance with them, and (b) system implementors retain tunable knobs to balance privacy and efficiency concerns for their particular domains.
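The abstract does not spell out how the index is constructed, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the article. It assumes that providers are partitioned into groups, that the public index maps each term to groups rather than to individual providers or documents, and that each provider checks access rights locally when queried. All names (`PROVIDERS`, `GROUPS`, `build_group_index`, `provider_search`, `search`) and the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: provider -> {doc_id: (text, users allowed to read it)}
PROVIDERS = {
    "p1": {"d1": ("merger plan draft", {"alice"})},
    "p2": {"d2": ("cafeteria menu", {"alice", "bob"})},
    "p3": {"d3": ("merger rumor memo", {"bob"})},
    "p4": {"d4": ("holiday schedule", {"alice", "bob"})},
}

# Assumed partition of providers into groups (an illustration, not the paper's method)
GROUPS = {"g1": ["p1", "p2"], "g2": ["p3", "p4"]}


def build_group_index(providers, groups):
    """Map each term to the groups in which at least one provider holds it.

    The index never names a provider or a document, so publishing it only
    reveals that some member of a group has content containing the term.
    """
    index = defaultdict(set)
    for group_id, members in groups.items():
        for provider_id in members:
            for text, _allowed in providers[provider_id].values():
                for term in text.split():
                    index[term].add(group_id)
    return dict(index)


def provider_search(provider_id, term, user, providers):
    """Runs at the provider: return only matching documents the user may read."""
    return sorted(
        doc_id
        for doc_id, (text, allowed) in providers[provider_id].items()
        if term in text.split() and user in allowed
    )


def search(term, user, index, providers, groups):
    """Contact every provider in each candidate group; providers filter locally."""
    results = {}
    for group_id in sorted(index.get(term, ())):
        for provider_id in groups[group_id]:
            hits = provider_search(provider_id, term, user, providers)
            if hits:
                results[provider_id] = hits
    return results


INDEX = build_group_index(PROVIDERS, GROUPS)
```

In this sketch, a query for "merger" by `alice` contacts all four providers but returns only `d1`, because `p3` withholds `d3` from her. The coarser the groups, the less the public index reveals and the more providers each query must contact, which is one plausible reading of the privacy and efficiency trade-off the abstract mentions.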

Author information

Corresponding author

Correspondence to Jaideep Vaidya.

Additional information

Dr. Vaidya’s work was supported by the National Science Foundation under grant CNS-0746943 and by a research resources grant from Rutgers Business School, Newark and New Brunswick.

About this article

Cite this article

Bawa, M., Bayardo, R.J., Agrawal, R. et al. Privacy-preserving indexing of documents on the network. The VLDB Journal 18, 837–856 (2009). https://doi.org/10.1007/s00778-008-0129-7

  • DOI: https://doi.org/10.1007/s00778-008-0129-7
