DOI: 10.1145/1374376.1374471
Sketching in adversarial environments

Published: 17 May 2008

ABSTRACT

We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets.

The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation from the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs.
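
The two phases can be pictured with a minimal Python sketch. It is only an illustration of the model's interface, not the protocol constructed in the paper: the class `HonestParty`, the helpers `run_first_phase` and `second_phase_equal`, the modulus `P`, and the evaluation point `R` are all hypothetical names chosen here. The sketch used is the classical polynomial fingerprint, which is assumed to be fed a valid set stream (an element is inserted only when absent and deleted only when present). Because `R` is public, an adversary who knows it in advance may be able to choose two different sets with the same fingerprint, which is exactly the kind of attack the adversarial sketch model is meant to capture.

```python
P = (1 << 61) - 1  # Mersenne prime modulus (an illustrative choice, not from the paper)


class HonestParty:
    """Sees each insert/delete once and keeps only a small running sketch."""

    def __init__(self, r):
        self.r = r        # public evaluation point (assumed known to the adversary)
        self.sketch = 0   # running value of sum over x in S of r**x mod P

    def process(self, op, x):
        # The operation is used once to update the sketch and is then discarded.
        term = pow(self.r, x, P)
        if op == "insert":
            self.sketch = (self.sketch + term) % P
        elif op == "delete":
            self.sketch = (self.sketch - term) % P
        else:
            raise ValueError(f"unknown operation: {op}")


def run_first_phase(party, operations):
    """First phase: the (adversarially chosen) operations arrive on-line."""
    for op, x in operations:
        party.process(op, x)


def second_phase_equal(alice, bob):
    """Second phase: the parties exchange sketches and compare them."""
    return alice.sketch == bob.sketch


R = 3  # hypothetical public parameter shared by both parties
alice, bob = HonestParty(R), HonestParty(R)

# Different operation sequences that both end with the set {1, 2, 3}.
run_first_phase(alice, [("insert", 1), ("insert", 5), ("insert", 2),
                        ("delete", 5), ("insert", 3)])
run_first_phase(bob, [("insert", 3), ("insert", 2), ("insert", 1)])

print(second_phase_equal(alice, bob))  # prints True
```

The sketch depends only on the final set, not on the order of operations, and each update touches a single number. What it lacks is any guarantee against inputs chosen with knowledge of `R`.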

In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.

          Published in

          STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computing
          May 2008
          712 pages
          ISBN: 9781605580470
          DOI: 10.1145/1374376

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article

          Acceptance Rates

          STOC '08 Paper Acceptance Rate: 80 of 325 submissions, 25%
          Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%
