ABSTRACT
We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets.
The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation from the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs.
In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.
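The two-phase structure described above can be made concrete with a small sketch. The following Python is only an illustration of the interface (a stream of insert and delete operations processed one at a time, no communication during the first phase, then a comparison in the second phase). It is not the encoding scheme of the paper: it uses a simple polynomial fingerprint, and the modulus `P`, the public point `R`, and the names `ToySketch`, `run_phase_one`, and `sketches_match` are all hypothetical choices made for this example. Because `R` is public and fixed in advance, this toy offers no guarantee against an adversary who picks the inputs after seeing it, which is exactly the difficulty the adversarial sketch model is meant to capture.

```python
# Illustrative only: a toy incremental fingerprint, NOT the paper's construction.
# P and R are hypothetical public parameters, assumed known to the adversary.
P = (1 << 61) - 1   # a Mersenne prime used as the modulus
R = 1_000_003       # public evaluation point


class ToySketch:
    """Constant-size summary of a set, updatable under insert and delete."""

    def __init__(self) -> None:
        self.value = 0

    def insert(self, x: int) -> None:
        # Adding element x contributes R^x mod P to the fingerprint.
        self.value = (self.value + pow(R, x, P)) % P

    def delete(self, x: int) -> None:
        # Deleting x removes the same contribution, so updates are incremental.
        self.value = (self.value - pow(R, x, P)) % P


def run_phase_one(ops):
    """First phase: process each operation once, keeping only the sketch."""
    sketch = ToySketch()
    for op, x in ops:
        if op == "insert":
            sketch.insert(x)
        elif op == "delete":
            sketch.delete(x)
        else:
            raise ValueError(f"unknown operation: {op}")
    return sketch


def sketches_match(a: ToySketch, b: ToySketch) -> bool:
    """Second phase: the parties exchange sketches and compare them."""
    return a.value == b.value


# Two parties receive different operation sequences describing the same set {5, 12}.
alice = run_phase_one([("insert", 5), ("insert", 9), ("insert", 12), ("delete", 9)])
bob = run_phase_one([("insert", 12), ("insert", 5)])
# A third party ends up with a different set, {5, 13}.
carol = run_phase_one([("insert", 5), ("insert", 13)])
```

Here `sketches_match(alice, bob)` holds because the fingerprint depends only on the final set, not on the order or history of operations, while `sketches_match(alice, carol)` does not. The point of the example is the shape of the computation, not its security.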