research-article

Sketching in adversarial environments

Authors:
Ilya Mironov, Microsoft Research, Silicon Valley Campus, Mountain View, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Gil Segev, Weizmann Institute of Science, Rehovot, Israel

STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computing, May 2008, pages 651–660. https://doi.org/10.1145/1374376.1374471

Published: 17 May 2008

ABSTRACT

We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets.

The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation from the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs.
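To make the two phases concrete, the following Python sketch simulates one honest party in the first phase and a comparison in the second. It is a deliberately naive illustration under stated assumptions, not the paper's construction: the modulus `P` and the names `SumSketch`, `run_phase_one` and `naive_equality_test` are hypothetical. Each insert or delete operation is folded into a constant-size state and then discarded, as the model requires, but because the sketch is a fixed public function, an adversary who chooses the inputs in the first phase can pick two different sets with the same sketch.

```python
from dataclasses import dataclass

# Hypothetical public modulus; in the model, all public information is known to the adversary.
P = 2**61 - 1


@dataclass
class SumSketch:
    """Naive sketch (illustrative only): the sum of the current set's elements modulo P.

    Inserts add x and deletes subtract x, so the state stays constant-size
    and each operation can be discarded once it has been processed.
    """
    value: int = 0

    def insert(self, x: int) -> None:
        self.value = (self.value + x) % P

    def delete(self, x: int) -> None:
        self.value = (self.value - x) % P


def run_phase_one(ops):
    """First phase: one honest party processes ("insert" | "delete", x) operations on-line."""
    sketch = SumSketch()
    for op, x in ops:
        if op == "insert":
            sketch.insert(x)
        elif op == "delete":
            sketch.delete(x)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return sketch


def naive_equality_test(a: SumSketch, b: SumSketch) -> bool:
    """Second phase: the parties exchange sketches and declare equality if they match."""
    return a.value == b.value


# An adversary who knows the public sketch picks different sets with equal sums:
# {1, 4} and {2, 3} both sum to 5.
alice = run_phase_one([("insert", 1), ("insert", 4)])
bob = run_phase_one([("insert", 2), ("insert", 7), ("delete", 7), ("insert", 3)])
print(naive_equality_test(alice, bob))  # True, although the sets differ
```

The point of the example is the failure mode: since any public information the parties share is known to the adversary in advance, a sketch that is a fixed function of that information can be targeted in exactly this way.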

In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.
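The tension between incrementality and distance can be seen in a toy encoding. The Python sketch below is an illustrative assumption, not the paper's scheme: it maps each element to `d` cells of an `m`-cell counter vector through a hash (a hypothetical stand-in for an explicit expander), so every insert or delete updates only `d` counters, and the encoding depends only on the current set, not on the order of operations. The names `neighbors`, `IncrementalEncoding`, `differing_positions` and `sampled_equality_test`, and the parameters `m`, `d`, `t`, are hypothetical.

```python
import hashlib
import random
from typing import List


def neighbors(x: int, d: int, m: int) -> List[int]:
    """Hypothetical public map from element x to d cell indices in [0, m).

    A hash-based stand-in for an explicit unbalanced expander; it is NOT the
    explicit deterministic encoding constructed in the paper.
    """
    cells = []
    for i in range(d):
        digest = hashlib.sha256(f"{x}:{i}".encode()).digest()
        cells.append(int.from_bytes(digest[:8], "big") % m)
    return cells


class IncrementalEncoding:
    """Vector of m integer counters; each insert or delete touches only d cells."""

    def __init__(self, m: int, d: int) -> None:
        self.m = m
        self.d = d
        self.cells = [0] * m

    def _update(self, x: int, delta: int) -> None:
        for j in neighbors(x, self.d, self.m):
            self.cells[j] += delta

    def insert(self, x: int) -> None:
        self._update(x, +1)

    def delete(self, x: int) -> None:
        self._update(x, -1)


def differing_positions(a: IncrementalEncoding, b: IncrementalEncoding) -> int:
    """Hamming distance between two encodings (an analysis helper; the
    parties in the model would not exchange whole encodings)."""
    return sum(1 for u, v in zip(a.cells, b.cells) if u != v)


def sampled_equality_test(a: IncrementalEncoding, b: IncrementalEncoding,
                          t: int, seed: int) -> bool:
    """Second phase: compare t positions drawn after the first phase has ended.

    In a protocol the parties would send each other only the sampled counters;
    the fixed seed is here only for reproducibility.
    """
    rng = random.Random(seed)
    positions = [rng.randrange(a.m) for _ in range(t)]
    return all(a.cells[p] == b.cells[p] for p in positions)


m, d = 1024, 8
alice, bob = IncrementalEncoding(m, d), IncrementalEncoding(m, d)
for x in (10, 20, 30):
    alice.insert(x)
for x in (30, 99, 20, 10):
    bob.insert(x)
bob.delete(99)                          # same set {10, 20, 30}, different history
print(differing_positions(alice, bob))  # 0
bob.insert(42)                          # now the sets differ in one element
print(differing_positions(alice, bob))  # between 1 and d
```

Because each update touches at most `d` cells, two sets that differ in a single element yield encodings that differ in at most `d` of the `m` positions, so comparing `t` random positions detects the difference with probability at most `t·d/m`. This is the conflict the abstract refers to: cheap updates tend to imply low distance, and obtaining both properties with an explicit, deterministic encoding is the paper's stated contribution, which this toy does not attempt.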

Index Terms

Theory of computation

Published in
STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computing
May 2008, 712 pages
ISBN: 9781605580470
DOI: 10.1145/1374376
General Chair: Richard Ladner, University of Washington
Program Chair: Cynthia Dwork, Microsoft Research, Silicon Valley
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags: data stream model, massive data sets, sketch model
Acceptance Rates
STOC '08 paper acceptance rate: 80 of 325 submissions (25%). Overall acceptance rate: 1,469 of 4,586 submissions (32%).