[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1374376.1374471acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article

Sketching in adversarial environments

Published: 17 May 2008 Publication History

Abstract

We formalize a realistic model for computations over massive data sets. The model, referred to as the {\em adversarial sketch model}, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets.
The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation from the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs.
In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.

References

[1]
J. Abello, P. M. Pardalos, and M. G. C. Resende, editors. Handbook of Massive Data Sets. Kluwer Academic Publishers, 2002.]]
[2]
P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Geometric approximation via core sets. Combinatorial and Computational Geometry - MSRI Publications, pages 1--30, 2005.]]
[3]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comp. Syst. Sci., 58(1):137--147, 1999.]]
[4]
L. Babai and P. G. Kimmel. Randomized simultaneous messages: Solution of a problem of Yao in communication complexity. In 12th CCC, pages 239--246, 1997.]]
[5]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In 21st PODS, pages 1--16, 2002.]]
[6]
M. Badoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In 34th STOC, pages 250--257, 2002.]]
[7]
Z. Bar-Yossef. The Complexity of Massive Data Set Computations. PhD thesis, UC Berkeley, 2002.]]
[8]
Z. Bar-Yossef, T. S. Jayram, R. Krauthgamer, and R. Kumar. Approximating edit distance efficiently. In 45th FOCS, pages 550--559, 2004.]]
[9]
M. Bellare, O. Goldreich, and S. Goldwasser. Incremental cryptography: The case of hashing and signing. In CRYPTO ’94, pages 216--233, 1994.]]
[10]
M. Bellare and D. Micciancio. A new paradigm for collision-free hashing: Incrementality at reduced cost. In EUROCRYPT ’97, pages 163--192, 1997.]]
[11]
M. Blum, W. S. Evans, P. Gemmell, S. Kannan, and M. Naor. Checking the correctness of memories. Algorithmica, 12(2/3):225--244, 1994.]]
[12]
D. Boneh and M. K. Franklin. An efficient public key traitor tracing scheme. In CRYPTO ’99, pages 338--353, 1999.]]
[13]
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. J. Comp. Syst. Sci., 60(3):630--659, 2000.]]
[14]
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157--1166, 1997.]]
[15]
E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. on Infor. Theory, 52(12):5406--5425, 2006.]]
[16]
M. Charikar. Similarity estimation techniques from rounding algorithms. In 34th STOC, pages 380--388, 2002.]]
[17]
G. Cormode and S. Muthukrishnan. Combinatorial algorithms for compressed sensing. In SIROCCO, pages 280--294, 2006.]]
[18]
D. L. Donoho. Compressed sensing. IEEE Trans. on Infor. Theory, 52(4):1289--1306, 2006.]]
[19]
T. Feder, E. Kushilevitz, M. Naor, and N. Nisan. Amortized communication complexity. SIAM J. Comput., 24(4):736--750, 1995.]]
[20]
J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. J. Strauss, and R. N. Wright. Secure multiparty computation of approximations. ACM Trans. on Alg., 2(3):435--472, 2006.]]
[21]
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate $L^1$-difference algorithm for massive data streams. SIAM J. Comput., 32(1):131--151, 2002.]]
[22]
R. Gennaro, S. Halevi, and T. Rabin. Secure hash-and-sign signatures without the random oracle. In EUROCRYPT ’99, pages 123--139, 1999.]]
[23]
P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. In 10th SODA, pages 909--910, 1999.]]
[24]
A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin. One sketch for all: fast algorithms for compressed sensing. In 39th STOC, pages 237--246, 2007.]]
[25]
O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. J. of the ACM, 45(4):653--750, 1998.]]
[26]
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In External memory algorithms, pages 107--118. American Mathematical Society, 1999.]]
[27]
P. Indyk. Explicit constructions for compressed sensing of sparse signals. In 19th SODA, pages 30--33, 2008.]]
[28]
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In 30th STOC, pages 604--613, 1998.]]
[29]
P. Indyk and D. P. Woodruff. Polylogarithmic private approximations and efficient matching. In 3rd TCC, pages 245--264, 2006.]]
[30]
I. Kremer, N. Nisan, and D. Ron. On randomized one-round communication complexity. Computational Complexity, 8(1):21--49, 1999.]]
[31]
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457--474, 2000.]]
[32]
T. Moran, M. Naor, and G. Segev. Deterministic history-independent strategies for storing information on write-once memories. In 34th ICALP, pages 303--315, 2007.]]
[33]
J. Naor and M. Naor. Small-bias probability spaces: Efficient constructions and applications. SIAM J. Comput., 22(4):838--856, 1993.]]
[34]
I. Newman and M. Szegedy. Public vs. private coin flips in one round communication games. In 28th STOC, pages 561--570, 1996.]]
[35]
N. Nisan and A. Ta-Shma. Extracting randomness: A survey and new constructions. J. Comp. Syst. Sci., 58(1):148--173, 1999.]]
[36]
R. Rubinfeld and M. Sudan. Robust characterizations of polynomials with applications to program testing. SIAM J. Comput., 25(2):252--271, 1996.]]
[37]
M. Sipser. Expanders, randomness, or time versus space. J. Comp. Syst. Sci., 36(3):379--383, 1988.]]
[38]
A. Ta-Shma, C. Umans, and D. Zuckerman. Lossless condensers, unbalanced expanders, and extractors. Combinatorica, 27(2):213--240, 2007.]]
[39]
H. S. Witsenhausen and A. D. Wyner. Interframe coder for video signals. U.S. patent 4,191,970, 1980.]]
[40]
A. C. Yao. Some complexity questions related to distributive computing. In 11th STOC, pages 209--213, 1979.]]

Cited By

View all
  • (2024)Unmasking vulnerabilitiesProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3692093(550-576)Online publication date: 21-Jul-2024
  • (2022)Nearly Optimal Property Preserving HashingAdvances in Cryptology – CRYPTO 202210.1007/978-3-031-15982-4_16(473-502)Online publication date: 12-Oct-2022
  • (2022)Property-Preserving Hash Functions for Hamming Distance from Standard AssumptionsAdvances in Cryptology – EUROCRYPT 202210.1007/978-3-031-07085-3_26(764-781)Online publication date: 30-May-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computing
May 2008
712 pages
ISBN:9781605580470
DOI:10.1145/1374376
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data stream model.
  2. massive data sets
  3. sketch model

Qualifiers

  • Research-article

Conference

STOC '08
Sponsor:
STOC '08: Symposium on Theory of Computing
May 17 - 20, 2008
British Columbia, Victoria, Canada

Acceptance Rates

STOC '08 Paper Acceptance Rate 80 of 325 submissions, 25%;
Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Upcoming Conference

STOC '25
57th Annual ACM Symposium on Theory of Computing (STOC 2025)
June 23 - 27, 2025
Prague , Czech Republic

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Unmasking vulnerabilitiesProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3692093(550-576)Online publication date: 21-Jul-2024
  • (2022)Nearly Optimal Property Preserving HashingAdvances in Cryptology – CRYPTO 202210.1007/978-3-031-15982-4_16(473-502)Online publication date: 12-Oct-2022
  • (2022)Property-Preserving Hash Functions for Hamming Distance from Standard AssumptionsAdvances in Cryptology – EUROCRYPT 202210.1007/978-3-031-07085-3_26(764-781)Online publication date: 30-May-2022
  • (2021)Robust Property-Preserving Hash Functions for Hamming Distance and MoreAdvances in Cryptology – EUROCRYPT 202110.1007/978-3-030-77883-5_11(311-337)Online publication date: 17-Oct-2021
  • (2015)Path-quality monitoring in the presence of adversariesIEEE/ACM Transactions on Networking10.1109/TNET.2014.233985323:6(1729-1741)Online publication date: 1-Dec-2015
  • (2013)How robust are linear sketches to adaptive inputs?Proceedings of the forty-fifth annual ACM symposium on Theory of Computing10.1145/2488608.2488624(121-130)Online publication date: 1-Jun-2013
  • (2008)Path-quality monitoring in the presence of adversariesACM SIGMETRICS Performance Evaluation Review10.1145/1384529.137548036:1(193-204)Online publication date: 2-Jun-2008
  • (2008)Path-quality monitoring in the presence of adversariesProceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems10.1145/1375457.1375480(193-204)Online publication date: 2-Jun-2008

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media