[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/1325851.1325878dlproceedingsArticle/Chapter ViewAbstractPublication PagesvldbConference Proceedingsconference-collections
research-article

Fast nGram-based string search over data encoded using algebraic signatures

Published: 23 September 2007 Publication History

Abstract

We propose a novel string search algorithm for data stored once and read many times. Our search method combines the sublinear traversal of the record (as in Boyer Moore or Knuth-Morris-Pratt) with the agglomeration of parts of the record and search pattern into a single character -- the algebraic signature -- in the manner of Karp-Rabin. Our experiments show that our algorithm is up to seventy times faster for DNA data, up to eleven times faster for ASCII, and up to a six times faster for XML documents compared with an implementation of Boyer-Moore. To obtain this speed-up, we store records in encoded form, where each original character is replaced with an algebraic signature.
Our method applies to records stored in databases in general and to distributed implementations of a Database As Service (DAS) in particular. Clients send records for insertion and search patterns already in encoded form and servers never operate on records in clear text. No one at a node can involuntarily discover the content of the stored data.

References

[1]
M. Bawa, R. Bayardo, and R. Agrawal. Privacy Preserving Indexing of Documents on the Network. In Proc. Intl. Conf. on Very Large Databases (VLDB), 2003.
[2]
R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762--772, 1977.
[3]
Y.-C. Chang and M. Mitzenmacher. Privacy Preserving Keyword Searches on Remote Encrypted Data. In Proc. Applied Cryptography and Network Security Conference (ACNS), Springer LNCS, 3531, 2005.
[4]
M. Crochemore and M. Lecroq. Pattern Matching and Text Compression ALgorithms. CRC Press Inc, 2004.
[5]
M. Crochemore and W. Rytter. Text algorithms. Oxford University Press, 1994.
[6]
P. Golle, J. Staddon, and B. Waters. Public Key Encryption with Conjunctive Field Keyword Search. In Proc. Applied Cryptography and Network Security Conference (ACNS), Springer LNCS, 3531, 2005.
[7]
H. Hacigumus, B. Hore, B. Iyer, and S. Mehrotra. Search on Encrypted Data. Springer Verlag, 2007.
[8]
R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249--260, 1987.
[9]
M.-S. Kim, K.-Y. Whang, J.-G. Lee, and M.-J. Lee. n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 325--336, 2005.
[10]
W. Litwin, R. Mokadem, P. Rigaux, and T. Schwarz. Pattern Matching Using Cumulative Algebraic Signatures and n-gram Sampling. In Intl. Workshop on Grid and Peer-to-Peer Computing Impacts on Large Scale Heterogeneous Distributed Database Systems (GLOBE'06), 2006.
[11]
W. Litwin, R. Mokadem, and T. Schwarz. Pre-computed Algebraic Signatures for faster string search. Protection against incidental viewing and corruption of Data in an SDDS. In Intl. Workshop DBISP2P, 2005.
[12]
W. Litwin and T. Schwarz. Algebraic Signatures for Scalable Distributed Data Structures. In Proc. Intl. Conf. on Data Engineering (ICDE), 2004.
[13]
U. Manber and G. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935, 1993.
[14]
E. Miller, D. Shen, J. Liu, and C. Nicholas. Performance and Scalability of a Large-Scale N-gram Based Information Retrieval System. Journal of Digital Information, 2000.
[15]
T. Schwarz, P. Tsui, and W. Litwin. An Encrypted, Content Searchable Scalable Distributed Data Structure. In Proc. Int. Workshop on Security and Trust in Decentralized / Distributed Data Structures (STD3S), 2006.
[16]
D. Song, D. Wagner, and A. Perrig. Practical Techniques for Searches on Encrypted Data. In Proc. IEEE Security and Privacy Symposium, 2000.

Cited By

View all
  • (2014)Time-sensitive Personalized Query Auto-CompletionProceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management10.1145/2661829.2661921(1599-1608)Online publication date: 3-Nov-2014
  • (2012)A generic framework for efficient and effective subsequence retrievalProceedings of the VLDB Endowment10.14778/2350229.23502715:11(1579-1590)Online publication date: 1-Jul-2012
  • (2012)WHAMACM Transactions on Database Systems10.1145/2389241.238924737:4(1-39)Online publication date: 1-Dec-2012
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image DL Hosted proceedings
VLDB '07: Proceedings of the 33rd international conference on Very large data bases
September 2007
1443 pages
ISBN:9781595936493

Sponsors

  • Yahoo! Research
  • Google Inc.
  • SAP
  • Intel: Intel
  • Microsoft Research: Microsoft Research
  • ORACLE: ORACLE
  • Connex.cc
  • HP invent
  • WKO
  • IBM: IBM

Publisher

VLDB Endowment

Publication History

Published: 23 September 2007

Qualifiers

  • Research-article

Conference

VLDB '07
Sponsor:
  • Intel
  • Microsoft Research
  • ORACLE
  • IBM

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2014)Time-sensitive Personalized Query Auto-CompletionProceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management10.1145/2661829.2661921(1599-1608)Online publication date: 3-Nov-2014
  • (2012)A generic framework for efficient and effective subsequence retrievalProceedings of the VLDB Endowment10.14778/2350229.23502715:11(1579-1590)Online publication date: 1-Jul-2012
  • (2012)WHAMACM Transactions on Database Systems10.1145/2389241.238924737:4(1-39)Online publication date: 1-Dec-2012
  • (2011)WHAMProceedings of the 2011 ACM SIGMOD International Conference on Management of data10.1145/1989323.1989370(445-456)Online publication date: 12-Jun-2011
  • (2010)Performance improvement of join queries through algebraic signaturesInternational Journal of Intelligent Information and Database Systems10.1504/IJIIDS.2010.0340844:3(282-306)Online publication date: 1-Jul-2010
  • (2009)Reference-based alignment in large sequence databasesProceedings of the VLDB Endowment10.14778/1687627.16876512:1(205-216)Online publication date: 1-Aug-2009

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media