[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3477132.3483586acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open access

Coeus: A System for Oblivious Document Ranking and Retrieval

Published: 26 October 2021 Publication History

Abstract

Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely-used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds---a 24x improvement over a baseline system.

References

[1]
gensim - topic modelling in python. https://github.com/RaRe-Technologies/gensim/.
[2]
SealPIR: A computational PIR library that achieves low communication costs and high performance. https://github.com/microsoft/SealPIR. Microsoft Research, Redmond, WA.
[3]
Wikimedia downloads: enwiki dump progress on 20210201. https://dumps.wikimedia.org/enwiki/20210201/enwiki-20210201-pages-articles-multistream.xml.bz2/.
[4]
Wikipedia:short description. https://en.wikipedia.org/wiki/Wikipedia:Short_description#Formatting/.
[5]
Wikipedia:wikipedia_records. https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_records#Title_length/.
[6]
S. Agrawal, S. Badrinarayanan, P. Mukherjee, and P. Rindal. Game-Set-MATCH: Using mobile devices for seamless external-facing biometric matching. In ACM Conference on Computer and Communications Security (CCS), pages 1351--1370, 2020.
[7]
C. Aguilar-Melchor, J. Barrier, L. Fousse, and M.-O. Killijian. XPIR: Private Information Retrieval for Everyone. In Privacy Enhancing Technologies Symposium (PETS), 2016.
[8]
D. Agun, J. Shao, S. Ji, S. Tessaro, and T. Yang. Privacy and efficiency tradeoffs for multiword top k search with linear additive rank scoring. In International World Wide Web Conference (WWW), pages 1725--1734, 2018.
[9]
I. Ahmad, Y. Yang, D. Agrawal, A. El Abbadi, and T. Gupta. Addra: Metadata-private voice communication over fully untrusted infrastructure. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 313--329, 2021.
[10]
M. Albrecht, M. Chase, H. Chen, J. Ding, S. Goldwasser, S. Gorbunov, S. Halevi, J. Hoffstein, K. Laine, K. Lauter, S. Lokam, D. Micciancio, D. Moody, T. Morrison, A. Sahai, and V. Vaikuntanathan. Homomorphic encryption security standard. Technical report, HomomorphicEncryption.org, November 2018.
[11]
A. Amirbekyan and V. Estivill-Castro. A new efficient privacy-preserving scalar product protocol. In The Australasian Data Mining Conference (AusDM), 2007.
[12]
S. Angel, H. Chen, K. Laine, and S. Setty. PIR with compressed queries and amortized query processing. In IEEE Symposium on Security and Privacy (S&P), pages 962--979, 2018.
[13]
S. Angel and S. Setty. Unobservable communication over fully untrusted infrastructure. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 551--569, 2016.
[14]
J. Angwin, C. Savage, J. Larson, H. Moltke, L. Poitras, and J. Risen. AT&T helped US spy on Internet on a vast scale. The New York Times, 2015.
[15]
E. Balsa, C. Troncoso, and C. Diaz. OB-PWS: Obfuscation-based private web search. In IEEE Symposium on Security and Privacy (S&P), pages 491--505, 2012.
[16]
M. Barbaro and T. Zeller Jr. A Face Is Exposed for AOL Searcher No. 4417749. The New York Times, Aug. 2006.
[17]
B. Barrett. Security news this week: Russia's SolarWinds hack is a historic mess. Wired, Dec. 2020. https://www.wired.com/story/russia-solarwinds-hack-roundup/.
[18]
A. Beimel, Y. Ishai, and T. Malkin. Reducing the servers computation in private information retrieval: PIR with preprocessing. In Advances in Cryptology---CRYPTO, pages 55--73, 2000.
[19]
J. Bethencourt, D. Song, and B. Waters. New techniques for private stream searching. ACM Transactions on Information and System Security (TISSEC), 12(3):1--32, 2009.
[20]
C. Bösch, P. Hartel, W. Jonker, and A. Peter. A survey of provably secure searchable encryption. ACM Computing Surveys (CSUR), 47(2):1--51, 2014.
[21]
Z. Brakerski. Fully homomorphic encryption without modulus switching from classical GapSVP. In Advances in Cryptology---CRYPTO, pages 868--886, 2012.
[22]
Z. Brakerski, C. Gentry, and V. Vaikuntanathan. (Leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory (TOCT), 6(3):13, 2014.
[23]
B. Braun, A.J. Feldman, Z. Ren, S. Setty, A. J. Blumberg, and M. Walfish. Verifying computations with state. In ACM Symposium on Operating Systems Principles (SOSP), Nov. 2013.
[24]
D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu, and M. Steiner. Dynamic searchable encryption in very-large databases: Data structures and implementation. In Network and Distributed System Security Symposium (NDSS), 2014.
[25]
B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords. Cryptology ePrint Archive, Report 1998/003, 1998. https://eprint.iacr.org/1998/003.
[26]
B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval. In Symposium on Foundations of Computer Science (FOCS), 1995.
[27]
R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky. Searchable symmetric encryption: improved definitions and efficient constructions. Journal of Computer Security, 19(5):895--934, 2011.
[28]
G. Danezis and C. Diaz. Space-efficient private search with applications to rateless codes. In International Conference on Financial Cryptography and Data Security (FC), pages 148--162. Springer, 2007.
[29]
E. Dauterman, E. Feng, E. Luo, R. A. Popa, and I. Stoica. DORY: An encrypted search system with distributed trust. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 1101--1119, 2020.
[30]
I. Demertzis, D. Papadopoulos, C. Papamanthou, and S. Shintre. SEAL: Attack mitigation for encrypted databases via adjustable leakage. In USENIX Security Symposium (SEC), pages 2433--2450, 2020.
[31]
X. Ding, H. Pang, and J. Lai. Verifiable and private top-k monitoring. In ACM ASIA Conference on Computer and Communications Security (CCS), pages 553--558, 2013.
[32]
J. Domingo-Ferrer, A. Solanas, and J. Castellà-Roca. h(k)-private information retrieval from privacy-uncooperative queryable databases. Online Information Review, 2009.
[33]
C. Dong and L. Chen. A fast secure dot product protocol with application to privacy preserving association rule mining. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2014.
[34]
C. Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming, pages 1--12, 2006.
[35]
J. Fan and F. Vercauteren. Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144, 2012.
[36]
R. A. Fink, D. R. Zaret, R. B. Stonehirsch, R. M. Seng, and S. M. Tyson. Streaming, plaintext private information retrieval using regular expressions on arbitrary length search strings. In IEEE Symposium on Privacy-Aware Computing (PAC), pages 107--118. IEEE, 2017.
[37]
M. J. Freedman, Y. Ishai, B. Pinkas, and O. Reingold. Keyword search and oblivious pseudorandom functions. In Theory of Cryptography Conference, pages 303--324. Springer, 2005.
[38]
C. Gentry. Fully homomorphic encryption using ideal lattices. In ACM Symposium on Theory of Computing (STOC), 2009.
[39]
T. George. Takeaways from the Shopify hack. SecurityWeek, Sept. 2020. https://www.securityweek.com/takeaways-shopify-hack.
[40]
A. Gersho. Principles of quantization. IEEE Transactions on circuits and systems, 25(7):427--436, 1978.
[41]
B. Goethals, S. Laur, H. Lipmaa, and T. Mielikäinen. On private scalar product computation for privacy-preserving data mining. In International Conference on Information Security and Cryptology (ICISC). 2004.
[42]
O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM (JACM), 43(3):431--473, 1996.
[43]
I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio. Deep learning, volume 1. MIT press Cambridge, 2016.
[44]
T. Gupta, N. Crooks, W. Mulhern, S. Setty, L. Alvisi, and M. Walfish. Scalable and private media consumption with popcorn. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 91--107, 2016.
[45]
T. Gupta, H. Fingler, L. Alvisi, and M. Walfish. Pretzel: Email encryption and provider-supplied functions are compatible. In ACM SIGCOMM Conference, 2017.
[46]
S. Halevi and V. Shoup. Algorithms in HElib. In Advances in Cryptology---CRYPTO, pages 554--571. Springer, 2014.
[47]
S. Halevi and V. Shoup. Faster homomorphic linear transformations in HElib. In Advances in Cryptology---CRYPTO, pages 93--120. Springer, 2018.
[48]
R. Handa, C. R. Krishna, and N. Aggarwal. Document clustering for efficient and secure information retrieval from cloud. Concurrency and Computation: Practice and Experience, 31(15):e5127, 2019.
[49]
R. Handa, C. R. Krishna, and N. Aggarwal. Searchable encryption: A survey on privacy-preserving search schemes on encrypted outsourced data. Concurrency and Computation: Practice and Experience, 31(17):e5201, 2019.
[50]
R. Henry, Y. Huang, and I. Goldberg. One (block) size fits all: PIR and SPIR with variable-length records via multi-block queries. In Network and Distributed System Security Symposium (NDSS), 2013.
[51]
T. Hoang, M. O. Ozmen, Y. Jang, and A. A. Yavuz. Hardware-supported ORAM in effect: Practical oblivious search and update on very large dataset. Privacy Enhancing Technologies Symposium (PETS), 2019(1), 2019.
[52]
A. Holmes. 533 million facebook users' phone numbers and personal data have been leaked online. Business Insider, Apr. 2021. https://www.businessinsider.com/stolen-data-of-533-million-facebook-users-leaked-online-2021-4.
[53]
A. Ibrahim, H. Jin, A. A. Yassin, and D. Zou. Secure rank-ordered search of multi-keyword trapdoor over encrypted cloud data. In 2012 IEEE Asia-Pacific Services Computing Conference, pages 263--270. IEEE, 2012.
[54]
I. Iliashenko and V. Zucca. Faster homomorphic comparison operations for BGV and BFV. Privacy Enhancing Technologies Symposium (PETS), 3:246--264, 2021.
[55]
Intel. Intel Software Guard Extensions. https://software.intel.com/en-us/sgx.
[56]
I. Ioannidis, A. Grama, and M. Atallah. A secure protocol for computing dot-products in clustered and distributed environments. In International Conference on Parallel Processing(ICPP), 2002.
[57]
S. Ji, J. Shao, D. Agun, and T. Yang. Privacy-aware ranking with tree ensembles on the cloud. In International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 315--324, 2018.
[58]
C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan. Gazelle: A low latency framework for secure neural network inference. In USENIX Security Symposium (SEC), pages 1651--1669, 2018.
[59]
E. Kushilevitz and R. Ostrovsky. Replication is not needed: Single database, computationally-private information retrieval. In Symposium on Foundations of Computer Science (FOCS), 1997.
[60]
D. Lazar, Y. Gilad, and N. Zeldovich. Karaoke: Distributed private messaging immune to passive traffic analysis. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 711--725, 2018.
[61]
Q. Lou, B. Feng, G. C. Fox, and L.Jiang. Glyph: Fast and accurately training deep neural networks on encrypted data. arXiv preprint arXiv:1911.07101, 2019.
[62]
V. Lyubashevsky, C. Peikert, and O. Regev. On ideal lattices and learning with errors over rings. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 1--23, 2010.
[63]
L. McKenzie. Secure file sharing compromises university security. Inside Higher Ed, Apr. 2021. https://www.insidehighered.com/news/2021/04/07/accellion-data-security-breach-latest-hit-universities.
[64]
P. Mishra, R. Poddar, J. Chen, A. Chiesa, and R. A. Popa. Oblix: An efficient oblivious search index. In IEEE Symposium on Security and Privacy (S&P), pages 279--296. IEEE, 2018.
[65]
M. Oehler and D. S. Phatak. A conjunction for private stream searching. In International Conference on Social Computing, pages 441--447. IEEE, 2013.
[66]
F. Olumofin and I. Goldberg. Privacy-preserving queries over relational databases. In Privacy Enhancing Technologies Symposium (PETS), pages 75--92. Springer, 2010.
[67]
C. Örencik and E. Savaş. An efficient privacy-preserving multi-keyword search over encrypted cloud data with ranking. Distributed and Parallel Databases, 32(1):119--160, 2014.
[68]
R. Ostrovsky and W. E. Skeith. Private searching on streaming data. Journal of Cryptology, 20(4):397--430, 2007.
[69]
B. Parno, C. Gentry, J. Howell, and M. Raykova. Pinocchio: Nearly practical verifiable computation. In IEEE Symposium on Security and Privacy (S&P), May 2013.
[70]
R. Řehůřek and P. Sojka. Software Framework for Topic Modelling with Large Corpora. In LREC Workshop on New Challenges for NLP Frameworks, pages 45--50, May 2010.
[71]
D. Rushe. Yahoo $250,000 daily fine over NSA data refusal was set to double "every week". The Guardian, 2014.
[72]
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., 1986.
[73]
M. D. Schatz, R. A. Van de Geijn, and J. Poulson. Parallel matrix multiplication: A systematic journey. SIAM Journal on Scientific Computing, 38(6):C748-C781, 2016.
[74]
H. Schütze, C. D. Manning, and P. Raghavan. Introduction to information retrieval, volume 39. Cambridge University Press Cambridge, 2008.
[75]
Microsoft SEAL (release 3.5). https://github.com/Microsoft/SEAL, Apr. 2020. Microsoft Research, Redmond, WA.
[76]
A. W. Services. Amazon EC2 Instance Savings Plans. https://aws.amazon.com/savingsplans/compute-pricing/.
[77]
A. W. Services. Amazon EC2 On-Demand Pricing (Data transfer). https://aws.amazon.com/ec2/pricing/on-demand/.
[78]
J. Shao, S. Ji, A. O. Glova, Y. Qiao, T. Yang, and T. Sherwood. Index obfuscation for oblivious document retrieval in a trusted execution environment. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 1345--1354, 2020.
[79]
J. Shao, S. Ji, and T. Yang. Privacy-aware document ranking with neural signals. In International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 305--314, 2019.
[80]
D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In IEEE Symposium on Security and Privacy (S&P), pages 44--55, 2000.
[81]
E. Stefanov, E. Shi, and D. Song. Towards practical oblivious RAM. In Network and Distributed System Security Symposium (NDSS), 2012.
[82]
M. Strizhov and I. Ray. Multi-keyword similarity search over encrypted cloud data. In IFIP international information security conference, pages 52--65. Springer, 2014.
[83]
W. Sun, B. Wang, N. Cao, M. Li, W. Lou, Y. T. Hou, and H. Li. Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. In Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security, pages 71--82, 2013.
[84]
W. Sun, R. Zhang, W. Lou, and Y. T. Hou. REARGUARD: Secure keyword search using trusted hardware. In IEEE International Conference on Computer Communications (INFOCOM), pages 801--809. IEEE, 2018.
[85]
P. Syverson, R. Dingledine, and N. Mathewson. Tor: The second-generation onion router. In USENIX Security Symposium (SEC), pages 303--320, 2004.
[86]
N. Tyagi, Y. Gilad, D. Leung, M. Zaharia, and N. Zeldovich. Stadium: A distributed metadata-private messaging system. In ACM Symposium on Operating Systems Principles (SOSP), pages 423--440, 2017.
[87]
M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan. Fully homomorphic encryption over the integers. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 24--43. Springer, 2010.
[88]
C. Wang, N. Cao, J. Li, K. Ren, and W. Lou. Secure ranked keyword search over encrypted cloud data. In International Conference on Distributed Computing Systems (ICDCS), pages 253--262. IEEE, 2010.
[89]
F. Wang, C. Yun, S. Goldwasser, V. Vaikuntanathan, and M. Zaharia. Splinter: Practical private queries on public data. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 299--313, 2017.
[90]
Y. Wang, H. Pang, Y. Yang, and X. Ding. Secure server-aided top-k monitoring. Information Sciences, 420:345--363, 2017.
[91]
Y. Wang, J. Wang, and X. Chen. Secure searchable encryption: a survey. Journal of communications and information networks, 1(4):52--65, 2016.
[92]
C. Wei, Q. Gu, S. Ji, W. Chen, Z. Wang, and R. Beyah. OB-WSPES: A uniform evaluation system for obfuscation-based web search privacy. IEEE Transactions on Dependable and Secure Computing, 2019.
[93]
R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel computing, 27(1-2):3--35, 2001.
[94]
Z. Xia, X. Wang, X. Sun, and Q. Wang. A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE transactions on parallel and distributed systems, 27(2):340--352, 2015.
[95]
X. Yi and E. Bertino. Private searching for single and conjunctive keywords on streaming data. In Workshop on Privacy in the Electronic Society (WPES), pages 153--158, 2011.
[96]
X. Yi, E. Bertino, J. Vaidya, and C. Xing. Private searching on streaming data based on keyword frequency. IEEE Transactions on Dependable and Secure Computing, 11(2):155--167, 2013.
[97]
X. Yi, R. Paulet, and E. Bertino. Private searching on streaming data. In Homomorphic Encryption and Applications, pages 101--126. Springer, 2014.
[98]
X. Yi and C. Xing. Private (t, n) threshold searching on streaming data. In International Conference on Privacy, Security, Risk and Trust and International Conference on Social Computing, pages 676--683. IEEE, 2012.
[99]
J. Yu, P. Lu, Y. Zhu, G. Xue, and M. Li. Toward secure multi-keyword top-k retrieval over encrypted cloud data. IEEE transactions on dependable and secure computing, 10(4):239--250, 2013.
[100]
P. Zhang, Y. Li, Q. Liu, and H. Lin. A scalable distributed private stream search system. In International Conference on Distributed Computing Systems Workshops, pages 128--135. IEEE, 2015.
[101]
J. Zobel and A. Moffat. Exploring the similarity space. In ACM SIGIR Forum, volume 32, pages 18--34, 1998.

Cited By

View all
  • (2023)Private Information Retrieval in Large Scale Public Data RepositoriesProceedings of the VLDB Endowment10.14778/3611540.361157216:12(3868-3871)Online publication date: 1-Aug-2023
  • (2023)Private Web Search with TiptoeProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613134(396-416)Online publication date: 23-Oct-2023
  • (2023)Sparsity Aware of TF-IDF Matrix to Accelerate Oblivious Document Ranking and Retrieval2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)10.1109/TrustCom60117.2023.00137(974-981)Online publication date: 1-Nov-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
October 2021
899 pages
ISBN:9781450387095
DOI:10.1145/3477132
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 October 2021

Check for updates

Badges

Author Tags

  1. document ranking
  2. private information retrieval
  3. secure matrix-vector product

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

SOSP '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Upcoming Conference

SOSP '25
ACM SIGOPS 31st Symposium on Operating Systems Principles
October 13 - 16, 2025
Seoul , Republic of Korea

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)427
  • Downloads (Last 6 weeks)36
Reflects downloads up to 28 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Private Information Retrieval in Large Scale Public Data RepositoriesProceedings of the VLDB Endowment10.14778/3611540.361157216:12(3868-3871)Online publication date: 1-Aug-2023
  • (2023)Private Web Search with TiptoeProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613134(396-416)Online publication date: 23-Oct-2023
  • (2023)Sparsity Aware of TF-IDF Matrix to Accelerate Oblivious Document Ranking and Retrieval2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)10.1109/TrustCom60117.2023.00137(974-981)Online publication date: 1-Nov-2023
  • (2022)PantheonProceedings of the VLDB Endowment10.14778/3574245.357425116:4(643-656)Online publication date: 1-Dec-2022

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media