research-article
Open access

Language-integrated privacy-aware distributed queries

Published: 10 October 2019

Abstract

Distributed query processing is an effective means for processing large amounts of data. To abstract from the technicalities of distributed systems, algorithms for operator placement automatically distribute sequential data queries over the available processing units. However, current algorithms for operator placement focus on performance and ignore privacy concerns that arise when handling sensitive data.
We present a new methodology for privacy-aware operator placement that both prevents leakage of sensitive information and improves performance. Crucially, our approach is based on an information-flow type system for data queries to reason about the sensitivity of query subcomputations. Our solution unfolds in two phases. First, placement space reduction generates deployment candidates based on privacy constraints using a syntax-directed transformation driven by the information-flow type system. Second, constraint solving selects the best placement among the candidates based on a cost model that maximizes performance. We verify that our algorithm preserves the sequential behavior of queries and prevents leakage of sensitive data. We implemented the type system and placement algorithm for a new query language SecQL and demonstrate significant performance improvements in benchmarks.
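
The two phases described above can be illustrated with a small sketch. It is not the SecQL implementation: the three-level label chain, the host clearances, the example query graph, and the cost function (data volume crossing host boundaries) are all assumptions made for illustration, and exhaustive search stands in for the constraint solver.

```python
from itertools import product

# Hypothetical security levels, ordered as a simple chain lattice.
LEVELS = {"public": 0, "confidential": 1, "secret": 2}


def infer_labels(edges, source_labels):
    """Propagate labels along the query graph: an operator is at least as
    sensitive as its most sensitive input. Edges must be in topological order."""
    labels = dict(source_labels)
    for src, dst, _volume in edges:
        current = labels.get(dst, "public")
        labels[dst] = max(current, labels[src], key=LEVELS.get)
    return labels


def candidate_hosts(labels, clearances):
    """Phase 1 (placement space reduction): keep only hosts whose clearance
    is at least the operator's label."""
    return {
        op: [h for h, c in clearances.items() if LEVELS[c] >= LEVELS[label]]
        for op, label in labels.items()
    }


def best_placement(edges, candidates, pinned):
    """Phase 2 (stand-in for constraint solving): enumerate the remaining
    candidates and minimize the data volume sent between different hosts."""
    free = [op for op in candidates if op not in pinned]
    best, best_cost = None, float("inf")
    for choice in product(*(candidates[op] for op in free)):
        placement = {**pinned, **dict(zip(free, choice))}
        cost = sum(vol for src, dst, vol in edges if placement[src] != placement[dst])
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


# Toy query: filter a public table in the cloud, then join it with secret data.
edges = [
    ("weather", "filter_weather", 100),   # (producer, consumer, data volume)
    ("filter_weather", "join", 10),
    ("patients", "join", 50),
]
source_labels = {"patients": "secret", "weather": "public"}
clearances = {"hospital": "secret", "cloud": "public"}
pinned = {"patients": "hospital", "weather": "cloud"}  # sources stay where the data lives

labels = infer_labels(edges, source_labels)
candidates = candidate_hosts(labels, clearances)
placement, cost = best_placement(edges, candidates, pinned)
print(placement, cost)
```

In this toy setup the join inherits the label `secret` and can only run on the hospital host, while the filter over public data is free to run in the cloud, where it shrinks the data before anything crosses the network.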

Supplementary Material

a167-salvaneschi (a167-salvaneschi.webm)
Presentation at OOPSLA '19


Published In

Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA
October 2019
2077 pages
EISSN: 2475-1421
DOI: 10.1145/3366395
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Privacy
  2. Information-Flow Type System
  3. Operator Placement
  4. SQL
  5. Scala
