[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3035918.3035926acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Distributed Provenance Compression

Published: 09 May 2017 Publication History

Abstract

Network provenance, which records the execution history of network events as meta-data, is becoming increasingly important for network accountability and failure diagnosis. For example, network provenance may be used to trace the path that a message traversed in a network, or to reveal how a particular routing entry was derived and the parties involved in its derivation. A challenge when storing the provenance of a live network is that the large number of the arriving messages may incur substantial storage overhead. In this paper, we explore techniques to dynamically compress distributed provenance stored at scale. Logically, the compression is achieved by grouping equivalent provenance trees and maintaining only one concrete copy for each equivalence class. To efficiently identify equivalent provenance, we (1) introduce distributed event-based linear programs (DELP) to specify distributed network applications, and (2) statically analyze DELPs to allow for quick detection of provenance equivalence at runtime. Our experimental results demonstrate that our approach leads to significant storage reduction and query latency improvement over alternative approaches.

References

[1]
Y. Amsterdamer, D. Deutch, T. Milo, and V. Tannen. On provenance minimization. ACM Trans. Database Syst., 37(4):30, 2012.
[2]
Z. Bao, H. Köhler, L. Wang, X. Zhou, and S. W. Sadiq. Efficient provenance storage for relational queries. In CIKM, pages 1352--1361, 2012.
[3]
A. Chapman, H. V. Jagadish, and P. Ramanan. Efficient provenance storage. In Proceedings of ACM SIGMOD, pages 993--1006, 2008.
[4]
A. Chen, Y. Wu, A. Haeberlen, W. Zhou, and B. T. Loo. The Good, the Bad, and the Differences: Better Network Diagnostics with Differential Provenance. In Proceedings of ACM SIGCOMM, Aug. 2016.
[5]
C. Chen, L. Jia, H. Xu, C. Luo, W. Zhou, and B. T. Loo. A program logic for verifying secure routing protocols. In Proceedings of FORTE, pages 117--132, 2014.
[6]
C. Chen, H. Lehri, L. K. Loh, A. Alur, L. Jia, B. T. Loo, and W. Zhou. Provably correct distributed provenance compression (cmu-cylab-17-001). Technical report, CyLab, Carnegie Mellon University, Jan. 2017.
[7]
R. Droms. Dynamic host configuration protocol. 1997. RFC 2131.
[8]
T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In Proceedings of PODS, pages 31--40, 2007.
[9]
J. Jung, E. Sit, H. Balakrishnan, and R. Morris. DNS performance and the effectiveness of caching. IEEE/ACM Trans. Netw., 10(5):589--603, 2002.
[10]
G. Karvounarakis, Z. G. Ives, and V. Tannen. Querying data provenance. In Proceedings of ACM SIGMOD, pages 951--962, 2010.
[11]
B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative Networking Language, Execution and Optimization. In Proceedings of ACM SIGMOD, 2006.
[12]
B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative networking. In Communications of the ACM, 2009.
[13]
P. V. Mockapetris. Domain names - implementation and specification, Nov. 1987. RFC 1035.
[14]
S. C. Muthukumar, X. Li, C. Liu, J. B. Kopena, M. Oprea, and B. T. Loo. Declarative toolkit for rapid network protocol simulation and experimentation. In SIGCOMM (demo), 2009.
[15]
ns 3 project. Network Simulator 3. http://www.nsnam.org/.
[16]
D. Olteanu and J. Závodný. On factorisation of provenance polynomials. In Proceedings of TaPP, 2011.
[17]
D. Olteanu and J. Závodný. Factorised representations of query results: size bounds and readability. In Proceedings of ICDT, pages 285--298, 2012.
[18]
D. C. Plummer. An ethernet address resolution protocol. 1982. RFC 826.
[19]
M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker. Abstractions for network update. In Proceedings of ACM SIGCOMM, pages 323--334, 2012.
[20]
Robert Ramey. http://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/index.html.
[21]
A. Woodruff and M. Stonebraker. Supporting fine-grained data lineage in a database visualization environment. In Proceedings of ICDE, pages 91--102, 1997.
[22]
Y. Wu, A. Chen, A. Haeberlen, W. Zhou, and B. T. Loo. Automated network repair with meta provenance. In Proceedings of HotNets, pages 26:1--26:7, 2015.
[23]
Y. Wu, M. Zhao, A. Haeberlen, W. Zhou, and B. T. Loo. Diagnosing missing events in distributed systems with negative provenance. In Proceeding of ACM SIGCOMM, pages 383--394, 2014.
[24]
Y. Xie, K. Muniswamy-Reddy, D. Feng, Y. Li, and D. D. E. Long. Evaluation of a hybrid approach for efficient provenance storage. TOS, 9(4):14, 2013.
[25]
E. W. Zegura, K. L. Calvert, and S. Bhattacharjee. How to model an internetwork. In Proceedings IEEE INFOCOM, pages 594--602, 1996.
[26]
W. Zhou, Q. Fei, A. Narayan, A. Haeberlen, B. T. Loo, and M. Sherr. Secure network provenance. In Proceedings of SOSP, pages 295--310, 2011.
[27]
W. Zhou, S. Mapara, Y. Ren, Y. Li, A. Haeberlen, Z. G. Ives, B. T. Loo, and M. Sherr. Distributed time-aware provenance. PVLDB, 6(2):49--60, 2012.
[28]
W. Zhou, M. Sherr, T. Tao, X. Li, B. T. Loo, and Y. Mao. Efficient querying and maintenance of network provenance at internet-scale. In Proceedings of ACM SIGMOD, pages 615--626, 2010.

Cited By

View all
  • (2024) eAudit: A Fast, Scalable and Deployable Audit Data Collection System * 2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00087(3571-3589)Online publication date: 19-May-2024
  • (2024)Compression and In-Situ Query Processing for Fine-Grained Array Lineage2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00281(3654-3667)Online publication date: 13-May-2024
  • (2023)System Auditing for Real-Time SystemsACM Transactions on Privacy and Security10.1145/362522926:4(1-37)Online publication date: 13-Nov-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. distributed systems
  2. provenance
  3. static analysis
  4. storage

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)221
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024) eAudit: A Fast, Scalable and Deployable Audit Data Collection System * 2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00087(3571-3589)Online publication date: 19-May-2024
  • (2024)Compression and In-Situ Query Processing for Fine-Grained Array Lineage2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00281(3654-3667)Online publication date: 13-May-2024
  • (2023)System Auditing for Real-Time SystemsACM Transactions on Privacy and Security10.1145/362522926:4(1-37)Online publication date: 13-Nov-2023
  • (2022)Provenance-based data skippingProceedings of the VLDB Endowment10.14778/3494124.349413015:3(451-464)Online publication date: 4-Feb-2022
  • (2022)Towards Efficient Auditing for Real-Time SystemsComputer Security – ESORICS 202210.1007/978-3-031-17143-7_30(614-634)Online publication date: 26-Sep-2022
  • (2021)Compact, tamper-resistant archival of fine-grained provenanceProceedings of the VLDB Endowment10.14778/3436905.343690914:4(485-497)Online publication date: 22-Feb-2021
  • (2021)Validating the Integrity of Audit Logs Against Execution Repartitioning AttacksProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security10.1145/3460120.3484551(3337-3351)Online publication date: 12-Nov-2021
  • (2021)LineageChain: a fine-grained, secure and efficient data provenance system for blockchainsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-020-00646-130:1(3-24)Online publication date: 7-Feb-2021
  • (2020)On the Forensic Validity of Approximated Audit LogsProceedings of the 36th Annual Computer Security Applications Conference10.1145/3427228.3427272(189-202)Online publication date: 7-Dec-2020
  • (2020)This is Why We Can’t Cache Nice Things: Lightning-Fast Threat Hunting using Suspicion-Based Hierarchical StorageAnnual Computer Security Applications Conference10.1145/3427228.3427255(165-178)Online publication date: 7-Dec-2020
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media