[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Effective and complete discovery of order dependencies via set-based axiomatization

Published: 01 March 2017 Publication History

Abstract

Integrity constraints (ICs) are useful for query optimization and for expressing and enforcing application semantics. However, formulating constraints manually requires domain expertise, is prone to human errors, and may be excessively time consuming, especially on large datasets. Hence, proposals for automatic discovery have been made for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). ODs properly subsume FDs, as they can additionally express business rules involving order; e.g., an employee never has a higher salary while paying lower taxes than another employee.
We present a new OD discovery algorithm enabled by a novel polynomial mapping to a canonical form of ODs, and a sound and complete set of axioms (inference rules) for canonical ODs. Our algorithm has exponential worst-case time complexity, O(2|R|), in the number of attributes |R| and linear complexity in the number of tuples. We prove that it produces a complete and minimal set of ODs. Using real and synthetic datasets, we experimentally show orders-of-magnitude performance improvements over the prior state-of-the-art.

References

[1]
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo. Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, AAAI Press, pages 307--328, 1996.
[2]
X. Chu, I. Ilyas, and P. Papotti. Discovering Denial Constraints. PVLDB, 6(13):1498--1509, 2013.
[3]
X. Chu, I. Ilyas, and P. Papotti. Holistic data cleaning: Putting violations into context. In ICDE, pages 458--469, 2013.
[4]
G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: Consistency and accuracy. In VLDB, pages 315--326, 2007.
[5]
J. Dong and R. Hull. Applying Approximate Order Dependency to Reduce Indexing Space. In SIGMOD, 119--127, 1982.
[6]
S. Ginsburg and R. Hull. Order Dependency in the Relational Model. TCS, 26(1): 149--195, 1983.
[7]
L. Golab, H. Karloff, F. Korn, and D. Srivastava. Sequential Dependencies. PVLDB, 2(1): 574--585, 2009.
[8]
R. Guravannavar, H. Ramanujam, and S. Sudarshan. Optimizing Nested Queries with Parameter Sort Orders. In VLDB, 481--492, 2005.
[9]
Y. Huhtala, J. Krkkinen, P. Porkka, and H. Toivonen. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. In ICDE, pages 392--401, 1998.
[10]
P. Langer and F. Naumann. Efficient order dependency detection. VLDB J., 25(2):223--241, 2016.
[11]
T. Malkemus, S. Padmanabhan, B. Bhattacharjee, and L. Cranston. Predicate Derivation and Monotonicity Detection in DB2 UDB. In ICDE, 939--947, 2005.
[12]
W. Ng. An Extension of the Relational Data Model to Incorporate Ordered Domains. TODS, 26(3) 344--383, 2001.
[13]
T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schnberg, J. Zwiener, and F. Naumann. Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. PVLDB, 8(10):1082--1093, 2015.
[14]
P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access Path Selection in a Relational Database Management System. In SIGMOD, 23--34, 1979.
[15]
D. Simmen, E. Shekita, and T. Malkemus. Fundamental Techniques for Order Optimization. In SIGMOD, 57--67, 1996.
[16]
J. Szlichta, P. Godfrey, L. Golab, M. Kargar, and D. Srivastava. Effective and complete discovery of order dependencies via set-based axiomatization. Technical report, 14 pages, http://arxiv.org/abs/1608.06169, 2016.
[17]
J. Szlichta, P. Godfrey, and J. Gryz. Fundamentals of Order Dependencies. PVLDB, 5(11): 1220--1231, 2012.
[18]
J. Szlichta, P. Godfrey, J. Gryz, W. Ma, P. Pawluk, and C. Zuzarte. Queries on Dates: Fast Yet not Blind. In EDBT, 497--502, 2011.
[19]
J. Szlichta, P. Godfrey, J. Gryz, W. Ma, W. Qiu, and C. Zuzarte. Business-Intelligence Queries with Order Dependencies in DB2. In EDBT, 750--761, 2014.
[20]
J. Szlichta, P. Godfrey, J. Gryz, and C. Zuzarte. Expressiveness and Complexity of Order Dependencies. PVLDB 6(14): 1858--1869, 2013.

Cited By

View all
  • (2024)Mixed Covers of Keys and Functional Dependencies for Maintaining the Integrity of Data under UpdatesProceedings of the VLDB Endowment10.14778/3654621.365462617:7(1578-1590)Online publication date: 1-Mar-2024
  • (2024)Efficient Differential Dependency DiscoveryProceedings of the VLDB Endowment10.14778/3654621.365462417:7(1552-1564)Online publication date: 1-Mar-2024
  • (2024)Discovering approximate implicit domain orders through order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00847-y33:5(1257-1282)Online publication date: 1-Sep-2024
  • Show More Cited By
  1. Effective and complete discovery of order dependencies via set-based axiomatization

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 10, Issue 7
    March 2017
    132 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 March 2017
    Published in PVLDB Volume 10, Issue 7

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 17 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Mixed Covers of Keys and Functional Dependencies for Maintaining the Integrity of Data under UpdatesProceedings of the VLDB Endowment10.14778/3654621.365462617:7(1578-1590)Online publication date: 1-Mar-2024
    • (2024)Efficient Differential Dependency DiscoveryProceedings of the VLDB Endowment10.14778/3654621.365462417:7(1552-1564)Online publication date: 1-Mar-2024
    • (2024)Discovering approximate implicit domain orders through order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00847-y33:5(1257-1282)Online publication date: 1-Sep-2024
    • (2024)SynODC: Utilizing the Syntactic Structure for Outlier Detection in Categorical AttributesMachine Learning and Knowledge Discovery in Databases. Research Track10.1007/978-3-031-70359-1_13(213-229)Online publication date: 8-Sep-2024
    • (2022)Fast approximate denial constraint discoveryProceedings of the VLDB Endowment10.14778/3565816.356582816:2(269-281)Online publication date: 1-Oct-2022
    • (2022)Contextual Data Cleaning with Ontology Functional DependenciesJournal of Data and Information Quality10.1145/352430314:3(1-26)Online publication date: 23-May-2022
    • (2021)Fast incremental discovery of pointwise order dependenciesProceedings of the VLDB Endowment10.14778/3401960.340196513:10(1669-1681)Online publication date: 10-Mar-2021
    • (2021)Scalable Learning to Troubleshoot Query Performance ProblemsProceedings of the 30th ACM International Conference on Information & Knowledge Management10.1145/3459637.3481947(4016-4025)Online publication date: 26-Oct-2021
    • (2021)ABC of order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00696-z31:5(825-849)Online publication date: 8-Sep-2021
    • (2021)Efficient distributed discovery of bidirectional order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00683-431:1(49-74)Online publication date: 16-Aug-2021
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media