[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Efficient order dependency detection

Published: 01 April 2016 Publication History

Abstract

Order dependencies (ODs) describe a relationship of order between lists of attributes in a relational table. ODs can help to understand the semantics of datasets and the applications producing them. They have applications in the field of query optimization by suggesting query rewrites. Also, the existence of an OD in a table can provide hints on which integrity constraints are valid for the domain of the data at hand. This work is the first to describe the discovery problem for order dependencies in a principled manner by characterizing the search space, developing and proving pruning rules, and presenting the algorithm Order, which finds all order dependencies in a given table. Order traverses the lattice of permutations of attributes in a level-wise bottom-up manner. In a comprehensive evaluation, we show that it is efficient even for various large datasets.

References

[1]
Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557---581 (2015)
[2]
Abedjan, Ziawasch, Naumann, Felix: Advancing the discovery of unique column combinations. In: Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp. 1565---1570, (2011)
[3]
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the International Conference on Very Large Databases (VLDB), pp. 487---499, (1994)
[4]
De Marchi, F., Lopes, S., Petit, J.-M.: Unary and n-ary inclusion dependency discovery in relational databases. J. Intell. Inf. Syst. 32(1), 53---73 (2009)
[5]
Dong, J., Hull, R.: Applying approximate order dependency to reduce indexing space. In: Proceedings of the International Conference on Management of Data (SIGMOD), pp. 119---127, (1982)
[6]
Ginsburg, S., Hull, R.: Order dependency in the relational model. Theoret. Comput. Sci. 26(1---2), 149---195 (1983)
[7]
Golab, L., Karloff, H.J., Korn, F., Saha, A., Srivastava, D.: Sequential dependencies. Proc. VLDB Endow. 2(1), 574---585 (2009)
[8]
Halbeisen, L., Hungerbühler, N.: Number theoretic aspects of a combinatorial function. Notes Numb. Theory Discrete Math. 5(4), 138---150 (1999)
[9]
Heise, A., Quiané-Ruiz, J.-A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. Proc. VLDB Endow. 7(4), 301---312 (2013)
[10]
Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100---111 (1999)
[11]
Lichman, M.: UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences (2013). http://archive.ics.uci.edu/ml. Accessed March 10, 2015
[12]
Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data--a review. IEEE Trans. Knowl Data Eng. 24(2), 251---264 (2012)
[13]
Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40---49 (2013)
[14]
Ng, W.: Ordered functional dependencies in relational databases. Inf. Syst. 24(7), 535---554 (1999)
[15]
Northwestern University. WikiTables: Public Site (2015). http://downey-n1.cs.northwestern.edu/public. Accessed March 10, 2015
[16]
Papenbrock, T., Bergmann, T., Finke, M., Zwiener, J., Naumann, F.: Data profiling with Metanome. Proc. VLDB Endow. 8(12), 1860---1871 (2015)
[17]
Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.-P., Schönberg, M., Zwiener, J., Naumann, F.: Functional dependency discovery: An experimental evaluation of seven algorithms. Proc. VLDB Endow. 8(10), 1082---1093 (2015)
[18]
Sloane, N.J.A.: The On-Line Encyclopedia of Integer Sequences--A000522 (2015). http://oeis.org/A000522. Accessed March 10, 2015
[19]
Szlichta, J., Godfrey, P., Gryz, J.: Chasing polarized order dependencies. In: Proceedings of the Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), pp. 168---179, (2012)
[20]
Szlichta, J., Godfrey, P., Gryz, J.: Fundamentals of order dependencies. Proc. VLDB Endow. 5(11), 1220---1231 (2012)
[21]
Szlichta, J., Godfrey, P., Gryz, J., Ma, W., Qiu, W., Zuzarte, C.: Business-intelligence queries with order dependencies in DB2. In: Proceedings of the International Conference on Extending Database Technology (EDBT), pp. 750---761, (2014)
[22]
Szlichta, J., Godfrey, P., Gryz, J., Zuzarte, C.: Expressiveness and complexity of order dependencies. Proc. VLDB Endow. 6(14), 1858---1869 (2013)

Cited By

View all
  • (2024)Discovering approximate implicit domain orders through order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00847-y33:5(1257-1282)Online publication date: 1-Sep-2024
  • (2023)Fast Algorithm for Embedded Order Dependency ValidationProceedings of the 35th International Conference on Scientific and Statistical Database Management10.1145/3603719.3603720(1-4)Online publication date: 10-Jul-2023
  • (2023)Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data PipelinesProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599776(4991-5003)Online publication date: 6-Aug-2023
  • Show More Cited By
  1. Efficient order dependency detection

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The VLDB Journal — The International Journal on Very Large Data Bases
    The VLDB Journal — The International Journal on Very Large Data Bases  Volume 25, Issue 2
    April 2016
    163 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Publication History

    Published: 01 April 2016

    Author Tags

    1. Data profiling
    2. Functional dependencies
    3. Metadata

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)28
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 13 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Discovering approximate implicit domain orders through order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00847-y33:5(1257-1282)Online publication date: 1-Sep-2024
    • (2023)Fast Algorithm for Embedded Order Dependency ValidationProceedings of the 35th International Conference on Scientific and Statistical Database Management10.1145/3603719.3603720(1-4)Online publication date: 10-Jul-2023
    • (2023)Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data PipelinesProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599776(4991-5003)Online publication date: 6-Aug-2023
    • (2023)Effective and Efficient Lexicographical Order Dependency DiscoveryIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.324878035:9(9700-9714)Online publication date: 1-Sep-2023
    • (2023)Discovery of Approximate Lexicographical Order DependenciesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.313022735:4(3684-3698)Online publication date: 1-Apr-2023
    • (2023)Incremental discovery of denial constraintsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-023-00788-y32:6(1289-1313)Online publication date: 1-Nov-2023
    • (2022)Fast approximate denial constraint discoveryProceedings of the VLDB Endowment10.14778/3565816.356582816:2(269-281)Online publication date: 1-Oct-2022
    • (2022)DataPrism: Exposing Disconnect between Data and SystemsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517864(217-231)Online publication date: 10-Jun-2022
    • (2022)Data Dependencies Extended for Variety and Veracity: A Family TreeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.304644334:10(4717-4736)Online publication date: 1-Oct-2022
    • (2022)ABC of order dependenciesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00696-z31:5(825-849)Online publication date: 1-Sep-2022
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media