[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise Similarity

Published: 21 May 2024 Publication History

Abstract

Over the past decade, there have been numerous extensions to the definition of Functional Dependency (<sc>fd</sc>), culminating in the introduction of Relaxed Functional Dependency (<sc>rfd</sc>), offering more flexible constraints compared to traditional <sc>fd</sc>s. This increased flexibility makes <sc>rfd</sc>s well-suited for exploring and profiling data in datasets with lower data quality. However, efficiently identifying <sc>rfd</sc>s within dynamic data sources presents a significant challenge, as it requires processing an entire dataset from scratch whenever modifications occur. To tackle this problem, incremental discovery algorithms have been defined, but they often suffer when the frequency and the size of batches of updates increase. This article presents a new algorithm, namely <sc>D-IndiBits</sc>, relying on a new decentralized architecture to balance the workload that drives the incremental discovery process of <sc>IndiBits</sc>, which is based on bitwise operators for computing attribute similarities. Experiments demonstrate <sc>D-IndiBits</sc>&#x0027;s effectiveness compared to <sc>fd</sc> and <sc>rfd</sc> discovery algorithms on both static and dynamic real-world data. With batches of modifications of sizes 10 k and 100 k, <sc>D-IndiBits</sc> is capable of updating the set of <sc>rfd</sc>s in a few seconds, whereas all other approaches often employ more than 3 hours.

References

[1]
F. Naumann, “Data profiling revisited,” ACM SIGMOD Rec., vol. 42, no. 4, pp. 40–49, 2014.
[2]
L. Caruccio, V. Deufemia, and G. Polese, “Relaxed functional dependencies — A survey of approaches,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 1, pp. 147–165, Jan. 2016.
[3]
S. Song, F. Gao, R. Huang, and C. Wang, “Data dependencies over big data: A family tree,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 10, pp. 4717–4736, Oct. 2022.
[4]
R. Hai, C. Quix, and D. Wang, “Relaxed functional dependency discovery in heterogeneous data lakes,” in Proc. Int. Conf. Conceptual Model., A. H. F. Laender, B. Pernici, E.-P. Lim, and J. P. M. de Oliveira, Eds., Springer, Cham, 2019, pp. 225–239.
[5]
S. Song, A. Zhang, L. Chen, and J. Wang, “Enriching data imputation with extensive similarity neighbors,” Proc. VLDB Endowment, vol. 8, no. 11, pp. 1286–1297, 2015.
[6]
B. Breve, L. Caruccio, V. Deufemia, and G. Polese, “RENUVER: A missing value imputation algorithm based on relaxed functional dependencies,” in Proc. 25th Int. Conf. Extending Database Technol., 2022, pp. 1:52–1:64.
[7]
L. Caruccio, V. Deufemia, F. Naumann, and G. Polese, “Discovering relaxed functional dependencies based on multi-attribute dominance,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 9, pp. 3212–3228, Sep. 2021.
[8]
M. Khayati, A. Lerner, Z. Tymchenko, and P. Cudr é-Mauroux, “Mind the gap: An experimental evaluation of imputation of missing values techniques in time series,” Proc. VLDB Endowment, vol. 13, no. 5, pp. 768–782, 2020.
[9]
L. Caruccio, S. Cirillo, V. Deufemia, and G. Polese, “Incremental discovery of functional dependencies with a bit-vector algorithm,” in Proc. 27th Italian Symp. Adv. Database Syst., 2019.
[10]
B. Breve, L. Caruccio, S. Cirillo, V. Deufemia, and G. Polese, “IndiBits: Incremental discovery of relaxed functional dependencies using bitwise similarity,” in Proc. IEEE 39th Int. Conf. Data Eng., 2023, pp. 1393–1405.
[11]
H. Mannila and K.-J. Raiha, “Dependency inference,” in Proc. 13th Int. Conf. Very Large Data Bases, 1987, pp. 155–158.
[12]
Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen, “TANE: An efficient algorithm for discovering functional and approximate dependencies,” Comput. J., vol. 42, no. 2, pp. 100–111, 1999.
[13]
H. Yao, H. J. Hamilton, and C. J. Butz, “FD_Mine: Discovering functional dependencies in a database using equivalences,” in Proc. IEEE Int. Conf. Data Mining, 2002, pp. 729–732.
[14]
N. Novelli and R. Cicchetti, “FUN: An efficient algorithm for mining functional and embedded dependencies,” in Proc. 8th Int. Conf. Database Theory, 2001, pp. 189–203.
[15]
Z. Abedjan, P. Schulze, and F. Naumann, “DFD: Efficient functional dependency discovery,” in Proc. 23rd ACM Int. Conf. Inf. Knowl. Manage., 2014, pp. 949–958.
[16]
X. Wan, X. Han, J. Wang, and J. Li, “Efficient discovery of functional dependencies on massive data,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 1, pp. 107–121, Jan. 2024.
[17]
S. Lopes, J.-M. Petit, and L. Lakhal, “Efficient discovery of functional dependencies and Armstrong relations,” in Proc. 7th Int. Conf. Extending Database Technol., 2000, pp. 350–364.
[18]
C. Wyss, C. Giannella, and E. Robertson, “FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances,” in Proc. Int. Conf. Data Warehousing Knowl. Discov., 2001, pp. 101–110.
[19]
P. A. Flach and I. Savnik, “Database dependency discovery: A machine learning approach,” AI Commun., vol. 12, pp. 139–160, 1999.
[20]
T. Papenbrock and F. Naumann, “A hybrid approach to functional dependency discovery,” in Proc. Int. Conf. Manage. Data, 2016, pp. 821–833.
[21]
Z. Wei and S. Link, “Towards the efficient discovery of meaningful functional dependencies,” Inf. Syst., vol. 116, 2023, Art. no.
[22]
C. Giannella and E. Robertson, “On approximation measures for functional dependencies,” Inf. Syst., vol. 29, pp. 483–507, 2004.
[23]
J. Kivinen and H. Mannila, “Approximate inference of functional dependencies from relations,” Theor. Comput. Sci., vol. 149, no. 1, pp. 129–149, 1995.
[24]
R. King and J. Oil, “Discovery of functional and approximate functional dependencies in relational databases,” J. Appl. Math. Decis. Sci., vol. 7, no. 1, pp. 49–59, 2003.
[25]
I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga, “CORDS: Automatic discovery of correlations and soft functional dependencies,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2004, pp. 647–658.
[26]
S. Kruse and F. Naumann, “Efficient discovery of approximate dependencies,” Proc. VLDB Endowment, vol. 11, no. 7, pp. 759–772, 2018.
[27]
L. Caruccio, V. Deufemia, and G. Polese, “Mining relaxed functional dependencies from data,” Data Mining Knowl. Discov., vol. 34, no. 2, pp. 443–477, 2020.
[28]
L. Caruccio, V. Deufemia, and G. Polese, “Evolutionary mining of relaxed dependencies from big data collections,” in Proc. 7th Int. Conf. Web Intell. Mining Semantics, 2017, pp. 1–10.
[29]
W. Fan, F. Geerts, J. Li, and M. Xiong, “Discovering conditional functional dependencies,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 5, pp. 683–698, May 2011.
[30]
P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis, “Conditional functional dependencies for data cleaning,” in Proc. IEEE 23rd Int. Conf. Data Eng., 2007, pp. 746–755.
[31]
L. Golab, H. Karloff, F. Korn, D. Srivastava, and B. Yu, “On generating near-optimal tableaux for conditional functional dependencies,” Proc. VLDB Endowment, vol. 1, no. 1, pp. 376–390, 2008.
[32]
S.-L. Wang, J.-W. Shen, and T.-P. Hong, “Incremental discovery of functional dependencies using partitions,” in Proc. Joint 9th IFSA World Congr., 2001, pp. 1322–1326.
[33]
P. Schirmer et al., “DynFD: Functional dependency discovery in dynamic datasets,” in Proc. Int. Conf. Extending Database Technol., 2019, pp. 253–264.
[34]
K. Belhajjame, “DynASt: Efficient maintenance of agree-sets against dynamic datasets,” in Proc. 26th Int. Conf. Extending Database Technol., 2023, pp. 14–26.
[35]
S. M. Fakhrahmad, M. Sadreddini, and M. Z. Jahromi, “AD-Miner: A new incremental method for discovery of minimal approximate dependencies using logical operations,” Intell. Data Anal., vol. 12, no. 6, pp. 607–619, 2008.
[36]
L. Caruccio and S. Cirillo, “Incremental discovery of imprecise functional dependencies,” J. Data Inf. Qual., vol. 12, no. 4, pp. 1–25, 2020.
[37]
T. Papenbrock et al., “Functional dependency discovery: An experimental evaluation of seven algorithms,” Proc. VLDB Endowment, vol. 8, no. 10, pp. 1082–1093, 2015.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering  Volume 36, Issue 12
Dec. 2024
2224 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 21 May 2024

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Jan 2025

Other Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media