[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Entity integrity management under data volume, variety and veracity

Published: 25 January 2023 Publication History

Abstract

Edgar Codd introduced the principle of entity integrity in the context of his relational model of data. The principle says that every targeted real-world entity should be uniquely represented in the database. In actual database systems, entity integrity is typically enforced by primary keys. We introduce a framework toward generalizing entity integrity to different dimensions of data, including volume, variety, and veracity. We establish axiomatic and algorithmic foundations for the implication problem of the combined class of uniqueness constraints, functional dependencies, and multivalued dependencies in all combinations of the dimensions we consider. These are based on specific approaches to the semantics of these integrity constraints and to the dimensions of data. We also highlight how our concepts lead to new opportunities for diverse and important areas of applications, such as query optimization, database design and security, and data quality. Overall, this sets out an agenda for future research that extends our approaches or applies different approaches in this area, as driven by application requirements.

References

[1]
Abiteboul S, Hull R, and Vianu V Foundations of databases 1995 Boston Addison-Wesley
[2]
Arenas M Normalization theory for XML SIGMOD Record 2006 35 4 57-64
[3]
Arenas M and Libkin L An information-theoretic approach to normal forms for relational and XML data J ACM 2005 52 2 246-283
[4]
Atzeni P and Morfuni N Functional dependencies and constraints on null values in database relations Inf Control 1986 70 1 1-31
[5]
Baazizi MA, Colazzo D, Ghelli G, et al (2019) Schemas and types for JSON data: From theory to practice. In: Proceedings of the 2019 international conference on management of data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30–July 5, 2019, pp 2060–2063
[6]
Balamuralikrishna N, Jiang Y, Koehler H, et al. Possibilistic keys Fuzzy Sets Syst 2019 376 1-36
[7]
Beeri C On the membership problem for functional and multivalued dependencies in relational databases ACM Trans Database Syst 1980 5 3 241-259
[8]
Beeri C and Bernstein PA Computational problems related to the design of normal form relational schemas ACM Trans Database Syst 1979 4 1 30-59
[9]
Beeri C, Fagin R, Howard JH (1977) A complete axiomatization for FDS and MVDS in database relations. In: SIGMOD Conference. ACM, pp 47–61
[10]
Bertossi LE (2011) Database repairing and consistent query answering. Synthesis lectures on data management, Morgan & Claypool Publishers
[11]
Biskup J (1995) Achievements of relational database schema design theory revisited. Semantics in Databases, Selected Papers from a Workshop, Prague, Czech Republic vol 1995, pp 29–54
[12]
Biskup J Security in computing systems 2009 Heidelberg, Germany Springer
[13]
Biskup J and Link S Appropriate inferences of data dependencies in relational databases Ann Math Artif Intell 2011 63 3–4 213-255
[14]
Biskup J and Weibert T Keeping secrets in incomplete databases Int J Inf Sec 2008 7 3 199-217
[15]
Biskup J, Embley D, and Lochner J Reducing inference control to access control for normalized database schemas Inf Proc Lett 2008 106 1 8-12
[16]
Brown P and Link S Probabilistic keys IEEE Trans Knowl Data Eng 2017 29 3 670-682
[17]
Cali A, Calvanese D, De Giacomo G, et al. Data integration under integrity constraints Inf Syst 2004 29 2 147-163
[18]
Casanova MA, Fagin R, and Papadimitriou CH Inclusion dependencies and their interaction with functional dependencies J Comput Syst Sci 1984 28 1 29-59
[19]
Chandra AK, Merlin PM (1977) Optimal implementation of conjunctive queries in relational data bases. In: Proceedings of the 9th Annual ACM Symposium on Theory of Computing, May 4–6, 1977, Boulder, Colorado, USA, pp 77–90
[20]
Codd EF A relational model of data for large shared data banks Commun ACM 1970 13 6 377-387
[21]
Curino C, Moon HJ, Deutsch A, et al. Automating the database schema evolution process VLDB J 2013 22 1 73-98
[22]
Delobel C and Adiba M Relational database systems 1985 Amsterdam North Holland
[23]
Deutsch A, Popa L, and Tannen V Query reformulation with constraints SIGMOD Record 2006 35 1 65-73
[24]
Dubois D and Prade H Possibility theory - an approach to computerized processing of uncertainty 1988 Heidelberg Springer
[25]
Dubois D and Prade H Possibility theory, probability theory and multiple-valued logics: a clarification Ann Math Artif Intell 2001 32 1–4 35-66
[26]
Fagin R Multivalued dependencies and a new normal form for relational databases ACM Trans Database Syst 1977 2 3 262-278
[27]
Fagin R, Kolaitis P, Miller R, et al. Data exchange: semantics and query answering Theor Comput Sci 2005 336 1 89-124
[28]
Fan W Dependencies for graphs: challenges and opportunities J Data Inform Quality 2019 11 2 1-12
[29]
Fan W, Geerts F, Jia X, et al. Conditional functional dependencies for capturing data inconsistencies ACM Trans Database Syst 2008 33 2 1-48
[30]
Fan W, Geerts F, Jia X, et al. Conditional functional dependencies for capturing data inconsistencies ACM Trans Database Syst 2008 33 2 1-48
[31]
Farkas C and Jajodia S The inference problem: a survey SIGKDD Explor 2002 4 2 6-11
[32]
Ferrarotti F, Hartmann S, and Link S Efficiency frontiers of XML cardinality constraints Data Knowl Eng 2013 87 297-319
[33]
Gal A (2011) Uncertain schema matching. Synthesis Lectures on Data Management, Morgan & Claypool Publishers
[34]
Galil Z An almost linear-time algorithm for computing a dependency basis in a relational database J ACM 1982 29 1 96-102
[35]
Grinberg A (2018) XML and JSON recipes for SQL Server. Synthesis Lectures on Data Management, Apress Berkeley, CA,.
[36]
Hannula M and Kontinen J A finite axiomatization of conditional independence and inclusion dependencies Inf Comput 2016 249 121-137
[37]
Hannula M, Link S (2018) Automated reasoning about key sets. In: Automated reasoning - 9th international joint conference, IJCAR 2018, Held as Part of the Federated Logic Conference, FloC 2018, Proceedings, pp 47–63
[38]
Hartmann S, Link S (2004) Multi-valued dependencies in the presence of lists. In: Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, June 14–16, 2004, Paris, France, pp 330–341
[39]
Hartmann S, Link S (2007) Unlocking keys for XML trees. In: Database Theory - ICDT 2007, 11th international conference, Barcelona, Spain, January 10–12, 2007, Proceedings, pp 104–118
[40]
Hartmann S and Link S Efficient reasoning about a robust XML key fragment ACM Trans Database Syst 2009 34 2 1-33
[41]
Hartmann S, Link S (2009b) Expressive, yet tractable XML keys. In: EDBT 2009, 12th international conference on extending database technology, Saint Petersburg, Russia, March 24–26, 2009, Proceedings, pp 357–367
[42]
Hartmann S and Link S Numerical constraints on XML data Inf Comput 2010 208 5 521-544
[43]
Hartmann S, Link S (2010b) When data dependencies over SQL tables meet the logics of paradox and S-3. In: Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PoDS), pp 317–326
[44]
Hartmann S and Link S The implication problem of data dependencies over SQL table definitions: axiomatic, algorithmic and logical characterizations ACM Trans Database Syst 2012 37 2 1-40
[45]
Hartmann S, Köhler H, Link S et al (2008) On the notion of an XML key. In: Semantics in data and knowledge bases, third international workshop, SDKB 2008, Nantes, France, March 29, 2008, Revised Selected Papers, pp 103–112
[46]
Hartmann S, Link S, Trinh T (2010) Solving the implication problem for XML functional dependencies with properties. In: Logic, language, information and computation, 17th international workshop, WoLLIC 2010, Brasilia, Brazil, July 6–9, 2010. Proceedings, pp 161–175
[47]
Imielinski T, Jr. WL (1983) Incomplete information and dependencies in relational databases. In: SIGMOD’83, proceedings of annual meeting, San Jose, California, USA, May 23–26, 1983, pp 178–184
[48]
Johnson DS and Klug AC Testing containment of conjunctive queries under functional and inclusion dependencies J Comput Syst Sci 1984 28 1 167-189
[49]
Klug A and Price R Determining view dependencies using tableaux ACM Trans Database Syst 1982 7 3 361-380
[50]
Köhler H and Link S Armstrong axioms and Boyce-Codd-Heath normal form under bag semantics Inf Process Lett 2010 110 16 717-724
[51]
Köhler H, Link S (2016) Qualitative cleaning of uncertain data. In: Proceedings of the 25th ACM international conference on information and knowledge management, CIKM 2016, Indianapolis, IN, USA, October 24–28, 2016, pp 2269–2274
[52]
Köhler H and Link S Inclusion dependencies and their interaction with functional dependencies in SQL J Comput Syst Sci 2017 85 104-131
[53]
Köhler H and Link S SQL schema design: foundations, normal forms, and normalization Inf Syst 2018 76 88-113
[54]
Köhler H and Link S Possibilistic data cleaning IEEE Trans Knowl Data Eng 2022 34 12 5939-5950
[55]
Kolahi S Dependency-preserving normalization of relational and XML data J Comput Syst Sci 2007 73 4 636-647
[56]
Kossmann J, Papenbrock T, and Naumann F Data dependencies for query optimization: a survey VLDB J 2022 31 1 1-22
[57]
Levene M and Loizou G Database design for incomplete relations ACM Trans Database Syst 1999 24 1 80-125
[58]
Levene M and Loizou G A guided tour of relational databases and beyond 1999 Heidelberg Springer
[59]
Levene M and Loizou G A generalisation of entity and referential integrity in relational databases ITA 2001 35 2 113-127
[60]
Levene M and Vincent MW Justification for inclusion dependency normal form IEEE Trans Knowl Data Eng 2000 12 2 281-291
[61]
Lien E On the equivalence of database models J ACM 1982 29 2 333-362
[62]
Link S On the implication of multivalued dependencies in partial database relations Int J Found Comput Sci 2008 19 3 691-715
[63]
Link S Characterisations of multivalued dependency implication over undetermined universes J Comput Syst Sci 2012 78 4 1026-1044
[64]
Link S (2020) Neo4j keys. In: Conceptual modeling - 39th international conference, ER 2020, Vienna, Austria, November 3–6, 2020, Proceedings, pp 19–33
[65]
Link S (2022) Object normal form, fourth normal form and their application to database security. In: Conceptual modeling - 41st international conference, ER 2022, Hyderabad, India, October 17–20, 2022, Proceedings, pp 349–364
[66]
Link S and Prade H Possibilistic functional dependencies and their relationship to possibility theory IEEE Trans Fuzzy Syst 2016 24 3 757-763
[67]
Link S and Prade H Relational database schema design for uncertain data Inf Syst 2019 84 88-110
[68]
Link S (2021) Wei Z (2021) Logical schema design that quantifies update inefficiency and join efficiency. In: Li G, Li Z, Idreos S et al (eds) SIGMOD’21: International conference on management of data. Virtual Event, China, June 20–25, pp 1169–1181
[69]
Livshits E, Kimelfeld B, and Roy S Computing optimal repairs for functional dependencies ACM Trans Database Syst 2020 45 1 1-46
[70]
Mok WY Utilizing nested normal form to design redundancy free JSON schemas iJES 2016 4 4 21-25
[71]
Naumann F, Herschel M (2010) An introduction to duplicate detection. Synthesis Lectures on Data Management, Morgan & Claypool Publishers
[72]
Papadakis G, Ioannou E, Thanos E, et al (2021) The four generations of entity resolution. Synthesis Lectures on Data Management, Morgan & Claypool Publishers
[73]
Paredaens J, De Bra P, Gyssens M, et al. The Structure of the Relational Database Model 1989 Heidelberg Springer
[74]
Roblot T, Hannula M, and Link S Probabilistic cardinality constraints - validation, reasoning, and semantic summaries VLDB J 2018 27 6 771-795
[75]
Saha B, Srivastava D (2014) Data quality: The other face of big data. In: IEEE 30th international conference on data engineering, Chicago, ICDE 2014, IL, USA, March 31–April 4, 2014, pp 1294–1297
[76]
Skavantzos P, Zhao K, Link S (2021) Uniqueness constraints on property graphs. In: Advanced information systems engineering - 33rd international conference, CAiSE 2021, Melbourne, VIC, Australia, June 28 - July 2, 2021, Proceedings, pp 280–295
[77]
Suciu D, Olteanu D, Ré C, et al (2011) Probabilistic databases. Synthesis Lectures on Data Management, Morgan & Claypool Publishers
[78]
Thalheim B A complete axiomatization for full join dependencies in relations Bulletin of the EATCS 1984 24 109-114
[79]
Thalheim B On semantic issues connected with keys in relational databases permitting null values Elektron Inform Kybern 1989 25 1/2 11-20
[80]
Thalheim B Dependencies in relational databases 1991 Braunschweig Teubner
[81]
Thalheim B (1992) Fundamentals of cardinality constraints. In: ER, pp 7–23
[82]
Vincent MW A corrected 5NF definition for relational database design Theor Comput Sci 1997 185 2 379-391
[83]
Vincent MW, Liu J, and Liu C Strong FDs and their application to normal forms in XML ACM Trans Database Syst 2004 29 3 445-462
[84]
Vincent MW, Liu J, and Liu C Strong functional dependencies and their application to normal forms in XML ACM Trans Database Syst 2004 29 3 445-462
[85]
Wei Z and Link S Embedded functional dependencies and data-completeness tailored database design PVLDB 2019 12 11 1458-1470
[86]
Wei Z, Link S (2019b) A fourth normal form for uncertain data. In: Advanced information systems engineering - 31st international conference, CAiSE 2019, Rome, Italy, June 3–7, 2019, Proceedings, pp 295–311
[87]
Wei Z and Link S Embedded functional dependencies and data-completeness tailored database design ACM Trans Database Syst 2021 46 2 1-46
[88]
Wei Z, Leck U, and Link S Discovery and ranking of embedded uniqueness constraints PVLDB 2019 12 13 2339-2352
[89]
Wei Z, Leck U, Link S (2019b) Entity integrity, referential integrity, and query optimization with embedded uniqueness constraints. In: 35th IEEE international conference on data engineering, ICDE 2019, Macao, China, April 8–11, 2019, pp 1694–1697
[90]
Wu M (1992) The practical need for fourth normal form. In: ACM SIGCSE conference, pp 19–23
[91]
Zaniolo C Mixed transitivity for functional and multivalued dependencies in database relations Inf Process Lett 1980 10 1 32-34
[92]
Zaniolo C Database relations with null values J Comput Syst Sci 1984 28 1 142-166

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Knowledge and Information Systems
Knowledge and Information Systems  Volume 65, Issue 7
Jul 2023
399 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 25 January 2023
Accepted: 12 December 2022
Revision received: 07 December 2022
Received: 09 March 2022

Author Tags

  1. Axioms
  2. Duplicate
  3. Functional dependency
  4. Implication
  5. Inference
  6. Missing value
  7. Multivalued dependency
  8. Possibility theory
  9. SQL
  10. Uniqueness constraint
  11. Variety
  12. Veracity
  13. Volume

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media