[DYOD] Maintain UCCs in face of potentially changing data by j-hellenberg · Pull Request #2599 · hyrise/hyrise · GitHub

[DYOD] Maintain UCCs in face of potentially changing data #2599


Open · wants to merge 73 commits into base: master

@j-hellenberg (Contributor) commented Aug 2, 2023

A UCC (Unique Column Combination) states that the values of a column, or the value combinations of a set of columns, are unique across all rows of a table. UCCs can be given by the schema (a primary key guarantees uniqueness), but they also occur "incidentally" in real-world data.
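A minimal C++ sketch of what it means for a column combination to be a UCC, assuming a toy row-oriented table of strings (Hyrise's real storage is columnar and chunked). `is_ucc`, `Row`, and `Table` are hypothetical names, and NULL handling is deliberately ignored. The check collects each row's value combination in a set and fails on the first repeat:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy row-oriented table for illustration only (Hyrise stores data column-wise in chunks).
using Row = std::vector<std::string>;
using Table = std::vector<Row>;

// Returns true if the value combinations in `column_ids` are unique across all rows, i.e., if
// `column_ids` currently forms a UCC of `table`. NULL semantics are ignored in this sketch.
bool is_ucc(const Table& table, const std::vector<std::size_t>& column_ids) {
  auto seen = std::set<std::vector<std::string>>{};
  for (const auto& row : table) {
    auto key = std::vector<std::string>{};
    key.reserve(column_ids.size());
    for (const auto column_id : column_ids) {
      assert(column_id < row.size());
      key.push_back(row[column_id]);
    }
    // insert() returns {iterator, false} if the same combination was seen before.
    if (!seen.insert(std::move(key)).second) {
      return false;
    }
  }
  return true;
}
```

Appending a row such as `{"3", "c"}` to a table whose first column was unique makes `is_ucc(table, {0})` return false, which is exactly how a previously discovered UCC can silently become invalid.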

Since the optimizer can use an "incidental" UCC just as it would use a primary key constraint, we want to detect these UCCs as well.

Before this PR, Hyrise was already capable of detecting such UCCs. It did so by generating UNIQUE constraints on a table, which are translated into UCCs during query optimization. However, this discovery assumed that the table's data never changes. If the data changed afterwards and violated a previously discovered "incidental" UCC (e.g., by inserting a duplicate value into the targeted column), using that UCC for query optimization would lead to incorrect query results.

Therefore, in this PR, we assign a validation commit ID to each TableKeyConstraint (UCC), so that it may only be used for optimization if the table has not been modified since this commit ID (note that every modifying transaction increments the global commit ID). We then regularly revalidate the UCC and check whether its validity can be extended to later commit IDs.
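The usability rule described above could look roughly like the following sketch. It assumes that a validation at commit ID c observes exactly the changes committed with IDs <= c, and that the caller knows the highest commit ID that modified the table. `DiscoveredUcc`, `usable_for_optimization`, and `record_validation` are hypothetical names, not Hyrise's API:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified alias for illustration; not Hyrise's actual type definition.
using CommitID = std::uint32_t;

// Hypothetical stand-in for a discovered (not schema-given) UCC.
struct DiscoveredUcc {
  // Commit ID of the snapshot at which the UCC was last confirmed; nullopt if never confirmed.
  std::optional<CommitID> last_validated_on;
};

// `last_table_modification` is the highest commit ID of any transaction that inserted or deleted
// rows of the table. If it is not newer than the validation snapshot, the table still contains
// exactly the data that was checked, so the UCC still holds.
bool usable_for_optimization(const DiscoveredUcc& ucc, CommitID last_table_modification) {
  return ucc.last_validated_on.has_value() && last_table_modification <= *ucc.last_validated_on;
}

// Called by a (periodic) revalidation run that checked uniqueness at snapshot `snapshot_cid`.
// A successful check moves the validity forward. A failed one keeps the old commit ID, which is
// already outdated because some later modification must have introduced the duplicate.
void record_validation(DiscoveredUcc& ucc, bool still_unique, CommitID snapshot_cid) {
  if (!still_unique) {
    return;
  }
  assert(!ucc.last_validated_on || snapshot_cid >= *ucc.last_validated_on);
  ucc.last_validated_on = snapshot_cid;
}
```

Keeping only a single commit ID per constraint makes the optimizer-side check a constant-time comparison; the expensive uniqueness check is moved entirely into the revalidation runs.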

For optimization purposes, we also make use of the chunks' MVCC (Multi-Version Concurrency Control) data, which includes, among other things, the commit IDs at which data was last inserted into or deleted from each chunk (note that Hyrise models modifications as delete + insert).
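One way to derive the table's last modification and to limit revalidation work from per-chunk MVCC data is sketched below. `ChunkMvccSummary` and both functions are hypothetical; the sketch only assumes that, for each chunk, the highest inserting and the highest deleting commit IDs are known:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified alias for illustration; not Hyrise's actual type definition.
using CommitID = std::uint32_t;

// Hypothetical per-chunk summary derived from MVCC data: the highest commit ID that inserted a
// row into the chunk and the highest commit ID that deleted a row from it (0 = none so far).
struct ChunkMvccSummary {
  CommitID max_insert_cid = 0;
  CommitID max_delete_cid = 0;
};

// Highest commit ID that modified any chunk. Since an update is modeled as delete + insert, it is
// covered by the two fields above.
CommitID last_table_modification(const std::vector<ChunkMvccSummary>& chunks) {
  auto result = CommitID{0};
  for (const auto& chunk : chunks) {
    result = std::max({result, chunk.max_insert_cid, chunk.max_delete_cid});
  }
  return result;
}

// If the table was unique at snapshot `validated_cid`, a duplicate can only involve a row inserted
// afterwards (deleting rows never creates duplicates). Only chunks with such inserts need their
// values probed against the rest of the table when revalidating.
std::vector<std::size_t> chunks_to_probe(const std::vector<ChunkMvccSummary>& chunks, CommitID validated_cid) {
  auto chunk_ids = std::vector<std::size_t>{};
  for (auto chunk_id = std::size_t{0}; chunk_id < chunks.size(); ++chunk_id) {
    if (chunks[chunk_id].max_insert_cid > validated_cid) {
      chunk_ids.push_back(chunk_id);
    }
  }
  return chunk_ids;
}
```

Restricting probing to chunks with newer inserts is sound only if the previous validation succeeded; for a UCC that has never been confirmed, the whole table has to be checked.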

j-hellenberg marked this pull request as ready for review August 2, 2023 16:12
j-hellenberg added the FullCI label Aug 2, 2023
SanJSp self-requested a review August 4, 2023 16:36
@SanJSp (Contributor) left a comment

Hey all, I reviewed your code on behalf of Team 3.
First of all: Good job with the implementation!

I mainly checked for Hyrise coding standards, which were all followed pretty well. I even started digging through your comments to find something I could pick on 😁, so I think that's a pretty good sign. I've also added some comments about `const` along the way, though I wasn't always sure it's applicable. Feel free to try them out: if they work, apply them; if not, so be it 😊

I will take a look at the other PR as well 😊

Base automatically changed from dey4ss/set_cids to master August 17, 2023 12:26
j-hellenberg and others added 8 commits August 22, 2023 11:20
… become invalid
…id using MVCC data
…Cs becoming invalid and vice versa
Co-authored-by: kalathen <kalathen@jyu.fi>
Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
j-hellenberg force-pushed the j-hellenberg/maintain_data_dependencies branch from d214251 to b3b721c on August 22, 2023 09:21
@dey4ss (Member) left a comment

I have a few suggestions, but nothing deal-breaking. Nice :)

@dey4ss (Member) commented Apr 30, 2024

Putting this on hold until the end of August. We plan to incorporate this feature, but there is currently no capacity for it. I will work on it when I'm back from the internship.

dey4ss closed this Apr 30, 2024
dey4ss reopened this Apr 30, 2024
dey4ss marked this pull request as draft April 30, 2024 17:10
Rob2U and others added 21 commits May 20, 2025 23:32
…eyConstraint is schema-given. Also add `ETERNAL_COMMIT_ID` and `INITIAL_COMMIT_ID` to types.hpp
Co-authored-by: Daniel Lindner <27929897+dey4ss@users.noreply.github.com>
… key constraints that are confidently valid for optimization
Base automatically changed from robert/add-locking-for-table-constraints to master June 4, 2025 15:08
Rob2U marked this pull request as ready for review June 10, 2025 08:58
Rob2U requested a review from dey4ss June 10, 2025 08:58
@dey4ss (Member) left a comment

A few things I'd like to discuss before continuing with the tests:

 - rename `TableKeyConstraint::is_valid` to `last_validation_result` and use a `ValidationResultType` enum
 - skip probing chunks that have had no inserts since the last UCC validation
 - assert that key constraints of type primary key are always schema-given
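The first and third suggestions could translate into roughly the following shape. Only the names `last_validation_result` and `ValidationResultType` come from the review; `KeyConstraintSketch`, the enumerators, and the other members are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified alias for illustration; not Hyrise's actual type definition.
using CommitID = std::uint32_t;

// Assumed constraint kinds: a primary key or a (declared or discovered) UNIQUE constraint.
enum class KeyConstraintType { PRIMARY_KEY, UNIQUE };

// Enum name taken from the review; the enumerators are hypothetical.
enum class ValidationResultType { Valid, Invalid };

// Hypothetical, simplified stand-in for TableKeyConstraint after the suggested changes.
class KeyConstraintSketch {
 public:
  KeyConstraintSketch(KeyConstraintType type, bool schema_given) : _type{type}, _schema_given{schema_given} {
    // Suggested invariant: a primary key always comes from the schema and is never discovered.
    assert(_type != KeyConstraintType::PRIMARY_KEY || _schema_given);
  }

  // Replaces a boolean `is_valid`: stores the outcome of the latest validation run together with
  // the commit ID of the snapshot that run checked.
  void set_last_validation_result(ValidationResultType result, CommitID validated_on) {
    _last_validation_result = result;
    _last_validated_on = validated_on;
  }

  // std::nullopt means the constraint has not been validated yet.
  std::optional<ValidationResultType> last_validation_result() const {
    return _last_validation_result;
  }

  CommitID last_validated_on() const {
    return _last_validated_on;
  }

  KeyConstraintType type() const {
    return _type;
  }

  bool is_schema_given() const {
    return _schema_given;
  }

 private:
  KeyConstraintType _type;
  bool _schema_given;
  std::optional<ValidationResultType> _last_validation_result;
  CommitID _last_validated_on{0};
};
```

Wrapping the result in `std::optional` distinguishes "never validated" from "validated and found invalid", which a plain `bool` cannot express.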
@dey4ss (Member) left a comment

Some comments on simplifying the test code.

Rob2U and others added 2 commits June 18, 2025 14:34
Co-authored-by: Daniel Lindner <27929897+dey4ss@users.noreply.github.com>
Labels: FullCI: Run all CI tests (slow, but required for merge)
Projects: None yet
6 participants