-
Notifications
You must be signed in to change notification settings - Fork 161
[DYOD] Maintain UCCs in face of potentially changing data 8000 #2599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey all, I reviewed your code in the name of Team 3.
First of all: Good job with the implementation!
I checked mainly for hyrise coding standards, which were all pretty well followed. I already started to look inside your comments to find stuff I could pick on 😁 So I think this is a pretty good sign. Likewise, I've tried to add some comments regarding const
in between, sometimes I was not too sure if it's applicable. Feel free to try it out and if it works implement it, if not, so be it 😊
I will take a look at the other PR as well 😊
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
… become invalid Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
…id using MVCC data Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
…Cs becoming invalid and vice versa Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
Co-authored-by: kalathen <kalathen@jyu.fi> Co-authored-by: Margarete <margarete.dippel@student.hpi.de>
d214251
to
b3b721c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few suggestions, but nothing deal-breaking. Nice :)
Put this on hold until end of August. We plan to incorporate this feature, but there is currently no capacity for it. Will work on it when back from the internship. |
…D` from mvcc_data to types.h
…eyConstraint is schema-given. Also add `ETERNAL_COMMIT_ID` and `INITIAL_COMMIT_ID` to types.hpp
Co-authored-by: Daniel Lindner <27929897+dey4ss@users.noreply.github.com>
…berg/maintain_data_dependencies
…berg/maintain_data_dependencies
… key constraints that are confidently valid for optimization
…berg/maintain_data_dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things I'd like to discuss before continuing with the tests
- rename `TableKeyConstraint::is_valid` to `last_validation_result` and use `ValidationResultType` enum - skip probing for chunks with no inserts since last ucc validation - assert that key constraints of type primary key are always schema-given
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments on simplifying the test code
Co-authored-by: Daniel Lindner <27929897+dey4ss@users.noreply.github.com>
UCCs
(Unique Column Combinations) indicate the uniqueness across all entries of a column (or a set of columns). UCCs can be given by the schema (a primary key guarantees uniqueness), but also happen "incidentally" in real-world data.Because UCCs can be used for query optimization, we want to detect these "incidental" UCCs as well, because, for optimization purposes, we can use them just as we would use a primary key constraint.
Before this PR, hyrise was already capable to detect such UCCs. It did so by actually generating
UNIQUE
constraints on a table, which are translated into UCCs during query optimization. However, this discovery happened under the assumption that the data of the table never changes. This means, if the data of the table were to change and violate the previously discovered "incidental" UCC by creating a duplicate value in the targeted column, this using the UCC for query optimization would lead to incorrect query results.Therefore, in this MR, we assign a validation
commit ID
to the TableKeyConstraint (UCC) such that it may be only used for optimization if the table has not seen any changes since thiscommit ID
(note that every modifying transaction increments the globalcommit ID
). We will then regularly revalidate the UCC and see if we can expand the constraint to larger commit IDs.For optimization purposes, we also make use of the
MVCC
(Multi-Version Concurrency Control) data of chunks, which includes but is not limited to thecommit IDs
of when any data was last inserted to or deleted from the chunk (note that hyrise models modifications as delete+insert).