8000 Remove superfluous DISTINCT by dey4ss · Pull Request #2561 · hyrise/hyrise · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Remove superfluous DISTINCT #2561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 17, 2023
Merged

Remove superfluous DISTINCT #2561

merged 7 commits into from
May 17, 2023

Conversation

dey4ss
Copy link
Member
@dey4ss dey4ss commented Apr 12, 2023

Thanks to @Bouncner, I read the following note in a paper:

It took more than two years for PostgreSQL developers to implement a pattern that
removes the DISTINCT clause if the result is unique by definition, and this feature is
yet to be merged [26].

That's true and it does not look like they will even merge it soon: https://commitfest.postgresql.org/35/2433/

Thus, I was triggered to implement it in Hyrise and prove we can do this in less than half an hour. It does not change anything in benchmarks, but now, we have it.
(To be fair: The Postgres contributors had to implement passing UCCs in the LQP and tried multiple versions. Also, I needed more than half an hour with all tests and an edge case I forgot to cover.)

@dey4ss dey4ss added ReallyEasyToReview FullCI Run all CI tests (slow, but required for merge) labels Apr 12, 2023
@Bouncner
Copy link
Collaborator
Bouncner commented May 8, 2023

Thus, I was triggered to implement it in Hyrise and prove we can do this in less than half an hour.

Angeber.

Copy link
Collaborator
@Bouncner Bouncner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor things.

@@ -103,6 +91,34 @@ void DependentGroupByReductionRule::_apply_to_plan_without_subqueries(
};
auto group_by_columns = fetch_group_by_columns();

// Early exit (i): If this is an AggregateNode for SELECT DISTINCT (i.e., it has no aggregates) and the requested
// columns are already distinct, remove the whole node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entire?

Could you elaborate a little bit more here?

  • Why do we replace an aggregate with a projection?
  • Why do we use the group by count? Why does it need to be the same? (maybe explain for select distinct name from lineitem;?)

I would have expected to see something with "DISTINCT" here.

@dey4ss
Copy link
Member Author
dey4ss commented May 11, 2023

@Bouncner better?

Copy link
Collaborator
@Bouncner Bouncner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor note on a strange sounding sentencing. Otherwise ready to zünd.

@dey4ss dey4ss merged commit 6964ae3 into master May 17, 2023
@dey4ss dey4ss deleted the dey4ss/remove-distinct branch May 17, 2023 10:55
nikriek pushed a commit that referenced this pull request Oct 28, 2023
Implements a pattern that removes the DISTINCT clause if the result is unique by definition or previous GROUP BY clauses.
nikriek pushed a commit that referenced this pull request Nov 6, 2023
Implements a pattern that removes the DISTINCT clause if the result is unique by definition or previous GROUP BY clauses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FullCI Run all CI tests (slow, but required for merge) ReallyEasyToReview
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0