8000 [Feature]support Agg if by before-Sunrise · Pull Request #58336 · StarRocks/starrocks · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Feature]support Agg if #58336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 13, 2025
Merged

[Feature]support Agg if #58336

merged 13 commits into from
May 13, 2025

Conversation

before-Sunrise
Copy link
Contributor
@before-Sunrise before-Sunrise commented Apr 24, 2025

Why I'm doing:

for aggregation functions in same agg node, if any agg function need different data, we can give it a predicate, only data satisfied the predicate can be updated in agg state for this agg function.

What I'm doing:

agg calculation phase right now like below:

  1. calculate group by columns and agg input column
  2. allocate agg state in hashmap accroding to group by columns
  3. update agg state for every row accroding to agg input column

to support agg if, which means only valid data filterd by predicate can be updated into agg state in phase3. So I wrap every agg function with agg_if and make agg_if's child agg function always nullable.
agg if has one more arg at the last whose type is boolean like below:
select sum_if(ss_item_sk,ss_quantity between 1 and 20) from store_sales

agg_if's child agg function is alwasy nullable agg function, and this pr merge predicate column with first nullable arg column, then call child's nullable agg function. By this way, we can avoid too many vitual function call

right now don't support aggif(distinct)

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.5
    • 3.4
    • 3.3
    • 3.2
    • 3.1

@before-Sunrise before-Sunrise requested review from a team as code owners April 24, 2025 02:44
const AggStateDesc _agg_state_desc;
const AggregateFunction* _function;
};
} // namespace starrocks No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Memory corruption due to incorrect array size calculation.

You can modify the code like this:

// ...
void update_help(FunctionContext* ctx, size_t chunk_size, const Column** columns, const Column** replace_columns,
                 ColumnPtr& newNullableColumn) const {
    auto fake_null_column = NullColumn::create(chunk_size, 0);
    uint8_t* __restrict fake_null_column_raw_data = fake_null_column->mutable_raw_data();

    auto column_size = ctx->get_num_args(); // Corrected size calculation, dropped "+1"
    DCHECK(column_size >= 1);

// ...

8000
if (!(argFn instanceof AggregateFunction aggFunc)) {
return null;
}
result = AggStateIf.of(aggFunc);
}

if (result.isEmpty()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Incorrect handling of function parameters when creating oldFunctionParams.

You can modify the code like this:

            FunctionParams oldFunctionParams =
                    new FunctionParams(params.isStar(), params.exprs().subList(1, params.exprs().size()),
                            params.getExprsNames() == null ? null :
                                    params.getExprsNames().subList(1, params.getExprsNames().size()),
                            params.isDistinct(), params.getOrderByElements() == null ? null :
                            params.getOrderByElements().subList(1, params.getOrderByElements().size()));

This adjusts the sublists to correctly exclude the first parameter, which should be a boolean when dealing with AggStateIf.

}
}
offset += batch_nums;
}
#endif

for (size_t i = offset; i < chunk_size; ++i) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Mismatched handling of data_column pointer usage, where data_columns[1] was declared but &data_column is being passed.

You can modify the code like this:

void update(FunctionContext* ctx, const Column** columns, AggDataPtr __restrict state,
                 size_t row_num) const override {
        DCHECK_EQ(1, ctx->get_num_args());
        // This container stores the columns we really pass to the nested function.
        const Column* data_columns[1];
        
        ...
        
        // Ensure correct usage of `data_columns`
                           this->nested_function->update(ctx, data_columns, 
                                                         this->data(states[i] + state_offset).mutable_nest_state(), i);
        
        ...
}

Additionally, ensure all instances of passing &data_column within the loops are corrected to potentially data_columns. Adjust accordingly depending on surrounding logic and context.

@before-Sunrise before-Sunrise requested review from a team as code owners April 24, 2025 08:59
@before-Sunrise before-Sunrise changed the title [Feature][WIP] support Agg if [Feature]support Agg if Apr 28, 2025
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
fix comments

Signed-off-by: before-Sunrise <unclejyj@gmail.com>
satanson
satanson previously approved these changes May 8, 2025
rename 8000
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Copy link
sonarqubecloud bot commented May 9, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
6.9% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link
github-actions bot commented May 9, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link
github-actions bot commented May 9, 2025

[FE Incremental Coverage Report]

pass : 114 / 120 (95.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/catalog/combinator/AggStateUtils.java 22 25 88.00% [117, 218, 221]
🔵 com/starrocks/catalog/combinator/AggStateIf.java 40 43 93.02% [62, 63, 64]
🔵 com/starrocks/analysis/FunctionParams.java 7 7 100.00% []
🔵 com/starrocks/catalog/combinator/AggStateDesc.java 9 9 100.00% []
🔵 com/starrocks/sql/analyzer/FunctionAnalyzer.java 14 14 100.00% []
🔵 com/starrocks/sql/optimizer/rule/transformation/RewriteCountIfFunction.java 14 14 100.00% []
🔵 com/starrocks/catalog/FunctionSet.java 4 4 100.00% []
🔵 com/starrocks/sql/analyzer/PolymorphicFunctionAnalyzer.java 2 2 100.00% []
🔵 com/starrocks/catalog/Function.java 2 2 100.00% []

Copy link
github-actions bot commented May 9, 2025

[BE Incremental Coverage Report]

pass : 116 / 140 (82.86%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/exprs/agg/agg_state_if.h 93 114 81.58% [47, 48, 49, 51, 53, 56, 57, 58, 63, 65, 67, 68, 73, 181, 196, 198, 199, 201, 203, 204, 211]
🔵 be/src/exec/aggregator.cpp 21 24 87.50% [561, 562, 563]
🔵 be/src/column/nullable_column.h 1 1 100.00% []
🔵 be/src/exprs/condition_expr.cpp 1 1 100.00% []

@@ -178,6 +179,19 @@ public static void analyze(FunctionCallExpr functionCallExpr) {
throw new SemanticException(String.format("Resolved function %s has no wildcard decimal as return type",
fn.functionName()), functionCallExpr.getPos());
}
} else if (fn instanceof AggStateIf) {
FunctionName argFuncNameWithoutIf =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's best to analyze predicate expr

@satanson satanson merged commit 64e24d7 into StarRocks:main May 13, 2025
65 of 66 checks passed
@wanpengfei-git wanpengfei-git mentioned this pull request May 13, 2025
1 task
Mesut-Doner pushed a commit to Mesut-Doner/starrocks that referenced this pull request May 21, 2025
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support agg if
5 participants
0