-
Notifications
You must be signed in to change notification settings - Fork 2k
[Feature]support Agg if #58336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]support Agg if #58336
Conversation
const AggStateDesc _agg_state_desc; | ||
const AggregateFunction* _function; | ||
}; | ||
} // namespace starrocks No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Memory corruption due to incorrect array size calculation.
You can modify the code like this:
// ...
void update_help(FunctionContext* ctx, size_t chunk_size, const Column** columns, const Column** replace_columns,
ColumnPtr& newNullableColumn) const {
auto fake_null_column = NullColumn::create(chunk_size, 0);
uint8_t* __restrict fake_null_column_raw_data = fake_null_column->mutable_raw_data();
auto column_size = ctx->get_num_args(); // Corrected size calculation, dropped "+1"
DCHECK(column_size >= 1);
// ...
if (!(argFn instanceof AggregateFunction aggFunc)) { | ||
return null; | ||
} | ||
result = AggStateIf.of(aggFunc); | ||
} | ||
|
||
if (result.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Incorrect handling of function parameters when creating oldFunctionParams
.
You can modify the code like this:
FunctionParams oldFunctionParams =
new FunctionParams(params.isStar(), params.exprs().subList(1, params.exprs().size()),
params.getExprsNames() == null ? null :
params.getExprsNames().subList(1, params.getExprsNames().size()),
params.isDistinct(), params.getOrderByElements() == null ? null :
params.getOrderByElements().subList(1, params.getOrderByElements().size()));
This adjusts the sublists to correctly exclude the first parameter, which should be a boolean when dealing with AggStateIf.
} | ||
} | ||
offset += batch_nums; | ||
} | ||
#endif | ||
|
||
for (size_t i = offset; i < chunk_size; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Mismatched handling of data_column
pointer usage, where data_columns[1]
was declared but &data_column
is being passed.
You can modify the code like this:
void update(FunctionContext* ctx, const Column** columns, AggDataPtr __restrict state,
size_t row_num) const override {
DCHECK_EQ(1, ctx->get_num_args());
// This container stores the columns we really pass to the nested function.
const Column* data_columns[1];
...
// Ensure correct usage of `data_columns`
this->nested_function->update(ctx, data_columns,
this->data(states[i] + state_offset).mutable_nest_state(), i);
...
}
Additionally, ensure all instances of passing &data_column
within the loops are corrected to potentially data_columns
. Adjust accordingly depending on surrounding logic and context.
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
96cf513
to
5e411fe
Compare
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Signed-off-by: before-Sunrise <unclejyj@gmail.com> fix comments Signed-off-by: before-Sunrise <unclejyj@gmail.com>
|
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 114 / 120 (95.00%) file detail
|
[BE Incremental Coverage Report]✅ pass : 116 / 140 (82.86%) file detail
|
@@ -178,6 +179,19 @@ public static void analyze(FunctionCallExpr functionCallExpr) { | |||
throw new SemanticException(String.format("Resolved function %s has no wildcard decimal as return type", | |||
fn.functionName()), functionCallExpr.getPos()); | |||
} | |||
} else if (fn instanceof AggStateIf) { | |||
FunctionName argFuncNameWithoutIf = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's best to analyze predicate expr
Signed-off-by: before-Sunrise <unclejyj@gmail.com>
Why I'm doing:
for aggregation functions in same agg node, if any agg function need different data, we can give it a predicate, only data satisfied the predicate can be updated in agg state for this agg function.
What I'm doing:
agg calculation phase right now like below:
to support agg if, which means only valid data filterd by predicate can be updated into agg state in phase3. So I wrap every agg function with agg_if and make agg_if's child agg function always nullable.
agg if has one more arg at the last whose type is boolean like below:
select sum_if(ss_item_sk,ss_quantity between 1 and 20) from store_sales
agg_if's child agg function is alwasy nullable agg function, and this pr merge predicate column with first nullable arg column, then call child's nullable agg function. By this way, we can avoid too many vitual function call
right now don't support aggif(distinct)
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: