-
Notifications
You must be signed in to change notification settings - Fork 9.6k
Promql: Prevent extrapolation of NH below zero #16192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Promql: Prevent extrapolation of NH below zero #16192
Conversation
@beorn7 I have some tests failing, and before I go fix them can you please confirm that this is conceptually right approach. Thanks! |
This is on my review list. Apologies for the delay. I'm still trying to get to this this week. |
promql/functions.go
Outdated
// Histogram total count clamping logic | ||
// Using the first sample's count as the master metric, we ensure that | ||
// if extrapolation would take the count below zero, we clamp the extrapolation. | ||
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to check samples.Histograms[0].H.Count >= 0
.
promql/functions.go
Outdated
extrapolateToInterval += durationToStart | ||
|
||
// Histogram bucket clamping logic | ||
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 && resultHistogram.Sum > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 && resultHistogram.Sum > 0 { | |
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 { |
Sum can be negative. That's fine.
promql/functions.go
Outdated
var totalSumDelta float64 | ||
for i, bucketRate := range resultHistogram.PositiveBuckets { | ||
// Calculate this bucket's proportion of the total rate | ||
bucketProportion := bucketRate / resultHistogram.Sum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Sum? If at all, it should be Count.
But I don't understand the logic of the "proportion of the rate" in the first place.
Maybe I haven't wrapped my head sufficiently far around it. But broadly, I think we have to do something different depending on whether we have clamped the durationToStart above.
If durationToZero > durationToStart
, we only have to manipulate the buckets if they would be extrapolated below zero at durationToStart
.
If durationToZero == durationToStart
, we have to manipulate all the buckets.
After the manipulation, the manipulated buckets (only) should extrapolate te exactly zero at durationToStart
.
I can try to cobble together the equation, but maybe this is already enough to put you on track (or put me on track if I have missed something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think you are totally right, I missed that we can actually make histograms schemas match(in hindsight it's obvious), and this is why I actually did this whole thing.
So we actually can match resultHistogram rate with first histogram, and then do all the calculations easily.
The only downside I think is that we will lose some precision in case where when resultHistogram.Schema > firstHistogram.Schema
Please let me know it is a problem or incorrect and I am missing something :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure.
Making all the schemas match is part of the normal rate calculation anyway. First we find the largest common schema, merge buckets in higher-res histograms to match that schema, and only then we start to do the math on all those histograms that now have the same schema.
The extrapolation logic only happens in that second part, so different schemas should not be an issue.
a02f905
to
c73bb76
Compare
Signed-off-by: Boris Beginin <gen3212@gmail.com>
c73bb76
to
ea7cdef
Compare
fixes #15976
Problem statement
When calculating rates for Native Histograms (NH), we need to prevent extrapolation below zero to ensure consistent results with classic histograms.
What we do now
Zero-point from
count
count
would hit 0.*Clamp buckets
Adjust
count
Bucket tweaks change the total count. We add the difference back to total
count