8000 Promql: Prevent extrapolation of NH below zero by gen1321 · Pull Request #16192 · prometheus/prometheus · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Promql: Prevent extrapolation of NH below zero #16192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gen1321
Copy link
@gen1321 gen1321 commented Mar 8, 2025

fixes #15976

Problem statement

When calculating rates for Native Histograms (NH), we need to prevent extrapolation below zero to ensure consistent results with classic histograms.

What we do now

  1. Zero-point from count

    • Find when a linear back-extrapolation of count would hit 0.
    • If that moment lies inside the query range, we clamp the lower boundary to it.
  2. *Clamp buckets

    • Check if bucket would extrapolate below zero or if we clamped total count - if so reduce the slope.
  3. Adjust count
    Bucket tweaks change the total count. We add the difference back to total count

@gen1321
Copy link
Author
gen1321 commented Mar 8, 2025

@beorn7 I have some tests failing, and before I go fix them can you please confirm that this is conceptually right approach. Thanks!

@beorn7 beorn7 self-requested a review March 11, 2025 10:50
@beorn7
Copy link
Member
beorn7 commented Mar 12, 2025

This is on my review list. Apologies for the delay. I'm still trying to get to this this week.

// Histogram total count clamping logic
// Using the first sample's count as the master metric, we ensure that
// if extrapolation would take the count below zero, we clamp the extrapolation.
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to check samples.Histograms[0].H.Count >= 0.

extrapolateToInterval += durationToStart

// Histogram bucket clamping logic
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 && resultHistogram.Sum > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 && resultHistogram.Sum > 0 {
if isCounter && resultHistogram != nil && resultHistogram.Count > 0 {

Sum can be negative. That's fine.

var totalSumDelta float64
for i, bucketRate := range resultHistogram.PositiveBuckets {
// Calculate this bucket's proportion of the total rate
bucketProportion := bucketRate / resultHistogram.Sum
Copy link
Member
@beorn7 beorn7 Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Sum? If at all, it should be Count.
But I don't understand the logic of the "proportion of the rate" in the first place.

Maybe I haven't wrapped my head sufficiently far around it. But broadly, I think we have to do something different depending on whether we have clamped the durationToStart above.

If durationToZero > durationToStart, we only have to manipulate the buckets if they would be extrapolated below zero at durationToStart.

If durationToZero == durationToStart, we have to manipulate all the buckets.

After the manipulation, the manipulated buckets (only) should extrapolate te exactly zero at durationToStart.

I can try to cobble together the equation, but maybe this is already enough to put you on track (or put me on track if I have missed something).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think you are totally right, I missed that we can actually make histograms schemas match(in hindsight it's obvious), and this is why I actually did this whole thing.

So we actually can match resultHistogram rate with first histogram, and then do all the calculations easily.

The only downside I think is that we will lose some precision in case where when resultHistogram.Schema > firstHistogram.Schema

Please let me know it is a problem or incorrect and I am missing something :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure.

Making all the schemas match is part of the normal rate calculation anyway. First we find the largest common schema, merge buckets in higher-res histograms to match that schema, and only then we start to do the math on all those histograms that now have the same schema.

The extrapolation logic only happens in that second part, so different schemas should not be an issue.

@gen1321 gen1321 force-pushed the promql/prevent-negative-extrapolation-for-native-histograms branch 4 times, most recently from a02f905 to c73bb76 Compare May 16, 2025 08:20
@gen1321 gen1321 requested a review from beorn7 May 16, 2025 08:30
@gen1321 gen1321 marked this pull request as ready for review May 16, 2025 08:30
@gen1321 gen1321 requested a review from roidelapluie as a code owner May 16, 2025 08:30
Signed-off-by: Boris Beginin <gen3212@gmail.com>
@gen1321 gen1321 force-pushed the promql/prevent-negative-extrapolation-for-native-histograms branch from c73bb76 to ea7cdef Compare May 16, 2025 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PromQL(histograms): Prevent extrapolation below zero
2 participants
0