8000 scrape: provide a fallback format by alexgreenbank · Pull Request #15136 · prometheus/prometheus · GitHub

scrape: provide a fallback format #15136


Merged

merged 27 commits into main on Oct 18, 2024

Conversation

alexgreenbank
Member
@alexgreenbank alexgreenbank commented Oct 9, 2024

Fixes #13214

Provides a per-scrape-target setting for a fallback scrape format, used when the received Content-Type does not match an expected value or is missing/blank.

This is a breaking change, as some scrapes that worked with Prometheus 2.x may now require fallback_scrape_protocol to be set in order to force the scrape.

TODO

Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
@beorn7
Member
beorn7 commented Oct 9, 2024

🤔 I actually had a quite different idea of how to approach this.

  • I think the default behavior in v3 should be to fail the scrape if anything goes wrong during content negotiation, including not returning a content-type at all. That's the only safe behavior. (After all, scrape: Bail out of invalid scrape format rather than silently falling back to the classic text format #13214 was triggered by the insight that there is no safe default anymore, as a PromParser could parse OpenMetrics without error (or vice versa) but yield wrong results (most notably timestamps).)
  • If a fallback format is specified, the scrape should proceed, but the fallback should be logged. (Maybe we could consider not logging anything if the returned content-type is empty, to mirror the v2 behavior.)
  • I would prefer not to use a command-line flag, but rather a per-scrape-config setting (next to the existing scrape_protocols, e.g. fallback_scrape_protocol), roughly as sketched below.
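
To make that concrete, here is a minimal Go sketch of what such a per-scrape-config setting could look like; the struct, field names and yaml tags below are illustrative assumptions, not the actual config.ScrapeConfig definition from this PR.

```go
package config

// Sketch only: a fallback option sitting next to the existing
// scrape_protocols list. Real field names, types and validation may differ.
type ScrapeConfig struct {
	// Priority-ordered protocols to advertise in the Accept header.
	ScrapeProtocols []string `yaml:"scrape_protocols,omitempty"`
	// Parser to fall back to when the response carries a missing,
	// unparseable or otherwise unsupported Content-Type.
	// Empty means: fail the scrape (the proposed v3 default).
	ScrapeFallbackProtocol string `yaml:"fallback_scrape_protocol,omitempty"`
}
```

In a scrape config this would then surface as, for example, fallback_scrape_protocol: PrometheusText0.0.4 alongside scrape_protocols.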

@beorn7
Member
beorn7 commented Oct 9, 2024

Maybe we should get some more opinions on this to find out which way to go. @bwplotka you had some thoughts on the original issue, but everyone's feedback welcome.

@beorn7
Member
beorn7 commented Oct 9, 2024

More in favor of "have a scrape config option, not a command line flag": You might very well have some targets that serve OpenMetrics but do not return a content-type, while you might have other targets that serve the classic Prometheus text format but also do not return a content-type.

@alexgreenbank
Member Author

🤔 I actually had a quite different idea of how to approach this.

OK, that all makes sense, I'll knock up an alternative approach that is per scrape target (I did play with something like that originally) and takes into account these points.

@alexgreenbank
Member Author

For the protocol names, are the existing mime types appropriate for use as the values for fallback_scrape_protocol?

  • text/plain for prometheus
  • application/openmetrics-text for openmetrics
  • application/vnd.google.protobuf for protobuf

@bwplotka
Member

I agree with @beorn7's direction here: #15136 (comment) - this is what I would have in mind too.

For the protocol names, are the existing mime types appropriate for use as the values for fallback_scrape_protocol?
text/plain for prometheus
application/openmetrics-text for openmetrics
application/vnd.google.protobuf for protobuf

What do you mean? Why not reuse the existing protocol enums from scrape_protocols?

@alexgreenbank
Member Author
alexgreenbank commented Oct 10, 2024

What do you mean? Why not reuse the existing protocol enums from scrape_protocols?

Ah ok. I'd been looking at the various existing scrape tests which focused on dealing with the raw Content-Type values. (This includes text/plain; version=1.0.0; escaping=allow-utf-8 that has been added recently - but there hasn't been a corresponding change in the list of valid scrape_protocols in order to cover this. I should probably add this whilst I'm here.)

Yes, I'll use the same scrape_protocols enums.

So, for my brain:

  • scrape_protocols helps define the Accept header Prometheus sends when attempting the scrape (roughly as in the sketch after this list). It's not used at all when processing the scrape results.
  • fallback_scrape_protocol (using one of the same enums) will help determine the behaviour of Prometheus when parsing the scrape response if the returned Content-Type is missing, unparseable (as a mimeType) or not an expected value.
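
For intuition, here is a rough Go sketch of how a priority-ordered protocol list can be turned into an Accept header with decreasing q-values; the header Prometheus actually sends is built differently in detail (it also negotiates things like UTF-8 escaping), so treat this as an assumption-laden illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptHeader: earlier entries get higher q-values, and a catch-all */*
// with the lowest priority is appended at the end.
func acceptHeader(mediaTypes []string) string {
	parts := make([]string, 0, len(mediaTypes)+1)
	weight := len(mediaTypes) + 1
	for _, mt := range mediaTypes {
		parts = append(parts, fmt.Sprintf("%s;q=0.%d", mt, weight))
		weight--
	}
	parts = append(parts, "*/*;q=0.1")
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(acceptHeader([]string{
		"application/openmetrics-text;version=1.0.0",
		"text/plain;version=0.0.4",
	}))
	// application/openmetrics-text;version=1.0.0;q=0.3,text/plain;version=0.0.4;q=0.2,*/*;q=0.1
}
```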

v2 does the following:

  • No or blank Content-Type -> PromParser, no error logged
  • Unparseable Content-Type -> PromParser, error from mime.ParseMediaType() logged
  • mime.ParseMediaType(Content-Type) == application/openmetrics-text -> OpenMetricsParser, no error logged
  • mime.ParseMediaType(Content-Type) == application/vnd.google.protobuf -> ProtobufParser, no error logged
  • Anything else -> PromParser, no error logged

v3 should do:

  • No Content-Type, no fallback_scrape_protocol -> fail the scrape, log error about failed scrape
  • No Content-Type, fallback_scrape_protocol specified -> use fallback parser, log warning about use of fallback (**)
  • Unparseable Content-Type, no fallback_scrape_protocol -> fail the scrape, error logged about failed scrape and invalid/unparseable Content-Type
  • Unparseable Content-Type, fallback_scrape_protocol specified -> use fallback parser, log warning about use of fallback and invalid/unparseable Content-Type
  • mime.ParseMediaType(Content-Type) == application/openmetrics-text -> OpenMetricsParser, no error logged
  • mime.ParseMediaType(Content-Type) == application/vnd.google.protobuf -> ProtobufParser, no error logged
  • mime.ParseMediaType(Content-Type) == text/plain -> ~~ProtobufParser~~ PromParser, no error logged
  • Anything else (e.g. some unsupported Content-Type), no fallback_scrape_protocol -> fail the scrape, error logged about failed scrape and invalid/unparseable Content-Type
  • Anything else, fallback_scrape_protocol specified -> use fallback parser, log warning about use of fallback and invalid/unparseable Content-Type

The second v3 case (**) is the one where, as suggested above, we could choose not to log anything, to mimic the v2 behaviour.
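
A minimal Go sketch of that decision table (the function and the parser-name strings are illustrative, not the actual scrape-loop code in this PR):

```go
package main

import (
	"errors"
	"fmt"
	"mime"
)

// pickParser mirrors the v3 table above in simplified form: fail when there
// is no usable Content-Type and no fallback, otherwise warn and fall back.
func pickParser(contentType, fallback string) (string, error) {
	if contentType == "" {
		if fallback == "" {
			return "", errors.New("no Content-Type and no fallback_scrape_protocol configured")
		}
		return fallback, nil // log warning: using fallback parser
	}
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		if fallback == "" {
			return "", fmt.Errorf("unparseable Content-Type %q: %w", contentType, err)
		}
		return fallback, nil // log warning: unparseable Content-Type, using fallback
	}
	switch mediaType {
	case "application/openmetrics-text":
		return "OpenMetricsParser", nil
	case "application/vnd.google.protobuf":
		return "ProtobufParser", nil
	case "text/plain":
		return "PromParser", nil
	}
	if fallback == "" {
		return "", fmt.Errorf("unsupported Content-Type %q", contentType)
	}
	return fallback, nil // log warning: unsupported Content-Type, using fallback
}

func main() {
	fmt.Println(pickParser("application/openmetrics-text; version=1.0.0", ""))
	fmt.Println(pickParser("", "PromParser"))
}
```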

@beorn7
Member
beorn7 commented Oct 10, 2024

mime.ParseMediaType(Content-Type) == text/plain -> ProtobufParser, no error logged

I guess this should be PromParser instead of ProtobufParser.

Otherwise, sounds spot on.

About the (**) case: I wouldn't worry too much. The logging in v2 is just debug level. I don't think anyone will complain about this changing slightly.

The far more impactful change is that we might now fail scrapes that succeeded before, and a config change is needed to make them succeed again. And that is the reason why it would be great to get this into the v3 release.

Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
@alexgreenbank
Member Author

I guess this should be PromParser instead of ProtobufParser.

Indeed.

OK, revamped to be per scrape target.

Will go look for Jan's migration to 3.0 guide as it would be useful to add some comments in there.

I did add a new ScrapeProtocol for 1.0.0 which is UTF-8 aware: PrometheusText1.0.0. I assume we want to have this before PrometheusText0.0.4 in the ordering, so that a well-implemented client will see we want 1.0.0 with higher priority than 0.0.4.
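
For illustration, the resulting priority ordering could then look roughly like this; the enum strings follow the documented scrape_protocols values, but the exact default list shipped in 3.x is an assumption on my part.

```go
package config

// Sketch of a priority-ordered scrape_protocols list with the new
// UTF-8-aware text format placed ahead of the classic 0.0.4 one.
var exampleScrapeProtocols = []string{
	"OpenMetricsText1.0.0",
	"OpenMetricsText0.0.1",
	"PrometheusText1.0.0", // new: text/plain; version=1.0.0; escaping=allow-utf-8
	"PrometheusText0.0.4",
}
```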

Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
@alexgreenbank
Member Author
alexgreenbank commented Oct 16, 2024

Added a review comment with suggested text to Jan's 3.0 migration PR (#15099 <-- fixed link)

Signed-off-by: Alex Greenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
@alexgreenbank alexgreenbank marked this pull request as ready for review October 16, 2024 13:56
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: Alex Greenbank <alex.greenbank@grafana.com>
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
@alexgreenbank
Member Author

@bwplotka does this look good to you? If you have no objections, we can merge once my nits have been addressed.

OK, all nits addressed, tests rejigged, merge conflicts fixed and all should be good. Almost there!

Member
@beorn7 beorn7 left a comment

With no complaint received so far, let's 🚢.

@beorn7 beorn7 merged commit 421a3c2 into main Oct 18, 2024
44 checks passed
@beorn7 beorn7 deleted the alexg/bail-scarpe-silently-1 branch October 18, 2024 15:12
KeyOfSpectator added a commit to AliyunContainerService/prometheus that referenced this pull request Oct 23, 2024
* master: (667 commits)
  NHCB scrape: refactor state handling and speed up scrape test (prometheus#15193)
  feat(tools): add debug printouts to rules unit testing (prometheus#15196)
  docs: add keep_firing_for in alerting rules
  api: Add rule group pagination to list rules api (prometheus#14017)
  Update scrape/scrape.go
  benchmark, rename parser omtext_with_nhcb
  goimports run
  Better docstring on test
  Remove omcounterdata.txt as redundant
  Fix failing benchmarks
  Add unit test to show that current wrapper is sub-optimal
  Rename convert_classic_histograms to convert_classic_histograms_to_nhcb
  More followup to prometheus#15164
  Follow up prometheus#15178
  Followup to prometheus#15164
  test(cmd/prometheus): speed up test execution by t.Parallel() when possible
  feat: normalize "le" and "quantile" labels values upon ingestion
  scrape: provide a fallback format (prometheus#15136)
  Disallowing configure AM with the v1 api (prometheus#13883)
  feat: ProtobufParse.formatOpenMetricsFloat: improve float formatting by using strconv.AppendFloat instead of fmt.Sprint
  ...

# Conflicts:
#	go.mod
#	go.sum
Mindavi added a commit to Mindavi/nixpkgs that referenced this pull request Dec 2, 2024
…ptions

I chose to not constrain them to the supported values to be more flexible in supporting any
newly added values. But maybe it makes sense to constrain them, I'm not completely sure.

Ran into this because Prometheus 3.0 was introduced last week and my config broke,
since rtl_433 doesn't set the headers / content-type correctly.

Related-to: NixOS#358862
Related-to: prometheus/prometheus#15136
hsjobeki pushed a commit to hsjobeki/nixpkgs that referenced this pull request Dec 3, 2024
julienduchesne pushed a commit to julienduchesne/prometheus that referenced this pull request Dec 13, 2024
@WojciechKuk

Please add the possibility to use it at the global: level. @alexgreenbank

@beorn7
Member
beorn7 commented Jan 16, 2025

Note that this is a fallback for targets that are buggy (i.e. they do not correctly implement content negotiation). We intentionally removed the global default because it turned out to not be safe. Introducing a global option would partially defeat the purpose.

If so many of your targets are buggy that you need a global option, I recommend prioritizing fixing the targets.

stanhu added a commit to stanhu/sinatra that referenced this pull request Feb 3, 2025
This commit ensures that all parameters of `Content-Type` are
separated with semicolons (`;`) instead of commas (`,`).

[RFC 7231](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.5) says:

```
Content-Type = media-type

Media types are defined in [Section 3.1.1.1](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.1).
An example of the field is

Content-Type: text/html; charset=ISO-8859-4
```

[RFC 7231 3.1.1.1](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.1) says:

```
Media types define both a data format and various processing models:
   how to process that data in accordance with each context in which it
   is received.

     media-type = type "/" subtype *( OWS ";" OWS parameter )
     type       = token
     subtype    = token

   The type/subtype MAY be followed by parameters in the form of
   name=value pairs.

     parameter      = token "=" ( token / quoted-string )

   The type, subtype, and parameter name tokens are case-insensitive.
   Parameter values might or might not be case-sensitive, depending on
   the semantics of the parameter name.  The presence or absence of a
   parameter might be significant to the processing of a media-type,
   depending on its definition within the media type registry.

   A parameter value that matches the token production can be
   transmitted either as a token or within a quoted-string.  The quoted
   and unquoted values are equivalent.  For example, the following
   examples are all equivalent, but the first is preferred for
   consistency:

     text/html;charset=utf-8
     text/html;charset=UTF-8
     Text/HTML;Charset="utf-8"
     text/html; charset="utf-8"
```

According to [this link](https://stackoverflow.com/a/35879320), it
seems that there was prior confusion in early RFCs over this.  It
appears `,` was a mistake, and that `;` should always be used.

Most people probably haven't run into this because in order to trigger
this bug:

1. The Sinatra app needs to insert a `;` in the call to `content_type`
    (such as `content_type "text/plain; version=0.0.4"`). If you omit
    the `;` in the `Content-Type`, then everything is fine.

2. The client that talks to the app needs to reject `,` in the
`Content-Type`. Golang's `mime.ParseMediaType` appears to reject
`Content-Type` values that contain `,` but with the introduction of
prometheus/prometheus#15136 Prometheus v3
started to fail hard when this occurred.

Closes sinatra#2076
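
To see why the comma matters here, a small Go check with the standard library's mime.ParseMediaType (the output comments reflect my reading of the stdlib behaviour, not output from this PR's tests):

```go
package main

import (
	"fmt"
	"mime"
)

func main() {
	// Semicolon-separated parameters parse cleanly.
	mt, params, err := mime.ParseMediaType("text/plain; version=0.0.4")
	fmt.Println(mt, params, err) // text/plain map[version:0.0.4] <nil>

	// A comma after the subtype makes the value unparseable as a single
	// media type, so Prometheus v3 fails such a scrape unless
	// fallback_scrape_protocol is configured.
	_, _, err = mime.ParseMediaType("text/plain, version=0.0.4")
	fmt.Println(err) // non-nil error
}
```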
stanhu added a commit to stanhu/sinatra that referenced this pull request Feb 3, 2025
stanhu added a commit to stanhu/sinatra that referenced this pull request Feb 3, 2025
dentarg pushed a commit to sinatra/sinatra that referenced this pull request Feb 9, 2025
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

scrape: Bail out of invalid scrape format rather than silently falling back to the classic text format
4 participants