8000 fix(config): prohibit multiline interpolation by pront · Pull Request #22995 · vectordotdev/vector · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix(config): prohibit multiline interpolation #22995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

B 8000 y clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

pront
Copy link
Member
@pront pront commented May 5, 2025

Summary

The envinronment variable (and the secrets) resolution now happens after the config string is parsed into a TOML table. The interpolation now happens on the table keys and values. As a side effect, this fixes a bug where comments referring to env vars (or secrets) that don't exist caused a config build error.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Injecting whole blocks now results in error e.g.

export SOURCES_BLOCK="sources:\"
  demo:
    type: demo_logs
    format: json
    interval: 1
${SOURCES_BLOCK}

The config above will fail to load.

How did you test this PR?

Vector config:

secret:
  backend_1:
    type: file
    path: /Users/pavlos.rontidis/CLionProjects/vector/pront/data/top_secret.json

# ${IN_COMMENT_BUT_DOES_NOT_EXIST}

sources:
  demo_logs_1:
    type: demo_logs
    format: json
    interval: 60
    # noop: "SECRET[i_dont_exist]"

transforms:
  t0:
    inputs:
      - demo_logs_1
    type: "remap"
    source: |
      .host = "${HOSTNAME}"
      .environment = "${ENV:?emv must be supplied}"
      .tenant = "${TENANT:-undefined}"
      .day = "SECRET[backend_1.day]"


sinks:
  s0:
    type: "SECRET[backend_1.type]"
    inputs: [ "t0" ]
    encoding:
      codec: json
      json:
        pretty: true

Output:

{
  "day": "monday",
  "environment": "${ENV:?emv must be supplied}",
  "host": "${HOSTNAME}",
  "message": "{\"host\":...",
  "service": "vector",
  "source_type": "demo_logs",
  "tenant": "${TENANT:-undefined}",
  "timestamp": "2025-05-05T20:52:42.024064Z"
}

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • The CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
      • ./scripts/check_changelog_fragments.sh
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

References

@datadog-vectordotdev
Copy link
datadog-vectordotdev bot commented May 5, 2025

Datadog Report

Branch report: pront/intepolation-after-parsing
Commit report: d5d1ac2
Test service: vector

✅ 0 Failed, 7 Passed, 0 Skipped, 25.45s Total Time

@pront pront changed the title fix(config): disallow multiline interpolation fix(config): prohibit multiline interpolation May 6, 2025
@@ -0,0 +1,297 @@
use std::{collections::HashMap, sync::LazyLock};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved vars.rs to this location. But kept the same use/export.

deserialize_table(table)
}

pub fn resolve_environment_variables(table: Table) -> Result<Table, Vec<String>> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was extracted from prepare_input

@pront pront marked this pull request as ready for review May 6, 2025 21:19
@pront pront requested a review from a team as a code owner May 6, 2025 21:19
@pront pront requested a review from jszwedko May 6, 2025 21:20
Copy link
Member
@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'm glad to see us finally addressing this rough edge.

I think this works for string fields, but it doesn't seem to interpolate non-string fields correctly. For example:

sources:
  source0:
    type: demo_logs
    count: ${COUNT}
    format: json
    decoding:
      codec: json
    framing:
      method: bytes
    interval: 1.0
sinks:
  sink0:
    inputs:
    - source0
    type: console
    encoding:
      codec: json

With COUNT=1 gives:

2025-05-06T21:40:58.830829Z  INFO vector::app: Log level is enabled. level="info"
2025-05-06T21:40:58.831986Z  INFO vector::app: Loading configs. paths=["/tmp/tmp.yaml"]
2025-05-06T21:40:58.838663Z ERROR vector::cli: Configuration error. error=invalid type: string "1", expected usize



in `sources.source0`

@@ -0,0 +1,20 @@
The env var (and secrets) resolution now happens after the config string is parsed into a TOML table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the config always parsed into a TOML table? This explains the weird behavior I've seen before where I get TOML errors for YAML configs. 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, internally we call format::deserialize which supports all formats and then we always convert to a TOML table after this step. I would expect this to translate to a config builder (out of scope for this PR).


pub fn resolve_environment_variables(table: Table) -> Result<Table, Vec<String>> {
let mut vars = std::env::vars().collect::<HashMap<_, _>>();
if !vars.contains_key("HOSTNAME") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL this special case was here.

@@ -88,11 +89,10 @@ impl SecretBackendLoader {
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Currently (on master) we don't support multiple types for secrets.

2025-05-07T14:38:00.776983Z ERROR vector::cli: Configuration error. error=Error while retrieving secret from backend "backend_1": invalid type: integer `1`, expected a string at line 1 column 10.

However, interpolate_secrets supports multiple types. I consider this out of scope for this PR but we could go further and allow e.g. integer secrets.

@pront
Copy link
Member Author
pront commented May 7, 2025

With COUNT=1 gives:

2025-05-06T21:40:58.830829Z  INFO vector::app: Log level is enabled. level="info"
2025-05-06T21:40:58.831986Z  INFO vector::app: Loading configs. paths=["/tmp/tmp.yaml"]
2025-05-06T21:40:58.838663Z ERROR vector::cli: Configuration error. error=invalid type: string "1", expected usize

Fixed and added tests to cover this.

Copy link
Member
@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks. This fixes my first example. I think the optimistic parsing as integers, though, isn't quite right. It doesn't seem to handle quoted strings where the value is parsable as an integer. For example, this doesn't work:

sources:
  source0:
    type: demo_logs
    count: ${COUNT}
    format: shuffle
    lines: ["${LINE}"]
    framing:
      method: bytes
    interval: 1.0
sinks:
  sink0:
    inputs:
    - source0
    type: console
    encoding:
      codec: json

It gives:

COUNT=1 LINE=1234 ./target/debug/vector -c /tmp/tmp.yaml
2025-05-07T15:31:36.925469Z  INFO vector::app: Log level is enabled. level="info"
2025-05-07T15:31:36.926534Z  INFO vector::app: Loading configs. paths=["/tmp/tmp.yaml"]
2025-05-07T15:31:36.932941Z ERROR vector::cli: Configuration error. error=invalid type: integer `1234`, expected a string

in `sources.source0`

@pront
Copy link
Member Author
pront commented May 7, 2025

Nice, thanks. This fixes my first example. I think the optimistic parsing as integers, though, isn't quite right. It doesn't seem to handle quoted strings where the value is parsable as an integer. For example, this doesn't work:

sources:
  source0:
    type: demo_logs
    count: ${COUNT}
    format: shuffle
    lines: ["${LINE}"]
    framing:
      method: bytes
    interval: 1.0
sinks:
  sink0:
    inputs:
    - source0
    type: console
    encoding:
      codec: json

It gives:

COUNT=1 LINE=1234 ./target/debug/vector -c /tmp/tmp.yaml
2025-05-07T15:31:36.925469Z  INFO vector::app: Log level is enabled. level="info"
2025-05-07T15:31:36.926534Z  INFO vector::app: Loading configs. paths=["/tmp/tmp.yaml"]
2025-05-07T15:31:36.932941Z ERROR vector::cli: Configuration error. error=invalid type: integer `1234`, expected a string

in `sources.source0`

Taking a look.This all goes back to my previous comment that this intermediate TOML table representation is problematic because it doesn't use type information from the config types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants
0