8000 Relax naming requirements in IR spec by justinchuby · Pull Request #6652 · onnx/onnx · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Relax naming requirements in IR spec #6652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2025
Merged

Relax naming requirements in IR spec #6652

merged 1 commit into from
Jan 24, 2025

Conversation

justinchuby
Copy link
Member
@justinchuby justinchuby commented Jan 24, 2025

Description

This pull request includes changes to the docs/IR.md file to relax the strictness of identifier syntax rules.

  • Changed the requirement for names within a graph to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (docs/IR.md, docs/IR.mdL262-R262)
  • Updated the requirement for dimension variable names to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (docs/IR.md, docs/IR.mdL477-R477)

Motivation and Context

The change was motivated by practicality.

  1. Currently tools in the ecosystem does not assume adherence, and popular source of ONNX models like the PyTorch exporter does not produce models that adheres to the requirements. Relaxing the spec makes it easier for tools in the ecosystem to be conformant.
  2. The onnx checker does not enforce this aspect of naming

It further promotes interoperability with other ML frameworks. For example, model weights in a PyTorch model is named in the format a.b.c. By relaxing the spec, we are able to directly map PyTorch model weights to ONNX initializers without any name transformation rules.

Fixes #6219

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
@justinchuby justinchuby requested a review from a team as a code owner January 24, 2025 00:38
@justinchuby justinchuby requested review from a team January 24, 2025 00:41
@justinchuby justinchuby changed the title Weaken wording of naming requirements in IR spec Relax naming requirements in IR spec Jan 24, 2025
Copy link
codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.50%. Comparing base (9683661) to head (0460511).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6652   +/-   ##
=======================================
  Coverage   57.50%   57.50%           
=======================================
  Files         507      507           
  Lines       31624    31624           
  Branches     3046     3046           
=======================================
  Hits        18185    18185           
  Misses      12613    12613           
  Partials      826      826           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gramalingam
Copy link
Contributor

LGTM, thanks! I think a couple of other things that could be helpful (in the future) are:

  • Increasing awareness of this in the community, so that we can converge on something widely accepted.
  • A "sanitization tool" that renames identifiers to satisfy the stricter requirement (for users who need it).

@gramalingam gramalingam added this pull request to the merge queue Jan 24, 2025
Merged via the queue into main with commit 171b23e Jan 24, 2025
43 checks passed
@gramalingam gramalingam deleted the justinchu/c90 branch January 24, 2025 01:35
@Pierre-Bartet
Copy link

Nice, does it mean that the names can be any string of bytes, or is it still restricted to some subset such as ascii ?

@justinchuby
Copy link
Member Author
justinchuby commented Jan 24, 2025

Nice, does it mean that the names can be any string of bytes, or is it still restricted to some subset such as ascii ?

It can be any string I suppose. Of course I don’t expect it to become some control sequence and still play well with all tools. Do you see any caveats?

it still needs to "always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32." according to the protobuf string type definition.

@justinchuby justinchuby added announcement Important information for users/developers release notes Important changes to call out in release notes labels Jan 24, 2025
@justinchuby justinchuby removed the announcement Important information for users/developers label Jan 24, 2025
@Pierre-Bartet
Copy link

it still needs to "always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32." according to the protobuf string type definition.

Sounds reasonable, and then the checker must exactly enforce that (nothing more, nothing less).

@justinchuby
Copy link
Member Author

This is enforced always by protobuf, and so I don’t think there’s anything for the checker to do

@Pierre-Bartet
Copy link

Perfect then

@fdwr
Copy link
Contributor
fdwr commented Jan 24, 2025

The change was motivated by practicality

Yeah, I've noticed many ONNX models already out there that don't follow C90 identifier rules...

I don’t expect it to become some control sequence

@justinchuby: Indeed, I'd strongly discourage use of any Unicode characters in the general categories of:

  • Cc (other control)
  • Cf (control formatting)
  • Cn (control not assigned)
  • Zl (line separator)
  • Zp (paragraph separator)

Otherwise you'll have tool chaos. 🙃 (tldr: Avoid control characters and line-break characters.)

andife pushed a commit that referenced this pull request Jan 25, 2025
### Description

This pull request includes changes to the `docs/IR.md` file to relax the
strictness of identifier syntax rules.

* Changed the requirement for names within a graph to adhere to C90
identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`,
[docs/IR.mdL262-R262](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L262-R262))
* Updated the requirement for dimension variable names to adhere to C90
identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`,
[docs/IR.mdL477-R477](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L477-R477))

### Motivation and Context

The change was motivated by practicality.

1. Currently tools in the ecosystem does not assume adherence, and
popular source of ONNX models like the PyTorch exporter does not produce
models that adheres to the requirements. Relaxing the spec makes it
easier for tools in the ecosystem to be conformant.
2. The onnx checker does not enforce this aspect of naming

It further promotes interoperability with other ML frameworks. For
example, model weights in a PyTorch model is named in the format
`a.b.c`. By relaxing the spec, we are able to directly map PyTorch model
weights to ONNX initializers without any name transformation rules.

Fixes #6219

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
seungwoo-ji-03 pushed a commit to seungwoo-ji-03/onnx that referenced this pull request Feb 17, 2025
### Description

This pull request includes changes to the `docs/IR.md` file to relax the
strictness of identifier syntax rules.

* Changed the requirement for names within a graph to adhere to C90
identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`,
[docs/IR.mdL262-R262](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L262-R262))
* Updated the requirement for dimension variable names to adhere to C90
identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`,
[docs/IR.mdL477-R477](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L477-R477))

### Motivation and Context

The change was motivated by practicality.

1. Currently tools in the ecosystem does not assume adherence, and
popular source of ONNX models like the PyTorch exporter does not produce
models that adheres to the requirements. Relaxing the spec makes it
easier for tools in the ecosystem to be conformant.
2. The onnx checker does not enforce this aspect of naming

It further promotes interoperability with other ML frameworks. For
example, model weights in a PyTorch model is named in the format
`a.b.c`. By relaxing the spec, we are able to directly map PyTorch model
weights to ONNX initializers without any name transformation rules.

Fixes onnx#6219

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: seungwoo-ji <seungwoo.ji@nuvilab.com>
@justinchuby justinchuby added this to the 1.18 milestone Mar 3, 2025
@Pierre-Bartet
Copy link

I'd strongly discourage use of any Unicode characters in the general categories of: ...

Having to be careful about special characters always ends up pretty badly, which is what brings me there in the first place (see here).

We should make sure that the specification is conceptually easy to describe, and that the tools are tested to support everything which is allowed, so that test suites purposefully include adversarial examples such as Zp (paragraph separator) or Cf (control formatting).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: spec release notes Important changes to call out in release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

ONNX checker does not validate C90 identifier compatibility.
4 participants
0