Relax naming requirements in IR spec #6652

justinchuby · 2025-01-24T00:38:36Z

Description

This pull request includes changes to the docs/IR.md file to relax the strictness of identifier syntax rules.

* Changed the requirement for names within a graph to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`, docs/IR.md L262-R262)
* Updated the requirement for dimension variable names to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`, docs/IR.md L477-R477)
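
For reference, the C90 identifier rule that the spec now only recommends can be sketched as a simple lexical check. The helper name `is_c90_identifier` is hypothetical and not part of the onnx package; the check covers only the character pattern and ignores C reserved keywords.

```python
import re

# C90 identifier shape: an ASCII letter or underscore, followed by
# ASCII letters, digits, or underscores. Keywords are not considered.
_C90_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_c90_identifier(name: str) -> bool:
    """Return True if `name` has the lexical form of a C90 identifier."""
    return _C90_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    # Illustrative names only; they are not taken from a real model.
    for candidate in ["conv1_weight", "layer1.0.weight", "0_input", "x:0"]:
        print(candidate, is_c90_identifier(candidate))
```

Under this check a dotted name such as `layer1.0.weight` does not conform, which is the kind of name the relaxed wording is meant to tolerate.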

Motivation and Context

The change was motivated by practicality.

1. Currently, tools in the ecosystem do not assume adherence, and popular sources of ONNX models such as the PyTorch exporter do not produce models that adhere to the requirements. Relaxing the spec makes it easier for tools in the ecosystem to be conformant.
2. The ONNX checker does not enforce this aspect of naming.

It further promotes interoperability with other ML frameworks. For example, model weights in a PyTorch model are named in the format `a.b.c`. By relaxing the spec, we are able to map PyTorch model weights directly to ONNX initializers without any name transformation rules.

Fixes #6219

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

codecov · 2025-01-24T00:43:22Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.50%. Comparing base (9683661) to head (0460511).
Report is 1 commit behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #6652   +/-   ##
=======================================
  Coverage   57.50%   57.50%           
=======================================
  Files         507      507           
  Lines       31624    31624           
  Branches     3046     3046           
=======================================
  Hits        18185    18185           
  Misses      12613    12613           
  Partials      826      826

gramalingam · 2025-01-24T01:12:22Z

LGTM, thanks! I think a couple of other things that could be helpful (in the future) are:

Increasing awareness of this in the community, so that we can converge on something widely accepted.
A "sanitization tool" that renames identifiers to satisfy the stricter requirement (for users who need it).

Pierre-Bartet · 2025-01-24T07:47:55Z

Nice, does it mean that the names can be any string of bytes, or is it still restricted to some subset such as ASCII?

justinchuby · 2025-01-24T14:47:36Z

> Nice, does it mean that the names can be any string of bytes, or is it still restricted to some subset such as ASCII?

It can be any string I suppose. Of course I don’t expect it to become some control sequence and still play well with all tools. Do you see any caveats?

It still needs to "always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32." according to the protobuf string type definition.

Pierre-Bartet · 2025-01-24T14:57:21Z

> it still needs to "always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32." according to the protobuf string type definition.

Sounds reasonable, and then the checker must enforce exactly that (nothing more, nothing less).

justinchuby · 2025-01-24T14:58:58Z

This is always enforced by protobuf, so I don’t think there’s anything for the checker to do.
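
To make the UTF-8 constraint concrete, here is a small sketch of what "valid UTF-8 text" means for a name, written with the Python standard library only. It does not call protobuf and makes no claim about how any protobuf implementation reports violations; both helper names are hypothetical.

```python
def is_utf8_encodable(name: str) -> bool:
    """Return True if the Python string can be encoded as UTF-8.

    Lone surrogate code points are the typical failure case.
    """
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Dotted or non-ASCII names pass both checks; an arbitrary byte string such as `b"\xff\xfe"` does not, so "any string of bytes" is wider than what a string field can hold.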

Pierre-Bartet · 2025-01-24T15:01:42Z

Perfect then

fdwr · 2025-01-24T23:28:01Z

> The change was motivated by practicality

Yeah, I've noticed many ONNX models already out there that don't follow C90 identifier rules...

> I don’t expect it to become some control sequence

@justinchuby: Indeed, I'd strongly discourage use of any Unicode characters in the general categories of:

Cc (other control)
Cf (control formatting)
Cn (control not assigned)
Zl (line separator)
Zp (paragraph separator)

Otherwise you'll have tool chaos. 🙃 (tldr: Avoid control characters and line-break characters.)
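
A name can be screened for these categories with Python's `unicodedata` module. This is a sketch of the suggestion above, not a rule from the spec: the function name is hypothetical, and the result for unassigned code points (Cn) depends on the Unicode database version bundled with the Python build.

```python
import unicodedata

# General categories discouraged in the comment above.
DISCOURAGED_CATEGORIES = frozenset({"Cc", "Cf", "Cn", "Zl", "Zp"})


def discouraged_characters(name: str):
    """Return (index, character, category) for each discouraged character."""
    found = []
    for index, char in enumerate(name):
        category = unicodedata.category(char)
        if category in DISCOURAGED_CATEGORIES:
            found.append((index, char, category))
    return found
```

An empty result means the name contains none of the listed categories; ordinary dotted names such as `a.b.c` fall in that case.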

Pierre-Bartet · 2025-03-26T21:00:48Z

> I'd strongly discourage use of any Unicode characters in the general categories of: ...

Having to be careful about special characters always ends up pretty badly, which is what brought me here in the first place (see here).

We should make sure that the specification is conceptually easy to describe, and that the tools are tested to support everything which is allowed, so that test suites purposefully include adversarial examples such as Zp (paragraph separator) or Cf (control formatting).
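
One way to picture such a test is a fixed list of awkward names plus a check that a tool returns each name unchanged. Everything below is an assumption made for illustration: the name list, the `altered_names` helper, and the two stand-in "tools" are hypothetical and do not correspond to any real converter.

```python
import re

# Illustrative adversarial names: dots, spaces, line breaks, separators,
# a zero-width character, and non-ASCII text.
ADVERSARIAL_NAMES = [
    "a.b.c",
    "with space",
    "line\nbreak",
    "sep\u2028line",
    "sep\u2029para",
    "zero\u200bwidth",
    "重み",
    "emoji🙃",
]


def altered_names(tool, names=ADVERSARIAL_NAMES):
    """Return (original, result) pairs for every name the tool did not preserve."""
    pairs = []
    for name in names:
        result = tool(name)
        if result != name:
            pairs.append((name, result))
    return pairs


def utf8_round_trip(name: str) -> str:
    """Stand-in for a tool that serializes and reloads names faithfully."""
    return name.encode("utf-8").decode("utf-8")


def lossy_tool(name: str) -> str:
    """Stand-in for a tool that silently rewrites non-word characters."""
    return re.sub(r"\W", "_", name)
```

A faithful tool yields an empty list from `altered_names`, while the lossy stand-in reports pairs such as `("a.b.c", "a_b_c")`, which is the silent renaming a test suite would want to catch.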

Weaken wording of naming requirements in IR spec

0460511

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

justinchuby requested a review from a team as a code owner January 24, 2025 00:38

justinchuby requested review from a team January 24, 2025 00:41

justinchuby added the module: spec label Jan 24, 2025

justinchuby temporarily deployed to testpypi-weekly January 24, 2025 00:41 — with GitHub Actions Inactive

justinchuby temporarily deployed to pypi-weekly January 24, 2025 00:41 — with GitHub Actions Inactive

justinchuby changed the title ~~Weaken wording of naming requirements in IR spec~~ Relax naming requirements in IR spec Jan 24, 2025

gramalingam approved these changes Jan 24, 2025

gramalingam added this pull request to the merge queue Jan 24, 2025

Merged via the queue into main with commit 171b23e Jan 24, 2025
43 checks passed

gramalingam deleted the justinchu/c90 branch January 24, 2025 01:35

Pierre-Bartet mentioned this pull request Jan 24, 2025

Input names are silently modified, leading to inputs mismatch during inference onnx/sklearn-onnx#1153

Open

justinchuby added the announcement and release notes labels Jan 24, 2025

justinchuby temporarily deployed to pypi-weekly January 24, 2025 14:53 — with GitHub Actions Inactive

justinchuby temporarily deployed to testpypi-weekly January 24, 2025 14:53 — with GitHub Actions Inactive

justinchuby removed the announcement label Jan 24, 2025

justinchuby mentioned this pull request Jan 24, 2025

Relaxed ONNX naming requirements #6658

Open

justinchuby added this to the 1.18 milestone Mar 3, 2025

botantony mentioned this pull request May 14, 2025

onnx 1.18.0 Homebrew/homebrew-core#223395

Closed
