-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Relax naming requirements in IR spec #6652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #6652 +/- ##
=======================================
Coverage 57.50% 57.50%
=======================================
Files 507 507
Lines 31624 31624
Branches 3046 3046
=======================================
Hits 18185 18185
Misses 12613 12613
Partials 826 826 ☔ View full report in Codecov by Sentry. |
LGTM, thanks! I think a couple of other things that could be helpful (in the future) are:
|
Nice, does it mean that the names can be any string of bytes, or is it still restricted to some subset such as ascii ? |
It can be any string I suppose. Of course I don’t expect it to become some control sequence and still play well with all tools. Do you see any caveats? it still needs to "always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32." according to the protobuf string type definition. |
Sounds reasonable, and then the checker must exactly enforce that (nothing more, nothing less). |
This is enforced always by protobuf, and so I don’t think there’s anything for the checker to do |
Perfect then |
Yeah, I've noticed many ONNX models already out there that don't follow C90 identifier rules...
@justinchuby: Indeed, I'd strongly discourage use of any Unicode characters in the general categories of:
Otherwise you'll have tool chaos. 🙃 (tldr: Avoid control characters and line-break characters.) |
### Description This pull request includes changes to the `docs/IR.md` file to relax the strictness of identifier syntax rules. * Changed the requirement for names within a graph to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`, [docs/IR.mdL262-R262](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L262-R262)) * Updated the requirement for dimension variable names to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`, [docs/IR.mdL477-R477](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L477-R477)) ### Motivation and Context The change was motivated by practicality. 1. Currently tools in the ecosystem does not assume adherence, and popular source of ONNX models like the PyTorch exporter does not produce models that adheres to the requirements. Relaxing the spec makes it easier for tools in the ecosystem to be conformant. 2. The onnx checker does not enforce this aspect of naming It further promotes interoperability with other ML frameworks. For example, model weights in a PyTorch model is named in the format `a.b.c`. By relaxing the spec, we are able to directly map PyTorch model weights to ONNX initializers without any name transformation rules. Fixes #6219 Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
### Description This pull request includes changes to the `docs/IR.md` file to relax the strictness of identifier syntax rules. * Changed the requirement for names within a graph to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`, [docs/IR.mdL262-R262](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L262-R262)) * Updated the requirement for dimension variable names to adhere to C90 identifier syntax rules from "MUST" to "SHOULD". (`docs/IR.md`, [docs/IR.mdL477-R477](diffhunk://#diff-abcfc88a55144836fff2a055d73bc894201789bda5c0de98594e931037b5ec21L477-R477)) ### Motivation and Context The change was motivated by practicality. 1. Currently tools in the ecosystem does not assume adherence, and popular source of ONNX models like the PyTorch exporter does not produce models that adheres to the requirements. Relaxing the spec makes it easier for tools in the ecosystem to be conformant. 2. The onnx checker does not enforce this aspect of naming It further promotes interoperability with other ML frameworks. For example, model weights in a PyTorch model is named in the format `a.b.c`. By relaxing the spec, we are able to directly map PyTorch model weights to ONNX initializers without any name transformation rules. Fixes onnx#6219 Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: seungwoo-ji <seungwoo.ji@nuvilab.com>
Having to be careful about special characters always ends up pretty badly, which is what brings me there in the first place (see here). We should make sure that the specification is conceptually easy to describe, and that the tools are tested to support everything which is allowed, so that test suites purposefully include adversarial examples such as |
Description
This pull request includes changes to the
docs/IR.md
file to relax the strictness of identifier syntax rules.docs/IR.md
, docs/IR.mdL262-R262)docs/IR.md
, docs/IR.mdL477-R477)Motivation and Context
The change was motivated by practicality.
It further promotes interoperability with other ML frameworks. For example, model weights in a PyTorch model is named in the format
a.b.c
. By relaxing the spec, we are able to directly map PyTorch model weights to ONNX initializers without any name transformation rules.Fixes #6219