Scaffolding for meta tensor crossref testing #75994
Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
#75994 was taking too long to ship so I extracted out the CrossRef gadget and had it run on a simple OpInfo invocation only.
TODO: There are failures that correspond to known bugs and need to be skipped.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
ghstack-source-id: c127657
Pull Request resolved: #76905
#75994 was taking too long to ship so I extracted out the CrossRef gadget and had it run on a simple OpInfo invocation only.
TODO: There are failures that correspond to known bugs and need to be skipped.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: #76905
Approved by: https://github.com/anjali411, https://github.com/mruberry, https://github.com/albanD
Summary: PR #75994 was taking too long to ship so I extracted out the CrossRef gadget and had it run on a simple OpInfo invocation only.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: #77008
Approved by: https://github.com/ngimel
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/60f131fb6c2e3f4a23e64096a3e718a1e669215b
Reviewed By: malfet
Differential Revision: D36250515
fbshipit-source-id: 93cdc3cb9bf4c3375bd679aea8d5f59a09f65585
What is the plan for this PR? For context, OpInfos don't have any mixed-device inputs, so they're not very useful for testing FakeTensors; you would need something like this instead.
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;

CREATE TABLE IF NOT EXISTS files (
    file_id INTEGER NOT NULL PRIMARY KEY,
    file TEXT NOT NULL UNIQUE
);
Never thought I'd see SQL in PyTorch 😮
I'm currently not sure how to prioritize it. You get very good coverage with it, but it's also very painful to do development on: it takes a few hours to run through the entirety of the test suite, and when you issue fixes it is somewhat difficult to understand which tests you should rerun to revalidate. Without fake tensors in mind, I was planning on getting the OpInfo test suite clean first, and then moving on to crossref testing. Here is my suggestion, @eellison. You should create a variant mode that looks for tests that exercise mixed-device inputs and logs them all. Then (somehow) selectively apply crossref testing on this set (the "somehow" because I don't know how to automatically apply something to a generated list of tests; you'll need to figure something out) and land that.
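To make the suggestion concrete, here is a rough sketch of the detection and logging half only. It is written without importing torch: anything with a `device` attribute counts as tensor-like, and `SimpleNamespace` objects stand in for tensors. The names `devices_in` and `MixedDeviceLog` are hypothetical; in a real variant mode, `observe` would presumably be called from the mode's `__torch_function__` hook with the current test id.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger("mixed_device")


def devices_in(obj):
    """Collect the device of every tensor-like leaf nested inside obj."""
    if isinstance(obj, (list, tuple)):
        found = set()
        for item in obj:
            found |= devices_in(item)
        return found
    if isinstance(obj, dict):
        return devices_in(list(obj.values()))
    device = getattr(obj, "device", None)
    return set() if device is None else {str(device)}


class MixedDeviceLog:
    """Remembers which tests made at least one call with mixed devices."""

    def __init__(self):
        self.hits = set()

    def observe(self, test_id, func_name, args=(), kwargs=None):
        devices = devices_in(args) | devices_in(kwargs or {})
        mixed = len(devices) > 1
        if mixed:
            self.hits.add(test_id)
            log.info("%s: %s saw devices %s", test_id, func_name, sorted(devices))
        return mixed


# Stand-ins for tensors; real code would see torch.Tensor arguments.
cpu = SimpleNamespace(device="cpu")
cuda = SimpleNamespace(device="cuda:0")

recorder = MixedDeviceLog()
recorder.observe("test_a", "torch.add", (cpu, cuda))
recorder.observe("test_b", "torch.add", (cpu, cpu))
print(sorted(recorder.hits))
```

The resulting set of test ids is the "generated list of tests" the comment refers to; how to feed it back into the test runner is left open here, as it is in the comment.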
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
Stack from ghstack (oldest at bottom):
There's some utility stuff I could have stacked separately but haven't done so, shout if you want me to.
- `torch.overrides.resolve_name`: this takes a public Torch API function and returns a string name corresponding to it. I found this pretty useful for giving good messages because the repr on most of our function objects is pretty useless (NB: this should be fixed.) It's missing a little bit of functionality; direct calls to `torch.ops`, `torch._VF` and `torch._C._nn` don't work yet. This needed a bit of surgery on `_get_overridable_functions` and a nicer refactor would be good, not exactly sure how to do it.
- `torch._C._set_storage_via_tensor`, which is like `set_` but it takes in a tensor rather than a storage (I need this because meta storages don't work), and `torch._C._is_batched`, which lets me test if a tensor is batched (as far as I can tell, there's no way to test this in userland.)
- `PYTORCH_TEST_WITH_COVERAGE_DB`, which will write out a sqlite3 database recording which tests called which torch API functions. This is useful if you're working on a particular torch function and want to quickly execute all tests that exercise that function (not just the obvious ones). From experimentation, I observed that it's best to have a fairly normalized database representation, both to reduce the disk size and to make inserts to the database go faster; I define a view to recover the 'naive' viewpoint on the database.

How does crossref testing work? When `PYTORCH_TEST_WITH_CROSSREF=1`, we install a torch function mode which will attempt to run the equivalent of any torch API call with all meta arguments. For now, we just run it and don't check if any of the results are right, but that is the next logical step.

Doing the same operation, but with meta arguments, is quite involved. `meta_storage` ensures that if we see the same storage multiple times we map it to the same meta storage. However, meta tensors don't actually support Python storage bindings right now, so I instead return a tensor and use `_set_storage_via_tensor` to install it later.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
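
The memoization that `meta_storage` is described as doing can be illustrated without torch. The sketch below is not the PR's code: `Storage`, `MetaStorage` and `StorageMemo` are hypothetical stand-ins, and the only property it demonstrates is that one original storage always maps to one and the same meta storage, so tensors that alias each other keep aliasing after conversion.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Storage:
    """Hypothetical stand-in for a real storage: only a byte count."""
    nbytes: int


@dataclass(eq=False)
class MetaStorage:
    """Hypothetical stand-in for a meta storage: a size, but no data."""
    nbytes: int


class StorageMemo:
    """Maps each distinct storage object to exactly one meta storage."""

    def __init__(self):
        # Keyed by id(). The original storage is kept in the value so it
        # stays alive and its id cannot be reused by a different object.
        self._memo = {}

    def meta_storage(self, storage):
        key = id(storage)
        if key not in self._memo:
            self._memo[key] = (storage, MetaStorage(storage.nbytes))
        return self._memo[key][1]


memo = StorageMemo()
base = Storage(nbytes=64)
other = Storage(nbytes=64)

# Seeing the same storage twice yields the same meta storage object.
assert memo.meta_storage(base) is memo.meta_storage(base)
# Distinct storages stay distinct, even when they have the same size.
assert memo.meta_storage(base) is not memo.meta_storage(other)
```

Keying on object identity rather than on contents is the point: two storages of equal size are still different buffers, while two views of one buffer must end up sharing a single meta buffer.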