8000 Make debug_pkl smaller by only emitting unique traces. by qihqi · Pull Request #72596 · pytorch/pytorch · GitHub

Make debug_pkl smaller by only emitting unique traces. #72596


Closed · wants to merge 1 commit

Conversation

@qihqi (Contributor) commented Feb 9, 2022

Summary:
The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted into the debug_pkl file as strings.

Since many SourceRanges contain similar-looking strings, they can be deduped.

The newer format saves the set of unique traces in a tuple; each SourceRange then saves the offset of its trace within that tuple (i.e., manually applying dictionary compression).
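A minimal C++ sketch of the dictionary-compression idea (hypothetical helper names, not the actual pickler code):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Emit each unique trace once; every SourceRange then stores only the
// index of its trace in the deduped table.
struct DedupedTraces {
  std::vector<std::string> unique_traces; // serialized once, as a tuple
  std::vector<int64_t> offsets;           // one entry per SourceRange
};

DedupedTraces dedupTraces(const std::vector<std::string>& traces) {
  DedupedTraces out;
  std::unordered_map<std::string, int64_t> index;
  for (const auto& trace : traces) {
    auto it = index.find(trace);
    if (it == index.end()) {
      it = index.emplace(trace, static_cast<int64_t>(out.unique_traces.size())).first;
      out.unique_traces.push_back(trace);
    }
    out.offsets.push_back(it->second);
  }
  return out;
}
```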

Test Plan: unit test

Differential Revision: D33994011

@pytorch-bot (bot) commented Feb 9, 2022
CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/qihqi/pytorch/blob/1d35b17c8b536711da1cfadbd5c56e8d36d012fb/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflow · Labels · Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

@facebook-github-bot (Contributor) commented Feb 9, 2022


💊 CI failures summary and remediations

As of commit 96db16e (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T05:45:05.9105788Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-02-24T05:45:05.9106505Z     result = test(self, **param_kwargs)
2022-02-24T05:45:05.9107439Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2950, in wrapped
2022-02-24T05:45:05.9107809Z     f(self, *args, **kwargs, coalesced=True)
2022-02-24T05:45:05.9108098Z   File "test_sparse.py", line 1629, in test_basic_ops
2022-02-24T05:45:05.9108400Z     _test_basic_ops_hybrid()
2022-02-24T05:45:05.9108697Z   File "test_sparse.py", line 1622, in _test_basic_ops_hybrid
2022-02-24T05:45:05.9109073Z     self._test_basic_ops_shape(9, 12, [10, 10, 10], [2, 0], dtype, device, coalesced)
2022-02-24T05:45:05.9109555Z   File "test_sparse.py", line 1541, in _test_basic_ops_shape
2022-02-24T05:45:05.9109854Z     y1 = x1 * x2
2022-02-24T05:45:05.9110173Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-02-24T05:45:05.9110630Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-02-24T05:45:05.9111080Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-02-24T05:45:05.9111650Z ✅ 19471 Passed
2022-02-24T05:45:05.9111944Z 💨 6506 Skipped
2022-02-24T05:45:05.9113442Z 🚨 1 Failed
2022-02-24T05:45:05.9570230Z ##[group]Run # Remove any previous test jsons if they exist
2022-02-24T05:45:05.9570959Z rm -f test-jsons-*.zip
2022-02-24T05:45:05.9571286Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
```

See GitHub Actions build Lint / quick-checks (2/4)

Step: "Ensure no unqualified type ignore" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T03:26:31.6192103Z ##[group]Run set -eux
2022-02-24T03:26:31.6192876Z python torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
2022-02-24T03:26:31.6243080Z shell: /bin/bash -e {0}
2022-02-24T03:26:31.6243279Z env:
2022-02-24T03:26:31.6243546Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.2/x64
2022-02-24T03:26:31.6243863Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.2/x64/lib
2022-02-24T03:26:31.6244121Z ##[endgroup]
2022-02-24T03:26:31.6315259Z + python torch/testing/_check_kernel_launches.py
2022-02-24T03:26:31.6317885Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2022-02-24T03:26:31.6516737Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2022-02-24T03:26:31.6577132Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2022-02-24T03:26:31.6625561Z shell: /bin/bash -e {0}
2022-02-24T03:26:31.6625762Z env:
2022-02-24T03:26:31.6626045Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.2/x64
2022-02-24T03:26:31.6626376Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.2/x64/lib
2022-02-24T03:26:31.6626653Z ##[endgroup]
2022-02-24T03:26:31.6922337Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2022-02-24T03:26:31.6966193Z shell: /bin/bash -e {0}
```

See GitHub Actions build linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T09:49:20.1964627Z test_sparse.py:1567: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
2022-02-24T09:49:20.1966534Z   expected = self.safeToDense(x1) // 37.5
2022-02-24T09:49:20.2194827Z Memory exception on virtual address 0x0, node id 4 : Page not present
2022-02-24T09:49:20.2198130Z Address does not belong to a known buffer
2022-02-24T09:49:20.2201240Z Memory access fault by GPU node-4 (Agent handle: 0x5560897dc010) on address (nil). Reason: Page not present or supervisor privilege.
2022-02-24T09:49:20.3957329Z Traceback (most recent call last):
2022-02-24T09:49:20.3958188Z   File "test/run_test.py", line 1106, in <module>
2022-02-24T09:49:20.3964254Z     main()
2022-02-24T09:49:20.3964862Z   File "test/run_test.py", line 1084, in main
2022-02-24T09:49:20.3970807Z     raise RuntimeError(err_message)
2022-02-24T09:49:20.3971701Z RuntimeError: test_sparse failed! Received signal: SIGIOT
2022-02-24T09:49:21.9854767Z real	86m42.863s
2022-02-24T09:49:21.9855328Z user	145m42.450s
2022-02-24T09:49:21.9855860Z sys	19m14.259s
2022-02-24T09:49:21.9856390Z + cleanup
2022-02-24T09:49:21.9856894Z + retcode=1
2022-02-24T09:49:21.9857425Z + set +x
2022-02-24T09:49:21.9965072Z ##[error]Process completed with exit code 1.
2022-02-24T09:49:22.0035010Z ##[group]Run python3 -m pip install junitparser==2.1.1 rich==10.9.0
```

See GitHub Actions build win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (4/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T08:14:49.4400528Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 376, in instantiated_test
2022-02-24T08:14:49.4401463Z     result = test(self, **param_kwargs)
2022-02-24T08:14:49.4402403Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 2950, in wrapped
2022-02-24T08:14:49.4403317Z     f(self, *args, **kwargs, coalesced=True)
2022-02-24T08:14:49.4403941Z   File "test_sparse.py", line 1629, in test_basic_ops
2022-02-24T08:14:49.4404482Z     _test_basic_ops_hybrid()
2022-02-24T08:14:49.4407080Z   File "test_sparse.py", line 1622, in _test_basic_ops_hybrid
2022-02-24T08:14:49.4407892Z     self._test_basic_ops_shape(9, 12, [10, 10, 10], [2, 0], dtype, device, coalesced)
2022-02-24T08:14:49.4408664Z   File "test_sparse.py", line 1541, in _test_basic_ops_shape
2022-02-24T08:14:49.4409149Z     y1 = x1 * x2
2022-02-24T08:14:49.4409748Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-02-24T08:14:49.4410766Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-02-24T08:14:49.4411695Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-02-24T08:14:49.4420253Z ✅ 36848 Passed
2022-02-24T08:14:49.4420696Z 💨 27954 Skipped
2022-02-24T08:14:49.4421078Z 🚨 1 Failed
2022-02-24T08:14:49.5268350Z ##[group]Run .github\scripts\wait_for_ssh_to_drain.ps1
2022-02-24T08:14:49.5292722Z shell: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.EXE -command ". '{0}'"
2022-02-24T08:14:49.5293412Z env:
```

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot added the oncall: jit and fb-exported labels on Feb 9, 2022
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@qihqi requested review from zhxchen17 and gmagogsfm on February 9, 2022 19:06
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@gmagogsfm (Contributor) left a comment:

You mentioned that unit tests are used as the test plan, but I don't see a checked-in unit test. Was it left out of the commit accidentally?

```diff
 serialized = c10::ivalue::Tuple::create(
-    {s->text(), s->filename(), (int64_t)s->starting_line_no()});
+    {text_pos, fname_pos, (int64_t)s->starting_line_no()});
```
Contributor commented:

Looks like this will dedup source code. Could we run a one-time manual test on that ads model to see how much savings we get?

How about the stack-trace part in debug_pkl? Could we compress that too? It looks like a lot of the strings have redundancy.

@qihqi (Contributor, Author) commented Feb 9, 2022:

> Looks like this will dedup source code. Could we run a one-time manual test on that ads model to see how much savings we get?

I'll get help from the ads folks to see how to do this.

> How about the stack-trace part in debug_pkl? Could we compress that too? It looks like a lot of the strings have redundancy.

Yes. The SourceView object has a string field "text" and a string field "filename". Sometimes the "text" field is stuffed with source code and sometimes with a stack trace. I initially thought it was always source code, until I unpickled and printed out what was in there and saw stack traces.

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@qihqi requested a review from gmagogsfm on February 10, 2022 00:46
@qihqi (Contributor, Author) commented Feb 10, 2022:

> You mentioned that unit tests are used as the test plan, but I don't see a checked-in unit test. Was it left out of the commit accidentally?

I meant the existing unit tests that do save/load sequences, e.g. test_jit.

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@qihqi force-pushed the export-D33994011 branch 2 times, most recently from 32d5f8b to 5ab3a46 on February 10, 2022 19:58
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

1 similar comment
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

```cpp
// A string-like class backed by a vector of string_views;
// the string represented is logically the concatenation of the string_views.
// This has the advantage of not needing contiguous memory.
struct StringCordView {
```
Contributor commented:

Consider moving some of the heavy method implementations to the .cpp file.

Additionally, are all the methods meant to be public?
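For context, a minimal standalone sketch of the cord idea (hypothetical names, using std::string_view; this is not the PR's actual implementation, which uses c10::string_view):

```cpp
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// The logical string is the concatenation of non-contiguous pieces,
// each a view into memory owned elsewhere.
struct StringCordSketch {
  std::vector<std::string_view> pieces;

  std::size_t size() const {
    std::size_t n = 0;
    for (const auto& p : pieces) n += p.size();
    return n;
  }

  // Random access walks the pieces until the index falls inside one.
  char at(std::size_t i) const {
    for (const auto& p : pieces) {
      if (i < p.size()) return p[i];
      i -= p.size();
    }
    throw std::out_of_range("index past end of cord");
  }

  // Materialize into contiguous storage only when actually needed.
  std::string str() const {
    std::string out;
    out.reserve(size());
    for (const auto& p : pieces) out.append(p.data(), p.size());
    return out;
  }
};
```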

```cpp
};

// the interface
struct SourceView {
```
Contributor commented:

Looks like SourceView is pure virtual now, with Source being the only class implementing it. If so, why do we need SourceView?

Contributor (Author) replied:

Good point, we don't need it.

SchemaParser(const std::string& str)
: L(std::make_shared<SourceView>(c10::string_view(str))),
explicit SchemaParser(const std::string& str)
: L(std::make_shared<Source>(c10::string_view(str))),
Contributor commented:

Does this mean we are copying the entire string (within Source) when the schema parser is created?

Contributor (Author) replied:

No. A Source created with a string_view doesn't copy (equivalent to the old SourceView), and a Source created with a std::string copies (equivalent to the old Source).

Contributor commented:

In that case, a Source may or may not own the underlying string, and that is dangerous. Could we preserve the original inheritance structure so that it is clear whether ownership is needed or not?
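To make the ownership question concrete, here is a hedged sketch of the two-constructor scheme under discussion (hypothetical names, std::string_view in place of c10::string_view; not the PR's actual code):

```cpp
#include <string>
#include <string_view>
#include <utility>

// One class, two ownership modes: constructing from std::string copies
// (owning); constructing from std::string_view only references memory
// that the caller must keep alive.
class SourceSketch {
 public:
  // Owning: the text is moved into owned_ and view_ points into the copy.
  explicit SourceSketch(std::string text)
      : owned_(std::move(text)), view_(owned_) {}

  // Non-owning: view_ dangles if the caller's buffer dies first.
  explicit SourceSketch(std::string_view text) : view_(text) {}

  std::string_view text() const { return view_; }

 private:
  std::string owned_;      // empty in the non-owning case
  std::string_view view_;  // valid only while the backing buffer lives
};
```

The reviewer's point is that nothing in the type tells you which mode a given instance is in, whereas the old Source/SourceView split encoded ownership in the type itself.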

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

Summary:
Pull Request resolved: pytorch#72596

The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted into the debug_pkl file as strings.

Since many SourceRanges share the same source, the trace strings can be deduped.

The newer format saves the set of unique traces in a tuple; each SourceRange then saves the offset of its trace within that tuple (i.e., manually applying dictionary compression).

The above yields a smaller file, but on loading, if we copied each trace into a Source as a string, runtime memory would still blow up. To mitigate this, we use SourceView directly instead of Source, taking a reference to the string inside the Deserializer and turning it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object; that Source object stays alive for the duration of model construction.

Test Plan:
unit test

Took the original file (312271638_930.predictor.disagg.local); loaded it with `torch.jit.load`, then saved it again with `torch.jit.save`. Unzipped both and compared contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: JasonHanwen

Differential Revision: D33994011

fbshipit-source-id: a6db7e1d4c616ced4442ec06936ae005bf269212
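The lifetime argument in the summary above, as a minimal compilable sketch (hypothetical stand-in types; the real chain is Deserializer held by Unpickler held by a Source):

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <string_view>

// Stands in for the Deserializer: it owns the deserialized trace text.
struct BufferOwner {
  std::string trace_text;
};

// Stands in for a Source holding a non-owning view plus a shared_ptr
// that keeps the owner (and therefore the viewed buffer) alive.
struct ViewingSource {
  std::shared_ptr<BufferOwner> owner;  // keeps the buffer alive
  std::string_view text;               // view into owner->trace_text
};

int main() {
  ViewingSource src;
  {
    auto owner = std::make_shared<BufferOwner>();
    owner->trace_text = "def f(x):\n    return x + 1\n";
    src = {owner, std::string_view(owner->trace_text)};
  }  // the local shared_ptr goes out of scope, but src.owner keeps it alive
  std::cout << src.text << "\n";  // safe: the buffer is still owned
  return 0;
}
```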
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

facebook-github-bot pushed a commit that referenced this pull request Feb 24, 2022
@github-actions (Contributor) commented:

Hey @qihqi.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@facebook-github-bot: This pull request has been reverted by 3bd1507. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).


cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Labels: cla signed, fb-exported, oncall: jit, Reverted