8000 Make debug_pkl smaller by only emitting unique traces. by qihqi · Pull Request #72596 · pytorch/pytorch · GitHub

Make debug_pkl smaller by only emitting unique traces. #72596


Closed · wants to merge 1 commit

Conversation

@qihqi (Contributor) commented Feb 9, 2022

Summary:
The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted into the debug_pkl file as strings.

Since many SourceRanges contain similar-looking strings, they can be deduped.

The newer format saves the set of unique traces in a tuple; each SourceRange then saves the offset of its trace within that tuple (i.e., manually applying dictionary compression).
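A minimal C++ sketch of the dictionary-compression idea (hypothetical helper names, not the actual pickler code):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Emit each unique trace once; every SourceRange then stores only the
// index of its trace in the deduped table.
struct DedupedTraces {
  std::vector<std::string> unique_traces; // serialized once, as a tuple
  std::vector<int64_t> offsets;           // one entry per SourceRange
};

DedupedTraces dedupTraces(const std::vector<std::string>& traces) {
  DedupedTraces out;
  std::unordered_map<std::string, int64_t> index;
  for (const auto& trace : traces) {
    auto it = index.find(trace);
    if (it == index.end()) {
      it = index.emplace(trace, static_cast<int64_t>(out.unique_traces.size())).first;
      out.unique_traces.push_back(trace);
    }
    out.offsets.push_back(it->second);
  }
  return out;
}
```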

Test Plan: unit test

Differential Revision: D33994011

@pytorch-bot (bot) commented Feb 9, 2022
CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/qihqi/pytorch/blob/1d35b17c8b536711da1cfadbd5c56e8d36d012fb/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflow · Labels · Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

@facebook-github-bot (Contributor) commented Feb 9, 2022


💊 CI failures summary and remediations

As of commit 96db16e (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T05:45:05.9105788Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-02-24T05:45:05.9106505Z     result = test(self, **param_kwargs)
2022-02-24T05:45:05.9107439Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2950, in wrapped
2022-02-24T05:45:05.9107809Z     f(self, *args, **kwargs, coalesced=True)
2022-02-24T05:45:05.9108098Z   File "test_sparse.py", line 1629, in test_basic_ops
2022-02-24T05:45:05.9108400Z     _test_basic_ops_hybrid()
2022-02-24T05:45:05.9108697Z   File "test_sparse.py", line 1622, in _test_basic_ops_hybrid
2022-02-24T05:45:05.9109073Z     self._test_basic_ops_shape(9, 12, [10, 10, 10], [2, 0], dtype, device, coalesced)
2022-02-24T05:45:05.9109555Z   File "test_sparse.py", line 1541, in _test_basic_ops_shape
2022-02-24T05:45:05.9109854Z     y1 = x1 * x2
2022-02-24T05:45:05.9110173Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-02-24T05:45:05.9110630Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-02-24T05:45:05.9111080Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-02-24T05:45:05.9111650Z ✅ 19471 Passed
2022-02-24T05:45:05.9111944Z 💨 6506 Skipped
2022-02-24T05:45:05.9113442Z 🚨 1 Failed
2022-02-24T05:45:05.9570230Z ##[group]Run # Remove any previous test jsons if they exist
2022-02-24T05:45:05.9570959Z rm -f test-jsons-*.zip
2022-02-24T05:45:05.9571286Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
```

See GitHub Actions build Lint / quick-checks (2/4)

Step: "Ensure no unqualified type ignore" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T03:26:31.6192103Z ##[group]Run set -eux
2022-02-24T03:26:31.6192876Z python torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
2022-02-24T03:26:31.6243080Z shell: /bin/bash -e {0}
2022-02-24T03:26:31.6243279Z env:
2022-02-24T03:26:31.6243546Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.2/x64
2022-02-24T03:26:31.6243863Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.2/x64/lib
2022-02-24T03:26:31.6244121Z ##[endgroup]
2022-02-24T03:26:31.6315259Z + python torch/testing/_check_kernel_launches.py
2022-02-24T03:26:31.6317885Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2022-02-24T03:26:31.6516737Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2022-02-24T03:26:31.6577132Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2022-02-24T03:26:31.6625561Z shell: /bin/bash -e {0}
2022-02-24T03:26:31.6625762Z env:
2022-02-24T03:26:31.6626045Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.2/x64
2022-02-24T03:26:31.6626376Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.2/x64/lib
2022-02-24T03:26:31.6626653Z ##[endgroup]
2022-02-24T03:26:31.6922337Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2022-02-24T03:26:31.6966193Z shell: /bin/bash -e {0}
```

See GitHub Actions build linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T09:49:20.1964627Z test_sparse.py:1567: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
2022-02-24T09:49:20.1966534Z   expected = self.safeToDense(x1) // 37.5
2022-02-24T09:49:20.2194827Z Memory exception on virtual address 0x0, node id 4 : Page not present
2022-02-24T09:49:20.2198130Z Address does not belong to a known buffer
2022-02-24T09:49:20.2201240Z Memory access fault by GPU node-4 (Agent handle: 0x5560897dc010) on address (nil). Reason: Page not present or supervisor privilege.
2022-02-24T09:49:20.3957329Z Traceback (most recent call last):
2022-02-24T09:49:20.3958188Z   File "test/run_test.py", line 1106, in <module>
2022-02-24T09:49:20.3964254Z     main()
2022-02-24T09:49:20.3964862Z   File "test/run_test.py", line 1084, in main
2022-02-24T09:49:20.3970807Z     raise RuntimeError(err_message)
2022-02-24T09:49:20.3971701Z RuntimeError: test_sparse failed! Received signal: SIGIOT
2022-02-24T09:49:21.9854767Z real	86m42.863s
2022-02-24T09:49:21.9855328Z user	145m42.450s
2022-02-24T09:49:21.9855860Z sys	19m14.259s
2022-02-24T09:49:21.9856390Z + cleanup
2022-02-24T09:49:21.9856894Z + retcode=1
2022-02-24T09:49:21.9857425Z + set +x
2022-02-24T09:49:21.9965072Z ##[error]Process completed with exit code 1.
2022-02-24T09:49:22.0035010Z ##[group]Run python3 -m pip install junitparser==2.1.1 rich==10.9.0
```

See GitHub Actions build win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (4/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2022-02-24T08:14:49.4400528Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 376, in instantiated_test
2022-02-24T08:14:49.4401463Z     result = test(self, **param_kwargs)
2022-02-24T08:14:49.4402403Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 2950, in wrapped
2022-02-24T08:14:49.4403317Z     f(self, *args, **kwargs, coalesced=True)
2022-02-24T08:14:49.4403941Z   File "test_sparse.py", line 1629, in test_basic_ops
2022-02-24T08:14:49.4404482Z     _test_basic_ops_hybrid()
2022-02-24T08:14:49.4407080Z   File "test_sparse.py", line 1622, in _test_basic_ops_hybrid
2022-02-24T08:14:49.4407892Z     self._test_basic_ops_shape(9, 12, [10, 10, 10], [2, 0], dtype, device, coalesced)
2022-02-24T08:14:49.4408664Z   File "test_sparse.py", line 1541, in _test_basic_ops_shape
2022-02-24T08:14:49.4409149Z     y1 = x1 * x2
2022-02-24T08:14:49.4409748Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-02-24T08:14:49.4410766Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-02-24T08:14:49.4411695Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-02-24T08:14:49.4420253Z ✅ 36848 Passed
2022-02-24T08:14:49.4420696Z 💨 27954 Skipped
2022-02-24T08:14:49.4421078Z 🚨 1 Failed
2022-02-24T08:14:49.5268350Z ##[group]Run .github\scripts\wait_for_ssh_to_drain.ps1
2022-02-24T08:14:49.5292722Z shell: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.EXE -command ". '{0}'"
2022-02-24T08:14:49.5293412Z env:
```

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot added the oncall: jit and fb-exported labels on Feb 9, 2022
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@qihqi requested review from zhxchen17 and gmagogsfm on February 9, 2022 19:06
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@gmagogsfm (Contributor) left a comment:

You mentioned that unit tests are used as the test plan, but I don't see a checked-in unit test. Was it left out of the commit accidentally?

```diff
 serialized = c10::ivalue::Tuple::create(
-    {s->text(), s->filename(), (int64_t)s->starting_line_no()});
+    {text_pos, fname_pos, (int64_t)s->starting_line_no()});
```
Contributor commented:

Looks like this will dedup source code. Could we run a one-time manual test on that ads model to see how much savings we get?

How about the stack-trace part in debug_pkl? Could we compress that too? It looks like a lot of the strings have redundancy.

@qihqi (Contributor, Author) commented Feb 9, 2022:

> Looks like this will dedup source code. Could we run a one-time manual test on that ads model to see how much savings we get?

I'll get help from the ads folks to see how to do this.

> How about the stack-trace part in debug_pkl? Could we compress that too? It looks like a lot of the strings have redundancy.

Yes. The SourceView object has a string field "text" and a string field "filename". Sometimes the "text" field is stuffed with source code and sometimes with a stack trace. I initially thought it was always source code, until I unpickled and printed out what was in there and saw stack traces.

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@qihqi requested a review from gmagogsfm on February 10, 2022 00:46
@qihqi (Contributor, Author) commented Feb 10, 2022:

> You mentioned that unit tests are used as the test plan, but I don't see a checked-in unit test. Was it left out of the commit accidentally?

I meant the existing unit tests that do save/load sequences, e.g. test_jit.

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@qihqi force-pushed the export-D33994011 branch 2 times, most recently from 32d5f8b to 5ab3a46 on February 10, 2022 19:58
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

1 similar comment
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

```cpp
// A string-like class backed by a vector of string_views;
// the string represented is logically the concatenation of the string_views.
// This has the advantage of not needing contiguous memory.
struct StringCordView {
```
Contributor commented:

Consider moving some of the heavy method implementations to the .cpp file.

Additionally, are all the methods meant to be public?
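For context, a minimal standalone sketch of the cord idea (hypothetical names, using std::string_view; this is not the PR's actual implementation, which uses c10::string_view):

```cpp
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// The logical string is the concatenation of non-contiguous pieces,
// each a view into memory owned elsewhere.
struct StringCordSketch {
  std::vector<std::string_view> pieces;

  std::size_t size() const {
    std::size_t n = 0;
    for (const auto& p : pieces) n += p.size();
    return n;
  }

  // Random access walks the pieces until the index falls inside one.
  char at(std::size_t i) const {
    for (const auto& p : pieces) {
      if (i < p.size()) return p[i];
      i -= p.size();
    }
    throw std::out_of_range("index past end of cord");
  }

  // Materialize into contiguous storage only when actually needed.
  std::string str() const {
    std::string out;
    out.reserve(size());
    for (const auto& p : pieces) out.append(p.data(), p.size());
    return out;
  }
};
```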

```cpp
};

// the interface
struct SourceView {
```
Contributor commented:

Looks like SourceView is pure virtual now, with Source being the only class implementing it. If so, why do we need SourceView?

Contributor (Author) replied:

Good point, we don't need it.

SchemaParser(const std::string& str)
: L(std::make_shared<SourceView>(c10::string_view(str))),
explicit SchemaParser(const std::string& str)
: L(std::make_shared<Source>(c10::string_view(str))),
Contributor commented:

Does this mean we are copying the entire string (within Source) when the schema parser is created?

Contributor (Author) replied:

No. A Source created with a string_view doesn't copy (equivalent to the old SourceView), and a Source created with a std::string copies (equivalent to the old Source).

Contributor commented:

In that case, a Source may or may not own the underlying string, and that is dangerous. Could we preserve the original inheritance structure so that it is clear whether ownership is needed or not?
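To make the ownership question concrete, here is a hedged sketch of the two-constructor scheme under discussion (hypothetical names, std::string_view in place of c10::string_view; not the PR's actual code):

```cpp
#include <string>
#include <string_view>
#include <utility>

// One class, two ownership modes: constructing from std::string copies
// (owning); constructing from std::string_view only references memory
// that the caller must keep alive.
class SourceSketch {
 public:
  // Owning: the text is moved into owned_ and view_ points into the copy.
  explicit SourceSketch(std::string text)
      : owned_(std::move(text)), view_(owned_) {}

  // Non-owning: view_ dangles if the caller's buffer dies first.
  explicit SourceSketch(std::string_view text) : view_(text) {}

  std::string_view text() const { return view_; }

 private:
  std::string owned_;      // empty in the non-owning case
  std::string_view view_;  // valid only while the backing buffer lives
};
```

The reviewer's point is that nothing in the type tells you which mode a given instance is in, whereas the old Source/SourceView split encoded ownership in the type itself.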

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

Summary:
Pull Request resolved: pytorch#72596

The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start/end numbers. Those are emitted into the debug_pkl file as strings.

Since many SourceRanges share the same source, the trace strings can be deduped.

The newer format saves the set of unique traces in a tuple; each SourceRange then saves the offset of its trace within that tuple (i.e., manually applying dictionary compression).

The above yields a smaller file, but on loading, if we copied each trace into a Source as a string, runtime memory would still blow up. To mitigate this, we use SourceView directly instead of Source, taking a reference to the string inside the Deserializer and turning it into a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object; that Source object stays alive for the duration of model construction.

Test Plan:
unit test

Took the original file (312271638_930.predictor.disagg.local); loaded it with `torch.jit.load`, then saved it again with `torch.jit.save`. Unzipped both and compared contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```

Reviewed By: JasonHanwen

Differential Revision: D33994011

fbshipit-source-id: a6db7e1d4c616ced4442ec06936ae005bf269212
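The lifetime argument in the summary above, as a minimal compilable sketch (hypothetical stand-in types; the real chain is Deserializer held by Unpickler held by a Source):

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <string_view>

// Stands in for the Deserializer: it owns the deserialized trace text.
struct BufferOwner {
  std::string trace_text;
};

// Stands in for a Source holding a non-owning view plus a shared_ptr
// that keeps the owner (and therefore the viewed buffer) alive.
struct ViewingSource {
  std::shared_ptr<BufferOwner> owner;  // keeps the buffer alive
  std::string_view text;               // view into owner->trace_text
};

int main() {
  ViewingSource src;
  {
    auto owner = std::make_shared<BufferOwner>();
    owner->trace_text = "def f(x):\n    return x + 1\n";
    src = {owner, std::string_view(owner->trace_text)};
  }  // the local shared_ptr goes out of scope, but src.owner keeps it alive
  std::cout << src.text << "\n";  // safe: the buffer is still owned
  return 0;
}
```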
@facebook-github-bot: This pull request was exported from Phabricator. Differential Revision: D33994011

facebook-github-bot pushed a commit that referenced this pull request Feb 24, 2022
@github-actions (Contributor) commented:

Hey @qihqi.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@facebook-github-bot: This pull request has been reverted by 3bd1507. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).


cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Labels: cla signed, fb-exported, oncall: jit, Reverted