PyTorch serialization formats #31877

lutzroeder · 2020-01-05T23:38:06Z

@soumith, @ezyang, as PyTorch serialization formats keep changing and evolving, is there a scheme to name and version the different formats to avoid confusion?

Something along these lines:

Description	Format
Tar file with `sys_info`, `pickle`, `storages`, `tensors`	PyTorch v0.1.1
Multi-Pickle file with `8a0a6cfc9c...` signature	PyTorch v0.1.10
Zip file containing `constants.pkl` and `model.json`	TorchScript v1.0
Zip file containing `constants.pkl` and `data.pkl`	TorchScript v1.3
Zip file containing `data.pkl` but no code	PyTorch v1.3
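As a first step, a tool has to tell the three outer containers in the table apart. The sketch below does that using only the standard library. `classify_container` is a hypothetical helper, not a PyTorch API. The pickle check only recognizes protocol 2+ streams, which begin with the PROTO opcode `0x80`.

```python
import tarfile
import zipfile


def classify_container(path: str) -> str:
    """Guess the outer container of a PyTorch file: 'zip', 'tar', 'pickle' or 'unknown'."""
    # zipfile.is_zipfile looks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"
    # tarfile.is_tarfile tries to open the file as a (possibly compressed) tar archive.
    if tarfile.is_tarfile(path):
        return "tar"
    with open(path, "rb") as f:
        head = f.read(2)
    # Pickle protocol 2+ streams start with PROTO (0x80) followed by the protocol number.
    if len(head) == 2 and head[0] == 0x80 and 2 <= head[1] <= 5:
        return "pickle"
    return "unknown"
```

Zip is checked first because the Zip-based formats are the ones in current use. The ordering is a design choice, not something the formats require.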

cc @ezyang @gchanan @zou3519 @suo

The text was updated successfully, but these errors were encountered:

ezyang · 2020-01-08T17:01:54Z

I agree we should have these docs

ezyang · 2020-01-08T17:02:16Z

I believe the zip format should have a magic number that is used for versioning but I am not sure we have publicly documented it

lutzroeder · 2020-01-09T02:21:46Z

The higher-order question is whether there is a consistent and "official" way to refer to ALL formats, including the legacy formats. If the scheme in the description above makes sense, then the Zip format version in the /version file should probably match the version of PyTorch in which the format change was introduced (/version is already set to 1 for multiple formats, so "PyTorch Zip v1" doesn't work as a description). This would lead to a pattern like PyTorch Zip v1 (Preview), PyTorch Zip v1, PyTorch Zip TorchScript v1, PyTorch Multi-Pickle and PyTorch Tar Pickle, which would be more descriptive but gets confusing quickly.

driazati · 2020-01-09T21:21:25Z

Going through the current versions down to 0.3, it looks like we haven't been that great about versioning. The eager format has been very stable as the multi-pickle file (no changes since 0.3; the .tar format is 3 years old and I couldn't even get torch<0.3 to install correctly), and the protocol version and magic number we encode in the eager serialization haven't changed (1001 and 0x1950A86A20F9469CFC6C respectively).
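Given the two constants quoted above, a multi-pickle file can be recognized by reading its first two pickles. The sketch assumes the magic number is the first pickle in the stream and the protocol version is the second. `read_legacy_protocol` is a hypothetical helper. To avoid running arbitrary code from an untrusted file, the unpickler refuses to import any global, so only plain values such as integers can load.

```python
import pickle

# Constants quoted in this thread for the legacy (multi-pickle) eager format.
LEGACY_MAGIC_NUMBER = 0x1950A86A20F9469CFC6C
LEGACY_PROTOCOL_VERSION = 1001


class _NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything, so only plain values can load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"refusing to load global {module}.{name}")


def read_legacy_protocol(path: str):
    """Return the protocol version if `path` starts with the legacy magic number, else None."""
    try:
        with open(path, "rb") as f:
            # First pickle in the stream: the magic number.
            if _NoGlobalsUnpickler(f).load() != LEGACY_MAGIC_NUMBER:
                return None
            # Second pickle: the protocol version.
            return _NoGlobalsUnpickler(f).load()
    except Exception:
        # Not a pickle stream, truncated, or it tried to import a global.
        return None
```

A fresh unpickler is created for each `load()`. This mirrors calling `pickle.load(f)` repeatedly on one open file.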

The TorchScript format has gone through more structural changes, with only one really fundamental change (when model.json was removed in favor of data.pkl), but (as you have found) we haven't been bumping the version number accordingly (we only did so once, in #28122). So there's really no way to tell the versions apart other than by inspecting which files are present.

There is also a new serialization format for eager ("Eager v2" below) coming in 1.4.0 that will be hidden behind a flag (torch.save(obj, file, _use_new_zipfile_serialization=True)); the serialized format is a zip file that matches the TorchScript version.

PyTorch Version	Eager Format	TorchScript Format	TorchScript archive/version file
1.4	Eager v1 / Eager v2 (via a flag)	Script v4	2
1.3	Eager v1	Script v4	1
1.2	Eager v1	Script v3	1
1.1	Eager v1	Script v2	1
1.0	Eager v1	Script v1	1
0.4	Eager v1	n/a	n/a
0.3	Eager v1	n/a	n/a

These commands show the contents of the zip files for each version; the binaries are all from here.

Script v1, v2, v3, v4

$ unzip script_example_1.4.0.pt
Archive:  script_example_1.4.0.pt
 extracting: script_example_1.5.0a0/version
 extracting: script_example_1.5.0a0/data/0
 extracting: script_example_1.5.0a0/data/1
 extracting: script_example_1.5.0a0/data.pkl
  inflating: script_example_1.5.0a0/code/__torch__.py
  inflating: script_example_1.5.0a0/code/__torch__.py.debug_pkl
 extracting: script_example_1.5.0a0/constants.pkl

$ unzip script_example_1.3.1.pt
Archive:  script_example_1.3.1.pt
 extracting: script_example_1.3.1/version
  inflating: script_example_1.3.1/code/__torch__.py
  inflating: script_example_1.3.1/code/__torch__.py.debug_pkl
 extracting: script_example_1.3.1/constants.pkl
 extracting: script_example_1.3.1/data/0
 extracting: script_example_1.3.1/data/1
 extracting: script_example_1.3.1/data.pkl

$ unzip script_example_1.2.0.pt
Archive:  script_example_1.2.0.pt
 extracting: script_example_1.2.0/version
 extracting: script_example_1.2.0/code/script_example_1.2.0.py
 extracting: script_example_1.2.0/debug/script_example_1.2.0.pkl
 extracting: script_example_1.2.0/attributes.pkl
 extracting: script_example_1.2.0/tensors/0
 extracting: script_example_1.2.0/tensors/1
 extracting: script_example_1.2.0/model.json

$ unzip script_example_1.1.0.pt
Archive:  script_example_1.1.0.pt
 extracting: script_example_1.1.0/version
 extracting: script_example_1.1.0/code/script_example_1.1.0.py
 extracting: script_example_1.1.0/attributes.pkl
 extracting: script_example_1.1.0/tensors/0
 extracting: script_example_1.1.0/tensors/1
 extracting: script_example_1.1.0/model.json

$ unzip script_example_1.0.0.pt
Archive:  script_example_1.0.0.pt
 extracting: script_example_1.0.0/version
 extracting: script_example_1.0.0/code/script_example_1.0.0.py
 extracting: script_example_1.0.0/tensors/0
 extracting: script_example_1.0.0/tensors/1
 extracting: script_example_1.0.0/model.json

Eager v2

$ unzip eager_example_new_1.5.0a0.pt
Archive:  eager_example_new_1.5.0a0.pt
 extracting: eager_example_new_1.5.0a0/version
 extracting: eager_example_new_1.5.0a0/data.pkl
 extracting: eager_example_new_1.5.0a0/data/94148253811520
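In every listing above, all records sit under one top-level folder named after the archive. A tool can list the records programmatically, in the spirit of `unzip -l`, and strip that folder so that archives can be compared by their relative names (`data.pkl`, `constants.pkl`, `model.json`, ...). This is a minimal sketch. `archive_entries` is a hypothetical helper, and it assumes the single-folder layout shown in these listings.

```python
import zipfile
from typing import Set


def archive_entries(path: str) -> Set[str]:
    """Return member names with the top-level folder (e.g. 'script_example_1.3.1/') removed."""
    with zipfile.ZipFile(path) as zf:
        # Skip explicit directory entries; only files matter for format detection.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    # Drop the first path component, which is the archive-named folder in these listings.
    return {n.split("/", 1)[1] if "/" in n else n for n in names}
```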

lutzroeder · 2020-01-10T03:25:00Z

@driazati thank you for sharing the files.

686e8d3 introduced .tar on 2016-08-22 which matches PyTorch v0.1.1.
e71cf20 introduced 0x1950A86A20F9469CFC6C on 2017-02-22 which matches PyTorch v0.1.10.
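To recognize the .tar format from 686e8d3 specifically, not just any tar file, a tool could check for the four records listed at the top of this issue. This is a sketch under that assumption. `is_legacy_tar` is a hypothetical helper, and the member names are taken from the issue description rather than from the PyTorch source.

```python
import tarfile

# Record names of the legacy tar format, as listed in the issue description.
LEGACY_TAR_MEMBERS = {"sys_info", "pickle", "storages", "tensors"}


def is_legacy_tar(path: str) -> bool:
    """True if `path` is a tar archive that contains all four legacy records."""
    if not tarfile.is_tarfile(path):
        return False
    with tarfile.open(path) as tf:
        names = set(tf.getnames())
    return LEGACY_TAR_MEMBERS.issubset(names)
```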

I'm trying to be more specific about how the different formats can be uniquely named. For example, instead of "this file is in .tar format", would it be correct to say "this file is in PyTorch v0.1.1 format"? I'm not sure "PyTorch Eager v1" would mean much, and it sounds like going forward both Eager and TorchScript are going to use the same Zip container format.

For example, assume a tool needs to tell a user which format a .pth file is in:

def get_display_format(file):
  if is_tar_file(file):
    return "XXXXX" # PyTorch v0.1.1?
  if is_multi_pickle_file(file):
    return "XXXXX" # PyTorch v0.1.10?
  if is_zip_file(file):
    if has_version_file(file, 1) or has_version_file(file, None):
      if has_model_json(file):
        return "XXXXX" # PyTorch v1.0?
      if has_data_pkl(file) and has_code_folder(file):
        return "XXXXX" # PyTorch v1.3 (TorchScript)?
      if has_data_pkl(file) and not has_code_folder(file):
        return "XXXXX" # PyTorch v1.4?
    else:
      if has_code_folder(file):
        return "XXXXX v" + generate_display_version_from_version_file(file) + " (TorchScript)"
      else:
        return "XXXXX v" + generate_display_version_from_version_file(file)

Question 1: What should those specific XXXXX format names be? Would it make sense to just use the PyTorch version that started creating each format, and if so, are the ones suggested in the comments correct? Sounds like the answer is yes. Detecting script variants might be difficult, but if it's possible, more conditions could be added to be more specific.

Question 2: What can be done to simplify this so that generate_display_version_from_version_file produces a human-readable version going forward? Maybe, going forward, the TorchScript archive/version file (the last column in the table) should match the actual PyTorch version (first column) that introduced the format?

Question 3: Some files include tensor state only, while others include code or model structure as well. Since this seems to cause confusion among users, is there a recommended way to include this in the XXXXX format name? Not sure there are any easy ways to do this (the same issue exists for Keras), so it's fine if there isn't an answer.

driazati · 2020-01-10T19:20:06Z

Can you give some more context on the background for this issue? An easy way to get everything into 1 format would be to load and re-save in the latest version of PyTorch. Since we're fully backwards compatible down to 0.1, we can load any objects (in the eager case) or TorchScript models and save them with the latest format.

I think a reasonable naming scheme would be something like (Eager|TorchScript) v(1.0.0|1.1.0|etc.), so that it tells a) what type of file it is and b) the minimum version of PyTorch that can load that file.
Going forward, the version number should similarly refer to the minimum version of PyTorch that can load this archive.
This is the main distinction between TorchScript and Eager saves. TorchScript (torch.jit.save) files are made to allow exporting a model to C++, so they include the module hierarchy and TorchScript code details. Eager mode save files (torch.save) follow Python's pickle, with a layer on top to support saving Tensors; pickle does not save class definitions or code, and relies on the code referenced within being defined when the file is loaded.

lutzroeder · 2020-01-11T04:11:52Z

Context is which format description Netron should show to users for diagnosing issues or discussing the changes PyTorch is going through. The other goal is to have some principles for version changes that will be followed going forward.

Would this be a correct representation of what we discussed so far?

def get_display_format(file):
  if is_tar_file(file):
    return "PyTorch Eager v0.1.1"
  if is_multi_pickle_file(file):
    return "PyTorch Eager v0.1.10"
  if is_zip_file(file):
    if has_version_file(file, 2) or has_version_file(file, 1) or has_version_file(file, None):
      if has_model_json(file):
        return "PyTorch Script v1.0"
      if has_data_pkl(file):
        if has_code_folder(file):
          if has_version_file(file, 2):
            return "PyTorch Script v1.4"
          return "PyTorch Script v1.3"
        else:
          return "PyTorch Eager v1.4"
    else:
      if has_code_folder(file):
        return "PyTorch Script v" + generate_display_version_from_version_file(file)
      else:
        return "PyTorch Eager v" + generate_display_version_from_version_file(file)

It is still unclear how generate_display_version_from_version_file would work going forward. Are you planning to change the implementation of /version to store 1.5 instead of 2 going forward?

driazati · 2020-01-13T18:36:45Z

Looks mostly good (pinging @ezyang for any thoughts), a couple notes:

Instead of has_code_folder, checking for the presence of constants.pkl is probably safer; this is what our deserialization code does to differentiate between the two.
Instead of names like PyTorch Script v1.0, it should be TorchScript v1.0, so everything is consistent with our docs and tutorials.

So in the end it'd look something like

def get_display_format(file):
  if is_tar_file(file):
    return "PyTorch v0.1.1"

  if is_multi_pickle_file(file):
    return "PyTorch v0.1.10"

  if is_zip_file(file):
    if has_model_json(file):
      if has_attributes_pkl(file):
        return "TorchScript v1.1"
      else:
        return "TorchScript v1.0"

    if has_data_pkl(file):
      if has_constants_pkl(file):
        if has_version_file(file, 2):
          return "TorchScript v1.4"
        return "TorchScript v1.3"
      else:
        return "PyTorch v1.4"
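The zip branch of the decision tree above can be written as a pure, runnable function over the archive's relative record names and its /version number. This is a sketch, not PyTorch's own detection code. `display_format_for_zip` is a hypothetical name. The `"Unknown"` fallback is an added assumption, because the tree above has no final case.

```python
from typing import Optional, Set


def display_format_for_zip(entries: Set[str], version: Optional[int]) -> str:
    """Map relative record names (top-level folder removed) and /version to a display name."""
    if "model.json" in entries:
        # Older TorchScript archives describe the model in model.json.
        if "attributes.pkl" in entries:
            return "TorchScript v1.1"
        return "TorchScript v1.0"
    if "data.pkl" in entries:
        # constants.pkl is what separates TorchScript archives from eager zip saves.
        if "constants.pkl" in entries:
            if version == 2:
                return "TorchScript v1.4"
            return "TorchScript v1.3"
        return "PyTorch v1.4"
    # Assumed fallback for archives this tree does not cover.
    return "Unknown"
```

Using the listings above as input, the 1.3.1 records with version 1 map to "TorchScript v1.3", and the eager_example_new records map to "PyTorch v1.4". As written, the tree also reports a 1.2.0 archive (model.json plus attributes.pkl) as "TorchScript v1.1", because it does not distinguish the two.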

We discussed the /version internally and it probably won't change from its current format (a single number that gets bumped any time we change the way we serialize TorchScript code, so not any time there are changes to the file format)
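Because /version stays a single number, a tool only needs to read that one record and parse an integer. This is a minimal sketch. `read_archive_version` is a hypothetical helper, and it assumes the record is stored as decimal text under the archive's top-level folder (e.g. `script_example_1.3.1/version`), as in the listings above.

```python
import zipfile
from typing import Optional


def read_archive_version(path: str) -> Optional[int]:
    """Return the integer stored in the archive's top-level 'version' record, if any."""
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # Match '<folder>/version' (or a bare 'version'), not nested records.
            if name.split("/", 1)[-1] == "version":
                text = zf.read(name).decode("utf-8").strip()
                return int(text) if text.isdigit() else None
    return None
```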

lutzroeder · 2020-01-14T03:30:54Z

Is TorchScript v1.3 intuitive enough? The docs never mention that TorchScript v1.3 means PyTorch v1.3 is required. Would calling it PyTorch TorchScript v1.3 or PyTorch v1.3 TorchScript make this more explicit?

We discussed the /version internally and it probably won't change from its current format (a single number that gets bumped any time we change the way we serialize TorchScript code, so not any time there are changes to the file format)

Is there a way for tools to derive the PyTorch version needed for a given TorchScript file? If /version isn't changing, would it make sense to add another field or file, like /producer, to add this information?

ezyang · 2020-01-14T16:51:41Z

@driazati I have little to say about the exact details of how we are version testing, or what the variants of the versions should be named (aligning them with PyTorch releases sounds reasonable). What I don't see in this discussion is whether or not the team is going to commit to accurately reporting versions on the file format going forward, and if so, what mechanisms we can put in place to make sure that we update it when we make changes to the format (since it seems the lightweight mechanism of code review isn't working). A simple stopgap is to have it report the version of PyTorch which exported the model...

lutzroeder · 2020-02-09T20:58:40Z

A simple stopgap is to have it report the version of PyTorch which exported the model...

Agree. Given the scheme we discussed above this would also make the most sense for tooling. The current /version stamp in TorchScript files hasn't been updated consistently and it is also not very useful or descriptive for tooling.

ezyang added the high priority label Jan 8, 2020

pytorch-probot bot added the triage review label Jan 8, 2020

ezyang added the oncall: jit Add this issue/PR to JIT oncall triage queue label Jan 8, 2020

lutzroeder added a commit to lutzroeder/netron that referenced this issue Jan 11, 2020

PyTorch formats (pytorch/pytorch#31877)

7301b3d

lutzroeder added a commit to lutzroeder/netron that referenced this issue Jan 14, 2020

PyTorch formats (pytorch/pytorch#31877)

a6ea26f

lutzroeder added a commit to lutzroeder/netron that referenced this issue Jan 14, 2020

PyTorch formats (pytorch/pytorch#31877)

8943bbc

lutzroeder added a commit to lutzroeder/netron that referenced this issue Jan 15, 2020

PyTorch formats (pytorch/pytorch#31877)

5748f04

driazati mentioned this issue Feb 13, 2020

Include PyTorch version in serialization formats #33307

Closed

suo added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 28, 2020

lutzroeder added a commit to lutzroeder/netron that referenced this issue Mar 6, 2020

PyTorch formats (pytorch/pytorch#31877)

6016a7c

suo assigned driazati Apr 3, 2020

lutzroeder added a commit to lutzroeder/netron that referenced this issue Apr 11, 2020

Workaround pytorch/pytorch#31877

b01165d

This was referenced Jun 20, 2020

Adds dynamic versioning pattern #40279

Closed

Throws runtime error when performing integer division using torch.div #38620

Closed

PhenomenalOnee mentioned this issue Sep 3, 2020

Pre Trained model? yijingru/BBAVectors-Oriented-Object-Detection#7

Closed

lutzroeder added a commit to lutzroeder/netron that referenced this issue Sep 6, 2020

Workaround pytorch/pytorch#31877

b9e2c5e

lutzroeder closed this as completed Sep 6, 2020

lutzroeder mentioned this issue Apr 11, 2021

PyTorch support lutzroeder/netron#720

Closed

junjihashimoto mentioned this issue Jun 29, 2021

LibTorch cannot load yolov5 exported model hasktorch/hasktorch-serving-models-skeleton#1

Open

suhacker1 mentioned this issue Feb 9, 2024

Polyglot module improvements trailofbits/fickling#93

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PyTorch serialization formats #31877

PyTorch serialization formats #31877

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PyTorch serialization formats #31877

PyTorch serialization formats #31877

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!