Rust - Fix Serialization when tokens are part of original vocab by n1t0 · Pull Request #315 · huggingface/tokenizers


Merged: n1t0 merged 1 commit into master from fix-serialization on Jun 22, 2020

Conversation

n1t0 (Contributor) commented on Jun 22, 2020

When we add only special tokens, which are all part of the original vocabulary, `self.added_tokens_map.len()` was 0, leading to malformed serialization output.

For example, for Bert it was producing:

[],{"id":0,"special":true,"content": ...

@n1t0 n1t0 merged commit 42983cc into master Jun 22, 2020
@n1t0 n1t0 deleted the fix-serialization branch June 22, 2020 16:52
chris-ha458 added a commit to chris-ha458/tokenizers that referenced this pull request Aug 9, 2023
Narsil pushed a commit that referenced this pull request Aug 10, 2023
* CD backports

Follows huggingface/safetensors#317

* fix node bindings?

`cargo check` doesn't work in my local configuration from `tokenizers/bindings/node/native`.
I don't think it will be a problem, but I have difficulty telling.

* backport #315

* safetensors#317 backports