8000 Adding `Sequence` for `PostProcessor`. by Narsil · Pull Request #1052 · huggingface/tokenizers · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Adding Sequence for PostProcessor. #1052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 25, 2022
Merged

Adding Sequence for PostProcessor. #1052

merged 4 commits into from
Aug 25, 2022

Conversation

Narsil
Copy link
Collaborator
@Narsil Narsil commented Aug 24, 2022

Actually adding the Sequence.

Added a dummy test since actually we didn't need it for OPT.
Still a nice addition for future models/ease of use.

@Narsil Narsil requested review from McPatate and mishig25 August 24, 2022 17:44
@Narsil Narsil force-pushed the update_implementation_to_respect_new_trait branch from 7ab79e8 to a1ffa36 Compare August 24, 2022 17:49
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@Narsil Narsil force-pushed the sequence_processor branch from 782ad99 to decb045 Compare August 24, 2022 17:56
@Narsil Narsil requested a review from SaulLu August 24, 2022 17:57
Copy link
Contributor
@mishig25 mishig25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Got one question: can there be a bad or incompatible sequence (for example: Sequence::new(vec![TemplateProcessing, ByteLevel])) or any sequence woks (i.e. permutationcombination of all processors should work) ?

@Narsil
Copy link
Collaborator Author
Narsil commented Aug 25, 2022

Got one question: can there be a bad or incompatible sequence (for example: Sequence::new(vec![TemplateProcessing, ByteLevel])) or any sequence woks (i.e. permutationcombination of all processors should work) ?

Yes combinations can be "bad" in the sense that I am not really sure what they are supposed to do, but ultimately they are separate in the sense that none actually consumes the encodings until the very last step.

ByteLevel might or might not just remove the spaces from the offsets for instance and can be combined with TemplateProcessing.

If it's done in reverse order the output should be the same unless your special tokens.

@mishig25
Copy link
Contributor

got it 👍

Base automatically changed from update_implementation_to_respect_new_trait to main August 25, 2022 09:49
@Narsil Narsil force-pushed the sequence_processor branch from b98ebb2 to bba16bd Compare August 25, 2022 09:50
@Narsil Narsil merged commit 06025e4 into main Aug 25, 2022
@Narsil Narsil deleted the sequence_processor branch August 25, 2022 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0