Revert: Simplify how we set pad token and pad token ID for huggingfac… #3897
This revert addresses a critical issue introduced in the previous pull request (#3735): the simplified pad token configuration inadvertently mapped the PAD token to the same token ID as the UNK (unknown) token. That collision degraded quality during fine-tuning.
The symptom was that the model learned to predict an UNK token at the end of sequences instead of an EOS (end-of-sequence) token. As a result, it could not reliably tell when to stop during generation, which hurt the overall performance and quality of the fine-tuned model.
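To make the collision concrete, here is a minimal sketch in plain Python with a toy vocabulary. The special-token IDs, the example sequence, the `pad_right` helper, and the replacement PAD ID `100` are all assumptions chosen for illustration; they are not taken from the Ludwig code or from the actual tokenizer state.

```python
# Toy special-token IDs (assumed for illustration only).
UNK_ID, BOS_ID, EOS_ID = 0, 1, 2


def pad_right(ids, length, pad_id):
    """Right-pad a list of token IDs to a fixed length."""
    return ids + [pad_id] * (length - len(ids))


def count_unk(ids):
    """Count how many positions hold the UNK ID."""
    return sum(1 for token_id in ids if token_id == UNK_ID)


# A short sequence that correctly ends in EOS.
sequence = [BOS_ID, 17, 42, EOS_ID]

# Buggy setup: PAD shares the UNK ID, so padding looks like UNK tokens.
buggy = pad_right(sequence, 6, pad_id=UNK_ID)

# Distinct setup: PAD uses an ID that does not collide with UNK
# (100 is a hypothetical ID, not the one this revert restores).
distinct = pad_right(sequence, 6, pad_id=100)

print(buggy)     # trailing positions hold the UNK ID
print(distinct)  # trailing positions hold a separate PAD ID
```

In the buggy case the tail of every padded sequence is indistinguishable from real UNK tokens, which is consistent with the model ending its outputs on UNK rather than EOS.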
This revert restores the previous pad token setup and removes the unintended mapping, so that the model again learns to predict EOS tokens and terminates sequences properly during fine-tuning.
Demonstration of the bug that was introduced using Llama-2
Current:
The issue here is that we're mapping the new PAD token to the same token ID as the UNK token, so the last token passed into the model's forward pass is effectively an UNK token.
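A guard against this kind of collision could look like the sketch below. It is not part of this PR: `pad_collides_with_unk` is a hypothetical helper, and it only assumes the object exposes `pad_token_id` and `unk_token_id` attributes, as Hugging Face tokenizers do. The stand-in objects and their ID values are made up for the example.

```python
from types import SimpleNamespace


def pad_collides_with_unk(tokenizer):
    """Return True if PAD is set and shares its ID with UNK.

    Hypothetical helper: reads only `pad_token_id` and `unk_token_id`.
    """
    pad_id = getattr(tokenizer, "pad_token_id", None)
    unk_id = getattr(tokenizer, "unk_token_id", None)
    return pad_id is not None and pad_id == unk_id


# Stand-ins for real tokenizers (ID values are illustrative assumptions).
colliding = SimpleNamespace(pad_token_id=0, unk_token_id=0)
separate = SimpleNamespace(pad_token_id=2, unk_token_id=0)
no_pad = SimpleNamespace(pad_token_id=None, unk_token_id=0)

for name, tok in [("colliding", colliding), ("separate", separate), ("no_pad", no_pad)]:
    print(name, pad_collides_with_unk(tok))
```

Running a check like this right after the pad token is configured would surface the collision before any fine-tuning starts.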
This is what used to happen before (and what this revert will go back to):