Change limit from 750 char to words by vishnoianil · Pull Request #208 · instructlab/ui · GitHub

Change limit from 750 char to words #208

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

vishnoianil
Member

After discussion with the Taxonomy Triager team, we concluded that, for now, a 750-word maximum is a better constraint than a 750-character one.
Ideally, this constraint should be enforced in terms of tokens. Each model can split a string into tokens in its own way, so the number of tokens in any string depends on the model. A tokenizer service that exposes a REST API returning the number of tokens in a given string for a given base model would be really useful in this scenario. We could leverage it to enforce the limit in tokens.
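
For illustration, a word-based check along these lines could look like the sketch below. It is not the code from this PR: the names `MAX_WORDS`, `countWords`, and `isWithinWordLimit` are hypothetical, and it assumes a "word" is any run of non-whitespace characters.

```typescript
// Hypothetical constant and helper names; not taken from the instructlab/ui source.
const MAX_WORDS = 750;

// Count words, assuming a word is any run of non-whitespace characters.
function countWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") {
    return 0; // splitting an empty string would otherwise yield one empty "word"
  }
  return trimmed.split(/\s+/).length;
}

// Return true when the text has at most maxWords words.
function isWithinWordLimit(text: string, maxWords: number = MAX_WORDS): boolean {
  return countWords(text) <= maxWords;
}
```

A token-based limit could later replace `countWords` with a call to a tokenizer service, leaving the comparison itself unchanged.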

Signed-off-by: Anil Vishnoi <vishnoianil@gmail.com>
@nerdalert nerdalert merged commit e3921aa into instructlab:main Sep 26, 2024
5 checks passed