Add mono models to huggingface model-zoo and incorporate into pipeline #29
Conversation
device = torch.device(options.device)
...
try:
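For context, a minimal sketch of the try/except fallback being discussed, assuming the PR's T5-based mono models are loaded via transformers (the model class and placeholder names here are assumptions, not the PR's actual code):

```python
import torch
from transformers import T5ForConditionalGeneration

model_name_or_path = "t5-base"  # placeholder for the mono model path
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

try:
    # Assume a PyTorch checkpoint (pytorch_model.bin) first.
    model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
except OSError:
    # Fall back to treating the path as a TensorFlow checkpoint.
    model = T5ForConditionalGeneration.from_pretrained(model_name_or_path, from_tf=True)

model.to(device).eval()
```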
Instead of try/except, should we pass a flag to distinguish PT from TF checkpoints?
This entire segment of code should give an error, since `loader` is not defined anymore?
@rodrigonogueira4 Yes, I feel a flag makes more sense. Will add a `--from_tf` flag to evaluate_passage_ranker.py.
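A rough sketch of what the proposed flag could look like (everything except `--from_tf` itself is hypothetical):

```python
import argparse

from transformers import T5ForConditionalGeneration

parser = argparse.ArgumentParser()
parser.add_argument("--model-name-or-path", default="t5-base",
                    help="hypothetical placeholder for the model argument")
parser.add_argument("--from_tf", action="store_true",
                    help="load the weights from a TensorFlow checkpoint")
args = parser.parse_args()

# The user states the checkpoint type explicitly, instead of relying on try/except.
model = T5ForConditionalGeneration.from_pretrained(args.model_name_or_path,
                                                   from_tf=args.from_tf)
```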
@ronakice Oh, I didn't notice there was a one-line difference between my gcloud and local copies. Will remove that line.
Huggingface's work-around is something like `bool("ckpt" in model_name_or_path)`.
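For reference, that work-around in the Hugging Face example scripts looks roughly like this (`AutoModel` stands in for whichever model class a script actually uses):

```python
from transformers import AutoModel

model_name_or_path = "bert-base-uncased"  # or a path to a local checkpoint

# TF checkpoint paths conventionally contain ".ckpt", so the example
# scripts infer the checkpoint type from the path string itself.
model = AutoModel.from_pretrained(
    model_name_or_path,
    from_tf=bool(".ckpt" in model_name_or_path),
)
```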
`model_name_or_path` is a directory that may contain both "ckpt" and pytorch_model.bin files (https://huggingface.co/google/bert_uncased_L-10_H-128_A-2#list-files), so maybe it still needs to be stated by the user?
To avoid problems with PT files containing `ckpt` in their name, I would just go with the flag.
I feel like the model in the model-zoo is missing the vocab file?
@ronakice Is this because we are using t5-base as the tokenizer?
Compare https://huggingface.co/valhalla/t5-base-squad#list-files with our list of files: it seems like our model is missing a few! Not sure if all are useful, though!
I think the missing files are from the tokenizer (e.g. the config file and spiece.model or vocab file). The pretrained model itself doesn't require these files. (The original gs://neuralresearcher_data/doc2query/experiments/367 doesn't contain vocab files either.)
Do we need to find a way to integrate the pretrained tokenizer files into our model files?
Yes, it can be removed.
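If the tokenizer files do need to be integrated, one possible approach (a sketch, assuming the t5-base tokenizer is the right one for these models) is to save them into the model directory before uploading:

```python
from transformers import T5Tokenizer

# Save the pretrained t5-base tokenizer files (spiece.model, tokenizer
# config, etc.) next to the model weights so the model-zoo entry is
# self-contained; "model-dir" is a placeholder path.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
tokenizer.save_pretrained("model-dir")
```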
LGTM, thanks for implementing this!
Great! We can merge this. Thanks for doing this @MXueguang!