Fix Zero div and T5 decoding #16
Conversation
@@ -30,7 +30,7 @@ def rerank(self, query: Query, texts: List[Text]) -> List[Text]:
     attn_mask = batch.output['attention_mask']
     _, batch_scores = greedy_decode(self.model,
                                     input_ids.to(self.device),
-                                    length=2,
+                                    length=1,
I can't remember exactly why, but @daemon had a good reason to use length=2, so I will let him comment here
The reason was to reproduce what the TensorFlow implementation had been doing. If using a single step of decoding improves quality, then that's all the better.
Adding a comment would be a good idea...
Sure, that belongs in covidex.
@@ -75,7 +75,7 @@ def accumulate(self, scores: List[float], gold: RelevanceExample):
     score_rels = self.truncated_rels(scores)
     score_rels[score_rels != 0] = 1
     gold_rels = np.array(gold.labels, dtype=int)
-    score = recall_score(gold_rels, score_rels, zero_division=1)
+    score = recall_score(gold_rels, score_rels, zero_division=0)
Yeah, that makes sense. @daemon, before we merge this, I just want to double-check with you if there was a reason for zero_division=1
No objection.
When evaluating MS-MARCO and other datasets, I think it is better to score all samples, regardless of whether a sample has labels. A sample can lack labels if, for instance, the first round of ranking failed to retrieve any relevant passage. Setting recall's zero_division to 0 instead of 1 scores this case correctly (and is consistent with msmarco_eval.py).
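As a minimal sketch of what the flag changes (assuming scikit-learn's recall_score), consider a sample with no relevant labels at all:

```python
import numpy as np
from sklearn.metrics import recall_score

# A sample with no relevant labels: first-stage retrieval found nothing relevant.
gold_rels = np.array([0, 0, 0], dtype=int)
score_rels = np.array([0, 0, 0], dtype=int)

# Recall is TP / (TP + FN); with no positive labels the denominator is zero,
# so the zero_division argument alone decides this sample's score.
inflated = recall_score(gold_rels, score_rels, zero_division=1)  # counts as 1.0
strict = recall_score(gold_rels, score_rels, zero_division=0)    # counts as 0.0
```

With zero_division=1, every unlabeled sample contributes a perfect recall and inflates the average; with zero_division=0 it counts against the system, matching msmarco_eval.py's behavior.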
This PR also fixes a bug in the T5 decoding: we want the output to be the true/false label distribution at the first decoder step, not the second, which is what the code was previously doing. Surprisingly, the old behavior was not all that bad in terms of effectiveness. With this fix, T5 outperforms all other models in all regards on CovidQA.
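To illustrate the fix with a toy example (pure NumPy; the shapes and token ids here are hypothetical, not pygaggle's actual API): the relevance score should be read from the logits of the first decoder step.

```python
import numpy as np

def first_step_score(logits, true_id, false_id):
    """Log-probability of the 'true' token, computed from the FIRST decoder
    step's logits (shape: batch x decoder_steps x vocab). Reading step 1
    instead of step 0 was the bug being fixed."""
    pair = logits[:, 0, [false_id, true_id]]       # step 0, not step 1
    pair = pair - pair.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_z = np.log(np.exp(pair).sum(axis=1, keepdims=True))
    return (pair - log_z)[:, 1]

# Toy logits: one sample, two decoder steps, vocab of 10; step 0 strongly
# favors the hypothetical 'true' token (id 3) over 'false' (id 7).
logits = np.zeros((1, 2, 10))
logits[0, 0, 3] = 10.0
score = first_step_score(logits, true_id=3, false_id=7)  # close to 0 (log of ~1)
```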
natural question:
precision@1 0.27419354838709675
recall@3 0.43502304147465437
recall@50 0.9305683563748081
recall@1000 1.0
mrr 0.4224002621206025
mrr@10 0.4097638248847927
keyword question:
precision@1 0.24193548387096775
recall@3 0.36378648233486943
recall@50 0.9230414746543779
recall@1000 1.0
mrr 0.38249784501639117
mrr@10 0.3701228878648234
Better performance than the old version, but surprisingly close, as I said earlier!