Fix Zero div and T5 decoding #16
Conversation
@@ -30,7 +30,7 @@ def rerank(self, query: Query, texts: List[Text]) -> List[Text]:
     attn_mask = batch.output['attention_mask']
     _, batch_scores = greedy_decode(self.model,
                                     input_ids.to(self.device),
-                                    length=2,
+                                    length=1,
I can't remember exactly why, but @daemon had a good reason to use length=2, so I will let him comment here
The reason was to reproduce what the TensorFlow implementation had been doing. If using a single step of decoding improves quality, then that's all the better.
Adding a comment would be a good idea...
Sure, that belongs in covidex.
@@ -75,7 +75,7 @@ def accumulate(self, scores: List[float], gold: RelevanceExample):
     score_rels = self.truncated_rels(scores)
     score_rels[score_rels != 0] = 1
     gold_rels = np.array(gold.labels, dtype=int)
-    score = recall_score(gold_rels, score_rels, zero_division=1)
+    score = recall_score(gold_rels, score_rels, zero_division=0)
Yeah, that makes sense. @daemon, before we merge this, I just want to double-check with you if there was a reason for zero_division=1
No objection.
When evaluating MS-MARCO and other datasets, I think it is better to score all samples, regardless of whether a sample has labels. A sample can lack labels if, for instance, the first round of ranking failed to retrieve any relevant passage. Setting recall's zero_division to 0 instead of 1 scores this case correctly (and is consistent with msmarco_eval.py).
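As a minimal sketch of what the flag changes (assuming scikit-learn's recall_score), consider a sample with no relevant labels at all:

```python
import numpy as np
from sklearn.metrics import recall_score

# A sample with no relevant labels: first-stage retrieval found nothing relevant.
gold_rels = np.array([0, 0, 0], dtype=int)
score_rels = np.array([0, 0, 0], dtype=int)

# Recall is TP / (TP + FN); with no positive labels the denominator is zero,
# so the zero_division argument alone decides this sample's score.
inflated = recall_score(gold_rels, score_rels, zero_division=1)  # counts as 1.0
strict = recall_score(gold_rels, score_rels, zero_division=0)    # counts as 0.0
```

With zero_division=1, every unlabeled sample contributes a perfect recall and inflates the average; with zero_division=0 it counts against the system, matching msmarco_eval.py's behavior.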
This PR also fixes a bug in the T5 decoding: we want the output to be the true/false label distribution at the first decoder step, not the second, which is what the code was previously doing. Surprisingly, the old behavior was not all that bad in terms of effectiveness. With this fix, T5 outperforms all other models in all regards on CovidQA.
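To illustrate the fix with a toy example (pure NumPy; the shapes and token ids here are hypothetical, not pygaggle's actual API): the relevance score should be read from the logits of the first decoder step.

```python
import numpy as np

def first_step_score(logits, true_id, false_id):
    """Log-probability of the 'true' token, computed from the FIRST decoder
    step's logits (shape: batch x decoder_steps x vocab). Reading step 1
    instead of step 0 was the bug being fixed."""
    pair = logits[:, 0, [false_id, true_id]]       # step 0, not step 1
    pair = pair - pair.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_z = np.log(np.exp(pair).sum(axis=1, keepdims=True))
    return (pair - log_z)[:, 1]

# Toy logits: one sample, two decoder steps, vocab of 10; step 0 strongly
# favors the hypothetical 'true' token (id 3) over 'false' (id 7).
logits = np.zeros((1, 2, 10))
logits[0, 0, 3] = 10.0
score = first_step_score(logits, true_id=3, false_id=7)  # close to 0 (log of ~1)
```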
natural question:
precision@1 0.27419354838709675
recall@3 0.43502304147465437
recall@50 0.9305683563748081
recall@1000 1.0
mrr 0.4224002621206025
mrr@10 0.4097638248847927
keyword question:
precision@1 0.24193548387096775
recall@3 0.36378648233486943
recall@50 0.9230414746543779
recall@1000 1.0
mrr 0.38249784501639117
mrr@10 0.3701228878648234
Better performance than the old version, but surprisingly close, as I said earlier!