Description
I tried replicating the CovidQA experiments on Colab (Tesla K80), Compute Canada (Tesla V100), and locally (GTX 1650) on #134, using the index from https://www.dropbox.com/s/z8s0urul6l4zig2/lucene-index-cord19-paragraph-2020-05-12.tar.gz?dl=1
Re-Ranking with Random (I got the same results for both commands)
```
python -um pygaggle.run.evaluate_kaggle_highlighter --method random --dataset data/kaggle-lit-review-0.2.json --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```
```
precision@1   0.0
recall@3      0.0199546485260771
recall@50     0.3247165532879819
recall@1000   1.0
mrr           0.03999734528458418
mrr@10        0.020888672929489253
```
```
python -um pygaggle.run.evaluate_kaggle_highlighter --method random --split kq --dataset data/kaggle-lit-review-0.2.json --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```
```
precision@1   0.0
recall@3      0.0199546485260771
recall@50     0.3247165532879819
recall@1000   1.0
mrr           0.03999734528458418
mrr@10        0.020888672929489253
```
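For context, a random baseline assigns each candidate a uniform score, which is consistent with the numbers above: recall@1000 is 1.0 (every relevant paragraph appears somewhere in the full list) while the early-rank metrics sit near zero. A minimal sketch of such a baseline (an illustration only, not pygaggle's actual random reranker):

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class ScoredText:
    text: str
    score: float

def random_rerank(query: str, texts: List[str], seed: int = 42) -> List[ScoredText]:
    """Assign each candidate a uniform random score (no relevance signal),
    then sort descending, as a sanity-check baseline."""
    rng = random.Random(seed)
    scored = [ScoredText(t, rng.random()) for t in texts]
    return sorted(scored, key=lambda s: s.score, reverse=True)

ranked = random_rerank("incubation period", ["doc a", "doc b", "doc c"])
```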
Re-Ranking with BM25
I got the following error on all three machines:
```
  File "/pygaggle/pygaggle/model/evaluate.py", line 161, in evaluate
    scores = [x.score for x in self.reranker.rerank(example.query,
  File "/pygaggle/pygaggle/rerank/bm25.py", line 46, in rerank
    idfs = {w:
  File "/pygaggle/pygaggle/rerank/bm25.py", line 48, in <dictcomp>
    text.metadata['docid'], w) for w in tf}
KeyError: 'docid'
```
I replaced `text.metadata['docid']` with `text.title['docid']` in `/pygaggle/pygaggle/rerank/bm25.py` and got the same results for the two commands:
```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 --dataset data/kaggle-lit-review-0.2.json --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```
```
precision@1   0.15384615384615385
recall@3      0.21865889212827985
recall@50     0.7208778749595076
recall@1000   0.7582928409459021
mrr           0.25329970378011524
mrr@10        0.23344131303314977
```
```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 --split kq --dataset data/kaggle-lit-review-0.2.json --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```
```
precision@1   0.15384615384615385
recall@3      0.21865889212827985
recall@50     0.7208778749595076
recall@1000   0.7582928409459021
mrr           0.25441237140238665
mrr@10        0.23493413238311195
```
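A slightly more defensive version of that workaround might fall back to the title only when the `docid` key is actually missing, instead of raising `KeyError`. This is a sketch under assumptions: `Text` below is a minimal stand-in for pygaggle's class, and `get_docid` is a hypothetical helper, not code from the repository:

```python
class Text:
    """Minimal stand-in for pygaggle's Text (hypothetical, for illustration)."""
    def __init__(self, metadata=None, title=None):
        self.metadata = metadata or {}
        self.title = title

def get_docid(text):
    """Return a document id, preferring metadata['docid'] and falling
    back to the title when the key is absent, rather than raising."""
    docid = text.metadata.get('docid')
    if docid is None:
        docid = text.title
    if docid is None:
        raise KeyError('docid')
    return docid
```

Whether the title is actually a valid lookup key for the index is exactly what makes the identical results above worth double-checking.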
Re-Ranking with monoT5
I tried with Python 3.6.9, 3.7.3, and 3.8 with the corresponding requirements and got the following error in all cases (besides changing the torch version, I have not tried looking into this error):
```
  File "/pygaggle/pygaggle/run/evaluate_kaggle_highlighter.py", line 193, in <module>
    main()
  File "/pygaggle/pygaggle/run/evaluate_kaggle_highlighter.py", line 182, in main
    reranker = construct_map[options.method](options)
  File "/pygaggle/pygaggle/run/evaluate_kaggle_highlighter.py", line 81, in construct_t5
    model = loader.load().to(device).eval()
  File "/pygaggle/pygaggle/model/serialize.py", line 76, in load
    return self._fix_t5_model(T5ForConditionalGeneration.from_pretrained(
  File "/pygaggle/pygaggle/model/serialize.py", line 34, in _fix_t5_model
    model.decoder.block[0].layer[1].EncDecAttention.\
  File ".../.../torch/nn/modules/module.py", line 778, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'T5Attention' object has no attribute 'relative_attention_bias'
```