
Fix ZeroDivisionError when running backtesting evaluation
Closed, Resolved (Public)

Description

While generating datasets and "add a link" models for the 5th round of wikis, the training pipeline worked for every wiki except the Tibetan Wikipedia (bowiki). As seen in the screenshot below, the pipeline kept failing at the point where it runs the backtesting evaluation:

add-a-link backtesting evaluation error.png (741×1 px, 167 KB)

I checked the output files and confirmed that bowiki.backtest.eval.csv is indeed missing, as shown in the screenshot below:

add-a-link bowiki backtesting evaluation csv missing.png (741×1 px, 186 KB)

Talked to @MGerlach and he said:
This means that tot_ret=0, which leads to a division by zero when calculating the precision. Precision is defined as TruePositives/(TruePositives+FalsePositives), i.e. of all the links suggested by the algorithm (TP+FP), how many were indeed correct links (TP). Here the algorithm did not suggest any links (TP+FP=0), so the precision is undefined. We should add a check that the denominators in the precision and recall are larger than 0, and otherwise return NaN.
I believe this happens because the train and test data in bowiki are very small (only 27 sentences each). While wikistats says there are ~12k articles in bowiki, most of them seem to be without any links, which is actually quite interesting (here you can check random articles in bowiki). We discard articles without links for training or testing because we need actual links to train and test.

So, overall, we just have to create a patch where we add something like:

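from math import nan

# Precision is undefined when no links were suggested (tot_ret == 0).
if tot_ret > 0:
    micro_precision = tot_TP / tot_ret
else:
    micro_precision = nan
# Recall gets the same guard on its denominator (tot_rel).
if tot_rel > 0:
    micro_recall = tot_TP / tot_rel
else:
    micro_recall = nan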
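
The same check could also be factored into a small helper so that precision and recall share one guard. This is only a minimal sketch, not the merged patch: safe_ratio and micro_scores are hypothetical names, the counts are assumed to be plain integers accumulated over the test sentences, and tot_rel is assumed to count all links present in the test data (TP+FN).

```python
from math import isnan, nan


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is not positive."""
    return numerator / denominator if denominator > 0 else nan


def micro_scores(tot_TP, tot_ret, tot_rel):
    """Hypothetical wrapper: micro precision and recall from aggregate counts.

    tot_TP  -- correctly suggested links (true positives)
    tot_ret -- all links suggested by the model (TP + FP)
    tot_rel -- assumed: all links present in the test data (TP + FN)
    """
    return safe_ratio(tot_TP, tot_ret), safe_ratio(tot_TP, tot_rel)


# The bowiki case: the model suggested no links, so precision is undefined.
# The value tot_rel=5 is made up for illustration.
precision, recall = micro_scores(tot_TP=0, tot_ret=0, tot_rel=5)
print(isnan(precision), recall)
```

Because NaN never compares equal to anything (including itself), downstream code that inspects these scores would need math.isnan rather than an equality check.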

Event Timeline

Change 809527 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[research/mwaddlink@main] Fix ZeroDivisionError when running backtesting evaluation

https://gerrit.wikimedia.org/r/809527

kostajh subscribed.

Not sure if any QA is needed, or if we should just mark this as resolved. @kevinbazira resolve it if you think it's OK :)

Change 809527 merged by jenkins-bot:

[research/mwaddlink@main] Fix ZeroDivisionError when running backtesting evaluation

https://gerrit.wikimedia.org/r/809527