While generating datasets and "add a link" models for the 5th round of wikis, the training pipeline worked well for all models except the Tibetan Wikipedia (bowiki). As seen in the screenshot below, the pipeline kept failing at the step that runs the backtesting evaluation:
Checked the files to confirm, and indeed the bowiki.backtest.eval.csv file is missing, as shown in the screenshot below:
Talked to @MGerlach and he said:
This means that tot_ret=0, which leads to a division by zero when calculating the precision. The precision is defined as TruePositives/(TruePositives+FalsePositives), i.e. of all the links suggested by the algorithm (TP+FP), how many were indeed correct links (TP). In this case, the algorithm did not suggest any links (TP+FP=0), so the precision is not defined. We should add a check that the denominators in the precision and recall are larger than 0, and otherwise return NaN.
I believe this happens because the train and test data in bowiki are very small (only 27 sentences each). While wikistats says there are ~12k articles in bowiki, most of them seem to have no links, which is actually quite interesting (here you can check random articles in bowiki). We discard articles without links for training or testing because we need actual links to train and test.
So, overall, we just have to create a patch that adds something like:
```python
from math import nan

if tot_ret > 0:
    micro_precision = tot_TP / tot_ret
else:
    micro_precision = nan

if tot_rel > 0:
    micro_recall = tot_TP / tot_rel
else:
    micro_recall = nan
```
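For reference, the guard can also be packaged as a small helper. This is a minimal sketch, not the actual patch: the function name `safe_micro_scores` is hypothetical, and only the counter names (`tot_TP`, `tot_ret`, `tot_rel`) mirror the snippet above.

```python
from math import nan, isnan

def safe_micro_scores(tot_TP, tot_ret, tot_rel):
    """Return (micro_precision, micro_recall), using NaN when a
    denominator is zero, i.e. when the metric is undefined."""
    micro_precision = tot_TP / tot_ret if tot_ret > 0 else nan
    micro_recall = tot_TP / tot_rel if tot_rel > 0 else nan
    return micro_precision, micro_recall

# Normal case: 8 true positives out of 10 suggested links, 16 relevant links.
print(safe_micro_scores(8, 10, 16))   # (0.8, 0.5)

# bowiki-like case: the algorithm suggested no links at all (tot_ret=0),
# so precision is undefined (NaN) while recall is simply 0.
p, r = safe_micro_scores(0, 0, 16)
print(isnan(p), r)                    # True 0.0
```

Returning NaN (rather than 0) keeps the undefined case distinguishable from a genuinely zero score in downstream aggregation.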