In T336927, 18 rounds of add-a-link models were trained. For the pipelines that succeeded, the models were published here:
https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/
Below is a list of wikis whose models were not published:
Wikis | Reason |
jawiki, aswiki | T304548#7937512 |
bowiki | T304549#8060880 |
dzwiki | T304551#8412493 |
diqwiki, dvwiki | T304551#8417373 |
fywiki | T308133#8459395 |
ganwiki | T308133#8469595 |
hywwiki | T308134#8548734 |
krcwiki | T308135#8632750 |
T308136#8648765 | |
mnwwiki, mywiki | T308137#8690680 |
piwiki | T308138#8708597 |
zhwiki | T308139#8720236 |
wuuwiki, zh_classicalwiki, zh_yuewiki | T308139#8728522 |
shnwiki | T308141#8778455 |
snwiki, szywiki | T308142#8804657 |
tiwiki, urwiki | T308143#8827377 |
The goal is to improve the link-recommendation algorithm in order to support the languages listed above.
*lrcwiki was closed (T330616). Thus, there is no need to train the add-a-link model for that language.
Notes:
In T304548#7937512, jawiki did not pass the backtesting evaluation. The suggested next step is to inspect the model manually, either with users who have experience with this language or with the help of Google Translate, while the link-recommendation algorithm is iteratively improved until the model passes the backtesting evaluation.
Models to be inspected and their datasets can be found on the stat1008 machine:
WIKI_ID=jawiki
cd "/home/kevinbazira/mwaddlink/data/$WIKI_ID"
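To check several of the wikis listed above in one go, the same lookup could be wrapped in a small loop. This is only a sketch: `check_wiki_dirs` is a hypothetical helper name, and it assumes that the other wikis follow the same per-wiki directory layout as jawiki, which this task does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each wiki has a data directory
# under the given base path. The per-wiki layout is assumed, not confirmed.
check_wiki_dirs() {
  local base_dir="$1"
  shift
  local wiki_id
  for wiki_id in "$@"; do
    if [ -d "$base_dir/$wiki_id" ]; then
      echo "$wiki_id: found"
    else
      echo "$wiki_id: missing"
    fi
  done
}
```

For example, on stat1008 one might run `check_wiki_dirs /home/kevinbazira/mwaddlink/data jawiki aswiki bowiki` and then inspect only the directories reported as found.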