working on improving the prompt for generating simple summaries to make sure the language of the output is the same as the input. latest results for sample of 100 articles in this spreadsheet

Fri, Dec 6, 11:20 AM · Research, OKR-Work

MGerlach closed T369288: [WE.3.1.3] Building a model for content simplification (Q2), a subtask of T342614: Models for text summarization using LLMs, as Resolved.

Fri, Dec 6, 11:11 AM · Epic, address-knowledge-gaps, Research

MGerlach closed T369288: [WE.3.1.3] Building a model for content simplification (Q2) as Resolved.

Closing this task as work is completed.

Fri, Dec 6, 11:11 AM · Research (FY2024-25-Research-October-December), OKR-Work

Fri, Nov 29

MGerlach added a comment to T378490: Create Call for Contributions (CfP) for research track.

weekly update:

first full revision of CfP for this year. Will share with co-chair/organizers before proceeding
starting to put together a plan/templates for advertising the CfP after publication

Fri, Nov 29, 10:53 AM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

After successfully implementing the Aya-expanse-32b model, I am generating simple summaries for the set of sample articles from Web experiments model the model on our own GPUs on ML-lab servers (related to T380643); code: https://gitlab.wikimedia.org/repos/research/simple-summaries/-/blob/main/simple-summary_aya-expanse_experiment01.ipynb
Looking at initial results (see spreadsheet). I have been detecting two issues: i) simple summaries are not much simpler in terms of readability score; ii) output sometimes (~20%) in different language. Testing different ways to adapt the prompt to mitigate the issue.

Fri, Nov 29, 10:51 AM · Research, OKR-Work

MGerlach updated the task description for T378490: Create Call for Contributions (CfP) for research track.

Fri, Nov 29, 10:51 AM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

After successfully implementing the Aya-expanse-32b model, I am generating simple summaries for set of sample articles from Web experiments model the model on our own GPUs on ML-lab servers
- Code: https://gitlab.wikimedia.org/repos/research/simple-summaries/-/blob/main/simple-summary_aya-expanse_experiment01.ipynb
- Results: https://docs.google.com/spreadsheets/d/1kkemzryZGy3mrz5i0_zAhgS8Z6aNZQOjy88F-6MCErA/edit
Drafted documentation of the hypothesis work and added to the meta-page https://meta.wikimedia.org/wiki/Research:Develop_a_model_for_text_simplification_to_improve_readability_of_Wikipedia_articles/FY24-25_WE.3.1.3_content_simplification

Fri, Nov 29, 10:47 AM · Research (FY2024-25-Research-October-December), OKR-Work

Fri, Nov 22

MGerlach added a comment to T378490: Create Call for Contributions (CfP) for research track.

weekly update:

ongoing coordination about the PC co-chair
updated submission templates and repo https://gitlab.wikimedia.org/repos/research/wikiworkshop-templates
Kinneret set up an OpenReview instance for submission https://openreview.net/forum?id=4MwSMQxue6
revising the CfP (topics, troubleshooting for OpenReview sign-up)

Fri, Nov 22, 4:36 PM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach updated the task description for T378490: Create Call for Contributions (CfP) for research track.

Fri, Nov 22, 4:34 PM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

Coordinated with Web Team about filtering low-quality simple summaries for experiments. They applied one of the proposed guard-rail metrics to ensure factual consistency (meaning preservation) between the original article and the simple summary.

Fri, Nov 22, 4:32 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

We switched the model from Aya-23 to the next version Aya-expanse. https://phabricator.wikimedia.org/T379052#10314444
I implemented the Aya-expanse models in our internal infrastructure using the new GPUs on the ml-lab servers. We run the model to generate simple summaries using the Aya-expanse-8b or Aya-expanse-32b model.
- I implemented some quantization techniques so that the models would fit into memory (the larger version does not work out of the box) https://www.deeplearning.ai/short-courses/quantization-fundamentals-with-hugging-face/
- Specifically, we can use the model with different datatypes. For example, using float16 instead of the default float32 reduces the memory footprint of the model in half. In turn, this also reduces time needed for inference. At the same time, it is generally believed that this comes with little to no decrease in model performance.
- For example, the aya-expanse-32b model could not be loaded into memory of the GPU with the default datatype. Instead, using float16, the model’s memory footprint is 60.16GB and thus fits into memory of the GPU. Similarly, for the smaller Aya-expanse-8b the memory footprint decreases from 29.91GB to 14.95GB requiring only half the time to run a single example query (8s vs 3s).
- We can implement additional quantization techniques to further improve the memory footprint and inference time using, for example, the quanto library https://huggingface.co/blog/quanto-introduction This will require more thorough experiments to not only understand different options and potential dependencies that need to be resolved in LiftWing, but also make sure that model quality is preserved. I believe that this is beyond the scope of the current task and should be scoped as a separate task/hypothesis, if we have enough evidence that the model is useful in practice.

Fri, Nov 22, 4:26 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach updated the task description for T369288: [WE.3.1.3] Building a model for content simplification (Q2).

Fri, Nov 22, 4:21 PM · Research (FY2024-25-Research-October-December), OKR-Work

Thu, Nov 21

MGerlach added a comment to T378420: Request: Research viability of us Reading List data for recommendations.

@Jdlrobson I found the following three tables:

Thu, Nov 21, 6:09 PM · Research

Sat, Nov 16

MGerlach added a comment to T378490: Create Call for Contributions (CfP) for research track.

weekly update:

no updates because I was attending the team offsite during this week

Sat, Nov 16, 4:01 PM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

no updates because I was attending the team offsite during this week

Sat, Nov 16, 4:00 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

no updates because I was attending the team offsite during this week

Sat, Nov 16, 4:00 PM · Research (FY2024-25-Research-October-December), OKR-Work

Fri, Nov 8

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

Started work with the ML Team on a dedicated subtask to test-deploy Aya models on LiftWing T379052
While we have successfully tested the smaller Aya23-8b model, we have not been able to run the larger Aya23-35b model as it requires too much memory than is available in the available GPUs. We are thus testing the next generation of the Aya models (Aya-expanse) for test-deployment because they have a smaller memory footprint and thus might be easier to run in our infrastructure and, at the same time, are reportedly strictly better than the previous Aya-23 (so we would probably switch to the newer version in future experiments) while also supporting the same set of 23 languages.
We ran first successful experiments with the larger Aya-expanse-32b model in the ML-Lab machines, where we were able to load and run inference.

Fri, Nov 8, 3:33 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach added a comment to T378490: Create Call for Contributions (CfP) for research track.

weekly update:

Coordinated timeline with Kinneret (make sure it aligns with Research Fund and other Wikimedia Research events) and revised slightly to accomodate that
Drafted first version of updated CfP
Starting revising submission templates
Coordinated with Kinneret about creating an OpenReview instance in order to get a submission link that we can mention in the CfP
Coordinated with Kinneret about updates to the Privacy Policy
Main blocker for proceeding now with the CfP is that I am waiting for confirmation of the co-PC chair to finalize timeline and draft for the CfP

Fri, Nov 8, 2:52 PM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly updates:

shared revised set of guardrail metrics for simple summaries with Web Team (googlesheet).
- most of the summaries are substantially simpler than the original and have relatively few grammatical issues.
- most importantly, the meaning preservation metric (summaC) seems very useful to filter simple summaries that are not consistent with the original (e.g. error messages or text that was not contained in the original article). the simple summaries with very low scores should be discarded for the first set of experiments as lower recall is not an issue.
these guardrail metrics thus offer an option to filter out potentially low-quality simple summaries.

Fri, Nov 8, 2:47 PM · Research, OKR-Work

MGerlach updated the task description for T379052: Test the feasibility of deployment of Aya-23 model in LiftWing.

Fri, Nov 8, 2:34 PM · Patch-For-Review, Machine-Learning-Team

MGerlach added a comment to T379052: Test the feasibility of deployment of Aya-23 model in LiftWing.

@isarantopoulos Thanks for the updates

Fri, Nov 8, 2:29 PM · Patch-For-Review, Machine-Learning-Team

Nov 5 2024

MGerlach added a comment to T379052: Test the feasibility of deployment of Aya-23 model in LiftWing.

Update: @isarantopoulos did first experiments with the Aya-23-35B model. It does not work out of the box. The raw version of the model is 65GB on disk and does not fit into memory. We will explore some potential workarounds using a quantized model, e.g. using in8 datatype, to reduce footprint such that its compatible with our infrastructure.

Nov 5 2024, 10:45 AM · Patch-For-Review, Machine-Learning-Team

MGerlach added a comment to T379052: Test the feasibility of deployment of Aya-23 model in LiftWing.

Update: Aya-23-8B model runs successfully in LiftWing. thanks @isarantopoulos

Nov 5 2024, 10:35 AM · Patch-For-Review, Machine-Learning-Team

MGerlach created T379052: Test the feasibility of deployment of Aya-23 model in LiftWing.

Nov 5 2024, 10:24 AM · Patch-For-Review, Machine-Learning-Team

Nov 1 2024

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

Putting together update for public documentation of the model
Set up meeting with ML Team next week to discuss test-deployment of aya23-35b model (used in the summaries experiment by Web Team) or potential alternative candidates

Nov 1 2024, 11:21 AM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

Web Team generated simple summaries for a selected list of ~8K articles T375364 using the aya23-35b model from the Cohere API
I am trying to evaluate the quality of the simple summaries by calculating 3 proxy metrics for simplicity, fluency, and meaning preservation (googledocs sheet). The aim is to identify low-quality summaries that should be filtered. Qualitatively inspecting the score for meaning preservation indicates that we can identify cases where the summary contains information that is not mentioned in the original article (negative scores or low scores close to 0). Plannning to inspect more samples if this approach for filtering makes sense.

Nov 1 2024, 11:02 AM · Research, OKR-Work

MGerlach updated subscribers of T378490: Create Call for Contributions (CfP) for research track.

weekly update:

needed to get organized about todos for the Research track
for CfP, the first step was to draft a rough timeline of dates (submission deadline, review period, etc). will consult with co-organizer and organizers from research fund etc.
next step: revising the CfP from last year.

Nov 1 2024, 10:32 AM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

Oct 29 2024

MGerlach created T378490: Create Call for Contributions (CfP) for research track.

Oct 29 2024, 1:10 PM · Research-outreach, Research (FY2024-25-Research-October-December), Research-foundational

MGerlach created T378485: Organize the Research track - Wiki Workshop 2025.

Oct 29 2024, 1:00 PM · Essential-Work, Research-foundational, Research

Oct 28 2024

MGerlach added a comment to T219903: Keep research.wikimedia.org landing page updated.

I would like to ask for adding 2 new papers to the landing page.

Oct 28 2024, 8:56 AM · Patch-For-Review, periodic-update, Research

Oct 25 2024

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

no update this week (no immediate asks for support this week)

Oct 25 2024, 3:09 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

Ongoing work by ML Team to test-deploy the Aya-23 35B version of the model in LiftWing. Due to the size of the model, this requires some workarounds with datatypes which affects package dependencies.
Selected and implemented three interpretable metrics to automatically check the quality of the automatically generated simple summaries. This will serve as a tool to make informed decisions about whether the simple summaries meet some minimum quality requirements before considering use in practice or whether they should be discarded . These metrics capture: i) simplicity (is the model output simpler to read than the original?); ii) fluency (is the model output grammatically correct?); iii) meaning preservation (is the model output factually consistent with the original text?).
Example notebook: https://gitlab.wikimedia.org/repos/research/text-simplification/-/blob/main/evaluate_simple-summaries.ipynb

Oct 25 2024, 3:08 PM · Research (FY2024-25-Research-October-December), OKR-Work

Oct 24 2024

MGerlach added a comment to T276438: Establish processes for running the dataset pipeline.

In T276438#10247742, @Michael wrote:

Growth is working on surfacing link-recommendations in new ways (T362584), and so I'm trying to get a grasp on how this service is evolving. Where can I get insights into the latest datasets?

For now, I've only found the 2021-06 dataset for eswiki from the link above: https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/eswiki/
But that's in a folder called "one-off".
Is that the latest dataset used?

Oct 24 2024, 12:39 PM · Growth-Team, Machine-Learning-Team, Growth-Scaling, Add-Link

Oct 22 2024

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

@Prototyperspective thanks for reaching out.

In T369288#10166287, @Prototyperspective wrote:

How does this relate to Simple Wiki – could it be used to create article drafts the user can then improve (maybe even dynamically generated simple versions that are in sync with changes to the sources article)? I think at this point one of English Wikipedia's greatest challenges is that articles and sections are just sooo long – most people don't read all of it but removing things is usually also not due.

In principle, yes. One model generates simplified versions of articles. It is trained on pairs of the same article existing in English Wikipedia and Simple English Wikipedia (dataset). One could use the model to then generate simplified versions of articles that do not yet exist in Simple English Wikipedia but are already in English Wikipedia (of which there are many). However, it currently considers only the plain text of the article and will thus not include any links or references (which would be crucial for a draft of an actual article). Please note, that this is exploratory research to assess the feasibility of such a model.

Oct 22 2024, 7:58 AM · Research (FY2024-25-Research-October-December), OKR-Work

Oct 18 2024

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

I have been mostly busy this week with catching up on what has happened over the past 2 months while I was out.
I have been coordinating the work with ML team to deploy the larger 35B version of the Aya-23 model in a test-instance of LiftWing (the smaller 8B version was already tested successfully and thus constitutes a valid backup solution in case the former does not work)

Oct 18 2024, 4:06 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

adapted to title to reflect that work is continuing in Q2
spent most of my time to catch up with what are current needs from corresponding hypothesis owners
the Web Team started small user tests using the model for generating simple summaries of sections (documentation) I had prepared leaving for sabbatical (T374638). Feedback from the small sample of users is very positive (report)
I have been syncing with Jan about next experiments to generate simple summaries for the lead sections of 10K articles for use in larger experiments in the browser extension (T375364)
We also started discussions about usage of https://vector-search.wmcloud.org/ endpoint in recommendation experiments (T374669). These are currently small scale-experiments but there are some questions about how this could be scaled when potentially using it in larger experiments.

Oct 18 2024, 3:48 PM · Research, OKR-Work

MGerlach renamed T369292: Research support for hypothesis in KR WE.3.1 (Q2) from Research support for hypothesis in KR WE.3.1 (Q1) to Research support for hypothesis in KR WE.3.1 (Q2).

Oct 18 2024, 3:44 PM · Research, OKR-Work

MGerlach updated the task description for T361926: Improve training and inference pipeline for multilingual link recommendation model.

Oct 18 2024, 3:04 PM · Research, Essential-Work

MGerlach claimed T361926: Improve training and inference pipeline for multilingual link recommendation model.

training pipeline with airflow has been merged (MR)
will run some tests of the pipeline in the next week(s)
inference pipeline will need a separate task as it requires some additional discussion on how to best approach that. see some context in Fabian's recent presentation (slidedeck)

Oct 18 2024, 3:04 PM · Research, Essential-Work

Aug 16 2024

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update

no updates this week as I was attending ACL 2024 conference.

Aug 16 2024, 8:27 AM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update

no updates this week as I was attending ACL 2024 conference.

Aug 16 2024, 8:26 AM · Research (FY2024-25-Research-October-December), OKR-Work

Aug 8 2024

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

no updates this week as there were no requests for additional support so far

Aug 8 2024, 3:53 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

Updated the project page on meta with current status: https://meta.wikimedia.org/wiki/Research:Develop_a_model_for_text_simplification_to_improve_readability_of_Wikipedia_articles/FY24-25_WE.3.1.3_content_simplification
Identified informative/interpretable metrics to evaluate performance of simplification models. For the manual evaluation, the most common approach is to judge 3 dimensions: simplicity (is the text simpler), fluency (is the text grammatical), and adequacy (does the text preserve the meaning). Looking into the literature, we can define automated metrics which approximate judgements along these dimensions.
- Simplicity: Measure the change in readability score using our readability scoring model for Wikipedia articles.
- Fluency: Count the number of grammatical errors using LanguageTool
- Adequacy: Measure the factual consistency between the original and simplified text to detect, e.g., “model hallucination”. This is a very active field of research and several methods have been proposed recently such as FactCC (based on a trained classification model), SummaC (based on textual entailment), or QuestEval (based on question generation and answering).
The advantage of these metrics is that they are more interpretable and that they dont require reference samples from a ground truth dataset. This will hopefully make it easier to obtain confidence about whether models are “good enough” for potential use in production.
Started to implement metrics so we can automatically measure model performance.
Next step: Finalize implementation of metrics for evaluating model performance and run on representative sample with existing model.

Aug 8 2024, 3:50 PM · Research (FY2024-25-Research-October-December), OKR-Work

Aug 2 2024

MGerlach claimed T364486: Update the Research Handbook to clarify how to handle news coverage of our research.

Update: I added a draft section about this to the handbook https://office.wikimedia.org/wiki/Research/Handbook/Communication#Communicating_with_public_media

Aug 2 2024, 3:37 PM · Research-management, Research-outreach, Research

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

I put together documentation for article recommendations with different tools from Research for experiments WE.3.1.1 (doc)
Shared documentation with hypothesis owner for feedback.

Aug 2 2024, 3:16 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

Built two working model prototypes for content simplification/summarization
- 1) Simplification: Generate a simplified version of a section/paragraph using simpler language.
- 2) Section-gist: Generate a plain language summary of a section (i.e. combining simplification and summarization)
Put together detailed documentation about the two models with examples and tutorial notebooks on how the models can be run (doc). I will add these updates to the project-meta page too in the next week or so.
Exploring alternatives for automatic evaluation of models to make it easier to iterate through different model variants without the need for manual evaluation.
Experimenting with the test-deployment in the staging instance of LiftWing (available thanks to the ML-Team) of the Aya-23 model (Aya-23-8B) which is used for the section gist model. The model works and returns requests in a reasonable time. Though the model quality seems substantially lower than the larger model I used in the experiments using the external API. ML Team is planning to test-deploy the larger model version (Aya-23-35B) in the next weeks.

Aug 2 2024, 3:15 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach updated the task description for T369288: [WE.3.1.3] Building a model for content simplification (Q2).

Aug 2 2024, 3:13 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach closed T364852: Feedback on Edge Uniques draft design document as Resolved.

Update: Fabian, Isaac, and I coordinated and left detailed feedback and comments in the design doc. Resolving the task as the request in the task description is completed. We will of course continue to engage in any follow-up discussions in the doc.

Aug 2 2024, 2:28 PM · OKR-Work, Research

Jul 29 2024

MGerlach added a comment to T219903: Keep research.wikimedia.org landing page updated.

For the next round of updates, could you add the following items:

Add to Publications page and Knowledge Gaps publications:
- Mykola Trokhymovych, Indira Sen, Martin Gerlach. 2024. An Open Multilingual System for Scoring Readability of Wikipedia. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL '24).
Also add blurb to Knowledge Gaps updates:
- Date: Aug 2024
- Title: A multilingual model for measuring readability
- Blurb: We published a new paper at ACL ‘24 where we develop a multilingual model to score the readability of Wikipedia articles across languages.
- Link: https://arxiv.org/abs/2406.01835

Jul 29 2024, 12:07 PM · Patch-For-Review, periodic-update, Research

Jul 26 2024

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update

Shared results on section-gists of Wikipedia articles (T369288#10018291) with Web team as one potential approach for experiments on summarization and simplification for readers.

Jul 26 2024, 3:25 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly update:

Ran small-scale experiments to automatically generate section-gists (i.e. plain-language summaries of a section) for Wikipedia articles using different models.
- Test-dataset with original and simplified text from only the lead section of 10 articles in English, German, Portuguese (see spreadsheet)
- Sample-dataset of several sections from the same article without the reference-simplification (see spreadsheet)
We are able to automatically generate section-gists (plain-language summaries of a section) of Wikipedia articles across languages using the LLM Aya-23. Based on qualitative evaluation of small sample (<100), the results are promising that the model can provide a concise and easy-to-understand overview of a long piece of text. I successfully tested English, German, and Portuguese, but the model supports 23 languages supposedly covering nearly half the world’s population in terms of speakers (which is way more than any other comparable LLM that I am aware of).
Evaluation of text simplification is challenging. Automated metrics (such as SARI) dont seem to be very useful to rigorously assess model performance for specific use cases. Simple baselines such as returning a truncated version of the input-text can yield surprisingly good results. As a result, we should probably not rely on these metrics and, instead, resort to manually judging (samples of) the model output which is, however, much more resource-intensive.

Jul 26 2024, 3:22 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach updated the task description for T369288: [WE.3.1.3] Building a model for content simplification (Q2).

Jul 26 2024, 3:20 PM · Research (FY2024-25-Research-October-December), OKR-Work

Jul 19 2024

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

Android reached out to understand more about recommended content within search. I shared some resources from research on search and articles recommendations (e.g. from list building)
Web is starting with first experiments on recommendations in search. Providing support for using the article-similarity search from the list-building tool https://list-building.toolforge.org/
Web is preparing to start thinking about experiments on simplifications which will happen later in the quarter. Ongoing discussions about what simplification would be useful and how to evaluate.

Jul 19 2024, 3:19 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

weekly summary:

Clarified the criteria and constraints for the model
- Multilingual: support (some) languages other than English
- Openness: it needs to be open.
- Resources: We need to be able to host the model in our infrastructure in LiftWing with reasonable inference time.
- Use-case: We need to define the use-case; for example, should simplification be on the level of sentence, paragraph, section, or the full article?
- Quality: Ensure the output is useful in practice according to some metric
Did a deep-dive on recent works on text simplification with language models (reviewing 26 papers from the past 2-3 years see below [1]). This helped me to understand the most common and promising strategies to approach the task and to identify challenges.
Some of the main learnings came from a paper (Paper Plain) which aims to improve access to medical papers. Based on interviews with readers about barriers for interacting with content and usability testing, they identify section gists as a valuable and most frequently-used feature by non-expert readers.
- Operationalize simplification as a plain-language summary of a section
- For example, use a prompt: “Summarize content you are provided with for a fifth-grade student.”
- This approach seems highly effective as most LLMs are explicitly trained on the task of summarization across languages (e.g. XLSum)
I also reviewed some of the recent large language models that could be good candidates. I identified the Aya-23 model as a promising candidate model
- Multilingual: It is multilingual supporting 23 languages (these languages cover approximately half the world’s population)
- It is an open-weight model and available in Hugging Face
- Given the successful test-deployment of the similarly-sized Gemma2-27B-it on LiftWing (T369055), it seems that this model could in principle be also hosted.
- The model can be prompted to generate plain-language summaries of sections without fine-tuning
- Some initial tests with the API-endpoint looked promising across several languages.

Jul 19 2024, 2:55 PM · Research (FY2024-25-Research-October-December), OKR-Work

Jul 12 2024

MGerlach closed T361942: Run revised 3rd survey on readability perception, a subtask of T325815: Understanding perception of readability in Wikipedia, as Resolved.

Jul 12 2024, 2:35 PM · Research

MGerlach closed T361942: Run revised 3rd survey on readability perception as Resolved.

weekly update:

we succesfully ran the second part of the pilot survey
we will start analyzing the data next week. based on the results we will decide on next steps: either continue and scale, or re-assess the general approach. we will post results on the meta-page https://meta.wikimedia.org/wiki/Research:Understanding_perception_of_readability_in_Wikipedia#Pilot_survey:_version_3
for now I am closing the task as the goal of running the survey was accomplished.

Jul 12 2024, 2:35 PM · Research (FY2024-25-Research-July-September)

MGerlach added a comment to T369292: Research support for hypothesis in KR WE.3.1 (Q2).

weekly update:

I reached out to all hypothesis owners in WE.3.1 individually We also had a joint meeting to support Support WE.3.1. From this, I obtained a much clearer picture about the needed support
WE.3.1.1: might need light support (consulting) for options to generate recommendations. Substantial support needed for experiments on simplification/summarization. Web Team is starting to think about specifications in more detail. So these are ongoing discussions at the moment.
WE.3.1.4: No support needed at this point. They will focus on figuring out what work would need to be done for scaling search (e.g. morelike). They also want to start looking into vector search which would likely require some support from Research (e.g. creating vectors/embeddings). However, they are starting from scratch and during Q1 will start to figure out what they want and what support would be needed in the future.
WE.3.1.5: No support needed at this point. They want to skip the use of orphans for the first round of experiments.

Jul 12 2024, 12:58 PM · Research, OKR-Work

MGerlach added a comment to T369288: [WE.3.1.3] Building a model for content simplification (Q2).

Weekly update:

Systematizing open questions for successful model development
- Infrastructure requirements: size, performance (inference time)
- Product requirements: Languages, Quality, Use-case
- Model requirements: Openness (can we host), Effectiveness (does the model have a chance to perform well), training (supervised, in-context learning, zero-shot as is), evaluation
Gathering input from product team (Web) on intended use to tailor model specifications (languages, input/output format, quality boundaries). These are ongoing discussions but should become more clear in the next week or so
Learning about updates in ML infrastructure which expands the potential set of candidate models. The ML team announced that we will have new servers will GPUs available for model training and hosting. This will allow us to use larger (and potentially better) models. Similarly, I am following the test-deployment of the Gemma2-model on LiftWing T369055. If successful, this could constitute a promising candidate model for the simplification task (or models with similar size/architecture).
Reviewing scientific literature to compile list of candidate models for the task. I have identified more than 10 recent papers (2023/2024) on using LLMs for text simplification approach. This is very insightful to understand which models are most promising for the specific task. For example, a promising model is Aya-23, an open model with dedicatedly multilingual support and, in principle, compatible with out infrastructure constraints (see above)..

Jul 12 2024, 12:55 PM · Research (FY2024-25-Research-October-December), OKR-Work

Jul 10 2024

MGerlach created T369712: Request to update Readability model on Lift Wing.

Jul 10 2024, 12:54 PM · Lift-Wing, Machine-Learning-Team

Jul 5 2024

MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

ran the first part of the survey this week. next week will be the second part.

Jul 5 2024, 1:39 PM · Research (FY2024-25-Research-July-September)

MGerlach updated the task description for T361942: Run revised 3rd survey on readability perception.

Jul 5 2024, 1:38 PM · Research (FY2024-25-Research-July-September)

Jul 4 2024

MGerlach added a subtask for T342614: Models for text summarization using LLMs: T369288: [WE.3.1.3] Building a model for content simplification (Q2).

Jul 4 2024, 1:55 PM · Epic, address-knowledge-gaps, Research

MGerlach added a parent task for T369288: [WE.3.1.3] Building a model for content simplification (Q2): T342614: Models for text summarization using LLMs.

Jul 4 2024, 1:55 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach created T369292: Research support for hypothesis in KR WE.3.1 (Q2).

Jul 4 2024, 1:52 PM · Research, OKR-Work

MGerlach renamed T369288: [WE.3.1.3] Building a model for content simplification (Q2) from [WE.3.1.3] Building a model for content simplification to [WE.3.1.3] Building a model for content simplification (Q1).

Jul 4 2024, 1:48 PM · Research (FY2024-25-Research-October-December), OKR-Work

MGerlach created T369288: [WE.3.1.3] Building a model for content simplification (Q2).

Jul 4 2024, 1:48 PM · Research (FY2024-25-Research-October-December), OKR-Work

Jul 1 2024

MGerlach added a comment to T357692: OpenReview for WikiWorkshop.

@MGerlach if you have any other observations, please add them to this task. We'll review this as part of the retro and it's good to have as we prepare for next year and know what we need to address in advance. I'll resolve the task after you're done :)

@KinneretG A gree with everything you wrote. In general though I did like OpenReview. One additional comment:

setting of deadlines was public (e.g. for submission deadline of reviews for PC members). We often set the deadline in OpenReview to some time (e.g. a few hours or a day) after the officially communicated deadline in order to accommodate also those who run into technical or other issues. However, the internally set deadline is also visible publicly for the relevant folks. This lead to some confusion as to what the actual deadline is.

Jul 1 2024, 12:18 PM · Research

MGerlach added a comment to T361926: Improve training and inference pipeline for multilingual link recommendation model.

Moving this to the next quarter (FY2024-25-Research-July-September) as the work is not yet fully completed

the MR for the pipeline in airflow is submitted.
this needs some code-review by research-engineering before it can be merged. this should be completed by mid-July (see T361929#9935541)
we expect to resolve the task in the next 1-2 weeks

Jul 1 2024, 10:13 AM · Research, Essential-Work

MGerlach moved T361926: Improve training and inference pipeline for multilingual link recommendation model from FY2023-24-Research-April-June to FY2024-25-Research-July-September on the Research board.

Jul 1 2024, 10:08 AM · Research, Essential-Work

MGerlach closed T355729: Organize the Research track - Wiki Workshop 2024 as Resolved.

We succesfully ran the research track at wikiworkshop2024.
Closing this task as all sub-tasks have been completed.

Jul 1 2024, 9:50 AM · Research

MGerlach closed T355729: Organize the Research track - Wiki Workshop 2024, a subtask of T356020: Wiki Workshop 2024 - Q3, as Resolved.

Jul 1 2024, 9:49 AM · Research (FY2023-24-Research-January-March)

MGerlach closed T355729: Organize the Research track - Wiki Workshop 2024, a subtask of T340599: Organize Wiki Workshop 2024 (June 20th), as Resolved.

Jul 1 2024, 9:49 AM · Research (FY2024-25-Research-July-September)

MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

Moving to FY204-25-Research-July-September for now as we need 1 more week to run actually run the survey.

Jul 1 2024, 9:47 AM · Research (FY2024-25-Research-July-September)

MGerlach moved T361942: Run revised 3rd survey on readability perception from FY2023-24-Research-April-June to FY2024-25-Research-July-September on the Research board.

Jul 1 2024, 9:45 AM · Research (FY2024-25-Research-July-September)

Jun 28 2024

MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

finished internal testing
implemented two changes based on feedback: i) make the survey shorter by removing a few pairs, ii) add a question about how regularly participants use Wikipedia.
figuring out available resources on prolific
planning to deploy next week

Jun 28 2024, 12:45 PM · Research (FY2024-25-Research-July-September)

MGerlach closed T361944: Orphan articles as reading recommendations as Resolved.

weekly updates:

finally I managed to spend some time on this.
I figured out that one of the main bottlenecks was to calculate the indegree for each potential link-target (this is crucial since we want to prioritize articles with low indegree such as orphans). My initial approach was to use the Linkshere-API. However, this requires a separate call for each individual article. A much cheaper alternative is to query the replicas, with only a single query for potentially hundreds of articles for which we want to get the indegree (see the example in quarry). The replicas can be easily queried from toolforge (wikitech-documentation, example script in PAWS). For an example article, I could reduce the query-time 10-fold.
I integrated this (and a few other fixes to improve the recommendations) into the latest version of the tool. Example: https://linkrec.toolforge.org/readmore?lang=en&title=Tiwanaku
the example still takes some time but its much less likely to just timeout since the number of API calls is much smaller.
in case, we need much more substantial speedups, we might want to resort to other heuristics such as the one mentioned above (T361944#9809372) using morelikethis in cirrussearch combined with the orphans-template

Jun 28 2024, 8:49 AM · Research (FY2023-24-Research-April-June)

MGerlach closed T361947: Evaluate simplification in multilingual setting as Resolved.

weekly update:

finalized the mulitlingual experiments beyond English
wrote up the results on the meta-page https://meta.wikimedia.org/wiki/Research:Develop_a_model_for_text_simplification_to_improve_readability_of_Wikipedia_articles/First_round_of_experiments#Multilingual_experiments
Main takeaway is that the model seems to work well for some language (but not all for others). "For a few languages scores were similar to English (German, Italian, Catalan); for many languages scores were slightly lower (Spanish, Basque, French , Portuguese, Dutch); and for some languages performance was substantially worse (Greek, Armenian, and Russian)"

Jun 28 2024, 8:33 AM · Research (FY2023-24-Research-April-June)

MGerlach closed T361947: Evaluate simplification in multilingual setting, a subtask of T342614: Models for text summarization using LLMs, as Resolved.

Jun 28 2024, 8:32 AM · Epic, address-knowledge-gaps, Research

Jun 21 2024

MGerlach added a comment to T361947: Evaluate simplification in multilingual setting.

weekly updates:

Ran and evaluated experiments with fine-tuned Flan-T5 on other languages. The main problem is that if the model is only trained on article pairs (original/simplified) in English, the model output (simplified) will most often be in English, even if the model input (original) is in, say, German or French. Thus, the model will perform a simplification AND translation in to English since it has seen only English samples of simplfiied text.
As a potential solution I am experimenting with the following setup: i) fine-tune the model with additional samples from the other languages, ii) add a prefix "Simplify in <LANG>: " to the model-input. This will provide additional instructions for the model treating the simplifications in different languages as different but related tasks.

Jun 21 2024, 4:32 PM · Research (FY2023-24-Research-April-June)

MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

internal testing of the limesurvey (including integration with prolific)
next week: deployment on prolific

Jun 21 2024, 2:53 PM · Research (FY2024-25-Research-July-September)

MGerlach added a comment to T349774: Maintain wikiworkshop.org website.

In T349774#9913345, @DDeSouza wrote:

@MGerlach done.

Jun 21 2024, 1:27 PM · Research, Patch-For-Review

MGerlach added a comment to T349774: Maintain wikiworkshop.org website.

@DDeSouza could you please remove the pdf for the paper (the sooner the better so it doesnt get indexed by GoogleScholar etc):
"Structural Evolution of Co-Creation in Wikipedia [PDF]
Negin Maddah and Babak Heydari"
They opted out of having the pdf put on the website.
Thank you

Jun 21 2024, 8:11 AM · Research, Patch-For-Review

Jun 14 2024

MGerlach added a comment to T361944: Orphan articles as reading recommendations .

weekly update:

no updates

Jun 14 2024, 5:43 PM · Research (FY2023-24-Research-April-June)

MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

finalized the Limesurvey
will test the Limesurvey internally next week (after that can be deployed)

Jun 14 2024, 5:42 PM · Research (FY2024-25-Research-July-September)

MGerlach added a comment to T361947: Evaluate simplification in multilingual setting.

weekly update:

started setting up the experiments for evaluating fine-tuned Flan-T5 on other languages
will run experiments next week

Jun 14 2024, 5:41 PM · Research (FY2023-24-Research-April-June)

MGerlach closed T352545: Organize sessions for research track as Resolved.

weekly update:

finalized the program for the research track (sessions, session chairs) https://docs.google.com/spreadsheets/d/1KXCIitFfd57bRwuL30jciJEhipq4-wMI_hzvW0JmBCo/edit?gid=0#gid=0
wrote a doc with instructions for session chairs

Jun 14 2024, 5:41 PM · Research (FY2023-24-Research-April-June), Research-outreach

MGerlach closed T352545: Organize sessions for research track, a subtask of T355729: Organize the Research track - Wiki Workshop 2024, as Resolved.

Jun 14 2024, 5:41 PM · Research

MGerlach updated the task description for T352545: Organize sessions for research track.

Jun 14 2024, 5:38 PM · Research (FY2023-24-Research-April-June), Research-outreach

Jun 10 2024

MGerlach closed T362416: Attend ICWSM 2024 conference as Resolved.

Jun 10 2024, 2:34 PM · Research-foundational, address-knowledge-gaps, Research-outreach, Research

MGerlach updated the task description for T362416: Attend ICWSM 2024 conference.

Jun 10 2024, 2:34 PM · Research-foundational, address-knowledge-gaps, Research-outreach, Research

May 30 2024

MGerlach added a comment to T361944: Orphan articles as reading recommendations .

weekly update:

no update. main focus was preparing for attending ICWSM T362416

May 30 2024, 4:45 PM · Research (FY2023-24-Research-April-June)

MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

no update. main focus was preparing for attending ICWSM T362416

May 30 2024, 4:44 PM · Research (FY2024-25-Research-July-September)

MGerlach added a comment to T361947: Evaluate simplification in multilingual setting.

weekly update:

no update. main focus was preparing for attending ICWSM T362416

May 30 2024, 4:44 PM · Research (FY2023-24-Research-April-June)

MGerlach added a comment to T352545: Organize sessions for research track.

weekly update:

iterating on the sessions and the session chairs
planning to finalize next week

May 30 2024, 4:42 PM · Research (FY2023-24-Research-April-June), Research-outreach

May 27 2024

MGerlach closed T352543: Review and select submissions for research track as Resolved.

FInal update:

send notifications to authors
website will be updated with accepted extended abstracts and the members of the program committee
also send thank-you email to reviewers and inviting them to register to attend the event

May 27 2024, 9:32 AM · Research (FY2023-24-Research-April-June), Research-outreach

MGerlach (Martin Gerlach)
Senior Research Scientist

Projects

Calendar

Today

Tomorrow

Tuesday

User Details

Recent Activity
View All

Fri, Dec 6

Fri, Nov 29

Fri, Nov 22

Thu, Nov 21

Sat, Nov 16

Fri, Nov 8

Nov 5 2024

Nov 1 2024

Oct 29 2024

Oct 28 2024

Oct 25 2024

Oct 24 2024

Oct 22 2024

Oct 18 2024

Aug 16 2024

Aug 8 2024

Aug 2 2024

Jul 29 2024

Jul 26 2024

Jul 19 2024

Jul 12 2024

Jul 10 2024

Jul 5 2024

Jul 4 2024

Jul 1 2024

Jun 28 2024

Jun 21 2024

Jun 14 2024

Jun 10 2024

May 30 2024

May 27 2024

MGerlach (Martin Gerlach)Senior Research Scientist

Projects

Calendar

Today

Tomorrow

Tuesday

User Details

Recent ActivityView All

Fri, Dec 6

Fri, Nov 29

Fri, Nov 22

Thu, Nov 21

Sat, Nov 16

Fri, Nov 8

Nov 5 2024

Nov 1 2024

Oct 29 2024

Oct 28 2024

Oct 25 2024

Oct 24 2024

Oct 22 2024

Oct 18 2024

Aug 16 2024

Aug 8 2024

Aug 2 2024

Jul 29 2024

Jul 26 2024

Jul 19 2024

Jul 12 2024

Jul 10 2024

Jul 5 2024

Jul 4 2024

Jul 1 2024

Jun 28 2024

Jun 21 2024

Jun 14 2024

Jun 10 2024

May 30 2024

May 27 2024

MGerlach (Martin Gerlach)
Senior Research Scientist

Recent Activity
View All