GitHub - j-ranasinghe/sinhala-qa: This repository contains: Scripts for training and evaluating Sinhala fine-tuned models on QA tasks.


j-ranasinghe/sinhala-qa


Sinhala QA/MRC

This repository contains:

  • Scripts for training and evaluating models fine-tuned for Sinhala QA tasks.
  • A data/ folder with the Sinhala QA datasets, including training, validation, and test sets used for model development and benchmarking.
  • 📦 The translated dataset used for training is publicly available on Hugging Face: SiQuAD
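
As an illustration of the evaluation step, here is a minimal sketch of the two metrics commonly reported for SQuAD-style extractive QA: exact match and token-level F1. It is not taken from this repository's scripts. The function names (`normalize`, `exact_match`, `token_f1`) and the normalization choices (Unicode NFC, ASCII punctuation stripping, whitespace tokenization) are assumptions made for the example, and the actual evaluation code may differ.

```python
import string
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """NFC-normalize, drop ASCII punctuation, and collapse whitespace.

    Assumption: no article removal or lowercasing, since those SQuAD
    conventions target English rather than Sinhala.
    """
    text = unicodedata.normalize("NFC", text)
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (whitespace tokens)."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several reference answers, a common convention is to score the prediction against each one and keep the maximum.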

Related Repositories

Below are key companion repositories used in this workflow:

  • Tool developed for manually annotating QA pairs in Sinhala.
  • Used to create the Sinhala QA test set for evaluation.
  • Supports context selection, question writing, and answer span marking.

  • Scripts and pipeline used for translating the SQuAD dataset into Sinhala.
  • Includes preprocessing, automatic translation, post-editing, and alignment verification steps.

  • Contains scripts for scraping Sinhala news articles.
  • The gathered data was used to build a seed context dataset supporting Sinhala QA development and model fine-tuning.
  • 📦 The full dataset used for extracting passages for QA is publicly available on Hugging Face: Sinhala-News-Wiki-text-corpus
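
The alignment verification mentioned for the translation pipeline can be pictured with a small sketch. After translation, an answer's character offset in the context usually no longer matches, so each record needs its span re-checked. The helper below is hypothetical and not taken from the companion repository. It assumes SQuAD-style fields (a context string, an answer text, and a character-level start offset) and uses a simple exact-substring search, which may be stricter or looser than what the actual pipeline does.

```python
from typing import Optional


def realign_answer(context: str, answer_text: str, answer_start: int) -> Optional[int]:
    """Return a verified start offset for answer_text in context, or None.

    Hypothetical helper: keeps the given offset when it already matches,
    otherwise falls back to the first exact occurrence in the context.
    """
    if not answer_text:
        return None
    end = answer_start + len(answer_text)
    if answer_start >= 0 and context[answer_start:end] == answer_text:
        return answer_start
    found = context.find(answer_text)
    return found if found != -1 else None
```

Records for which this returns `None` are the ones that would need post-editing or removal, since the translated answer no longer appears verbatim in the translated context.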
