[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language models (LLMs).
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Official PyTorch Implementation for the "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling" paper!
A fuzzy reward model trained with GRPO, proposed to improve a VLM's abilities on the crowd-counting task.
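The sketch below is not the repo's code; it only illustrates, under stated assumptions, how a "fuzzy" counting reward (partial credit that decays with the counting error, an assumed form) can be combined with GRPO-style group-relative advantages for sampled VLM answers.

```python
# Minimal sketch (assumptions, not the repo's implementation): a fuzzy counting
# reward plus GRPO-style group-relative advantages over a group of sampled answers.
import torch

def fuzzy_count_reward(pred_counts, true_count, scale=5.0):
    """Partial credit that decays smoothly with the counting error (assumed form)."""
    err = (pred_counts - true_count).abs().float()
    return torch.exp(-err / scale)

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each reward by the group's mean and std."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, a group of 4 sampled answers, ground-truth count = 23 (toy values).
pred_counts = torch.tensor([23, 21, 30, 5])
rewards = fuzzy_count_reward(pred_counts, true_count=23)
print(grpo_advantages(rewards))  # these weight the policy-gradient update per sample
```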
Developing an LLM response-ranking reward model with human-feedback RL (HFRL), except the preference labels come from GPT-3.5 instead of humans.
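For context, a minimal sketch of how such a response-ranking reward model is typically trained: a pairwise Bradley-Terry loss pushes the reward of the preferred ("chosen") response above the rejected one. The encoder name and toy batch below are illustrative assumptions, not taken from the repo.

```python
# Minimal sketch (assumed setup): pairwise reward-model training for response ranking.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilroberta-base"  # assumption: any encoder with a scalar head works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

def reward_scores(prompts, responses):
    """Score each (prompt, response) pair with a single scalar reward."""
    enc = tokenizer(prompts, responses, padding=True, truncation=True, return_tensors="pt")
    return model(**enc).logits.squeeze(-1)

# Toy preference pair: "chosen" was ranked above "rejected" (e.g. by GPT-3.5).
prompts  = ["Summarize: The cat sat on the mat."]
chosen   = ["A cat sat on a mat."]
rejected = ["Cats are animals."]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
optimizer.zero_grad()
r_chosen, r_rejected = reward_scores(prompts, chosen), reward_scores(prompts, rejected)
# Bradley-Terry pairwise loss: raise the chosen reward above the rejected reward.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```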
A reward model to evaluate machine translations, focusing on English-to-Spanish sentence pairs, with applications in natural language processing (NLP), translation quality assessment, and multilingual content adaptation.
Proof-of-concept (POC) library built on TextRL for easy training and use of fine-tuned models with RLHF, a reward model, and PPO.
Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. The notebook uses Meta AI's hate-speech reward model with RLHF techniques to improve safety.
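A minimal sketch of the reward side of that setup, not the notebook itself: a hate-speech classifier scores each generated summary, and the "not hate" logit serves as the scalar reward later passed to a PPO trainer. The model ID and label order are assumptions.

```python
# Minimal sketch (assumed model ID and label order): turning a hate-speech
# classifier into a scalar PPO reward for generated summaries.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"  # assumed ID
tokenizer = AutoTokenizer.from_pretrained(toxicity_model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(toxicity_model_name)

def toxicity_rewards(texts):
    """Return one scalar reward per text: higher means less toxic."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = reward_model(**enc).logits  # assumed label order: [not_hate, hate]
    return logits[:, 0]  # "not hate" logit, used as the reward signal in the PPO step

summaries = ["A neutral, polite summary.", "An insulting summary."]
print(toxicity_rewards(summaries))  # these scalars would be fed to the PPO update
```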