hlp-ai (Xiaofeng Liu) / Starred · GitHub
  • Human Language Processing Laboratory (HLP Lab)
  • Wuhan, China

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL…

Jupyter Notebook · 883 stars · 330 forks · Updated Jan 4, 2024

Jupyter notebooks for the Natural Language Processing with Transformers book

Jupyter Notebook · 4,326 stars · 1,346 forks · Updated Aug 21, 2024

State-of-the-art LLM-based translation models.

Ruby · 525 stars · 40 forks · Updated Apr 9, 2025

Inspired by Google's C4: a series of colossal clean data-cleaning scripts focused on CommonCrawl data processing, including the Chinese data processing and cleaning methods from MassiveText.

Python · 125 stars · 14 forks · Updated Jun 7, 2023

The Hugging Face course on Transformers

MDX · 2,956 stars · 932 forks · Updated May 21, 2025

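PDF scientific paper translation with preserved formats: AI-based, full-text bilingual translation of PDF documents that fully preserves the layout. Supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero.

Python · 23,866 stars · 2,040 forks · Updated May 9, 2025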

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook · 49,784 stars · 7,184 forks · Updated Apr 20, 2025

🕷️ The pipeline for the OSCAR corpus

Rust · 167 stars · 16 forks · Updated Dec 18, 2023

BLEURT is a metric for Natural Language Generation based on transfer learning.

Python · 730 stars · 85 forks · Updated Aug 4, 2023

🚀🚀 [LLM] Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python · 21,129 stars · 2,476 forks · Updated Apr 30, 2025

Python · 93 stars · 16 forks · Updated Dec 12, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python · 9,649 stars · 914 forks · Updated Jul 1, 2024

nanoGPT-style version of Llama 3.1

Python · 1,369 stars · 83 forks · Updated Aug 8, 2024

Starting from scratch, walk through ChatGPT's entire technical roadmap.

Python · 236 stars · 41 forks · Updated Sep 5, 2024

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python · 12,139 stars · 1,232 forks · Updated May 21, 2025

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python · 8,489 stars · 532 forks · Updated May 3, 2024

Python GUI Programming Cookbook, Third Edition, published by Packt

Python · 103 stars · 58 forks · Updated Jan 18, 2021

Android course project: the 翻译君 translator app

Java · 14 stars · 1 fork · Updated Nov 15, 2021

A tool that locates, downloads, and extracts machine translation corpora

Python · 154 stars · 23 forks · Updated Apr 27, 2025

NTREX -- News Test References for MT Evaluation

83 stars · 16 forks · Updated Jun 5, 2024

Facebook Low Resource (FLoRes) MT Benchmark

Python · 736 stars · 127 forks · Updated Nov 20, 2023

Text to Speech for Japanese

Python · 16 stars · 6 forks · Updated May 11, 2023

Bilingual-TTS (Japanese and Korean)

Jupyter Notebook · 30 stars · 5 forks · Updated Jul 1, 2023

VITS for Japanese with Whisper as the data processor (you can train your own VITS model even if you only have audio)

Jupyter Notebook · 161 stars · 28 forks · Updated May 7, 2023

VITS implementation for Japanese, Chinese, Korean, Sanskrit and Thai

Python · 928 stars · 197 forks · Updated Dec 6, 2023

An emotion-controllable speech synthesis model based on VITS that requires no emotion annotations

Jupyter Notebook · 1,387 stars · 166 forks · Updated Mar 30, 2023

Open language modeling toolkit based on PyTorch

Python · 118 stars · 21 forks · Updated May 15, 2025

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python · 7,193 stars · 608 forks · Updated May 21, 2025

PDF Parsing (《PDF 解析》)

1,043 stars · 124 forks · Updated Aug 5, 2024

The full minitorch student suite.

Python · 2,074 stars · 458 forks · Updated Aug 17, 2024