yyu233 (yyz) / Starred · GitHub
Starred repositories

Automatically create Faiss knn indices with the most optimal similarity search parameters.

Python 854 76 Updated May 21, 2024

A simple n-gram language model.

Python 11 2 Updated Sep 11, 2018

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.

Python 5,993 1,137 Updated Dec 26, 2024

KenLM: Faster and Smaller Language Model Queries

C++ 2,613 521 Updated Mar 30, 2025

Library for fast text representation and classification.

HTML 26,226 4,768 Updated Mar 22, 2024

Tools to download and cleanup Common Crawl data

Python 1,008 149 Updated Apr 25, 2023

The Abstraction and Reasoning Corpus

JavaScript 4,407 663 Updated Apr 4, 2025

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python 460 44 Updated May 13, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 7,579 662 Updated Feb 10, 2025

official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"

Python 149 8 Updated May 31, 2024

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python 1,833 144 Updated Dec 30, 2024

104 3 Updated Feb 16, 2021

XFUND: A Multilingual Form Understanding Benchmark

203 21 Updated Jul 15, 2022

Code repository for the paper - "Matryoshka Representation Learning"

Jupyter Notebook 496 26 Updated Feb 19, 2024

Label, clean and enrich text datasets with LLMs.

Python 2,225 155 Updated Mar 5, 2025

Code release for DynamicTanh (DyT)

Python 937 79 Updated Mar 30, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,284 2,622 Updated Mar 4, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,719 192 Updated Apr 9, 2025

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema including…

80 6 Updated Feb 8, 2023

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 22,205 1,865 Updated Mar 26, 2025

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,453 611 Updated Feb 21, 2025

To try textin document parsing, visit https://cc.co/16YSIy

Python 97 9 Updated Nov 11, 2024

extract text from any document. no muss. no fuss.

HTML 4,140 627 Updated Dec 2, 2024

A Collection of Variational Autoencoders (VAE) in PyTorch.

Python 7,170 1,134 Updated Mar 21, 2025

Python 15 Updated Jan 17, 2025

Code for the emrQA question answering dataset

Python 148 35 Updated Feb 9, 2022

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Python 131 31 Updated Feb 15, 2023

Python 13 1 Updated Sep 24, 2024

An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…

Python 685 88 Updated Mar 13, 2025

TensorFlow implementation for Dash

Python 31 2 Updated Aug 18, 2022