yyu233 (yyz) / Starred · GitHub



小怪兽 ("little monster")

yyz yyu233


20 followers · 36 following


Starred repositories

criteo / autofaiss

Automatically create Faiss knn indices with the most optimal similarity search parameters.

Python · 854 stars · 76 forks · Updated May 21, 2024

daandouwe / ngram-lm

A simple n-gram language model.

Python · 11 stars · 2 forks · Updated Sep 11, 2018

shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error correction and works out of the box.

Python · 5,993 stars · 1,137 forks · Updated Dec 26, 2024

kpu / kenlm

KenLM: Faster and Smaller Language Model Queries

C++ · 2,613 stars · 521 forks · Updated Mar 30, 2025

facebookresearch / fastText

Library for fast text representation and classification.

HTML · 26,226 stars · 4,768 forks · Updated Mar 22, 2024

facebookresearch / cc_net

Tools to download and cleanup Common Crawl data

Python · 1,008 stars · 149 forks · Updated Apr 25, 2023

fchollet / ARC-AGI

The Abstraction and Reasoning Corpus

JavaScript · 4,407 stars · 663 forks · Updated Apr 4, 2025

opendatalab / OmniDocBench

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python · 460 stars · 44 forks · Updated May 13, 2025

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python · 7,579 stars · 662 forks · Updated Feb 10, 2025

ucaslcl / Fox

Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"

Python · 149 stars · 8 forks · Updated May 31, 2024

Ucas-HaoranWei / Vary

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python · 1,833 stars · 144 forks · Updated Dec 30, 2024

HCIILAB / EPHOIE

104 stars · 3 forks · Updated Feb 16, 2021

doc-analysis / XFUND

XFUND: A Multilingual Form Understanding Benchmark

203 stars · 21 forks · Updated Jul 15, 2022

RAIVNLab / MRL

Code repository for the paper - "Matryoshka Representation Learning"

Jupyter Notebook · 496 stars · 26 forks · Updated Feb 19, 2024

refuel-ai / autolabel

Label, clean and enrich text datasets with LLMs.

Python · 2,225 stars · 155 forks · Updated Mar 5, 2025

jiachenzhu / DyT

Code release for DynamicTanh (DyT)

Python · 937 stars · 79 forks · Updated Mar 30, 2025

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python · 21,284 stars · 2,622 forks · Updated Mar 4, 2025

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ · 1,719 stars · 192 forks · Updated Apr 9, 2025

google-research-datasets / vrdu

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema including…

80 stars · 6 forks · Updated Feb 8, 2023

microsoft / OmniParser

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook · 22,205 stars · 1,865 forks · Updated Mar 26, 2025

facebookresearch / nougat

Implementation of Nougat: Neural Optical Understanding for Academic Documents

Python · 9,453 stars · 611 forks · Updated Feb 21, 2025

intsig-textin / markdown_tester

To try TextIn document parsing, visit https://cc.co/16YSIy

Python · 97 stars · 9 forks · Updated Nov 11, 2024

deanmalmgren / textract

extract text from any document. no muss. no fuss.

HTML · 4,140 stars · 627 forks · Updated Dec 2, 2024

AntixK / PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

Python · 7,170 stars · 1,134 forks · Updated Mar 21, 2025

yuyijiong / fineweb-edu-chinese

Python · 15 stars · Updated Jan 17, 2025

panushri25 / emrQA

Code for the emrQA question answering dataset

Python · 148 stars · 35 forks · Updated Feb 9, 2022

jgc128 / mednli

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Python · 131 stars · 31 forks · Updated Feb 15, 2023

milteam / MedSyn

Python · 13 stars · 1 fork · Updated Sep 24, 2024

ScienceOne-AI / DeepSeek-671B-SFT-Guide

An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…

Python · 685 stars · 88 forks · Updated Mar 13, 2025

idstcv / Dash

TensorFlow implementation of Dash

Python · 31 stars · 2 forks · Updated Aug 18, 2022

Starred topics

appointment-scheduling

telemedicine

Game Development

Machine learning

jQuery

Java

GraphQL

Game engine

Deep learning

Chrome
