NLP
Implementation of Universal Transformer in PyTorch
Machine Translation: Foundations and Models, by Tong Xiao and Jingbo Zhu
100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors
Unsupervised text tokenizer for Neural Network-based text generation.
An Engine-Agnostic Deep Learning Framework in Java
Training open neural machine translation models
Fast Neural Machine Translation in C++ - development repository
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Open-source vector similarity search for Postgres
Transformer-related optimization, including BERT and GPT
Tensors and dynamic neural networks in Python with strong GPU acceleration
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Flax is a neural network library for JAX that is designed for flexibility.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Rust-native, ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
State-of-the-Art Text Embeddings