This pandect (πανδέκτης is Ancient Greek for encyclopedia) was created to help you find almost anything related to Natural Language Processing that is available online.
Note Quick legend on available resource types:
⭐ - open source project, usually a GitHub repository with its number of stars
📙 - resource you can read, usually a blog post or a paper
🗃️ - a collection of additional resources
💱 - non-open source tool, framework or paid service
🎥 - a resource you can watch
🎙️ - a resource you can listen to
Note Section keywords: paper summaries, compendium, awesome list
- 🗃️ The NLP Index - Searchable Index of NLP Papers by Quantum Stat / NLP Cypher
- ⭐ Awesome NLP by keon [GitHub, 16528 stars]
- ⭐ Speech and Natural Language Processing Awesome List by elaboshira [GitHub, 2189 stars]
- ⭐ Awesome Deep Learning for Natural Language Processing (NLP) [GitHub, 1274 stars]
- ⭐ Text Mining and Natural Language Processing Resources by stepthom [GitHub, 557 stars]
- 🗃️ Brainsources for #NLP enthusiasts by Philip Vollet
- ⭐ Awesome AI/ML/DL - NLP Section [GitHub, 1473 stars]
- 🗃️ NLP articles by Devopedia
- ⭐ 100 Must-Read NLP Papers [GitHub, 3732 stars]
- ⭐ NLP Paper Summaries by dair-ai [GitHub, 1475 stars]
- ⭐ Curated collection of papers for the NLP practitioner [GitHub, 1075 stars]
- ⭐ Papers on Textual Adversarial Attack and Defense [GitHub, 1501 stars]
- ⭐ Recent Deep Learning papers in NLU and RL by Valentin Malykh [GitHub, 296 stars]
- ⭐ A Survey of Surveys (NLP & ML): Collection of NLP Survey Papers [GitHub, 1997 stars]
- ⭐ A Paper List for Style Transfer in Text [GitHub, 1609 stars]
- 🎥 Video recordings index for papers
- ⭐ NLP top 10 conferences Compendium by soulbliss [GitHub, 459 stars]
- 📙 ICLR 2020 Trends
- 📙 SpacyIRL 2019 Conference in Overview
- 📙 Paper Digest - Conferences and Papers in Overview
- ⭐ NLP Progress by sebastianruder [GitHub, 22568 stars]
- ⭐ NLP Tasks by Kyubyong [GitHub, 3017 stars]
- ⭐ NLP Datasets by niderhoff [GitHub, 5741 stars]
- ⭐ Datasets by Huggingface [GitHub, 19096 stars]
- 🗃️ Big Bad NLP Database
- ⭐ UWA Unambiguous Word Annotations - Word Sense Disambiguation Dataset
- ⭐ MLDoc - Corpus for Multilingual Document Classification in Eight Languages [GitHub, 152 stars]
- ⭐ Awesome Embedding Models by Hironsan [GitHub, 1752 stars]
- ⭐ Awesome list of Sentence Embeddings by Separius [GitHub, 2219 stars]
- ⭐ Awesome BERT by Jiakui [GitHub, 1846 stars]
- ⭐ The Super Duper NLP Repo [Website, 2020]
- ⭐ NLP Resources for Bahasa Indonesian [GitHub, 480 stars]
- ⭐ Indic NLP Catalog [GitHub, 552 stars]
- ⭐ Pre-trained language models for Vietnamese [GitHub, 653 stars]
- ⭐ Natural Language Toolkit for Indic Languages (iNLTK) [GitHub, 814 stars]
- ⭐ Indic NLP Library [GitHub, 550 stars]
- ⭐ AI4Bharat-IndicNLP Portal
- ⭐ ARBML - Implementation of many Arabic NLP and ML projects [GitHub, 387 stars]
- ⭐ zemberek-nlp - NLP tools for Turkish [GitHub, 1146 stars]
- ⭐ TDD AI - An open-source platform for all Turkish datasets, language models, and NLP tools.
- ⭐ KLUE - Korean Language Understanding Evaluation [GitHub, 560 stars]
- ⭐ Persian NLP Benchmark - benchmark for evaluation and comparison of various NLP tasks in Persian language [GitHub, 73 stars]
- ⭐ nlp-greek - Greek language sources [GitHub, 5 stars]
- ⭐ Awesome NLP Resources for Hungarian [GitHub, 221 stars]
- ⭐ List of pre-trained NLP models [GitHub, 170 stars]
- ⭐ Pretrained language models developed by Huawei Noah's Ark Lab [GitHub, 3019 stars]
- ⭐ Spanish Language Models and resources [GitHub, 251 stars]
- ⭐ Modern Deep Learning Techniques Applied to Natural Language Processing [GitHub, 1328 stars]
- 📙 A Review of the Neural History of Natural Language Processing [Blog, October 2018]
- 📙 Natural Language Processing in 2020: The Year In Review [Blog, December 2020]
- 📙 ML and NLP Research Highlights of 2020 [Blog, January 2021]
🔙 Back to the Table of Contents
- 🎙️ NLP Highlights [Years: 2017 - now, Status: active]
- 🎙️ The NLP Zone Episodes [Years: 2021 - now, Status: active]
- 🎙️ TWIML AI [Years: 2016 - now, Status: active]
- 🎙️ Practical AI [Years: 2018 - now, Status: active]
- 🎙️ The Data Exchange [Years: 2019 - now, Status: active]
- 🎙️ Gradient Dissent [Years: 2020 - now, Status: active]
- 🎙️ Machine Learning Street Talk [Years: 2020 - now, Status: active]
- 🎙️ DataFramed - latest trends and insights on how to scale the impact of data science in organizations [Years: 2019 - now, Status: active]
- 🎙️ The Super Data Science Podcast [Years: 2016 - now, Status: active]
- 🎙️ Data Hack Radio [Years: 2018 - now, Status: active]
- 🎙️ AI Game Changers [Years: 2020, Status: active]
- 🎙️ The Analytics Show [Years: 2019 - now, Status: active]
- 📙 NLP News by Sebastian Ruder
- 📙 This Week in NLP by Robert Dale
- 📙 Papers with Code
- 📙 The Batch by deeplearning.ai
- 📙 Paper Digest by PaperDigest
- 📙 NLP Cypher by QuantumStat
- 🎥 NLP Zurich [YouTube Recordings]
- 🎥 Hacking-Machine-Learning [YouTube Recordings]
- 🎥 NY-NLP (New York)
- 🎥 Yannic Kilcher
- 🎥 HuggingFace
- 🎥 Kaggle Reading Group
- 🎥 Rasa Paper Reading
- 🎥 Stanford CS224N: NLP with Deep Learning
- 🎥 NLPxing
- 🎥 ML Explained - A.I. Socratic Circles - AISC
- 🎥 Deeplearning.ai
- 🎥 Machine Learning Street Talk
🔙 Back to the Table of Contents
- ⭐ GLUE - General Language Understanding Evaluation (GLUE) benchmark
- ⭐ SuperGLUE - benchmark styled after GLUE with a new set of more difficult language understanding tasks
- ⭐ decaNLP - The Natural Language Decathlon (decaNLP) for studying general NLP models
- ⭐ dialoglue - DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue [GitHub, 280 stars]
- ⭐ DynaBench - Dynabench is a research platform for dynamic data collection and benchmarking
- ⭐ Big-Bench - collaborative benchmark for measuring and extrapolating the capabilities of language models [GitHub, 2835 stars]
- ⭐ WikiAsp - WikiAsp: Multi-document aspect-based summarization Dataset
- ⭐ WikiLingua - A Multilingual Abstractive Summarization Dataset
- ⭐ SQuAD - Stanford Question Answering Dataset (SQuAD)
- ⭐ XQuad - XQuAD (Cross-lingual Question Answering Dataset) for cross-lingual question answering
- ⭐ GrailQA - Strongly Generalizable Question Answering (GrailQA)
- ⭐ CSQA - Complex Sequential Question Answering
- 📙 XTREME - Massively Multilingual Multi-task Benchmark
- ⭐ GLUECoS - A benchmark for code-switched NLP
- ⭐ IndicGLUE - Natural Language Understanding Benchmark for Indic Languages
- ⭐ LinCE - Linguistic Code-Switching Evaluation Benchmark
- ⭐ Russian SuperGlue - Russian SuperGlue Benchmark
- ⭐ BLURB - Biomedical Language Understanding and Reasoning Benchmark
- ⭐ BLUE - Biomedical Language Understanding Evaluation benchmark
- ⭐ LexGLUE - A Benchmark Dataset for Legal Language Understanding in English
- ⭐ Long-Range Arena - Long Range Arena for Benchmarking Efficient Transformers (Pre-print) [GitHub, 716 stars]
- ⭐ SUPERB - Speech processing Universal PERformance Benchmark
- ⭐ CodeXGLUE - A benchmark dataset for code intelligence
- ⭐ CrossNER - CrossNER: Evaluating Cross-Domain Named Entity Recognition
- ⭐ MultiNLI - Multi-Genre Natural Language Inference corpus
- ⭐ iSarcasm: A Dataset of Intended Sarcasm - iSarcasm is a dataset of tweets, each labelled as either sarcastic or non_sarcastic
🔙 Back to the Table of Contents
- 📙 A Recipe for Training Neural Networks by Andrej Karpathy [Keywords: research, training, 2019]
- 📙 Recent Advances in NLP via Large Pre-Trained Language Models: A Survey [Paper, November 2021]
- ⭐ Pre-trained ELMo Representations for Many Languages [GitHub, 1458 stars]
- ⭐ sense2vec - Contextually-keyed word vectors [GitHub, 1617 stars]
- ⭐ wikipedia2vec [GitHub, 935 stars]
- ⭐ StarSpace [GitHub, 3938 stars]
- ⭐ fastText [GitHub, 25871 stars]
- 📙 Language Models and Contextualised Word Embeddings by David S. Batista [Blog, 2018]
- 📙 An Essential Guide to Pretrained Word Embeddings for NLP Practitioners by AnalyticsVidhya [Blog, 2020]
- 📙 Polyglot Word Embeddings Discover Language Clusters [Blog, 2020]
- 📙 The Illustrated Word2vec by Jay Alammar [Blog, 2019]
- ⭐ vecmap - VecMap (cross-lingual word embedding mappings) [GitHub, 644 stars]
- ⭐ sentence-transformers - Multilingual Sentence & Image Embeddings with BERT [GitHub, 14981 stars]
- ⭐ bpemb - Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) [GitHub, 1179 stars]
- ⭐ subword-nmt - Unsupervised Word Segmentation for Neural Machine Translation and Text Generation [GitHub, 2185 stars]
- ⭐ python-bpe - Byte Pair Encoding for Python [GitHub, 223 stars]
- 📙 The Transformer Family by Lilian Weng [Blog, 2020]
- 📙 Playing the lottery with rewards and multiple languages - about the effect of random initialization [ICLR 2020 Paper]
- 📙 Attention? Attention! by Lilian Weng [Blog, 2018]
- 📙 the transformer … “explained”? [Blog, 2019]
- 🎥 Attention is all you need; Attentional Neural Network Models by Łukasz Kaiser [Talk, 2017]
- 📙 Attention Is Off By One [July, 2023]
- 🎥 Understanding and Applying Self-Attention for NLP [Talk, 2018]
- 📙 The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [Paper, April 2021]
- 📙 Pre-Trained Models: Past, Present and Future [Paper, June 2021]
- 📙 A Survey of Transformers [Paper, June 2021]
- 📙 The Annotated Transformer by Harvard NLP [Blog, 2018]
- 📙 The Illustrated Transformer by Jay Alammar [Blog, 2018]
- 📙 Illustrated Guide to Transformers by Hong Jing [Blog, 2020]
- 📙 Sequential Transformer with Adaptive Attention Span by Facebook [Blog, 2019]
- 📙 Evolution of Representations in the Transformer by Lena Voita [Blog, 2019]
- 📙 Reformer: The Efficient Transformer [Blog, 2020]
- 📙 Longformer - The Long-Document Transformer by Viktor Karlsson [Blog, 2020]
- 📙 TRANSFORMERS FROM SCRATCH [Blog, 2019]
- 📙 Transformers in Natural Language Processing - A Brief Survey by George Ho [Blog, May 2020]
- ⭐ Lite Transformer - Lite Transformer with Long-Short Range Attention [GitHub, 596 stars]
- 📙 Transformers from Scratch [Blog, Oct 2021]
- 📙 A Visual Guide to Using BERT for the First Time by Jay Alammar [Blog, 2019]
- 📙 The Dark Secrets of BERT by Anna Rogers [Blog, 2020]
- 📙 Understanding searches better than ever before [Blog, 2019]
- 📙 Demystifying BERT: A Comprehensive Guide to the Groundbreaking NLP Framework [Blog, 2019]
- ⭐ SemBERT - Semantics-aware BERT for Language Understanding [GitHub, 286 stars]
- ⭐ BERTweet - BERTweet: A pre-trained language model for English Tweets [GitHub, 574 stars]
- ⭐ Optimal Subarchitecture Extraction for BERT [GitHub, 470 stars]
- ⭐ CharacterBERT: Reconciling ELMo and BERT [GitHub, 195 stars]
- 📙 When BERT Plays The Lottery, All Tickets Are Winning [Blog, Dec 2020]
- ⭐ BERT-related Papers - a list of BERT-related papers [GitHub, 2032 stars]
- 📙 T5 Understanding Transformer-Based Self-Supervised Architectures [Blog, August 2020]
- 📙 T5: the Text-To-Text Transfer Transformer [Blog, 2020]
- ⭐ multilingual-t5 - Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model [GitHub, 1245 stars]
- 📙 Big Bird: Transformers for Longer Sequences original paper by Google Research [Paper, July 2020]
- 🎥 Reformer: The Efficient Transformer - [Paper, February 2020] [Video, October 2020]
- 🎥 Longformer: The Long-Document Transformer - [Paper, April 2020] [Video, April 2020]
- 🎥 Linformer: Self-Attention with Linear Complexity - [Paper, June 2020] [Video, June 2020]
- 🎥 Rethinking Attention with Performers - [Paper, September 2020] [Video, September 2020]
- ⭐ performer-pytorch - An implementation of Performer, a linear attention-based transformer, in Pytorch [GitHub, 1084 stars]
- 📙 Switch Transformers: Scaling to Trillion Parameter Models original paper by Google Research [Paper, January 2021]
- 📙 The Illustrated GPT-2 by Jay Alammar [Blog, 2019]
- 📙 The Annotated GPT-2 by Aman Arora
- 📙 OpenAI's GPT-2: the model, the hype, and the controversy by Ryan Lowe [Blog, 2019]
- 📙 How to generate text by Patrick von Platen [Blog, 2020]
- 📙 Zero Shot Learning for Text Classification by Amit Chaudhary [Blog, 2020]
- 📙 GPT-3 A Brief Summary by Leo Gao [Blog, 2020]
- 📙 GPT-3, a Giant Step for Deep Learning And NLP by Yoel Zeldes [Blog, June 2020]
- 📙 GPT-3 Language Model: A Technical Overview by Chuan Li [Blog, June 2020]
- 📙 Is it possible for language models to achieve language understanding? by Christopher Potts
- ⭐ Awesome GPT-3 - list of all resources related to GPT-3 [GitHub, 4589 stars]
- 🗃️ GPT-3 Projects - a map of all GPT-3 start-ups and commercial projects
- 🗃️ GPT-3 Demo Showcase - GPT-3 Demo Showcase, 180+ Apps, Examples, & Resources
- 💱 OpenAI API - API Demo to use OpenAI GPT for commercial applications
- 📙 GPT-Neo - in-progress GPT-3 open source replication HuggingFace Hub
- ⭐ GPT-J - A 6 billion parameter, autoregressive text generation model trained on The Pile
- 📙 Effectively using GPT-J with few-shot learning [Blog, July 2021]
- 📙 What is Two-Stream Self-Attention in XLNet by Xu LIANG [Blog, 2019]
- 📙 Visual Paper Summary: ALBERT (A Lite BERT) by Amit Chaudhary [Blog, 2020]
- 📙 Turing NLG by Microsoft
- 📙 Multi-Label Text Classification with XLNet by Josh Xin Jie Lee [Blog, 2019]
- ⭐ ELECTRA [GitHub, 2326 stars]
- ⭐ Performer - implementation of Performer, a linear attention-based transformer, in Pytorch [GitHub, 1084 stars]
- 📙 Distilling knowledge from Neural Networks to build smaller and faster models by FloydHub [Blog, 2019]
- 📙 Compression of Deep Learning Models for Text: A Survey [Paper, April 2021]
- ⭐ Bert-squeeze - code to reduce the size of Transformer-based models or decrease their latency at inference time [GitHub, 79 stars]
- ⭐ XtremeDistil - XtremeDistilTransformers for Distilling Massive Multilingual Neural Networks [GitHub, 153 stars]
- 📙 PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization by Google AI [Blog, June 2020]
- ⭐ CTRLsum - CTRLsum: Towards Generic Controllable Text Summarization [GitHub, 146 stars]
- ⭐ XL-Sum - XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages [GitHub, 252 stars]
- ⭐ SummerTime - an open-source text summarization toolkit for non-experts [GitHub, 265 stars]
- ⭐ PRIMER - PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization [GitHub, 151 stars]
- ⭐ summarus - Models for automatic abstractive summarization [GitHub, 170 stars]
- 📙 Fusing Knowledge into Language Model [Presentation, Oct 2021]
Note Section keywords: best practices, MLOps
🔙 Back to the Table of Contents
- 🎥 In Search of Best Practices for NLP Projects [Slides, Dec. 2020]
- 🎥 EMNLP 2020: High Performance Natural Language Processing by Google Research [Recording, Nov. 2020]
- 📙 Practical Natural Language Processing - A Comprehensive Guide to Building Real-World NLP Systems [Book, June 2020]
- 📙 How to Structure and Manage NLP Projects [Blog, May 2021]
- 📙 Applied NLP Thinking - Applied NLP Thinking: How to Translate Problems into Solutions [Blog, June 2021]
- 🎥 Introduction to NLP for Industry Use - DataTalksClub presentation on Introduction to NLP for Industry Use [Recording, December 2021]
- 📙 Measuring Embedding Drift - Best practices for monitoring drift of NLP models [Blog, December 2022]
MLOps, especially when applied to NLP, is a set of best practices around automating various parts of the workflow when building and deploying NLP pipelines.
In general, MLOps for NLP includes having the following processes in place:
- Data Versioning - make sure your training, annotation and other types of data are versioned and tracked
- Experiment Tracking - make sure that all of your experiments are automatically tracked and saved where they can be easily replicated or retraced
- Model Registry - make sure any neural models you train are versioned and tracked and it is easy to roll back to any of them
- Automated Testing and Behavioral Testing - besides regular unit and integration tests, you want to have behavioral tests that check for bias or potential adversarial attacks
- Model Deployment and Serving - automate model deployment, ideally also with zero-downtime deploys like Blue/Green, Canary deploys etc.
- Data and Model Observability - track data drift, model accuracy drift etc.
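
The first two processes in the list above, data versioning and experiment tracking, can be illustrated with a minimal standard-library sketch. This is not how DVC, mlflow or any other tool listed in this section works internally; the function names and the JSON-lines record layout are hypothetical and chosen only for illustration. The idea is to fingerprint the training data with a content hash and to store that hash next to the parameters and metrics of every run, so a result can later be traced back to the exact data it was produced from.

```python
import hashlib
import json
import time


def dataset_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_experiment(log_path, params, metrics, data_path):
    """Append one run record (hypothetical schema) as a JSON line."""
    record = {
        "timestamp": time.time(),
        "data_sha256": dataset_fingerprint(data_path),
        "params": params,
        "metrics": metrics,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_experiments(log_path):
    """Read all logged run records back, oldest first."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Two runs whose records carry the same `data_sha256` were trained on byte-identical data; a changed hash signals that the dataset was modified between runs. Dedicated tools add remote storage, diffing and a UI on top of this basic idea.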
Additionally, there are two more components that are not as prevalent for NLP and are mostly used for Computer Vision and other sub-fields of AI:
- Feature Store - centralized storage of all features developed for ML models that can be easily reused by any other ML project
- Metadata Management - storage for all information related to the usage of ML models, mainly for reproducing behavior of deployed ML models, artifact tracking etc.
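
Of the processes listed in this section, data drift tracking is one of the easiest to sketch in a few lines. The example below assumes that text inputs have already been turned into embedding vectors, and compares the centroid of a reference batch with the centroid of a production batch using cosine distance. This is one simple heuristic among many, not the method used by any specific tool listed here; the function names and the default threshold of 0.1 are arbitrary assumptions that would need tuning on real data.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def embedding_drift(reference, production):
    """Cosine distance between the centroids of two embedding batches."""
    return cosine_distance(centroid(reference), centroid(production))


def has_drifted(reference, production, threshold=0.1):
    """Flag drift when the centroid distance exceeds the (assumed) threshold."""
    return embedding_drift(reference, production) > threshold
```

In practice the reference batch would come from the training or validation data and the production batch from a recent window of live traffic, with the check run on a schedule and an alert raised when `has_drifted` returns `True`.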
- ⭐ awesome-mlops [GitHub, 12526 stars]
- ⭐ best-of-ml-python [GitHub, 16309 stars]
- 🗃️ MLOps.Toys - a curated list of MLOps projects
- 📙 Machine Learning Operations (MLOps): Overview, Definition, and Architecture [Paper, May 2022]
- 📙 Requirements and Reference Architecture for MLOps: Insights from Industry [Paper, Oct 2022]
- 📙 MLOps: What It Is, Why it Matters, and How To Implement It by Neptune AI [Blog, July 2021]
- 📙 Best MLOps Tools You Need to Know as a Data Scientist by Neptune AI [Blog, July 2021]
- 📙 State of MLOps 2021 by Valohai [Blog, August 2021]
- 📙 The MLOps Stack by Valohai [Blog, October 2020]
- 📙 Data Version Control for Machine Learning Applications by Megagon AI [Blog, July 2021]
- 📙 The Rapid Evolution of the Canonical Stack for Machine Learning [Blog, July 2021]
- 📙 MLOps: Comprehensive Beginner's Guide [Blog, March 2021]
- 📙 What I've learned about MLOps from speaking with 100+ ML practitioners [Blog, May 2021]
- 📙 DataRobot Challenger Models - MLOps Champion/Challenger Models
- 📙 State of MLOps Blog by Dr. Ori Cohen
- 📙 MLOps Ecosystem Overview [Blog, 2021]
- 📙 MLOps course by Made With ML
- 📙 GitHub MLOps - collection of resources on how to facilitate Machine Learning Ops with GitHub
- 📙 ML Observability Fundamentals Course - learn how to monitor and root-cause issues with production NLP models
- The MLOps Community - blogs, slack group, newsletter and more all about MLOps
- ⭐ DVC - Data Version Control (DVC) tracks ML models and data sets [Free and Open Source] Link to GitHub
- 💱 Weights & Biases - tools for experiment tracking and dataset versioning [Paid Service]
- 💱 Pachyderm - version control for data with the tools to build scalable end-to-end ML/AI pipelines [Paid Service with Free Tier]
- ⭐ mlflow - open source platform for the machine learning lifecycle [Free and Open Source] Link to GitHub
- 💱 Weights & Biases - tools for experiment tracking and dataset versioning [Paid Service]
- 💱 Neptune AI - experiment tracking and model registry built for research and production teams [Paid Service]
- 💱 Comet ML - enables data scientists and teams to track, compare, explain and optimize experiments and models [Paid Service]
- 💱 SigOpt - automate training & tuning, visualize & compare runs [Paid Service]
- ⭐ Optuna - hyperparameter optimization framework [GitHub, 10650 stars]
- ⭐ Clear ML - experiment, orchestrate, deploy, and build data stores, all in one place [Free and Open Source] Link to GitHub
- ⭐ Metaflow - human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects [GitHub, 8093 stars]
- ⭐ DVC - Data Version Control (DVC) tracks ML models and data sets [Free and Open Source] Link to GitHub
- ⭐ mlflow - open source platform for the machine learning lifecycle [Free and Open Source] Link to GitHub
- ⭐ ModelDB - open-source system for Machine Learning model versioning, metadata, and experiment management [GitHub, 1696 stars]
- 💱 Neptune AI - experiment tracking and model registry built for research and production teams [Paid Service]
- 💱 Valohai - End-to-end ML pipelines [Paid Service]
- 💱 Pachyderm - version control for data with the tools to build scalable end-to-end ML/AI pipelines [Paid Service with Free Tier]
- 💱 polyaxon - reproduce, automate, and scale your data science workflows with production-grade MLOps tools [Paid Service]
- 💱 Comet ML - enables data scientists and teams to track, compare, explain and optimize experiments and models [Paid Service]
- ⭐ CheckList - Beyond Accuracy: Behavioral Testing of NLP models [GitHub, 2003 stars]
- ⭐ TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub, 2922 stars]
- ⭐ WildNLP - Corrupt an input text to test NLP models' robustness [GitHub, 76 stars]
- ⭐ Great Expectations - Write tests for your data [GitHub, 9874 stars]
- ⭐ Deepchecks - Python package for comprehensively validating your machine learning models and data [GitHub, 3582 stars]
- ⭐ mlflow - open source platform for the machine learning lifecycle [Free and Open Source] Link to GitHub
- 💱 Amazon SageMaker [Paid Service]
- 💱 Valohai - End-to-end ML pipelines [Paid Service]
- 💱 NLP Cloud - Production-ready NLP API [Paid Service]
- 💱 Saturn Cloud [Paid Service]
- 💱 SELDON - machine learning deployment for enterprise [Paid Service]
- 💱 Comet ML - enables data scientists and teams to track, compare, explain and optimize experiments and models [Paid Service]
- 💱 polyaxon - reproduce, automate, and scale your data science workflows with production-grade MLOps tools [Paid Service]
- ⭐ TorchServe - flexible and easy to use tool for serving PyTorch models [GitHub, 4174 stars]
- 💱 Kubeflow - The Machine Learning Toolkit for Kubernetes [GitHub, 10600 stars]
- ⭐ KFServing - Serverless Inferencing on Kubernetes [GitHub, 3504 stars]
- 💱 TFX - TensorFlow Extended - end-to-end platform for deploying production ML pipelines [Paid Service]
- 💱 Pachyderm - version control for data with the tools to build scalable end-to-end ML/AI pipelines [Paid Service with Free Tier]
- 💱 Cortex - containers as a service on AWS [Paid Service]
- 💱 Azure Machine Learning - end-to-end machine learning lifecycle [Paid Service]
- ⭐ End2End Serverless Transformers On AWS Lambda [GitHub, 121 stars]
- ⭐ NLP-Service - sample demo of NLP as a service platform built using FastAPI and Hugging Face [GitHub, 13 stars]
- 💱 Dagster - data orchestrator for machine learning [Free and Open Source]
- 💱 Verta - AI and machine learning deployment and operations [Paid Service]
- ⭐ Metaflow - human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects [GitHub, 8093 stars]
- ⭐ flyte - workflow automation platform for complex, mission-critical data and ML processes at scale [GitHub, 5525 stars]
- ⭐ MLRun - Machine Learning automation and tracking [GitHub, 1425 stars]
- 💱 DataRobot MLOps - DataRobot MLOps provides a center of excellence for your production AI
- ⭐ imodels - package for concise, transparent, and accurate predictive modeling [GitHub, 1375 stars]
- ⭐ Cockpit - A Practical Debugging Tool for Training Deep Neural Networks [GitHub, 474 stars]
- ⭐ WeightWatcher - WeightWatcher tool for predicting the accuracy of Deep Neural Networks [GitHub, 1453 stars]
- ⭐ Arize AI - embedding drift monitoring for NLP models
- ⭐ Arize-Phoenix - ML observability for LLMs, vision, language, and tabular models
- ⭐ whylogs - open source standard for data and ML logging [GitHub, 2636 stars]
- ⭐ Rubrix - open-source tool for exploring and iterating on data for artificial intelligence projects [GitHub, 3843 stars]
- ⭐ MLRun - Machine Learning automation and tracking [GitHub, 1425 stars]
- 💱 DataRobot MLOps - DataRobot MLOps provides a center of excellence for your production AI
- 💱 Cortex - containers as a service on AWS [Paid Service]
- 💱 Algorithmia - minimize risk with advanced reporting and enterprise-grade security and governance across all data, models, and infrastructure [Paid Service]
- 💱 Dataiku - dataiku is for teams who want to deliver advanced analytics using the latest techniques at big data scale [Paid Service]
- ⭐ Evidently AI - tools to analyze and monitor machine learning models [Free and Open Source] Link to GitHub
- 💱 Fiddler - ML Model Performance Management Tool [Paid Service]
- 💱 Hydrosphere - open-source platform for managing ML models [Paid Service]
- 💱 Verta - AI and machine learning deployment and operations [Paid Service]
- 💱 Domino Model Ops - Deploy and Manage Models to Drive Business Impact [Paid Service]
- 💱 Datafold - data quality through diffs, profiling, and anomaly detection [Paid Service]
- 💱 acceldata - improve reliability, accelerate scale, and reduce costs across all data pipelines [Paid Service]
- 💱 Bigeye - monitoring and alerting to your datasets in minutes [Paid Service]
- 💱 datakin - end-to-end, real-time data lineage solution [Paid Service]
- 💱 Monte Carlo - data integrity, drifts, schema, lineage [Paid Service]
- 💱 SODA - data monitoring, testing and validation [Paid Service]
- 💱 Tecton - enterprise feature store for machine learning [Paid Service]
- ⭐ FEAST - open source feature store for machine learning Website [GitHub, 5525 stars]
- 💱 Hopsworks Feature Store - data management system for managing machine learning features [Paid Service]
- ⭐ ML Metadata - a library for recording and retrieving metadata associated with ML developer and data scientist workflows [GitHub, 617 stars]
- 💱 Neptune AI - experiment tracking and model registry built for research and production teams [Paid Service]
- ⭐ Metaflow - human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects [GitHub, 8093 stars]
- ⭐ kedro - Python framework for creating reproducible, maintainable and modular data science code [GitHub, 9883 stars]
- ⭐ Seldon Core - MLOps framework to package, deploy, monitor and manage thousands of production machine learning models [GitHub, 4353 stars]
- ⭐ ZenML - MLOps framework to create reproducible ML pipelines for production machine learning [GitHub, 3972 stars]
- 💱 Google Vertex AI - build, deploy, and scale ML models faster, with pre-trained and custom tooling within a unified AI platform [Paid Service]
- ⭐ Diffgram - Complete training data platform for machine learning delivered as a single application [GitHub, 1834 stars]
- 💱 Continual.ai - build, deploy, and operationalize ML models easier and faster with a declarative interface on cloud data warehouses like Snowflake, BigQuery, RedShift, and Databricks. [Paid Service]
🔙 Back to the Table of Contents
- 📙 Why BERT Fails in Commercial Environments by Intel AI [Blog, 2020]
- 📙 Fine Tuning BERT for Text Classification with FARM by Sebastian Guggisberg [Blog, 2020]
- ⭐ Pretrain Transformers Models in PyTorch using Hugging Face Transformers [GitHub, 254 stars]
- 🎥 Practical NLP for the Real World [Presentation, 2019]
- 🎥 From Paper to Product - How we implemented BERT by Christoph Henkelmann [Talk, 2020]
- ⭐ Parallelformers: An Efficient Model Parallelization Toolkit for Deployment [GitHub, 776 stars]
- ⭐ Training BERT with Compute/Time (Academic) Budget [GitHub, 309 stars]
- ⭐ embedding-as-service [GitHub, 204 stars]
- ⭐ Bert-as-service [GitHub, 12399 stars]
- ⭐ NLP Recipes by microsoft [GitHub, 6367 stars]
- ⭐ NLP with Python by susanli2016 [GitHub, 2721 stars]
- ⭐ Basic Utilities for PyTorch NLP by PetrochukM [GitHub, 2210 stars]
- ⭐ Blackstone - A spaCy pipeline and model for NLP on unstructured legal text [GitHub, 636 stars]
- ⭐ Sci spaCy - spaCy pipeline and models for scientific/biomedical documents [GitHub, 1688 stars]
- ⭐ FinBERT: Pre-Trained on SEC Filings for Financial NLP Tasks [GitHub, 197 stars]
- ⭐ LexNLP - Information retrieval and extraction for real, unstructured legal text [GitHub, 692 stars]
- ⭐ NerDL and NerCRF - Tutorial on Named Entity Recognition for Healthcare with SparkNLP
- ⭐ Legal Text Analytics - A list of selected resources dedicated to Legal Text Analytics [GitHub, 613 stars]
- ⭐ BioIE - A curated list of resources relevant to doing Biomedical Information Extraction [GitHub, 338 stars]
Note Section keywords: speech recognition
🔙 Back to the Table of Contents
- ⭐ wav2letter - Automatic Speech Recognition Toolkit [GitHub, 6370 stars]
- ⭐ DeepSpeech - Baidu's DeepSpeech architecture [GitHub, 25166 stars]
- 📙 Acoustic Word Embeddings by Maria Obedkova [Blog, 2020]
- ⭐ kaldi - Kaldi is a toolkit for speech recognition [GitHub, 14177 stars]
- ⭐ awesome-kaldi - resources for using Kaldi [GitHub, 532 stars]
- ⭐ ESPnet - End-to-End Speech Processing Toolkit [GitHub, 8355 stars]
- 📙 HuBERT - Self-supervised representation learning for speech recognition, generation, and compression [Blog, June 2021]
- ⭐ FastSpeech - The Implementation of FastSpeech based on pytorch [GitHub, 857 stars]
- ⭐ TTS - a deep learning toolkit for Text-to-Speech [GitHub, 34356 stars]
- 💱 NotebookLM - Google Gemini powered personal assistant / podcast generator
- ⭐ whisper - Robust Speech Recognition via Large-Scale Weak Supervision, by OpenAI [GitHub, 68884 stars]
- ⭐ vibe - GUI tool to work with whisper, multilingual and cuda support included [GitHub, 931 stars]
- ⭐ VoxPopuli - large-scale multilingual speech corpus for representation learning [GitHub, 507 stars]
Note Section keywords: topic modeling
🔙 Back to the Table of Contents
- 📙 Topic Modelling with PySpark and Spark NLP by Maria Obedkova [Spark, Blog, 2020]
- 📙 A Unique Approach to Short Text Clustering (Algorithmic Theory) by Brittany Bowers [Blog, 2020]
- ⭐ Top2Vec [GitHub, 2924 stars]
- ⭐ Anchored Correlation Explanation Topic Modeling [GitHub, 303 stars]
- ⭐ Topic Modeling in Embedding Spaces [GitHub, 540 stars] Paper
- ⭐ TopicNet - A high-level interface for BigARTM library [GitHub, 140 stars]
- ⭐ BERTopic - Leveraging BERT and a class-based TF-IDF to create easily interpretable topics [GitHub, 6038 stars]
- ⭐ OCTIS - A python package to optimize and evaluate topic models [GitHub, 718 stars]
- ⭐ Contextualized Topic Models [GitHub, 1196 stars]
- ⭐ GSDMM - GSDMM: Short text clustering [GitHub, 353 stars]
Note Section keywords: keyword extraction
🔙 Back to the Table of Contents
- ⭐ PyTextRank - PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension [GitHub, 2132 stars]
- ⭐ textrank - TextRank implementation for Python 3 [GitHub, 1248 stars]
- ⭐ rake-nltk - Rapid Automatic Keyword Extraction algorithm using NLTK [GitHub, 1061 stars]
- ⭐ yake - Single-document unsupervised keyword extraction [GitHub, 1632 stars]
- ⭐ RAKE-tutorial - A python implementation of the Rapid Automatic Keyword Extraction [GitHub, 375 stars]
- ⭐ flashtext - Extract Keywords from sentence or Replace keywords in sentences [GitHub, 5583 stars]
- ⭐ BERT-Keyword-Extractor - Deep Keyphrase Extraction using BERT [GitHub, 254 stars]
- ⭐ keyBERT - Minimal keyword extraction with BERT [GitHub, 3471 stars]
- ⭐ KeyphraseVectorizers - vectorizers that extract keyphrases with part-of-speech patterns [GitHub, 251 stars]
- 📙 Adding a custom tokenizer to spaCy and extracting keywords from Chinese texts by Haowen Jiang [Blog, Feb 2021]
- 📙 How to Extract Relevant Keywords with KeyBERT [Blog, June 2021]
Note Section keywords: ethics, responsible NLP
π Back to the Table of Contents
- Explainability for Natural Language Processing - KDD'2021 Tutorial Slides [Presentation, August 2021]
- β ecco - Tools to visuals and explore NLP language models [GitHub, 1974 stars]
- β NLP Profiler - A simple NLP library allows profiling datasets with text columns [GitHub, 243 stars]
- β transformers-interpret - Model explainability that works seamlessly with transformers [GitHub, 1278 stars]
- β Awesome-explainable-AI - collection of research materials on explainable AI/ML [GitHub, 1400 stars]
- β LAMA - LAMA is a probe for analyzing the factual and commonsense knowledge contained in pretrained language models [GitHub, 1346 stars]
- β Language Interpretability Tool (LIT) [GitHub, 3474 stars]
- β WhatLies - Toolkit to help visualise - what lies in word embeddings [GitHub, 468 stars]
- β Interpret-Text - Interpretability techniques and visualization dashboards for NLP models [GitHub, 413 stars]
- β InterpretML - Fit interpretable models. Explain blackbox machine learning [GitHub, 6238 stars]
- β thermostat - Collection of NLP model explanations and accompanying analysis tools [GitHub, 143 stars]
- β Dodrio - Exploring attention weights in transformer-based models with linguistic knowledge [GitHub, 342 stars]
- β imodels - package for concise, transparent, and accurate predictive modeling [GitHub, 1375 stars]
- π Bias in Natural Language Processing @EMNLP 2020 [Blog, Nov 2020]
- π₯οΈ Machine Learning as a Software Engineering Enterprise - NeurIPS 2020 Keynote [Presentation, Dec 2020]
- ποΈ Ethics in NLP - resources from ACL's Ethics in NLP track
- ποΈ The Institute for Ethical AI & Machine Learning
- π Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models [Paper, Feb 2021]
- β Fairness-in-AI - this package is used to detect and mitigate biases in NLP tasks [GitHub, 77 stars]
- β nlg-bias - dataset + classifier tools to study social perception biases in natural language generation [GitHub, 65 stars]
- ποΈ bias-in-nlp - list of papers related to bias in NLP [GitHub, 9 stars]
- π Privacy Considerations in Large Language Models [Blog, Dec 2020]
- β DeepWordBug - Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers [GitHub, 73 stars]
- β Adversarial-Misspellings - Combating Adversarial Misspellings with Robust Word Recognition [GitHub, 62 stars]
- β HateXplain - BERT for detecting abusive language [GitHub, 187 stars]
Note Section keywords: frameworks
π Back to the Table of Contents
- β spaCy by Explosion AI [GitHub, 29784 stars]
- β flair by Zalando [GitHub, 13855 stars]
- β AllenNLP by AI2 [GitHub, 11740 stars]
- β stanza (formerly Stanford NLP) [GitHub, 7253 stars]
- β spaCy stanza [GitHub, 723 stars]
- β nltk [GitHub, 13489 stars]
- β gensim - framework for topic modeling [GitHub, 15597 stars]
- β pororo - Platform of neural models for natural language processing [GitHub, 1279 stars]
- β NLP Architect - A Deep Learning NLP/NLU library by Intel AI Lab [GitHub, 2936 stars]
- β FARM [GitHub, 1734 stars]
- β gobbli by RTI International [GitHub, 275 stars]
- β headliner - training and deployment of seq2seq models [GitHub, 229 stars]
- β SyferText - A privacy preserving NLP framework [GitHub, 197 stars]
- β DeText - Text Understanding Framework for Ranking and Classification Tasks [GitHub, 1263 stars]
- β TextHero - Text preprocessing, representation and visualization [GitHub, 2882 stars]
- β textblob - TextBlob: Simplified Text Processing [GitHub, 9109 stars]
- β AdaptNLP - A high level framework and library for NLP [GitHub, 407 stars]
- β textacy - NLP, before and after spaCy [GitHub, 2209 stars]
- β texar - Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow [GitHub, 2388 stars]
- β jiant - jiant is an NLP toolkit [GitHub, 1639 stars]
- β WildNLP - Text manipulation library to test NLP models [GitHub, 76 stars]
- β snorkel - Framework to generate training data [GitHub, 5791 stars]
- β NLPAug - Data augmentation for NLP [GitHub, 4419 stars]
- β SentAugment - Data augmentation by retrieving similar sentences from larger datasets [GitHub, 363 stars]
- β faker - Python package that generates fake data for you [GitHub, 17648 stars]
- β textflint - Unified Multilingual Robustness Evaluation Toolkit for NLP [GitHub, 639 stars]
- β Parrot - Practical and feature-rich paraphrasing framework [GitHub, 871 stars]
- β AugLy - data augmentations library for audio, image, text, and video [GitHub, 4950 stars]
- β TextAugment - Python 3 library for augmenting text for natural language processing applications [GitHub, 396 stars]
- β TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub, 2922 stars]
- β CleverHans - adversarial example library for constructing NLP attacks and building defenses [GitHub, 6172 stars]
- β CheckList - Beyond Accuracy: Behavioral Testing of NLP models [GitHub, 2003 stars]
- β transformers by HuggingFace [GitHub, 132974 stars]
- β Adapter Hub and its documentation - Adapter modules for Transformers [GitHub, 2543 stars]
- β haystack - Transformers at scale for question answering & neural search. [GitHub, 16997 stars]
- β DeepPavlov by MIPT [GitHub, 6676 stars]
- β ParlAI by FAIR [GitHub, 10477 stars]
- β rasa - Framework for Conversational Agents [GitHub, 18726 stars]
- β wav2letter - Automatic Speech Recognition Toolkit [GitHub, 6370 stars]
- β ChatterBot - conversational dialog engine for creating chatbots [GitHub, 14039 stars]
- β SpeechBrain - open-source and all-in-one speech toolkit based on PyTorch [GitHub, 8674 stars]
- β dialoguefactory - Generate continuous dialogue data in a simulated textual world [GitHub, 5 stars]
- β MUSE - A library for Multilingual Unsupervised or Supervised word Embeddings [GitHub, 3181 stars]
- β vecmap - A framework to learn cross-lingual word embedding mappings [GitHub, 644 stars]
- β sentence-transformers - Multilingual Sentence & Image Embeddings with BERT [GitHub, 14981 stars]
- β Ekphrasis - text processing tool, geared towards text from social networks [GitHub, 661 stars]
- β DeepPhonemizer - grapheme to phoneme conversion with deep learning [GitHub, 352 stars]
- β LemmInflect - python module for English lemmatization and inflection [GitHub, 259 stars]
- β Inflect - generate plurals, ordinals, indefinite articles [GitHub, 964 stars]
- β simplemma - simple multilingual lemmatizer for Python [GitHub, 964 stars]
- β polyglot - Multi-lingual NLP Framework [GitHub, 2309 stars]
- β trankit - Light-Weight Transformer-based Python Toolkit for Multilingual NLP [GitHub, 730 stars]
- β Spark NLP [GitHub, 3826 stars]
- β Parallelformers: An Efficient Model Parallelization Toolkit for Deployment [GitHub, 776 stars]
- β COMET - A Neural Framework for MT Evaluation [GitHub, 493 stars]
- β marian-nmt - Fast Neural Machine Translation in C++ [GitHub, 1236 stars]
- β argos-translate - Open source neural machine translation in Python [GitHub, 3771 stars]
- β Opus-MT - Open neural machine translation models and web services [GitHub, 605 stars]
- β dl-translate - A deep learning-based translation library built on Huggingface transformers [GitHub, 440 stars]
- β CTranslate2 - CTranslate2 end-to-end machine translation [GitHub, 3300 stars]
- β PolyFuzz - Fuzzy string matching, grouping, and evaluation [GitHub, 736 stars]
- β pyahocorasick - Python module implementing Aho-Corasick algorithm for string matching [GitHub, 937 stars]
- β fuzzywuzzy - Fuzzy String Matching in Python [GitHub, 9220 stars]
- β jellyfish - approximate and phonetic matching of strings [GitHub, 2049 stars]
- β textdistance - Compute distance between sequences [GitHub, 3367 stars]
- β DeepMatcher - Python package for entity and text matching using deep learning [GitHub, 555 stars]
- β RE2 - Simple and Effective Text Matching with Richer Alignment Features [GitHub, 339 stars]
- β Machamp - Machamp: A Generalized Entity Matching Benchmark [GitHub, 17 stars]
- β ConvoKit - Cornell Conversational Analysis Toolkit [GitHub, 543 stars]
- β scrubadub - Clean personally identifiable information from dirty dirty text [GitHub, 394 stars]
- β hashformers - automatically inserting the missing spaces between the words in a hashtag [GitHub, 68 stars]
- β booknlp - a natural language processing pipeline that scales to books and other long documents (in English) [GitHub, 785 stars]
- β bookworm - ingests novels, builds an implicit character network and a deeply analysable graph [GitHub, 76 stars]
- β SemanticFinder - frontend-only live semantic search with transformers.js [GitHub, 224 stars]
- β fugashi - Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis [GitHub, 391 stars]
- β SudachiPy - SudachiPy is a Python version of Sudachi, a Japanese morphological analyzer [GitHub, 390 stars]
- β Konoha - easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code [GitHub, 226 stars]
- β jProcessing - Japanese Natural Language Processing Libraries [GitHub, 148 stars]
- β Ginza - Japanese NLP Library using spaCy as framework based on Universal Dependencies [GitHub, 745 stars]
- β kuromoji - self-contained and very easy to use Japanese morphological analyzer designed for search [GitHub, 953 stars]
- β nagisa - Japanese tokenizer based on recurrent neural networks [GitHub, 382 stars]
- β KyTea - Kyoto Text Analysis Toolkit for word segmentation and pronunciation estimation [GitHub, 201 stars]
- β Jigg - Pipeline framework for easy natural language processing [GitHub, 74 stars]
- β Juman++ - Juman++ (a Morphological Analyzer Toolkit) [GitHub, 376 stars]
- β RakutenMA - morphological analyzer (word segmentor + PoS Tagger) for Chinese and Japanese written purely in JavaScript [GitHub, 473 stars]
- β toiro - a comparison tool of Japanese tokenizers [GitHub, 118 stars]
- β AttaCut - Fast and Reasonably Accurate Word Tokenizer for Thai [GitHub, 79 stars]
- β ThaiLMCut - Word Tokenizer for Thai Language [GitHub, 15 stars]
- β Spacy-pkuseg - The pkuseg toolkit for multi-domain Chinese word segmentation [GitHub, 53 stars]
- β recruitment-dataset - Recruitment Dataset Preprocessing and Recommender System (Ukrainian, English)
- β textblob-de - TextBlob: Simplified Text Processing for German [GitHub, 103 stars]
- β Kashgari - Transfer Learning with focus on Chinese [GitHub, 2389 stars]
- β Underthesea - Vietnamese NLP Toolkit [GitHub, 1383 stars]
- β PTT5 - Pretraining and validating the T5 model on Brazilian Portuguese data [GitHub, 84 stars]
- β Small-Text - Active Learning for Text Classification in Python [GitHub, 549 stars]
- β Doccano - open source annotation tool for machine learning practitioners [GitHub, 9460 stars]
- β Adala - Autonomous DAta (Labeling) Agent framework [GitHub, 927 stars]
- β EDA - Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks [GitHub, 1585 stars]
- π± Prodigy - annotation tool powered by active learning [Paid Service]
Note Section keywords: learn NLP
π Back to the Table of Contents
- π Learn NLP the practical way [Blog, Nov 2019]
- π Learn NLP the Stanford way (+Part 2) [Blog, Nov 2020]
- π Choosing the right course for a Practical NLP Engineer
- π 12 Best Natural Language Processing Courses & Tutorials to Learn Online
- β Treasure of Transformers - Natural Language processing papers, videos, blogs, official repos along with colab Notebooks [GitHub, 912 stars]
- π₯οΈ Rasa Algorithm Whiteboard - YouTube series by Rasa explaining various Data Science and NLP Algorithms
- π₯οΈ ExplosionAI Videos - YouTube series by ExplosionAI teaching you how to use spaCy and apply it to NLP
- π₯οΈ CS25: Transformers United Stanford - Fall 2021 [Course, Fall 2021]
- π NLP Course | For You - Great and interactive course on NLP
- π Advanced NLP with spaCy - how to use spaCy to build advanced natural language understanding systems
- π Transformer models for NLP by HuggingFace
- π₯οΈ Stanford NLP Seminar - slides from the Stanford NLP course
- π Natural Language Processing with Transformers - [Book, February 2022]
- π Applied Natural Language Processing in the Enterprise - [Book, May 2021]
- π Practical Natural Language Processing - [Book, June 2020]
- π Dive into Deep Learning - An interactive deep learning book with code, math, and discussions
- π Natural Language Processing and Computational Linguistics - Speech, Morphology and Syntax (Cognitive Science)
- π Top NLP Books to Read 2020 - Blog post by Raymond Cheng [Blog, Sep 2020]
- β nlp-tutorial - A list of NLP (Natural Language Processing) tutorials built on PyTorch [GitHub, 1366 stars]
- β nlp-tutorial - Natural Language Processing Tutorial for Deep Learning Researchers [GitHub, 14110 stars]
- β Hands-On NLTK Tutorial [GitHub, 540 stars]
- β Modern Practical Natural Language Processing [GitHub, 266 stars]
- β Transformers-Tutorials - demos with the Transformers library by HuggingFace [GitHub, 9176 stars]
- ποΈ CalmCode Tutorials - Set of Python Data Science Tutorials
- r/LanguageTechnology - NLP Reddit forum
π Back to the Table of Contents
- β tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production [GitHub, 8940 stars]
- β SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation [GitHub, 10141 stars]
- β SoMaJo - A tokenizer and sentence splitter for German and English web and social media texts [GitHub, 135 stars]
- β WildNLP - Text manipulation library to test NLP models [GitHub, 76 stars]
- β NLPAug - Data augmentation for NLP [GitHub, 4419 stars]
- β SentAugment - Data augmentation by retrieving similar sentences from larger datasets [GitHub, 363 stars]
- β TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub, 2922 stars]
- β skweak - software toolkit for weak supervision applied to NLP tasks [GitHub, 917 stars]
- β NL-Augmenter - Collaborative Repository of Natural Language Transformations [GitHub, 773 stars]
- β EDA - Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks [GitHub, 1585 stars]
- β snorkel - Framework to generate training data [GitHub, 5791 stars]
- β dialoguefactory - Generate continuous dialogue data in a simulated textual world [GitHub, 5 stars]
- β A Survey of Data Augmentation Approaches for NLP [Paper, May 2021] GitHub Link
- π A Visual Survey of Data Augmentation in NLP [Blog, 2020]
- π Weak Supervision: A New Programming Paradigm for Machine Learning [Blog, March 2019]
- β Datasets for Entity Recognition [GitHub, 1497 stars]
- β Datasets to train supervised classifiers for Named-Entity Recognition [GitHub, 338 stars]
- β Bootleg - Self-Supervision for Named Entity Disambiguation at the Tail [GitHub, 212 stars]
- β Few-NERD - Large-scale, fine-grained manually annotated named entity recognition dataset [GitHub, 385 stars]
- β tacred-relation - TACRED: position-aware attention model for relation extraction [GitHub, 355 stars]
- β tacrev - TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [GitHub, 69 stars]
- β tac-self-attention - Relation extraction with position-aware self-attention [GitHub, 64 stars]
- β Re-TACRED - Re-TACRED: Addressing Shortcomings of the TACRED Dataset [GitHub, 51 stars]
- β NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks by HuggingFace [GitHub, 2850 stars]
- β coref - BERT and SpanBERT for Coreference Resolution [GitHub, 443 stars]
- β Reading list for Awesome Sentiment Analysis papers by declare-lab [GitHub, 517 stars]
- β Awesome Sentiment Analysis by xiamx [GitHub, 913 stars]
- β Neural Adaptation in Natural Language Processing - curated list [GitHub, 261 stars]
- β CMU LTI Low Resource NLP Bootcamp 2020 - CMU Language Technologies Institute low resource NLP bootcamp 2020 [GitHub, 597 stars]
- β Gramformer - framework for detecting, highlighting and correcting grammatical errors [GitHub, 1502 stars]
- β NeuSpell - A Neural Spelling Correction Toolkit [GitHub, 665 stars]
- β SymSpellPy - Python port of SymSpell [GitHub, 796 stars]
- π Speller100 by Microsoft [Blog, Feb 2021]
- β JamSpell - spell checking library - accurate, fast, multi-language [GitHub, 608 stars]
- β pycorrector - spell correction for Chinese [GitHub, 5517 stars]
- β contractions - Fixes contractions such as `you're` to `you are` [GitHub, 308 stars]
- π Fine Tuning T5 for Grammar Correction by Sachin Abeywardana [Blog, Nov 2022]
- β Styleformer - Neural Language Style Transfer framework [GitHub, 475 stars]
- β StylePTB - A Compositional Benchmark for Fine-grained Controllable Text Style Transfer [GitHub, 60 stars]
- β pyahocorasick - Python module implementing Aho-Corasick algorithm for string matching [GitHub, 937 stars]
- β LDNOOBW - List of Dirty, Naughty, Obscene, and Otherwise Bad Words [GitHub, 2899 stars]
- β Subreddit Analyzer - comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit [GitHub, 489 stars]
- β SkillNER - rule based NLP module to extract job skills from text [GitHub, 153 stars]
- β nlp-gym - NLPGym - A toolkit to develop RL agents to solve NLP tasks [GitHub, 192 stars]
- β AutoNLP - Faster and easier training and deployments of SOTA NLP models [GitHub, 3836 stars]
- β TPOT - Python Automated Machine Learning tool [GitHub, 9691 stars]
- β Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch [GitHub, 2359 stars]
- β HungaBunga - Brute-Force all sklearn models with all parameters using .fit .predict [GitHub, 710 stars]
- π± AutoML Natural Language - Google's paid AutoML NLP service
- β Optuna - hyperparameter optimization framework [GitHub, 10650 stars]
- β FLAML - fast and lightweight AutoML library [GitHub, 3871 stars]
- β Gradsflow - open-source AutoML & PyTorch Model Training Library [GitHub, 306 stars]
- π₯οΈ A framework for designing document processing solutions [Blog, June 2022]
- β keytotext - a model which will take keywords as inputs and generate sentences as outputs [GitHub, 445 stars]
- π Controllable Neural Text Generation [Blog, Jan 2021]
- β BARTScore - Evaluating Generated Text as Text Generation [GitHub, 317 stars]
- β TitleStylist - Learning to Generate Headlines with Controlled Styles [GitHub, 76 stars]
- π A Systematic Review of Reproducibility Research in Natural Language Processing [Paper, March 2021]
License CC0
- All linked resources belong to original authors
- Akropolis by parkjisun from the Noun Project
- Book of Ester by Gilad Sotil from the Noun Project
- quill by Juan Pablo Bravo from the Noun Project
- acting by Flatart from the Noun Project
- olympic by supalerk laipawat from the Noun Project
- aristocracy by Eucalyp from the Noun Project
- Horn by Eucalyp from the Noun Project
- temple by Eucalyp from the Noun Project
- constellation by Eucalyp from the Noun Project
- ancient greek round pattern by Olena Panasovska from the Noun Project
- Harp by Vectors Point from the Noun Project
- Atlas by parkjisun from the Noun Project
- Parthenon by Eucalyp from the Noun Project
- papyrus by IconMark from the Noun Project
- papyrus by Smalllike from the Noun Project
- pegasus by Saeful Muslim from the Noun Project