Starred repositories
Supercharge Your LLM Application Evaluations 🚀
NDVI forecasting using weather data
A Tutorial for Setting Up a Python Development Environment with VS Code and Docker
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
Probabilistic time series modeling in Python
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
This repo contains the data preparation, tokenization, training and inference code for BLOOMChat. BLOOMChat is a 176-billion-parameter multilingual chat model based on BLOOM.
Deploy high-performance AI models and inference pipelines on FastAPI with built-in batching, streaming and more.
Devon: An open-source pair programmer
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Tools for merging pretrained large language models.
Datasets, Transforms and Models specific to Computer Vision
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
A Framework of Small-scale Large Multimodal Models
A Survey on Data Selection for Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
A tool for extracting data from and interacting with Lean programmatically.
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
A programming framework for agentic AI 🤖 PyPI: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour
Awesome List of Tamil NLP & AI Resources