Stars
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
ClickHouse® is a real-time analytics database management system
A library for efficient similarity search and clustering of dense vectors.
eBPF Observability - Distributed Tracing and Profiling
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
LlamaIndex is the leading framework for building LLM-powered agents over your data.
OpenSearch Kubernetes Operator
A flexible distributed key-value database that is optimized for caching and other realtime workloads.
Performance analysis tools based on Linux perf_events (aka perf) and ftrace
JDK main-line development https://openjdk.org/projects/jdk
Easy-to-use, open-source, fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for the E in ELK soon
Fast and Lightweight Observability Data Collector
This tool extracts word vectors from a Lucene index.
Apache Pulsar - distributed pub-sub messaging system
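Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name corpus, Chinese abbreviation corpus, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence/terrorism word list, Traditional/Simplified Chinese conversion, simulating Chinese pronunciation with English spellings, Wang Feng lyrics generator, occupation-name lexicon, synonym lexicon, antonym lexicon, negation-word lexicon, car brand lexicon, car parts lexicon, segmentation of run-together English text, various Chinese word vectors, comprehensive list of company names, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place-name lexicon, …
HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr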
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
🔎 Open source distributed and RESTful search engine.
🏡 Open source home automation that puts local control and privacy first.
A Cloud Native traffic orchestration system