Releases: neuml/txtai
v8.5.0
This release migrates from Transformers Agents to smolagents, adds Model Context Protocol (MCP) support and now requires Python 3.10+
See below for full details on the new features, improvements and bug fixes.
New Features
- Migrate to smolagents (#890)
- Add Model Context Protocol (MCP) Support (#892)
- Add support for MCP servers to Agent Framework (#898)
- Require Python 3.10 (#897)
Improvements
- Lazy load list of translation models (#896)
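With MCP support in place (#892), a generic MCP client can connect to a running txtai API instance. Below is a minimal sketch using the official `mcp` Python SDK; the server URL, the `/mcp` route and the SSE transport are assumptions about the deployment, not documented defaults.

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Assumption: a txtai API instance is running locally and exposes its
    # MCP route at /mcp over SSE; adjust the URL to match your deployment.
    async with sse_client("http://localhost:8000/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List the tools advertised by the MCP server
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```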
v8.4.0
This release adds support for vision LLMs, graph vector search, embeddings checkpoints, observability and an OpenAI-compatible API
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for vision models to HF LLM pipeline (#884)
- Add similar query clause to graph queries (#875)
- Feature Request: Embeddings index checkpointing (#695)
- Feature Request: Enhance observability and tracing capabilities (#869)
- Add OpenAI API compatible endpoint to API (#883)
- Add example notebook showing how to use OpenAI compatible API (#887)
- Add texttospeech pipeline to API (#552)
- Add upload endpoint to API (#659)
Improvements
- Add encoding parameter to TextToSpeech pipeline (#885)
- Add support for input streams to Transcription pipeline (#886)
Bug Fixes
- Fix bug with latest version of Transformers and model registry (#878)
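The OpenAI-compatible endpoint (#883) is designed to work with standard OpenAI clients. Here is a rough sketch using the official `openai` package; the base URL, route and model name are placeholders for a locally running txtai API instance, see the example notebook (#887) for the documented setup.

```python
from openai import OpenAI

# Assumption: a txtai API instance is running locally and mounts its
# OpenAI-compatible routes under /v1; the api_key is unused but required
# by the client, and the model name is a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="txtai")

response = client.chat.completions.create(
    model="txtai",
    messages=[{"role": "user", "content": "Tell me about txtai"}],
)

print(response.choices[0].message.content)
```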
v8.3.1
v8.3.0
This release adds support for GLiNER, Chonkie, Kokoro TTS and Static Vectors
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for GLiNER models (#862) Thank you @urchade
- Add semantic chunking pipeline (#812) Thank you @bhavnicksm
- Add Kokoro TTS support to TextToSpeech pipeline (#854) Thank you @hexgrad
- Add staticvectors inference (#859)
- Add example notebook for Entity Extraction with GLiNER (#873)
- Add example notebook for RAG Chunking (#874)
- Add notebook that analyzes NeuML LinkedIn posts (#851)
Improvements
- Add new methods for audio signal processing (#855)
- Remove fasttext dependency (#857)
- Remove WordVectors.build method (#858)
- Detect graph queries and route to graph index (#865)
- Replace python-louvain library with networkx equivalent (#867)
- Word vector model improvements (#868)
- Improve parsing of table text in HTML to Markdown pipeline (#872)
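As a quick illustration of GLiNER support (#862), the sketch below runs zero-shot entity extraction through the Entity pipeline. The model id and the `labels` argument are assumptions based on the release notes; the Entity Extraction with GLiNER notebook (#873) has the full walkthrough.

```python
from txtai.pipeline import Entity

# Assumption: the Entity pipeline detects GLiNER checkpoints by path;
# the model id below is one example GLiNER model.
entity = Entity("urchade/gliner_medium-v2.1")

# Returns (text, label, score) tuples; labels here act as the
# zero-shot label set (assumed behavior for GLiNER models)
results = entity(
    "NeuML maintains txtai, an all-in-one embeddings database",
    labels=["organization", "product"],
)

for text, label, score in results:
    print(text, label, score)
```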
v8.2.0
This release simplifies LLM chat messages, adds attribute filtering to Graph RAG and enables multi-CPU/GPU vector encoding
See below for full details on the new features, improvements and bug fixes.
New Features
- Add defaultrole to LLM pipeline (#841)
- Feature Request: Graph RAG - Add extra attributes (#684)
- Support graph=True in embeddings config (#848)
- Support pulling attribute data in graph.scan (#849)
- Encoding using multiple GPUs (#541)
- Add vectors argument to Model2Vec vectors (#846)
- Enhanced Docs: LLM Embedding Examples (#843, #844) Thank you @igorlima!
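A hedged sketch of the new `defaultrole` parameter (#841) follows. The exact behavior is inferred from the release summary and the model id is a placeholder.

```python
from txtai import LLM

# Assumption: with defaultrole="user", a plain string prompt is wrapped
# as a single user chat message before generation.
llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", defaultrole="user")

print(llm("Write a haiku about vector search"))
```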
v8.1.0
This release adds Docling integration, Embeddings context managers and significant database component enhancements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add text extraction with Docling (#814)
- Add Embeddings context manager (#832)
- Add support for halfvec and bit vector types with PGVector ANN (#839)
- Persist embeddings components to specified schema (#829)
- Add example notebook that analyzes the Hugging Face Posts dataset (#817)
- Add an example notebook for autonomous agents (#820)
Improvements
- Cloud storage improvements (#821)
- Autodetect Model2Vec model paths (#822)
- Add parameter to disable text cleaning in Segmentation pipeline (#823)
- Refactor vectors package (#826)
- Refactor Textractor pipeline into multiple pipelines (#828)
- RDBMS graph.delete tests and upgrade graph dependency (#837)
- Bound ANN hamming scores between 0.0 and 1.0 (#838)
Bug Fixes
- Fix error with inferring function parameters in agents (#816)
- Add programmatic workaround for Faiss + macOS (#818) Thank you @yukiman76!
- docs: update 49_External_database_integration.ipynb (#819) Thank you @eltociear!
- Fix memory issue with llama.cpp LLM pipeline (#824)
- Fix issue with calling cached_file for local directories (#825)
- Fix resource issues with embeddings indexing components backed by databases (#831)
- Fix bug with NetworkX.hasedge method (#834)
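The Embeddings context manager (#832) closes underlying resources when the block exits. A small sketch with default settings:

```python
from txtai import Embeddings

# The context manager closes database connections and other resources
# automatically when the block exits (added in #832)
with Embeddings(content=True) as embeddings:
    embeddings.index(["txtai 8.1 adds Docling integration and context managers"])
    print(embeddings.search("docling", 1))
```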
v8.0.0
🎉 We're excited to announce the release of txtai 8.0 🎉
If you like txtai, please remember to give it a ⭐!
8.0 introduces agents. Agents automatically create workflows to answer multi-faceted user requests. Agents iteratively prompt and/or interface with tools to step through a process and ultimately arrive at an answer to a request.
This release also adds support for Model2Vec vectorization. See below for more.
New Features
- Add txtai agents (#804)
- Add agents package to txtai (#808)
- Add documentation for txtai agents (#809)
- Add agents to Application and API interfaces (#810)
- Add agents example notebook (#811)
- Add model2vec vectorization (#801)
Improvements
- Update BASE_IMAGE in Dockerfile (#799)
- Cleanup vectors package (#802)
- Build script improvements (#805)
Bug Fixes
- ImportError: cannot import name 'DuckDuckGoSearchTool' from 'transformers.agents' (#807)
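As a rough illustration of the agent workflow described above, the sketch below wires a single built-in tool to an LLM. The tool name and model id follow common examples and may differ from the agents notebook (#811).

```python
from txtai import Agent

# Sketch of a txtai agent: the "websearch" tool name and the model id
# are examples and may need adjusting for your environment.
agent = Agent(
    tools=["websearch"],
    llm="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    max_iterations=10,
)

print(agent("Which city has the highest population, Boston or Chicago?"))
```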
v7.5.1
v7.5.0
This release adds Speech to Speech RAG, new TTS models and Generative Audio features
See below for full details on the new features, improvements and bug fixes.
New Features
- Add Speech to Speech example notebook (#789)
- Add streaming speech generation (#784)
- Add a microphone pipeline (#785)
- Add an audio playback pipeline (#786)
- Add Text to Audio pipeline (#792)
- Add support for SpeechT5 ONNX exports with Text to Speech pipeline (#793)
- Add audio signal processing and mixing methods (#795)
- Add Generative Audio example notebook (#798)
- Add example notebook covering open data access (#782)
Improvements
- Issue with Language Specific Transcription Using txtai and Whisper (#593)
- Update TextToSpeech pipeline to support speaker parameter (#787)
- Update Text to Speech Generation Notebook (#790)
- Update hf_hub_download methods to use cached_file (#794)
- Require Python >= 3.9 (#796)
- Upgrade pylint and black (#797)
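A hedged sketch of speech generation with the updated TextToSpeech pipeline. The default model is used and the return format is handled defensively, since it changed across releases; this is an assumption, not documented behavior.

```python
from txtai.pipeline import TextToSpeech

# Default model; the speaker parameter (#787) applies to multi-speaker models
tts = TextToSpeech()

# Assumption: recent versions return (audio array, sample rate); older
# versions returned audio data only, so handle both shapes defensively
result = tts("Speech to speech RAG ties these audio pipelines together")
audio, rate = result if isinstance(result, tuple) else (result, 22050)

print(audio.shape, rate)
```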
v7.4.0
This release adds the SQLite ANN, new text extraction features and a programming language neutral embeddings index format
See below for full details on the new features, improvements and bug fixes.
New Features
- Add SQLite ANN (#780)
- Enhance markdown support for Textractor (#758)
- Update txtai index format to remove Python-specific serialization (#769)
- Add new functionality to RAG application (#753)
- Add bm25s library to benchmarks (#757) Thank you @a0346f102085fe9f!
- Add serialization package for handling supported data serialization methods (#770)
- Add MessagePack serialization as a top level dependency (#771)
Improvements
- Support `<pre>` blocks with Textractor (#749)
- Update HF LLM to reduce noisy warnings (#752)
- Update NLTK model downloads (#760)
- Refactor benchmarks script (#761)
- Update documentation to use base imports (#765)
- Update examples to use RAG pipeline instead of Extractor when paired with LLMs (#766)
- Modify NumPy and Torch ANN components to use np.load/np.save (#772)
- Persist Embeddings index ids (only used when content storage is disabled) with MessagePack (#773)
- Persist Reducer component with skops library (#774)
- Persist NetworkX graph component with MessagePack (#775)
- Persist Scoring component metadata with MessagePack (#776)
- Modify vector transforms to load/save data using np.load/np.save (#777)
- Refactor embeddings configuration into separate component (#778)
- Document txtai index format (#779)
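A short sketch of the SQLite ANN (#780) combined with the new language-neutral index format (#779). The `backend` setting name comes from the embeddings configuration; everything else uses defaults.

```python
from txtai import Embeddings

# backend="sqlite" selects the new SQLite ANN (#780); content=True also
# stores the original text alongside the vectors
embeddings = Embeddings(backend="sqlite", content=True)
embeddings.index(["txtai 7.4 adds a SQLite ANN and a portable index format"])

# Saved indexes use the language-neutral serialization documented in #779
embeddings.save("index-sqlite")
print(embeddings.search("sqlite", 1))
```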