v1.31.0 - Backward compatible named vectors, MUVERA, HNSW Snapshotting, BM25 AND/OR
Breaking Changes
None
New Features
Backward Compatible Named Vectors
- feat(named-vectors): change auto-schema to create a named vector instead of legacy one by @faustuzas in #7678
- feat(named-vectors): forward 'default' vector to legacy vector in case of mixed vector by @faustuzas in #7711
- feat: allow to refer to legacy vector as default named vector in mixed collections by @faustuzas in #7749
- Revert "feat: change autoschema to create a named vector (#7678)" by @faustuzas in #7764
- Enable by default adding of new named vectors to existing collections by @antas-marcin in #8122
MUVERA
- Add Muvera encoding by @robbespo00 in #7774
- Update default muvera repetitions by @robbespo00 in #8314
HNSW Snapshotting
- feature: HNSW periodic snapshots by @aliszka in #7974
- HNSW snapshots: automated tests of should create snapshot logic by @aliszka in #8166
- HNSW snapshots by @asdine in #7754
Keyword Search AND/OR Operators
- feat(bm25_block): ✨ support for minimum should match and AND by @amourao in #8124
- Refactor/bm25 and args by @amourao in #8229
- refactor(bm25_block): ♻️ change to lowercase minimumOrTokensMatch by @amourao in #8292
Replica Movement
- Integrate Replica Copy via File Copy with FSM/Engine by @nathanwilk7 in #7583
- add mocks for query router by @reyreaud-l in #7613
- decouple replication engine from FSM to allow to pass raft handle & client to leader by @reyreaud-l in #7627
- add GetFile impl to integ test fake remote client by @reyreaud-l in #7628
- add rbac to new replica endpoints by @reyreaud-l in #7633
- Replica movement POC by @reyreaud-l in #7432
- Add a Stub for the `GET /replication/replicate/{id}` Endpoint by @salvatore-campagna-weaviate in #7648
- fix direct candidate resolving in query routing by @reyreaud-l in #7720
- Implement the `GET /v1/replication/replicate/{id}` API by @salvatore-campagna-weaviate in #7722
- Add tests for replication manager in REST API handlers by @salvatore-campagna-weaviate in #7746
- [CLU-53] make replication FSM work with replication/update status cmd by @reyreaud-l in #7691
- Remove dedicated interface for details reader by @reyreaud-l in #7756
- Add FSM state Prometheus metrics for replica replication operations by @salvatore-campagna-weaviate in #7777
- chore: shard copy with integrity checking by @jeroiraz in #7860
- Refactor `ShardReplicationEngine` and improve testing by @salvatore-campagna-weaviate in #7827
- fix: snapshot restore handle cases where rbac is nil by @moogacs in #7894
- Add replication engine ops and lifecycle callbacks with Prometheus metrics by @salvatore-campagna-weaviate in #7868
- add snapshot/restore mechanism to replication fsm by @reyreaud-l in #7920
- Skip replication ops that are already running or completed by @salvatore-campagna-weaviate in #7927
- Rename `opsByNode` to `opsByTarget` and add `opsBySource` by @salvatore-campagna-weaviate in #7961
- Add a Grafana dashboard for monitoring the replication engine by @salvatore-campagna-weaviate in #7962
- add dedicated handler for each state in the state machine by @reyreaud-l in #7963
- Add error tracking and state change history to replica movement by @reyreaud-l in #7973
- Shard replica movement finalizing via async replication by @nathanwilk7 in #7911
- refactor async replication retry by @reyreaud-l in #7992
- Introduce user-facing UUID for each replication operation by @tsmith023 in #7980
- Shard Replication FSM Use Shard ID for opsByShard by @nathanwilk7 in #8010
- Introduce a `transferType` parameter to distinguish copy or move replication ops by @salvatore-campagna-weaviate in #7968
- Introduce replication cancellation and deletion by @tsmith023 in #7995
- simple tenant movement test by @nathanwilk7 in #8002
- Make the replica movement finalizing/async replication upper bound configurable by @nathanwilk7 in #8009
- Replication engine max workers env var by @nathanwilk7 in #8059
- add get ops with parameters and get sharding state api ops by @reyreaud-l in #8041
- Replica Movement: Reset Async Replication After Finalization by @reyreaud-l in #8069
- Always use a 5s polling interval for the fsm op producer by @nathanwilk7 in #8065
- chore: concurrent file downloading during copy replica by @jeroiraz in #8088
- Add `DELETE /replications/replicate` endpoint by @tsmith023 in #8071
- Delete relevant replication operations when classes/shards are deleted by @tsmith023 in #8066
- test: multi-tenancy shard movement acceptance tests by @jeroiraz in #8078
- fix replication to forbid multiple move op on the same source by @reyreaud-l in #8139
- Shard replication additional host writes by @nathanwilk7 in #8079
- Replica movement test router with fsm 2 by @reyreaud-l in #8145
- Align State enum in API to FSM state in repo by @tsmith023 in #8151
- Ensure async replication running locally during dehydrating by @nathanwilk7 in #8162
- Use GetShard for incoming replica ops to avoid recreating a shard that has been deleted on the source node by @nathanwilk7 in #8158
- Improve logs for async replication during replica move ops by @nathanwilk7 in #8161
- fix: copy docidcounter and prop tracker files by @jeroiraz in #8168
- refactor engine shutdown, cancel if error max reached by @reyreaud-l in #8180
- Query the leader in the consumer semaphore to avoid racy state change by @tsmith023 in #8173
- Introduce `schemaManager.SyncShard` to clean-up ghost shards by @tsmith023 in #8140
- Ensure we log errors when there is a state transition handler error by @nathanwilk7 in #8189
- async replication exit early on failure by @reyreaud-l in #8185
- Move unlocked FSM map reads in replication manager into locked methods in FSM manager by @tsmith023 in #8191
- Add hydrating wait group to test by @tsmith023 in #8198
- Make adding/removing replica to shards idempotent to not fail on retries by @reyreaud-l in #8196
- Add new `replications` domain to RBAC functionality by @tsmith023 in #8117
- add consumer ops gateway by @reyreaud-l in #8190
- Use FINALIZING -> READY state transition to test deduplicated processes by @tsmith023 in #8201
- Add debug handler to remove async replication target overrides by @nathanwilk7 in #8199
- Rename `replications` to `replicate` in RBAC nomenclature, fix EC in tests by @tsmith023 in #8205
- Rplc mvmnt/attempt more duplication flake fixes by @tsmith023 in #8206
- Rplc mvmnt/improve engine shutdown by @tsmith023 in #8197
- Track replication factor and number of replicas per shard by @salvatore-campagna-weaviate in #8152
- make sure we delete from onFlight only if we acquired it in that worker to avoid side effect by @reyreaud-l in #8219
- add prepare directory for file copy step by @reyreaud-l in #8195
- trim load shard end of hydrating by @reyreaud-l in #8222
- Repl mvt/dont revert async repl target after deleted by @nathanwilk7 in #8224
- Fix unit tests by @nathanwilk7 in #8226
- dont overwrite err before returning by @nathanwilk7 in #8240
- Add missing `RemoveAsyncReplicationTargetNode` to `processDehydratingOp` by @tsmith023 in #8249
- no need to check until we've reached the time we need by @nathanwilk7 in #8247
- returns a 404 on shard not found to avoid internal cluster api retries by @reyreaud-l in #8254
- Include error details for local index not found by @nathanwilk7 in #8264
- fix: indexcounter and proplen absolute path by @jeroiraz in #8255
- Rplc mvmnt/fix sync shard belongs to nodes by @tsmith023 in #8286
- Add a shard filter to the node/class status internal/HTTP endpoints by @nathanwilk7 in #8282
- Ensure `c.cancelReplicationEngine != nil` before calling it by @tsmith023 in #8288
- increase dehydrating time bound, increase workers by default by @reyreaud-l in #8296
- reload target node override config before hashbeat by @reyreaud-l in #8271
- Ensure there is no async replication on any shard in each test by @tsmith023 in #8253
- chore: fsync entire data path by @jeroiraz in #8295
- Use chan to communicate raft FSM snapshot restore to replication engine by @tsmith023 in #8218
- Forbid cancel/delete of ops when they should be uncancellable by @tsmith023 in #8227
- update replica movement acceptance test by @reyreaud-l in #8297
- increase async repl time bound to reduce occurrence of missed writes by @reyreaud-l in #8309
- Revert all changes adding replication engine restart on snapshot restore by @tsmith023 in #8312
- Introduce `/replication/replicate/force-delete` by @tsmith023 in #8277
- Disable replica movement by default by @reyreaud-l in #8315
- Add objects to shards in the hope of avoiding flakes by @tsmith023 in #8319
BlockMax WAND Migration
- feature (blockmax migrator): configurable collections/properties/tenants to migrate by @aliszka in #7750
- fix (blockmax migrator): move started marking to sync phase by @aliszka in #7773
- Blockmax migrator rollback select by @aliszka in #7780
- feat(bm25_block): Trigger reindexing using REST call by triggering shard reinit by @amourao in #7766
- feature (blockmax migrator): reload shards after reindex phase by @aliszka in #7767
Modules
- Introduce voyage-3-large model by @crewone in #7641
- Add text2vec-model2vec module by @antas-marcin in #7757
- chore: update text2vec-huggingface module to support newest API by @antas-marcin in #8095
- chore: update text2vec-mistral module to support newest embeddings API by @antas-marcin in #8096
- Lower max tokens per request limit for openai text2vec by @dirkkul in #8085
- Add support for reranking Cohere V3.5 model by @antas-marcin in #8100
- chore: remove model name validation in text2vec-voyageai module by @antas-marcin in #8101
- chore: remove model name validation in text2vec-cohere module by @antas-marcin in #8102
- Introducing VoyageAI's v3.5 models by @antas-marcin in #8263
- remove model name validation in text2vec-weaviate module by @augustas1 in #8172
- fix: replace unsupported gemini-1.0-pro-vision model with gemini-2.0-flash-001 in Google e2e tests by @antas-marcin in #8294
Fixes
- hnsw: check nodes slice as part of tombstone cleanup by @trengrj in #7683
- queue: recover corrupt chunks by @asdine in #7729
- Add missing checks if dynamic DB user management is enabled by @dirkkul in #7744
- Change user id when rotating key by @dirkkul in #7755
- Disable default dimensions in Azure by @antas-marcin in #7768
- RBAC: Handle downgrades by @dirkkul in #7719
- Add first 3 letters of api key to return values by @dirkkul in #7762
- adapters/handlers/rest: Fix nil pointer dereference in `/authz/roles` and `/users/db` endpoints by @mohamedawnallah in #7779
- fix: erase empty wal files by @jeroiraz in #7793
- Rename key to `baseURL` in default class config map by @tsmith023 in #7783
- fix: async replication fetch all local digest in range by @jeroiraz in #7751
- fix(bm25_block): 🐛 Fix condition for filter matching no docs by @amourao in #7804
- Batch vectorization: Cache tokenizer by @dirkkul in #7818
- fix call errors.As with a nil value error err by @alingse in #7682
- DB Users: Fix concurrency issues by @dirkkul in #7851
- Remove symbols added during merge by @dirkkul in #7880
- fix: flush and lock when listing shard files by @jeroiraz in #7876
- RBAC: Fix upgrading from version without rbac snapshots to rbac snapshots by @dirkkul in #7891
- RBAC: Add upgrade path 1.29=>1.30 to RAFT snapshots by @dirkkul in #7886
- fix: guard raft snapshot distributedTasks restore by @moogacs in #7898
- RBAC: Add downgrade path 1.30=>1.29 to RAFT snapshots by @dirkkul in #7888
- fix: set distributed task scheduler in app state by @faustuzas in #7913
- DB Users: Fix updating first letters of api key when rotating a key by @dirkkul in #7914
- avoid nil logger in pv-pair by @etiennedi in #7940
- fix: crash-tolerant memtable flushing by @jeroiraz in #7938
- Add named `Vectors` to `GroupHitAdditional` struct by @tsmith023 in #7933
- chore: Handle file handler cleanup on `newSegment` by @kavirajk in #7945
- chore: Handle file descriptors cleanup in few places by @kavirajk in #7943
- fix: 🐛 fix segment load order on recovery by @amourao in #7978
- Fix: pass missing memwatch from segment group -> segment by @etiennedi in #7994
- bugfix: AutoSchema and Asyncreplication configs fix in Dynamic configs by @kavirajk in #7993
- Remove workers for dynamic index by @trengrj in #7836
- dynamic index: unblock index during upgrade by @asdine in #8001
- bug(raft): fix lastAppliedIndex to be updated if there was no error by @moogacs in #8008
- fix(bm25_block): 🐛 fix reading BMW data with offset.end != 0 by @amourao in #8023
- fix(raft-snapshot): backward compatibility downgrade path for new snapshots structure by @moogacs in #8092
- fix(db-internal): phantom tenants as leftover on UpdateIndex from RAFT to COLD by @moogacs in #8019
- chore: include md5 header in S3 requests by @antas-marcin in #8150
- fix(raft): make sure store is open no matter the error status on catchup by @moogacs in #8163
- fix nil in results error by @donomii in #8179
- Fix flat filtered search with multivector by @robbespo00 in #8200
- fix(shard shutdown): make sure shard is shut down when it's marked for shutdown by @donomii in #8089
- Distinguish between `err != nil` and `shard == nil` in `index.GetShard` error returns by @tsmith023 in #8223
- fix: flush write buffer before syncing by @jeroiraz in #8256
- Fix filtered search with multivec by @robbespo00 in #8238
- DB Users: Add support for RAFT snapshots to db users by @dirkkul in #8164
- Fix parsing the azure openai response by @dirkkul in #8272
- chore: pass dynUserManager to single node recovery by @moogacs in #8280
- refact(cluster server): gracefully shutdown internal REST server by @moogacs in #8257
- fix(memberlist): rejoin list on single node split brain by @andrewisplinghoff and @moogacs in #8246
Performance Improvements
- chore: refactor vector index and queue access to be thread-safe by @faustuzas in #7606
- Optimized stand-alone k-means clustering by @tobias-weaviate in #7556
- feat(bm25_block): Allow setting a higher segment inspection limit by env var by @amourao in #7813
- feat(bm25_block): Skip search if allowList is empty by @amourao in #7814
- Handle concurrency in the cache by @dirkkul in #7828
- Optimize Segmentindex header parsing by @dirkkul in #7837
- Optimize commitlogger writes by @dirkkul in #7841
- Optimize reading of bloom filter by @dirkkul in #7839
- Optimize creating bytes out of Mappair by @dirkkul in #7844
- Feature: rangeable index in memory by @aliszka in #7801
- Feature: buf pool for rangeable segment-in-memory by @aliszka in #7817
- feat: add distributed tasks management by @faustuzas in #7878
- feat: include distributed tasks to raft snapshots by @faustuzas in #7887
- fix: sort string in /tasks to make it more deterministic by @faustuzas in #7919
- feat: introduce optimized mmap pkg by @faustuzas in #7929
- chore: optimize reading headers to precompute compaction by @dirkkul in #7934
- Migrate more mmap uses to optimized package by @dirkkul in #7946
- chore: convert shardsStatus lock to RWMutex to be used in GetStatus() by @moogacs in #7956
- chore: create new fsm inside recoverSingleNode to avoid panics with new fields by @moogacs in #7954
- refact:(shutdown) improve db and shard dropping shutdown performance by @moogacs in #7571
- Read small segments to disk 2 by @dirkkul in #7964
- Fix: change default full read mmap and add mem watcher by @amourao in #7972
- fix: flush buf writer when not including checksum by @jeroiraz in #8016
- feat(bm25_block): ✨ use better average prop length for max impact by @amourao in #8018
- Reduce amount of writes when creating segments by @dirkkul in #7971
- Simplify writing indices if no secondary indices are present by @dirkkul in #8045
- chore: inactivity timeout to ensure maintenance tasks are resumed by @jeroiraz in #8021
- Change default min mmap size to 8kb by @dirkkul in #8067
- Optimize writing bloom filters + net additions by @dirkkul in #8051
- feat(bm25_block): ✨ use per prop length from segments by @amourao in #8038
- refact: convert TenantResponse to models.Tenant only by @moogacs in #8098
- Remove introduction of `getNoInitLocalShard` by @tsmith023 in #8128
- Clamping negative distances to zero by @abdelr in #8133
- optimization: wal reuse upon restart by @dirkkul in #8126
- refact(raft-config): introduce raft timeouts multiplier and adjust Query() and Apply() retries by @moogacs in #8194
- Fix overzealous cyclemanager with wal reuse by @etiennedi in #8235
- Configure more buckets for wal reuse by @dirkkul in #8231
Observability Improvements
- chore: make RAFT TrailingLogs configurable by @moogacs in #7791
- DB Users: add last used time by @dirkkul in #7786
- Adding Kubernetes Grafana dashboard by @Dabz in #7790
- chore: add metric for db internal shard status by @moogacs in #7850
- Adds metrics for OpenAI operations by @donomii in #7843
- bugfix: Handle shards count metrics correctly for `StartUnloadingShard` by @kavirajk in #7901
- feat: add an HTTP endpoint to list active distributed tasks by @faustuzas in #7902
- chore: add metric for auto tenant operations by @moogacs in #7855
- Fix vector index tombstones metric on restart by @trengrj in #7909
- chore: add shard shutdown as valid status in the db layer for metric purposes by @moogacs in #7969
- chore: set `NoLegacyTelemetry` flag on `raft` config by @kavirajk in #8004
- Metrics for every write by @etiennedi in #8022
- Better metrics for Mmap usage by @etiennedi in #8104
- chore: Add raft FSM index metrics to cluster store by @kavirajk in #8007
- chore(raft): pass metric for single node recovery by @moogacs in #8132
- chore: Add metric to track last applied index on startup. by @kavirajk in #8120
- More fine-grained tenant analysis + runtime config to skip revectorize check by @etiennedi in #8302
Testing Improvements
- fix index testing on replica movement poc branch by @reyreaud-l in #7590
- Add tests for POST, PUT, DELETE ops involving references by @tsmith023 in #7742
- skip flaky test_ref_with_multiple_cycle test by @faustuzas in #7748
- fix: cherry-pick a commit to fix multi-vector validation test by @faustuzas in #7759
- chore: add test for adding multi-vector index to an existing legacy collection by @faustuzas in #7758
- Add tests for batching with auto-tenancy by @dirkkul in #7798
- chore(tests): add possibility to pass a prebuilt MockOIDC image to tests by @antas-marcin in #7824
- Feature: rangeable segment-in-memory tests by @aliszka in #7838
- test: add Snapshot test to verify FSM snapshots e.g. the RBAC was restored correctly from a snapshot by @moogacs in #7803
- chore: TestRBACSnapshotRecovery sort files content before md5sum checks by @moogacs in #7861
- Improve backup e2e tests by @donomii in #7785
- chore(ci): move less relevant module tests to be run only during release by @antas-marcin in #7870
- Adds additional OpenAI tests and bug fixes by @donomii in #7869
- chore(ci): reduce retries on e2e tests from 3 to 2 by @antas-marcin in #7884
- chore(tests): stabilize flaky ColBERT e2e test by @antas-marcin in #7866
- chore(ci): split integration tests into two pipelines by @antas-marcin in #7873
- chore(tests): replace gemini-1.0 model with gemini-1.5 in generative-google e2e test by @antas-marcin in #7924
- simplify consume resuming test by @reyreaud-l in #7991
- Make mock call `.Maybe` rather than relying on timings by @tsmith023 in #8036
- fix: Google module e2e tests by @antas-marcin in #8062
- Make improvements to attempt flake fixes by @tsmith023 in #8070
- chore: fix backup tests pass the correct response by @moogacs in #8121
- chore(test): stabilize backup tests by assigning unique bucket names by @moogacs in #6999
- chore(backup-gcs): handle overrides in test by @moogacs in #8131
- Improve waiting for node shutdown in test by @dirkkul in #8146
- Fix flaky test condense loop with alloc checker by @trengrj in #8156
- chore(test): add acceptance test to make sure DB open after faulty schema update followed by restart by @moogacs in #8167
- chore(gcs-test): use STORAGE_EMULATOR_HOST for gcs backup test clients by @moogacs in #8175
- Fix flaky test due to un-normalized vectors by @trengrj in #8248
- Removing knnSearchByVector from tests by @abdelr in #8250
- chore(runtimeconfig): Add test to lock lower_snake_case in config by @kavirajk in #8275
- chore(test) memberlist single node split brain test in case of network interrupt by @moogacs in #8262
- chore: Fix panic in test when introducing new runtime config by @kavirajk in #8316
- chore: skip temporarily TestNetworkIsolationSplitBrain test by @antas-marcin in #8321
Chores and Docs
- replace the go routine calls with error groups to make the linter happy by @nathanwilk7 in #7609
- Replace -arm64/-amd64 by .arm64/.amd64 in docker tags by @jfrancoa in #7638
- refact: correct Error msg of vectorFromParams()&validateNearParams() by @cryo-zd in #7615
- chore: refactor models helper methods by @faustuzas in #7737
- chore: use docker image to generate docs instead of local binary by @faustuzas in #7802
- chore(linter): migrate to golangci-lint v2 by @antas-marcin in #7796
- Use snake case to name mock files by @salvatore-campagna-weaviate in #7835
- Fix typo in DBUser by @mpartipilo in #7856
- chore: configure automated mock creation by @faustuzas in #7857
- chore(tests): adjust go.mod in acceptance with go client project by @antas-marcin in #7889
- chore: refactor runtime overrides API by @kavirajk in #7826
- chore: remove index shutdown dependency on queue on DB shutdown by @moogacs in #7967
- improve error verbosity when not finding replication op by @reyreaud-l in #7990
- Detect presence of not found in error string returned by raft query by @tsmith023 in #8013
- Improve names in bloom filter by @dirkkul in #8080
- docs: Update replication API docs by @g-despot in #8087
- Remove windows release builds by @dirkkul in #8123
- Lower log level for vectorizer batching by @dirkkul in #8159
- Change recovery from wal log level to debug by @dirkkul in #8214
- chore(NodeAddress): update log line to have latest members count on success by @moogacs in #8307
- Turn wal recovery log msg to debug by @etiennedi in #8313
Security Updates
- Update pull_requests.yaml to reconfigure Orca SAST scanning by @spiros-spiros in #7745
New Contributors
- @cryo-zd made their first contribution in #7615
- @crewone made their first contribution in #7641
- @mohamedawnallah made their first contribution in #7779
- @spiros-spiros made their first contribution in #7745
- @alingse made their first contribution in #7682
- @Dabz made their first contribution in #7790
- @mpartipilo made their first contribution in #7856
- @g-despot made their first contribution in #8087
Full Changelog: v1.30.0...v1.31.0