Answer.AI

company

https://www.answer.ai

AnswerDotAI

AI & ML interests

None defined yet.

Recent Activity

bwarner new activity 9 days ago

answerdotai/ModernBERT-base:Inference fails on CPU: `ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)`

bwarner new activity 9 days ago

answerdotai/ModernBERT-base:ValueError: The checkpoint you are trying to load has model type `modernbert`

bwarner new activity 9 days ago

answerdotai/ModernBERT-base:Set tokenizer "model_max_length" property to 8192

Articles

Finally, a Replacement for BERT: Introducing ModernBERT

Dec 19, 2024

• 501

answerdotai's activity

tomaarsen

posted an update about 23 hours ago

Post

633

I just released Sentence Transformers v3.4.0, featuring a memory leak fix, compatibility between the powerful Cached... losses and the Matryoshka loss modifier, and a bunch of fixes & small features.

🪆 Matryoshka & Cached loss compatibility
It is now possible to combine the powerful Cached... losses (which use in-batch negatives & a caching mechanism to allow for endless batch size & negatives) with the Matryoshka loss modifier which modifies a base loss such that it is trained not only on the maximum dimensionality (e.g. 1024 dimensions), but also on many lower dimensions (e.g. 768, 512, 256, 128, 64, 32).
After training, these models' embeddings can be truncated for faster retrieval, etc.
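As a rough illustration of the truncation step only (not of the loss itself), the sketch below keeps the first few dimensions of an embedding and rescales them to unit length before comparing. The vectors and the helper names `truncate_and_normalize` and `cosine` are hypothetical and chosen for illustration; they are not part of the Sentence Transformers API.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical 8-dimensional embeddings; a Matryoshka-trained model is
# expected to concentrate most of the signal in the leading dimensions.
doc = [0.9, 0.1, 0.3, -0.2, 0.05, 0.02, -0.01, 0.03]
query = [0.8, 0.2, 0.25, -0.1, -0.04, 0.01, 0.02, -0.02]

full_similarity = cosine(query, doc)
truncated_similarity = cosine(
    truncate_and_normalize(query, 4),
    truncate_and_normalize(doc, 4),
)
```

With these made-up vectors the two similarities stay close, which is the behaviour the Matryoshka modifier is meant to encourage; real models need to be evaluated at each dimensionality.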

🎞️ Resolve memory leak when Model and Trainer are reinitialized
Due to a circular dependency between Trainer -> Model -> ModelCardData -> Trainer, deleting both the trainer & model still didn't free up the memory.
This led to a memory leak in scripts that repeatedly reinitialize the model and trainer.
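The general mechanism can be reproduced with plain Python objects. The sketch below uses stand-in classes named after the ones in the post (they are not the real Sentence Transformers classes) and a hypothetical `build_and_delete` helper. It shows that objects in a reference cycle survive `del` until the cyclic garbage collector runs, and that clearing one link lets reference counting free them right away. How the library itself resolved the issue is not shown here.

```python
import gc
import weakref


class ModelCardData:
    def __init__(self):
        self.trainer = None  # back-reference that will close the cycle


class Model:
    def __init__(self):
        self.card_data = ModelCardData()


class Trainer:
    def __init__(self, model):
        self.model = model
        model.card_data.trainer = self  # Trainer -> Model -> ModelCardData -> Trainer


def build_and_delete(break_cycle):
    """Create a trainer/model pair, delete both, and return a weak probe."""
    model = Model()
    trainer = Trainer(model)
    probe = weakref.ref(model)  # does not keep the model alive
    if break_cycle:
        model.card_data.trainer = None  # remove the back-reference first
    del trainer, model
    return probe


was_enabled = gc.isenabled()
gc.disable()  # make the demonstration deterministic
try:
    leaked = build_and_delete(break_cycle=False)
    freed = build_and_delete(break_cycle=True)
    alive_after_del = leaked() is not None  # cycle keeps the model alive
    freed_after_del = freed() is None       # no cycle, freed by refcounting
    gc.collect()                            # the cycle collector reclaims it
    freed_after_collect = leaked() is None
finally:
    if was_enabled:
        gc.enable()
```

In a training script, the same effect means large objects can linger between iterations unless the cycle is broken or a collection is triggered.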

➕ New Features
Many new small features, e.g. multi-GPU support for 'mine_hard_negatives', a 'margin' parameter to TripletEvaluator, and Matthews Correlation Coefficient in the BinaryClassificationEvaluator.

🐛 Bug Fixes
Also a bunch of fixes, for example that subsequent batches were not sorted when using the "no_duplicates" batch sampler. See the release notes for more details.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.4.0

Big thanks to all community members who assisted in this release. 10 folks with their first contribution this time around!

bwarner

in answerdotai/ModernBERT-base 9 days ago

Inference fails on CPU: `ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)`

#10 opened about 1 month ago by

umarbutler

ValueError: The checkpoint you are trying to load has model type `modernbert`

#37 opened 23 days ago by

Sengil

Set tokenizer "model_max_length" property to 8192

#39 opened 22 days ago by

NohTow

bwarner

in answerdotai/ModernBERT-large 9 days ago

Set tokenizer "model_max_length" property to 8192

#9 opened 22 days ago by

NohTow

Mention that users should use transformers v4.48.0

#12 opened 11 days ago by

tomaarsen

bwarner

in answerdotai/ModernBERT-base 9 days ago

Mention that users should use transformers v4.48.0

#50 opened 11 days ago by

tomaarsen

posted an update 9 days ago

Post

4298

🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
📐 Linear instead of quadratic complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)
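To make the properties above concrete, here is a minimal sketch of a static embedding, assuming it is computed as the mean of per-token vectors looked up from a fixed table. The tiny vocabulary in `TABLE`, the whitespace tokenization, and the `embed` function are hypothetical and only for illustration; the released models use a real tokenizer and a much larger table.

```python
def embed(tokens, table, dim):
    """Mean-pool per-token vectors looked up from a fixed table.

    One pass over the tokens: no attention, no matrix multiplication,
    and no limit on how many tokens can be passed in.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vector = table.get(token)
        if vector is None:
            continue  # skip tokens that are not in the table
        for i in range(dim):
            total[i] += vector[i]
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


# Hypothetical 4-dimensional token table.
TABLE = {
    "fast": [1.0, 0.0, 0.0, 0.5],
    "cpu": [0.0, 1.0, 0.0, 0.5],
    "model": [0.0, 0.0, 1.0, 0.5],
}

short_text = "fast cpu model".split()
long_text = ("fast cpu model " * 1000).split()

short_embedding = embed(short_text, TABLE, dim=4)
long_embedding = embed(long_text, TABLE, dim=4)
```

The work done is proportional to the number of tokens, so doubling the text length doubles the lookup-and-add steps. Averaging also hints at why very long texts may embed worse: every token contributes equally, so distinctive words get diluted.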

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1

1 reply

tomaarsen

posted an update 24 days ago

Post

2926

That didn't take long! Nomic AI has finetuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering and more!

Details:
🤖 Based on ModernBERT-base with 149M parameters.
📊 Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
🏎️ Immediate Flash Attention 2 (FA2) and unpadding support for super efficient inference.
🪆 Trained with Matryoshka support, i.e. 2 valid output dimensionalities: 768 and 256.
➡️ Maximum sequence length of 8192 tokens!
2️⃣ Trained in 2 stages: unsupervised contrastive data -> high quality labeled datasets.
➕ Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc.
🏛️ Apache 2.0 licensed: fully commercially permissible

Try it out here: nomic-ai/modernbert-embed-base

Very nice work by Zach Nussbaum and colleagues at Nomic AI.

ncoop57

authored 2 papers about 1 month ago

Stable Code Technical Report

Paper • 2404.01226 • Published Apr 1, 2024 • 1

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

rbiswasfc

authored a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

fladhak

authored a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

griffin

authored a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

jph00

authored 2 papers about 1 month ago

The Matrix Calculus You Need For Deep Learning

Paper • 1802.01528 • Published Feb 5, 2018

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

bwarner

authored a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

tomaarsen

authored a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

bclavie

authored a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

freddyaboulton

posted an update about 1 month ago

Post

1467

Just created a Gradio space for playing with the new OAI realtime voice API!

freddyaboulton/openai-realtime-voice

Team members 19
