Pringled (Thomas van Dongen) · GitHub
Organizations

@MinishLab

Block or report Pringled

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Please don't include any personal information such as legal names or email addresses. Maximum 100 characters, markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
Pringled/README.md

Hi there 👋

I'm Thomas van Dongen. I currently work as head of AI engineering at Springer Nature. I am also one of the founding members of The Minish Lab, where we develop open-source machine learning packages.

My research interests include:

  • 🚤 Small, fast models: Making CPU-friendly models that can run on any device.
  • 🧩 Word Embeddings: Focusing on static embeddings to balance performance and resource usage.
  • Efficient Nearest Neighbors: Optimizing ANN/KNN methods for scalable semantic search.
  • 🔍 Recommenders: Developing smarter systems to improve recommendations and information retrieval, with a focus on the scientific publishing space.

I'm currently working on:

  • model2vec: a library for creating state-of-the-art static embedding models by distilling sentence transformers.
  • vicinity: a library for fast and lightweight nearest neighbor search, with flexible indexing backends.
  • semhash: a library for lightweight text deduplication, outlier detection, and representative filtering.
  • tokenlearn: a library for pre-training static embedding models.
  • model2vec-rs: a Rust port of Model2Vec.

Pinned

  1. MinishLab/model2vec (Public)

    Fast State-of-the-Art Static Embeddings

    Python · 1.7k stars · 86 forks

  2. MinishLab/semhash (Public)

    Fast Semantic Text Deduplication & Filtering

    Python · 670 stars · 34 forks

  3. MinishLab/vicinity (Public)

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 277 stars · 8 forks

  4. MinishLab/tokenlearn (Public)

    Pre-train Static Word Embeddings

    Python · 61 stars · 5 forks

  5. MinishLab/model2vec-rs (Public)

    Official Rust Implementation of Model2Vec

    Rust · 100 stars · 3 forks