GitHub - cbowdon/sparse-recs: Little example of mini rec system

sparse-recs

Here's an example of a content recommendation system with these properties:

  1. Works on sparse recommendation data.
  2. Incorporates multiple item features and social graph information.
  3. Incorporates recency bias to phase out older recommendations.
  4. Works with little data and a cold start (for new users who already have social graph connections) but scales reasonably.
  5. Is explainable - can identify which features contributed most to a recommendation.
  6. Is SIMPLE.

Concept

This is basically a very tiny version of LightFM. I am hesitant to recommend LightFM directly: it is good, but it is also likely a dead project (more than 2 years with no updates, no Python 3.12+ compatibility). The LightFM paper is pretty easy going, though, and the general idea is easy to implement.

Files:

  • model.py illustrates the idea, which is basically to sum the feature embeddings on the user side and on the item side, then take the similarity (dot product) of the two sums.
  • model_with_sbert.py adds a simple SBERT embedding for text features.

(Beware that these are illustrative: I haven't trained or evaluated these models.)
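To make the idea concrete, here is a minimal sketch of the scoring step. It is not the code in model.py: the feature names, the embedding size and the random (untrained) embeddings are all assumptions for illustration. It also shows why this kind of model is explainable, since the score splits into one additive term per item feature.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding size, chosen arbitrarily for this sketch

# Hypothetical feature vocabularies; a real system would derive them from data.
user_features = ["user:alice", "friend_of:bob", "likes_category:books"]
item_features = ["category:books", "creator:le_guin", "format:novel"]

# One embedding per feature. Random here; in practice they would be learned.
user_emb = {f: rng.normal(size=DIM) for f in user_features}
item_emb = {f: rng.normal(size=DIM) for f in item_features}


def represent(features, table):
    """Sum the embeddings of the features that are present."""
    return np.sum([table[f] for f in features], axis=0)


def score(u_feats, i_feats):
    """Dot product of the summed user vector and the summed item vector."""
    return float(represent(u_feats, user_emb) @ represent(i_feats, item_emb))


def item_feature_contributions(u_feats, i_feats):
    """Split the score into one additive term per item feature."""
    u = represent(u_feats, user_emb)
    return {f: float(u @ item_emb[f]) for f in i_feats}
```

A brand-new user with no history can still be scored from graph features alone, e.g. `score(["friend_of:bob"], ["category:books", "creator:le_guin"])`, and the values returned by `item_feature_contributions` add up to that score.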

Each model would need to sit alongside a database that can do vector searches. I suggest collecting e.g. the top 100 most similar titles, plus the highest-rated or most popular items from friends in the same category: basically a broad net of stuff that might be interesting, which the model can then rank.
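A rough sketch of that candidate-gathering step follows. The brute-force cosine search stands in for the vector database, and the function names, the `friend_ratings` mapping (item index to a friend-derived rating) and the `item_category` mapping are all hypothetical.

```python
import numpy as np


def top_k_similar(query_vec, item_vecs, k=100):
    """Brute-force cosine search; a vector database would replace this."""
    q = query_vec / np.linalg.norm(query_vec)
    m = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    sims = m @ q
    k = min(k, len(sims))
    return [int(i) for i in np.argsort(-sims)[:k]]


def friend_favourites(friend_ratings, item_category, category, n=20):
    """Highest-rated items from friends, restricted to one category."""
    in_cat = [(r, i) for i, r in friend_ratings.items() if item_category[i] == category]
    return [i for r, i in sorted(in_cat, reverse=True)[:n]]


def gather_candidates(query_vec, item_vecs, friend_ratings, item_category,
                      category, k=100, n=20):
    """Union of both sources, keeping first-seen order and dropping duplicates."""
    combined = (top_k_similar(query_vec, item_vecs, k)
                + friend_favourites(friend_ratings, item_category, category, n))
    return list(dict.fromkeys(combined))
```

The result is deliberately over-inclusive: the model only has to rank this short list rather than the whole catalogue.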

The ranked items from the model could then be input to an LLM, which would be able to re-rank or remove duff items.
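One way the re-ranking step could look is sketched below. No particular LLM API is assumed: `ask_llm` is a hypothetical callable that takes a prompt string and returns the model's reply as text, and the prompt wording and item fields (`id`, `title`, `category`) are made up for illustration.

```python
import json


def build_rerank_prompt(user_summary, ranked_items, keep=5):
    """Describe the user and the model's ranking, and ask for a JSON list of ids."""
    lines = [f'{item["id"]}: {item["title"]} ({item["category"]})' for item in ranked_items]
    return (
        f"User profile: {user_summary}\n"
        "Candidate items, best first according to the model:\n"
        + "\n".join(lines)
        + f"\nReturn a JSON list of up to {keep} item ids, best first. "
        "Leave out anything that looks like a poor fit."
    )


def rerank(user_summary, ranked_items, ask_llm, keep=5):
    """Ask the LLM to re-rank; fall back to the model's order on a bad reply."""
    fallback = [item["id"] for item in ranked_items[:keep]]
    reply = ask_llm(build_rerank_prompt(user_summary, ranked_items, keep))
    try:
        chosen = json.loads(reply)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(chosen, list):
        return fallback
    known = {item["id"] for item in ranked_items}
    # Keep only string ids that were actually offered, without duplicates.
    valid = [i for i in chosen if isinstance(i, str) and i in known]
    return list(dict.fromkeys(valid))[:keep]
```

Filtering the reply against the known ids means a hallucinated item can never reach the user, and the fallback keeps the system working if the reply is not valid JSON.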

Going even simpler - optimising for effort vs performance

The above is pretty simple, but you still have to train it etc., which might be time better spent elsewhere. You could go even simpler:

  • Get an LLM (high-dimensional) embedding of each item, something like: "TITLE: ...; CREATOR: ...; CATEGORY: ...".
  • Do vector search on these, boosting items that friends liked recently.
  • Get an LLM to re-rank the results, picking say 3-5 best items.
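The first two steps might look something like the sketch below. It assumes the item vectors have already been produced by whatever embedding model you choose (that call is not shown), and the exponential half-life decay, the `weight` of the boost and all function names are assumptions rather than anything prescribed above.

```python
import numpy as np


def item_text(item):
    """Flatten an item into the text that gets embedded."""
    return f'TITLE: {item["title"]}; CREATOR: {item["creator"]}; CATEGORY: {item["category"]}'


def recency_boost(days_since_like, half_life_days=30.0):
    """Exponential decay: 1.0 for a like today, 0.5 after one half-life."""
    return 0.5 ** (days_since_like / half_life_days)


def boosted_search(query_vec, item_vecs, friend_like_age_days, weight=0.2, k=20):
    """Cosine similarity plus a decaying bonus for items friends liked.

    friend_like_age_days maps an item index to the age, in days, of the
    most recent like from a friend.
    """
    q = query_vec / np.linalg.norm(query_vec)
    m = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = m @ q
    for idx, age in friend_like_age_days.items():
        scores[idx] += weight * recency_boost(age)
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

With a 30-day half-life, a like from today adds the full `weight` to an item's similarity, while one from ten months ago adds almost nothing, so old recommendations fade out without any explicit deletion.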

That's probably an afternoon's work. No training, no model to maintain. Just a bit of text processing and vector search. Almost certainly not as good as the more involved approach, but maybe good enough? It depends what you need most at this stage, a quick implementation or better quality.

(You could go even smaller, i.e. a chi-squared test plus Cramér's V, but having done some research I think that's too small and doesn't have enough room to grow.)
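For reference, this is roughly what that smallest option computes: the strength of association between two categorical variables, taken from a table of counts. The example framing (rows for "a friend liked it" yes/no, columns for "the user liked it" yes/no) is an assumption, and the sketch applies no continuity correction.

```python
import numpy as np


def cramers_v(table):
    """Cramér's V for a contingency table of observed counts."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    # Expected counts if rows and columns were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    r, c = observed.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))
```

A value near 0 means the two variables look independent and a value near 1 means they move together, e.g. `cramers_v([[10, 0], [0, 10]])` for a perfectly aligned table versus `cramers_v([[5, 5], [5, 5]])` for no association.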
