sparse-recs

Here's an example of a content recommendation system with these properties:

Works on sparse recommendation data.
Incorporates multiple item features and social graph information.
Incorporates recency bias to phase out older recommendations.
Works with little data and from a cold start (for new users who already have a social graph), but scales reasonably.
Is explainable - can identify which features contributed most to a recommendation.
Is SIMPLE.

Concept

This is basically a very tiny version of LightFM. I am hesitant to recommend LightFM directly because, although it is good, it also looks like a dead project (more than 2 years with no updates, no Python 3.12+ compatibility). But the paper is here and is an easy read, and the general idea is easy to implement.

Files:

model.py illustrates the idea: sum the feature embeddings for the user and for the item, then score the pair by the similarity (dot product) of the two sums.
model_with_sbert.py adds a simple SBERT embedding for text features.

(Beware that these are illustrative: I haven't trained or evaluated these models.)
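
As a rough picture of the scoring idea, here is a minimal, untrained sketch using only the standard library. It is not the code in model.py: the feature names, the embedding size, and the random weights are all assumptions made for illustration. Because the dot product is linear, the score splits exactly into one part per item feature, which is one simple way to get the explainability mentioned above.

```python
import random

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def make_embeddings(feature_names, dim=DIM, seed=0):
    """Give each feature a small random vector (a stand-in for trained weights)."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.1) for _ in range(dim)] for name in feature_names}


def sum_vectors(vectors, dim=DIM):
    """Element-wise sum of a list of equal-length vectors."""
    total = [0.0] * dim
    for vec in vectors:
        for idx, value in enumerate(vec):
            total[idx] += value
    return total


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(user_features, item_features, user_emb, item_emb):
    """Dot product of the summed user embeddings and the summed item embeddings."""
    user_vec = sum_vectors([user_emb[f] for f in user_features])
    item_vec = sum_vectors([item_emb[f] for f in item_features])
    return dot(user_vec, item_vec)


def item_feature_contributions(user_features, item_features, user_emb, item_emb):
    """Split the score by item feature; the parts add up to the total score."""
    user_vec = sum_vectors([user_emb[f] for f in user_features])
    return {f: dot(user_vec, item_emb[f]) for f in item_features}


# Hypothetical feature names, for illustration only.
user_feats = ["user:u1", "friend_of:u2"]
item_feats = ["item:i1", "creator:c1", "category:film"]
user_emb = make_embeddings(user_feats, seed=1)
item_emb = make_embeddings(item_feats, seed=2)

total = score(user_feats, item_feats, user_emb, item_emb)
parts = item_feature_contributions(user_feats, item_feats, user_emb, item_emb)
```

In a trained model the embeddings would be learned from interaction data rather than drawn at random; the scoring and the per-feature breakdown would stay the same.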

Each model would need to sit alongside a database that can do vector searches. I suggest collecting, e.g., the 100 most similar titles plus the highest-rated or most popular items among friends in the same category: basically a broad net of things that might be interesting, which the model can then rank.
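
A sketch of that candidate-gathering step, with a brute-force in-memory search standing in for the vector database. The function names, the shape of the data, and the default cut-offs are assumptions for illustration, not part of the repository.

```python
import math


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def most_similar(query_vec, item_vecs, k=100):
    """Brute-force stand-in for a vector database's top-k search."""
    ranked = sorted(
        item_vecs,
        key=lambda item_id: cosine(query_vec, item_vecs[item_id]),
        reverse=True,
    )
    return ranked[:k]


def friend_favourites(friend_ratings, item_category, category, k=20):
    """Highest-rated items among friends, restricted to one category."""
    best = {}
    for item_id, rating in friend_ratings:
        if item_category.get(item_id) == category:
            best[item_id] = max(rating, best.get(item_id, rating))
    return sorted(best, key=best.get, reverse=True)[:k]


def gather_candidates(query_vec, item_vecs, friend_ratings, item_category,
                      category, k_similar=100, k_friends=20):
    """Union of both sources, de-duplicated, keeping first-seen order."""
    similar = most_similar(query_vec, item_vecs, k_similar)
    friends = friend_favourites(friend_ratings, item_category, category, k_friends)
    return list(dict.fromkeys(similar + friends))


# Toy data: 2-d vectors stand in for real item embeddings.
item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
item_category = {"a": "film", "b": "film", "c": "film", "d": "book"}
friend_ratings = [("c", 5), ("d", 5), ("a", 3)]  # (item_id, rating)

candidates = gather_candidates(
    [1.0, 0.0], item_vecs, friend_ratings, item_category, "film",
    k_similar=2, k_friends=1,
)
```

The result is an unranked pool; ordering it is the model's job.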

The ranked items from the model could then be passed to an LLM, which could re-rank them or remove any duff items.

Going even simpler - optimising for effort vs performance

The above is pretty simple, but you still have to train it etc., which might be time better spent elsewhere. You could go even simpler:

Get an LLM (high-dimensional) embedding of each item from a short text description, something like: "TITLE: ...; CREATOR: ...; CATEGORY: ...".
Do vector search on these, boosting items that friends liked recently.
Get an LLM to re-rank the results, picking say 3-5 best items.

That's probably an afternoon's work. No training, no model to maintain. Just a bit of text processing and vector search. Almost certainly not as good as the more involved approach, but maybe good enough? It depends what you need most at this stage, a quick implementation or better quality.
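
The three steps above could look roughly like this. Everything here is a sketch under assumptions: the embeddings are taken as already computed (tiny hand-written vectors stand in for them), the friend boost is an exponential decay with a made-up 30-day half-life and weight, and the last step only builds a prompt, since the actual LLM call depends on which provider you use.

```python
import math


def item_text(item):
    """Text to embed for one item, in the format suggested above."""
    return (f"TITLE: {item['title']}; CREATOR: {item['creator']}; "
            f"CATEGORY: {item['category']}")


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def recency_boost(days_since_like, half_life_days=30.0):
    """1.0 for a like made today, halving every `half_life_days`."""
    return 0.5 ** (days_since_like / half_life_days)


def search(query_vec, item_vecs, friend_likes, boost_weight=0.3, k=20):
    """Rank items by cosine similarity plus a boost for recent friend likes."""
    boosts = {}
    for item_id, days_ago in friend_likes:
        boosts[item_id] = boosts.get(item_id, 0.0) + recency_boost(days_ago)
    scores = {
        item_id: cosine(query_vec, vec) + boost_weight * boosts.get(item_id, 0.0)
        for item_id, vec in item_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rerank_prompt(user_summary, shortlist, items, n_best=3):
    """Build the text to send to an LLM for the final re-ranking step."""
    lines = [f"{n + 1}. {item_text(items[item_id])}"
             for n, item_id in enumerate(shortlist)]
    return (f"User profile: {user_summary}\n"
            f"Pick the {n_best} best recommendations from this list, in order:\n"
            + "\n".join(lines))


# Hypothetical items; the vectors stand in for real LLM embeddings of item_text().
items = {
    "a": {"title": "Item A", "creator": "Creator 1", "category": "film"},
    "b": {"title": "Item B", "creator": "Creator 2", "category": "film"},
    "c": {"title": "Item C", "creator": "Creator 3", "category": "book"},
}
item_vecs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
friend_likes = [("b", 0), ("c", 300)]  # (item_id, days since a friend liked it)

shortlist = search([1.0, 0.0], item_vecs, friend_likes, k=2)
prompt = rerank_prompt("likes science fiction films", shortlist, items)
```

Here the recent friend like lifts item "b" above the closer match "a", while the 300-day-old like on "c" has decayed to almost nothing.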

(You could go even smaller, i.e. a chi-squared test plus Cramér's V, but having done some research I think that's too small and doesn't have enough room to grow.)
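
For reference, Cramér's V is a 0-to-1 measure of association between two categorical variables, computed from the chi-squared statistic of their contingency table. A minimal standard-library sketch follows; the example table (a user segment against liked / not liked) is made up, and the function assumes no row or column of the table is all zeros.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (smaller_dim - 1)))


# Rows: two hypothetical user segments; columns: liked / did not like.
v_associated = cramers_v([[30, 10], [10, 30]])
v_independent = cramers_v([[20, 20], [20, 20]])
```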

cbowdon/sparse-recs

Folders and files

Latest commit

History

Repository files navigation

sparse-recs

Concept

Going even simpler - optimising for effort vs performance

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages