A benchmark that evaluates LLMs on 651 NYT Connections puzzles extended with extra trick words.
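As an illustration, scoring a Connections-style response can be as simple as counting exactly recovered groups; the repo's actual scoring may differ (a minimal sketch):

```python
def score_connections(proposed_groups, gold_groups):
    """Fraction of gold groups exactly recovered.

    Each group is an iterable of 4 words. With trick words added to the
    puzzle, correct play also means leaving the extra words ungrouped.
    Illustrative only; the repo's actual scoring may differ.
    """
    gold = {frozenset(g) for g in gold_groups}
    hits = sum(1 for g in proposed_groups if frozenset(g) in gold)
    return hits / len(gold)
```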
Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing economic game.
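For context, the payoff arithmetic of a standard contribute-and-punish round looks like the sketch below; the endowment, multiplier, and punishment ratio are illustrative defaults, not necessarily the benchmark's settings:

```python
def pgg_round_payoffs(contributions, punishments, endowment=10.0,
                      multiplier=2.0, punish_cost=1.0, punish_impact=3.0):
    """Per-player payoffs for one contribute-and-punish round.

    contributions: per-player contribution (0..endowment).
    punishments: punishments[i][j] = punishment points i assigns to j.
    Parameter values are assumptions for illustration.
    """
    n = len(contributions)
    pot_share = multiplier * sum(contributions) / n  # public pot, split evenly
    payoffs = []
    for i in range(n):
        paid = punish_cost * sum(punishments[i])  # cost of punishing others
        suffered = punish_impact * sum(punishments[j][i] for j in range(n))
        payoffs.append(endowment - contributions[i] + pot_share - paid - suffered)
    return payoffs
```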
A multi-player tournament benchmark that tests LLMs on social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate one another.
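A plurality vote with a random tie-break is one plausible elimination rule; the benchmark may resolve ties differently (a minimal sketch):

```python
import random
from collections import Counter

def tally_elimination(votes, rng=random):
    """Return the player eliminated by plurality vote.

    votes: dict voter -> target. The random tie-break is an assumption;
    the actual benchmark may use a runoff or another rule.
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    tied = sorted(p for p, c in counts.items() if c == top)
    return rng.choice(tied)
```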
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move each round.
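Assuming the commonly described rules for this kind of race (simultaneous choice among a few step sizes, with colliding choices voided), one turn resolves roughly as follows; the step options and the collision rule are assumptions:

```python
from collections import Counter

def resolve_turn(moves):
    """Resolve one simultaneous turn of the step race.

    moves: dict player -> chosen step count (assumed options such as 1, 3, 5).
    Assumed collision rule: players who pick the same number advance 0 steps.
    """
    counts = Counter(moves.values())
    return {p: (m if counts[m] == 1 else 0) for p, m in moves.items()}

# Two players collide on 5; only the third advances.
print(resolve_turn({"A": 5, "B": 5, "C": 3}))  # {'A': 0, 'B': 0, 'C': 3}
```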
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which items fit that theme.
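The evaluation loop presumably looks something like the sketch below; the prompt wording, the `llm` callable, and the answer parsing are hypothetical, not the repo's harness:

```python
def pick_matching_item(llm, examples, anti_examples, candidates):
    """Ask a model to infer the hidden theme and name the matching candidate.

    llm: hypothetical callable mapping a prompt string to a text reply.
    Returns the chosen candidate, or None if the reply matches none.
    """
    prompt = (
        "These items fit a hidden theme:\n" + "\n".join(examples)
        + "\n\nThese items do NOT fit it:\n" + "\n".join(anti_examples)
        + "\n\nWhich ONE of these candidates fits the theme? "
        "Reply with the item text only:\n" + "\n".join(candidates)
    )
    reply = llm(prompt).strip().lower()
    for c in candidates:
        if c.lower() == reply:
            return c
    return None
```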
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) into a short creative story.
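A literal inclusion check is the naive baseline for this kind of grading; a real grader presumably needs an LLM judge, since elements can be worked in by paraphrase (illustrative sketch):

```python
def missing_elements(story, required_elements):
    """Return the mandatory elements that never appear verbatim in the story.

    Surface matching only: an element incorporated by paraphrase would be
    missed, which is why judging likely falls to an LLM in practice.
    """
    lowered = story.lower()
    return [e for e in required_elements if e.lower() not in lowered]
```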
LLM Divergent Thinking Creativity Benchmark: LLMs generate 25 unique words that start with a given letter and have no connections to each other or to 50 initial random words.
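The mechanical constraints (count, uniqueness, starting letter) can be verified directly, as sketched below; judging semantic unrelatedness to the other words and to the 50 seed words is presumably left to an LLM grader:

```python
def check_surface_constraints(words, letter, n_required=25):
    """Check count, uniqueness, and starting letter; return a list of issues.

    Semantic unrelatedness is not checked here; that judgment is assumed
    to be made separately (e.g., by an LLM grader).
    """
    normalized = [w.strip().lower() for w in words]
    issues = []
    if len(normalized) != n_required:
        issues.append(f"expected {n_required} words, got {len(normalized)}")
    if len(set(normalized)) != len(normalized):
        issues.append("duplicate words present")
    bad = [w for w in normalized if not w.startswith(letter.lower())]
    if bad:
        issues.append(f"wrong starting letter: {bad}")
    return issues
```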
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation criteria.
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
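Assuming the items split into answerable questions (with human-verified gold answers) and unanswerable ones (where any confident answer is a confabulation), grading one item might look like this; the naive string match stands in for a judge model:

```python
def grade_grounded_item(gold_answer, model_answer, refused):
    """Grade one document-grounded QA item.

    gold_answer: human-verified answer string, or None if the question is
    unanswerable from the documents (an assumed item design).
    refused: whether the model declined to answer.
    """
    if gold_answer is None:
        return "correct_refusal" if refused else "confabulation"
    if refused:
        return "non_response"
    match = model_answer.strip().lower() == gold_answer.strip().lower()
    return "correct" if match else "incorrect"
```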