AlienKevin

Kevin Xiang Li AlienKevin

Researching CV and agents at the Stanford Vision and Language Lab

65 followers · 9 following

Lists (1)

PL

1 repository

Stars

wusize / Harmon

Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Python 98 1 Updated Apr 12, 2025

baaivision / EVE

EVE Series: Encoder-Free Vision-Language Models from BAAI

Python 326 8 Updated Mar 1, 2025

ssundaram21 / dreamsim

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight); When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)

Python 484 28 Updated Mar 28, 2025

LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Python 914 39 Updated Sep 27, 2024

AoiDragon / POPE

[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Python 83 11 Updated Mar 25, 2024

openai / guided-diffusion

Python 6,780 858 Updated Jul 2, 2024

mlsw / partial-embedding-matrix-adaptation

Vocabulary-level memory efficiency for language model fine-tuning.

Python 9 Updated Mar 24, 2025

SilentView / GigaTok

Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"

Python 145 1 Updated Apr 22, 2025

unslothai / unsloth-zoo

Utils for Unsloth

Python 85 93 Updated May 14, 2025

yinboc / dito

Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"

Python 113 4 Updated Jan 31, 2025

FoundationVision / UniTok

A Unified Tokenizer for Visual Generation and Understanding

Python 287 5 Updated May 6, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,555 217 Updated May 8, 2025

zh460045050 / V2L-Tokenizer

Python 132 9 Updated Jun 21, 2024

haoliuhl / language-quantized-autoencoders

Language Quantized AutoEncoders

Python 105 5 Updated Feb 7, 2023

End2End-Diffusion / REPA-E

Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Python 196 5 Updated Apr 16, 2025

sihyun-yu / REPA

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,037 45 Updated Mar 16, 2025

facebookresearch / mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,792 1,271 Updated Jul 23, 2024

facebookresearch / webssl

Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).

Python 119 8 Updated Apr 29, 2025

anishathalye / imagenet-simple-labels

Simpler human-readable labels for ImageNet 🏷

132 52 Updated Feb 1, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 6,340 540 Updated May 13, 2025

elehman16 / gpt4_bias

Jupyter Notebook 11 5 Updated Aug 6, 2024

FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,748 77 Updated Aug 15, 2024

ltgoslo / bert-in-context

Official implementation of "BERTs are Generative In-Context Learners"

Python 27 Updated Mar 14, 2025

huggingface / smollm

Everything about the SmolLM2 and SmolVLM family of models

Python 2,325 132 Updated Mar 31, 2025

apple / dmel-demo

dMel: Speech Tokenization Made Simple

HTML 11 1 Updated May 13, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python 11,747 1,483 Updated Apr 24, 2025

linzhiqiu / t2v_metrics

Evaluating text-to-image/video/3D models with VQAScore

Python 296 21 Updated May 5, 2025

lxa9867 / ImageFolder

High-performance Image Tokenizers for VAR and AR

Python 258 5 Updated Apr 25, 2025

qihao067 / CrossFlow

[CVPR2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution'

Python 167 2 Updated Mar 14, 2025

PanasonicConnect / VideoMultiAgents

VideoMultiAgents: A Multi-Agent Framework for Video Question Answering

Python 6 1 Updated May 7, 2025