AlienKevin (Kevin Xiang Li) / Starred · GitHub

Organizations

@AIwaffle

Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Python 98 1 Updated Apr 12, 2025

EVE Series: Encoder-Free Vision-Language Models from BAAI

Python 326 8 Updated Mar 1, 2025

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)

Python 484 28 Updated Mar 28, 2025

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Python 914 39 Updated Sep 27, 2024

[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Python 83 11 Updated Mar 25, 2024

Vocabulary-level memory efficiency for language model fine-tuning.

Python 9 Updated Mar 24, 2025

Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"

Python 145 1 Updated Apr 22, 2025

Utils for Unsloth

Python 85 93 Updated May 14, 2025

Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"

Python 113 4 Updated Jan 31, 2025

A Unified Tokenizer for Visual Generation and Understanding

Python 287 5 Updated May 6, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,555 217 Updated May 8, 2025
Python 132 9 Updated Jun 21, 2024

Language Quantized AutoEncoders

Python 105 5 Updated Feb 7, 2023

Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Python 196 5 Updated Apr 16, 2025

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,037 45 Updated Mar 16, 2025

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,792 1,271 Updated Jul 23, 2024

Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).

Python 119 8 Updated Apr 29, 2025

Simpler human-readable labels for ImageNet 🏷

132 52 Updated Feb 1, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 6,340 540 Updated May 13, 2025
Jupyter Notebook 11 5 Updated Aug 6, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,748 77 Updated Aug 15, 2024

Official implementation of "BERTs are Generative In-Context Learners"

Python 27 Updated Mar 14, 2025

Everything about the SmolLM2 and SmolVLM family of models

Python 2,325 132 Updated Mar 31, 2025

dMel: Speech Tokenization Made Simple

HTML 11 1 Updated May 13, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,747 1,483 Updated Apr 24, 2025

Evaluating text-to-image/video/3D models with VQAScore

Python 296 21 Updated May 5, 2025

High-performance Image Tokenizers for VAR and AR

Python 258 5 Updated Apr 25, 2025

[CVPR2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution'

Python 167 2 Updated Mar 14, 2025

VideoMultiAgents: A Multi-Agent Framework for Video Question Answering

Python 6 1 Updated May 7, 2025