Stanford University
- https://kevinx.li/
- https://orcid.org/0009-0005-6860-039X
- in/kevinxli
Stars
Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
EVE Series: Encoder-Free Vision-Language Models from BAAI
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
Vocabulary-level memory efficiency for language model fine-tuning.
Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"
A Unified Tokenizer for Visual Generation and Understanding
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Language Quantized AutoEncoders
Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
Simpler human-readable labels for ImageNet 🏷
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Official implementation of "BERTs are Generative In-Context Learners"
Everything about the SmolLM2 and SmolVLM family of models
Minimal reproduction of DeepSeek R1-Zero
Evaluating text-to-image/video/3D models with VQAScore
High-performance Image Tokenizers for VAR and AR
[CVPR2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution'
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering