KAUST, Saudi Arabia
UTC +03:00
https://xiaoqian-shen.github.io
@xiaoqian_shen
in/xiaoqian-shen-759991264
Stars
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
AI agents for trading
[ICML 2025] Official PyTorch implementation of LongVU
⚖️ The First Coding Agent-as-a-Judge
DeepSeek-VL: Towards Real-World Vision-Language Understanding
This is a book on the Phi family of SLMs for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
OpenEQA: Embodied Question Answering in the Era of Foundation Models
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]
Emu Series: Generative Multimodal Models from BAAI
Paint by Example: Exemplar-based Image Editing with Diffusion Models
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
[CVPR 2025] Official PyTorch implementation of StoryGPT-V
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation
[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
[ICLR 2024] Code for FreeNoise based on VideoCrafter
[SIGGRAPH Asia 2023] An interactive story visualization tool that supports multiple characters