ViStoryBench introduces a comprehensive and diverse benchmark for story visualization, enabling thorough evaluation of models across narrative complexity, character consistency, and visual style.
*(Demo video: `vistorybench-demo.mp4`)*
- [In progress] Release final code and complete project.
- [2025/06/02] Upload paper to arXiv.
- [2025/05/21] Initialize project and release code (semi-finished).
```bash
git clone --recursive  # clone this repository (with submodules)
cd ViStoryBench

conda create -n storyvisbmk python=3.11
conda activate storyvisbmk

conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt
```
The ViStory dataset contains 80 stories and 344 characters, in both Chinese and English.
Each story includes Plot Correspondence, Setting Description, Shot Perspective Design, On-Stage Characters, and Static Shot Descriptions.
Each character comes with at least one reference image and a corresponding prompt description.
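For illustration, a single story entry could be organized along these lines. The field names below are assumptions that only mirror the annotation types listed above, not the dataset's actual schema (see `dataset_load.py` for that):

```python
# Hypothetical sketch of one ViStory story entry; the real field names and
# structure may differ -- dataset_load.py defines the actual schema.
story_entry = {
    "story_id": "01",
    "language": "en",  # or "ch"
    "shots": [
        {
            "plot_correspondence": "...",      # which plot beat this shot covers
            "setting_description": "...",      # scene/environment text
            "shot_perspective_design": "...",  # camera perspective for the shot
            "on_stage_characters": ["..."],    # characters present in the shot
            "static_shot_description": "...",  # full text prompt for the image
        },
    ],
    "characters": {
        "character_name": {
            "prompt": "...",                 # character description
            "images": ["ref_image_01.jpg"],  # at least one reference image
        },
    },
}
```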
🔥 Download our ViStory dataset (🤗 Hugging Face) and put it into your local path:

```bash
</path/to/your/dataset>

# example
./data/dataset/ViStory/
```
Use our standardized loading script `dataset_load.py`, or write your own data loader:

```bash
python vistorybench/data_process/dataset_process/dataset_load.py
```
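If you write your own loader, a minimal sketch along these lines should work, assuming the default `./data/dataset/ViStory/` layout with one subdirectory per story (adapt the paths to the actual dataset layout):

```python
from pathlib import Path

# Minimal custom-loader sketch; assumes one subdirectory per story under
# the dataset root. Adjust to the actual ViStory layout if it differs.
def load_stories(dataset_root="./data/dataset/ViStory"):
    root = Path(dataset_root)
    stories = {}
    for story_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        stories[story_dir.name] = {
            "path": story_dir,
            "files": sorted(f.name for f in story_dir.iterdir()),
        }
    return stories

if __name__ == "__main__":
    for story_id, info in load_stories().items():
        print(story_id, len(info["files"]), "files")
```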
Based on `dataset_load.py`, convert the ViStory/ViStory-lite dataset into your method's required input format (the converted dataset will be saved to `data/dataset_processed/your_method_name/`), then adapt your method's inference code for story visualization (we suggest saving generated results to `data/outputs/your_method_name/`).
Example for UNO:

```bash
python vistorybench/data_process/dataset_process/adapt2uno.py \
    --language 'en'  # choices: ['en', 'ch']
```
Adapters for other methods (see the sketch below for the common pattern they follow): `adapt2seedstory.py`, `adapt2storyadapter.py`, `adapt2storydiffusion.py`, `adapt2storygen.py`, `adapt2vlogger.py`, `adapt2animdirector.py`.
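Each adapter follows the same basic pattern: read the loaded ViStory entries and write them out in the target method's input format. A schematic sketch, not the actual adapter code (the function and field names here are hypothetical and reuse the illustrative schema from above):

```python
import json
from pathlib import Path

# Schematic adapter sketch: convert loaded stories into a per-method input
# format under data/dataset_processed/<method>/. Field names are hypothetical.
def adapt(stories, method_name="your_method_name", language="en"):
    out_root = Path("data/dataset_processed") / method_name
    for story_id, story in stories.items():
        out_dir = out_root / story_id
        out_dir.mkdir(parents=True, exist_ok=True)
        # Write one prompt list per story in whatever format the method expects.
        prompts = [shot["static_shot_description"] for shot in story["shots"]]
        (out_dir / f"prompts_{language}.json").write_text(
            json.dumps(prompts, ensure_ascii=False, indent=2)
        )
```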
Make sure your output results are organized according to the following folder structure:
```
data/outputs/
└── method_name/
    └── dataset_name/
        └── story_id/
            └── timestamp/
                ├── shot_XX.jpg
                └── ...
```
- `method_name`: The model used (e.g., StoryDiffusion, UNO, GPT4o, etc.)
- `dataset_name`: The dataset used (e.g., ViStory_en)
- `story_id`: The story identifier (e.g., 01, 02, etc.)
- `timestamp`: Generation run timestamp (YYYYMMDD-HHMMSS)
- `shot_XX.jpg`: Generated image for the shot
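On the generation side, a small helper along these lines keeps saved results conformant to this layout. An illustrative sketch only; `make_run_dir` is a hypothetical helper, not part of the repo:

```python
import time
from pathlib import Path

# Illustrative sketch: build the standardized output path
# data/outputs/<method>/<dataset>/<story_id>/<timestamp>/shot_XX.jpg
def make_run_dir(method_name, dataset_name, story_id, root="data/outputs"):
    timestamp = time.strftime("%Y%m%d-%H%M%S")  # YYYYMMDD-HHMMSS
    run_dir = Path(root) / method_name / dataset_name / story_id / timestamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir

# Example usage with PIL-style images (replace with your model's outputs):
# run_dir = make_run_dir("uno", "ViStory_en", "01")
# for i, image in enumerate(generated_images, start=1):
#     image.save(run_dir / f"shot_{i:02d}.jpg")
```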
When you run the evaluation code, it automatically reads the data (make sure both the ViStoryBench dataset and the generated results follow the standard directory structures specified above). The code for reading generated results is uniformly integrated into:

`vistorybench/data_process/outputs_read/read_outputs.py`
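For reference, the traversal such a reader performs looks roughly like this. An illustrative sketch, not the actual `read_outputs.py` implementation:

```python
from pathlib import Path

# Illustrative sketch of reading generated results back; not the actual
# read_outputs.py implementation. Picks the latest timestamped run per story.
def read_outputs(method_name, dataset_name, root="data/outputs"):
    results = {}
    dataset_dir = Path(root) / method_name / dataset_name
    for story_dir in sorted(d for d in dataset_dir.iterdir() if d.is_dir()):
        runs = sorted(d for d in story_dir.iterdir() if d.is_dir())
        if runs:  # YYYYMMDD-HHMMSS timestamps sort lexicographically
            results[story_dir.name] = sorted(runs[-1].glob("shot_*.jpg"))
    return results
```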
Example for UNO:

```bash
cd vistorybench/bench

sh bench_run.sh 'uno'                                # data integrity check only
sh bench_run.sh 'uno' --all                          # run all evaluations
sh bench_run.sh 'uno' --cref                         # content consistency eval
sh bench_run.sh 'uno' --cref --csd_cross --csd_self  # content and style consistency eval
sh bench_run.sh 'uno' --save_format                  # standardize the generated-results file structure
```
Individual evaluation flags:

- `--cref`: content (character) consistency
- `--csd_cross`: cross-image style consistency (CSD)
- `--csd_self`: self style consistency (CSD)
- `--aesthetic`: aesthetic quality
- `--prompt_align2`: prompt alignment
- `--diversity`: generation diversity
Supported method names, grouped by category:

```python
STORY_IMG = ['uno', 'seedstory', 'storygen', 'storydiffusion', 'storyadapter', 'theatergen']
STORY_VIDEO = ['movieagent', 'animdirector', 'vlogger', 'mmstoryagent']
CLOSED_SOURCE = ['gemini', 'gpt4o']
BUSINESS = ['moki', 'morphic_studio', 'bairimeng_ai', 'shenbimaliang', 'xunfeihuiying', 'doubao']
```
```bibtex
@article{zhuang2025vistorybench,
  title={ViStoryBench: Comprehensive Benchmark Suite for Story Visualization},
  author={Cailin Zhuang and Ailin Huang and Wei Cheng and Jingwei Wu and Yaoqi Hu and Jiaqi Liao and Zhewei Huang and Hongyuan Wang and Xinyao Liao and Weiwei Cai and Hengyuan Xu and Xuanyang Zhang and Xianfang Zeng and Gang Yu and Chi Zhang},
  journal={arXiv preprint arXiv:2505.24862},
  year={2025}
}
```