csuhan/Tar: Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang

Project Lead   Corresponding Authors

Project Page | Tar Paper on arXiv | Huggingface Model | Huggingface Space | Huggingface Space

News

  • June 2025. We are applying for approval to open-source the code and will release the code and model as soon as it is granted.

Citation

@article{han2025tar,
  title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations}, 
  author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
  journal={arXiv preprint arXiv:2506.18898},
  year={2025},
}

License

This project is licensed under the Apache 2.0 License.
