Quantifying the Gap between Understanding and Generation within Unified Multimodal Models
Published in CVPR 2026 Findings, 2026
We extend MLLM-as-a-Judge across multiple modalities and present the TaskAnything and JudgeAnything benchmarks, which reveal that MLLM-as-a-Judge excels at judging MMU tasks but struggles with MMG tasks.
Recommended citation:
@misc{wang2026quantifyinggapunderstandinggeneration,
  title={Quantifying the Gap between Understanding and Generation within Unified Multimodal Models},
  author={Chenlong Wang and Yuhang Chen and Zhihan Hu and Dongping Chen and Wenhu Chen and Sarah Wiegreffe and Tianyi Zhou},
  year={2026},
  eprint={2602.02140},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.02140}
}
