Publications

You can also find my articles on my Google Scholar profile.

Conference Papers

Quantifying the Gap between Understanding and Generation within Unified Multimodal Models

Published in CVPR 2026 Findings, 2026

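We introduce GapEval, a symmetric evaluation framework that measures the bidirectional inference consistency of Unified Multimodal Models. The findings challenge the assumption that these models are truly integrated, showing that understanding and generation capabilities are often unaligned. By revealing that knowledge within these models is disjointed and unsynchronized, the work offers a critical perspective on the limitations of current “unified” architectures and calls for deeper cognitive convergence in future AI development.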

Recommended citation: @misc{wang2026quantifyinggapunderstandinggeneration, title={Quantifying the Gap between Understanding and Generation within Unified Multimodal Models}, author={Chenlong Wang and Yuhang Chen and Zhihan Hu and Dongping Chen and Wenhu Chen and Sarah Wiegreffe and Tianyi Zhou}, year={2026}, eprint={2602.02140}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2602.02140}, }

Paper2Web: Let’s Make Your Paper Alive!

In submission, 2025

We introduce Paper2Web, the first benchmark for assessing academic webpage generation, and PWAgent, an autonomous system designed to bridge the gap between static PDFs and interactive project sites. Using an iterative refinement process and MCP-driven tools, PWAgent generates layout-aware, multimedia-rich homepages that balance aesthetics and information density. Experimental results show that this agent-driven approach outperforms both end-to-end LLM generation and existing web-conversion templates, offering a high-quality, low-cost solution for researchers.

Recommended citation: @misc{chen2025paper2webletsmakepaper, title={Paper2Web: Let's Make Your Paper Alive!}, author={Yuhang Chen and Tianpeng Lv and Siyi Zhang and Yixiang Yin and Yao Wan and Philip S. Yu and Dongping Chen}, year={2025}, eprint={2510.15842}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2510.15842}, }

Judge Anything: MLLM as a Judge Across Any Modality

Published in KDD 2025 Datasets and Benchmarks Track, 2025

We extend MLLM-as-a-Judge across multiple modalities and present the TaskAnything and JudgeAnything benchmarks, which reveal that MLLM judges excel at judging multimodal understanding (MMU) tasks but struggle with multimodal generation (MMG) tasks.

Recommended citation: @article{pu2025judge, title={Judge Anything: MLLM as a Judge Across Any Modality}, author={Pu, Shu and Wang, Yaochen and Chen, Dongping and Chen, Yuhang and Wang, Guohao and Qin, Qi and Zhang, Zhongyi and Zhang, Zhiyuan and Zhou, Zetong and Gong, Shuang and others}, journal={arXiv preprint arXiv:2503.17489}, year={2025} }