Peng "Richard" Xia (夏鹏)

I am a Ph.D. student in the Department of Computer Science at UNC-Chapel Hill, advised by Prof. Huaxiu Yao. Before that, I was briefly enrolled (2023-2024) as a Ph.D. student at Monash University, advised by A/Prof. Zongyuan Ge. I received my B.Eng. degree from the AI Experimental Class, School of Computer Science and Technology, Soochow University, in 2023.

I am deeply interested in multimodal (e.g., vision, language) LLMs and agents, and in their applications to health and broader science. My recent research focuses on retrieval-based methods, which I aim to make a fundamental part of next-generation vision-language models to improve their factuality, adaptability, and trustworthiness.

I am always open to collaboration. Feel free to drop me an e-mail. :-)

Email: richard.peng.xia AT gmail DOT com; pxia AT cs DOT unc DOT edu


News

  • Jan.2025: Three papers were accepted by ICLR 2025 and MMIE was selected as an oral presentation.

  • Dec.2024: Gave an invited talk at Cohere For AI; one paper was accepted by COLING 2025 and two papers were accepted by AAAI 2025.

  • Sep.2024: One paper was accepted by NeurIPS 2024 and one paper was accepted by EMNLP 2024.

  • Jul.2024: One paper was accepted by ECCV 2024.

  • Jun.2024: Two papers were accepted by MICCAI 2024, one of which was early accepted.

  • Sep.2023: One paper was accepted by NeurIPS 2023.

  • Aug.2022: Shared a paper list on multi-modal learning in medical imaging.

Selected Publications (Full Publications)

MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu*, Peng Xia*, Yun Li, Hongtu Zhu, Sheng Wang, Huaxiu Yao
arXiv preprint, 2024. [Paper] [Code]

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang, Linjun Zhang, James Zou, Huaxiu Yao
International Conference on Learning Representations (ICLR), 2025. | AFM and SafeGenAI Workshop at NeurIPS, 2024 [Paper] [Code] [Marktechpost Video] [Marktechpost News]
MMIE: Massive Multimodal Interleaved Comprehension Benchmark For Large Vision-Language Models
Peng Xia*, Siwei Han*, Shi Qiu*, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao
International Conference on Learning Representations (ICLR), 2025. (Oral Presentation) | AFM Workshop at NeurIPS, 2024 [Paper] [Code] [Project Page]
AnyPrefer: An Automatic Framework for Preference Data Synthesis
Yiyang Zhou*, Zhaoyang Wang*, Tianle Wang*, Shangyu Xing, Peng Xia, Bo Li, Kaiyuan Zheng, Zijian Zhang, Zhaorun Chen, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Weitong Zhang, Ying Wei, Mohit Bansal, Huaxiu Yao
International Conference on Learning Representations (ICLR), 2025. | AFM Workshop at NeurIPS, 2024 [Paper]
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Peng Xia*, Kangyu Zhu*, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao
The Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [Paper] [Code] [Talk]
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao
The Conference on Neural Information Processing Systems (NeurIPS), 2024 [Paper] [Code] [Project Page]

During Ph.D. Journey at Monash

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Peng Xia, Xingtong Yu, Ming Hu, Lie Ju, Zhiyong Wang, Peibo Duan, Zongyuan Ge
International Conference on Computational Linguistics (COLING), 2025 [Paper] [Code]
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Ming Hu*, Peng Xia*, Lin Wang*, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge
European Conference on Computer Vision (ECCV), 2024 [Paper] [Code] [Project Page]
Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations
Peng Xia, Ming Hu, Feilong Tang, Wenxue Li, Wenhao Zheng, Lie Ju, Peibo Duan, Huaxiu Yao, Zongyuan Ge
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024 (Early Accepted) [Paper]
TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM
Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024 [Paper]
Towards Realistic Semi-supervised Medical Image Classification
Wenxue Li, Lie Ju, Feilong Tang, Peng Xia, Xinyu Xiong, Ming Hu, Lei Zhu, Zongyuan Ge
AAAI Conference on Artificial Intelligence (AAAI), 2025 [Paper]
Neighbor Does Matter: Density-Aware Contrastive Learning for Medical Semi-supervised Segmentation
Feilong Tang, Zhongxing Xu, Ming Hu, Wenxue Li, Peng Xia, Yiheng Zhong, Hanjun Wu, Jionglong Su, Zongyuan Ge
AAAI Conference on Artificial Intelligence (AAAI), 2025 [Paper]
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
Peng Xia, Di Xu, Ming Hu, Lie Ju, Zongyuan Ge
ALVR Workshop @ Annual Meeting of the Association for Computational Linguistics (ACL), 2024 [Paper] [Code]
NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding
Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge
The Conference on Neural Information Processing Systems (NeurIPS), 2023 [Paper] [Code]

Patents


Invited Talks


Press

  • Oct. 2024: "MMed-RAG: A Versatile Multimodal Retrieval-Augmented Generation System Transforming Factual Accuracy in Medical Vision-Language Models Across Multiple Domains" was covered by MarkTechPost. [Video] [News]


Selected Honors & Awards

  • ICLR Oral Presentation (Top 1.8%), 2025

  • MSRI Graduate Scholarship, MSRI Living Stipend, 2023-2024

  • Third Place, Shanghai-HK Interdisciplinary Shared Tasks Task 1, 2022

  • Second Prize, The 3rd Huawei DIGIX AI Algorithm Contest, 2021

  • Honorable Mention, Mathematics Contest in Modeling, 2021


Academic Services

  • Student Volunteer: EMNLP (2024)

  • Journal/Conference Reviewer: NeurIPS (2024), NeurIPS D&B Track (2024), ICML (2024-2025), ICLR (2025), CVPR (2025), MICCAI (2024-2025), WACV (2025), ACL Rolling Review (ARR) (2024-2025), International Journal of Computer Vision (IJCV), IEEE Transactions on Medical Imaging (TMI), Stat

© Peng Xia