📝Publications & Preprints

* indicates equal contribution; indicates corresponding authorship.

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models.
P. Xia, K. Zhu, H. Li, T. Wang, W. Shi, L. Zhang, J. Zou, H. Yao.
arXiv preprint, 2024.
A short version is presented at the NeurIPS 2024 Workshop on Adaptive Foundation Models and Safe Generative AI.
Paper  ·  Code

Massive Multimodal Interleaved Comprehension Benchmark For Large Vision-Language Models.
P. Xia*, S. Han*, S. Qiu*, Y. Zhou, Z. Wang, W. Zheng, Z. Chen, C. Cui, M. Ding, L. Li, L. Wang, H. Yao.
arXiv preprint, 2024.
A short version is presented at the NeurIPS 2024 Workshop on Adaptive Foundation Models.
Paper  ·  Code

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models.
P. Xia*, K. Zhu*, H. Li, H. Zhu, Y. Li, G. Li, L. Zhang, H. Yao.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Paper  ·  Code

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models.
P. Xia, Z. Chen, J. Tian*, Y. Gong*, R. Hou, Y. Xu, Z. Wu, Z. Fan, Y. Zhou, K. Zhu, W. Zheng, Z. Wang, X. Wang, X. Zhang, C. Bansal, M. Niethammer, J. Huang, H. Zhu, Y. Li, Z. Ge, J. Sun, G. Li, J. Zou, H. Yao.
Conference on Neural Information Processing Systems (NeurIPS), 2024.
A short version is presented at the ICML 2024 Workshop on Foundation Models in the Wild.
Paper  ·  Code

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations.
P. Xia*, M. Hu*, F. Tang, W. Li, W. Zheng, L. Ju, P. Duan, H. Yao, Z. Ge.
Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024. (Early Accept, Top 11%)
Paper  ·  Code

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding.
M. Hu*, P. Xia*, L. Wang*, S. Yan, F. Tang, Z. Xu, Y. Luo, K. Song, J. Leitner, X. Cheng, J. Cheng, C. Liu, K. Zhou, Z. Ge.
European Conference on Computer Vision (ECCV), 2024.
Paper  ·  Code



🎨Patents

  • Article quality discrimination software based on multi-model transfer pre-training.
    J. Li, P. Xia, K. Zeng, et al.
    CN Software Copyright. 2022SR0228307. (Granted)
  • Lane detection system based on cascaded convolutional neural network.
    J. Li, K. Zeng, P. Xia.
    CN Software Copyright. 2022SR0248890. (Granted)