📝Publications & Preprints

* indicates equal contribution.

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models.
P. Xia*, K. Zhu*, H. Li, H. Zhu, Y. Li, G. Li, L. Zhang, H. Yao.
arXiv preprint, 2024.
Paper  ·  Code

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models.
P. Xia, Z. Chen, J. Tian*, Y. Gong*, R. Hou, Y. Xu, Z. Wu, Z. Fan, Y. Zhou, K. Zhu, W. Zheng, Z. Wang, X. Wang, X. Zhang, C. Bansal, M. Niethammer, J. Huang, H. Zhu, Y. Li, Z. Ge, J. Sun, G. Li, J. Zou, H. Yao.
arXiv preprint, 2024.
A short version was presented at the ICML 2024 Workshop on Foundation Models in the Wild.
Paper  ·  Code

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations.
P. Xia*, M. Hu*, F. Tang, W. Li, W. Zheng, L. Ju, P. Duan, H. Yao, Z. Ge.
Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024. (Early Accept, Top 11%)
Paper  ·  Code

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding.
P. Xia, X. Yu, M. Hu, L. Ju, Z. Wang, P. Duan, Z. Ge.
arXiv preprint, 2023.
A short version was presented at the ACL 2024 Workshop on Advances in Language and Vision Research (ALVR).
Paper  ·  Code

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding.
M. Hu*, P. Xia*, L. Wang*, S. Yan, F. Tang, Z. Xu, Y. Luo, K. Song, J. Leitner, X. Cheng, J. Cheng, C. Liu, K. Zhou, Z. Ge.
European Conference on Computer Vision (ECCV), 2024.
Paper  ·  Code

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-Tailed Multi-Label Visual Recognition.
P. Xia, D. Xu, M. Hu, L. Ju, Z. Ge.
ACL 2024 Workshop on Advances in Language and Vision Research (ALVR).
Paper  ·  Code

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding.
M. Hu*, L. Wang*, S. Yan*, D. Ma*, Q. Ren, P. Xia, W. Feng, P. Duan, L. Ju, Z. Ge.
Conference on Neural Information Processing Systems (NeurIPS), 2023.
Paper  ·  Code



🎨Patents

  • Article quality discrimination software based on multi-model transfer pre-training.
    J. Li, P. Xia, K. Zeng, et al.
    CN Software Copyright. 2022SR0228307. (Granted)
  • Lane detection system based on cascaded convolutional neural network.
    J. Li, K. Zeng, P. Xia.
    CN Software Copyright. 2022SR0248890. (Granted)