📝 Publications
👁️ Multimodal Learning & Visual Reasoning
CoFFT: Chain of Foresight-Focus Thought for Visual Language Models
Xinyu Zhang, et al., NeurIPS 2025 (CCF-A)
- Proposed a training-free method to adaptively adjust visual focus for VLMs.
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang, et al., ACL 2025 (CCF-A)
- Introduced the first benchmark specifically targeting physics-based visual reasoning.
Alignment Relation is What You Need for Diagram Parsing
Xinyu Zhang, et al., IEEE-TIP 2024 (CCF-A, SCI-Q1)
- Focused on curriculum-level diagram parsing and question generation.
📑 Selected Conference & Journal Papers
- [Beyond Layer-wise Merging: Dynamic Chain-of-Merging for VLM], Xinyu Zhang, et al., CVPR 2026 (Under Review/Accepted)
- [Cognitive Predictive Coding Network], Xinyu Zhang, et al., ACM-MM 2025 (CCF-A)
- [Memory-Enriched Thought-by-Thought Framework (METbT)], Xinyu Zhang, et al., CVIU 2025 (CCF-B)
- [RPMG-FSS: Robust Prior Mask Guided Few-Shot Semantic Segmentation], Xinyu Zhang, et al., IEEE-TCSVT 2023 (CCF-B)
- [EvoChart: A Benchmark Towards Real-World Chart Understanding], (Co-author), AAAI 2025
- [CoG-DQA: Chain-of-Guiding Learning for DQA], (Co-author), CVPR 2024