Publications

Publications by category in reverse chronological order.

2025

  1. arXiv’25/05
    HoliTom: Holistic Token Merging for Fast Video Large Language Models
    arXiv preprint arXiv:2505.21334, 2025
  2. arXiv’25/05
    Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
    Sicheng Feng*, Song Wang*, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, and Xinchao Wang
    arXiv preprint arXiv:2505.18675, 2025
  3. CVPR’25
    DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
    CVPR, 2025
  4. ICLR’25
    Accessing Vision Foundation Models at ImageNet-level Costs
    ICLR, 2025

2024

  1. arXiv’24/11
    Is Oracle Pruning the True Oracle?
    arXiv preprint arXiv:2412.00143, 2024
  2. ACM MM’24
    Towards Real-time Video Compressive Sensing on Mobile Devices
    Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, and Xin Yuan
    ACM MM, 2024
  3. ECCV’24 Oral
    A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
    Miao Cao, Lishun Wang, Huan Wang, and Xin Yuan
    ECCV, 2024