Publications

Publications by category, in reverse chronological order.

2025

  1. NeurIPS’25
    HoliTom: Holistic Token Merging for Fast Video Large Language Models
    NeurIPS, 2025
  2. NeurIPS’25
    Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
    Kejia Zhang, Keda Tao, Jiasheng Tang, and Huan Wang
    NeurIPS, 2025
  3. arXiv’25/07
    When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
    arXiv preprint arXiv:2507.20198, 2025
  4. arXiv’25/05
    Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
    Sicheng Feng*, Song Wang*, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, and Xinchao Wang
    arXiv preprint arXiv:2505.18675, 2025
  5. CVPR’25
    DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
    CVPR, 2025
  6. ICLR’25
    Accessing Vision Foundation Models at ImageNet-level Costs
    ICLR, 2025

2024

  1. arXiv’24/11
    Is Oracle Pruning the True Oracle?
    arXiv preprint arXiv:2412.00143, 2024
  2. ACM MM’24
    Towards Real-time Video Compressive Sensing on Mobile Devices
    Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, and Xin Yuan
    ACM MM, 2024
  3. ECCV’24 Oral
    A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
    Miao Cao, Lishun Wang, Huan Wang, and Xin Yuan
    ECCV, 2024