ENCODE Lab


Yungu Campus

Westlake University

Hangzhou, China

About Us:
The ENCODE Lab is led by Dr. Huan Wang, a Tenure-Track Assistant Professor in the AI Department at Westlake University. Our lab is dedicated to advancing the field of Artificial Intelligence by focusing on creating efficient and effective AI solutions.

Research Focus:
Our research focuses on Efficient AI in vision and language modeling, spanning image classification / detection / segmentation [GReg, PaI-Survey, TPP], neural style transfer [Ultra-Resolution-NST], single image super-resolution [ASSL/GASSL, SRP, ISS-P, Oracle-Pruning-Sanity-Check], 3D novel view synthesis / neural rendering / NeRF / NeLF [R2L, MobileR2L, LightAvatar], AIGC / diffusion models / Stable Diffusion [SnapFusion, FreeBlend], LLM / MLLM [DyCoke, Poison-as-Cure], and snapshot compressive imaging (SCI) [QuantizedSCI, MobileSCI].

Our Mission:
Our mission is to advance AI by creating efficient, broadly applicable methods and models. We’re dedicated to driving both theoretical innovation and tangible solutions for diverse real-world problems.

News

2025/02 [CVPR'25] DyCoke is accepted by CVPR’25! Congrats to Keda!🎉 DyCoke is a training-free, plug-and-play token compression method for fast video LLMs: 1.5x wall-clock inference speedup and 1.4x memory reduction with no performance drop. [arxiv] [code]
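To make "training-free token compression" concrete, here is a minimal sketch of the generic idea of attention-based token pruning for video LLMs. This is an illustrative heuristic only, not the actual DyCoke algorithm; the function name and the keep-top-k-by-attention criterion are assumptions for the example.

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, attn: np.ndarray, keep_ratio: float = 0.5):
    """Generic training-free token compression sketch (NOT DyCoke itself).

    tokens: (N, D) token embeddings; attn: (N, N) attention matrix.
    Keeps the top-k tokens by average attention received, a common
    plug-and-play heuristic for shrinking the video token sequence.
    """
    scores = attn.mean(axis=0)                    # avg attention each token receives
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[::-1][:k])  # top-k indices, original order
    return tokens[keep], keep

rng = np.random.default_rng(0)
toks = rng.normal(size=(8, 4))   # 8 video tokens, 4-dim embeddings
attn = rng.random((8, 8))        # toy attention matrix
kept, idx = prune_tokens(toks, attn, keep_ratio=0.5)
print(kept.shape)  # (4, 4)
```

Because the criterion only reads attention scores produced at inference time, no retraining is needed, which is what makes this family of methods plug-and-play.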
2025/02 [Preprint] Can diffusion models blend visual concepts that are semantically very dissimilar (e.g., an orange and a teddy bear)? Yes: we introduce FreeBlend, a new method that blends arbitrary concepts. [arxiv] [code] [webpage]
2025/01 [Preprint] Is adversarial visual noise always "poison" to our models? No: we find it can also be a cure that mitigates the hallucination problem of VLMs. [arxiv] [code] [webpage]
2025/01 [ICLR'25] One paper on distilling large foundation models at low cost, "Compressing Vision Foundation Models at ImageNet-level Costs", is accepted by ICLR'25. Thanks to the lead author Yitian!
2024/12 [Preprint] We present empirical evidence to show that oracle pruning, the “ground-truth” pruning paradigm that has been followed for around 35 years in the pruning community, does not hold in practice. [arxiv][webpage]
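For context, the classic oracle criterion removes the parameter whose deletion least increases the loss. The toy example below illustrates that criterion on a linear model; it is a generic illustration of the paradigm the paper questions, not the paper's experiments, and the function names are assumptions.

```python
import numpy as np

def loss(w, x, y):
    """Mean squared error of a linear model."""
    return float(np.mean((x @ w - y) ** 2))

def oracle_prune_one(w, x, y):
    """Oracle criterion: zero out the single weight whose removal
    keeps the loss lowest, found by exhaustively trying each one."""
    best_i, best_loss = None, np.inf
    for i in range(len(w)):
        trial = w.copy()
        trial[i] = 0.0
        l = loss(trial, x, y)
        if l < best_loss:
            best_i, best_loss = i, l
    pruned = w.copy()
    pruned[best_i] = 0.0
    return pruned, best_i

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 5))
w_true = np.array([2.0, 0.0, -1.0, 0.5, 3.0])  # second weight is ~useless
y = x @ w_true
w = w_true + 0.01 * rng.normal(size=5)
pruned, i = oracle_prune_one(w, x, y)  # picks the near-zero weight (index 1)
```

The paper's finding is that this locally optimal, exhaustive choice, despite its "ground-truth" status in the community, does not reliably predict the best model after fine-tuning.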
2024/07 [NeurIPS'24] We introduce Scala, a training framework for learning slimmable ViTs. With Scala, a ViT is trained once but can run inference at different widths, matching the needs of devices with different resource budgets. The project is led by Yitian. Congrats!
2024/07 [MM'24] We present the first real-time on-device video SCI (Snapshot Compressive Imaging) framework via dedicated network design and a distillation-based training strategy. Congrats to Miao!
2024/07 [ECCV'24] One paper about efficient video SCI (Snapshot Compressive Imaging) via network quantization is accepted by ECCV’24 as an oral. Congrats to Miao! [code]

Selected Publications

  1. arXiv’25/05
    HoliTom: Holistic Token Merging for Fast Video Large Language Models
    arXiv preprint arXiv:2505.21334, 2025
  2. arXiv’25/05
    Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
    Sicheng Feng*, Song Wang*, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, and Xinchao Wang
    arXiv preprint arXiv:2505.18675, 2025
  3. CVPR’25
    DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
    CVPR, 2025
  4. ICLR’25
    Accessing Vision Foundation Models at ImageNet-level Costs
    ICLR, 2025
  5. arXiv’24/11
    Is Oracle Pruning the True Oracle?
    arXiv preprint arXiv:2412.00143, 2024
  6. ACM MM’24
    Towards Real-time Video Compressive Sensing on Mobile Devices
    Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, and Xin Yuan
    ACM MM, 2024
  7. ECCV’24 Oral
    A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
    Miao Cao, Lishun Wang, Huan Wang, and Xin Yuan
    ECCV, 2024