2025/09 | [NeurIPS'25] Four papers accepted by NeurIPS 2025 in the field of efficient and reliable AI. Congrats to my students and collaborators! 🎉 Two of them are public already: - HoliTom: a top-performing token compression method for video LLMs that maintains 99.1% of the performance while reducing FLOPs to only 6.9%. And it's training-free! [arxiv] [code] [webpage]
- Poison as Cure: adversarial visual noise, usually treated as “poison” for our models, can also serve as a cure that mitigates hallucination in VLMs (see the 2025/01 preprint entry below). [arxiv] [code] [webpage]
|
2025/06 | [Award-to-Students] 🎉 Congrats to my PhD student Keda Tao on receiving the “2025 Westlake University Xinrui Award (西湖大学博士研究生新锐奖)” (only 2 recipients in AI among all the 2025 Fall PhD students in the School of Engineering). |
2025/02 | [CVPR'25] DyCoke is accepted by CVPR'25! Congrats to Keda! 🎉 DyCoke is a training-free, plug-and-play token compression method for fast video LLMs: a 1.5x wall-clock inference speedup and a 1.4x memory reduction with no performance drop. [arxiv] [code] |
2025/02 | [Preprint] Can diffusion models blend visual concepts that are semantically very dissimilar (e.g., an orange and a teddy bear)? Yes: we introduce FreeBlend, a new method for blending arbitrary concepts. [arxiv] [code] [webpage] |
2025/01 | [Preprint] Is adversarial visual noise always malicious to our models, like “poison”? No, we find it can also be a cure that mitigates the hallucination problem of VLMs. [arxiv] [code] [webpage] |
2025/01 | [ICLR'25] One paper on distilling large foundation models at low cost, “Compressing Vision Foundation Models at ImageNet-level Costs”, is accepted by ICLR'25. Thanks to the lead author, Yitian! |
2024/12 | [Preprint] We present empirical evidence that oracle pruning, the “ground-truth” pruning paradigm followed for around 35 years in the pruning community, does not hold in practice. [arxiv] [webpage] |
2024/07 | [NeurIPS'24] We introduce Scala, a training framework for learning slimmable ViTs. With Scala, a ViT model is trained once but can run inference at different widths, matching the needs of devices with different resources. The project is led by Yitian. Congrats! |
2024/07 | [MM'24] We present the first real-time, on-device video SCI (Snapshot Compressive Imaging) framework, built via a dedicated network design and a distillation-based training strategy. Congrats to Miao! |
2024/07 | [ECCV'24] One paper on efficient video SCI (Snapshot Compressive Imaging) via network quantization is accepted by ECCV'24 as an oral. Congrats to Miao! [code] |