2025/02 | [CVPR'25] DyCoke is accepted by CVPR'25! Congrats to Keda! 🎉 DyCoke is a training-free, plug-and-play token compression method for fast video LLMs, delivering a 1.5x wall-clock inference speedup and 1.4x memory reduction with no performance drop. [arxiv] [code] |
2025/02 | [Preprint] Can diffusion models blend visual concepts that are semantically very dissimilar (e.g., an orange and a teddy bear)? Yes: we introduce FreeBlend, a new method to blend arbitrary concepts. [arxiv] [code] [webpage] |
2025/01 | [Preprint] Is adversarial visual noise always malicious to our models, like “poison”? No, we find it can also be a cure that mitigates the hallucination problem of VLMs. [arxiv] [code] [webpage] |
2025/01 | [ICLR'25] One paper on distilling large foundation models at low cost, "Compressing Vision Foundation Models at ImageNet-level Costs," is accepted by ICLR'25. Thanks to the lead author Yitian! |
2024/12 | [Preprint] We present empirical evidence that oracle pruning, the “ground-truth” pruning paradigm followed for around 35 years in the pruning community, does not hold in practice. [arxiv] [webpage] |
2024/07 | [NeurIPS'24] We introduce Scala, a training framework for learning slimmable ViTs. With Scala, a ViT model is trained once but can run inference at different widths, matching the needs of devices with different resource budgets. The project is led by Yitian. Congrats! |
2024/07 | [MM'24] We present the first real-time on-device video SCI (Snapshot Compressive Imaging) framework via a dedicated network design and a distillation-based training strategy. Congrats to Miao! |
2024/07 | [ECCV'24] One paper on efficient video SCI (Snapshot Compressive Imaging) via network quantization is accepted by ECCV'24 as an oral. Congrats to Miao! [code] |