Paper-Conference

Back to the Features: DINO as a Foundation for Video World Models
ICML Workshop 2025 - Learning physical world models in the latent space of DINOv2 from uncurated web videos.
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
CVPR 2025 - Locked-image tuning for vision-language alignment using a DINOv2 backbone and a few tricks on top.