Paper-Conference

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
CVPR 2025 - Locked-image tuning for vision-language alignment using a DINOv2 backbone and a few tricks on top.