Back to the Features: DINO as a Foundation for Video World Models

ICML Workshop 2025 - Learning physical world models in the latent space of DINOv2 from uncurated web videos.

Federico Baldassarre, Marc Szafraniec, Basile Terver, Vasil Khalidov, Francisco Massa, Yann LeCun, Patrick Labatut, Maximilian Seitzer, Piotr Bojanowski

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

CVPR 2025 - Locked-image tuning for vision-language alignment using a DINOv2 backbone and a few tricks on top.

Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski

Variable Rate Allocation for Vector-Quantized Autoencoders

ICASSP 2023 - Controlling the bit rate allocation of a PQ-VAE using semantic information from DINO, in collaboration with Meta AI in Paris.

Federico Baldassarre, Alaaeldin El-Nouby, Hervé Jégou

Learnable Masked Tokens for Improved Transferability of Self-Supervised Vision Transformers

ECML PKDD 2022 - Learnable masked tokens reduce overfitting when training transformers in low-data regimes.

Hao Hu, Federico Baldassarre, Hossein Azizpour

Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors

BMVC 2022 - Quantitative metrics for evaluating explanations of video DeepFake detectors, in collaboration with the Huawei Ireland Research Center.

Federico Baldassarre, Quentin Debard, Gonzalo Fiz Pontiveros, Tri Kurniawan Wijaya

Towards Self-Supervised Learning of Global and Object-Centric Representations

ICLR Workshop 2022 - How to learn object-centric representations using contrastive learning on multi-object datasets?

Federico Baldassarre, Hossein Azizpour

Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks

ECCV 2020 - A novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels, graph networks, and attribution-based explanations.

Federico Baldassarre, Kevin Smith, Josephine Sullivan, Hossein Azizpour

Explainability Techniques for Graph Convolutional Networks

ICML Workshop 2019 - How do Sensitivity Analysis, Guided Backpropagation, and Layer-Wise Relevance Propagation behave when applied to Graph Neural Networks?

Federico Baldassarre, Hossein Azizpour
