Federico Baldassarre

AI Scientist

Mistral

Ciao 👋

I’m an AI Scientist at Mistral in Paris. My research focuses on multimodal foundation models for vision and language. Previously, I was a postdoctoral researcher at FAIR, Meta AI, where I contributed to the development of DINOv3, a state-of-the-art self-supervised vision foundation model, and DINO-world, a latent video world model. During my PhD at KTH, Stockholm, I worked on the explainability of deep learning models, applied to computer vision and bioinformatics.

Interests

Self-supervised learning
Video world models
Explainability and reasoning

Education

PhD in Deep Learning, 2018 - 2023
KTH - Royal Institute of Technology
MSc in Machine Learning, 2016 - 2018
KTH - Royal Institute of Technology
BSc in Computer Engineering, 2013 - 2016
UNIBO - University of Bologna

News

Dec 2025 I’m joining Mistral in Paris as an AI Scientist 🇫🇷
Nov 2025 Recognized as a top reviewer at NeurIPS 2025, thanks!
Oct 2025 Guest lecture on DINOv3 to deep learning students at Chalmers University of Technology, Gothenburg, Sweden 🇸🇪
Sept 2025 Honored to be invited to present DINOv3 at the Foundation Model workshop at the Geneva Institute of Theoretical Science 🇨🇭
Sept 2025 Presenting my postdoc contributions (CAPI, dino.txt, DINO-world and DINOv3) to students of the WASP AI program at Linköping University, Sweden 🇸🇪
Aug 2025 Announcing DINOv3, a major release that raises the bar for self-supervised vision foundation models 🦖🦖🦖
Jul 2025 Publishing DINO-world, a latent video world model trained in the feature space of DINOv2. Also presented at ICML 2025 in Vancouver 🍁
May 2025 Our work on scalable pre-training of dense image representations, CAPI, is accepted at TMLR. Congrats to Timothée for leading this work!
Feb 2025 Our “dino.txt” paper on aligning vision and language representations is accepted at CVPR 2025.
Sept 2023 I’m moving to Paris to join the DINO team at Meta AI as a postdoctoral researcher 🦖
Jun 2023 PhD graduation at KTH, Stockholm 🎓🎉
Feb 2023 Our work on semantic image compression with quantized autoencoders is accepted at ICASSP 2023 (Meta AI internship).

Featured Publications

DINOv3

Preprint 2025 - Scaling vision SSL to 7B parameters and 1.7B images, achieving unprecedented patch feature quality.

Oriane Siméoni, Huy v. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

Back to the Features: DINO as a Foundation for Video World Models

ICML Workshop 2025 - Learning physical world models in the latent space of DINOv2 from uncurated web videos.

Federico Baldassarre, Marc Szafraniec, Basile Terver, Vasil Khalidov, Francisco Massa, Yann LeCun, Patrick Labatut, Maximilian Seitzer, Piotr Bojanowski

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

CVPR 2025 - Locked-image tuning for vision-language alignment using a DINOv2 backbone and a few tricks on top.

Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy v. Vo, Patrick Labatut, Piotr Bojanowski
