Lucas Beyer is a Belgian research scientist recognized for his contributions to computer vision, representation learning, and large-scale model training, including foundational work on the Vision Transformer (ViT). He has held senior research positions at Google DeepMind and OpenAI, and is currently a researcher at Meta's Superintelligence Team. [1]
Beyer studied at RWTH Aachen University in Germany, where he earned a Diplom-Ingenieur (Dipl.-Ing.) in Computational Engineering Science, graduating in 2012 with a grade of 1.3. His final thesis, titled "Exploiting Graphics Accelerators for Computational Biology," focused on solving generalized least-squares problems on GPUs and was graded 1.0. After graduating, he briefly began a PhD program in High-Performance Computing at the Aachen Institute for Advanced Study in Computational Engineering Science (AICES) from November 2012 to April 2013. He then transitioned to computer vision, undertaking a PhD at RWTH Aachen's Visual Computing Institute from June 2013 to May 2018. His doctoral research, supervised by Professor Bastian Leibe, centered on deep learning for computer vision on mobile robots, with a focus on reducing annotation effort. [2] [3] [6]
Beyer's professional career began before his PhD, with a role as a programmer at Digatron Power Electronics from 2006 to 2008, where he worked on control systems for battery testing equipment. During his university studies, he held an internship at Mint medical GmbH and served in several student research assistant and tutoring roles at RWTH Aachen University.
During his doctoral studies, Beyer completed several research internships. In the summer of 2016, he was an intern at Google in Los Angeles, working on image-gaze prediction. He then interned at the AI startup Kindred in Toronto from August to November 2016, focusing on learning from human demonstration for robotics. He returned to Google for another research internship in the summer of 2017, where he worked on disentangling representations learned by FaceNet.
Upon completing his PhD in 2018, Beyer joined Google Brain in Zürich as a Staff Research Scientist, a role he continued through its integration into Google DeepMind until October 2024. At Google, he co-led the multimodal (vision-language) research team and contributed to numerous high-impact projects. His research during this time focused on developing scalable and efficient models for computer vision and multimodal learning, with an emphasis on large-scale pre-training, architectural innovation, and robust evaluation methodologies.
Beyer was part of the team at Google Research that developed the Vision Transformer (ViT), a seminal work that applied the Transformer architecture, previously successful in natural language processing, to computer vision tasks. This approach demonstrated that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks, challenging the dominance of Convolutional Neural Networks (CNNs). He also co-authored "Scaling Vision Transformers," which explored how to effectively scale ViT models to achieve state-of-the-art results. Further exploring architectural design, Beyer was a key contributor to MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs) that achieved competitive results without using convolutions or self-attention mechanisms. His work also includes "Big Transfer (BiT)," a paper that established principles for transfer learning from large-scale, pre-trained vision models, and FlexiViT, a method for training ViTs with randomized patch sizes to enable flexible deployment at different computational costs.
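The core idea behind ViT described above, treating an image as a sequence of flattened patches that a standard Transformer can consume, can be illustrated with a short sketch (a minimal NumPy illustration of the patch-extraction step only, not the paper's actual code; the function name and shapes are assumptions chosen to match the ViT-Base/16 configuration):

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an image of shape (H, W, C) into a sequence of flattened patches.

    In ViT, the resulting (N, P*P*C) sequence is linearly projected into
    token embeddings and fed to a standard Transformer encoder.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    p = patch_size
    # Reshape into a grid of (H/P, W/P) patches, then flatten each patch.
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)
    return patches

# A 224x224 RGB image with 16x16 patches yields 196 tokens of dimension 768,
# the token count and patch dimension of the ViT-Base/16 configuration.
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = patchify(img)
print(tokens.shape)  # (196, 768)
```

The absence of any convolution in this step is what made the result notable: all spatial reasoning is left to the Transformer's self-attention over these patch tokens.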
Beyer also made significant contributions to training techniques and dataset quality. He was involved in creating the ImageNet-ReaL labels, a project that corrected labels in the ImageNet validation set to provide a more accurate benchmark for model evaluation. His work on multimodal models includes SigLIP, which proposed using a sigmoid loss for contrastive image-text pre-training as a more scalable alternative to the standard softmax-based loss used in models like CLIP. The paper "In Defense of the Triplet Loss for Person Re-Identification," co-authored with Alexander Hermans and Bastian Leibe during his PhD, showed that a well-implemented triplet loss could outperform other methods for deep metric learning. [1] [2] [3] [4] [5] [6]
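The sigmoid formulation mentioned above replaces CLIP's batch-wide softmax normalization with an independent binary decision for every image-text pairing in the batch, which is what makes it easier to scale across devices. A minimal NumPy sketch of a loss of this form (the function name and the temperature/bias values are illustrative assumptions, not the paper's exact code, where both are learnable):

```python
import numpy as np

def sigmoid_contrastive_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Sigmoid loss for image-text contrastive pre-training (SigLIP-style).

    Each of the N*N image-text pairings is treated as an independent binary
    classification problem: matching pairs (the diagonal) are positives,
    all other pairings are negatives. Unlike a softmax-based loss, no
    normalization over the whole batch is required.
    t (temperature) and b (bias) are learnable in the actual model.
    """
    # Cosine similarities between every image and every text in the batch.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * img @ txt.T + b
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0      # +1 on the diagonal, -1 elsewhere
    # log sigmoid(label * logit), summed over all pairs, averaged per example.
    log_sig = -np.log1p(np.exp(-labels * logits))
    return -log_sig.sum() / n

rng = np.random.default_rng(0)
loss = sigmoid_contrastive_loss(rng.normal(size=(4, 8)),
                                rng.normal(size=(4, 8)))
print(loss)  # a positive scalar
```

The negative bias initialization compensates for the heavy imbalance between the N positive and N²−N negative pairings at the start of training.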
In late 2024, after leaving Google, Beyer co-founded OpenAI's Zürich office along with his colleagues Alexander Kolesnikov and Xiaohua Zhai, serving as a Member of Technical Staff. [8] [9]
In June 2025, Beyer, along with Kolesnikov and Zhai, left OpenAI to join Meta's Superintelligence Team as a researcher. The move attracted media attention following claims by OpenAI CEO Sam Altman that Meta was offering $100 million signing bonuses to recruit his staff. Beyer publicly refuted these claims on X (formerly Twitter), stating, "no, we did not get 100M sign-on, that's fake news." In a subsequent reply on the platform, when a commenter suggested Altman had made the claim to make potential recruits feel lowballed, Beyer responded, "Yes, it was a brilliant move, gotta give him that." [7] [8] [9]
During his PhD, Beyer worked on robotics projects, including SPENCER, a service robot designed for guidance in airports, and STRANDS, which focused on long-term autonomy for robots in everyday environments. His early publications from this period focused on applying deep learning to perception tasks for mobile robots using sensors like 2D laser scanners. Notable works include DROW, a real-time deep learning detector for wheelchairs in 2D range data, and Biternion Nets, a method for continuous head pose regression from discrete labels.
The works described above represent only a selection of his more than 50 publications in the field.
Beyer has received several awards and fellowships for his academic and research work, all awarded during his time at RWTH Aachen University. [1] [2] [3] [4] [5]