Łukasz Kaiser is a co-founder of the data processing company Pathway. His career includes senior research roles at Google Brain and OpenAI. He is a co-author of the 2017 paper "Attention Is All You Need," which introduced the Transformer deep learning architecture that underpins most contemporary large language models. [1] [2]
Kaiser earned dual Master of Science degrees in Computer Science and Mathematics from the University of Wroclaw in Poland, completing them in 2004 and 2001, respectively. [3] [4] He then pursued his doctoral studies at RWTH Aachen University in Germany, where he received his Ph.D. in Computer Science in 2008. [5] [6] His dissertation, titled "Logic and Games on Automatic Structures," focused on algorithmic model theory. [7] [3] He later obtained his Habilitation (HDR) in Computer Science in 2013 from Paris Diderot University, with a thesis titled "Logic and Automata for Unranked Data." [8]
Kaiser began his career in academia, focusing on theoretical computer science, logic, and automata theory. [9] After completing his Ph.D., he worked as a post-doctoral researcher at RWTH Aachen University and later at LIAFA, a laboratory affiliated with Paris Diderot University. [2] In late 2010, he became a chargé de recherche (tenured research scientist) with the French National Centre for Scientific Research (CNRS), based at Paris Diderot University. [6] [7] In 2013, Kaiser transitioned from academia to industry, joining the Google Brain team in Mountain View, California, to work on deep learning. [7] He later commented on this shift, noting the rapid pace of change in deep learning compared to theoretical computer science. [7]
During his nearly eight-year tenure at Google Brain, from August 2013 to April 2021, Kaiser was promoted to Staff Research Scientist and became a key contributor to several of the company's most important AI projects. [1] He played a crucial role in the development of TensorFlow, Google's open-source machine learning framework. [6] His work on attention mechanisms was a core component of the Google Neural Machine Translation (GNMT) system that powers Google Translate. [6]
In 2017, he co-authored the paper "Attention Is All You Need," which introduced the Transformer architecture. That same year, he co-created and led the development of Tensor2Tensor (T2T), an open-source library designed to make deep learning research more accessible and which included the reference implementation of the Transformer model. [1] He also co-authored the Reformer model, an efficient variant of the Transformer designed to handle long sequences with less memory usage. [4]
Kaiser joined OpenAI as a researcher in April 2021. [1] At OpenAI, he contributed to the development of models such as GPT-4 and Codex. [5] He later served as a research lead for the OpenAI o1 model series, which launched in September 2024. [10]
Concurrently with his research career, Kaiser co-founded the technology company Pathway in January 2020, where he serves as Chief Scientist. [1] Pathway develops a reactive data processing framework that enables real-time machine learning applications by unifying stream and batch data processing. The company's goal is to allow AI systems to update automatically as new data arrives, facilitating applications that require low-latency responses to live data. [11]
Kaiser is one of eight co-authors of the 2017 paper "Attention Is All You Need," which introduced the Transformer. This model architecture marked a significant shift in sequence processing by dispensing with recurrent (RNN) and convolutional (CNN) layers, relying instead entirely on a self-attention mechanism. [2] Self-attention allows the model to weigh the importance of different words in an input sequence to better understand context. This design enabled greater parallelization during training, making it highly effective for the massive datasets used to train large language models. The Transformer has since become the foundational architecture for most state-of-the-art NLP models, including BERT and the GPT series. [1]
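The following is a minimal sketch of the scaled dot-product self-attention operation described above, written here in NumPy purely for illustration; the function name, variable names, and dimensions are chosen for readability and are not drawn from the paper's reference implementation.

```python
import numpy as np

def scaled_dot_product_self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token embeddings.

    X:              (seq_len, d_model) input embeddings
    W_q, W_k, W_v:  (d_model, d_head) projection matrices
    Returns a (seq_len, d_head) array of context vectors.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # queries, keys, values
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)            # relevance of every position to every other
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability before softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # each output mixes all values by attention weight

# Illustrative usage with random data.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(scaled_dot_product_self_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```

Because each output row is produced by matrix products over the entire sequence rather than by a step-by-step recurrence, all positions can be computed in parallel, which is the property that allowed Transformer training to scale to very large datasets.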
The other co-authors of the paper are Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, and Illia Polosukhin. [10]
While at Google, Kaiser was a principal author of and core contributor to the TensorFlow framework, which became a standard tool for building and deploying machine learning models. [8] [6] To accelerate research and improve accessibility, he led the creation of Tensor2Tensor (T2T), an open-source library of deep learning models and datasets. [2] T2T was designed to make it easier for researchers to test new ideas against state-of-the-art models and served as the initial public repository for the Transformer model's code, contributing to its rapid adoption. [8]
At OpenAI, Kaiser was a research lead for the o1 model series, which he described as a "new paradigm" in AI upon its launch in September 2024. [10] According to him, these models are designed to use "hidden CoTs" (chains of thought), an internal reasoning process that allows them to spend more computational effort thinking before providing a response. He stated that this approach enables the models to learn from less data, generalize better, and engage in a form of approximate reasoning, in contrast to previous architectures. [10] In December 2024, he commented on an early evaluation release of the o3 model, noting its advanced capabilities on reasoning benchmarks. [10]
For his doctoral dissertation, Kaiser received the 2009 E.W. Beth Dissertation Prize, awarded for outstanding dissertations in the fields of logic, language, and information. [6] [7]
In October 2024, Kaiser and his seven co-authors of "Attention Is All You Need" were awarded the 2024 NEC C&C Prize for their contributions to the development of the Transformer model. [10]
As of late 2025, Kaiser's work has been cited over 425,000 times, according to his Google Scholar profile. The paper "Attention Is All You Need" is one of the most cited in modern computer science, with over 100,000 citations. [2]
Kaiser is an active speaker at AI conferences and has shared his perspective on the long-term direction of AI research. He has articulated a vision of creating a single, universal model capable of performing tasks across multiple modalities like language, vision, and audio. In one interview, he stated, "The dream is that at some point, there will be one model, and the one model will have learned to be a good programmer, to be a good conversational agent, to do vision, and to do language." [8] This philosophy also guided his work on the 2017 paper "One Model To Learn Them All." [7]
During the OpenAI leadership crisis in November 2023, he publicly posted messages of support for the company's team, including the statement, "OpenAI is nothing without its people." [10]
At public talks in 2024 and 2025, such as the Pathway Meetup and at Pi School, Kaiser discussed the evolution of deep learning. He contrasted the state of the field in 2014, when getting a neural network to match existing translation systems was a major breakthrough, to the present, where models like GPT-4 can perform a wide range of NLP tasks without specialized training. [5] Looking forward, he has highlighted "impending data scarcity" as a major challenge and theorized that future performance gains will come from training on fewer, high-quality data points retrieved from personal and organizational knowledge graphs. [11]
This interview features computer scientist Łukasz Kaiser discussing topics related to the development, behavior, and limitations of contemporary artificial intelligence systems. The interview was published on November 28, 2025, on the This Is World YouTube channel. Kaiser is a co-author of the paper "Attention Is All You Need," which introduced the Transformer architecture later adopted in large language models.
During the conversation, Kaiser describes how the Transformer model, initially proposed as a technical approach for sequence modeling, became widely used in language-based AI systems. He states that training models primarily on textual data led to the emergence of generalization abilities beyond direct memorization. According to Kaiser, patterns present in large language datasets appear to align with certain structures of human reasoning, allowing models to perform tasks not explicitly specified during training.
Kaiser also addresses the current lack of comprehensive theoretical understanding of large-scale models. He explains that while smaller systems can be analyzed in detail, the behavior of large models remains difficult to interpret due to their size and complexity. From his perspective, generalization is a central research question in artificial intelligence, and increases in model scale alone do not resolve underlying conceptual gaps.
The interview further covers limitations of existing architectures. Kaiser notes that current systems do not incorporate many elements present in biological intelligence, such as embodiment, sensory experience, or evolutionary constraints. He suggests that insights from neuroscience and biology may inform future research directions. He concludes that future advances in artificial intelligence may require changes to foundational learning approaches rather than continued expansion of existing model designs. [12]
This interview was published on the YouTube channel The MAD Podcast with Matt Turck on November 26, 2025, and features Łukasz Kaiser discussing topics related to artificial intelligence research. The conversation addresses current research directions and technical approaches used in contemporary AI systems, based on Kaiser's own explanations and interpretations.
During the interview, Kaiser states that ongoing AI development within research laboratories continues at a steady pace. He describes this progress as resulting from a combination of increased computational scale and changes in model training approaches. According to his account, pre-training remains part of current workflows, alongside the expanded use of reinforcement learning techniques applied to reasoning-oriented models.
Kaiser explains that reasoning models are trained to generate intermediate steps during problem solving, commonly referred to as chain of thought. He associates this training method with tasks that allow objective verification, including mathematics, software development, and scientific problem solving. He contrasts this approach with earlier language models that relied primarily on learned statistical associations without explicit intermediate reasoning steps.
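As a rough illustration of what "objective verification" can mean in this setting, the sketch below checks a model-generated chain of thought for a math problem against a known answer. The "Answer:" output convention and the verifiable_reward helper are hypothetical, introduced here for the example only, and are not taken from any training pipeline described in the interview.

```python
import re

def verifiable_reward(generated_text: str, ground_truth: str) -> float:
    """Hypothetical reward: 1.0 if the final answer matches the ground truth, else 0.0.

    Assumes the model is asked to end its chain of thought with a line of the
    form 'Answer: <value>'; this convention is illustrative only.
    """
    match = re.search(r"Answer:\s*(.+)\s*$", generated_text.strip())
    if match is None:
        return 0.0  # no parseable final answer -> no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# Example: the intermediate steps are free-form text,
# but the final answer can be checked automatically.
chain_of_thought = (
    "12 workers finish in 10 days, so the job is 120 worker-days.\n"
    "With 8 workers: 120 / 8 = 15.\n"
    "Answer: 15"
)
print(verifiable_reward(chain_of_thought, "15"))  # 1.0
```

In a reinforcement learning setup of this kind, such an automatic check would serve as the reward signal, which is why this style of training is associated with domains like mathematics and software development, where final results can be verified objectively.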
The discussion also covers recent OpenAI model iterations, including GPT-4, GPT-5, and GPT-5.1. Kaiser attributes changes across these versions mainly to post-training processes, reinforcement learning, and data filtering practices, rather than to increases in parameter count alone. He also mentions technical and operational considerations such as GPU resource allocation, model distillation, and infrastructure requirements related to large-scale training and deployment.
Kaiser provides background on his academic and professional trajectory, referencing early work related to logic and games, followed by research roles at Google Brain and OpenAI. He references his role as a co-author of the paper that introduced the Transformer architecture and describes it as a general-purpose neural network design used across multiple AI tasks. He notes that research into alternative and complementary architectures continues within the field.
In addressing future research topics, Kaiser mentions areas including generalization, multimodal reasoning, extended task execution by agents, and robotics. He notes that current systems display variable performance across tasks and that limitations remain in areas requiring consistent reasoning across contexts. He also references ongoing research related to interpretability, system reliability, and human oversight in applied AI systems. [13]