Anton Bakhtin is an artificial intelligence researcher recognized for his contributions to multi-agent reinforcement learning, strategic reasoning, and large language models. He has held research and engineering positions at several major technology firms, including Yandex, Google, Meta, and Anthropic, where he was a Member of the Technical Staff. In July 2025, he was reported to have joined Meta's new superintelligence division.
Anton Bakhtin pursued higher education in Russia, earning a Master's degree from Moscow State University, which he attended from 2006 to 2011. Following this, he continued his specialized training at the Yandex School of Data Analysis, a program known for its rigorous curriculum in computer science and machine learning. He studied there from 2012 to 2014 and earned a Master's degree, focusing on areas relevant to his subsequent career in software development and AI research. [1] [4]
Bakhtin began his professional career at the Russian technology company Yandex in 2012 as a Software Developer. He was promoted to Senior Software Developer in 2014 and remained with the company until 2015. Following his time at Yandex, he relocated to the United States and joined Google as a Senior Software Engineer, a role he held from 2015 to 2017.
In 2017, Bakhtin transitioned to a research-focused role, joining Facebook (now Meta) as a Research Engineer at the Facebook AI Research (FAIR) lab. He worked there for approximately six years, until 2023, and was a key contributor to significant projects in multi-agent systems, most notably the CICERO project. After leaving Meta, he joined the AI safety and research company Anthropic in 2023 as a Member of the Technical Staff. At Anthropic, he was involved in the development of the Claude 3 family of AI models.
In July 2025, it was reported that Bakhtin was part of a wave of high-profile talent acquisitions by Meta for a new division focused on building "superintelligence." This move placed him alongside other prominent researchers hired from competitors like Apple and OpenAI, signaling an intensified effort by major technology companies to secure leading experts in the field of artificial general intelligence. [3] [1] [2] [4]
During his tenure at Meta AI, Bakhtin was a central figure in the development of CICERO, the first artificial intelligence agent to achieve human-level performance in the complex strategy game Diplomacy. The project was a significant milestone in AI because Diplomacy requires capabilities beyond the purely adversarial logic mastered in games like Go or StarCraft. The game involves seven players who must negotiate, form alliances, coordinate actions, and sometimes betray one another to succeed, making natural language communication and trust-building essential components of gameplay.
Bakhtin noted that traditional self-play reinforcement learning techniques, which were successful in other games, were insufficient for Diplomacy because cooperation and coordination do not emerge naturally in such a complex social environment. The research team, which included collaborator Noam Brown, tackled this by developing a hybrid AI architecture. CICERO integrated a large language model, trained on a vast corpus of human gameplay text, with a strategic reasoning engine. This allowed the AI to conduct natural language dialogue with human players to negotiate plans and build trust, while simultaneously using its planning algorithm to predict other players' moves and determine its own optimal strategy.
The research culminated in a paper published in the journal Science in late 2022. In an anonymous online league, CICERO played against human opponents and scored more than double the average score of the human players, ranking in the top 10% of participants who played more than one game. Bakhtin described the project as a step toward creating AI that can act as a cooperative partner, capable of understanding human intent, reasoning about collaborative solutions, and communicating effectively. [2] [4]
At Anthropic, Bakhtin was part of the team that developed the Claude 3 family of AI models, which were released in March 2024. This suite of models, including Claude 3 Opus, Sonnet, and Haiku, was designed to set new industry benchmarks in reasoning, multilingual understanding, vision, and other key AI capabilities. Upon the release of the models, Bakhtin commented on his experience, stating, "RL never works, until it does :) That was incredible to be the part of the adventure." His work at Anthropic contributed to the development of large language models that were noted for their performance and improved user interaction. [2] [4]