Bowen Cheng (程博文) is an artificial intelligence researcher at Meta's Superintelligence Lab. He specializes in multimodal foundation models and has contributed to major AI projects, including GPT-4o at OpenAI and Tesla's Full Self-Driving (FSD) software. [1] [2]
Cheng received both his Bachelor of Science and his Ph.D. in Electrical and Computer Engineering (ECE) from the University of Illinois Urbana-Champaign (UIUC). During his doctoral studies, his advisors were Professor Alexander Schwing and Professor Thomas Huang. [1] [2] [4]
As of 2025, Bowen Cheng is a researcher at Meta's Superintelligence Lab (MSL). He joined the newly formed group from OpenAI, where he worked as a researcher on multimodal understanding and interaction as part of the post-training team focused on building multimodal models. Before OpenAI, Cheng was a Senior Research Scientist on the Autopilot team at Tesla. During his academic career, he completed several research internships at prominent technology labs, including Facebook AI Research (FAIR) in both New York City and Menlo Park, Google Research in Los Angeles, Microsoft Research in Redmond, and Microsoft Research Asia in Beijing. [1] [3] [2] [4] [5] [6]
Cheng has been a core contributor to several high-profile projects in the field of artificial intelligence. His work spans computer vision, autonomous driving, and large-scale multimodal models.
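His notable contributions include work on GPT-4o at OpenAI and on Tesla's Full Self-Driving software, and these projects, together with his research, highlight his work in segmentation transformers and multimodal systems. [1] [5] [6]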
Cheng's primary research interest is in building real-time multimodal interaction systems. He aims to develop AI that can process streaming audio and video inputs to produce streaming audio and video outputs in real time. His vision for such systems includes features like an infinite context window for smooth interaction, advanced long-term memory capabilities, and the ability to stay updated with new information while proactively creating content. [1] [6] [5]