Chunyuan Li is an artificial intelligence researcher known for his work on multimodal intelligence, particularly large-scale vision-and-language models. He is a key contributor to the LLaVA (Large Language-and-Vision Assistant) model family and is currently a research scientist at Meta's Superintelligence team. [1] [2]
Li completed his undergraduate studies at Huazhong University of Science and Technology, where he earned a bachelor's degree in electronic and information engineering. He later attended Duke University for his doctoral studies, obtaining a PhD in electrical and computer engineering. His doctoral research, supervised by Professor Lawrence Carin, focused on deep generative models. [1] [3] [6]
Chunyuan Li began his career as a Principal Researcher at Microsoft Research in Redmond, where he contributed to several foundational vision-language models, including Oscar and Florence. Following his tenure at Microsoft, he served as Head of the ByteDance Research Institute. He later joined xAI as an engineering director, where he was involved in the development of models such as Grok-3. In mid-2025, Li joined Meta as a Research Scientist in the company's newly formed Superintelligence group, which focuses on advancing artificial general intelligence. His expertise spans diffusion models and multimodal generation. [1] [2] [3] [4] [6]
Li's research has led to the development of several influential models and frameworks in the field of multimodal AI. His work primarily focuses on creating systems that can understand and process information from both visual and textual data. [1] [7]
Li is a key creator of LLaVA, a family of open-source multimodal models designed for general-purpose visual and language understanding. The initial version, released in 2023, was built with a technique called visual instruction tuning, which uses a large language model such as GPT-4 to generate multimodal instruction-following training data. The project has since expanded to include several specialized versions and upgrades. [1] [4] [8]
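The core architectural idea behind LLaVA is a lightweight connector that projects features from a pretrained vision encoder into the token-embedding space of the language model, so image patches can be consumed as ordinary tokens. The PyTorch sketch below illustrates this connector design; the class name and dimensions are illustrative, not taken from the released LLaVA code.

```python
import torch
import torch.nn as nn

class LlavaStyleConnector(nn.Module):
    """Sketch of a LLaVA-style vision-language connector.

    LLaVA-1 used a single linear projection; LLaVA-1.5 switched to a
    two-layer MLP, as sketched here. Dimensions are illustrative.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, n_patches, vision_dim) from a frozen
        #              CLIP-style vision encoder
        # text_embeds: (batch, n_tokens, llm_dim) from the LLM's
        #              embedding table
        image_tokens = self.proj(patch_feats)
        # Visual tokens are prepended to the text sequence; the combined
        # sequence is fed to the LLM and trained with a next-token loss
        # on multimodal instruction-following data.
        return torch.cat([image_tokens, text_embeds], dim=1)
```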
Key developments in the LLaVA family include:

- LLaVA-1.5, which upgraded the vision-language connector to a two-layer MLP and incorporated academic task data, markedly improving benchmark performance.
- LLaVA-NeXT (LLaVA-1.6), which increased input image resolution and strengthened OCR and visual reasoning capabilities.
- LLaVA-Med, a version adapted to the biomedical domain.
- LLaVA-OneVision, a unified model handling single-image, multi-image, and video understanding tasks.
The LLaVA project and its subsequent iterations have been influential in the open-source AI community for providing a powerful and accessible alternative to proprietary multimodal systems. [1]
Prior to his work on LLaVA, Li contributed to several other foundational models that advanced the field of vision-language pre-training. These projects established new methods for aligning visual and textual representations, enabling models to perform complex reasoning and generation tasks that involve both modalities. [1]
His notable early works include:

- Oscar, a vision-language pre-training method that uses object tags detected in images as anchor points to align image regions with text.
- Florence, a large-scale foundation model for computer vision trained on web-scale image-text data.
- GLIP, which unifies object detection and phrase grounding through grounded language-image pre-training.
These projects have been instrumental in building more capable and controllable multimodal AI systems. [1] [7]
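For readers unfamiliar with what "aligning visual and textual representations" means in practice, the sketch below shows the symmetric image-text contrastive loss popularized by CLIP and used, in more elaborate forms, in Florence-style pre-training; the function name and fixed temperature here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Simplified symmetric InfoNCE loss over a batch of paired
    image/text embeddings; row i of each tensor is a matching pair."""
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (batch, batch) similarity matrix; matching pairs on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Classify the correct caption for each image, and vice versa.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Production systems elaborate on this basic objective with learned temperatures, very large batches, and, in Florence's case, a unified contrastive formulation that also exploits label information.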
In addition to his research roles in industry, Li is an active member of the academic community. He has served as an Area Chair for major machine learning and natural language processing venues, including NeurIPS, ICML, ICLR, EMNLP, and the journal Transactions on Machine Learning Research (TMLR). He has also served as a Guest Editor for a special issue of the International Journal of Computer Vision (IJCV) on large vision models. Li has an extensive publication record, with numerous papers at top-tier academic venues. [1] [5]