Alexis Conneau

Alexis Conneau is the co-founder and CEO of the audio AI startup WaveForms AI. He has held senior research roles at major technology firms including OpenAI, Google, and Meta, where he made significant contributions to natural language processing, cross-lingual modeling, and multimodal AI. Following Meta's acquisition of WaveForms AI in August 2025, he rejoined Meta. [1] [2]

Education

Conneau completed a Bachelor of Science and a master's degree in applied mathematics at École Polytechnique. He then earned a master's degree in vision and machine learning from the Mathematics, Vision, and Learning (MVA) program, a joint initiative of ENS Cachan and ENSAE. He subsequently pursued a Ph.D. in artificial intelligence at Le Mans University, which he was expected to complete in 2019; during his doctoral research, he served as a resident Ph.D. student at Facebook AI Research (FAIR) in Paris. [3] [4]

Career

Conneau began his career with research internships in machine learning, first at the hedge fund management company Capital Fund Management for six months in 2014, followed by a six-month internship at the personalized retargeting company Criteo in 2015. His work at Facebook AI Research (FAIR) as a Ph.D. resident marked his entry into large-scale industrial research, where he focused on deep learning for natural language processing (NLP) and the development of transferable text representations.

In May 2021, Conneau announced that he had joined Google AI Language as a research scientist, where he continued his work on building neural networks capable of learning with minimal or no supervision. He moved to OpenAI in San Francisco in April 2023 as Audio Research Lead. There he led audio research for GPT-4o and its Advanced Voice Mode (AVM), playing a key role in developing the native audio understanding capabilities of GPT-4o, which was unveiled in May 2024.

In September 2024, Conneau left OpenAI to launch his own venture, WaveForms AI, which he co-founded with Coralie Lemaitre and led as CEO. The startup specialized in using artificial intelligence to understand and replicate emotion in audio, and it raised $40 million in a funding round led by Andreessen Horowitz. In August 2025, Meta Platforms announced its acquisition of WaveForms AI; following the acquisition, Conneau and Lemaitre were set to join Meta to continue their work in advanced AI research. [2] [1] [3] [4] [5]

Major Works and Research

Conneau's research has consistently focused on advancing the capabilities of neural networks, particularly in the domains of language and speech. His work spans unsupervised learning, cross-lingual representation, and multimodal AI, with a recurring theme of creating models that can learn and operate effectively across different languages and data types with limited supervision. His primary research interests include natural language understanding, sequence-to-sequence learning, neural machine translation, and self-supervised learning. [3]

Cross-Lingual Language Models (XLM)

A significant portion of Conneau's work at Facebook AI Research was dedicated to creating language models that understand multiple languages. This research culminated in the development of Cross-lingual Language Models (XLM).

  • Initial Models: In the 2019 paper "Cross-lingual Language Model Pretraining," Conneau and his collaborators introduced methods for pre-training models on multilingual text corpora. This work demonstrated the effectiveness of using a shared vocabulary and embedding space across languages, enabling the model to transfer knowledge from high-resource languages to low-resource ones.
  • XLM-R: The subsequent paper, "Unsupervised Cross-lingual Representation Learning at Scale," introduced XLM-RoBERTa (XLM-R), a model pre-trained on 2.5 terabytes of text covering 100 languages sourced from the Common Crawl dataset. XLM-R established new state-of-the-art results on a range of cross-lingual benchmarks, including the Cross-lingual Natural Language Inference (XNLI) benchmark, which Conneau also co-developed. In August 2019, his team released a PyTorch version of their XLM model trained on 100 languages, which significantly outperformed previous multilingual models; a minimal usage sketch follows this list. [1] [3]
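The XLM-R checkpoints were released publicly. As a minimal sketch, assuming the Hugging Face transformers library and its public "xlm-roberta-base" checkpoint (hosting details that are not part of this article), the shared multilingual vocabulary can be exercised with a masked-token prediction:

    # Minimal sketch: masked-token prediction with the public XLM-R checkpoint.
    # Assumes `pip install torch transformers`; the model id "xlm-roberta-base"
    # is the public Hugging Face release, not a name taken from this article.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

    # One shared vocabulary and embedding space covers all 100 training
    # languages, so the same model fills masked tokens in any of them.
    inputs = tokenizer("Paris est la capitale de la <mask>.", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    pred_id = logits[0, mask_pos].argmax().item()
    print(tokenizer.decode([pred_id]))

Because a single subword vocabulary is shared across all training languages, identical loading code serves French, Swahili, or Urdu input alike, which is the transfer property the papers describe.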

Datasets and Evaluation

Recognizing the need for high-quality data and robust evaluation methods, Conneau contributed to the creation of several key resources for the NLP community.

  • CC100 Dataset: To train the XLM-R model, Conneau and his team developed CCNet, a pipeline to extract and clean high-quality text from raw web crawl data. This resulted in the CC100 dataset, which was publicly released in October 2020, providing researchers with a massive, multilingual text corpus.
  • Evaluation Toolkits: He was a lead author on "SentEval: An Evaluation Toolkit for Universal Sentence Representations" (2018), which provided a standardized framework for assessing the quality of sentence embeddings. He also co-created the XNLI dataset ("XNLI: Evaluating Cross-lingual Sentence Representations," 2018) to measure a model's ability to perform natural language inference across 15 languages; a loading sketch follows this list. [3] [1]
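XNLI is distributed publicly. As a hedged sketch, assuming the Hugging Face datasets library hosts it under the id "xnli" with one configuration per language (an assumption about current hosting, not a detail from the article), the benchmark can be loaded as follows:

    # Sketch: loading the XNLI benchmark via the Hugging Face `datasets`
    # library. The dataset id "xnli" and per-language configurations are
    # assumptions about public hosting, not details given in this article.
    from datasets import load_dataset

    # Each configuration is one of the 15 XNLI languages, e.g. French.
    xnli_fr = load_dataset("xnli", "fr", split="validation")

    example = xnli_fr[0]
    # Each example pairs a premise with a hypothesis under a three-way label:
    # 0 = entailment, 1 = neutral, 2 = contradiction.
    print(example["premise"], example["hypothesis"], example["label"])

The same loader with a different configuration name yields the same inference task in another language, which is what makes XNLI a cross-lingual transfer benchmark.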

Speech and Multimodal AI

Conneau later extended his research from text to speech, applying principles of self-supervised and cross-lingual learning to the audio domain.

  • Cross-lingual Speech Representation: His work includes "Unsupervised Cross-lingual Representation Learning for Speech Recognition" (2020), which introduced XLSR. This model learned cross-lingual speech representations from raw audio in multiple languages without transcriptions (see the sketch after this list). It was later scaled up in "XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale" (2021).
  • Unified Speech-Text Models: He contributed to projects like SLAM and mSLAM, which aimed to create unified encoders for both speech and text, enabling joint pre-training and fostering cross-modal understanding.
  • GPT-4o Audio: As the Audio Research Lead at OpenAI, Conneau was instrumental in the development of GPT-4o's ability to natively process and understand audio in real-time. He described this work as enabling users to "talk to the Transformer itself," marking a significant step toward more natural human-computer interaction.
  • WaveForms AI: His startup, WaveForms AI, represented a continuation of his audio research, focusing on the nuanced task of understanding and generating emotion in audio. This work was acquired by Meta Platforms to enhance its own AI audio capabilities. [2] [1]
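The XLSR checkpoints were also released publicly. A minimal sketch, assuming the Hugging Face transformers library and the public "facebook/wav2vec2-large-xlsr-53" checkpoint (an assumption about hosting, not a detail from this article), extracts speech representations from raw audio:

    # Sketch: extracting cross-lingual speech representations with the public
    # XLSR model. Assumes `torch` and `transformers`; the checkpoint id
    # "facebook/wav2vec2-large-xlsr-53" is the public release, not named here.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    model_id = "facebook/wav2vec2-large-xlsr-53"
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2Model.from_pretrained(model_id)

    # One second of silent 16 kHz audio stands in for real speech; XLSR was
    # pretrained on raw waveforms alone, with no transcriptions.
    waveform = torch.zeros(16000)
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        frames = model(**inputs).last_hidden_state  # (batch, frames, hidden)
    print(frames.shape)

Because pretraining never sees transcriptions, the resulting frame-level representations can be fine-tuned for speech recognition in languages with little labeled data, which is the cross-lingual transfer the XLSR and XLS-R papers report.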

Acquisition of WaveForms AI by Meta

In August 2025, Meta Platforms, Inc. acquired WaveForms AI, a company developing artificial intelligence systems for analyzing and reproducing emotional characteristics in audio. WaveForms AI was founded in December 2024 by Alexis Conneau, formerly an AI researcher at Meta and OpenAI, and Coralie Lemaitre, who had previously worked as an advertising strategist at Google. Prior to the acquisition, the company secured approximately US$40 million in venture capital funding led by Andreessen Horowitz, at a pre-money valuation of about US$160 million.

After the acquisition, Conneau and Lemaitre joined Meta’s AI research division. The purchase of WaveForms AI followed Meta’s acquisition of PlayAI in July 2025, marking another addition to its portfolio of audio-focused AI companies.

WaveForms AI’s research concentrated on voice synthesis and analysis, with projects aimed at replicating human-like speech patterns and developing systems capable of identifying and expressing emotional cues in spoken language. These efforts included work on a “Speech Turing Test” and initiatives referred to by the company as “Emotional General Intelligence.” Meta’s integration of WaveForms AI is part of its broader work on AI systems for voice-based interaction across its platforms. [2] [6] [7]

References
