Marc-André Carbonneau


Welcome to my page!

This site contains information regarding my research and some personal projects. Here’s a subset of my research interests:

  • Machine learning
  • Speech processing
  • Disentangled representations
  • Multiple instance learning
  • Computer vision

I am a principal research scientist at Ubisoft's La Forge lab, where I have worked since 2017. I lead a group of researchers applying the latest techniques in machine learning, speech, signal processing, computer vision, graphics, and animation to video games.

Before that, as a PhD student, I was affiliated with two labs:


Apr 1, 2024 We are excited to share our recent work on monocular 3D face reconstruction that will be presented at CVPR 2024. We introduce MoSAR, a new method that turns a portrait image into a realistic 3D avatar.

From a single image, MoSAR estimates a detailed mesh and texture maps at 4K resolution, capturing pore-level details. This avatar can be rendered from any viewpoint and under different lighting conditions.

We are also releasing a new dataset called FFHQ-UV-Intrinsics. This is the first dataset that offers rich intrinsic face attributes (diffuse, specular, ambient occlusion, and translucency) at high resolution for 10K subjects.
Check out the project page!
Oct 27, 2023 Our paper “EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis” has been accepted for presentation at the NeurIPS Workshop on ML for Audio. This work was done in collaboration with colleagues from the University of Rochester.

In this paper, we propose a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). We also reveal a potential concern with diffusion-based audio generation models: they tend to generate samples that duplicate the training data.

Check out the project page!
Sep 21, 2023 Our paper “Rhythm Modeling for Voice Conversion” has been published in IEEE Signal Processing Letters. We also released it on arXiv.
In this paper, we model the natural rhythm of speakers to perform conversion while respecting the target speaker’s natural rhythm. We go beyond approximating the global speech rate: we model durations for sonorants, obstruents, and silences.

Check out the demo page!
Jul 15, 2023 Ubisoft has published a blog post describing our system for gesture generation conditioned on speech.
This system was presented in “ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech” and showcased on Two Minute Papers.
Jul 21, 2021 This is the recording of the presentation that I gave at the 2021 Game Developers Conference on “speech synthesis applied to videogames”.
Generating spoken dialog lines artificially could prove pivotal for the future of the gaming industry. Aside from reducing production costs, it offers opportunities for new types of in-game interactions that are closer to real-world experiences. The goal of the talk is to present an honest snapshot of the state of the technology and to discuss remaining challenges and possible present and future use cases. We demonstrate how current commercial speech synthesis solutions do not directly apply to the gaming context, where voices require a high level of expressivity. We discuss present solutions to control expressivity, and how we use speech synthesis at Ubisoft.

selected publications

  1. Spoken-Term Discovery using Discrete Speech Units
    Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau, and Herman Kamper
    In INTERSPEECH, 2024
  2. MoSAR: Monocular Semi-Supervised Model For Avatar Reconstruction Using Differentiable Shading
    Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, and Marc-André Carbonneau
    In CVPR, 2024
  3. BinaryAlign: Word Alignment as Binary Sequence Labeling
    Gaëtan Lopez Latouche, Marc-André Carbonneau, and Ben Swanson
    In ACL, 2024
  4. EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
    Ge Zhu, Yutong Wen, Marc-André Carbonneau, and Zhiyao Duan
    In NeurIPS Workshop: Machine Learning for Audio, 2023
  5. Measuring Disentanglement: A Review of Metrics
    Marc-André Carbonneau, Julian Zaïdi, Jonathan Boilard, and Ghyslain Gagnon
    IEEE Transactions on Neural Networks and Learning Systems, 2022
  6. Rhythm Modeling for Voice Conversion
    Benjamin van Niekerk, Marc-André Carbonneau, and Herman Kamper
    IEEE Signal Processing Letters, 2023
  7. A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
    Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, and Herman Kamper
    In ICASSP, 2022
  8. Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
    Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, and Marc-André Carbonneau
    In INTERSPEECH, 2022
  9. Multiple instance learning: A survey of problem characteristics and applications
    Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, and Ghyslain Gagnon
    Pattern Recognition, 2018
  10. ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
    Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje, and Marc-André Carbonneau
    Computer Graphics Forum, 2023