Huan Ling

I am a Senior Research Scientist at the NVIDIA Spatial Intelligence (TorontoAI) Lab. My research focuses on building foundational generative models. I aim to develop models that can simulate and generate realistic, diverse, and controllable environments, including video and dynamic 3D scenes, with a focus on scene-level generation for physical AI and on new forms of creative content generation and interaction.

Contact me at: linghuan at cs.toronto.edu

GitHub / Google Scholar / Twitter

Invited Talks:
  • Oct. 2023: See Through the Eyes of Generative Models, UBC
  • July 2023: Vision generation and perception using Diffusion Models, ByteDance AD Core (Slides)
  • June 2023: Image, Video and 3D Content Creation with Diffusion Models, BAAI 2023 (Slides)
  • June 2023: Align your Latents: VideoLDM, Shanghai AI Lab (Slides)
  • July 2021: GANs for 2D Vision Perception, Walmart CV Conf (Slides)
  • June 2021: GANs for 2D Vision Perception, IIIS, Tsinghua University (Slides)
Publications

Selected publications. Full list at Google Scholar.

Generative Models:

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
Jay Zhangjie Wu*, Yuxuan Zhang*, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic^, Huan Ling^
(*, ^: equally contributed)
CVPR, 2025 (Oral & Best Paper Nomination)
project page / paper / Code

Enhancing 3D reconstructions and novel-view synthesis via single-step diffusion inference.


DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Ruofan Liang*, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang*
(* : equally contributed)
CVPR, 2025 (Oral)
project page / paper

A neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework.


GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
Xuanchi Ren*, Tianchang Shen*, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao
(* : equally contributed)
CVPR, 2025 (Highlight)
project page / paper

A generative video model with precise Camera Control and temporal 3D Consistency, using a 3D cache.

Media: Two Minute Papers


Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Leading the autonomous driving post-training research, data curation, model training and evaluation
White Paper, 2025
website / code / white paper

Large-scale multimodal control for conditional world generation.


COSMOS: NVIDIA Physical AI World Foundation Model
Core contributor. My contributions include data curation, large-scale base model training, and leading autonomous driving post-training.
White Paper, 2025
website / code / white paper / video / Jensen Huang Keynote at CES 2025

NVIDIA's Cosmos product and open-source models.

Media: Two Minute Papers; 机器之心


L4GM: Large 4D Gaussian Reconstruction Model
Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling
NeurIPS, 2024
project page

A scalable and expressive framework for high-quality dynamic 3D scene reconstruction.


Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling*, Seung Wook Kim*, Antonio Torralba, Sanja Fidler, Karsten Kreis
(* : equally contributed)
CVPR, 2024 (Highlight)
project page

We propose a framework that aligns dynamic 3D Gaussians with text-driven 4D generations.


Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Andreas Blattmann*, Robin Rombach*, Huan Ling*, Tim Dockhorn*, Seung Wook Kim, Sanja Fidler, Karsten Kreis
(* : equally contributed)
CVPR, 2023
project page

We present a latent diffusion framework for high-resolution video synthesis. A follow-up open-sourced model, Stable Video Diffusion (SVD), is available on Hugging Face, featuring enhanced datasets and fine-tuned results.

Media: Two Minute Papers


EditGAN: High-Precision Semantic Image Editing
Huan Ling*, Karsten Kreis*, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler
(* : equally contributed)
NeurIPS, 2021
project page / code & demo

EditGAN enables fine-grained and high-quality semantic edits to images by directly manipulating the latent space of GANs with explicit control over object-level attributes.

Media: Two Minute Papers


Generative Representation Learning:

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany
project page

A novel 3D object detection approach that leverages geometry-aware features derived from diffusion models for more robust 3D understanding.


DreamTeacher: Pretraining Image Backbones with Deep Generative Models
Daiqing Li*, Huan Ling*, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
(* : equally contributed)
ICCV, 2023
project page

We propose a self-supervised pretraining framework that uses generative models to supervise image encoders without requiring labeled data.


BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations
Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
CVPR, 2022
project page / code & demo

We extend DatasetGAN to synthesize large-scale datasets like ImageNet with dense pixel-wise annotations, significantly reducing the manual labeling burden.


DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
Yuxuan Zhang*, Huan Ling*, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler
(* : equally contributed)
CVPR, 2021
project page / code & data

DatasetGAN leverages the rich latent space of GANs to synthesize annotated datasets with minimal human effort, enabling pixel-level labeling from a few examples.


Variational Amodal Object Completion
Huan Ling*, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler
(* : equally contributed)
NeurIPS, 2020
project page

We introduce a variational framework for completing partially occluded objects, producing plausible and diverse amodal completions for scene understanding.


3D Vision:

Image GANs Meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
Yuxuan Zhang*, Wenzheng Chen*, Huan Ling, Yinan Zhang, Sanja Fidler
(* : equally contributed)
ICLR, 2021
paper / project page
Media: GANverse3D

We unify GANs and differentiable rendering for interpretable 3D reconstruction and inverse graphics from single images using GAN priors and neural rendering techniques.


Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer
Wenzheng Chen, Jun Gao*, Huan Ling*, Edward J. Smith*, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler
(* : equally contributed)
NeurIPS, 2019
paper / project page
Media: Two Minute Papers

We propose DIB-R, a differentiable renderer that enables training of 3D object predictors end-to-end with supervision from 2D images alone.


Interactive Annotation:

ScribbleBox: Interactive Annotation Framework for Video Object Segmentation
Bowen Chen*, Huan Ling*, Jun Gao, Xiaohui Zeng, Ziyue Xu, Sanja Fidler
(* : equally contributed)
ECCV, 2020
paper / project page

ScribbleBox is a practical system that allows users to quickly annotate video object segmentations with scribbles, enabling high-quality masks with minimal effort.


Fast Interactive Object Annotation with Curve-GCN
Huan Ling*, Jun Gao*, Amlan Kar, Wenzheng Chen, Sanja Fidler
(* : equally contributed)
CVPR, 2019
paper / code

We introduce Curve-GCN, a real-time annotation tool that models object contours as closed curves and uses graph convolution to refine annotations interactively.


Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++
David Acuna*, Huan Ling*, Amlan Kar*, Sanja Fidler
(* : equally contributed)
CVPR, 2018
demo video / paper

Polygon-RNN++ is an annotation system that speeds up segmentation mask creation by predicting object boundaries with minimal clicks via recurrent polygon prediction.


Fun note from year 2023: Yes, we took a bite of RLHF back in 2017 :)

Teaching Machines to Describe Images with Natural Language Feedback
Huan Ling, Sanja Fidler
NeurIPS, 2017
project page / paper

We introduce an early reinforcement learning from human feedback (RLHF) approach for image captioning, where a model learns to improve its descriptions through natural language feedback rather than numerical rewards.

Employment
Senior Research Scientist at NVIDIA, Mar. 2025 - Present

Research Scientist at NVIDIA, Jan. 2020 - Mar. 2025

Research Intern at NVIDIA, Sep. 2018 - Dec. 2019


Pro Bono Office Hours
I often see an information asymmetry between junior and senior students on questions about research topics and directions, future careers, and the failures and excitement of research. The problem is more severe for people from underrepresented groups.

Following Krishna, Wei-Chiu, Shangzhe and Jun, I have decided to commit 1 hour per week to hosting free pro bono office hours to help reduce this information asymmetry. Please send me an email if you are interested.


