Huan Ling

I am a Senior Research Scientist at the NVIDIA Spatial Intelligence (Toronto AI) Lab. I obtained my PhD at the University of Toronto, advised by Prof. Sanja Fidler. My research focuses on building foundational generative models. I aim to develop models that can simulate and generate realistic, diverse, and controllable environments, including video and dynamic 3D scenes, with a focus on scene-level generation for physical AI and on new forms of creative content generation and interaction.

Contact me at: linghuan at cs.toronto.edu

GitHub / Google Scholar / Twitter

Invited Talks:

June 2025: Cosmos-Drive-Dreams, Keynote talk at CVPR'25, 6th Embodied AI Workshop (EAI)
Oct. 2023: See Through the Eyes of Generative Models, UBC
July 2023: Vision generation and perception using Diffusion Models, ByteDance AD Core (Slides)
June 2023: Image, Video and 3D Content Creation with Diffusion Models, BAAI 2023 (Slides)
June 2023: Align your Latents: VideoLDM, Shanghai AI Lab (Slides)
July 2021: GANs for 2D Vision Perception, Walmart CV Conf (Slides)
June 2021: GANs for 2D Vision Perception, IIIS, Tsinghua University (Slides)

Publications

Selected publications. Full list at Google Scholar.

Generative Models:
	Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models. Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, Seung Wook Kim, Jun Gao, Laura Leal-Taixe, Mike Chen, Sanja Fidler, Huan Ling (*: equally contributed). White Paper, 2025. project page / Code. A synthetic data generation (SDG) pipeline built on world foundation models, designed to improve downstream tasks for autonomous vehicles.

	Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models. Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic^, Huan Ling^ (*, ^: equally contributed). CVPR, 2025 (Oral & Best Paper Nomination). project page / paper / Code. Improves 3D reconstruction and novel-view synthesis with single-step diffusion inference.

	DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models. Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang (*: equally contributed). CVPR, 2025 (Oral). project page / paper. A neural approach that addresses the dual problems of inverse and forward rendering within a single holistic framework.

	GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control. Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao (*: equally contributed). CVPR, 2025 (Highlight). project page / paper. A generative video model with precise camera control and temporal 3D consistency, achieved through a 3D cache. Media: Two Minute Papers

	Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control. Led the autonomous driving post-training research, including data curation, model training, and evaluation. White Paper, 2025. website / code / white paper. Large-scale multimodal control for conditional world generation.

	COSMOS: NVIDIA Physical AI World Foundation Model. Core contributor; my contributions include data curation, large-scale base model training, and leading autonomous driving post-training. White Paper, 2025. website / code / white paper / video / Jensen Huang Keynote at CES 2025. NVIDIA's Cosmos product and open-source models. Media: Two Minute Papers; 机器之心

	L4GM: Large 4D Gaussian Reconstruction Model. Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling. NeurIPS, 2024. project page. A scalable feed-forward framework that reconstructs high-quality animated 3D Gaussians from single-view video.

	Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models. Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis (*: equally contributed). CVPR, 2024 (Highlight). project page. A text-to-4D framework that optimizes dynamic 3D Gaussians using feedback composed from text-to-image, text-to-video, and 3D-aware multiview diffusion models.

	Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis (*: equally contributed). CVPR, 2023. project page. We present a latent diffusion framework for high-resolution video synthesis. A follow-up open-source model, Stable Video Diffusion (SVD), is available on Hugging Face and builds on improved datasets and fine-tuning. Media: Two Minute Papers

	EditGAN: High-Precision Semantic Image Editing. Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler (*: equally contributed). NeurIPS, 2021. project page / code & demo. EditGAN enables fine-grained, high-quality semantic edits to images by directly manipulating the latent space of GANs, with explicit control over object-level attributes. Media: Two Minute Papers

Generative Representation Learning:
	3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features. Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany. project page. A 3D object detection approach that uses geometry-aware features derived from diffusion models for more robust 3D understanding.

	DreamTeacher: Pretraining Image Backbones with Deep Generative Models. Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler (*: equally contributed). ICCV, 2023. project page. We propose a self-supervised pretraining framework that uses generative models to supervise image encoders without requiring labeled data.

	BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations. Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba. CVPR, 2022. project page / code & demo. We extend DatasetGAN to synthesize large-scale datasets such as ImageNet with dense pixel-wise annotations, significantly reducing the manual labeling burden.

	DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort. Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler (*: equally contributed). CVPR, 2021. project page / code & data. DatasetGAN leverages the rich latent space of GANs to synthesize annotated datasets with minimal human effort, producing pixel-level labels from only a few manually annotated examples.

	Variational Amodal Object Completion. Huan Ling, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler (*: equally contributed). NeurIPS, 2020. project page. We introduce a variational framework for completing partially occluded objects, producing plausible and diverse amodal completions for scene understanding.

3D Vision:
	Image GANs Meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering. Yuxuan Zhang, Wenzheng Chen, Huan Ling, Yinan Zhang, Sanja Fidler (*: equally contributed). ICLR, 2021. paper / project page. Media: GANverse3D. We unify GANs and differentiable rendering for interpretable 3D reconstruction and inverse graphics from single images, using GAN priors and neural rendering techniques.

	Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer. Wenzheng Chen, Jun Gao, Huan Ling, Edward J. Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler (*: equally contributed). NeurIPS, 2019. paper / project page. Media: Two Minute Papers. We propose DIB-R, a differentiable renderer that enables end-to-end training of 3D object predictors with supervision from 2D images alone.

Interactive Annotation:
	ScribbleBox: Interactive Annotation Framework for Video Object Segmentation. Bowen Chen, Huan Ling, Jun Gao, Xiaohui Zeng, Ziyue Xu, Sanja Fidler (*: equally contributed). ECCV, 2020. paper / project page. ScribbleBox is a practical system that lets users quickly annotate video object segmentations with scribbles, producing high-quality masks with minimal effort.

	Fast Interactive Object Annotation with Curve-GCN. Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler (*: equally contributed). CVPR, 2019. paper / code. We introduce Curve-GCN, a real-time annotation tool that models object contours as closed curves and uses graph convolution to refine annotations interactively.

	Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++. David Acuna, Huan Ling, Amlan Kar, Sanja Fidler (*: equally contributed). CVPR, 2018. demo video / paper. Polygon-RNN++ is an annotation system that speeds up segmentation mask creation by predicting object boundaries as polygons with a recurrent network, requiring only a few clicks.

Fun note from year 2023: Yes, we took a bite of RLHF back in 2017 :)
	Teaching Machines to Describe Images with Natural Language Feedback. Huan Ling, Sanja Fidler. NeurIPS, 2017. project page / paper. We introduce an early reinforcement learning from human feedback (RLHF) approach for image captioning, in which a model learns to improve its descriptions from natural language feedback rather than numerical rewards.

Employment

Senior Research Scientist at NVIDIA, Mar. 2025 - Present

Research Scientist at NVIDIA, Jan. 2020 - Mar. 2025

Research Intern at NVIDIA, Sep. 2018 - Dec. 2019

Mentored Interns
I’m incredibly proud to work alongside world-class students and interns at the Toronto AI Lab.
We recruit PhD interns year-round, though offers usually go out between September and December. Feel free to reach out if you're interested in joining us.

Sherwin Bahmani, CS Ph.D. student at University of Toronto
Runjian Chen, CS Ph.D. student at University of Hong Kong and HKU-MMLab
Jiawei Ren, CS Ph.D. student at MMLab@NTU
Jay Zhangjie Wu, CS Ph.D. student at Show Lab, National University of Singapore
Chenfeng Xu, CS Ph.D. student at UC Berkeley
Yuxuan Zhang, CS Ph.D. student at Princeton University
Bowen Chen, CS undergraduate student at University of Toronto

Pro Bono Office Hours

I often see an information asymmetry between junior and senior students on questions about research topics and directions, future careers, and the failures and excitement of research. The problem is more severe for people from underrepresented groups.

Following Krishna, Wei-Chiu, Shangzhe and Jun, I have committed one hour per week to hosting free pro bono office hours to help reduce this information asymmetry. Please send me an email if you are interested.

I borrowed the template from ✩, ✩, ✩, ✩.