I am a Senior Research Scientist at the NVIDIA Spatial Intelligence (TorontoAI) Lab. My research focuses on building foundational generative models. I aim to develop models that simulate and generate realistic, diverse, and controllable environments, including video and dynamic 3D scenes, with a focus on enabling scene-level generation for physical AI and new ways to generate and interact with creative content.
Contact me at: linghuan at cs.toronto.edu |
|
Selected publications. Full list at Google Scholar. |
Generative Models: | |
Jay Zhangjie Wu*, Yuxuan Zhang*, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic^, Huan Ling^ (*, ^: equally contributed) CVPR, 2025 (Oral & Best Paper Nomination) project page / paper / code Enhancing 3D reconstructions and novel-view synthesis via single-step diffusion inference. |
Ruofan Liang*, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang* (* : equally contributed) CVPR, 2025 (Oral) project page / paper A neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework. |
Xuanchi Ren*, Tianchang Shen*, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao (* : equally contributed) CVPR, 2025 (Highlight) project page / paper A generative video model with a 3D cache for precise camera control and temporal 3D consistency.
|
Leading autonomous driving post-training research, data curation, model training, and evaluation. White Paper, 2025 website / code / white paper Large-scale multimodal control for conditional world generation. |
Core contributor. My contributions include data curation, large-scale base model training, and leading autonomous driving post-training. White Paper, 2025 website / code / white paper / video / Jensen Huang Keynote at CES 2025 NVIDIA's Cosmos product and open-source models.
|
Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling NeurIPS, 2024 project page A scalable and expressive framework for high-quality dynamic 3D scene reconstruction. |
Huan Ling*, Seung Wook Kim*, Antonio Torralba, Sanja Fidler, Karsten Kreis (* : equally contributed) CVPR, 2024 (Highlight) project page We propose a framework that aligns dynamic 3D Gaussians with text-driven 4D generations. |
Andreas Blattmann*, Robin Rombach*, Huan Ling*, Tim Dockhorn*, Seung Wook Kim, Sanja Fidler, Karsten Kreis (* : equally contributed) CVPR, 2023 project page We present a latent diffusion framework for high-resolution video synthesis. A follow-up open-source model, Stable Video Diffusion (SVD), is available on Hugging Face, featuring enhanced datasets and fine-tuned results.
|
Huan Ling*, Karsten Kreis*, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler (* : equally contributed) NeurIPS, 2021 project page / code & demo EditGAN enables fine-grained and high-quality semantic edits to images by directly manipulating the latent space of GANs with explicit control over object-level attributes.
|
Generative Representation Learning: | |
Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany project page A novel 3D object detection approach that leverages geometry-aware features derived from diffusion models for more robust 3D understanding. |
Daiqing Li*, Huan Ling*, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler (* : equally contributed) ICCV, 2023 project page We propose a self-supervised pretraining framework that uses generative models to supervise image encoders without requiring labeled data. |
Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba CVPR, 2022 project page / code & demo We extend DatasetGAN to synthesize large-scale datasets like ImageNet with dense pixel-wise annotations, significantly reducing the manual labeling burden. |
Yuxuan Zhang*, Huan Ling*, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler (* : equally contributed) CVPR, 2021 project page / code & data DatasetGAN leverages the rich latent space of GANs to synthesize annotated datasets with minimal human effort, enabling pixel-level labeling from a few examples. |
Huan Ling*, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler (* : equally contributed) NeurIPS, 2020 project page We introduce a variational framework for completing partially occluded objects, producing plausible and diverse amodal completions for scene understanding. |
3D Vision: | |
Yuxuan Zhang*, Wenzheng Chen*, Huan Ling, Yinan Zhang, Sanja Fidler (* : equally contributed) ICLR, 2021 paper / project page We unify GANs and differentiable rendering for interpretable 3D reconstruction and inverse graphics from single images using GAN priors and neural rendering techniques. |
Wenzheng Chen, Jun Gao*, Huan Ling*, Edward J. Smith*, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler (* : equally contributed) NeurIPS, 2019 paper / project page We propose DIB-R, a differentiable renderer that enables training of 3D object predictors end-to-end with supervision from 2D images alone. |
Interactive Annotation: | |
Bowen Chen*, Huan Ling*, Jun Gao, Xiaohui Zeng, Ziyue Xu, Sanja Fidler (* : equally contributed) ECCV, 2020 paper / project page ScribbleBox is a practical system that allows users to quickly annotate video object segmentations with scribbles, enabling high-quality masks with minimal effort. |
Huan Ling*, Jun Gao*, Amlan Kar, Wenzheng Chen, Sanja Fidler (* : equally contributed) CVPR, 2019 paper / code We introduce Curve-GCN, a real-time annotation tool that models object contours as closed curves and uses graph convolution to refine annotations interactively. |
David Acuna*, Huan Ling*, Amlan Kar*, Sanja Fidler (* : equally contributed) CVPR, 2018 demo video / paper Polygon-RNN++ is an annotation system that speeds up segmentation mask creation by predicting object boundaries with minimal clicks via recurrent polygon prediction. |
Fun note from 2023: Yes, we took a bite of RLHF back in 2017 :) | |
Huan Ling, Sanja Fidler NeurIPS, 2017 project page / paper We introduce an early reinforcement learning from human feedback (RLHF) approach for image captioning, where a model learns to improve descriptions through natural language feedback rather than numerical rewards. |
|
Senior Research Scientist at NVIDIA
Mar. 2025 - Present
Research Scientist at NVIDIA
Jan. 2020 - Mar. 2025
Research Intern at NVIDIA
Sep. 2018 - Dec. 2019 |
|
I often see an information asymmetry between junior and senior
students on questions about research topics and directions, future
careers, and the failures and excitement of research. This problem is
more severe for people from underrepresented groups.
Following Krishna, Wei-Chiu, Shangzhe, and Jun, I have decided to commit 1 hour per week to hosting free, pro bono office hours to help reduce this information asymmetry. Please send me an email if you are interested. |
|