About Me
I am Mohammad Mozaffari, a fourth-year PhD candidate in the Department of Computer Science at the University of Toronto, supervised by Professor Maryam Mehri Dehnavi. I received my B.Sc. in Electrical Engineering, with a minor in Computer Engineering, from the University of Tehran.
My research interests broadly span machine learning, optimization, and sparsity. In particular, I develop algorithms that exploit sparsity during the training and inference of large-scale machine learning models. I am also interested in enhancing distributed second-order optimization methods to improve the convergence rate of training.
A significant focus of my recent work is the "Compression Trinity" for Large Language Models (LLMs): the interplay of sparsity, quantization, and low-rank approximation that makes these powerful models more efficient. I have dedicated a separate page to these concepts and my related publications. You can explore it here: The Compression Trinity for LLMs.
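At a high level, the three ingredients can be summarized in one schematic decomposition of a weight matrix. The notation below is mine, a generic sketch rather than any single paper's formulation:

```latex
% W: dense weight matrix; M: binary sparsity mask (e.g. 2:4); Q: quantizer;
% L R^T: a rank-r correction with r << min(m, n). Schematic only.
W \;\approx\; Q\,(M \odot W) \;+\; L R^{\top},
\qquad M \in \{0,1\}^{m \times n},\quad \operatorname{rank}\bigl(L R^{\top}\bigr) = r \ll \min(m, n).
```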
Publications
SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs
M. Mozaffari, A. Yazdanbakhsh, and M. Mehri Dehnavi
Accepted at the Forty-second International Conference on Machine Learning (ICML), 2025
```bibtex
@inproceedings{mozaffari2025slim,
  title     = {{SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs}},
  author    = {Mozaffari, Mohammad and Yazdanbakhsh, Amir and Mehri Dehnavi, Maryam},
  booktitle = {Forty-second International Conference on Machine Learning},
  year      = {2025},
  url       = {https://openreview.net/forum?id=4UfRP8MopP}
}
```
- Developed SLiM, improving LLM accuracy by 5.8% through integrated quantization and low-rank approximation (a toy sketch of the sparse-plus-low-rank idea follows below).
- Implemented 2:4 sparsity and efficient fine-tuning, demonstrating state-of-the-art model compression.
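As a concrete illustration, here is a minimal NumPy sketch of the sparse-plus-low-rank idea: prune a weight matrix to 2:4 sparsity, then fit a low-rank term to the pruning residual with a truncated SVD. The function names and the rank are mine, and the quantization step is omitted; this is a toy model of the approach, not the paper's implementation.

```python
# Toy sketch of sparse-plus-low-rank compression (quantization omitted).
import numpy as np

def prune_2_4(W):
    """Keep the 2 largest-magnitude entries in every group of 4 (2:4 sparsity)."""
    W = W.copy()
    groups = W.reshape(-1, 4)                        # assumes size divisible by 4
    idx = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest-|w| per group
    np.put_along_axis(groups, idx, 0.0, axis=1)      # zero them out
    return groups.reshape(W.shape)

def low_rank_correction(W, S, rank=8):
    """Fit L @ R to the pruning residual W - S via truncated SVD."""
    U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
S = prune_2_4(W)
L, R = low_rank_correction(W, S, rank=8)
# Relative error of the sparse + low-rank approximation; shrinks as rank grows.
print(np.linalg.norm(W - (S + L @ R)) / np.linalg.norm(W))
```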
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
M. Mozaffari, A. Yazdanbakhsh, Z. Zhang, and M. Mehri Dehnavi
Accepted at the Thirteenth International Conference on Learning Representations (ICLR), 2025
```bibtex
@inproceedings{mozaffari2024slope,
  title     = {{SLoPe: Double-Pruned Sparse Plus Lazy Low-rank Adapter Pretraining of LLMs}},
  author    = {Mozaffari, Mohammad and Yazdanbakhsh, Amir and Zhang, Zhao and Mehri Dehnavi, Maryam},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=lqHv6dxBkj}
}
```
- Achieved 1.54× faster inference and 1.25× faster training while maintaining model accuracy.
- Reduced the memory footprint by 37% through CUDA kernel optimization and N:M sparsity; featured on the PyTorch team blog. A sketch of the sparse-plus-adapter forward pass appears below.
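The sparse-plus-adapter forward pass at the heart of this setup can be sketched in a few lines. Everything below (shapes, rank, the random stand-in for a 2:4 mask, the zero-initialized adapter) is illustrative, not the released code:

```python
# Minimal sketch: N:M-sparse base weights plus a low-rank adapter path.
import numpy as np

def adapter_forward(x, W_sparse, A, B):
    # Base sparse matmul plus low-rank adapter path (rank r = A.shape[1]).
    return x @ W_sparse + (x @ A) @ B

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64))                  # batch of activations
W_sparse = rng.standard_normal((64, 64))
W_sparse[rng.random(W_sparse.shape) < 0.5] = 0.0  # 50% zeros as a stand-in for a 2:4 mask
A = rng.standard_normal((64, 4))                  # rank-4 adapter factors
B = np.zeros((4, 64))                             # zero-initialized, so the adapter starts as a no-op
print(adapter_forward(x, W_sparse, A, B).shape)   # (2, 64)
```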
MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates
M. Mozaffari, S. Li, Z. Zhang, and M. Mehri Dehnavi
Accepted at the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
```bibtex
@inproceedings{mozaffari2023mkor,
  title     = {{MKOR}: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates},
  author    = {Mozaffari, Mohammad and Li, Sikan and Zhang, Zhao and Mehri Dehnavi, Maryam},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
  year      = {2023},
  url       = {https://openreview.net/forum?id=jcnvDO96N5}
}
```
- Developed an optimizer that achieves up to a 2.57× speedup when training large neural networks on distributed systems (64 GPUs).
- Implemented efficient second-order optimization techniques that scale to industry-scale models; the rank-1 inverse-update trick is sketched below.
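To illustrate why rank-1 updates matter here: when a cached curvature factor changes by a rank-1 term u vᵀ, its inverse can be refreshed with the standard Sherman-Morrison identity in O(n²) instead of being recomputed in O(n³). The sketch below shows only this identity, not MKOR's actual update rules:

```python
# Sherman-Morrison: (A + u v^T)^{-1} from A^{-1}, in O(n^2) instead of O(n^3).
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}."""
    Au = A_inv @ u
    vA = v @ A_inv
    denom = 1.0 + v @ Au            # must be nonzero for the update to exist
    return A_inv - np.outer(Au, vA) / denom

rng = np.random.default_rng(0)
n = 16
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
u, v = rng.standard_normal(n), rng.standard_normal(n)
fast = sherman_morrison(np.linalg.inv(A), u, v)
slow = np.linalg.inv(A + np.outer(u, v))  # full re-inversion, for comparison
print(np.allclose(fast, slow))            # True
```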
Experience
Research Intern at Autodesk
Aug 2022 - Dec 2022
Manager: Massimiliano Meneghin
- Proposed and implemented CUDA optimizations, reducing the runtime of a multi-GPU fluid-dynamics simulation from 4 hours to 3.2 hours (a 20% reduction) through code profiling and kernel-level enhancements.
- Designed and applied kernel-fusion strategies, reducing memory-bandwidth consumption by 30% and enhancing computational efficiency in large-scale simulations; the memory-traffic argument behind fusion is sketched below.
- Collaborated with a team of 3 engineers, using NVIDIA Nsight Systems/Compute to identify and resolve performance bottlenecks, optimizing data flow across multi-GPU nodes and reducing latency by 20%.
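For intuition about why fusion saves bandwidth, here is a NumPy-level sketch (the ops and names are mine): the unfused version writes an intermediate array and reads it back, while the fused version makes a single pass over the data. The actual work was done in CUDA kernels; the Python loop merely stands in for one fused kernel:

```python
# Illustration of the memory-traffic argument for kernel fusion.
import numpy as np

def unfused(a, x, b):
    y = a * x + b                 # pass 1: materializes an intermediate array
    return np.maximum(y, 0.0)     # pass 2: reads the intermediate back

def fused(a, x, b):
    out = np.empty_like(x)
    for i in range(x.size):       # single pass; stands in for one fused CUDA kernel
        out[i] = max(a * x[i] + b, 0.0)
    return out

x = np.linspace(-1.0, 1.0, 8)
print(np.allclose(unfused(2.0, x, 0.5), fused(2.0, x, 0.5)))  # True
```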
Research Intern at the University of Tehran
Aug 2020 - Jul 2021
Supervisor: Professor Maryam Sabbaghiyan
- Developed a mathematical model of spatial-temporal variations in user behavior, improving the accuracy of network-traffic predictions by 15% in simulations.
- Applied machine learning techniques to optimize bandwidth allocation, reducing data-transfer latency by 10% in test scenarios.
- Gained proficiency in Python and multi-threaded programming, creating parallel data-processing scripts that reduced analysis time for large datasets from 2 hours to 90 minutes (see the sketch below).
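A hypothetical sketch of the kind of parallel data-processing script described above, splitting a large dataset into chunks and mapping them across worker processes; the per-chunk function and sizes are made up for illustration:

```python
# Chunked parallel processing with a process pool.
import numpy as np
from multiprocessing import Pool

def summarize_chunk(chunk):
    """Per-chunk statistics; stands in for the real analysis step."""
    return chunk.mean(), chunk.std()

if __name__ == "__main__":
    data = np.random.default_rng(0).standard_normal(1_000_000)
    chunks = np.array_split(data, 8)           # one chunk per worker
    with Pool(processes=8) as pool:
        stats = pool.map(summarize_chunk, chunks)
    print(len(stats), stats[0])
```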