Biography
Annie En-Shiun Lee is an Assistant Professor at Ontario Tech University (OTU) and a Status-only Assistant Professor at the University of Toronto.
Her goal is to make language technology inclusive and accessible to as many people as possible. She directs the Lee Language Lab (L³), which focuses on language diversity, multilinguality, and multiculturalism, in line with OTU’s vision for “Tech with a Conscience”. Her research has been published in Nature Digital Medicine, ACM Computing Surveys, ACL, SIGCSE, IEEE TKDE, and Bioinformatics.
Dr. Lee served as demo co-chair for NAACL 2024 and has received numerous recognitions, including an Outstanding Paper Award and the Best Theme Paper Award at NAACL 2025, the Audience Award at Teaching NLP 2024, and the ARIA Spotlight Award for MScAC 2024, as well as nominations for the Tim McTiernan Student Mentorship Award (2025) and the Women in AI Researcher of the Year Award (2025).
She was previously an Assistant Professor (Teaching Stream) at the University of Toronto for its elite professional Master’s program. She earned her PhD from the University of Waterloo, was a visiting researcher at the Fields Institute and the Chinese University of Hong Kong, and worked as a research scientist at VerticalScope (research lead) and Stradigi AI.
Research Interests
Multilinguality, Language Diversity, and Low-Resource Languages
Multicultural Bias and Multimodal Applications
Pedagogy for Natural Language Processing and Machine Learning
Projects
ProxyLM (Findings of NAACL 2025)
A lightweight performance proxy that predicts LM accuracy using ~30× less compute.
Enables faster model selection, fine-tuning, and prompt iteration while maintaining high predictive reliability.
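As a rough illustration of the idea (not the paper’s exact setup), the sketch below fits a regressor on inexpensive features, such as proxy-model scores and dataset statistics, to predict a larger model’s downstream score. The features, data, and regressor choice here are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical features per experimental setting, e.g.
# [proxy-model score, log train-set size, language distance to English].
X = rng.random((200, 3))
# Hypothetical target: the large model's score on the same setting.
y = 20 * X[:, 0] + 5 * X[:, 1] - 3 * X[:, 2] + rng.normal(0, 1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Evaluate how well the cheap proxy predicts the expensive score.
rmse = mean_squared_error(y_te, reg.predict(X_te)) ** 0.5
print(f"RMSE of the proxy predictor: {rmse:.2f}")
```

In practice the regressor is trained once on logged experiments and then queried instead of running full fine-tuning or evaluation for each candidate setting.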
AlignFreeze (NAACL 2025)
Freezes early transformer layers to preserve syntactic knowledge during fine-tuning.
Boosts zero-shot and cross-domain performance with minimal additional training, improving stability and efficiency.
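For readers curious what layer freezing looks like in practice, here is a minimal sketch using Hugging Face Transformers and PyTorch. The model name and the number of frozen layers are assumptions, and this is a generic illustration of the technique rather than the AlignFreeze implementation.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: a multilingual encoder fine-tuned for classification.
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)

# Freeze the embeddings and the first N transformer layers so the lower-layer
# (largely syntactic) representations stay fixed during fine-tuning.
N_FROZEN = 6  # assumption; the paper studies which layers to freeze
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:N_FROZEN]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the remaining (unfrozen) parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```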
WorldCuisines (NAACL 2025 – Best Theme Paper)
1.2M image–question pairs across 30 languages, capturing global culinary knowledge.
A benchmark for cross-cultural multimodal reasoning with applications in cultural AI research.
Teaching NLP
Award-winning Teaching NLP workshop paper on empowering multilinguality in NLP education.
Showcases teaching strategies, open resources, and collaborative projects funded by NSERC USRA and the Fields Institute.
URIEL+ World Language Database (COLING 2025)
Expanded typological and geographic language database with improved NLP integration.
Includes a Python package for multilingual benchmarking, cross-lingual transfer, and dataset alignment.
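As a hypothetical usage sketch, the snippet below queries typological vectors through the original URIEL’s lang2vec interface (which URIEL+ extends) and computes a cosine distance between languages. The language codes and feature set are illustrative, and the URIEL+ package’s own API may differ.

```python
import numpy as np
import lang2vec.lang2vec as l2v  # original URIEL interface; URIEL+ expands these feature sets

# Illustrative languages (ISO 639-3) and a KNN-imputed syntactic feature set.
langs = ["eng", "fra", "yor"]
feats = l2v.get_features(langs, "syntax_knn")
vecs = {lang: np.asarray(feats[lang], dtype=float) for lang in langs}

def cosine_distance(a, b):
    """1 - cosine similarity between two typological feature vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for other in ("fra", "yor"):
    print(f"eng vs {other}: syntactic distance = {cosine_distance(vecs['eng'], vecs[other]):.3f}")
```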
AiTaigi Hokkien Learning App
Multimodal app for Taiwanese Hokkien featuring speech, text, and audio examples.
Developed as a student-led project and awarded the Student Engagement Award by U of T Computer Science.
TranslationCorrect (ACL 2025)
Interactive translation quality assistant that detects and corrects machine translation errors.
Enhances translator efficiency while maintaining linguistic fluency and semantic accuracy.
Multilingual Understanding and Reasoning of LLMs (Findings of EMNLP 2024)
This project aims to strengthen the multilingual understanding and reasoning capabilities of large language models (LLMs),
with a focus on low-resource languages.
Full list on Google Scholar
Browse the complete, up-to-date publication list, citations, and co-authors.
Teaching Experience
Ontario Tech University (OTU)
University of Toronto
York University
- CSML 1010 – Applied Machine Learning and Lifecycle (Certificate in Machine Learning)
Students
David Anugraha
- Lead author: URIEL+ Typological Knowledge Base.
- Co-author: WorldCuisines & ProxyLM (Multilingual VQA, LM performance prediction).
- Co-author: MT performance on low-resource languages.
- Starting PhD, Stanford University (Fall 2025).
Enrique David Guzman Ramírez
- Data Engineer, J.D. Power.
- MScAC student, University of Toronto.
- Vector Scholarship in AI (2022–23).
Kosei Uemura
- Focus: Multilingual NLP & reasoning in LLMs.
- Lead author: AfriInstruct (instruction tuning for African languages).
- Co-author: Empowering the Future with Multilinguality & Language Diversity.
Mason Shipton
- Co-author: URIEL+ (8,000+ language vectors).
- Co-author: Empowering the Future (NLP course framework).
- Programmer Analyst, Ontario Teachers’ Pension Plan (cloud solutions; Innovation Newsletter curator).
Labib Rahman
- ExploRIEL — UI with chatbot for URIEL+ language distances & vectors.
- SoulsBot+ — LLM-powered tutorial chatbot for Dark Souls: The Board Game.
- LinguaQuest — RPG-style educational game for linguists.
- Master’s student, Ontario Tech; researcher at Lee Lab & UXRLab.
Quang Phuoc Nguyen
- Data Selection for Multilingual Alignment — selects optimal languages for LM fine-tuning.
- Merlin: Curriculum Alignment — encoder–decoder stacking to improve multilingual alignment.
- Game Dialogue Translation — survey of LLM performance in game localization.
Malikeh Ehghaghi
Amane Takeuchi
- Business Analyst, Amazon (Tokyo, Japan).
- Research Project Lead & RA (ML model interpretation in clinical apps, NLP, CS education & EDI; PyTorch).
- BSc Applied Math; Specialist Data Science; Major CS; Minor Math — University of Toronto (Dean’s List 2023).
- Vice-Chair & Career Event Director, UofT Japan Network; TA for MAT135/136/235.
Tong Su
- Software Engineer Intern, Vortexa (LLMs for maritime data parsing).
- MSc Advanced Computer Science, University of Oxford (2024–2025).
- Former Full-Stack Developer, Northbridge Financial (Angular, Django, .NET; 10,000+ users).
- TA & Course Supporter, University of Toronto (Python, Unix/Git, Research Software).
- Research Assistant (Lee Lab & AI for Justice): PEFT for low-resource NMT; first author — NAACL 2024.
- Passed CFA Program Level I (Oct 2024).
Syed Mekhael Wasti
- Lead author, ACL 2025 demo: TranslationCorrect.
- Co-author, TeachNLP 2024: Multilinguality paper.
- MSc, Queen’s University (Vector Scholar), Fall 2025.
Hasti Toossi
- Software Engineer, PolyAI.
- Research: NLP; Programming Languages (Type Theory).
- Recent graduate, University of Toronto.
Aditya Khan
Eric Khiu
Vincent Shuai
- Student Engagement Award (2024), University of Toronto CS.
- Software Engineer, TikTok.
Shou-Yi Hung (Ray)
Awards
Research Grants:
- Canada Foundation for Innovation – The John R. Evans Leaders Fund (PI): Infrastructure for Low-Resource Efficient Scalable LLM Benchmark and Evaluation; $140,000 plus $77,000 O&M; 2025.
- NSERC Discovery Grant (PI): A Novel Paradigm for Advancing Low-Resource Languages – Uncovering Factors and Expanding Large Language Models; $155,000 with Discovery Launch Supplement of $12,500; 2024.
- NSERC Idea to Innovation (Collaborator): Pattern Discovery and Disentanglement – An Interpretable Multi-Modal AI Platform; $124,000; 2023.
- Lacuna Fund (Collaborator): Training and Evaluation Datasets for African Languages, Masakhane Natural Language Understanding, Conversational AI, and Benchmark for African Languages; $207,000; 2024.