Biography
Annie En-Shiun Lee is an Assistant Professor at Ontario Tech University (OTU) and a Status-only Assistant Professor at the University of Toronto.
Her goal is to make language technology inclusive and accessible to as many people as possible. She directs the Lee Language Lab (L³), which focuses on language diversity, multilinguality, and multiculturalism, in line with OTU’s vision for “Tech with a Conscience”. Her research has been published in Nature Digital Medicine, ACM Computing Surveys, ACL, SIGCSE, IEEE TKDE, and Bioinformatics.
Dr. Lee served as demo co-chair for NAACL 2024 and has received numerous recognitions, including an Outstanding Paper Award and the Best Theme Paper Award at NAACL 2025, the Audience Award at Teaching NLP 2024, and the ARIA Spotlight Award for MScAC 2024, as well as nominations for the Tim McTiernan Student Mentorship Award (2025) and the Women in AI Researcher of the Year Award (2025).
She was previously an Assistant Professor (Teaching Stream) at the University of Toronto for its elite professional Master’s program. She earned her PhD from the University of Waterloo, was a visiting researcher at the Fields Institute and the Chinese University of Hong Kong, and worked as a research scientist at VerticalScope (research lead) and Stradigi AI.
Research Interests
Multilinguality, Language Diversity, and Low-Resource Languages
Multicultural Bias and Multimodal Applications
Pedagogy for Natural Language Processing and Machine Learning
Projects
ProxyLM (Findings of NAACL 2025)
A lightweight performance proxy that predicts LM accuracy using ~30× less compute.
Enables faster model selection, fine-tuning, and prompt iteration while maintaining high predictive reliability.
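As a rough illustration of the idea (not the paper’s exact setup), the sketch below fits a regressor on inexpensive features, such as proxy-model scores and dataset statistics, to predict a larger model’s downstream score. The features, data, and regressor choice here are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical features per experimental setting, e.g.
# [proxy-model score, log train-set size, language distance to English].
X = rng.random((200, 3))
# Hypothetical target: the large model's score on the same setting.
y = 20 * X[:, 0] + 5 * X[:, 1] - 3 * X[:, 2] + rng.normal(0, 1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Evaluate how well the cheap proxy predicts the expensive score.
rmse = mean_squared_error(y_te, reg.predict(X_te)) ** 0.5
print(f"RMSE of the proxy predictor: {rmse:.2f}")
```

In practice the regressor is trained once on logged experiments and then queried instead of running full fine-tuning or evaluation for each candidate setting.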
AlignFreeze (NAACL 2025)
Freezes early transformer layers to preserve syntactic knowledge during fine-tuning.
Boosts zero-shot and cross-domain performance with minimal additional training, improving stability and efficiency.
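For readers curious what layer freezing looks like in practice, here is a minimal sketch using Hugging Face Transformers and PyTorch. The model name and the number of frozen layers are assumptions, and this is a generic illustration of the technique rather than the AlignFreeze implementation.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: a multilingual encoder fine-tuned for classification.
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)

# Freeze the embeddings and the first N transformer layers so the lower-layer
# (largely syntactic) representations stay fixed during fine-tuning.
N_FROZEN = 6  # assumption; the paper studies which layers to freeze
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:N_FROZEN]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the remaining (unfrozen) parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```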
WorldCuisines (NAACL 2025 – Best Theme Paper)
1.2M image–question pairs across 30 languages, capturing global culinary knowledge.
A benchmark for cross-cultural multimodal reasoning with applications in cultural AI research.
Teaching NLP
Award-winning Teaching NLP workshop paper on empowering multilinguality in NLP education.
Showcases teaching strategies, open resources, and collaborative projects funded by NSERC USRA and the Fields Institute.
URIEL+ World Language Database (COLING 2025)
Expanded typological and geographic language database with improved NLP integration.
Includes a Python package for multilingual benchmarking, cross-lingual transfer, and dataset alignment.
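As a hypothetical usage sketch, the snippet below queries typological vectors through the original URIEL’s lang2vec interface (which URIEL+ extends) and computes a cosine distance between languages. The language codes and feature set are illustrative, and the URIEL+ package’s own API may differ.

```python
import numpy as np
import lang2vec.lang2vec as l2v  # original URIEL interface; URIEL+ expands these feature sets

# Illustrative languages (ISO 639-3) and a KNN-imputed syntactic feature set.
langs = ["eng", "fra", "yor"]
feats = l2v.get_features(langs, "syntax_knn")
vecs = {lang: np.asarray(feats[lang], dtype=float) for lang in langs}

def cosine_distance(a, b):
    """1 - cosine similarity between two typological feature vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for other in ("fra", "yor"):
    print(f"eng vs {other}: syntactic distance = {cosine_distance(vecs['eng'], vecs[other]):.3f}")
```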
AiTaigi Hokkien Learning App
Multimodal app for Taiwanese Hokkien featuring speech, text, and audio examples.
Developed as a student-led project and awarded the Student Engagement Award by U of T Computer Science.
TranslationCorrect (ACL 2025)
Interactive translation quality assistant that detects and corrects machine translation errors.
Enhances translator efficiency while maintaining linguistic fluency and semantic accuracy.
Multilingual Understanding and Reasoning of LLMs (Findings of EMNLP 2024)
This project aims to strengthen the multilingual understanding and reasoning capabilities of large language models (LLMs),
with a focus on low-resource languages.
Full list on Google Scholar
Browse the complete, up-to-date publication list, citations, and co-authors.
Teaching Experience
Ontario Tech University (OTU)
University of Toronto
York University
- CSML 1010 – Applied Machine Learning and Lifecycle (Certificate in Machine Learning)
Students
David Anugraha
- Lead author: URIEL+ Typological Knowledge Base.
- Co-author: WorldCuisines & ProxyLM (Multilingual VQA, LM performance prediction).
- Co-author: MT performance on low-resource languages.
- Starting PhD, Stanford University (Fall 2025).
Enrique David Guzman Ramírez
- Data Engineer, J.D. Power.
- MScAC student, University of Toronto.
- Vector Scholarship in AI (2022–23).
Kosei Uemura
- Focus: Multilingual NLP & reasoning in LLMs.
- Lead author: AfriInstruct (instruction tuning for African languages).
- Co-author: Empowering the Future with Multilinguality & Language Diversity.
Mason Shipton
- Co-author: URIEL+ (8,000+ language vectors).
- Co-author: Empowering the Future (NLP course framework).
- Programmer Analyst, Ontario Teachers’ Pension Plan (cloud solutions; Innovation Newsletter curator).
Labib Rahman
- ExploRIEL — UI with chatbot for URIEL+ language distances & vectors.
- SoulsBot+ — LLM-powered tutorial chatbot for Dark Souls: The Board Game.
- LinguaQuest — RPG-style educational game for linguists.
- Master’s student, Ontario Tech; researcher at Lee Lab & UXRLab.
Quang Phuoc Nguyen
- Data Selection for Multilingual Alignment — selects optimal languages for LM fine-tuning.
- Merlin: Curriculum Alignment — encoder–decoder stacking to improve multilingual alignment.
- Game Dialogue Translation — survey of LLM performance in game localization.
Malikeh Ehghaghi
Amane Takeuchi
- Business Analyst, Amazon (Tokyo, Japan).
- Research Project Lead & RA (ML model interpretation in clinical apps, NLP, CS education & EDI; PyTorch).
- BSc Applied Math; Specialist Data Science; Major CS; Minor Math — University of Toronto (Dean’s List 2023).
- Vice-Chair & Career Event Director, UofT Japan Network; TA for MAT135/136/235.
Tong Su
- Software Engineer Intern, Vortexa (LLMs for maritime data parsing).
- MSc Advanced Computer Science, University of Oxford (2024–2025).
- Former Full-Stack Developer, Northbridge Financial (Angular, Django, .NET; 10,000+ users).
- TA & Course Supporter, University of Toronto (Python, Unix/Git, Research Software).
- Research Assistant (Lee Lab & AI for Justice): PEFT for low-resource NMT; first author — NAACL 2024.
- Passed CFA Program Level I (Oct 2024).
Syed Mekhael Wasti
- Lead author, ACL 2025 demo: TranslationCorrect.
- Co-author, TeachNLP 2024: Multilinguality paper.
- MSc, Queen’s University (Vector Scholar), Fall 2025.
Hasti Toossi
- Software Engineer, PolyAI.
- Research: NLP; Programming Languages (Type Theory).
- Recent graduate, University of Toronto.
Aditya Khan
Eric Khiu
Vincent Shuai
- Student Engagement Award (2024), University of Toronto CS.
- Software Engineer, TikTok.
Shou-Yi Hung (Ray)
Awards
Research Grants:
- Canada Foundation for Innovation – The John R. Evans Leaders Fund (PI): Infrastructure for Low-Resource Efficient Scalable LLM Benchmark and Evaluation; $140,000 plus $77,000 O&M; 2025.
- NSERC Discovery Grant (PI): A Novel Paradigm for Advancing Low-Resource Languages – Uncovering Factors and Expanding Large Language Models; $155,000 with Discovery Launch Supplement of $12,500; 2024.
- NSERC Idea to Innovation (Collaborator): Pattern Discovery and Disentanglement – An Interpretable Multi-Modal AI Platform; $124,000; 2023.
- Lacuna Fund (Collaborator): Training and Evaluation Datasets for African Languages, Masakhane Natural Language Understanding, Conversational AI, and Benchmark for African Languages; $207,000; 2024.