Professor of Computational Linguistics

University of Toronto, Department of Computer Science


Determining causes of death from verbal autopsies

Verbal autopsies (VAs) are written records of the events leading up to a person’s death, collected by interviewing the family of the deceased, in situations where there was no physical autopsy and the cause of death was not determined by a physician. VAs are typically used in developing countries such as India to give public health authorities a better picture of the most prevalent causes of death, which are not accurately represented by the small number of well-documented deaths that occur in health facilities.

A verbal autopsy contains both structured information from answers to a questionnaire and a free-text narrative that captures additional information, such as the subject's medical history and the circumstances of their death, including the time sequence of their symptoms. The VA is reviewed by two physicians who assign it a code from the International Classification of Diseases (ICD-10).

Automating some of this coding process would reduce the time and costs of VA surveys. However, previous research, focusing primarily on the structured information, has had mediocre results. To counter this problem, we are focusing instead on analyzing the narrative texts, which contain most of the information necessary for the coding. In this way, we have developed new cause-of-death classifiers that have advanced the state of the art, and we are continuing to make improvements with additional analysis of the time sequence in the narratives.

So far our research has focused on English text, but to be useful the system must work in the languages of India in which most of the VAs are written. We are presently working on porting the system to several Devanagari-script languages, beginning with Hindi, using data from the Million Death Study.

Our work is in collaboration with Prabhat Jha of the Centre for Global Health Research, Toronto, and Mireille Gomes of Gavi, the Vaccine Alliance, Geneva. It was supported by a Google Faculty Research Award.


Jeblee, Serena. Automating disease diagnosis and cause-of-death classification from medical narratives using event extraction and temporal ordering, Doctoral dissertation, April 2021. [PDF]

Jeblee, Serena; Gomes, Mireille; and Hirst, Graeme. Multi-task learning for interpretable cause-of-death classification using key phrase prediction. Proceedings, BioNLP 2018, Melbourne, July 2018, 12–17. [PDF]

Jeblee, Serena; Budhkar, Akshay; Milić, Saša; Pinto, Jeff; Pou-Prom, Chloé; Vishnubhotla, Krishnapriya; Hirst, Graeme; and Rudzicz, Frank. Toronto CL at the CLEF 2018 eHealth Challenge Task 1. CLEF 2018 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS, September 2018. [PDF]

Jeblee, Serena and Hirst, Graeme. Listwise temporal ordering of events in clinical notes. Proceedings, LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis, Brussels, October 2018, 177–182. [PDF]

Jeblee, Serena; Gomes, Mireille; Jha, Prabhat; Rudzicz, Frank; and Hirst, Graeme. Automatically determining cause of death from verbal autopsy narratives. BMC Medical Informatics and Decision Making, 19:127, 9 July 2019. [Open access]

Yan, Zhaodong; Jeblee, Serena; and Hirst, Graeme. Can character embeddings improve cause-of-death classification for verbal autopsy narratives? Proceedings, BioNLP Workshop, Florence, August 2019, 234–239. [PDF]