University of Toronto

Computer Science 2528, Winter 2007
Advanced Computational Linguistics

Instructor: Graeme Hirst

CSC 2528 is a participatory course. The class meets once a week for discussions of recent research papers in computational linguistics and natural language processing.

Meetings, Winter 2007: Tuesdays 1-3pm, Bahen B025.

For further information, contact Graeme Hirst.

9, 16, and 23 January 2007: Graeme Hirst introduces semantics in computational linguistics. Readings for the case studies:

Peter Clark; Phil Harrison; John Thompson. "A knowledge-driven approach to text meaning processing." Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning, Edmonton, May 2003. PDF

Sergei Nirenburg and Victor Raskin. Ontological Semantics. The MIT Press, 2004. Intro to part II and sections 6.1 and 6.2, which are the first 14 pages of this file: PDF

29 January 2007: Chris Parisien and Paul Cook lead discussion on:

Ben Wellner, Lisa Ferro, Warren Greiff and Lynette Hirschman. "Reading comprehension tests for computer-based understanding evaluation." Natural Language Engineering, 12(4), December 2006, 305-334. Link to PDF via UofT Library

Ruifang Ge and Raymond J. Mooney. "A statistical semantic parser that integrates syntax and semantics". In Proceedings of the Ninth Conference on Computational Natural Language Learning, Ann Arbor, MI, pp. 9--16, June 2005. PDF

6 February 2007: Mike Demko and Tim Fowler lead discussion on:

Edmonds, Philip and Hirst, Graeme. "Near-synonymy and lexical choice." Computational Linguistics, 28(2), June 2002, 105--144. PDF

13 February 2007: Graeme Hirst leads discussion on:

Caroline Barrière and Fred Popowich. "Expanding the type hierarchy with nonlexical concepts". Advances in Artificial Intelligence: 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI 2000, Montréal, Quebec, Canada, May 2000. Proceedings. Link to publisher's page

20 February 2007: Reading Week, no class

27 February 2007: Saif Mohammad and Frank Rudzicz lead discussion on statistical methods in natural language generation:

Irene Langkilde and Kevin Knight. "Generation that exploits corpus-based statistical knowledge." Proceedings of COLING-ACL, 1998. PDF

Bangalore, Srinivas and Rambow, Owen, 2000. Exploiting a probabilistic hierarchical model for generation. Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), Saarbrücken, Germany. PDF

Bangalore, Srinivas and Rambow, Owen, 2000. Corpus-based lexical choice in natural language generation. Proceedings of the 38th Meeting of the Association for Computational Linguistics (ACL'00), Hong Kong, China. PDF

6 March 2007: Uli Germann and Paul Cook lead discussion on recognizing textual entailment (part I):

The following papers are all from Recognizing Textual Entailment: Proceedings of the First Challenge Workshop, Southampton, 11-13 April 2005.

Ido Dagan, Oren Glickman and Bernardo Magnini. "The PASCAL Recognising Textual Entailment Challenge."

Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng. "Robust textual inference using diverse knowledge sources."

Oren Glickman, Ido Dagan and Moshe Koppel. "Web based probabilistic textual entailment."

Rodolfo Delmonte, Sara Tonelli, Marco Aldo Piccolino Boniforti, Antonella Bristot and Emanuele Pianta. "VENSES -- a linguistically-based system for semantic evaluation."

Samuel Bayer, John Burger, Lisa Ferro, John Henderson, Alexander Yeh. "MITRE's submissions to the EU Pascal RTE Challenge."

13 March 2007: Tim Fowler and Chris Parisien lead discussion on recognizing textual entailment (part II):

The following papers are all from Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, June 2005.

Annie Zaenen; Lauri Karttunen; Richard Crouch. "Local textual inference: Can it be defined or circumscribed?"

Fabio Massimo Zanzotto; Maria Teresa Pazienza; Marco Pennacchiotti. "Discovering entailment relations using 'textual entailment patterns'."

Roy Bar-Haim; Idan Szpektor; Oren Glickman. "Definition and analysis of intermediate entailment levels".

20 March 2007: Frank Rudzicz and Mike Demko lead discussion on automated essay evaluation:

Non-technical background reading: Burstein, Jill; Chodorow, Martin; Leacock, Claudia. "Automated essay evaluation: The Criterion online writing service". AI Magazine, 25(3), Fall 2004, 27-36. PDF

Burstein, Jill and Marcu, Daniel. "A machine learning approach for identification of thesis and conclusion statements in student essays". Computers and the Humanities, 37(4), November 2003, 455-467. PDF AND Burstein, Jill; Marcu, Daniel; and Knight, Kevin. "Finding the WRITE stuff: Automatic identification of discourse structure in essays". IEEE Intelligent Systems, 18(1), January-February 2003, 32-39. PDF

Higgins, D; Burstein, Jill; and Attali, Yigal. "Identifying off-topic student essays without topic-specific training data". Natural Language Engineering, 12(2), June 2006, 145-159. Link to PDF via UofT Library

Miltsakaki, Elena and Kukich, Karen. "Evaluation of text coherence for electronic essay scoring systems". Natural Language Engineering, 10(1), March 2004, 25-55. Link to PDF via UofT Library

27 March 2007: Chris Parisien and Tim Fowler lead discussion on more applications of NLP in evaluation of student writing:

Burstein, Jill and Wolska, Magdalena. "Toward evaluation of writing style: Finding overly repetitive word use in student essays". 10th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, April 2003, 35-41. PDF

Miller, Tristan. "Essay assessment with latent semantic analysis." Journal of Educational Computing Research, 29(4), 2003, 495--512. PDF

Halteren, Hans van. "Detection of plagiarism in student essays". Proceedings, 14th Meeting of Computational Linguistics in the Netherlands, Antwerp, December 2003. PDF

3 April 2007: Paul Cook and Saif Mohammad lead discussion on:

James F. Allen and C. Raymond Perrault. "Analyzing intention in utterances." Artificial Intelligence, 15(3), December 1980, 143-178. PDF

10 April 2007: Visit by Aurélie Névéol, National Library of Medicine (CL group meeting).

Indexing the biomedical literature with a controlled vocabulary

The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents are indexed with continuing high quality, it is necessary to develop tools and methods that help the indexers in their daily task. This talk will present various methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations. An evaluation of the methods will be presented. Finally, issues regarding indexing evaluation and the practical use of automatic indexing tools by NLM indexers will be discussed.

What we did in Fall 2004.
What we did in Winter 2004.
What we did in Fall 2002.

Last modified, 7 March 2007.