CSC 401/2511 -- Natural Language Computing
Winter 2009
Contact information
Instructor: Gerald Penn
- Office: PT 396B (St. George campus)
- Office hours: Wednesdays and Fridays 1-2, or by appointment
- Tel: 978-7390
- Email: gpenn@cdf.utoronto.ca
Meeting times
- Lectures: WF 12-1, SS 2118
- Tutorials: M 12-1, SS 2118
- Exceptions:
  - lectures on MWF, 5/7/9 January, and no tutorial in the first week;
  - lectures on MWF, 19/21/23 January, and no tutorial in the third week;
  - a lecture on Monday, 9 February, and a tutorial on Friday, 13 February;
  - a lecture on Monday, 9 March, and a tutorial on Friday, 11 March;
  - a lecture on Monday, 6 April, and no lecture or tutorial on Friday, 10 April.
A bulletin board has also been created for the class, which will be monitored by the TAs.
Texts for the Course
| Required | C. Manning & H. Schuetze, Foundations of Statistical Natural Language Processing, MIT, 1999. | Errata |
| | for which there is an on-line edition from MIT CogNet | |
| Optional | D. Jurafsky & J. Martin, Speech and Language Processing, Prentice Hall, 2nd ed., 2008. | Errata |
| Recommended | A. Martelli, Python in a Nutshell, 2nd ed., O'Reilly, 2006. | Errata |
| Optional | M. Lutz, Learning Python, 3rd ed., O'Reilly, 2007. | Errata |
| Free! | various tutorials on the Python website | |
Supplementary Reading for the Lectures
| Topic | Title | Author | Publication Details |
| parsing, phrase structure models | Statistical Language Learning | E. Charniak | MIT Press, 1993. |
| machine learning | The Elements of Statistical Learning | T. Hastie, R. Tibshirani and J. Friedman | Springer, 2001. |
| information theory (including entropy) | Elements of Information Theory | T. M. Cover and J. A. Thomas | Wiley & Sons, 1991. |
| maximum entropy modelling | A Maximum Entropy Approach to Natural Language Processing | A. L. Berger, S. A. Della Pietra and V. J. Della Pietra | Computational Linguistics, 22(1): 39-71. |
| hidden Markov models (state emission) | Fundamentals of Speech Recognition, Chapter 6. | L. Rabiner and B.-H. Juang | Prentice Hall, 1993. |
| Good-Turing estimation | A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams | K. Church and W. Gale | Computer Speech and Language 5:19-54. |
| information retrieval | Modern Information Retrieval | R. Baeza-Yates and B. Ribeiro-Neto | ACM Press, 1999. |
| text summarization | Automatic Summarization | I. Mani | Benjamins, 2001. |
| phonetics (articulatory and acoustic) | Acoustic Phonetics | K. N. Stevens | MIT Press, 1998. |
Tentative Course outline
- Introduction to Corpus-based Linguistics
- Text Categorisation
- N-gram Models
- Markov Models
- Automatic Speech Recognition
- Part-of-Speech Tagging
- Information Retrieval
- Text Summarisation
- Statistical Machine Translation
Calendar of important course-related events
| Date | Event |
| Mon, 5 January | First lecture |
| Fri, 16 January | Last day to add course (CSC 2511) |
| Sun, 18 January | Last day to add course (CSC 401) |
| Mon, 9 February | Assignment 1 due |
| 16-20 February | Reading Week - no classes |
| Fri, 27 February | Last day to drop course (CSC 2511) |
| Sun, 8 March | Last day to drop course (CSC 401) |
| Mon, 9 March | Assignment 2 due |
| Mon, 6 April | Assignment 3 due |
| Wed, 8 April | Last lecture |
| 20 April - 8 May | Final exam period |
Evaluation and related policies
There will be three homeworks and a final exam. The relative weights of
these components towards the final mark are shown in the table below:
| Assignment 1 | 20% |
| Assignment 2 | 20% |
| Assignment 3 | 20% |
| Final | 40% |
Important note on final: A mark of at least a D- on the final
exam is required to pass the course. In other words, if you receive
an F on the final exam you automatically fail the course, regardless of
your performance on homeworks.
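As a rough illustration of how the weights and the final-exam rule combine, the following Python sketch computes a weighted mark and separately checks the final-exam requirement. The function name `course_mark`, the use of percentage marks, and the 50% cutoff for an F are assumptions made for this example; the page itself states only the weights and that a mark of at least a D- on the final is required.

```python
# Weights from the evaluation table, in percent of the final mark.
WEIGHTS = {"a1": 20, "a2": 20, "a3": 20, "final": 40}

# Assumption: marks are percentages and an F is anything below 50.
# The page says only "at least a D-"; this numeric cutoff is hypothetical.
FINAL_EXAM_CUTOFF = 50


def course_mark(a1, a2, a3, final):
    """Return (weighted mark in percent, whether the final-exam rule is met)."""
    marks = {"a1": a1, "a2": a2, "a3": a3, "final": final}
    weighted = sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100
    meets_final_rule = final >= FINAL_EXAM_CUTOFF
    return weighted, meets_final_rule
```

For example, `course_mark(100, 100, 100, 40)` gives a weighted mark of 76.0, but the second value is `False`: under the assumed cutoff, the final-exam rule is not met, so the course would be failed regardless of the homework marks.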
Important note on homeworks: No late homeworks will be accepted
except in case of documented medical or other emergencies.
Policy on collaboration: No collaboration on homeworks is permitted.
The work you submit must be your own. No student is permitted to
discuss the final exam with any other student until the instructor or TAs
make the solutions publicly available. Failure to observe this policy
is an academic offense, carrying a penalty ranging from a zero on
the homework to suspension from the university.
Announcements
In this space, you will find announcements related to the course. Please
check this space at least weekly.
- 29 April: REMINDER: no exam aids are allowed in the final exam.
- 30 March: MATERIAL COVERED IN WEEK 11: text summarization.
- 23 March: MATERIAL COVERED IN WEEK 10: information retrieval, tf.idf, singular
  value decomposition. You should read M&S Chapter 15.
- 16 March: MATERIAL COVERED IN WEEK 9: relative entropy and mutual information,
  the flip-flop algorithm. You should read M&S Sections 7.2 and 8.4.
- 9 March: MATERIAL COVERED IN WEEK 8: acoustic phonetics.
- 2 March: MATERIAL COVERED IN WEEK 7: more part-of-speech tagging, transformation-based
  learning, the Brill tagger, articulatory phonetics. You should read
  J&M Sections 4.1-4.2 and Chapter 7.
- 12 February: MATERIAL COVERED IN WEEK 6: hidden Markov models, Viterbi
  algorithm, Baum-Welch re-estimation, interpolation methods for language
  modelling, language modelling with HMMs, combining language models, POS
  tagging. You should read M&S Chapters 9 and 10.
- 12 February: MATERIAL COVERED IN WEEK 5: smoothing, Markov models.
- 12 February: MATERIAL COVERED IN WEEK 4: language modelling, n-grams, maximum
  likelihood estimation, Bayes's rule. You should read M&S Chapters
  4 and 6.
- 28 January: MATERIAL COVERED IN WEEK 3: cosine method, entropy, decision
  trees, k-nearest neighbours, perceptron learning, Lagrange's method, maximum
  entropy modelling. You should read M&S 2.1-2.2.4.
- 28 January: MATERIAL COVERED IN WEEK 2: more parts of speech, corpus annotation,
  genre classification. Here is our Python tutorial. You should
  read M&S Chapter 16, 15.2-15.2.1 and 8.1.
- 14 January: MATERIAL COVERED IN WEEK 1: Zipf's law, parts of speech.
  You should read M&S Chapter 1, Section 3.1, and Section 4.3.2.
- 7 January: PREREQUISITES. CSC 207 or 209 or 228, and STA 247 or 255 or
  257 and a CGPA of 3.0 or higher or a CSC subject POSt. MAT 223 or 240 is
  strongly recommended. Note that the University's automatic registration
  system does not check for prerequisites: even if you have registered for
  the course, you will not receive credit for it unless you had satisfied
  the prerequisite before you registered.
Handouts
In this space you will find on-line PDF versions of course handouts,
including homeworks.
To view these handouts you will need access to a PDF viewer. If your
machine does not have the required software, you can download
Adobe Acrobat Reader for free.
Old Exams
Some old midterm and final exams for this course (with no solutions).
Gerald Penn, 29 April, 2009
This web-page was adapted from the web-page for another course,
created by Vassos Hadzilacos.