CSC2431: Topics in Computational Biology
Analysis of Next Generation Sequencing Data
Classes: W 11-1 in Bahen 025
Instructor: Michael Brudno
Office: Pratt (PT) 286C & CCBR 604
Office Hours: By appointment
- Note that the April 2nd class has been rescheduled for April 4th (same time, same room: 11am BA 025)
- We have a new room -- B025. Hopefully it will fit all of us.
- 1/26 -- Readings for this week are finally up.
- 1/22 -- We now have a Google group: uoft-csc2431. Please sign up for it. Still no word on a room change. Additionally, I've put together a very brief guide on what I expect from the paper summaries.
- 1/19 -- Reading for Jan 23 is posted below. Watch this space
for a room change announcement (requested, but no word yet).
Next Generation Sequencing (NGS) technologies, such as Illumina/Solexa, AB SOLiD, and 454 Pyrosequencing, are
revolutionizing the acquisition of genomics data. These platforms offer much lower costs and much faster data
acquisition, but the sequences acquired are much shorter: read lengths drop from 500-1000 base pairs to as little as 25 base
pairs per read. At the same time, these methodologies offer several important advantages, for example the ability to acquire
paired reads on a very large scale.
The development of NGS is forcing a reconsideration of the computational methods used for genome analysis, with the
problems of read mapping and genome assembly becoming much more complex. Simultaneously, NGS is enabling the development
of methods to address problems which were previously not addressed with genome sequencing, such as the prediction of
structural or copy number polymorphisms. NGS data has a very different error model, requiring modifications to
classical algorithms, and the sheer size of the data requires the use of efficient algorithms, appropriate hardware, and
effective implementations.
In this class we will explore the features of NGS data that make it different from classical sequencing data, and try to
determine which methods can address some of these differences. Because of the novelty of the data and of
the problems, the emphasis will be on discovering the right solutions, rather than just learning about them.
The prerequisite is CSC 2417 -- Algorithms for Genome Analysis, or
permission of the instructor. Permission will be given if you have a
basic knowledge of molecular biology (transcription, etc.), a strong background
in algorithms (at least CSC 373 level), and basic probability theory.
The basic requirements for the class will be a course project (60% of the
grade), paper presentations and participation (20% of the grade) and written
paper summaries (20% of the grade).
Each person taking the class for credit is responsible for submitting a one-page summary
of *at least two* of the assigned papers before every class. The system for grading them will be
a simple check-off, so no need to sweat too much. From the writeup I am looking for evidence that
you read the papers and thought about them. Some evidence of this would be discussing (1) the
weaknesses of the paper (the strengths are in the abstract :)), and (2) if the method is not directly
applicable to NGS, how it could be used there. The writeup need not be long or thoroughly polished; it
is supposed to be evidence that you've done the work, not work in itself.
If you are presenting a paper, you are exempt from doing a writeup that week.
The whole point of the paper summaries is to make sure that you've read the papers before coming to class.
However, I will allow you to hand in no more than two summaries up to 2 days late (by Friday of the same week).
The class will satisfy the 2c breadth.