CS262

Announcements
02/24/2016: Problem set 4 is out! It is due on 3/8 at the beginning of the lecture.
02/10/2016: Problem set 3 is out! It is due on 2/23 at the beginning of the lecture.
01/26/2016: Problem set 2 is out! It is due on 2/9 at the beginning of the lecture.
01/12/2016: Problem set 1 is out! It is due on 1/26 at the beginning of the lecture. I encourage you to start forming teams quickly and to start working on the homework ASAP.
01/7/2016: Students are encouraged to start forming homework groups. Let us know if you need some help. First assignment is coming up on January 12th!
01/7/2016: The Winter 2016 website for CS262 is up! Please sign up for Piazza here.
Course Description
Genomics is a new and very active application area of computer science. Over the past ten years there has been an explosion of genomics data: the entire DNA sequences of several organisms, including human, are now available. These are long strings of base pairs (A,C,G,T) containing all the information necessary for an organism's development and life. Computer science is playing a central role in genomics: from sequencing and assembling DNA sequences to analyzing genomes in order to locate genes, repeat families, similarities between sequences of different organisms, and several other applications. The area of computational genomics includes both applications of older methods and development of novel algorithms for the analysis of genomic sequences. This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background necessary for a computer science student to appreciate their application to current genomics research. Sequence alignments, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments will be covered. Applications of these tools to sequence analysis will be presented: comparing genomes of different species, gene finding, gene regulation, whole genome sequencing and assembly. Whenever possible, examples will be drawn from the most current developments in genomics research.

Prerequisites
The following courses are strongly recommended:
  • CS161: Design and Analysis of Algorithms, or equivalent familiarity with algorithmic and data structure concepts.

Textbooks

Durbin, Eddy, Krogh, Mitchison: Biological Sequence Analysis

Makinen, Belazzougui, Cunial, Tomescu: Genome-Scale Algorithm Design

Requirements and Grading
  1. Homework. The course will be graded on the homeworks alone; there is NO FINAL. The course will have four challenging problem sets of equal size and grading weight. These must be handed in at the beginning of class on the due date, which will usually be two weeks after they are handed out. Recognizing that students may face unusual circumstances and require some flexibility in the course of the quarter, each student will have a total of three free late days (weekends are NOT counted) to use as s/he sees fit. Once these late days are exhausted, any homework turned in late will be penalized at the rate of 20% per late day (or fraction thereof). Under no circumstances will a homework be accepted more than three days after its due date.

    Late homeworks should be turned in to a member of the course staff, or, if none are available, placed under the door of S266 Clark Center. You must write the time and date of submission on the assignment. It is an honor code violation to write down the wrong time. Students with biological and computational backgrounds are encouraged to work together.

  2. Scribing. Optionally, a student can scribe one lecture. Lecture notes will be due one week after the lecture date, and the grade on the lecture notes will replace the scores of the two lowest-scoring problems in the homeworks. To ensure even coverage of the lectures, please sign up to scribe beforehand with one of the course staff.



Collaboration and Honor Code

Students may discuss and work on problems in groups of at most three people but must write up their own solutions. A student can be part of at most one group. If a student works individually, then the worst problem per problem set will be dropped. When writing up the solutions, students should write the names of people with whom they discussed the assignment. Also, when writing up the solutions students should not use written notes from group work.

Students are expected not to look at the solutions from previous years. Copying or intentionally referring to solutions from previous years will be considered an honor code violation.

Class Schedule
Lecture: Tuesday and Thursday 1:30pm - 2:50pm in Y2E2 111

Instructor
Serafim Batzoglou
Office: Clark S266
Office hours: By Appointment
Phone: (650) 723-3334
Email: ude.drofnats@mifares (written backwards to avoid spam)

Teaching Assistants

Ali Sharafat
Office hours: Monday 10:00am - noon
Location: Shriram 052
Email: sharafat@stanford

Bo Wang
Office hours: Tuesday 3:00pm - 4:30pm
Location: Shriram 054
Email: wangbo.yunze@gmail

Jason (Junjie) Zhu
Office hours: Tuesday 3:00pm - 4:30pm
Location: Shriram 054
Office hours: Thursday 3:00pm - 3:30pm
Location: Clark S256
Email: jjzhu@stanford

Communication
We use Piazza as our main source of Q&A, so please sign up here.
All email correspondence should be sent to the course staff mailing list, cs262-win1516-staff@lists.stanford.edu. Alternatively, you can communicate your questions in person after lecture or during office hours.

Additional Material and Tutorials
Some additional materials can be found here.

Previous Lecture Notes
The lecture notes from a previous edition of this class (Winter 2015) are available here.
Schedule (future tentative)
As the quarter progresses, the following schedule will be updated accordingly. Please check back often for the latest material.

Lecture | Date | Title | Reading | Homeworks | Scribe
1 | 1/5 | A Zero-Knowledge Based Introduction to Biology | Makinen 1.1, 1.2, 1.3 | | Shubha Raghvendra
2 | 1/7 | Sequence Alignment | Durbin: 2.1, 2.2, 2.3, 2.4 | | Nico Chaves
3 | 1/12 | Linear-Space Alignment | Durbin: 2.5, 2.6; Makinen: 6.1, 6.2 | Problem Set 1 out | Thomas Lau
4 | 1/14 | Burrows-Wheeler Transform (BWT) | | | Robbie Ostrow
5 | 1/19 | BLAST continued, Hidden Markov Models | Optional: BWA and BOWTIE papers. | | Pavitra Rengarajan
6 | 1/21 | Hidden Markov Models continued | Durbin: 3.* | | Sarah Sterman
7 | 1/26 | HMMs continued | Makinen 7.* | Problem Set 1 due. Problem Set 2 out | Qianying Lin
8 | 1/28 | Pair HMMs, CRFs, and DNA sequencing | Durbin 4.* | | Sam Kim
9 | 2/2 | DNA sequencing and Assembly | | | Mark Berger
10 | 2/4 | Cancer Sequencing | | | Anvita Gupta
11 | 2/9 | Fragment Assembly | Makinen 13.* | Problem Set 2 due. Problem Set 3 out | Amelia Hardy
12 | 2/11 | Single Cell Sequencing | | | Minna Xiao
13 | 2/16 | Sequence Assembly | | | Samantha Zarate
14 | 2/18 | Human Population Genomics | | | Alex Wells
15 | 2/23 | Molecular Evolution and Phylogenetic Tree Reconstruction | | Problem Set 3 due. Problem Set 4 out | Dana Wyman
16 | 2/25 | Multiple Sequence Alignment | | | John Luttig
17 | 3/1 | Human Population Genomics | | | Arushi Raghuvanshi
18 | 3/3 | Human Population Genomics continued | | | Max Drach
19 | 3/8 | Modeling RNA Secondary Structure | | Problem Set 4 due | Gus Liu
20 | 3/10 | RNA Secondary Structure continued | | | Jiwei Li