CS273A

Course Description

Your genome is 3 billion letters, driving 3 trillion cells, for 3 billion seconds. Why this computational analysis and not that? What did I just find? Who cares? Unidentified caller from Stockholm at 3 in the morning?

We will introduce you to various aspects of genomic data: what it looks like, how to get it, and some of the most (and least) interesting things you can do with it.

Class includes:
the human genome parts list; the COVID-19 genome parts list; genome sequencing technologies; and a taste of the three main forces of life (neutral, negative, and positive selection) via, respectively, population genomics and paternity testing, medical AI (disease) genomics (where you could really help really sick kids from your keyboard), and comparative (evolutionary) genomics (bats, cats, rats, gnats, SARS-CoV-2). And maybe a dash of cryptogenomics and genomic privacy.

Get a taste of Machine Learning, Natural Language Processing, Cryptography and even Genomics in the service of humanity.

Background in biology, ML, or NLP is purely optional. See the class Explore page for more details.

All course materials will be available via this website and Piazza, not Canvas.

Prerequisites

CS106 or equivalent (i.e., some programming experience in any language).
For example, you should be able to read a string from a file, count some patterns in it, and print the counts (see the tutorials from previous offerings, linked below).
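As a rough calibration of the expected programming level, here is a minimal Python sketch of that exercise. The script name, the command-line layout, and the choice to count overlapping occurrences are illustrative assumptions, not part of any assignment. The input path is taken from the command line rather than hard-coded, in line with the homework submission rules.

```python
import argparse
from typing import Dict, Iterable


def count_patterns(text: str, patterns: Iterable[str]) -> Dict[str, int]:
    """Count (possibly overlapping) occurrences of each non-empty pattern in text."""
    counts: Dict[str, int] = {}
    for pattern in patterns:
        if not pattern:
            continue  # an empty pattern has no meaningful count; skip it
        n = 0
        start = text.find(pattern)
        while start != -1:
            n += 1
            # Resume one character later so overlapping matches are counted
            start = text.find(pattern, start + 1)
        counts[pattern] = n
    return counts


def main() -> None:
    parser = argparse.ArgumentParser(description="Count patterns in a text file.")
    parser.add_argument("input", help="path to the input file (never hard-coded)")
    parser.add_argument("patterns", nargs="+", help="one or more patterns to count")
    args = parser.parse_args()
    with open(args.input) as handle:
        # Assumption: line breaks carry no meaning (e.g. a DNA sequence wrapped
        # across lines), so they are removed before searching.
        text = handle.read().replace("\n", "")
    for pattern, n in count_patterns(text, args.patterns).items():
        print(f"{pattern}\t{n}")


if __name__ == "__main__":
    main()
```

For instance, `python3 count_patterns.py sequence.txt ACG TATA` (hypothetical file name) prints one tab-separated line per pattern.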

Cross-listings

This course is cross-listed as DBIO273A and BIOMEDIN273A. Write to Gill if you want to help get it cross-listed elsewhere.

Class Schedule

Mondays and Wednesdays 11:30AM-12:50PM.

Zoom Link

The course will be taught entirely online.
Link for Zoom
No attendance is taken, but lectures will not be recorded.

Bibliography
The course is mostly based on current or very recent literature. As such, it does not follow any textbook. Please use the papers mentioned in each lecture as pointers into the relevant literature (for more material, you can look at the papers' references, or at more recent publications that cite those papers). The easiest way to find a paper is to search for its title and/or authors on Google Scholar or vanilla Google.

As a Stanford student, you also have free access to many biomedical journals. To access the biomedical resources Stanford pays for from off campus, you can install a browser extension and a shortcut that let you search and access Lane Library online resources directly using your SUNetID. Many of the terms we teach are also well defined on Wikipedia.

Communication

All course communication will be handled via Piazza. You can enroll by clicking this link (our class page). Course announcements and other private course resources will be communicated via Piazza.


Auditing

Auditors are welcome. Please sign up for Piazza as well. Send us an email if you want to be included in the class mailing list.

Instructor

Gill Bejerano
Office: Via Zoom
Office hours: Email for appointment
Phone: (650) 723-7666
Email:

Teaching Assistants

Bo Yoo
Office: N/A
Office hours: None during the exam
Email:

Course Assignments
There will be four homework assignments (programming and conceptual questions) and one final take home exam. Each homework will be 15% of your final grade, and the final exam will be 40% of your final grade.
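The weighting is simple arithmetic: four homeworks at 15% each contribute 60%, and the final exam contributes the remaining 40%. A minimal Python sketch, assuming every score is on a 0-100 scale (the scale and the function name are illustrative assumptions):

```python
from typing import Sequence

HOMEWORK_WEIGHT_PCT = 15  # each of the four homeworks
EXAM_WEIGHT_PCT = 40      # the final take-home exam


def final_grade(homework_scores: Sequence[float], exam_score: float) -> float:
    """Weighted course grade on a 0-100 scale (inputs assumed to be 0-100 too)."""
    if len(homework_scores) != 4:
        raise ValueError("expected exactly four homework scores")
    # 4 * 15% + 40% = 100%, so the weights cover the whole grade
    assert 4 * HOMEWORK_WEIGHT_PCT + EXAM_WEIGHT_PCT == 100
    weighted = HOMEWORK_WEIGHT_PCT * sum(homework_scores) + EXAM_WEIGHT_PCT * exam_score
    return weighted / 100
```

For example, homework scores of 80, 90, 70, and 100 with an exam score of 85 give (15 * 340 + 40 * 85) / 100 = 85.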

All code must be executable on the Stanford student machines (i.e., cardinal, myth, or rice). Jupyter notebooks are allowed for Homework 4 and the final exam. Explain how to run your code in your README. All code must run without user modification (e.g., if your code takes a file as input, the path or file name should not be hard-coded but should be passed in through the command line). All files must be named appropriately, and your submitted zip file must include your name. Be as detailed as possible to ensure that you get all the points.

If you are registered with the Office of Accessible Education (OAE), please send your accommodation letter via email to the class staff email () at the beginning of the quarter.
Late days
Four late days are awarded for the quarter. Once these late days are used up, homework turned in late will be penalized 20% per late day. The number of late days used is rounded up to a whole number of days, so an assignment turned in one hour late uses one full late day. Late days cannot be applied to the final exam.
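The sketch below walks through how late days might accumulate across the quarter. It assumes, as our own reading rather than official policy, that free late days are consumed in submission order and that the 20% penalty is a linear deduction from the homework's raw score for each day beyond the free allowance, floored at zero. The function names are hypothetical.

```python
import math
from typing import List, Tuple

FREE_LATE_DAYS = 4       # late days awarded for the quarter
PENALTY_PER_DAY = 0.20   # 20% per late day once the free days are used up


def late_days_used(hours_late: float) -> int:
    """Round lateness up to whole days; on-time submissions use zero days."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(submissions: List[Tuple[float, float]]) -> List[float]:
    """Adjust (raw_score, hours_late) pairs, given in submission order.

    Assumption: each day beyond the free allowance deducts 20% of the raw
    score, and the adjusted score never drops below zero.
    """
    remaining = FREE_LATE_DAYS
    adjusted = []
    for raw_score, hours_late in submissions:
        days = late_days_used(hours_late)
        free = min(days, remaining)   # spend free late days first
        remaining -= free
        excess = days - free          # days that incur the penalty
        factor = max(0.0, 1.0 - PENALTY_PER_DAY * excess)
        adjusted.append(raw_score * factor)
    return adjusted
```

For example, with submissions of (90, 1 hour), (80, 50 hours), (100, on time), and (70, 30 hours), the first two use 1 + 3 = 4 free days, so the last one is two days over the allowance and is scaled by 1 - 2 * 0.2 = 0.6.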

Honor Code and Regrade Policy

All homework assignments are individual assignments and you may not work in a group. You are allowed to discuss ideas and compare final numeric outputs (e.g. number of lines in a file), but no part of your final code can be shared with other students. In your submitted writeup (e.g., README), you must note the names of your collaborators. You may not share any part of your submissions with each other until grades are returned. We take honor code violations seriously. Violations will be reported to the Office of Community Standards.


We may make mistakes when we grade your homework. If you find one, please send an email to to ask for a regrade. We will regrade your entire homework, and your grade may go up or down as a result. You cannot redo your homework after grades have been returned, and we will not accept any more submissions after grades have been sent out.

The take-home exam must be done independently. You may not discuss it with anyone.

Course Tools

The base course directory is located at /afs/ir.stanford.edu/class/cs273a and is reachable from the cardinal and myth machines. Source tree executables are available within the bin directory and are machine-dependent. If you add "/afs/ir.stanford.edu/class/cs273a/bin/@sys" to your PATH variable, the correct version of each executable for your machine will be run (see the text processing tutorial).

Previous CS273A Materials
Tutorial Sessions
The following introductory sessions cover biology and computer science to the depth necessary to make the course enjoyable and to help you get started on the homework. These primer sessions will be held during class time on the dates below. We strongly recommend that you attend all tutorials, even if you think you know most of the material.

Date    Subject
1/13    Introductory Biology Primer
1/20    Introduction to Text Processing
1/27    Introduction to the UCSC Genome Browser

Schedule