CS 323: Understanding Images and Videos: Recognizing and Learning High-Level Visual Concepts

Announcements:

• Class on Wed, Sep 30th has been rescheduled for Fri, Oct 9th 9am - 12pm.

• The first class will be on Wednesday, September 23rd from 2:15 - 5:05 PM in Herrin T185.

Instructor: Prof. Fei-Fei Li

Office: Room 246 Gates Bldg

Phone: (650)725-3860

Email: feifeili [at] cs [dot] stanford [dot] edu

Office hours: by email appointment

Course Assistant: Andy L. Lin

Email: ydna [at] stanford [dot] edu

Office hours: by email appointment

Class Location and Time:

Wed 2:15-5:05pm - 3 units - Room: Herrin T185 (TO BE CHANGED)

Course Description:

The field of computer vision has seen explosive growth in the past decade. Much of the recent effort in vision research has gone toward developing algorithms that can perform high-level visual recognition tasks on real-world images and videos. With the development of the Internet, this task has become particularly challenging and interesting, given the heterogeneous data on the web. This course will focus on reading recent research papers on high-level visual recognition problems, such as object recognition and categorization, scene understanding, and human motion understanding.

Syllabus:

• Weekly reading of recent, state-of-the-art papers

• Course project using data from the ImageNet ontology and a video dataset

Week 1-2: classic papers in object recognition
Week 3-5: object categorization in 3D, in context, and in large numbers
Week 6-7: scene understanding
Week 7-8: human motion understanding
Week 9-10: web-scale recognition

Pre-req:

Some research experience in one of the following fields: computer vision, image processing, computer graphics, or machine learning.

Textbook:

None required.