Information Sheet
EE372: Quantization, Compression, and Classification
spring 2006 - 2007
- Handouts
-
Handouts will be made available at
http://www.stanford.edu/class/ee372/handouts.html
as they are produced. These include lecture notes, copies of the slides used in class,
homework, a pdf version of this information sheet, and a discussion of the
class project.
- Instructor:
-
Robert M. Gray
Office: Packard 261
Phone: (650) 723-4001 (ISL), 723-6685 (EE)
rmgray at stanford.edu
Office hours: M 1:00-2:30, W 10:-11:30 or by email appointment.
- Teaching Assistant
-
Sangho Yoon
Office: Packard 109
Phone: TBA
Office hours: TBA
e-mail: holyoon at stanford.edu
Office Hours: TBA
- Grader:
-
TBA
- Admin. Ass't.
-
Kelly Yilmaz
Packard 259
(650) 723-4539 yilmaz@stanford.edu
- Lectures
-
MW 3:15-4:30, 60-61H (time may be shifted slightly to
remove conflict)
- Problem Sessions
-
Weekly sessions to discuss homework and projects.
TBA
- Course Objective
-
Develop the fundamentals of quantization and compression
and quantization-based classification.
- Course Description
-
Theory and design of codes for quantization and signal compression
systems (source coding systems), systems which convert analog or high
bit rate digital signals into relatively low bit rate digital signals. Applications to
analog-to-digital conversion, source coding and data compression, statistical classification, clustering, fitting
discrete models to continuous models, density estimation, and machine learning.
Necessary conditions for optimality of codes and implied code design algorithms.
Constrained and structured vector quantization.
- Intended Audience
-
Electrical engineers and computer scientists interested in
lossy and lossless compression and statistical classification, especially of speech and images.
- Prerequisites
-
- Basic linear systems theory, including
discrete and continuous parameter Fourier Transforms,
basics of 2D Fourier transforms. The material covered in EE261.
- Familiarity with Unix and Matlab (or willingness to learn!).
Knowledge of some programming language such as C or C++ or Java.
- Probability and random processes. The material covered in in
EE278.
- Information theory (EE376A) will certainly help, but it is not assumed
and needed material is taught (rapidly) as part of this class.
- Tentative Course Topics
-
The topics are likely to be closely approximated, but not
necessarily in the same order.
The lectures will begin with an introductory survey of most of the basic ideas in
order to frontload enough of the material to begin looking for projects and studying the literature. Topics will be then treated individually in more depth.
The introductory tour will be covered in handout lecture notes.
- Topic 1: Introduction and overview
A description of quantization in general terms and a brief sketch of some of the ideas and issues considered in the course.
The first lecture will be a broad survey of the entire course and a discussion
of the course logistics. The subsequentfour lectures will go through the handout
lecture notes An introductory tour of quantization.
- Quantization:extracting discrete information from a continuous world
- Encoder and decoder
- Cost functions: measures of distortion and rate
- Fixed-rate and variable-rate codes
- Average performance and optimality
- Theories of quantization
- Nonasymptotic theory: code optimality properties
- Shannon rate-distortion theory
- High rate theory
- Applications
- compression for transmission and storage
- putting points in space
- statistical classification/detection
- statistical clustering and learning
- statistical estimation/regression including a digital link
- density estimation
- designing and choosing Gauss mixture models
- Topic 2: Fundamentals of quantization
Elaboration of the basic ideas of quantization including codes, cost measures, measures of performance, and definitions of optimality.
- Preliminaries: Random variables, vectors, processes, and fields review
- Quantization: codes, partitions, and codebooks
- Distortion, rate, and optimal performance
- Lagrangian formulation of quantization
- Necessary conditions for optimal codes
- The Lloyd clustering algorithm
- Topic 3: Lossless and almost lossless coding
Quantization can be viewed as the Shannon model for lossy source coding. This topic treats lossless source coding because it usually forms a component of overall quantization systems and because it provides operational significance to Shannon entropy as well as a simple example of a Shannon theory asymptotic result.
- Uniquely decodable and prefix codes
- Kraft inequality
- Shannon's lossless coding theorm
- Shannon codes
- Huffman codes
- Coding vectors
- Entropy rate and the Shannon limit
- Brief survey of modern lossless codes, esp. Fano,
arithmetic and Lempel-Ziv. See EE 376 for more details.
- Predictive coding
- Code mismatch
- Universal codes
- Topic 5: Structured quantization
Quantizers are usually implemented with structural constraints in order to
have manageable complexity.
- Scalar quantization
- Uniform quantization
- Lattice quantization
- Product quantization
- Transform coding and bit allocation
- Subband and wavelet coding
- Gain/shape quantization
- Successive approximation
- Tree- structured quantization (TSVQ): Growing and pruning, CART and BFOS
- Multistage quantization
- Predictive quantization
- Recursive quantization
- Classified/switched/composite quantization
- Universal quantization
- Quantizing vectors into models: LPC, minimum discrimination information
and related distortion measures, Gauss mixture vector quantization
- Topic 6: Quantization, Classification, and Estimation
- Quantization and classification
- Bayes quantization
- Gauss mixture quantization
- Classification by compression
- Quantization and density estimation
- Texts
-
- Primary Texts
-
Vector Quantization and Signal Compression,
A. Gersho and R.M. Gray. Kluwer Academic Press, 1992.
- R.M. Gray and D.L. Neuhoff,
"Quantization,"
IEEE Transactions on Information Theory,
vol. 44,
pp. 2325-2384, Oct. 1998.
- Handout lecture notes and copies of slides.
- Other texts of possible interest
- T.M. Cover and J.A. Thomas,
Elements of Information Theory,
Wiley, 1991.
(Excellent text on information theory,
including noiseless coding theory.)
- T.C. Bell, J.G. Cleary, and I.H. Witten,
Text Compression,
Prentice-Hall, 1990.
(Excellent reference on lossless compression algorithms.)
- Ian H. Witten, Alistair Moffat, and Timothy C. Bell
Managing Gigabytes: Compressing and Indexing Documents and Images,
1999.
- H. Abut, ed.,
Vector Quantization,
IEEE Press, 1990.
(Collected papers on VQ through 1990.)
- N.S. Jayant and P. Noll,
Digital Coding of Waveforms,
Prentice-Hall, 1984.
(Old standby. Concentrates on various forms of scalar quantization. Has serious errors in treatment of dithering)
- W.B. Pennebaker and J.L. Mitchell,
JPEG: Still Image Data Comrpression Standard,
Van Nostrand Reinhold, 1993.
- T. Berger,
Rate-Distortion Theory,
Prentice-Hall, 1971.
(The bible on rate-distortion theory.)
- R. Gallager,
Information Theory and Reliable Communication,
Wiley, 1968.
(The old standby on information theory,
including noiseless coding and rate-distortion theory.
Still a wonderful book.)
- S. Graf and H. Luschgy,
Foundations of Quantization for Probability Distributions,
Springer, Lecture Notes in Mathematics, 1730, Berlin, 2000.
- R.M. Gray,
Entropy and Information Theory,
Springer-Verlag, 1990.
(The hard core mathematics of information theory,including general
source coding theorems.)
- R. Veldhuis and M. Breeuwer,
An Introduction to Source Coding,
Prentice Hall, New York, 1993
- M. Rabbani and P. W. Jones,
Digital Image Compression Techniques,
SPIE Optical Engineering Press,
Bellingham, Washington, 1991.
- Introduction to Data Compression,
K. Sayood. Morgan Kauffman, 1996.
- T. Hastie, R. Tibshirani, and J. Friedman.
The Elements of Statistical Learning: Data Mining, Inference,
and Prediction.
Springer, New York, 2001.
- Y. Fisher, Ed. Fractal Image Compression: Theory and Application, Springer-Verlag, NY,
1995.
- Mark Nelson and Jean-Loup Gailly, The Data Compression Book, M&T Books, NY, 1996.
- J.A. Storer, Data Compression - Methods and Theory,
Computer Science Press, NY, 1988.
- Data compression : the complete reference ,
D. Salomon, Springer, 1998.
Occasional notes will be handed out and posted to the Web.
- Some relevant Web links
-
- Course Requirements and Grading
-
|
4 Homework sets | 35% |
|
Midterm | 30% |
|
Final Project | 35% |
|
|
The midterm is tentatively scheduled for Thursday, 17 May 2006 from 6:30
PM to 9:30 PM. It will cover material from the first 11 lectures.
The room on campus will be announced.
The course project will consist of either a theoretical
analysis of a quantization or compression system or the design,
programming, and simulation of a signal compression/quantization/classification
algorithm for a
particular application (or some combination of the two).
The project should be described in readable English in a report.
There will also be a 15 minute oral presentation during the final week
of the quarter before finals.
The report should provide appropriate
references to the literature and a comparative discussion with existing
methods. Suggestions for projects may be found at
projects.html,
but creativity in developing a topic will be considered in the grade. The
projects can be developed for any available platform. The project
grade will be based on the creativity and technical content of the project
and on the quality of the presentation and participation in the discussion of
the other project presentations.
A 1 to 3 page
proposal for the project should be submitted prior to Wednesday 2 May describing the basic
problem to be attacked, the general approach, and a list of relevant references.
In previous years some of the class projects have turned into conference
publications, including the IEEE Data Compression Conference and the
IEEE International Conference on Image Processing. Students
are encouraged to develop a project that has some relation to their own
research interests. A list of suggested projects will be handed out the
second week of class.
- Collaboration Policy
-
You are encouraged to discuss together the concepts presented in the
class and the book. You may also discuss the homework problems, and
help each other if someone gets stuck. Projects may be done by pairs,
but individual responsibilities must be clearly described and
separate reports submitted. You are
required to implement new code by yourself, work out written problems by
yourself, and turn in only your own work. Specifically, you may not
copy anyone else's computer files or copy anyone else's written
homework. If you choose to share subroutines, then these subroutines
should be available to anyone in the class, e.g., via a web page. The
goal is that each student should be able to give and receive sufficient
help so that each person can complete the homework, but that in the end
each student is turning in a homework that he or she understands fully
and would be able to reproduce in its entirety unaided. Obviously it is
difficult to write a completely specific collaboration policy; you are
asked to abide by the spirit, rather than the letter, of this one.
- Image Systems Engingeering Program Lab
-
Computing facilities will be made available at in the Image Processing
Lab
http://scien.stanford.edu/labsite/scien_teaching_lab.html
for use in the homeworks and projects.
rmgray@stanford.edu