Information Sheet
EE372: Quantization, Compression, and Classification
Spring 2006–2007
 Handouts

Handouts will be made available at
http://www.stanford.edu/class/ee372/handouts.html
as they are produced. These include lecture notes, copies of the slides used in class,
homework, a pdf version of this information sheet, and a discussion of the
class project.
 Instructor:

Robert M. Gray
Office: Packard 261
Phone: (650) 723-4001 (ISL), 723-6685 (EE)
rmgray at stanford.edu
Office hours: M 1:00–2:30, W 10:00–11:30, or by email appointment.
 Teaching Assistant

Sangho Yoon
Office: Packard 109
Phone: TBA
Office hours: TBA
email: holyoon at stanford.edu
 Grader:

TBA
 Administrative Assistant

Kelly Yilmaz
Packard 259
(650) 723-4539, yilmaz@stanford.edu
 Lectures

MW 3:15–4:30, 6061H (time may be shifted slightly to
remove a conflict)
 Problem Sessions

Weekly sessions to discuss homework and projects.
TBA
 Course Objective

Develop the fundamentals of quantization and compression
and quantization-based classification.
 Course Description

Theory and design of codes for quantization and signal compression
systems (source coding systems): systems that convert analog or
high-bit-rate digital signals into relatively low-bit-rate digital signals. Applications to
analog-to-digital conversion, source coding and data compression, statistical classification, clustering, fitting
discrete models to continuous models, density estimation, and machine learning.
Necessary conditions for optimality of codes and implied code design algorithms.
Constrained and structured vector quantization.
 Intended Audience

Electrical engineers and computer scientists interested in
lossy and lossless compression and statistical classification, especially of speech and images.
 Prerequisites

 Basic linear systems theory, including
discrete and continuous parameter Fourier Transforms,
basics of 2D Fourier transforms. The material covered in EE261.
 Familiarity with Unix and Matlab (or willingness to learn!).
Knowledge of some programming language such as C or C++ or Java.
 Probability and random processes. The material covered in
EE278.
 Information theory (EE376A) will certainly help, but it is not assumed
and needed material is taught (rapidly) as part of this class.
 Tentative Course Topics

The topics below are likely to be closely approximated, but not
necessarily covered in the same order.
The lectures will begin with an introductory survey of most of the basic ideas in
order to front-load enough of the material to begin looking for projects and studying the literature. Topics will then be treated individually in more depth.
The introductory tour will be covered in handout lecture notes.
 Topic 1: Introduction and overview
A description of quantization in general terms and a brief sketch of some of the ideas and issues considered in the course.
The first lecture will be a broad survey of the entire course and a discussion
of the course logistics. The subsequent four lectures will go through the handout
lecture notes, An Introductory Tour of Quantization.
 Quantization: extracting discrete information from a continuous world
 Encoder and decoder
 Cost functions: measures of distortion and rate
 Fixed-rate and variable-rate codes
 Average performance and optimality
 Theories of quantization
 Nonasymptotic theory: code optimality properties
 Shannon rate-distortion theory
 High-rate theory
 Applications
 compression for transmission and storage
 putting points in space
 statistical classification/detection
 statistical clustering and learning
 statistical estimation/regression including a digital link
 density estimation
 designing and choosing Gauss mixture models
 Topic 2: Fundamentals of quantization
Elaboration of the basic ideas of quantization including codes, cost measures, measures of performance, and definitions of optimality.
 Preliminaries: Random variables, vectors, processes, and fields review
 Quantization: codes, partitions, and codebooks
 Distortion, rate, and optimal performance
 Lagrangian formulation of quantization
 Necessary conditions for optimal codes
 The Lloyd clustering algorithm
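As a small preview of the Lloyd clustering algorithm named above, here is a minimal sketch for scalar data (illustrative only; the function names, fixed iteration count, and empty-cell handling are my own assumptions, not the course's reference implementation). It alternates the two necessary conditions for optimality: nearest-neighbor partitioning and centroid (conditional-mean) codeword updates.

```python
import random

def lloyd(samples, codebook, iters=50):
    """Scalar Lloyd iteration: alternate nearest-neighbor partitioning
    with centroid (conditional-mean) codeword updates."""
    for _ in range(iters):
        # Partition: assign each sample to its nearest codeword.
        cells = [[] for _ in codebook]
        for x in samples:
            i = min(range(len(codebook)), key=lambda j: (x - codebook[j]) ** 2)
            cells[i].append(x)
        # Centroid: replace each codeword by the mean of its cell
        # (an empty cell keeps its old codeword).
        codebook = [sum(cell) / len(cell) if cell else word
                    for cell, word in zip(cells, codebook)]
    return codebook

random.seed(0)
data = [random.gauss(0, 1) for _ in range(2000)]
print(sorted(lloyd(data, [-2.0, -0.5, 0.5, 2.0])))
```

Each iteration can only decrease (or leave unchanged) the average squared-error distortion on the training data, which is the property that makes the algorithm a descent method.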
 Topic 3: Lossless and almost lossless coding
Quantization can be viewed as the Shannon model for lossy source coding. This topic treats lossless source coding because it usually forms a component of overall quantization systems and because it provides operational significance to Shannon entropy as well as a simple example of a Shannon theory asymptotic result.
 Uniquely decodable and prefix codes
 Kraft inequality
 Shannon's lossless coding theorem
 Shannon codes
 Huffman codes
 Coding vectors
 Entropy rate and the Shannon limit
 Brief survey of modern lossless codes, especially Fano,
arithmetic, and Lempel-Ziv coding. See EE376 for more details.
 Predictive coding
 Code mismatch
 Universal codes
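To make the prefix-code and Kraft-inequality ideas in this topic concrete, here is a small sketch (my own illustrative code, with hypothetical helper names, not drawn from the course materials) that builds a binary Huffman code from symbol counts and checks that the resulting codeword lengths satisfy the Kraft inequality.

```python
import heapq
from collections import Counter

def huffman(freqs):
    """Build a binary Huffman code from a symbol->frequency mapping."""
    # Heap entries carry a unique tie-breaker so dicts are never compared.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least-probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

code = huffman(Counter("abracadabra"))
# Kraft inequality: any prefix code satisfies sum of 2**-len(w) <= 1.
assert sum(2.0 ** -len(w) for w in code.values()) <= 1.0
print(code)
```

Because Huffman codes correspond to full binary trees, the Kraft sum is met with equality here, and the most frequent symbol receives one of the shortest codewords.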
 Topic 5: Structured quantization
Quantizers are usually implemented with structural constraints in order to
have manageable complexity.
 Scalar quantization
 Uniform quantization
 Lattice quantization
 Product quantization
 Transform coding and bit allocation
 Subband and wavelet coding
 Gain/shape quantization
 Successive approximation
 Tree-structured vector quantization (TSVQ): growing and pruning, CART and BFOS
 Multistage quantization
 Predictive quantization
 Recursive quantization
 Classified/switched/composite quantization
 Universal quantization
 Quantizing vectors into models: LPC, minimum discrimination information
and related distortion measures, Gauss mixture vector quantization
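As a taste of the scalar and uniform quantization entries above, here is a tiny sketch (my own illustrative code, not from the course notes) of a mid-tread uniform quantizer. It checks the worst-case granular error of step/2 and the high-rate approximation that the mean squared error is about step**2/12.

```python
import random

def uniform_quantize(x, step):
    """Mid-tread uniform quantizer: round to the nearest multiple of `step`."""
    return step * round(x / step)

step = 0.1

# Worst-case granular error of a mid-tread quantizer is step/2.
for x in [-1.3, -0.1, 0.0, 0.49, 2.71]:
    assert abs(x - uniform_quantize(x, step)) <= step / 2 + 1e-12

# High-rate (fine-quantization) approximation: MSE is roughly step**2 / 12.
random.seed(1)
xs = [random.uniform(-1, 1) for _ in range(100_000)]
mse = sum((x - uniform_quantize(x, step)) ** 2 for x in xs) / len(xs)
print(mse, step ** 2 / 12)
```

For a source confined to the granular region, the quantization error is nearly uniform on [-step/2, step/2], which is where the step**2/12 figure comes from.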
 Topic 6: Quantization, Classification, and Estimation
 Quantization and classification
 Bayes quantization
 Gauss mixture quantization
 Classification by compression
 Quantization and density estimation
 Texts

 Primary Texts

Vector Quantization and Signal Compression,
A. Gersho and R.M. Gray. Kluwer Academic Publishers, 1992.
 R.M. Gray and D.L. Neuhoff,
"Quantization,"
IEEE Transactions on Information Theory,
vol. 44,
pp. 2325–2384, Oct. 1998.
 Handout lecture notes and copies of slides.
 Other texts of possible interest
 T.M. Cover and J.A. Thomas,
Elements of Information Theory,
Wiley, 1991.
(Excellent text on information theory,
including noiseless coding theory.)
 T.C. Bell, J.G. Cleary, and I.H. Witten,
Text Compression,
PrenticeHall, 1990.
(Excellent reference on lossless compression algorithms.)
 Ian H. Witten, Alistair Moffat, and Timothy C. Bell,
Managing Gigabytes: Compressing and Indexing Documents and Images,
Morgan Kaufmann, 1999.
 H. Abut, ed.,
Vector Quantization,
IEEE Press, 1990.
(Collected papers on VQ through 1990.)
 N.S. Jayant and P. Noll,
Digital Coding of Waveforms,
Prentice-Hall, 1984.
(Old standby. Concentrates on various forms of scalar quantization. Has serious errors in its treatment of dithering.)
 W.B. Pennebaker and J.L. Mitchell,
JPEG: Still Image Data Compression Standard,
Van Nostrand Reinhold, 1993.
 T. Berger,
Rate Distortion Theory,
Prentice-Hall, 1971.
(The bible on rate-distortion theory.)
 R. Gallager,
Information Theory and Reliable Communication,
Wiley, 1968.
(The old standby on information theory,
including noiseless coding and rate-distortion theory.
Still a wonderful book.)
 S. Graf and H. Luschgy,
Foundations of Quantization for Probability Distributions,
Springer, Lecture Notes in Mathematics, 1730, Berlin, 2000.
 R.M. Gray,
Entropy and Information Theory,
Springer-Verlag, 1990.
(The hard-core mathematics of information theory, including general
source coding theorems.)
 R. Veldhuis and M. Breeuwer,
An Introduction to Source Coding,
Prentice Hall, New York, 1993.
 M. Rabbani and P. W. Jones,
Digital Image Compression Techniques,
SPIE Optical Engineering Press,
Bellingham, Washington, 1991.
 Introduction to Data Compression,
K. Sayood. Morgan Kaufmann, 1996.
 T. Hastie, R. Tibshirani, and J. Friedman.
The Elements of Statistical Learning: Data Mining, Inference,
and Prediction.
Springer, New York, 2001.
 Y. Fisher, Ed., Fractal Image Compression: Theory and Application, Springer-Verlag, NY,
1995.
 Mark Nelson and Jean-Loup Gailly, The Data Compression Book, M&T Books, NY, 1996.
 J.A. Storer, Data Compression: Methods and Theory,
Computer Science Press, NY, 1988.
 Data Compression: The Complete Reference,
D. Salomon, Springer, 1998.
Occasional notes will be handed out and posted to the Web.
 Some relevant Web links

 Course Requirements and Grading

4 Homework sets: 35%
Midterm: 30%
Final Project: 35%

The midterm is tentatively scheduled for Thursday, 17 May 2006 from 6:30
PM to 9:30 PM. It will cover material from the first 11 lectures.
The room on campus will be announced.
The course project will consist of either a theoretical
analysis of a quantization or compression system or the design,
programming, and simulation of a signal compression/quantization/classification
algorithm for a
particular application (or some combination of the two).
The project should be described in readable English in a report.
There will also be a 15 minute oral presentation during the final week
of the quarter before finals.
The report should provide appropriate
references to the literature and a comparative discussion with existing
methods. Suggestions for projects may be found at
projects.html,
but creativity in developing a topic will be considered in the grade. The
projects can be developed for any available platform. The project
grade will be based on the creativity and technical content of the project
and on the quality of the presentation and participation in the discussion of
the other project presentations.
A 1- to 3-page
proposal for the project should be submitted before Wednesday, 2 May, describing the basic
problem to be attacked, the general approach, and a list of relevant references.
In previous years some of the class projects have turned into conference
publications, including the IEEE Data Compression Conference and the
IEEE International Conference on Image Processing. Students
are encouraged to develop a project that has some relation to their own
research interests. A list of suggested projects will be handed out the
second week of class.
 Collaboration Policy

You are encouraged to discuss together the concepts presented in the
class and the book. You may also discuss the homework problems, and
help each other if someone gets stuck. Projects may be done by pairs,
but individual responsibilities must be clearly described and
separate reports submitted. You are
required to implement new code by yourself, work out written problems by
yourself, and turn in only your own work. Specifically, you may not
copy anyone else's computer files or copy anyone else's written
homework. If you choose to share subroutines, then these subroutines
should be available to anyone in the class, e.g., via a web page. The
goal is that each student should be able to give and receive sufficient
help so that each person can complete the homework, but that in the end
each student is turning in a homework that he or she understands fully
and would be able to reproduce in its entirety unaided. Obviously it is
difficult to write a completely specific collaboration policy; you are
asked to abide by the spirit, rather than the letter, of this one.
 Image Systems Engineering Program Lab

Computing facilities will be made available in the Image Processing
Lab
http://scien.stanford.edu/labsite/scien_teaching_lab.html
for use in the homeworks and projects.
rmgray@stanford.edu