MS&E 325: Topics in Stochastic Optimization

Department of Management Science and Engineering, Stanford University, Winter 2008-09.
TuTh 2:05-3:15 pm, Terman M33.

Instructor:
Ashish Goel. Office hours: Mondays 4-5, Terman 311.

Guests/auditors, please subscribe to msande325-win0809-guests by going to mailman.stanford.edu .


Course Description

From the bulletin: 
Markov decision processes; optimization with sparse priors; multi-armed bandit problems and the Gittins' index; regret bounds for multi-armed bandit problems; stochastic knapsack and the adaptivity gap; budgeted learning; adversarial queueing theory; stochastic scheduling and routing; stochastic inventory problems; multi-stage and multi-objective stochastic optimization. Prerequisites: MS&E 221 or equivalent; and MS&E 212 or CS 261 or equivalent.

Informally: The first part of the class will  focus on topics that can loosely be called "Algorithmic decision theory":  Algorithms for designing provably (near)-optimal and computationally efficient strategies for decision making under uncertainty, using  available and acquired data, as well as  probabilistic models thereon.  The second part will focus on stochastic versions of combinatorial optimization problems.

Course requirements: There will be four homework assignments, and an optional project. Please sign up for a project only if you believe you will be able to devote substantial amount of attention. Also, every student may be expected to scribe one lecture each.


Reading list (will be updated continually; papers marked as [C] will be covered in class; papers marked as [R] are for your reference; this list is not exhaustive)


[C] A short proof of the Gittins Index theorem. J. Tsitsiklis. Notice that this proof assumes that the time for which an arm gets played is a continuous distribution, whereas we assumed discrete in the proof in class.

[C] A spreadsheet illustrating a computation of the Gittins' index for Beta priors.

[R] Four proofs of Gittins' multiarmed bandit theorem. E. Frostig and G. Weiss. Specially, see the proof which involves the fair and prevailing prices.

[R] Elicitation of a Beta prior for Bayesian inference in clinical trials. Y. Wu, W. Shih, and D. Moore. This is just to support my contention that the multi-armed bandit problems are important not just for new Internet related applications but also traditional applications in health-care etc. It would be enough to read the abstract.

[R] Asymptotically efficient adaptive allocation rules. T. Lai and H. Robbins. This is the original "regret paper", and is a classic. Read the treatment of the lower bound from this paper, but for upper bounds, the proofs have been simplified and generalized (sometimes with slightly weaker constants, but that is not a major concern in this course).

[R] Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. R. Agarwal. This contains a simplified proof of the main result of Lai and Robbins. It would be most instructive to read sections 1 and 2 of this paper, and skip the rest, since (modulo constants), these sections are subsumed by the next paper.

[C] Finite-time Analysis of the Multiarmed Bandit Problem. P. Auer, N. Cesa-Bianchi, and P. Fischer. Specifically, read the analysis of UCB1. Then contemplate the following question: why is the performance of this algorithm apparently worse if the arms have very similar means? Can you derive a bound that does not depend on the difference in the means?

[C] The Nonstochastic Multiarmed Bandit Problem. P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. Read the proofs for the very first algorithm (section 3) followed by a quick look at section 4. Then ask the following question: where did all the probabilistic analysis disappear in this proof? Also read the lower bound in the appendix.

[C] Efficient algorithms for online decision problems. A. Kalai and S. Vempala. A nice exposition of the full information model. Read the first three sections. Note that this algorithm can be used to solve a large variety of linear optimization problems sequentially, without any assumptions on the linear utilities in each time period. Also note that the bounds that they obtain in the simplest multi-armed bandit setting are stronger than in the partial information model, most notably, the sqrt(N) changes to a log(N) -- verify this for yourself by finding the optimum epsilon for the simplest algorithm in this paper.

[R] Online convex programming and generalized infinitesimal gradient ascent. M. Zinkevich. Read the algorithm. Then before reading the proof, consider the following question: how would you use this technique in a black-box fashion to solve the linear generalization of the bandits problem, as descried by Kalai and Vempala in section 1.1? Before you assume that this is trivial, note that Zinkevich requires a convex decision space, whereas Kalai and Vempala assume that the decision space is arbitrary, possibly just a set of points.

[R] Logarithmic Regret Algorithms for Online Convex Optimization. E. Hazan, A. Agarwal, and S. Kale. Read the first two sections and make sure you understand the connection to Zinkevich's results and those of Kalai and Vempala.

[R] Universal portfolios. T. Cover. Another classic. Read the problems in this paper and think about how you would solve them using more recent techniques. Then go back and read the rest of Hazan et al (don't worry too much about the proofs since the techniques used are not similar to those used in the rest of this class).

[R] The value of knowing the demand curve. R. Kleinberg and T. Leighton. An interesting application of the ideas in the UCB algorithm of Auer et al, as well as new insights.

[R] Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. V. Dani and T. Hayes. Read the discussion on adaptivity.

[C] Allocating bandwidth for bursty connections. J. Kleinberg, Y. Rabani, E. Tardos. Read the sections on load balancing. Observe how the lower bound and the upper bound seem almost contradictory -- the lower bound says that "no definition of effective bandwidth works" while the upper bound says that "the standard definition of effective bandwidth works"! The trick, of course is in dealing with exceptional jobs separately and doing binary search ot find the right cut-off. An imporvement in the constant factor would be very interesting, specially if the improvement is large.

[R] Stochastic Load Balancing and Related Problems. A. Goel and P. Indyk. Read the sections on Poisson and Exponential jobs. Then ask the following questions: (a) How does the Poisson case for load balancing relate to the majorization based proof we saw in class? And most importantly, can the kinds of techniques we used for exponential jobs be used for Bernoulli jobs? For general jobs? A PTAS for the Bernoulli or general problem would be spectacular.

[C] Simultaneous optimization via approximate majorization for concave profits or convex costs. A. Goel and A. Meyerson. Read sections 2, 3.4, and section 4 (skip the proofs of section 4).

[C] Approximating the stochastic knapsack: the benefit of adaptivity. B. Dean, M. Goemans and J. Vondrak.

[C] Approximation algorithms for budgeted learning problems S. Guha and K. Munagala. Notice how the proof uses techniques very similar to those used by Dean, Goemans, and Vondrak.

[C] Multi-armed Bandits with Metric Switching Costs S. Guha and K. Munagala. Notice the nice balancing trick in the LP.

[C] The Ratio Index for Budgeted Learning, with Applications. A. Goel, S. Khanna, B. Null. Read the definition of the ratio index, the statement of the result, and then just focus on sections 3 and 4.

(Many) more to come shortly.


Project suggestions.

My preference would be if you were to do this in teams of two. Remember, the project is optional but if you choose to do one, I would expect you to devote substantial effort. I will add  more project suggestions if needed. Also, please feel free to come up with a formulation of your own.