10-725 Convex Optimization

Nearly every problem in machine learning and computational statistics can be formulated in terms of the optimization of some function, possibly under some set of constraints. Since we obviously cannot solve every problem in machine learning, it follows that we cannot generically solve every optimization problem (at least not efficiently). Fortunately, many problems of interest in machine learning can be posed as optimization tasks that have special properties—such as convexity, smoothness, sparsity, separability, etc.—permitting standardized, efficient solution techniques.

This course is designed to give a graduate-level student a thorough grounding in these properties and their role in optimization, and a broad comprehension of algorithms tailored to exploit such properties. The focus will be on convex optimization problems (though we may also touch on nonconvex optimization problems at points). We will visit and revisit important applications in machine learning and statistics. Upon completing the course, students should be able to approach an optimization problem (often derived from a machine learning or statistics context) and:

  • identify key properties such as convexity, smoothness, sparsity, etc., and/or possibly reformulate the problem so that it possesses such desirable properties;
  • select an algorithm for this optimization problem, with an understanding of the advantages and disadvantages of applying one method over another, given the problem and properties at hand;
  • implement this algorithm or use existing software to efficiently compute the solution (a small worked example of this workflow appears after this list).
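As an informal illustration of this workflow (not part of the course materials), the Python sketch below treats least squares as a smooth convex problem, applies plain gradient descent with a step size based on the Lipschitz constant of the gradient, and checks the result against NumPy's built-in solver. The synthetic data, step-size rule, and iteration count are assumptions chosen purely for demonstration.

    # Minimal sketch: identify problem properties, pick a method, implement, verify.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 5))   # synthetic design matrix (assumption)
    b = rng.standard_normal(100)        # synthetic response (assumption)

    # f(x) = (1/2) ||Ax - b||^2 is smooth and convex; its gradient A^T (Ax - b)
    # is Lipschitz with constant L = largest eigenvalue of A^T A.
    L = np.linalg.eigvalsh(A.T @ A).max()
    step = 1.0 / L                      # standard safe step size for an L-smooth objective

    x = np.zeros(A.shape[1])
    for _ in range(500):
        grad = A.T @ (A @ x - b)        # gradient of the least-squares objective
        x = x - step * grad             # gradient descent update

    # Sanity check against the closed-form least-squares solution.
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("distance to least-squares solution:", np.linalg.norm(x - x_star))

The 1/L step size is the standard conservative choice for a smooth convex objective; much of the course concerns when faster or more specialized methods are preferable.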

Instructors

  • Sivaraman Balakrishnan
  • Yuanzhi Li

Education Associate

  • Daniel Bird

TAs

  • Christina Baek
  • Utsav Dutta
  • Joon Sik Kim
  • Amrith Setlur
  • Xiaoyu Xu
  • Harry Zhang

Course Syllabus

The syllabus provides information on grading, class policies, etc.

Lecture Notes

Fundamentals of Convex Optimization

  • Lecture 1: (1/17) Introduction, Convex Sets
  • Lecture 2: (1/19) Convex Functions, Optimization Basics

First-Order Methods

Duality

Advanced Topics

Advanced Optimization Techniques

  • Lecture 15: (3/14) Sums of Squares
  • Lecture 16: (3/16) Second-order optimization: Newton’s method and Preconditioned Gradient Descent
  • Lecture 17: (3/21) Adaptive optimization algorithms
  • Lecture 18: (3/23) Interior Point Methods
  • Lecture 19: (3/28) Optimization over the cloud: Asynchronous Optimization

Online Optimization

  • Lecture 20: (3/30) Online learning and (Online) Mirror Descent
  • Lecture 21: (4/4) Reinforcement learning and variance reduction

Introduction to Nonconvex Optimization

  • Lecture 22: (4/6) Limitations of convex optimization
  • Lecture 23: (4/11) Basics of nonconvex optimization and the (noisy) gradient descent algorithm
  • Lecture 24: (4/18) Nonconvex optimization theory, and a case study of optimizing a transformer block
  • Lecture 25: (4/20) Optimization and sampling I
  • Lecture 26: (4/25) Optimization and sampling II
  • Lecture 27: (4/27) Little Test II