This program differs from the standard Statistics Ph.D. program in its emphasis on machine learning and computer science. Students in this track will be involved in courses and research from both the Departments of Statistics and Machine Learning.

During the first year, students will normally be situated in and supported by the Department of Statistics. During later years, students will be located in the Department of their primary advisor.

Students in this program are subject to all of the core Ph.D. requirements, except that the course Statistical Computing (36-750) is recommended, not required, and the Data Analysis Exam is also not required. (The Data Analysis Exam is required, however, in order to receive the M.S. in Statistics.)

Additional course requirements of this joint program are the following:

**10-715: Advanced Introduction to Machine Learning**

The rapid improvement of sensory techniques and processor speed, and the availability of inexpensive massive digital storage, have led to a growing demand for systems that can automatically comprehend and mine massive and complex data from diverse sources. Machine Learning is becoming the primary mechanism by which information is extracted from Big Data, and a primary pillar that Artificial Intelligence is built upon. This course is designed for Ph.D. students whose primary field of study is machine learning, or who intend to make machine learning methodological research a main focus of their thesis. It will give students a thorough grounding in the algorithms, mathematics, theories, and insights needed to do in-depth research and applications in machine learning. The topics of this course will in part parallel those covered in the general graduate machine learning course (10-701), but with a greater emphasis on depth in theory and algorithms. The course will also include additional advanced topics such as RKHS and representer theory, Bayesian nonparametrics, additional material on graphical models, manifolds and spectral graph theory, and reinforcement learning and online learning. Students entering the class are expected to have a pre-existing strong working knowledge of algorithms, linear algebra, probability, and statistics.

**10/36-702: Statistical Machine Learning**

Statistical Machine Learning is a second graduate-level course in advanced machine learning, assuming that students have taken Machine Learning (10-701) or Advanced Machine Learning (10-715), and Intermediate Statistics (36-705). The term "statistical" in the title reflects the emphasis on statistical theory and methodology. This course focuses mostly on methodology and theoretical foundations, treating both the art of designing good learning algorithms and the science of analyzing an algorithm's statistical properties and performance guarantees. Theorems are presented together with practical aspects of methodology and intuition to help students develop tools for selecting appropriate methods and approaches to problems in their own research. Though computation is certainly a critical component of what makes a method successful, it will not receive the same central focus as methodology and theory. The course covers topics in statistical theory that are important for researchers in machine learning, including consistency, minimax estimation, and concentration of measure, as well as statistical topics that may not be covered in as much depth in other machine learning courses, such as nonparametric density estimation, nonparametric regression, and Bayesian estimation.

Also, an elective must be chosen from among the following courses:

**10-708: Probabilistic Graphical Models**

Many problems in artificial intelligence, statistics, computer systems, computer vision, natural language processing, and computational biology, among many other fields, can be viewed as the search for a coherent global conclusion from local information. The probabilistic graphical models framework provides a unified view of this wide range of problems, enabling efficient inference, decision-making, and learning in problems with a very large number of attributes and huge datasets. This graduate-level course will provide you with a strong foundation both for applying graphical models to complex problems and for addressing core research topics in graphical models. The class will cover three aspects: the core representations, including Bayesian networks, Markov networks, and dynamic Bayesian networks; probabilistic inference algorithms, both exact and approximate; and learning methods for both the parameters and the structure of graphical models. Students entering the class should have a pre-existing working knowledge of probability, statistics, and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate. After taking this class, students should have obtained sufficient working knowledge of multivariate probabilistic modeling and inference for practical applications, should be able to formulate and solve a wide range of problems in their own domain using graphical models, and should be able to advance into more specialized technical literature on their own. Students are required to have successfully completed 10-701 or 10-715, or an equivalent class.

**10-725: Convex Optimization**

Nearly every problem in machine learning can be formulated as the optimization of some function, possibly under some set of constraints. This universal reduction may seem to suggest that such optimization tasks are intractable. Fortunately, many real-world problems have special structure, such as convexity, smoothness, or separability, which allows us to formulate optimization problems that can often be solved efficiently. This course is designed to give a graduate-level student a thorough grounding in the formulation of optimization problems that exploit such structure, and in efficient solution methods for these problems. The main focus is on the formulation and solution of convex optimization problems; these general concepts will also be illustrated through applications in machine learning and statistics. Students entering the class should have a pre-existing working knowledge of algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate. Though not required, having taken 10-701 or an equivalent machine learning or statistics class is strongly encouraged, since applications in machine learning and statistics will be used to demonstrate the concepts covered in class. Students will work on an extensive optimization-based project throughout the semester.

**15-750: Algorithms**

The course will cover a fairly wide range of topics in algorithm design, ranging from classic work of the 1960s to more recent work from this century. We hope to present as many different tools and algorithms as time permits. Most topics will be covered in two phases: in the first, we will present an important design technique through a classic, possibly simple, application; in the second, we will present an application that, we hope, will be new to most of the class. By the end of the class, students are expected to be able to recognize which tool or method to apply to a problem and to be reasonably proficient at using it. We also feel that students should be able to explain their algorithmic design ideas to their peers and supervisors, both in writing and orally; this is why we require both written and oral presentations of homework. Please check the preliminary schedule for an idea of the possible topics to be covered. The schedule at this point is mostly a list of topics covered some five years ago, and it will be updated with several newer topics.

**15-826: Multimedia Databases and Data Mining**

The course covers advanced algorithms for learning, analysis, data management, and visualization of large datasets. Topics include indexing for text and DNA databases, searching medical and multimedia databases by content, fundamental signal processing methods, compression, fractals in databases, data mining, privacy and security issues, rule discovery, data visualization, graph mining, and stream mining.

**15-853: Algorithms in the Real World**

This course covers how algorithms and theory are used in real-world applications. The course will cover both the theory behind the algorithms and case studies of how the theory is applied. It is organized by topic, and the topics change from year to year.

Students in this program are required to complete the Advanced Data Analysis (ADA) project to the same standards as regular Statistics Ph.D. students: they must work on a substantive, real-data project with a domain expert as an outside advisor. A faculty member from the Department of Statistics must play an oversight role in the project, if not serve as the primary advisor. This project will satisfy the ML Data Analysis Project (DAP), speaking skills, and writing requirements, provided that an ML faculty member is an advisor.

Thesis research must be either co-supervised by a faculty member in Machine Learning and a faculty member in Statistics, or supervised by a faculty member who holds a joint appointment in Statistics and Machine Learning. The thesis committee must contain at least one member from the Department of Statistics and one from the Machine Learning Department.

Shown below is a two-year plan of study that will satisfy all of the coursework requirements.

**Year One**

**Fall Semester**

- 36-699: Immigration to Statistics
- 36-705: Intermediate Statistics
- 36-707: Regression Analysis
- 10-715: Advanced Introduction to Machine Learning

**Spring Semester**

- 36-752: Advanced Probability
- 36-757: Advanced Data Analysis I
- 10/36-702: Statistical Machine Learning
- Begin Work on ADA/DAP Project

**Year Two**

**Fall Semester**

- 36-755: Advanced Statistical Theory
- 36-758: Advanced Data Analysis II
- 36-750: Statistical Computing (recommended)
- Complete ADA/DAP Project

**Spring Semester**

- ML or CS Elective
- Finalize Thesis Advisor and Topic