36-720, Statistical Network Models

Mini-semester I, Fall 2016

Cosma Shalizi

Mondays and Wednesdays, 3:00--4:20 in Baker Hall 235A
Office hours, Tuesdays 11--12 in Baker Hall 229C

This course is a rapid introduction to the statistical modeling of social, biological and technological networks. Emphasis will be on statistical methodology and subject-matter-agnostic models, rather than on the specifics of different application areas. No prior experience with networks is expected, but familiarity with statistical modeling is essential.

Pre-requisites

There are no formal pre-requisites, but it is intended for those who already know some statistics and want to learn about networks. (Reasonable background would be at the level of Davison's Statistical Models or Wasserman's All of Statistics; or Casella and Berger's Statistical Inference, plus some actual experience with data.) Others are welcome to try, but are warned that we will make little effort to fill in background statistical knowledge.

Undergraduates wishing to take the course should register for 36-491.

Auditors are welcome, provided there is space for them in the classroom.

Topics

The class will cover the following topics, in roughly this order: basic graph theory; data collection and sampling; random graphs; block models and community discovery; latent space models; "small world" and preferential attachment models; exponential-family random graph models; visualization; model validation; dynamic processes on networks. Additional topics or variations will depend on time and the interests of the class. See below for the precise list of lecture topics, subject to revision.

Textbooks

There is one required text, Eric D. Kolaczyk, Statistical Analysis of Network Data (Springer, 2009; available electronically through SpringerLink). The lecture notes will not be an adequate substitute.

There are two recommended books:

Course Mechanics and Grading

Class will meet for lecture twice a week. You are expected to attend every lecture. Every student is expected to act as the "scribe" for two lectures, taking notes and producing a (rough) version in LaTeX by the next week. (Most lectures will have two scribes, who will be each produce rough versions.) I will randomly select the scribes at the start of each lecture. (If you must miss a lecture, let me know, so I don't call on you.) My evaluation of the job you do as scribe will be the participation portion your grade, and count for 10% of your total grade.

There will be three homework assignments, due roughly every other week. Homework will be 40% of your grade, evenly split across the assignments.

There will also be a final project, which will by default will be a data analysis of a real data set. If you want to use an outside data set for the final project, or to tackle a theoretical or methodological question, you will need prior permission from the professor; otherwise, a data set and agenda will be provided to you. The final project will be 50% of your grade.

There will be no curve, and the threshold for an A- will be 90/100.

Computation

Most homework assignments, and the course project, will involve varying combinations of data analysis and simulation. The use of R for these assignments is encouraged but not required. (If you use other languages, you are on your own computationally.)

Resources

This makes no attempt to be comprehensive. Suggestions are welcome.

Schedule

29 August, Lecture 1: Introduction to the course
Basic concepts; elementary graph theory
Lecture notes (based on scribed notes by Brendan McVeigh)
Reading: Kolaczyk, chapters 1 and 2
Optional reading:
Homework 1: assignment, ckm_network.dat data file
31 August, Lecture 2: Data collection and sampling
Lecture notes (based on scribed notes by Cristobal De La Maza and Valerie Yuan)
Reading: Kolaczyk, sections 3.1--3.3, 3.5, chapter 5
Optional readings:
7 September, Lecture 3: Visualization and descriptive statistics
Scribed lecture notes: by Momin Malik and Neil Spencer
Reading: Kolaczyk, section 3.4, chapter 4
Optional reading:
12 September, Lecture 4: Random graphs
The Erdos-Renyi model and its properties
Scribed lecture notes: by Ciaran Evans and by Jacqueline Mauro
Reading: Kolaczyk, sections 6.1--6.2
Optional readings:
14 September, Lecture 5: Block models and stochastic block models
Structural equivalence; regular equivalence; regular, stochastic equivalence; latent variables
Scribed lecture notes: by Zongge Liu and Brendan McVeigh
Reading: TBA
Optional readings:
Homework 1 due
Homework 2: assignment, fj.txt data file
19 September, Lecture 6: Community discovery
Stochastic block models; modularity
Reading: Lecture canceled due to instructor back injury; see under lecture 7 for slides
Optional readings:
21 September, Lecture 7: Latent space models
Continuous latent spaces; Euclidean and non-Euclidean models
Reading: Slides (.Rmd source file)
Optional readings (including references from the slides):
Homework 2 due
Homework 3 assigned
26 September, Lecture 8: Cumulative advantage, preferential attachment, and vertex-copying models
Skew distributions from positive feedback
Scribed lectures notes: by Matthew Babcock and Mo Li
Reading: Kolaczyk, sections 6.3--6.4
Optional reading:
28 September, Lecture 9: The scale-free network controversy
Drawing straight lines on log-log plots makes the baby Gauss cry.
Scribed lecture notes: by Alan Mishler
Reading: Clauset, Shalizi and Newman, "Power-law Distributions in Empirical Data", SIAM Review 51 (2009): 661--703, arxiv:0706.1062
Optional reading:
3 October, Lecture 10: Exponential-family random graph models I: the good news
Scribed lecture notes: by Nicolas Kim and Shengming Luo
Reading: Kolaczyk, section 6.5
Optional reading:
5 October, Lecture 11: Exponential-family random graph models II: the bad news
Scribed lecture notes: by Abulhair Saparov
Reading:
Optional reading:
Homework 3 due
Final project begins: assignment
10 October, Lecture 12: Network models are statistical models
Goodness-of-fit, link prediction, model comparison
Scribed lecture notes: by Lynn Kaack
Reading: Kolaczyk, section 7.2
Optional reading:
12 October: NO LECTURE
17 October, Lecture 13: Dynamic processes on networks I: basic models
Diffusion; percolation; epidemics; coupled oscillators; regression on neighbors
Notes for this lecture: Many Cheerful Facts about the Laplacian
Scribed lecture notes: by Xiaoying Tu
Optional readings:
19 October, Lecture 14: Dynamic processes on networks II: homophily vs. influence
Or, "A social network is a machine for creating selection bias."
Scribed lecture notes: by Shengming Luo and by Jacqueline Mauro
Reading: Cosma Rohilla Shalizi and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", Sociological Methods and Research 40 (2011): 211--239, arxiv:1004.4704
Optional reading:
20 October: Final projects due

Policies

Collaboration, Cheating and Plagiarism

This is a graduate-level class, so you are all expected to be willing and able to follow the standards of academic integrity usual at the university. In general, you are free to discuss homework with each other, though all the work you turn in must be your own; you must not copy mathematical derivations, computer output and input, figures or writing from anyone or anywhere else, without reporting the source within your work. (This includes copying, consulting or otherwise using other analyses of our data sets, in older classes or the professional literature.) Unacknowledged copying or unauthorized collaboration will lead to severe disciplinary action. Please read the CMU Policy on Academic Integrity, and don't plagiarize.

If you are unsure about what is or is not appropriate, please ask me before submitting anything; there will be no penalty for asking.

Physically Disabled and Learning Disabled Students

The Office of Equal Opportunity Services provides support services for both physically disabled and learning disabled students. For individualized academic adjustment based on a documented disability, contact Equal Opportunity Services at eos [at] andrew.cmu.edu or (412) 268-2012.

Take care of yourself

Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding excess drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:
CaPS: 412-268-2922
Re:solve Crisis Network: 888-796-8226
If the situation is life threatening, call the police:
On campus: CMU Police: 412-268-2323
Off campus: 911

If you have questions about this or your coursework, please let me know.