36-720, Statistical Network Models
Mini-semester I, Fall 2016
Mondays and Wednesdays, 3:00--4:20 in Baker Hall 235A
Office hours, Tuesdays 11--12 in Baker Hall 229C
This course is a rapid introduction to the statistical modeling of social,
biological and technological networks. Emphasis will be on statistical
methodology and subject-matter-agnostic models, rather than on the specifics of
different application areas. No prior experience with networks is expected,
but familiarity with statistical modeling is essential.
Pre-requisites
There are no formal pre-requisites, but it is intended for those who already
know some statistics and want to learn about networks. (Reasonable background
would be at the level of Davison's Statistical Models or
Wasserman's All of Statistics; or Casella and
Berger's Statistical Inference, plus some actual experience with
data.) Others are welcome to try, but are warned that we will make little
effort to fill in background statistical knowledge.
Undergraduates wishing to take the course should register
for 36-491.
Auditors are welcome, provided there is space for them in the classroom.
Topics
The class will cover the following topics, in roughly this order: basic graph
theory; data collection and sampling; random graphs; block models and community
discovery; latent space models; "small world" and preferential attachment
models; exponential-family random graph models; visualization; model
validation; dynamic processes on networks. Additional topics or variations
will depend on time and the interests of the class. See below for the precise
list of lecture topics, subject to revision.
Textbooks
There is one required text, Eric
D. Kolaczyk, Statistical
Analysis of Network Data (Springer,
2009; available
electronically through SpringerLink). The lecture notes will not
be an adequate substitute.
There are two recommended books:
Course Mechanics and Grading
Class will meet for lecture twice a week. You are expected to attend every
lecture. Every student is expected to act as the "scribe" for two lectures,
taking notes and producing a (rough) version in LaTeX by the next week. (Most
lectures will have two scribes, who will be each produce rough versions.) I
will randomly select the scribes at the start of each lecture. (If you must
miss a lecture, let me know, so I don't call on you.) My evaluation of the job
you do as scribe will be the participation portion your grade, and count for
10% of your total grade.
There will be three homework assignments, due roughly every other week.
Homework will be 40% of your grade, evenly split across the assignments.
There will also be a final project, which will by default will be a data
analysis of a real data set. If you want to use an outside data set for the
final project, or to tackle a theoretical or methodological question, you will
need
prior permission from the professor; otherwise, a data set and agenda
will be provided to you. The final project will be 50% of your grade.
There will be no curve, and the threshold for an A- will be 90/100.
Computation
Most homework assignments, and the course project, will involve varying
combinations of data analysis and simulation. The use of R for these
assignments is encouraged but not required. (If you use other languages,
you are on your own computationally.)
Resources
This makes no attempt to be comprehensive. Suggestions are welcome.
Schedule
- 29 August, Lecture 1: Introduction to the course
- Basic concepts; elementary graph theory
- Lecture notes (based on scribed notes by Brendan McVeigh)
- Reading: Kolaczyk, chapters 1 and 2
- Optional reading:
- Homework 1: assignment,
ckm_network.dat data file
- 31 August, Lecture 2: Data collection and sampling
- Lecture notes (based on scribed notes by Cristobal De La Maza and Valerie Yuan)
- Reading: Kolaczyk, sections 3.1--3.3, 3.5, chapter 5
- Optional readings:
- Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella, "Reconsidering the Foundations of Network Sampling" [PDF preprint]
- Mark S. Handcock and Krista J. Gile, "Modeling Social Networks from Sampled Data", Annals of Applied Statistics 4 (2010): 5--25, arxiv:1010.0891
- Gueorgi Kossinets, "Effects of Missing Data in Social Networks",
Social Networks 28 (2006):
247--268, arxiv:cond-mat/0306335
- Grant Schoenebeck, "Potential Networks, Contagious Communities, and Understanding Social Network Structure", WWW 2013, arxiv:1304.1845
- Michael P. H. Stumpf and Carsten Wiuf, "Sampling properties
of random graphs: the degree
distribution", Physical
Review E 72 (2005): 036118, cond-mat/0507345
- Michael P. H. Stumpf, Carsten Wiuf and Robert M. May,
"Subnets of scale-free networks are not scale-free: Sampling properties of
networks", PNAS 102
(2005): 4221--4224
- 7 September, Lecture 3: Visualization and descriptive statistics
- Scribed lecture notes: by Momin Malik and Neil Spencer
- Reading: Kolaczyk, section 3.4, chapter 4
- Optional reading:
- 12 September, Lecture 4: Random graphs
- The Erdos-Renyi model and its properties
- Scribed lecture notes: by Ciaran Evans and by Jacqueline Mauro
- Reading: Kolaczyk, sections 6.1--6.2
- Optional readings:
- 14 September, Lecture 5: Block models and stochastic block models
- Structural equivalence; regular equivalence; regular, stochastic
equivalence; latent variables
- Scribed lecture notes: by Zongge Liu and Brendan McVeigh
- Reading: TBA
- Optional readings:
- Stephen E. Fienberg and Stanley Wasserman,
"Categorical Data Analysis of Single Sociometric Relations", pp. 156--192 in
Samuel Leinhardt (ed.), Sociological Methodology 1981 (San Francisco: Jossey-Bass, 1981) [JSTOR]
- Paul W. Holland, Kathryn Blackmond Laskey and Samuel Leinhardt,
"Stochastic blockmodels: First steps",
Social Networks 5 (1983): 109--137
- Stephen E. Fienberg, Michael M. Meyer and Stanley S. Wasserman,
"Statistical Analysis of Multiple Sociometric Relations",
Journal of the
American Statistical Association 80 (1985): 51--67
- Tom A. B. Snijders and Krzysztof Nowicki, "Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure",
Journal of Classification
14 (1997): 75--100
- Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg and Eric P. Xing, "Mixed Membership Stochastic Blockmodels",
Journal of Machine Learning Research 9 (2008): 1981--2014
- Jörg Reichardt and Douglas R. White, "Role models for complex
networks", arxiv:0708.0958
[Discussion]
- Homework 1 due
- Homework 2: assignment,
fj.txt data file
- 19 September, Lecture 6: Community discovery
- Stochastic block models; modularity
- Reading: Lecture canceled due to instructor back injury; see under lecture 7 for slides
- Optional readings:
- 21 September, Lecture 7: Latent space models
- Continuous latent spaces; Euclidean and non-Euclidean models
- Reading: Slides (.Rmd source file)
- Optional readings (including references from the slides):
- Aurelien Decelle, Florent Krzakala, Cristopher Moore and Lenka
Zdeborova
- "Phase transition in the detection of modules in sparse networks",
Physical Review Letters 107 (2011): 065701,
arxiv:1102.1182
- "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications", Physical Review E 84 (2011): 066106, arxiv:1109.3041
- Peter Hoff, "Modeling homophily and stochastic equivalence in symmetric relational data", NIPS 2007
- Peter D. Hoff, Adrian E. Raftery and Mark S. Handcock, "Latent Space Approaches to Social Network Analysis", Journal of the American Statistical Association 97 (2002): 1090--1098 [PDF preprint]
- Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, Marian Boguna, "Hyperbolic Geometry of Complex Networks", Physical Review E 82 (2010): 036106, arxiv:1006.5169
- Andreas Noack, "Modularity clustering is force-directed
layout", Physical Review E 79 (2009): 026102, arxiv:0807.4052
- Homework 2 due
Homework 3 assigned
- 26 September, Lecture 8: Cumulative advantage, preferential attachment,
and vertex-copying models
- Skew distributions from positive feedback
- Scribed lectures notes: by Matthew Babcock and Mo Li
- Reading: Kolaczyk, sections 6.3--6.4
- Optional reading:
- Newman, chapter 14 and section 15.1
- Derek J. de Solla Price, "Networks of Scientific Papers", Science 149 (1965): 510--515
- Price, "A General Theory of Bibliometric and Other Cumulative Advantage Processes", Journal of the American Society for Information Science 27 (1976): 292--306
- William J. Reed and Barry D. Hughes, "From Gene Families and Genera to Incomes and Internet File Sizes: Why Power Laws are so Common in Nature", Physical Review E 66 (2002): 067103
- Herbert A. Simon, "On a Class of Skew Distribution Functions", Biometrika
42 (1955): 425--440
- Réka Albert and Albert-László Barabási", "Statistical Mechanics of Networks", Reviews of Modern Physics 74 (2002): 47--97, arxiv:cond-mat/0106096
- Carsten Wiuf, Markus Brameier, Oskar Hagberg and Michael P. H. Stumpf, "A likelihood approach to analysis of network data", Proceedings of the National Academy of Sciences (USA) 103 (2006): 7566--7570 [Commentary]
- 28 September, Lecture 9: The scale-free network controversy
- Drawing straight lines on log-log plots makes the baby Gauss cry.
- Scribed lecture notes: by Alan Mishler
- Reading: Clauset, Shalizi and Newman, "Power-law Distributions in Empirical Data", SIAM Review 51 (2009): 661--703, arxiv:0706.1062
- Optional reading:
- 3 October, Lecture 10: Exponential-family random graph models I: the good news
- Scribed lecture notes: by Nicolas Kim and Shengming Luo
- Reading: Kolaczyk, section 6.5
- Optional reading:
- Newman, section 15.2
- Mark S. Handcock, David R. Hunter, Carter T. Butts,
Steven M. Goodreau, and Martina Morris (eds.), "Statistical Modeling
of Social Networks with 'statnet'", special volume (24) of the Journal
of Statistical Software (2008)
- Steven M. Goodreau, James A. Kitts and Martina Morris,
"Birds of a Feather, Or Friend of a Friend?: Using Exponential Random Graph Models to Investigate Adolescent Social Networks", Demography 46 (2009): 103--125
- Arun Chandrasekhar and Matthew O. Jackson, "Tractable and Consistent Random Graph Models", arxiv:1210.7375
- Juyong Park and Mark E. J. Newman, "The Statistical Mechanics of Networks", Physical Review E 70 (2004): 066117, arxiv:cond-mat/0405566
- 5 October, Lecture 11: Exponential-family random graph models II: the bad news
- Scribed lecture notes: by Abulhair Saparov
- Reading:
- Mark S. Handcock, "Assessing Degeneracy in Statistical Models of Social Networks", Center for Statistics and the Social Sciences, University of Washington,
Working Paper 39 (2003)
- Alessandro Rinaldo, Stephen E. Fienberg and Yi Zhou, "On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models", Electronic Journal of Statistics 3 (2009): 446--484
- Cosma Rohilla Shalizi and Alessandro
Rinaldo, "Consistency under Sampling of Exponential Random Graph
Models", Annals of
Statistics 41 (2013): 508--535, arxiv:1111.3054
- Optional reading:
Homework 3 due
- Final project begins: assignment
- 10 October, Lecture 12: Network models are statistical models
- Goodness-of-fit, link prediction, model comparison
- Scribed lecture notes: by Lynn Kaack
- Reading: Kolaczyk, section 7.2
- Optional reading:
- 12 October: NO LECTURE
- 17 October, Lecture 13: Dynamic processes on networks I: basic models
- Diffusion; percolation; epidemics; coupled oscillators; regression on neighbors
- Notes for this lecture: Many Cheerful Facts about the Laplacian
- Scribed lecture notes: by Xiaoying Tu
- Optional readings:
- Newman, chapters 16--18
- Mark E. J. Newman, "The spread of epidemic disease on networks", Physical Review E 66 (2002): 016128, arxiv:cond-mat/0205009
- Eben Kenah and James M. Robins, "Second look at the spread of epidemics on networks", Physical Review E 76 (2007): 036113, arxiv:q-bio/0610057
- Stephen Judd, Michael Kearns, and Yevgeniy Vorobeychik, "Behavioral dynamics and influence in networked coloring and consensus", Proceedings of the National Academy of Sciences
107 (2010): 14978--14982
- Seth A. Marvel, Travis Martin, Charles R. Doering, David Lusseau, M. E. J. Newman, "The small-world effect is a modern phenomenon", arxiv:1310.2636
- Laura M. Smith, Kristina Lerman, Cristina Garcia-Cardona, Allon G. Percus and Rumi Ghosh, "Spectral Clustering with Epidemic Diffusion",
Physical Review E 88 (2013): 042813, arxiv:1303.2663
- 19 October, Lecture 14: Dynamic processes on networks II: homophily vs. influence
- Or, "A social network is a machine for creating selection bias."
- Scribed lecture notes: by Shengming Luo and by Jacqueline Mauro
- Reading: Cosma Rohilla Shalizi and Andrew C. Thomas,
"Homophily and Contagion Are Generically Confounded in Observational Social
Network Studies",
Sociological
Methods and Research 40 (2011):
211--239, arxiv:1004.4704
- Optional reading:
- Thomas W. Valente, "Network Models and Methods for Studying the Diffusion of Innovations", pp. 98--116 in Carrington, Scott and Wasserman (eds.), Models and Methods in Social Network Analysis (Cambridge University Press, 2005)
- Nicholas A. Christakis and James H. Fowler, "The Spread of Obesity in a Large Social Network over 32 Years", The New England Journal of Medicine 357 (2007): 370--379
- Russell Lyons, "The Spread of Evidence-Poor Medicine via Flawed
Social-Network
Analysis", Statistics, Politics, and Policy 2
(2011): 1, arxiv:1007.2876
- Greg Ver Steeg and Aram Galstyan, "Statistical Tests for Contagion in Observational Social Network Studies", AISTATS 2013, arxiv:1211.4889
- Cosma Rohilla Shalizi and Edward McFowland III, "Controlling for Latent Homophily in Social Networks through Inferring Latent Locations", arxiv:1607.06565
- 20 October: Final projects due
Policies
Collaboration, Cheating and Plagiarism
This is a graduate-level class, so you are all expected to be willing and able
to follow the standards of academic integrity usual at the university. In
general, you are free to discuss homework with each other, though all
the work you turn in must be your own; you must not copy mathematical
derivations, computer output and input, figures or writing from anyone or
anywhere else, without reporting the source within your work. (This includes
copying, consulting or otherwise using other analyses of our data sets, in
older classes or the professional literature.) Unacknowledged copying or
unauthorized collaboration will lead to severe disciplinary action. Please
read the
CMU Policy
on Academic Integrity, and don't plagiarize.
If you are unsure about what is or is not appropriate, please ask me before
submitting anything; there will be no penalty for asking.
Physically Disabled and Learning Disabled Students
The Office of Equal Opportunity Services provides support services for both
physically disabled and learning disabled students. For individualized
academic adjustment based on a documented disability, contact Equal Opportunity
Services at eos [at] andrew.cmu.edu or (412) 268-2012.
Take care of yourself
Do your best to maintain a healthy lifestyle this semester by eating well,
exercising, avoiding excess drugs and alcohol, getting enough sleep and taking
some time to relax. This will help you achieve your goals and cope with
stress.
All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.
If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:
CaPS: 412-268-2922
Re:solve Crisis Network: 888-796-8226
If the situation is life threatening, call the police:
On campus: CMU Police: 412-268-2323
Off campus: 911
If you have questions about this or your coursework, please let me know.