Introduction to the Course

Cosma Shalizi

1 September 2020, 36-467/667

Data over Space and Time

Data over Space and Time

(From [https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html] on 2020-08-31)

Special Statistical Issues

Course Mechanics

Homework

Every week, Thursdays at 6 pm (Pittsburgh time)

After-Lecture Questions

Online through Canvas, the day after each class, short answers

Office hours

Textbooks

Gidon Eshel, Spatiotemporal Data Analysis

Required, but the full text is available as a PDF through JSTOR

Textbooks

Peter Guttorp, Stochastic Modeling of Scientific Data

Textbooks

Paul Teetor, The R Cookbook

Recommended; consult as needed

Lectures

Assignments/Grading

  1. After-class review questions and exercises (10%)
  2. Weekly homework (90%)

NO exams

After-class review questions and exercises

Homework

R Markdown

Collaboration, Cheating & Plagiarism

Any questions?

What are the big issues?

Problems:

  1. Basic statistical theory is about independent, identically distributed (IID) data.
  2. But we (usually) only see one realization of a whole process.
  3. Every observation is dependent on every other observation.
  4. Basic statistical theory says that \(n=1\) and refuses to draw any inferences.
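To make the list above concrete, here is a minimal simulation sketch. It is not from the course materials, and it uses Python's standard library purely for illustration. It assumes a first-order autoregressive (AR(1)) process as a stand-in for "dependent data"; the function names, the coefficient 0.9, and the sample sizes are all illustrative choices. The point is only that the IID formula for the standard error of a mean, \(1/\sqrt{n}\) for unit-variance data, can be badly wrong when observations depend on each other.

```python
import math
import random
import statistics


def simulate_ar1(n, phi, rng):
    """Simulate n steps of a stationary Gaussian AR(1) series.

    Each value is phi times the previous value plus fresh noise, scaled so
    that every observation has marginal variance 1. phi = 0 gives IID data.
    """
    innovation_sd = math.sqrt(1.0 - phi ** 2)
    x = [rng.gauss(0.0, 1.0)]  # start in the stationary distribution
    for _ in range(n - 1):
        x.append(phi * x[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return x


def sd_of_sample_mean(n, phi, n_reps, seed):
    """Estimate the standard deviation of the sample mean by brute force:
    simulate many independent series and look at the spread of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean(simulate_ar1(n, phi, rng)) for _ in range(n_reps)]
    return statistics.stdev(means)


n = 200
iid_formula = 1.0 / math.sqrt(n)  # what basic theory says for IID data
sd_iid = sd_of_sample_mean(n, phi=0.0, n_reps=1000, seed=1)
sd_dep = sd_of_sample_mean(n, phi=0.9, n_reps=1000, seed=2)

print(f"IID formula:              {iid_formula:.3f}")
print(f"simulated, independent:   {sd_iid:.3f}")
print(f"simulated, dependent:     {sd_dep:.3f}")
```

With independent draws the simulated spread should sit close to the IID formula, while with strong positive dependence it should be several times larger. For an AR(1) process and large \(n\), the variance of the mean is inflated by roughly \((1+\phi)/(1-\phi)\). Note that the simulation cheats in a way real data cannot: it generates many realizations of the process, whereas in practice we usually see just one.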

How are we going to deal with these issues?

Why is this worth knowing?

So what are we going to cover?

What we will not cover

To get started…

Cherry blossoms in Kyoto

Cherries at the Hirano shrine in Kyoto (David Montasco on flickr)

Flowering of cherry trees has been a central part of Japanese high art & culture for well over a millennium

Hanami

Kitao Shigemasa, Sangatsu, Asukayama Hanami = Third Lunar Month, Blossom Viewing at Asuka Hill, c. 1776, via Library of Congress

Notice the date in the title!

This is data!

Cherry blossoms track climate

Snow at the Hirano shrine (yopparainokobito on flickr)

A data set

A data set

Problems

Smoothed cherry blossoms

Both curves come from averaging observed values
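As a rough illustration of what "averaging observed values" can mean, here is a minimal sketch of a window-average smoother. It is not the code behind the slide's figure. The data are made-up random numbers standing in for flowering dates, with some years deliberately missing, and the function name and both window widths are hypothetical choices. Each smoothed value is the plain average of all observations within a fixed number of years of the target year.

```python
import random
import statistics


def window_average(times, values, half_width):
    """Smooth by averaging every observation within half_width of each time.

    Works with irregularly spaced times (e.g. years with no record), because
    the window is defined in time units, not in number of observations.
    """
    smoothed = []
    for t0 in times:
        nearby = [v for t, v in zip(times, values) if abs(t - t0) <= half_width]
        smoothed.append(sum(nearby) / len(nearby))  # never empty: includes t0
    return smoothed


# Synthetic stand-in data (NOT the real record): pure noise around a constant
# level, with about one year in five dropped to mimic gaps in the record.
rng = random.Random(467)
years = [y for y in range(1000, 1200) if rng.random() > 0.2]
days = [105 + rng.gauss(0, 6) for _ in years]

rough = window_average(years, days, half_width=2)    # narrow window: wiggly
smooth = window_average(years, days, half_width=25)  # wide window: flatter

print(f"spread of raw values:     {statistics.pstdev(days):.2f}")
print(f"spread with narrow window: {statistics.pstdev(rough):.2f}")
print(f"spread with wide window:   {statistics.pstdev(smooth):.2f}")
```

The two window widths give two different curves from the same observations: the wide window suppresses more noise but would also blur any real, rapid change. Choosing between them is a matter of judgment that the averaging itself cannot settle.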

More concrete problems

We’re going to need to build some concepts

By Thursday:

Take-aways


  1. Calvino’s Cosmicomics is a precious part of our common cultural heritage.

  2. The Pillow Book of Sei Shonagon is a precious part of our common cultural heritage.

  3. The traditional Japanese calendar system (from 645) didn’t have an accumulating count of years the way we do, but rather reckoned years by “name eras”, so a given year would be called something like “year \(k\) of the reign of Emperor So-and-so.” (Some emperors had more than one name era, and the name of the era was not the emperor’s name but one he chose, but that was the basic idea.) This is as though we called this year 4 of Trump, called 2016 year 8 of Obama, etc. Part of the work of compiling data like this is to keep track of when, in our terms, each name era began. Japan in fact still has name eras for some official purposes (this is year 2 of the Reiwa era), but in 1873, as part of the Meiji Revolution, the government adopted the Gregorian calendar and the common era, the year-numbering scheme formerly known as AD/BC. (“Meiji” is itself an era name.) More-or-less similar schemes, where the count of years resets when the ruler changes, have been very common across the world; unending sequential year numbers seem to have been invented twice, by the Seleucid dynasty in what we now call the Middle East around 300 BCE, with remarkable consequences, and in Central America, in the form of the “long count” calendar used by the Mayans and other civilizations and reckoning days since 11 August 3114 BCE. (The calendar was certainly invented much more recently and we don’t know why that was their zero-day.)
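The bookkeeping described in the last note, keeping track of when each name era began, can be sketched in a few lines. This is only an illustration under simplifying assumptions: the table covers just the five modern eras, the function names are hypothetical, and the conversion works at the level of whole years. It ignores the fact that eras change partway through a year, and that before 1873 the traditional lunisolar year did not line up exactly with the Gregorian one.

```python
# Gregorian year in which each modern era began (year 1 of that era).
ERA_START_YEAR = {
    "Meiji": 1868,
    "Taisho": 1912,
    "Showa": 1926,
    "Heisei": 1989,
    "Reiwa": 2019,
}


def era_to_gregorian(era, year_in_era):
    """Convert 'year k of era E' to a Gregorian year (year-level only)."""
    if year_in_era < 1:
        raise ValueError("era years are counted from 1")
    return ERA_START_YEAR[era] + year_in_era - 1


def gregorian_to_era(year):
    """Convert a Gregorian year to (era, year in era).

    In a transition year this returns the NEW era, although the early part
    of that year still belonged to the old one.
    """
    candidates = [(start, era) for era, start in ERA_START_YEAR.items() if start <= year]
    if not candidates:
        raise ValueError("year is before the first era in the table")
    start, era = max(candidates)  # the most recent era that had begun
    return era, year - start + 1


print(era_to_gregorian("Meiji", 6))
print(gregorian_to_era(2020))
```

Extending this to the historical record would need a much longer table of era start dates, and date-level rather than year-level arithmetic.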