Cosma Shalizi

Data and Code for "Scaling and Hierarchy in Urban Economies"

This is not a substitute for reading the paper (arxiv:1102.4101).

Data

All the data files are comma-separated. The sources are (1) the Bureau of Economic Analysis's estimates of GDP by metropolitan area, accessed Dec. 2010 and Feb. 2011; (2) the BEA's estimates of personal income by metropolitan area, accessed Dec. 2010; and (3) the caption to the figure in M. H. Bornstein and H. G. Bornstein, "The Pace of Life", Nature 259 (1976): 557--559.

In the case of (1) and (2), I took the BEA's online data files, and added comment characters (#) at the beginning of the lines giving totals for all metropolitan areas, and some meta-data at the end of each file. Entries of (D) in the BEA tables indicate values which are not disclosed by the BEA because they would reveal too much about specific firms; my analysis treats these as missing data, and my code reads them in as NA values. In the case of (3), I typed in the data by hand.
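
As a minimal sketch of how these conventions can be handled in R (not necessarily the exact call used in urban-scaling.R), the following reads a small made-up excerpt written in the same style. The column names, area names and values are all invented for illustration; the real files should be inspected for their actual layout.

```r
# Toy excerpt mimicking the conventions described above. Column names,
# area names and numbers are invented, not taken from the BEA files.
toy_csv <- c(
  "Code,Area,GDP",
  "#0,All metropolitan areas,100",
  "1,\"Alphaville, XX\",40",
  "2,\"Betatown, YY\",(D)",
  "# hypothetical meta-data line"
)

# comment.char = "#" skips every line that starts with a comment character;
# na.strings = "(D)" turns undisclosed entries into NA.
toy <- read.csv(text = toy_csv, comment.char = "#", na.strings = "(D)",
                stringsAsFactors = FALSE)

print(toy)
print(sum(is.na(toy$GDP)))  # number of undisclosed values
```

Note that read.csv does not treat # as a comment character by default, so comment.char has to be set explicitly.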

real-gdp-by-city-2006.csv
Real (i.e., inflation-adjusted, in 2001 dollars) gross domestic product, in millions of dollars, for metropolitan areas in 2006.
real-per-capita-gdp-by-city-2006.csv
Per-capita values of gross domestic product, in dollars, inflation-adjusted.
gdpmetro-financial-activities-2006.csv
Real GDP derived from financial activities for each city (millions of dollars).
gdpmetro-information-and-technology-2006.csv
Real GDP derived from the "information, communication and technology" (ICT) sector for each city (millions of dollars).
gdpmetro-management-of-companies-and-enterprises-2006.csv
Real GDP derived from "management of companies and enterprises" for each city (millions of dollars).
gdpmetro-professional-and-technical-services-2006.csv
Real GDP derived from "professional and technical services" for each city (millions of dollars).
personal-income-by-city-2006.csv
Personal income (all sources) for each city. Note: metropolitan areas are given in a different order, and with somewhat different names and identifying codes, in this file than in the preceding ones.
population-2006.csv
Populations of cities, in the same order as the personal income file.
pace-of-life.csv
Observed time needed to walk a given distance in various villages and cities, including error bars.
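
Because the population file shares its ordering with the personal income file, those two can be combined row by row, whereas the GDP files have to be matched on a key. A rough sketch with toy data frames follows; every column name and value here is invented, and the real files may need their area names or codes reconciled before any such join works.

```r
# Toy stand-ins for the personal income and population files; all column
# names and values are invented for illustration.
income <- data.frame(Area = c("Alphaville", "Betatown", "Gammaburg"),
                     Income = c(500, 1200, 90))
population <- data.frame(Area = c("Alphaville", "Betatown", "Gammaburg"),
                         Population = c(10, 30, 3))

# The two files share an ordering, so row-wise division is valid, but it is
# cheap to verify that before relying on it.
stopifnot(identical(income$Area, population$Area))
income$PerCapita <- income$Income / population$Population

# A toy GDP table in a different row order: join on a key, not by position.
# Here the names happen to match exactly, which the real files do not promise.
gdp <- data.frame(Area = c("Gammaburg", "Alphaville", "Betatown"),
                  GDP = c(120, 700, 1500))
combined <- merge(income, gdp, by = "Area")
print(combined)
```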

Code

The code for reproducing all figures and analyses is contained in urban-scaling.R, which was run under R v. 2.12 in December 2010 and February 2011. It requires the R packages mgcv, VGAM, ddst and sfsmisc, all on CRAN, and was run using the versions current in December 2010. Executing the file will produce a series of PostScript images (the figures), and populate the workspace with the functions and objects used in the analysis. See the comments in the file for details.
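
One possible way to check the prerequisites before running the script is sketched below. It assumes that urban-scaling.R and the data files sit in the current working directory, which is a guess about the setup and not something the script is known to require. Package versions installed today may also behave differently from those of December 2010.

```r
# Packages named above as required by urban-scaling.R.
required_pkgs <- c("mgcv", "VGAM", "ddst", "sfsmisc")

# requireNamespace() returns FALSE, instead of stopping, when a package is absent.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else if (file.exists("urban-scaling.R")) {
  # Assumption: the script and the CSV files are in the working directory.
  source("urban-scaling.R")
} else {
  message("urban-scaling.R not found in ", getwd())
}
```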