Cosma Shalizi
Data and Code for "Scaling and Hierarchy in Urban Economies"
This page is not a substitute for reading the paper (arXiv:1102.4101).
Data
All the data files are comma-separated. The sources are (1) the Bureau of
Economic Analysis's estimates of GDP by metropolitan area, accessed Dec. 2010
and Feb. 2011; (2) the BEA's estimates of personal income by metropolitan
area, accessed Dec. 2010; and (3) the caption to the figure in M. H. Bornstein
and H. G. Bornstein, "The Pace of Life", Nature 259 (1976): 557–559.
For (1) and (2), I took the BEA's online data files and added comment
characters (#) at the beginning of the lines giving totals for all
metropolitan areas, as well as some metadata at the end of each file. Entries
of (D) in the BEA tables mark values that the BEA does not disclose because
they would reveal too much about specific firms; my analysis treats these as
missing data, and my code reads them in as NA values. For (3), I typed in the
data by hand.
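The preprocessing convention described above (# for comment lines, (D) for undisclosed values) can be illustrated with a minimal Python sketch using only the standard library. In R, the comment.char and na.strings arguments of read.csv have the same effect, though this page does not say exactly how urban-scaling.R reads the files. The function name read_bea_csv is hypothetical, and because the column layout of the BEA files is not described here, it simply returns each row as a list, converting numeric cells to floats and (D) to None. It assumes the metadata at the end of each file is also commented out; otherwise those lines come through as ordinary rows.

```python
import csv


def read_bea_csv(text):
    """Parse BEA-style CSV text (hypothetical helper).

    Lines starting with '#' are skipped, '(D)' entries become None
    (the analogue of R's NA), numeric cells become floats, and any
    other cell, such as an area name, is kept as a string.
    """
    # Drop blank lines and lines commented out with '#'.
    data_lines = [line for line in text.splitlines()
                  if line.strip() and not line.lstrip().startswith("#")]
    rows = []
    for row in csv.reader(data_lines):
        parsed = []
        for cell in row:
            cell = cell.strip()
            if cell == "(D)":
                parsed.append(None)         # not disclosed -> missing value
                continue
            try:
                parsed.append(float(cell))  # numeric cell
            except ValueError:
                parsed.append(cell)         # non-numeric cell, e.g. a name
        rows.append(parsed)
    return rows


# Usage on a small made-up sample (not real BEA data):
sample = '# total for all areas,999\n"Area A, XX",100.5\nArea B,(D)\n'
print(read_bea_csv(sample))
# [['Area A, XX', 100.5], ['Area B', None]]
```

Using the csv module rather than splitting on commas keeps quoted area names that contain commas intact.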
- real-gdp-by-city-2006.csv
- Real (inflation-adjusted) gross domestic product of each metropolitan area
in 2006, in millions of 2001 dollars.
- real-per-capita-gdp-by-city-2006.csv
- Real (inflation-adjusted) gross domestic product per capita for each metropolitan area in 2006, in dollars.
- gdpmetro-financial-activities-2006.csv
- Real GDP derived from financial activities for each city (millions of dollars).
- gdpmetro-information-and-technology-2006.csv
- Real GDP derived from the "information, communication and technology" (ICT)
sector for each city (millions of dollars).
- gdpmetro-management-of-companies-and-enterprises-2006.csv
- Real GDP derived from "management of companies and enterprises" for
each city (millions of dollars).
- gdpmetro-professional-and-technical-services-2006.csv
- Real GDP derived from "professional and technical services" for each city (millions of dollars).
- personal-income-by-city-2006.csv
- Personal income (all sources) for each city. Note: this file lists
metropolitan areas in a different order, and with somewhat different names
and identifying codes, than the preceding files do.
- population-2006.csv
- Populations of cities, in the same order as the personal income file.
- pace-of-life.csv
- Observed time needed to walk a given distance in various villages
and cities, including error bars.
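Because population-2006.csv lists areas in the same order as the personal income file, per-capita personal income can be computed position by position. The Python sketch below assumes both columns have already been read into lists of numbers, with None for missing values; the function name per_capita is hypothetical. The income units of the BEA file are not stated here, so check them before interpreting the ratios. Combining either file with the GDP files would instead need a join on area names or codes, since those files use a different order.

```python
from typing import List, Optional, Sequence


def per_capita(totals: Sequence[Optional[float]],
               populations: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Divide each total by the population at the same position.

    Hypothetical helper: it relies on both sequences listing the same
    areas in the same order. None (missing) in either input, or a zero
    population, gives None in the output.
    """
    if len(totals) != len(populations):
        raise ValueError("inputs must list the same areas in the same order")
    result: List[Optional[float]] = []
    for total, pop in zip(totals, populations):
        if total is None or pop is None or pop == 0:
            result.append(None)        # propagate missingness, like NA in R
        else:
            result.append(total / pop)
    return result


# Made-up numbers for illustration only:
print(per_capita([100.0, None, 50.0], [10.0, 5.0, 25.0]))
# [10.0, None, 2.0]
```

The length check catches the most obvious mismatch, but it cannot detect two files of equal length listed in different orders; that is why the ordering note on the personal income file matters.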
Code
The code for reproducing all figures and analyses is in urban-scaling.R,
which was run under R version 2.12 in December 2010 and February 2011. It
requires the R packages mgcv, VGAM, ddst and sfsmisc, all available on CRAN,
and was run using the versions current in December 2010. Executing the file
produces a series of PostScript images (the figures) and populates the
workspace with the functions and objects used in the analysis. See the
comments in the file for details.