README for public data and reproducibility analysis code pertaining to 
"NIH Peer Review: Criterion Scores Completely Account for Racial Disparities in Overall Impact Scores" 
by Elena A. Erosheva, Sheridan Grant, Mei-Ching Chen, Mark D. Lindner, Richard K. Nakamura, and Carole J. Lee.

This directory contains:
1. README.txt (this file)
2. CSV data file (NIH-public-data_Erosheva-et-al.csv)
3. R Markdown reproducibility file (reproduce_Erosheva-et-al.RMD)
4. HTML output from the R Markdown reproducibility file (reproduce_Erosheva-et-al.html)
5. A folder with figures for viewing the HTML output file in-line.

Below, we provide descriptions of the data file and the reproducibility file contents.

2. CSV data file:

Study data were sampled from a full set of 54,740 R01 applications submitted by black and white PIs and reviewed by NIH’s Center for Scientific Review (CSR) during council years 2014-2016. The available variables include the variables of interest (applicant race, preliminary criterion scores, and preliminary overall impact scores), the structural covariates (PI ID, application ID, reviewer ID, administering institute, IRG, and SRG), the matching variables, and the final overall impact score. The matching variables are gender, ethnicity (Hispanic/Latino or not), career stage, type of academic degree, institution prestige (as reflected by the NIH funding bin), area of science (as reflected by the IRG handling the application), application type (new or renewal), and application status (amended or not). In addition, the file includes a study group ID variable that identifies the Matched and Random subsets used in the study: the Matched subset consists of reviews in the "Matched White" and "Matched Black" study groups, whereas the Random subset consists of reviews in the "Random White", "Matched Black", and "All Black" study groups.

Each row of the data set is a review of an application. The first line contains the variable names, all of which are explained below.

Study group: 
GROUP_ID: "Matched White", "Matched Black", "Random White", or "All Black" 
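
The Matched and Random subsets described above can be recovered by filtering on GROUP_ID. The following is a minimal sketch in Python (standard library only), not the authors' code; the analysis itself is in the R Markdown file. The function names read_reviews and select_subset are hypothetical, and the "utf-8-sig" encoding is an assumption about how the CSV file is stored.

```python
import csv

# Group labels as documented for GROUP_ID in this README.
MATCHED_GROUPS = {"Matched White", "Matched Black"}
RANDOM_GROUPS = {"Random White", "Matched Black", "All Black"}


def read_reviews(path):
    """Read the CSV into a list of dicts keyed by the variable names on the first line.

    The encoding is an assumption; "utf-8-sig" also accepts plain UTF-8 files.
    """
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def select_subset(reviews, groups):
    """Keep only the reviews whose GROUP_ID is one of the given labels."""
    return [row for row in reviews if row["GROUP_ID"] in groups]
```

For example, select_subset(read_reviews("NIH-public-data_Erosheva-et-al.csv"), MATCHED_GROUPS) would return the reviews in the Matched subset. Reviews in the "Matched Black" group belong to both subsets, so the two subsets overlap.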

PI's race (self-identified):
PI_RACE: "White" or "Black"

Structural identifiers (random integers):
PI_ID: PI id
REVIEWER_ID: reviewer id
APPLICATION_ID: application id
IRG: IRG (Integrated Review Group) id
ADMIN_ORG: Administering Organization id
SRG: SRG (Scientific Review Group) id
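
Because each row is a review, the structural identifiers can be used to regroup rows by application. The sketch below (Python, standard library only) assumes the rows have already been read into a list of dicts keyed by variable name, for instance with csv.DictReader. The function names are hypothetical and this is an illustration, not part of the authors' R code.

```python
from collections import defaultdict


def reviews_by_application(reviews):
    """Map each APPLICATION_ID to the list of review rows for that application."""
    grouped = defaultdict(list)
    for row in reviews:
        grouped[row["APPLICATION_ID"]].append(row)
    return dict(grouped)


def reviewers_per_application(reviews):
    """Count the distinct REVIEWER_ID values seen for each application."""
    return {
        app_id: len({row["REVIEWER_ID"] for row in rows})
        for app_id, rows in reviews_by_application(reviews).items()
    }
```

The same pattern applies to PI_ID, IRG, ADMIN_ORG, or SRG by changing the grouping key.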

Preliminary Criterion Scores (1-9 integer scale, 1 best; see main text for description):
SIGNIFICANCE_INIT
INVESTIGATOR_INIT
INNOVATION_INIT
APPROACH_INIT
ENVIRONMENT_INIT

Overall Impact Scores (see main text for description):
IMPACT_INIT: preliminary Overall Impact score (1-9 integer scale, 1 best)
IMPACT_FINAL: final Overall Impact score (1-9 integer scale, 1 best; "ND" refers to "not discussed")
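
Since IMPACT_FINAL mixes integer scores with the code "ND", it cannot be read directly as a numeric column. A minimal sketch of one way to parse score fields in Python follows. The function name parse_score is hypothetical, and treating "ND" as a missing value (None) is an assumption about how one might handle it, not a description of the authors' analysis.

```python
def parse_score(value):
    """Convert a score field to an int in 1..9, or None for "ND" (not discussed)."""
    text = value.strip()
    if text == "ND":
        return None
    score = int(text)  # raises ValueError if the field is not an integer
    if not 1 <= score <= 9:
        raise ValueError(f"score outside the documented 1-9 scale: {value!r}")
    return score
```

Any value other than "ND" or an integer from 1 to 9 raises an error, so unexpected codes surface instead of being silently dropped. The same function can be applied to IMPACT_INIT and to the preliminary criterion scores.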

Other matching variables:
APPLICATION_TYPE: "New" or "Renewal"
PI_GENDER: "Male" or "Female"
PI_ETHNICITY: "Hispanic/Latino" or "Non-Hispanic"
CAREER_STAGE: "ESI" (Early Stage Investigator), "Experienced" (Experienced Investigator), or "Non-ES NI" (Non-Early Stage New Investigator)
DEG_CATEGORY: "PhD", "MD", "MD/PhD", or "Others"
INSTITUTION_BIN: Total FY 2014 NIH funding of the lead PI's institution; 5 bins, with 1 being the most-funded (see Supplement for exact definition)
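
The value lists above can serve as a quick consistency check after loading the file. The sketch below (Python, standard library only) assumes rows are dicts keyed by variable name, as produced by csv.DictReader. The names DOCUMENTED_LEVELS and undocumented_values are hypothetical. INSTITUTION_BIN is left out because this README does not state how the five bins are coded in the file.

```python
# Allowed values copied from the variable descriptions in this README.
DOCUMENTED_LEVELS = {
    "APPLICATION_TYPE": {"New", "Renewal"},
    "PI_GENDER": {"Male", "Female"},
    "PI_ETHNICITY": {"Hispanic/Latino", "Non-Hispanic"},
    "CAREER_STAGE": {"ESI", "Experienced", "Non-ES NI"},
    "DEG_CATEGORY": {"PhD", "MD", "MD/PhD", "Others"},
}


def undocumented_values(reviews, levels=DOCUMENTED_LEVELS):
    """Return {variable: values found in the data but not listed in the README}."""
    problems = {}
    for name, allowed in levels.items():
        unexpected = {row[name] for row in reviews} - allowed
        if unexpected:
            problems[name] = unexpected
    return problems
```

An empty result means every value in the checked columns matches the documented lists; any other result points to values, such as missing-value codes, that this README does not describe.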

3. R Markdown reproducibility file with R code

The R Markdown code file provides the complete R code necessary to reproduce the tables and figures from the Reproducibility section of the Supplement. Knitting the RMD file produces HTML output that can be viewed in any web browser.

4. HTML output from RMD code

This is the output that results from "knitting" the RMD file. It includes the code, the results of running it, and text explanations of what the code produces.