Data Explorers

NFL Passing Data Photo by Binyamin Mellish

This dataset contains game-level passing statistics from the 2009-2017 NFL seasons for every passer with at least 10 attempts in a game. It was collected by our own Ron Yurko via nflscrapR.

Broadway Photo by Jimmy Teoh

Data on Broadway shows, grouped over weeklong periods. Only shows that reported capacity were included, so the dataset stretches back to the 1990s. The dataset is made available by the Broadway League.

Bike Sharing Data Photo by Snapwire

Bike sharing systems are a new generation of bike rentals in which the whole process, from membership to rental and return, is automated. Our concern is understanding how these bike sharing systems are used.

Forest Fires Photo by Matt Howard

Forest fires cause major economic and ecological damage and endanger humans and animals. Forest preservation relies heavily on the containment and prevention of wildfires. Some causes are natural (e.g. lightning); others are preventable (e.g. human negligence).

Diamonds Photo by Sina Katirachi

Diamonds, one of the most popular gemstones, are usually formed under high-pressure, high-temperature conditions deep underground. Their physical properties, notably extreme hardness and thermal conductivity, make them very good cutting and polishing tools.

Infection

The Study on the Efficacy of Nosocomial Infection Control (or SENIC) project was an effort to determine whether infection surveillance and control programs were associated with a reduced rate of infections contracted in United States hospitals.

College Rankings Photo by Element5 Digital

U.S. News & World Report College Rankings data for universities from a random sample of states in the U.S. in the mid-1990s.

Fuel Economy Photo by Patrick Hendry

This dataset contains fuel economy data from 1999 and 2008 for 38 popular car models. It is a subset of the fuel economy data that the EPA makes available on fueleconomy.gov and has been made available as part of the RHelp library for ggplot2.

Heart Disease Photo by Robina Weermeijer

Insurance companies set health insurance premiums based on a subscriber’s age, medical history, and lifestyle characteristics (diet, smoker status, etc.). They predict a subscriber’s potential future medical costs and set the premiums accordingly. What characteristics are associated with an increased cost of claims by subscribers?

Food Photo by Dan Gold

The United States Department of Agriculture (USDA) maintains a National Nutrient Database for Standard Reference.

Integrity Photo by Will O

Data from a sample of 206 college students at a large public university on whether they would report cheating.

Grenada Photo by Priscilla Du Preez

Time series indicators for the country Grenada for the years 1960-2018. The data were obtained from the World Bank.

Movies Photo by Ahmet Yalçınkaya

IMDB is a popular website that collects movie data from studios and fans. Information was scraped from the website and made publicly available by Hadley Wickham. We are looking at about 1800 movies that have a known budget, length, and MPAA rating and were rated by at least one user at the time of scraping.

Science Forums Photo by John Schnobrich

In online discussion forums, a wide variety of information is collected about each post. This information can help moderators monitor the site, by helping them predict which posts will receive the most attention, and which may need to be moderated or deleted due to offensive or off-topic comments.

OKCupid Photo by Christiana Rivers

Data from public profiles on www.okcupid.com. The data set includes a sample of 1,500 people within a 25-mile radius of San Francisco who were online in the year before 6/30/2012 and had at least one profile picture.

School Absence

In most countries, primary and secondary education are required for most children. Attendance is taken every day and students with a large number of unexcused absences are not allowed to move to the next grade level or graduate.

Worldbank Photo by Christine Roy

Data on demographic and economic indicators from the World Bank's DataBank service for 211 countries of the world. All values are for the year 2016.

New York City Photo by Luca Bravo

The New York City Housing and Vacancy Survey is conducted every three years in an attempt to accurately understand current housing conditions in New York City. The survey is well-designed and has an admirably high response rate.

Asteroids Photo by Vincentiu Solomon

Asteroid data from the Jet Propulsion Laboratory at the California Institute of Technology.

Mashable Photo by Markus Spiske

You have been given information for a sample of Mashable articles collected over two years to determine how characteristics of published articles are related to their dissemination.

Intro Statistics Survey Photo by Nik MacMillan

Data from (non-random) voluntary, anonymous online surveys of introductory statistics students at the University of Pittsburgh, collected at the beginning of the semesters from 2001 to 2003.

Justice Photo by Bill Oxford

The Civil Justice Survey of State Courts, 2001 is a systematic examination of civil trials, specifically bench and jury trial cases, disposed in State general jurisdiction courts in the nation’s 75 most populous counties. In 2001, the Bureau of Justice Statistics awarded a grant to the National Center for State Courts to gather detailed information on tort, contract, and real property rights trials in 46 jurisdictions chosen to represent the 75 most populous counties in the nation.

Cleveland Clinic Data Photo by Arseny Togulev

The dataset you will be exploring contains 13 physical attributes of patients, as well as one indicator for the presence of heart disease in the patient. It was donated by the Cleveland Clinic in 1988. The goal is to determine whether and with what accuracy you can predict the presence of heart disease based on the given set of attributes.

World Cup Photo by Emilio Garcia

The 2010 Soccer World Cup was held in South Africa, and was ultimately won by Spain. Prior to the World Cup final, between Spain and the Netherlands, the Guardian published data on player performances. The data was collected by Opta, a company that gathers data on a range of sports around the world, and contains information on all 595 players who appeared in games before the final.

Cereal Photo by John Matychuk

The data come from the 1993 ASA Statistical Graphics Exposition and are taken from the mandatory FDA food label. The data have been normalized to a portion of one American cup.

Airlines Photo by Ethan McArthur

The Bureau of Transportation Statistics has tracked every flight for at least the past twenty years and has collected data on every segment of each flight.

State of the Union Addresses

The State of the Union address is an annual speech given by the President of the United States of America to a joint session of the United States Congress. Included are the speeches from 1790 to 2016.