This project will begin on Monday, June 13th, and conclude with a 10-15 minute presentation on Friday, June 24th (either during the morning session from 10:30 AM to 12:00 PM or during the afternoon session from 1:30 to 3:00 PM). The goal of this project is to practice understanding the structure of a dataset and to practice generating and evaluating hypotheses using fundamental EDA and data visualization techniques.
Your team is expected to produce R Markdown slides (an example template will be provided shortly) to accompany your 10-15 minute presentation with the following information:
Explanation of the data structure of the dataset,
Three hypotheses you are interested in exploring,
Three data visualizations exploring the hypotheses, at least two of which must be multivariate. Each visualization must be in a different format from the other two, and you must have at least one categorical and one continuous visualization.
One clustering example,
Conclusions reached for the hypotheses based on your EDA and data visualizations.
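As a rough illustration of what the clustering item could look like, here is a minimal sketch using k-means from base R. The choice of k-means, the value k = 2, and the variable names `heart_attack_cost` and `pneumonia_cost` are assumptions made for this example, and the numbers are simulated rather than taken from any project dataset. Other clustering methods covered in lecture may be a better fit for your data.

```r
set.seed(2022)  # make the simulated data and the random starts reproducible

# Simulated costs (invented numbers, NOT real data): two loose groups of 20.
costs <- data.frame(
  heart_attack_cost = c(rnorm(20, 20000, 1500), rnorm(20, 26000, 1500)),
  pneumonia_cost    = c(rnorm(20, 15000, 1000), rnorm(20, 19000, 1000))
)

# Standardize each column so neither variable dominates the Euclidean distance.
costs_std <- scale(costs)

# k = 2 is an arbitrary choice for this sketch; nstart = 25 reruns k-means
# from 25 random starting points and keeps the best solution.
km <- kmeans(costs_std, centers = 2, nstart = 25)

# Number of observations assigned to each cluster.
table(km$cluster)

# Scatterplot on the original scale, colored by cluster assignment.
plot(costs$heart_attack_cost, costs$pneumonia_cost,
     col = km$cluster, pch = 19,
     xlab = "Heart attack cost (simulated)",
     ylab = "Pneumonia cost (simulated)")
```

In a real analysis, the number of clusters would need to be justified (for example, by comparing several values of k), and rows with missing values would have to be handled before calling `kmeans()`.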
There will be two submission deadlines:
Friday, June 17th @ 5:00 PM EST - Each student will push their individual code for the project thus far to their GitHub accounts for review. We will then provide feedback on the code submitted.
Thursday, June 23rd @ 11:59 PM EST - Slides and full code must be completed and ready for presentation. Send your slides to Prof Yurko’s email (ryurko@andrew.cmu.edu).
All code, visualizations, and presentations must be made in R. Take advantage of examples from lecture and the presentation template, but also feel free to explore material online that may be relevant!
Your team is assigned the Hospital ratings data. This dataset was curated by the CORGIS Dataset Project to: “allow consumers to directly compare across hospitals performance measure information related to heart attack, emergency department care, preventive care, stroke care, and other conditions. The data is part of an Administration-wide effort to increase the availability and accessibility of information on quality, utilization, and costs for effective, informed decision-making.” Original source of data located here.
Each row of the dataset corresponds to a single hospital and has the following columns (with definitions borrowed from the online glossary):
Facility.Name: Name of the hospital
Facility.City: City in which the hospital is located
Facility.State: Two-letter capitalized abbreviation of the state in which the hospital is located (e.g., AZ is Arizona)
Facility.Type: Kind of organization operating the hospital: one of Government, Private, Proprietary, Church, or Unknown
Rating.Overall: Overall rating between 1 and 5 stars, with 5 stars being the highest rating; -1 represents no rating
Rating.Mortality: Above, Same, Below, or Unknown comparison to national hospital mortality
Rating.Safety: Above, Same, Below, or Unknown comparison to national hospital safety
Rating.Readmission: Above, Same, Below, or Unknown comparison to national hospital readmission
Rating.Experience: Above, Same, Below, or Unknown comparison to national hospital patient experience
Rating.Effectiveness: Above, Same, Below, or Unknown comparison to national hospital effectiveness of care
Rating.Timeliness: Above, Same, Below, or Unknown comparison to national hospital timeliness of care
Rating.Imaging: Above, Same, Below, or Unknown comparison to national hospital effective use of imaging
Procedure.Heart Attack.Cost: Average cost of care for heart attacks
Procedure.Heart Attack.Quality: Lower, Average, Worse, or Unknown comparison to national quality of care for heart attacks
Procedure.Heart Attack.Value: Lower, Average, Worse, or Unknown comparison to national cost of care for heart attacks
Procedure.Heart Failure.Cost: Average cost of care for heart failure
Procedure.Heart Failure.Quality: Lower, Average, Worse, or Unknown comparison to national quality of care for heart failure
Procedure.Heart Failure.Value: Lower, Average, Worse, or Unknown comparison to national cost of care for heart failure
Procedure.Pneumonia.Cost: Average cost of care for pneumonia
Procedure.Pneumonia.Quality: Lower, Average, Worse, or Unknown comparison to national quality of care for pneumonia
Procedure.Pneumonia.Value: Lower, Average, Worse, or Unknown comparison to national cost of care for pneumonia
Procedure.Hip Knee.Cost: Average cost of care for hip or knee conditions
Procedure.Hip Knee.Quality: Lower, Average, Worse, or Unknown comparison to national quality of care for hip or knee conditions
Procedure.Hip Knee.Value: Lower, Average, Worse, or Unknown comparison to national cost of care for hip or knee conditions
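Since Rating.Overall uses -1 to mean "no rating" and the comparison columns use Unknown, it may help to recode these as missing values before summarizing or plotting. The sketch below is one possible approach, shown on a tiny made-up data frame (the four rows are invented for illustration and are not from the dataset). Note that R may alter column names containing spaces when the real file is read in, so check the names with `colnames()` first.

```r
# Toy stand-in for the hospital data; these rows are invented for illustration.
hospitals <- data.frame(
  Facility.Name  = c("Hospital A", "Hospital B", "Hospital C", "Hospital D"),
  Facility.Type  = c("Government", "Private", "Church", "Unknown"),
  Rating.Overall = c(3, -1, 5, 2),
  Rating.Safety  = c("Above", "Unknown", "Same", "Below"),
  stringsAsFactors = FALSE
)

# Recode the -1 sentinel as NA so it is not treated as a real star rating.
hospitals$Rating.Overall[hospitals$Rating.Overall == -1] <- NA

# Treat "Unknown" as missing, then store the comparison as a factor.
# The level order here only controls display order in tables and plots.
hospitals$Rating.Safety[hospitals$Rating.Safety == "Unknown"] <- NA
hospitals$Rating.Safety <- factor(hospitals$Rating.Safety,
                                  levels = c("Below", "Same", "Above"))

# Inspect the structure and count missing values per column.
str(hospitals)
colSums(is.na(hospitals))

# Summaries now need na.rm = TRUE to skip the unrated hospitals.
mean(hospitals$Rating.Overall, na.rm = TRUE)
```

Whether to drop the unrated hospitals or keep them as their own category depends on the hypothesis being explored, so this recoding is a starting point rather than a required step.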