Name:
Andrew ID:
Collaborated with:
This lab is to be done in class (completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas, by Thursday 10pm, this week.
This week’s agenda: practicing version control with Git and GitHub
If you do not have a GitHub account, you should sign up for one before proceeding.
If you have not installed and configured Git, you should do that before proceeding.
1a. Show us that you have a GitHub account. Create a repository on GitHub called “36-350”. Then edit the code below so that we see the contents of README.md for that repo. To get the correct URL, do the following: go to your GitHub repo, click on 36-350 and then again on README.md, and click on the “Raw” button. Copy and paste the URL to the raw README.md file into the call to readLines() below.
NOTE: If you encounter 404 errors make sure that your repo is public.
# readLines("Enter the URL to README.md Here")
1b. Show us that you have Git installed on your computer. Create a new project within RStudio that is tied to your “36-350” repo on GitHub. Then create a new R Script (not an R Markdown file) in which you put print("Hello, world!"). Save this file (call it hello_world.R) to your local “36-350” repo. Stage the file, commit it with a suitable commit message, and push it to GitHub. Follow the steps above to find the URL to the raw file for hello_world.R, and copy and paste that URL below in the call to source_url(). If everything works, “Hello, world!” should appear.
#install.packages("devtools") # uncomment this if you need to install devtools
library(devtools)
#source_url("Enter the URL to hello_world.R Here")
In the following questions create a separate file lab_12_simulation.R in the 36-350 repo to implement the following tasks. After completing each task (in order) make a commit of the necessary changes to complete the task. Each commit should be specific and complete. Your commit messages should be informative.
For submission, fill in the readLines calls with the link to the patch file for the corresponding commit. You can find this link by opening the commit on GitHub and appending “.patch” to the end of the hash. Alternatively, you can copy and paste the commit’s hash into “https://github.com/$(user)/$(repo)/commit/$(hash).patch”, where “$(user)” is your username, “$(repo)” is your repo’s name, and “$(hash)” is the commit’s hash.
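The patch link can also be assembled in R. The sketch below is only an illustration: patch_url is a hypothetical helper (not part of the lab), the username and hash are made-up placeholders, and it assumes the commit page URL has the form github.com/$(user)/$(repo)/commit/$(hash).

```r
# Hypothetical helper: build the URL of a commit's patch file on GitHub.
# Assumes the commit page lives at github.com/<user>/<repo>/commit/<hash>.
patch_url <- function(user, repo, hash) {
  sprintf("https://github.com/%s/%s/commit/%s.patch", user, repo, hash)
}

# Placeholder values, for illustration only
url <- patch_url("your-username", "36-350", "0123abc")
print(url)

# Once the values above are real, the patch can be read with:
# readLines(url)
```

The readLines call is left commented out because it only works for a real, public repo and an existing commit.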
Write a function generate_data(n, p) which returns a list with the following elements: covariates, which is an n-by-p matrix of draws from the standard normal distribution, and responses, which is a vector of length n of draws from the standard normal distribution.
# readLines("Enter the URL to relevant commit of lab_12_simulation.R")
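One possible shape for generate_data is sketched below. It is a minimal version under the description above, not the only acceptable answer; the seed and the small values of n and p in the usage lines are chosen only for illustration.

```r
# Sketch: draw an n-by-p covariate matrix and a length-n response vector,
# all entries independent standard normal.
generate_data <- function(n, p) {
  covariates <- matrix(rnorm(n * p), nrow = n, ncol = p)
  responses <- rnorm(n)
  list(covariates = covariates, responses = responses)
}

# Usage with small, illustrative sizes
set.seed(1)
dat <- generate_data(n = 5, p = 3)
dim(dat$covariates)    # number of rows and columns of the covariate matrix
length(dat$responses)  # number of responses
```

Because the responses are drawn independently of the covariates, every true regression coefficient is zero, which is the null hypothesis the later tasks rely on.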
Write a function model_select(covariates, responses, cutoff) which fits the linear regression responses ~ covariates and retains only those covariates whose coefficient p-values are less than or equal to cutoff. Then fit another regression using only the retained covariates and return the p-values from this reduced model. If there are no retained covariates, return an empty vector. HINT: You can use indexing inside of formulas: lm(responses ~ covariates[, c(1, 2)]) will fit a regression with only the first two covariates.
# readLines("Enter the URL to relevant commit of lab_12_simulation.R")
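The two-stage fit described above could be sketched as follows. This is one reading of the task, with assumptions stated plainly: the intercept's p-value is excluded from both the selection step and the returned vector, and the covariate matrix is assumed to have full column rank (so that summary() reports one row per covariate). The test data is generated inline with rnorm so the block runs by itself.

```r
# Sketch of model_select: select covariates on the full fit, then refit.
model_select <- function(covariates, responses, cutoff) {
  full <- lm(responses ~ covariates)
  # Column 4 of the coefficient table holds the p-values;
  # row 1 is the intercept, which is dropped (an assumption of this sketch).
  p_full <- summary(full)$coefficients[-1, 4]
  retained <- unname(which(p_full <= cutoff))
  if (length(retained) == 0) {
    return(numeric(0))
  }
  # drop = FALSE keeps a matrix even when a single column is retained
  reduced <- lm(responses ~ covariates[, retained, drop = FALSE])
  unname(summary(reduced)$coefficients[-1, 4])
}

# Usage on small null data (illustrative sizes)
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)
y <- rnorm(50)
model_select(X, y, cutoff = 0.05)  # p-values of whichever covariates survive
model_select(X, y, cutoff = 1)     # every covariate is retained
model_select(X, y, cutoff = 0)     # nothing is retained: empty vector
```

Returning numeric(0) rather than NULL keeps the return type the same in every case, which makes it easier to combine results across many trials.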
Write a function run_simulation(n_trials, n, p, cutoff) which uses the previous two functions to run n_trials simulations, each passing data from generate_data to model_select, collects the returned p-values, and displays a histogram of the p-values. Under the null hypothesis (that the regression coefficients are zero) these p-values should be uniformly distributed between 0 and 1; does this seem to be the case? Display figures for all combinations of n = c(100, 1000, 10000) and p = c(10, 20, 50), and set cutoff = 0.05.
# readLines("Enter the URL to relevant commit of lab_12_simulation.R")
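A rough sketch of the simulation loop follows. So that the block runs by itself, it repeats compact stand-ins for generate_data and model_select; these are assumptions about how those functions might look (intercept p-value excluded), not required implementations. The histogram settings and the small n_trials in the usage line are likewise illustrative choices.

```r
# Compact stand-ins for the two earlier functions (assumed implementations)
generate_data <- function(n, p) {
  list(covariates = matrix(rnorm(n * p), nrow = n, ncol = p),
       responses = rnorm(n))
}

model_select <- function(covariates, responses, cutoff) {
  p_full <- summary(lm(responses ~ covariates))$coefficients[-1, 4]
  retained <- unname(which(p_full <= cutoff))
  if (length(retained) == 0) return(numeric(0))
  reduced <- lm(responses ~ covariates[, retained, drop = FALSE])
  unname(summary(reduced)$coefficients[-1, 4])
}

# Sketch: pool the reduced-model p-values over n_trials fresh data sets
run_simulation <- function(n_trials, n, p, cutoff) {
  p_values <- as.numeric(unlist(lapply(seq_len(n_trials), function(i) {
    dat <- generate_data(n, p)
    model_select(dat$covariates, dat$responses, cutoff)
  })))
  # hist() needs at least one value, so skip the plot if nothing was retained
  if (length(p_values) > 0) {
    hist(p_values, breaks = 20, xlab = "p-value",
         main = sprintf("n = %d, p = %d", n, p))
  }
  invisible(p_values)
}

# Usage with a small number of trials (illustrative)
set.seed(1)
pv <- run_simulation(n_trials = 20, n = 100, p = 10, cutoff = 0.05)
```

To cover the full grid, the same call can be wrapped in two nested loops over n and p; the largest settings take noticeably longer, so a modest n_trials is a reasonable starting point.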
make_plot(datapath) which reads the data from datapath and makes the plot.
# readLines("Enter the URL to relevant commit of lab_12_simulation.R")
data_split_select(covariates, responses, cutoff) which follows this strategy. Are the p-values correct now?
# readLines("Enter the URL to relevant commit of lab_12_simulation.R")