Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 11 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 11 questions from other labs. Your homework writeup must start as this one: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 11:59pm on Tuesday November 22. This document contains 15 of the 30 total points for Homework 11.

Practice with SQL data extraction

dbGetQuery(con, paste("SELECT playerID, yearID, AB, H, HR",
                      "FROM Batting",
                      "ORDER BY yearID",
                      "LIMIT 10"))

dbGetQuery(con, paste("SELECT playerID, yearID, AB, H, HR",
                      "FROM Batting",
                      "ORDER BY HR DESC",
                      "LIMIT 10"))

dbGetQuery(con, paste("SELECT playerID, yearID, AB, H, HR",
                      "FROM Batting",
                      "WHERE HR > 55",
                      "ORDER BY HR DESC"))

dbGetQuery(con, paste("SELECT playerID, yearID, AB, H, HR",
                      "FROM Batting",
                      "WHERE yearID >= 1990 AND yearID <= 2000",
                      "ORDER BY HR DESC",
                      "LIMIT 10"))

Practice with SQL computations

dbGetQuery(con, paste("SELECT AVG(HR)",
                      "FROM Batting"))

dbGetQuery(con, paste("SELECT SUM(HR)",
                      "FROM Batting"))

dbGetQuery(con, paste("SELECT playerID, yearID, teamID, MAX(HR)",
                      "FROM Batting"))

dbGetQuery(con, paste("SELECT AVG(HR)",
                      "FROM Batting",
                      "WHERE yearID >= 1990"))

Hw11 Q1 (7 points). Remove eval=FALSE from the preamble to the next R code chunk below. Then, after each SQL query, explain in words what is being extracted, and write one line of R code (sometimes you might need two lines) in which you call one of the d*ply() functions in order to do the equivalent on batting. (Note: the SQL query here uses GROUP BY, which is explained in Wednesday’s mini-lecture “SQL: Queries”, so you may want to work on this homework problem after Wednesday’s class, or read ahead.)

dbGetQuery(con, paste("SELECT teamID, AVG(HR)",
                      "FROM Batting",
                      "WHERE yearID >= 1990",
                      "GROUP BY teamID",
                      "LIMIT 5"))

dbGetQuery(con, paste("SELECT teamID, AVG(HR)",
                      "FROM Batting",
                      "WHERE yearID < 1960",
                      "GROUP BY teamID",
                      "ORDER BY AVG(HR) DESC",
                      "LIMIT 5"))

dbGetQuery(con, paste("SELECT teamID, yearID, AVG(HR)",
                      "FROM Batting",
                      "WHERE yearID == 1991 OR yearID == 1992",
                      "GROUP BY teamID, yearID",
                      "ORDER BY AVG(HR) DESC",
                      "LIMIT 15"))

Hw11 Q2 (8 points). Using dbReadTable(), grab the table named “Salaries” and save it as a data frame in your R session, called salaries. Check that salaries has dimension 21464 x 5, and display its first 5 rows.

Using the salaries data frame and ddply(), compute the payroll (total of salaries) for each team in the year 2010. Display the first 5 rows of the output. Then list the 3 teams with the highest payrolls, and the team with the lowest payroll (ouch!).

Recompute the payroll for each team in 2010, but now with dbGetQuery() and an appropriate SQL query. In particular, the output of dbGetQuery() should be a data frame with two columns, the first giving the team names, and the second the payrolls, just like your output from dadly(). (Hint: your SQL query here will have to use GROUP BY, which is explained in Wednesday’s mini-lecture “SQL: Queries”, so you may want to work on this homework problem after Wednesday’s class, or read ahead.) Display the first 5 rows of your data frame, and verify that the salaries you computed here are the same as those from ddply().