Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 11 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 11 questions from other labs. Your homework writeup must start as this one: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 11:59pm on Tuesday November 22. This document contains 15 of the 30 total points for Homework 11.

Practice with SQL computations

Hw11 Bonus. Calculate each player’s hitting percentage, which is their number of hits (H) divided by number of at bats (AB), but only over seasons in which they had at least 100 at bats. Display the players with the top 10 highest average hitting percentages, along with their average hitting percentages.

Practice with SQL join operations

Hw11 Q3 (2 points). Using the “Fielding” table, list the 10 worst (highest) number of errors (E) committed by a player in a season, only considering the year 1990 and later. In addition to the number of errors, list the year and player ID for each record.

Hw11 Q4 (4 points). By appropriately merging the “Fielding” and “Salaries” tables, list the salaries for each record that you extracted in the last question. Then, answer the following question: what was the highest salary paid to a player who made at least 30 errors in a season, after 1990?

Back to payrolls

Hw11 Q5 (9 points). In Hw11 Q2, we investigated the payrolls of teams in 2010. Here we pick up on the same topic. Use a SQL query to compute the total payroll for each team, added up over all years between 1985 and 2010. Your output should be a data frame of dimension 34 x 2, where the columns display the team ID, and total payroll. Check that your data frame has the proper dimensions, and display its first 10 rows. What team has the highest total payroll?

Then use a SQL query to compute the payroll for each team, but now separately for each year in between 1985 and 2010. (Hint: GROUP BY can take two arguments, separated by a comma.) Your output should be a data frame of dimension 738 x 3. The 3 columns should be display the team ID, year, and payroll. Check that your data frame has the proper dimensions, and display its last 20 rows.

Finally, plot the Pittsburgh Pirates’ payroll over time, i.e., over the years 1985 to 2010, with appropriately labeled axes and an appropriate title. What is the trend that you see?

Hw11 Bonus. Pick a few other baseball teams (other than the Pittsburgh Pirates) and plot their payrolls over time, using the data frame you created in the last question. Make sure the axes and titles are set appropriately. What is the general trend that you tsee?

To make these plots more sensible, we need to adjust for inflation. The following reads in consumer price index or CPI, from the years 1985 through 2011. This has been calculated from FRED (the Federal Reserve Economic Data service), using the package fImport.

cpi = as.numeric(read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/cpi.txt")[,1])

The vector cpi is normalized so that \(1=\$1\) in 2011. So an expression like y = x/cpi[1990-1985+1] will convert x = 1990 dollars into y = 2011 dollars.

Calculate the inflation-adjusted payroll of each baseball team over time, using the data frame you created in Hw11 Q5. Then reproduce the plots you made earlier, of the Pittsburgh Pirates’ payroll and other teams’ payrolls over time. How do these inflation-adjusted versions compare to the originals?