36-402, Section A
7 February 2019
\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\StdErr}[1]{\mathrm{se}\left[ #1 \right]} \newcommand{\EstStdErr}[1]{\widehat{\mathrm{se}}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Var}[1]{\mathbb{V}\left[ #1 \right]} \]
stocks <- read.csv("http://www.stat.cmu.edu/~cshalizi/uADA/19/hw/03/stock_history.csv")
stocks$MAPE <- with(stocks, Price/Earnings_10MA_back)
returns.on.mape.transforms <- lm(Return_10_fwd ~ I(1/MAPE) + MAPE + I(MAPE^2), data=stocks)
summary(returns.on.mape.transforms)
##
## Call:
## lm(formula = Return_10_fwd ~ I(1/MAPE) + MAPE + I(MAPE^2), data = stocks)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.110743 -0.029043 0.002934 0.028354 0.099453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.550e-02 2.612e-02 0.976 0.329
## I(1/MAPE) 7.356e-01 1.268e-01 5.801 8.07e-09 ***
## MAPE -2.194e-04 1.559e-03 -0.141 0.888
## I(MAPE^2) -3.578e-05 2.679e-05 -1.336 0.182
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0421 on 1480 degrees of freedom
## (240 observations deleted due to missingness)
## Multiple R-squared: 0.358, Adjusted R-squared: 0.3567
## F-statistic: 275.1 on 3 and 1480 DF, p-value: < 2.2e-16
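As a sanity check on this output, the overall F statistic can be reconstructed from \(R^2\) alone; a sketch, plugging in the values reported in the summary above:

```r
# F = (R^2 / q) / ((1 - R^2) / df), with q = 3 slopes and
# df = 1480 residual degrees of freedom, both from the summary above
R2 <- 0.358; q <- 3; df <- 1480
F.stat <- (R2/q) / ((1 - R2)/df)
F.stat  # close to the 275.1 reported by summary()
pf(F.stat, q, df, lower.tail=FALSE)  # the p-value; essentially zero
```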
\[
R_i = \beta_0 + \beta_1\frac{1}{M_i} + \beta_2 M_i + \beta_3 M_i^2 + \epsilon_i, ~ \Expect{\epsilon|M} = 0
\]
This is the model lm estimates here; lm also assumes the \(\epsilon_i\) are IID Gaussian, \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\).
x <- with(na.omit(stocks), cbind(1/MAPE, MAPE, MAPE^2))
colnames(x) <- c("1/MAPE", "MAPE", "MAPE^2")
var(x)
## 1/MAPE MAPE MAPE^2
## 1/MAPE 0.0009283645 -0.1662826 -5.736245
## MAPE -0.1662826454 42.2134165 1736.549657
## MAPE^2 -5.7362445792 1736.5496566 77562.536040
cor(x)
## 1/MAPE MAPE MAPE^2
## 1/MAPE 1.0000000 -0.8399673 -0.6759933
## MAPE -0.8399673 1.0000000 0.9597010
## MAPE^2 -0.6759933 0.9597010 1.0000000
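The three transforms of MAPE are highly correlated with one another. One way to quantify how close to collinear they are is the ratio of the largest to smallest eigenvalue of their correlation matrix; a sketch, re-entering the matrix printed above by hand so it runs without the stock data:

```r
# Correlation matrix of (1/MAPE, MAPE, MAPE^2), copied from the output above
R <- matrix(c( 1.0000000, -0.8399673, -0.6759933,
              -0.8399673,  1.0000000,  0.9597010,
              -0.6759933,  0.9597010,  1.0000000), nrow=3, byrow=TRUE)
# Large eigenvalue ratios signal near-collinearity among the predictors,
# which inflates the standard errors of the individual coefficients
ev <- eigen(R, symmetric=TRUE)$values
max(ev)/min(ev)  # in the hundreds here
```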
n <- 1e4; p <- 40
predictors <- matrix(rnorm(n*p), nrow=n, ncol=p); response <- rnorm(n) # no relationship at all
summary(lm(response ~ predictors))
##
## Call:
## lm(formula = response ~ predictors)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3460 -0.6627 0.0075 0.6704 3.7687
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0041641 0.0100825 -0.413 0.6796
## predictors1 0.0121260 0.0100739 1.204 0.2287
## predictors2 0.0069030 0.0100144 0.689 0.4906
## predictors3 0.0086452 0.0101618 0.851 0.3949
## predictors4 -0.0021411 0.0100596 -0.213 0.8315
## predictors5 0.0007777 0.0099655 0.078 0.9378
## predictors6 0.0081664 0.0100775 0.810 0.4177
## predictors7 -0.0039913 0.0100751 -0.396 0.6920
## predictors8 0.0125959 0.0100846 1.249 0.2117
## predictors9 -0.0012746 0.0100264 -0.127 0.8988
## predictors10 0.0207467 0.0100252 2.069 0.0385 *
## predictors11 0.0078052 0.0099835 0.782 0.4343
## predictors12 -0.0134649 0.0100076 -1.345 0.1785
## predictors13 0.0054900 0.0101045 0.543 0.5869
## predictors14 0.0034054 0.0100011 0.341 0.7335
## predictors15 -0.0043387 0.0101054 -0.429 0.6677
## predictors16 -0.0134847 0.0101699 -1.326 0.1849
## predictors17 0.0172360 0.0100663 1.712 0.0869 .
## predictors18 0.0097484 0.0100367 0.971 0.3314
## predictors19 0.0045021 0.0099840 0.451 0.6520
## predictors20 0.0073811 0.0100638 0.733 0.4633
## predictors21 0.0183324 0.0099965 1.834 0.0667 .
## predictors22 0.0101453 0.0101195 1.003 0.3161
## predictors23 -0.0072479 0.0100816 -0.719 0.4722
## predictors24 0.0010313 0.0101461 0.102 0.9190
## predictors25 -0.0064636 0.0100027 -0.646 0.5182
## predictors26 -0.0123409 0.0101139 -1.220 0.2224
## predictors27 0.0129432 0.0100371 1.290 0.1972
## predictors28 -0.0081849 0.0099898 -0.819 0.4126
## predictors29 0.0040183 0.0101693 0.395 0.6927
## predictors30 0.0076783 0.0100893 0.761 0.4467
## predictors31 0.0024647 0.0099502 0.248 0.8044
## predictors32 0.0058126 0.0099462 0.584 0.5590
## predictors33 0.0063024 0.0100313 0.628 0.5298
## predictors34 0.0079884 0.0099747 0.801 0.4232
## predictors35 -0.0046969 0.0100587 -0.467 0.6405
## predictors36 0.0212055 0.0100044 2.120 0.0341 *
## predictors37 -0.0138073 0.0099478 -1.388 0.1652
## predictors38 -0.0028815 0.0100857 -0.286 0.7751
## predictors39 -0.0201438 0.0101221 -1.990 0.0466 *
## predictors40 0.0173406 0.0101536 1.708 0.0877 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.006 on 9959 degrees of freedom
## Multiple R-squared: 0.0043, Adjusted R-squared: 0.000301
## F-statistic: 1.075 on 40 and 9959 DF, p-value: 0.344
Since we generated the response with rnorm(), we could do this simulation over and over, and we'd get \(P\)-values that match lm's calculations.
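Here is a sketch of that kind of simulation: regress pure noise on useless predictors repeatedly, and count how many slopes come out "significant" at the 5% level. With 40 truly-zero coefficients we should average about \(40 \times 0.05 = 2\) false positives per run (the seed, and the smaller sample size used for speed, are arbitrary choices, not from the notes):

```r
set.seed(36402)  # arbitrary seed, for reproducibility
n <- 1e3; p <- 40  # smaller n than above, so the replications run quickly
count.significant <- function() {
  x <- matrix(rnorm(n*p), nrow=n)  # useless predictors
  y <- rnorm(n)                    # response: pure noise
  pvals <- summary(lm(y ~ x))$coefficients[-1, 4]  # slope p-values only
  sum(pvals < 0.05)
}
counts <- replicate(20, count.significant())
mean(counts)  # should be near 2
```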