36-467/36-667
13 October 2020 (Lecture 13)
\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}} \newcommand{\Indicator}[1]{\mathbb{I}\left( #1 \right)} \newcommand{\se}[1]{\mathrm{se}\left[ #1 \right]} \newcommand{\CrossEntropy}{\ell} \newcommand{\xmin}{x_{\mathrm{min}}} \]
Figure: Normalized log-likelihoods for 10 different IID samples from the Pareto distribution (\(\theta=1.5\), \(\xmin=1\)), at \(n=10\), \(n=10^3\), and \(n=10^5\)
Figure: Normalized log-likelihoods for the Pareto distribution, showing convergence as \(n\rightarrow\infty\) along a single IID sequence
(some disclaimers apply)
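Figures like these can be reproduced with a short simulation. Here is a minimal sketch in Python (the helper names `rpareto` and `pareto_norm_loglik` are mine, not from the lecture, which presumably used R):

```python
import numpy as np
import matplotlib.pyplot as plt

def rpareto(n, theta, xmin=1.0, rng=None):
    """Draw n IID Pareto(theta, xmin) values by inverse-CDF sampling:
    if U ~ Unif(0,1), then xmin * U**(-1/theta) is Pareto(theta, xmin)."""
    rng = np.random.default_rng() if rng is None else rng
    return xmin * rng.uniform(size=n) ** (-1.0 / theta)

def pareto_norm_loglik(theta, x, xmin=1.0):
    """Normalized (per-observation) log-likelihood of the Pareto density
    f(x) = theta * xmin**theta / x**(theta + 1), for x >= xmin."""
    return np.log(theta) + theta * np.log(xmin) - (theta + 1) * np.mean(np.log(x))

thetas = np.linspace(0.5, 3.0, 200)
rng = np.random.default_rng(36467)
for n in (10, 10**3, 10**5):
    for _ in range(10):  # 10 independent samples per panel
        x = rpareto(n, theta=1.5, rng=rng)
        # each curve peaks near the MLE, theta_hat = 1 / mean(log(x / xmin))
        plt.plot(thetas, [pareto_norm_loglik(t, x) for t in thetas], alpha=0.5)
    plt.xlabel(r"$\theta$")
    plt.ylabel("normalized log-likelihood")
    plt.title(f"n = {n}")
    plt.show()
```

As \(n\) grows, the curves concentrate around the true \(\theta = 1.5\).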
(still in 1D)
So \(\Var{\hat{\theta}_n} \approx \frac{1}{n} \mathbf{i}^{-1}(\theta_0)\) for large \(n\), if the model is right
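As a worked example (mine, not the slide's), take the Pareto model from the figures:
\[\begin{eqnarray} \log f(x;\theta) & = & \log\theta + \theta \log \xmin - (\theta+1)\log x\\ \frac{\partial^2}{\partial \theta^2} \log f(x;\theta) & = & -\frac{1}{\theta^2}\\ \mathbf{i}(\theta) & = & -\Expect{\frac{\partial^2}{\partial \theta^2} \log f(X;\theta)} = \frac{1}{\theta^2} \end{eqnarray}\]
so \(\Var{\hat{\theta}_n} \approx \theta_0^2/n\); with \(\theta_0 = 1.5\) and \(n = 10^3\), the standard error is about \(0.047\).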
For any random variable \(Z\), and any \(\epsilon > 0\),
\[ \Prob{|Z-\Expect{Z}| > \epsilon} \leq \frac{\Var{Z}}{\epsilon^2} \]
Proof: Apply Markov’s inequality to \((Z-\Expect{Z})^2\), which is \(\geq 0\) and has expectation \(\Var{Z}\).
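A quick numerical check (my sketch, not from the lecture), using simulated exponential draws and the sample mean and variance in place of their population values:

```python
import numpy as np

rng = np.random.default_rng(13)
z = rng.exponential(size=10**6)  # any distribution with finite variance will do
for eps in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(z - z.mean()) > eps)  # estimate of P(|Z - E[Z]| > eps)
    chebyshev = z.var() / eps**2                     # the bound Var[Z] / eps^2
    print(f"eps = {eps}: {empirical:.4f} <= {chebyshev:.4f}")
```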
For any non-negative random variable \(Z\), and any \(\epsilon > 0\),
\[ \Prob{Z \geq \epsilon} \leq \frac{\Expect{Z}}{\epsilon} \]
Proof: \[\begin{eqnarray} Z & = & Z\Indicator{Z \geq \epsilon} + Z\Indicator{Z < \epsilon}\\ \Expect{Z} & = & \Expect{Z \Indicator{Z \geq \epsilon}} + \Expect{Z \Indicator{Z < \epsilon}}\\ & \geq & \Expect{Z \Indicator{Z \geq \epsilon}}\\ & \geq & \Expect{\epsilon \Indicator{Z \geq \epsilon}}\\ & = & \epsilon\Expect{\Indicator{Z \geq \epsilon}} = \epsilon \Prob{Z \geq \epsilon} \end{eqnarray}\]
where the last line uses the fact that the expectation of an indicator variable is the probability of the corresponding event
(proof for pmfs is entirely parallel)
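A one-line sanity check (my example): for \(Z \sim \mathrm{Exp}(1)\), so that \(\Expect{Z} = 1\),
\[ \Prob{Z \geq \epsilon} = e^{-\epsilon} \leq \frac{1}{\epsilon} = \frac{\Expect{Z}}{\epsilon} \]
for every \(\epsilon > 0\), since \(\epsilon e^{-\epsilon} \leq e^{-1} < 1\).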