\[ \newcommand{\Prob}[1]{\mathrm{Pr}\left( #1 \right)} \]
This lecture and the next are going to cover common sources of technical failure in data mining projects — the kinds of issues that lead to them simply not working. (Whether they would be worth doing even if they did work is another story, for the last lecture.) Today we’ll look at three sources of technical failure that are pretty amenable to mathematical treatment.
Next time we’ll cover issues of measurement, model design, and interpretation. They’re less mathematical but actually more fundamental.
One recurrent source of trouble is dataset shift, where the distribution generating the data changes between the time we train a model and the time we use it. It comes in several standard varieties:
“Covariate shift” = \(\Prob{Y|X}\) stays the same but \(\Prob{X}\) changes
“Prior probability shift” or “class balance shift” = \(\Prob{X|Y}\) stays the same but \(\Prob{Y}\) changes
“Concept drift”[1] = \(\Prob{X}\) stays the same, but \(\Prob{Y|X}\) changes, or, similarly, \(\Prob{Y}\) stays the same but \(\Prob{X|Y}\) changes. (All three varieties are illustrated in the simulation sketch below.)
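To make these definitions concrete, here is a minimal simulation sketch in Python, assuming only numpy. The particular distributions, the cubic regression fit, and the threshold classifier are all hypothetical choices for illustration, not part of the definitions themselves. A model fit under the original distribution keeps working on i.i.d. test data; it fails under covariate shift because it has to extrapolate, and under concept drift because the regression function itself has moved; and a classification threshold tuned to the old class balance becomes suboptimal under prior probability shift.

```python
# A minimal simulation of the three kinds of dataset shift.
# All distributions and models below are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# --- Covariate shift and concept drift, in a regression setting ---

def regression_data(n, x_mean=0.0, concept=np.sin):
    """Draw (X, Y) with X ~ N(x_mean, 1) and Y = concept(X) + noise."""
    x = rng.normal(x_mean, 1.0, size=n)
    y = concept(x) + rng.normal(0.0, 0.1, size=n)
    return x, y

x_tr, y_tr = regression_data(1000)
coef = np.polyfit(x_tr, y_tr, deg=3)  # cubic fit: fine near the training mass

def mse(x, y):
    return np.mean((np.polyval(coef, x) - y) ** 2)

# Same distributions as at training time: the model keeps working.
print("i.i.d. test MSE:    ", mse(*regression_data(1000)))

# Covariate shift: Pr(Y|X) unchanged, but Pr(X) moves to N(2.5, 1).
# The cubic was only ever fit where the old Pr(X) put mass, so it
# extrapolates badly even though the true relationship is unchanged.
print("covariate shift MSE:", mse(*regression_data(1000, x_mean=2.5)))

# Concept drift: Pr(X) unchanged, but Pr(Y|X) changes (sin -> cos).
print("concept drift MSE:  ", mse(*regression_data(1000, concept=np.cos)))

# --- Prior probability shift, in a classification setting ---

def class_data(n, p1):
    """Pr(Y=1) = p1, with fixed class-conditionals X|Y=y ~ N(2y-1, 1)."""
    y = rng.random(n) < p1
    x = rng.normal(np.where(y, 1.0, -1.0), 1.0)
    return x, y

def error_rate(x, y, threshold):
    return np.mean((x > threshold) != y)

# Pr(X|Y) is the same as ever, but Pr(Y=1) has moved from 0.5 to 0.9.
x_te, y_te = class_data(100_000, p1=0.9)
print("old (balanced) threshold:", error_rate(x_te, y_te, 0.0))
# For these Gaussians, the Bayes threshold under priors (p0, p1)
# is 0.5*log(p0/p1), so it shifts with the class balance.
print("prior-aware threshold:   ", error_rate(x_te, y_te, 0.5 * np.log(0.1 / 0.9)))
```

Note the asymmetry the simulation brings out: under covariate shift, the fitted curve is still a good estimate of the (unchanged) regression function wherever the training distribution put mass, so the damage is confined to the newly probable region; under concept drift, no amount of old data helps, because the old mapping is now simply wrong.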
[1] Why “concept drift”? Because some of the early work on classifiers in machine learning came out of artificial-intelligence research on learning “concepts”, which was in turn inspired by psychology. The idea was that you had mastered a concept, like “circle” or “triangle”, if you could correctly classify instances as belonging to the concept or not; this meant learning a mapping from the features \(X\) to binary labels. If the concept changed over time, the right mapping would change too; hence “concept drift”.