\[ \newcommand{\Prob}[1]{\mathrm{Pr}\left( #1 \right)} \]
This lecture and the next are going to cover common sources of technical failure in data mining projects — the kinds of issues which lead to them just not working. (Whether they would be worth doing even if they did work is another story, for the last lecture.) Today we’ll look at three sources of technical failure which are pretty amenable to mathematical treatment:
Next time we’ll cover issues about measurement, model design and interpretation. They’re less mathematical but actually more fundamental.
“Covariate shift” = \(\Prob{Y|X}\) stays the same but \(\Prob{X}\) changes
“Prior probability shift” or “class balance shift” = \(\Prob{X|Y}\) stays the same but \(\Prob{Y}\) changes
“Concept drift”[1] = \(\Prob{X}\) stays the same, but \(\Prob{Y|X}\) changes, or, similarly, \(\Prob{Y}\) stays the same but \(\Prob{X|Y}\) changes
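To make the three definitions concrete, here is a small sketch with binary \(X\) and binary \(Y\). It relies only on the two factorizations of the joint distribution, \(\Prob{X, Y} = \Prob{Y|X}\Prob{X} = \Prob{X|Y}\Prob{Y}\). All of the numbers and the helper names (`joint_from_x`, `joint_from_y`, `marginal`, `conditional`) are made up for illustration; none of them come from the lecture.

```python
from fractions import Fraction as F

def joint_from_x(p_x, p_y_given_x):
    """Joint Pr(X=x, Y=y) built from Pr(X) and Pr(Y|X); keys are (x, y)."""
    return {(x, y): p_x[x] * p_y_given_x[x][y]
            for x in p_x for y in p_y_given_x[x]}

def joint_from_y(p_y, p_x_given_y):
    """Joint Pr(X=x, Y=y) built from Pr(Y) and Pr(X|Y); keys are (x, y)."""
    return {(x, y): p_y[y] * p_x_given_y[y][x]
            for y in p_y for x in p_x_given_y[y]}

def marginal(joint, axis):
    """Marginal of X (axis=0) or of Y (axis=1)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

def conditional(joint, given_axis):
    """Conditional of the other variable given the one on `given_axis`,
    returned as {given_value: {other_value: probability}}."""
    m = marginal(joint, given_axis)
    out = {}
    for key, p in joint.items():
        out.setdefault(key[given_axis], {})[key[1 - given_axis]] = p / m[key[given_axis]]
    return out

# Hypothetical "training" distribution (illustrative numbers only).
p_x = {0: F(1, 2), 1: F(1, 2)}
p_y_given_x = {0: {0: F(9, 10), 1: F(1, 10)},
               1: {0: F(1, 5), 1: F(4, 5)}}
train = joint_from_x(p_x, p_y_given_x)

# Covariate shift: Pr(X) changes, Pr(Y|X) is held fixed.
cov = joint_from_x({0: F(1, 5), 1: F(4, 5)}, p_y_given_x)

# Prior probability shift: Pr(Y) changes, Pr(X|Y) is held fixed.
prior = joint_from_y({0: F(1, 5), 1: F(4, 5)}, conditional(train, 1))

# Concept drift (first version): Pr(X) is held fixed, Pr(Y|X) changes.
drift = joint_from_x(p_x, {0: {0: F(1, 2), 1: F(1, 2)},
                           1: {0: F(1, 5), 1: F(4, 5)}})

for name, joint in [("train", train), ("covariate shift", cov),
                    ("prior shift", prior), ("concept drift", drift)]:
    print(name, "Pr(Y=1) =", marginal(joint, 1)[1],
          "Pr(Y=1|X=0) =", conditional(joint, 0)[0][1])
```

Two things are worth noticing in the output. Under covariate shift, \(\Prob{Y|X}\) is unchanged but \(\Prob{Y}\) and \(\Prob{X|Y}\) both move. Under prior probability shift, \(\Prob{X|Y}\) is unchanged but \(\Prob{Y|X}\) moves, so a classifier fitted to the old \(\Prob{Y|X}\) is no longer calibrated.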
[1] Why “concept drift”? Some of the early work on classifiers in machine learning came out of work in artificial intelligence on learning “concepts”, which in turn was inspired by psychology. The idea was that you’d mastered a concept, like “circle” or “triangle”, if you could correctly classify instances as belonging to the concept or not; this meant learning a mapping from the features \(X\) to binary labels. If the concept changed over time, the right mapping would change; hence “concept drift”.