Linear Classifiers and Logistic Regression

36-462/36-662, Spring 2020

4 February 2020


Context

The prototypical case for the prototype method

The prototype method

The prototype method in action

The prototype method…

The boundary

The boundary, with algebra

\[\begin{eqnarray*} \| \vec{x}_0 - \vec{m}_1 \| & = & \|\vec{x}_0 - \vec{m}_0\| \\ \| \vec{x}_0 - \vec{m}_1 \|^2 & = & \|\vec{x}_0 - \vec{m}_0\|^2 \\ \|\vec{x}_0\|^2 - 2\vec{x}_0\cdot\vec{m}_1 + \|\vec{m}_1\|^2 & = & \|\vec{x}_0\|^2 - 2\vec{x}_0\cdot\vec{m}_0 + \|\vec{m}_0\|^2 \\ \|\vec{m}_1\|^2 - \|\vec{m}_0\|^2 & = & \vec{x}_0 \cdot 2(\vec{m}_1 - \vec{m}_0) \end{eqnarray*}\]
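The algebra says the nearest-prototype rule is a linear classifier with coefficients \(2(\vec{m}_1 - \vec{m}_0)\) and offset \(\|\vec{m}_0\|^2 - \|\vec{m}_1\|^2\). A minimal R sketch checks this numerically, assuming the prototypes are the class means and the labels are coded 0/1; the function names and the simulated data are hypothetical.

```r
# Hypothetical helpers for the prototype (nearest class mean) method,
# assuming labels y are coded 0/1 and prototypes are the class means
prototype.fit = function(x, y) {
  x = as.matrix(x)
  m0 = colMeans(x[y == 0, , drop = FALSE])  # prototype of class 0
  m1 = colMeans(x[y == 1, , drop = FALSE])  # prototype of class 1
  # From the boundary algebra: predict 1 when
  # x . 2(m1 - m0) + ||m0||^2 - ||m1||^2 >= 0
  list(m0 = m0, m1 = m1,
       coefficients = 2 * (m1 - m0),
       offset = sum(m0^2) - sum(m1^2))
}

# Rule 1: assign each point to the class with the nearer prototype
nearest.prototype = function(fit, x) {
  x = as.matrix(x)
  d0 = rowSums(sweep(x, 2, fit$m0)^2)  # squared distance to m0
  d1 = rowSums(sweep(x, 2, fit$m1)^2)  # squared distance to m1
  ifelse(d1 <= d0, 1, 0)
}

# Rule 2: the equivalent linear rule, offset + x . coefficients >= 0
prototype.linear.rule = function(fit, x) {
  scores = as.vector(as.matrix(x) %*% fit$coefficients) + fit$offset
  ifelse(scores >= 0, 1, 0)
}

# Simulated (made-up) data: two Gaussian clouds in the plane
set.seed(1)
x = rbind(matrix(rnorm(100, mean = 0), ncol = 2),
          matrix(rnorm(100, mean = 2), ncol = 2))
y = rep(c(0, 1), each = 50)
fit = prototype.fit(x, y)
all(nearest.prototype(fit, x) == prototype.linear.rule(fit, x))  # TRUE
```

The two rules agree because \(\|\vec{x}-\vec{m}_0\|^2 - \|\vec{x}-\vec{m}_1\|^2\) is exactly the linear score, so the comparison of distances never needs the quadratic term \(\|\vec{x}\|^2\).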

Linear classifiers

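linear.classifier = function(x, coefficients, offset) {
  # Score of one point z: offset + z . coefficients. This is the directed
  # (signed) distance from z to the plane offset + z . coefficients = 0,
  # multiplied by the length of coefficients
  directed.distance.from.plane = function(z) { offset + sum(z * coefficients) }
  # Apply the score to each row (point) of x
  directed.distances = apply(as.matrix(x), 1, directed.distance.from.plane)
  # Class 1 on the non-negative side of the plane, class 0 otherwise
  return(ifelse(directed.distances >= 0, 1, 0))
}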
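Because the score is just a dot product, the rule can be computed for every row at once with a single matrix product instead of `apply()`. A sketch with a hypothetical function name, assuming `x` is a numeric matrix with one row per point and one column per feature:

```r
# Hypothetical vectorized variant: one matrix product scores all rows at once
linear.classifier.vectorized = function(x, coefficients, offset) {
  scores = as.vector(as.matrix(x) %*% coefficients) + offset
  ifelse(scores >= 0, 1, 0)  # class 1 on the non-negative side
}

# Example: the boundary x1 + x2 = 1, i.e. coefficients (1, 1) and offset -1
x = rbind(c(0, 0), c(1, 1), c(0.5, 0.5), c(2, -3))
linear.classifier.vectorized(x, coefficients = c(1, 1), offset = -1)
# [1] 0 1 1 0
```

The third point lies exactly on the boundary and goes to class 1 because the rule uses `>=`.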

Margin
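One common way to make the margin concrete, assuming 0/1 labels: divide the score `offset + x . coefficients` by the length of `coefficients` to get the directed distance to the boundary, then flip its sign for class 0, so that a positive value means "on the correct side, this far from the boundary". The smallest such value over the data is the margin of the classifier. A sketch with a hypothetical function name and toy points:

```r
# Hypothetical helper: signed geometric margins, assuming labels y in {0, 1}
# and the boundary offset + x . coefficients = 0
signed.margins = function(x, y, coefficients, offset) {
  scores = as.vector(as.matrix(x) %*% coefficients) + offset
  distances = scores / sqrt(sum(coefficients^2))  # directed distance to the plane
  (2 * y - 1) * distances  # recode y as -1/+1; positive = correct side
}

x = rbind(c(0, 0), c(3, 4))
y = c(0, 1)
m = signed.margins(x, y, coefficients = c(3, 4), offset = -5)
m       # [1] 1 4
min(m)  # margin of this classifier on these two points: 1
```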

Estimating a linear classifier

Working probabilities back in

First try: linear probability models
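The basic trouble with regressing a 0/1 response directly on the features is that nothing keeps the fitted "probabilities" inside \([0,1]\). A toy illustration with made-up data:

```r
# Made-up data: y switches from 0 to 1 halfway along x
x = 1:10
y = c(rep(0, 5), rep(1, 5))
lpm = lm(y ~ x)        # linear probability model
range(fitted(lpm))     # roughly -0.18 to 1.18: not valid probabilities
```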

Second try: find a transformation of the probability that’s linear
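The log-odds (logit) transformation \(\log{\frac{p}{1-p}}\) maps \((0,1)\) onto the whole real line, so it can sensibly be set equal to a linear function; its inverse is the logistic function. A sketch with hypothetical helper names, cross-checked against base R's `qlogis()` and `plogis()` (the quantile and distribution functions of the standard logistic distribution, which are exactly these two maps):

```r
logit = function(p) { log(p / (1 - p)) }           # (0, 1) -> real line
inverse.logit = function(z) { 1 / (1 + exp(-z)) }  # real line -> (0, 1)

p = c(0.01, 0.25, 0.5, 0.75, 0.99)
logit(p)                                 # about -4.6 to 4.6, symmetric around 0
all.equal(logit(p), qlogis(p))           # TRUE
all.equal(inverse.logit(logit(p)), p)    # TRUE: the maps invert each other
```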

Think about the likelihood
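If the observations are independent and each \(Y_i\) is Bernoulli with success probability \(p_i\), the log-likelihood is \(\sum_{i}{y_i \log{p_i} + (1-y_i)\log{(1-p_i)}}\). A short sketch (hypothetical function name, made-up numbers) computes it directly and cross-checks against base R's binomial density with one trial per observation:

```r
# Hypothetical helper: Bernoulli log-likelihood of 0/1 outcomes y under
# probabilities p; assumes every p is strictly between 0 and 1
bernoulli.loglik = function(y, p) {
  sum(y * log(p) + (1 - y) * log(1 - p))
}

y = c(1, 0, 1, 1)
p = c(0.9, 0.2, 0.7, 0.6)
bernoulli.loglik(y, p)
sum(dbinom(y, size = 1, prob = p, log = TRUE))  # same value
```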

Logistic regression

The logistic curve
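A few properties of the logistic curve \(1/(1+e^{-z})\) can be checked numerically with base R's `plogis()` (the curve itself) and `dlogis()` (its derivative): it is symmetric about \(z=0\), its slope is \(p(1-p)\), and it is steepest at \(z=0\), where \(p=1/2\).

```r
z = seq(-6, 6, by = 0.5)
p = plogis(z)
all.equal(plogis(-z), 1 - p)         # symmetry: p(-z) = 1 - p(z)
all.equal(dlogis(z), p * (1 - p))    # slope is p(1 - p)
dlogis(0)                            # 0.25: steepest at z = 0, where p = 1/2
```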

Thinking through logistic regression
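Writing the model out makes the link to linear classifiers explicit. In this sketch the symbols \(\beta_0\) (intercept) and \(\vec{\beta}\) (coefficient vector) are assumed notation: the log-odds are linear in \(\vec{x}\), so predicting class 1 whenever the probability is at least \(1/2\) is exactly a linear classifier with offset \(\beta_0\) and coefficients \(\vec{\beta}\).

\[\begin{eqnarray*} \log{\frac{p(\vec{x})}{1-p(\vec{x})}} & = & \beta_0 + \vec{x}\cdot\vec{\beta}\\ p(\vec{x}) & = & \frac{1}{1+e^{-(\beta_0 + \vec{x}\cdot\vec{\beta})}}\\ p(\vec{x}) \geq \frac{1}{2} & \Leftrightarrow & \beta_0 + \vec{x}\cdot\vec{\beta} \geq 0 \end{eqnarray*}\]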

How do we estimate logistic regression?
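The log-likelihood has no closed-form maximizer, so it is maximized numerically. A minimal sketch of Newton's method follows; for logistic regression this is the same as iteratively reweighted least squares, the algorithm behind R's `glm()`. The function name and the simulated data are hypothetical, and the result is checked against `glm()` with a tightened convergence tolerance.

```r
# Hypothetical helper: logistic regression by Newton's method, assuming x is a
# numeric matrix of predictors (no intercept column) and y is a 0/1 vector
logistic.newton = function(x, y, max.iter = 50, tol = 1e-10) {
  X = cbind(1, as.matrix(x))          # add an intercept column
  beta = rep(0, ncol(X))              # start at all-zero coefficients
  for (iter in seq_len(max.iter)) {
    p = as.vector(plogis(X %*% beta)) # current fitted probabilities
    gradient = t(X) %*% (y - p)       # score vector: X^T (y - p)
    W = p * (1 - p)                   # Bernoulli variances
    hessian = -t(X) %*% (X * W)       # -X^T W X (W recycles down each column)
    step = as.vector(solve(hessian, gradient))
    beta = beta - step                # Newton update
    if (max(abs(step)) < tol) break
  }
  beta
}

# Simulated (made-up) data that are not linearly separable
set.seed(42)
n = 200
x = matrix(rnorm(2 * n), ncol = 2)
y = rbinom(n, size = 1, prob = plogis(0.5 + x[, 1] - 2 * x[, 2]))

beta.newton = logistic.newton(x, y)
beta.glm = coef(glm(y ~ x, family = "binomial",
                    control = glm.control(epsilon = 1e-12)))
all.equal(beta.newton, unname(beta.glm), tolerance = 1e-6)  # TRUE
```

Newton's method is a natural choice here because the log-likelihood is concave, so when a finite maximizer exists the iterations typically converge in a handful of steps.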

glm(y ~ x1 + x2, data = df, family = "binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:  glm(formula = y ~ x1 + x2, family = "binomial", data = df)
## 
## Coefficients:
## (Intercept)           x1           x2  
##     0.07124      1.66586      1.44060  
## 
## Degrees of Freedom: 199 Total (i.e. Null);  197 Residual
## Null Deviance:       277.3 
## Residual Deviance: 1.849e-09     AIC: 6
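The two warnings, together with a residual deviance of essentially zero, are the signature of separation: some line puts every 1 on one side and every 0 on the other. For 0/1 data the residual deviance is \(-2\) times the maximized log-likelihood, so a deviance near zero means the fitted probabilities are all near 0 or 1, and the AIC of 6 is that near-zero deviance plus \(2 \times 3\) for the three coefficients. In that situation the likelihood keeps increasing as the coefficients grow without bound, so no finite maximum exists and the printed coefficients are simply where the iterations stopped. A toy demonstration with made-up separable data (not the `df` above), using a one-parameter model with no intercept:

```r
# Made-up, perfectly separable data: every x < 0 has y = 0, every x > 0 has y = 1
x = c(-3, -2, -1, 1, 2, 3)
y = c(0, 0, 0, 1, 1, 1)

# Log-likelihood of the model P(Y = 1 | x) = plogis(b * x), computed on the
# log scale (log.p = TRUE) so that probabilities rounding to 0 or 1 do not
# produce log(0)
loglik = function(b) {
  sum(y * plogis(b * x, log.p = TRUE) +
      (1 - y) * plogis(b * x, lower.tail = FALSE, log.p = TRUE))
}

b.values = c(1, 5, 10, 20)
ll = sapply(b.values, loglik)
ll  # negative, but climbing toward 0 as b grows: no finite maximizer
```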

Why logistic regression?

Summing up

Going beyond linear classification
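One standard route beyond linear boundaries is to keep the logistic model but feed it transformed features: a model that is linear in \(x_1, x_2, x_1^2, x_2^2\) has a boundary that is a conic (here, roughly a circle) in the original \((x_1, x_2)\) plane. A sketch, assuming made-up data whose class depends on distance from the origin; the data frame name `circle.df` is hypothetical.

```r
set.seed(7)
n = 500
circle.df = data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Class 1 becomes likely once x1^2 + x2^2 exceeds about 1.5
circle.df$y = rbinom(n, size = 1,
                     prob = plogis(3 * (circle.df$x1^2 + circle.df$x2^2 - 1.5)))

linear.fit = glm(y ~ x1 + x2, data = circle.df, family = "binomial")
quadratic.fit = glm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = circle.df,
                    family = "binomial")

# In-sample accuracy of the rule "predict 1 when the fitted probability >= 1/2"
accuracy = function(fit) { mean((fitted(fit) >= 0.5) == (circle.df$y == 1)) }
c(linear = accuracy(linear.fit), quadratic = accuracy(quadratic.fit))
```

The quadratic model is still fit by exactly the same `glm()` call; only the feature set changed.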

Backup: Optimizing the log-likelihood
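A sketch of the derivatives Newton's method needs, assuming each \(\vec{x}_i\) carries a leading 1 so that \(\beta\) stacks the intercept and slopes, and writing \(p_i = 1/(1+e^{-\vec{x}_i\cdot\beta})\). The second form of \(\ell\) uses \(\log{\frac{p_i}{1-p_i}} = \vec{x}_i\cdot\beta\) and \(1-p_i = 1/(1+e^{\vec{x}_i\cdot\beta})\).

\[\begin{eqnarray*} \ell(\beta) & = & \sum_{i=1}^{n}{y_i \log{p_i} + (1-y_i)\log{(1-p_i)}} = \sum_{i=1}^{n}{y_i (\vec{x}_i\cdot\beta) - \log{\left(1+e^{\vec{x}_i\cdot\beta}\right)}}\\ \nabla \ell(\beta) & = & \sum_{i=1}^{n}{(y_i - p_i)\vec{x}_i}\\ \nabla\nabla^T \ell(\beta) & = & -\sum_{i=1}^{n}{p_i(1-p_i)\vec{x}_i\vec{x}_i^T}\\ \beta^{(t+1)} & = & \beta^{(t)} + \left(\sum_{i=1}^{n}{p_i(1-p_i)\vec{x}_i\vec{x}_i^T}\right)^{-1}\sum_{i=1}^{n}{(y_i - p_i)\vec{x}_i} \end{eqnarray*}\]

The Hessian is negative semi-definite, so \(\ell\) is concave and any stationary point is a global maximum; under separation, however, the gradient never reaches zero at finite \(\beta\).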

Why “logistic”?
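One standard explanation of the name: the logistic curve solves the logistic growth equation from population dynamics, in which the growth rate is proportional both to the current level \(p\) and to the remaining room \(1-p\). The check is one line of calculus:

\[\begin{eqnarray*} p(t) & = & \frac{1}{1+e^{-t}}\\ \frac{dp}{dt} & = & \frac{e^{-t}}{(1+e^{-t})^2} = p(t)\left(1-p(t)\right) \end{eqnarray*}\]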
