Department of Statistics
Dietrich College of Humanities and Social Sciences

Density-Sensitive Semisupervised Inference

Publication Date

April 2012

Publication Type

Tech Report

Author(s)

Martin Azizyan, Aarti Singh, Larry Wasserman

Abstract

Semisupervised methods are techniques for using labeled data (X_1, Y_1), …, (X_n, Y_n) together with unlabeled data X_{n+1}, …, X_N to make predictions. These methods invoke assumptions that link the marginal distribution P_X of X to the regression function f(x). For example, it is common to assume that f is very smooth over high-density regions of P_X. Many of the methods are ad hoc: they have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution P_X. Our model includes a parameter α that controls the strength of the semisupervised assumption. We then use the data to adapt to α.
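
To make the idea of a metric that is sensitive to the marginal distribution more concrete, the following is a minimal sketch and not the estimator from the report. It assumes one common construction: build a graph on the pooled labeled and unlabeled points, charge each edge its Euclidean length divided by an estimated density raised to a power alpha, and take shortest-path distances. The function names (`kde`, `density_sensitive_distances`), the Gaussian kernel, the midpoint rule, and the `bandwidth` and `radius` values are all assumptions chosen for illustration; `alpha` here only plays a role analogous to the α in the abstract.

```python
import heapq
import math


def kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / norm


def density_sensitive_distances(points, source, alpha, bandwidth=0.3, radius=2.0):
    """Dijkstra shortest-path distances from points[source].

    Two points are joined when they are within `radius` of each other.
    An edge costs |x_i - x_j| / p_mid**alpha, where p_mid is the estimated
    density at the edge midpoint (a crude one-point approximation, assumed
    here for brevity). With alpha = 0 every edge costs its Euclidean length.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            gap = abs(points[i] - points[j])
            if j == i or gap > radius:
                continue
            p_mid = kde(points, 0.5 * (points[i] + points[j]), bandwidth)
            candidate = d + gap / p_mid ** alpha
            if candidate < dist[j]:
                dist[j] = candidate
                heapq.heappush(heap, (candidate, j))
    return dist


# Toy data (hypothetical): two dense clusters separated by an empty gap.
cluster_a = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
cluster_b = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
points = cluster_a + cluster_b

# Source is the point at 1.0; the targets at 0.0 and 2.0 are both at
# Euclidean distance 1 from it, one inside the cluster and one across the gap.
for alpha in (0.0, 1.0):
    d = density_sensitive_distances(points, source=5, alpha=alpha)
    print(f"alpha={alpha}: within-cluster {d[0]:.3f}, across-gap {d[6]:.3f}")
```

With alpha set to 0 the two targets are equally far from the source, as in the ordinary Euclidean metric. With alpha set to 1 the target across the low-density gap becomes farther than the target inside the cluster, which is the kind of behavior a smoothness assumption over high-density regions relies on.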