Department of Statistics Unitmark
Dietrich College of Humanities and Social Sciences

Treelets -- An Adaptive Multi-Scale Basis for Sparse Unordered Data

Publication Date

July, 2007

Publication Type

Tech Report

Author(s)

Ann B. Lee, Boaz Nadler and Larry Wasserman

Abstract

In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered -- with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity; the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper we present treelets -- a novel construction of multi-scale orthonormal bases that extends wavelets to non-smooth signals. Treelets capture the internal structure of the data and can as a dimensionality reduction tool significantly improve inference and prediction. We examine a variety of situations where treelets outperform principal component analysis and some common variable selection methods. The proposed method is illustrated on a linear mixture model, and on two real data sets: internet advertisements and DNA microarray data.