September, 2014

Tech Report

We propose Partial Correlation Screening (PCS) as a new row-by-row approach to estimating a large precision matrix Omega. To estimate the i-th row of Omega, 1 \leq i \leq p, PCS uses a Screen step and a Clean step. In the Screen step, PCS recruits a (small) subset of indices using a stage-wise algorithm: in each stage, the algorithm updates the set of recruited indices by adding the index j that has the largest (in magnitude) empirical partial correlation with i. In the Clean step, PCS re-investigates all recruited indices and uses them to reconstruct the i-th row of Omega.
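The Screen and Clean steps for a single row can be illustrated with a minimal sketch. It is not the report's implementation, and several details are assumptions made only for illustration: the names `partial_corr` and `pcs_row`, the stopping rule (a fixed `threshold` on the largest partial correlation plus a cap `max_size` on the recruited set), conditioning each partial correlation on the indices recruited so far, and a Clean step that simply inverts the empirical covariance restricted to i and the recruited indices.

```python
import numpy as np

def partial_corr(S_hat, i, j, cond):
    """Empirical partial correlation of i and j given the indices in cond."""
    idx = [i, j] + list(cond)
    # Invert only the small sub-block of the empirical covariance matrix.
    P = np.linalg.inv(S_hat[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def pcs_row(S_hat, i, threshold, max_size):
    """Sketch of a Screen step and a Clean step for row i (assumed rules)."""
    p = S_hat.shape[0]
    recruited = []
    # Screen: stage-wise, add the index with the largest |partial correlation|.
    while len(recruited) < max_size:
        candidates = [j for j in range(p) if j != i and j not in recruited]
        if not candidates:
            break
        scores = [abs(partial_corr(S_hat, i, j, recruited)) for j in candidates]
        best = int(np.argmax(scores))
        if scores[best] < threshold:  # assumed stopping rule
            break
        recruited.append(candidates[best])
    # Clean (assumed form): invert the covariance restricted to i and the
    # recruited indices, and read off the row that corresponds to i.
    idx = [i] + recruited
    P = np.linalg.inv(S_hat[np.ix_(idx, idx)])
    row = np.zeros(p)
    row[idx] = P[0]
    return row, recruited
```

In this sketch, each call reads only the diagonal of `S_hat` and the rows indexed by i and the recruited indices, which is the kind of access pattern that keeps a row-by-row method light on memory.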

PCS is computationally efficient and modest in memory use: to estimate a row of Omega, it only needs a few rows (determined sequentially) of the empirical covariance matrix. This enables PCS to estimate a large precision matrix (e.g., p = 10K) in a few minutes, and opens the door to estimating much larger precision matrices. We use PCS for classification. Higher Criticism Thresholding (HCT) is a recent classifier that enjoys optimality, but to exploit its full potential in practice, one needs a good estimate of the precision matrix Omega. Combining HCT with any approach to estimating Omega gives a new classifier: examples include HCT-PCS and HCT-glasso. We have applied HCT-PCS to two large microarray data sets (p = 8K and 10K) for classification, where it not only significantly outperforms HCT-glasso, but is also competitive with the Support Vector Machine (SVM) and Random Forest (RF). The results suggest that PCS gives more useful estimates of Omega than the glasso.

We set up a general theoretical framework and show that in a broad context, PCS fully recovers the support of Omega and HCT-PCS yields optimal classification behavior. Our proofs shed interesting light on the behavior of stage-wise procedures.