Jiashun's Research

Reports

I have sorted my published papers and submitted manuscripts in two ways: by year and by area.

Research description


My primary research area is statistical inference for Big Data, with a focus on improving inference by exploiting the many kinds of sparsity we see today (signal sparsity, graph sparsity, sparsity in eigenvalues of matrices, etc.).

Vision

Much of my work revolves around the vision that, in many Big Data applications we see today, the signals are Rare and Weak. To me, Rare and Weak signals are not a mathematical curiosity but the unavoidable consequence of the "large p, small n" trend we frequently see with Big Data.

When you collect data with more and more features (i.e., increasingly large dimensions), the signals tend to become increasingly sparse, because the number of true features does not increase proportionally. At the same time, in many cases we cannot enroll enough subjects for experiments (such as a study of a rare disease), so the sample size does not grow proportionally with the number of features, and the signals end up being weak.

In this "Rare and Weak" situation, classical methods and most contemporary empirical methods are simply overwhelmed, and principled statistical approaches are badly needed.

Research Topics

In the past years, I have explored the following topics in high-dimensional data analysis; in a significant fraction of this work, the theme is "Rare and Weak" signals.

Applications

My research is motivated by many interesting problems in various application areas.

Methods

I have developed and co-developed four groups of new methods appropriate for Rare and Weak signals.

Theory

I have a strong interest in statistical theory, and I am especially fond of the so-called "Phase Diagram", which is a novel way to justify optimality. The phase diagram can be viewed as a new criterion for optimality that is especially appropriate for Rare and Weak signals in Big Data.

Just as there are three phases of water (water vapor, liquid water, and ice), there are three phases for many statistical problems (variable selection, classification, multiple testing, spectral clustering). The phase diagram is a two-dimensional parameter space, where the x-axis calibrates the signal rarity and the y-axis calibrates the signal strength. For a particular statistical problem, say, variable selection, the phase space usually partitions into three sub-regions, Phase I-III (hence the name "phase diagram"). In the past years, I have worked out the phase diagrams for the following problems.
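To make the two axes concrete, here is a minimal Python sketch of one calibration that is common in the Rare and Weak literature. It is an illustration under stated assumptions, not the exact setup of any particular paper: the fraction of signals is taken to be p^(-beta) and the signal strength sqrt(2 r log p), so that (beta, r) is a point in the phase space. The function names `rare_weak_params` and `simulate_z_scores`, and the normal-means model itself, are hypothetical choices made for this example.

```python
import math
import random


def rare_weak_params(p, beta, r):
    """Map a phase-space point (beta, r) to (signal fraction, signal strength).

    Assumed calibration (illustrative only):
      rarity   eps = p^(-beta),            0 < beta < 1
      strength tau = sqrt(2 * r * log p),  r > 0
    """
    if p < 2 or not (0 < beta < 1) or r <= 0:
        raise ValueError("need p >= 2, 0 < beta < 1 and r > 0")
    eps = p ** (-beta)
    tau = math.sqrt(2.0 * r * math.log(p))
    return eps, tau


def simulate_z_scores(p, beta, r, seed=0):
    """Draw p z-scores: N(tau, 1) for the rare signals, N(0, 1) for the rest."""
    rng = random.Random(seed)
    eps, tau = rare_weak_params(p, beta, r)
    # Each feature is a true signal independently with probability eps.
    is_signal = [rng.random() < eps for _ in range(p)]
    z = [rng.gauss(tau if s else 0.0, 1.0) for s in is_signal]
    return z, is_signal
```

Under this assumed calibration, a larger beta means rarer signals and a smaller r means weaker signals, so moving across the (beta, r) plane sweeps through the regimes that a phase diagram separates. For example, `rare_weak_params(10000, 0.5, 0.5)` gives a signal fraction of about 1% with a strength of roughly 3 standard deviations.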