Astrostatistics applies statistical methods to the study of stars, galaxies, and the large-scale structure of the Universe. Using data from telescopes and satellites, astrostatisticians study questions about the origin, evolution, and fate of the Universe. Over the last decade there has been a deluge of valuable data, and statisticians play an important role in analyzing these data. Genovese and Wasserman are founding members of the International Computational Astrostatistics (INCA) group, a cross-disciplinary research team consisting of astrophysicists, statisticians, and computer scientists. Within the department, several faculty, post-docs, and graduate students are members of the group; other active members are drawn from other departments at Carnegie Mellon, the University of Pittsburgh, and several international institutions. The statistics department works closely with the McWilliams Center for Cosmology at Carnegie Mellon as well as with the Department of Physics and Astronomy at the University of Pittsburgh. Recent projects include analysis of the cosmic microwave background, estimating the dark energy equation of state, analysis of galaxy spectra, detecting galaxy clusters, identifying filaments, and estimating density functions with truncated data. A common theme in this work is the goal of detecting subtle, nonlinear signals in noisy, high-dimensional data. Our primary focus is on using state-of-the-art data and analytical methods to advance cosmology.

Cosmic Microwave Background

We analyze first-year WMAP data to determine the significance of asymmetry in summed power between arbitrarily defined opposite hemispheres, using maps that we create ourselves with software developed independently of the WMAP team. We find that over the multipole range l=[2,64], the significance of asymmetry is ~ 10^-4, a value insensitive to both frequency and power spectrum. We determine the smallest multipole ranges exhibiting significant asymmetry, and find twelve, including l=[2,3] and [6,7], for which the significance approaches 0. In these ranges there is an improbable association between the direction of maximum significance and the ecliptic plane (p ~ 0.01). Also, contours of least significance follow great circles inclined relative to the ecliptic at the largest scales. The great circle for l=[2,3] passes over previously reported preferred axes and is insensitive to frequency, while the great circle for l=[6,7] is aligned with the ecliptic poles. We examine how changing map-making parameters affects asymmetry, and find that at large scales it is rendered insignificant if the magnitude of the WMAP dipole vector is increased by approximately 1-3 sigma (or 2-6 km/s). While confirmation of this result would require data recalibration, such a systematic change would be consistent with observations of frequency-independent asymmetry. We conclude that the use of an incorrect dipole vector, in combination with a systematic or foreground process associated with the ecliptic, may help to explain the observed asymmetry.

P. E. Freeman (1), C. R. Genovese (1), C. J. Miller (2), R. C. Nichol (3), L. Wasserman (1)

Nonlinear Data Transformation

Dimension-reduction techniques can greatly improve statistical inference in astronomy. A standard approach is to use Principal Components Analysis (PCA). In this work we apply a recently developed technique, diffusion maps, to astronomical spectra for data parameterization and dimensionality reduction, and develop a robust, eigenmode-based framework for regression. We show how our framework provides a computationally efficient means by which to predict redshifts of galaxies, and thus could inform more expensive redshift estimators such as template cross-correlation. It also provides a natural means by which to identify outliers (e.g., misclassified spectra, spectra with anomalous features). We analyze 3835 SDSS spectra and show how our framework reduces dimensionality by more than 95%. Finally, we show that the prediction error of the diffusion map-based regression approach is markedly smaller than that of a similar approach based on PCA, demonstrating the superiority of diffusion maps over PCA for this regression task.
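
To make the idea of a diffusion map followed by eigenmode-based regression concrete, the following is a minimal sketch in Python with NumPy. It is not the implementation used in this work: the function names (diffusion_map, fit_linear), the Gaussian kernel with a single bandwidth epsilon, and the plain least-squares fit are all assumptions chosen for illustration, and the input is assumed to be a matrix with one spectrum per row.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Illustrative diffusion map (hypothetical helper, not the authors' code).

    X is (n_samples, n_features); epsilon is an assumed Gaussian kernel
    bandwidth. Returns the diffusion coordinates and all sorted eigenvalues.
    """
    # Pairwise squared Euclidean distances between rows of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negative values from round-off

    # Gaussian kernel weights and their row sums (degrees).
    W = np.exp(-d2 / epsilon)
    d = W.sum(axis=1)

    # Symmetric matrix S = D^{-1/2} W D^{-1/2}; it shares eigenvalues with
    # the Markov matrix P = D^{-1} W but allows a symmetric eigensolver.
    S = W / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)

    # Sort eigenpairs from largest to smallest eigenvalue.
    order = np.argsort(evals)[::-1]
    evals = evals[order]
    evecs = evecs[:, order]

    # Right eigenvectors of P are recovered as psi = D^{-1/2} v.
    psi = evecs / np.sqrt(d)[:, None]

    # Skip the trivial first mode (eigenvalue 1, constant eigenvector) and
    # scale the remaining modes by eigenvalue^t.
    lam = evals[1:n_components + 1]
    coords = psi[:, 1:n_components + 1] * lam ** t
    return coords, evals


def fit_linear(coords, y):
    """Ordinary least-squares fit of y on the diffusion coordinates."""
    A = np.column_stack([np.ones(len(coords)), coords])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta
```

In use, one would call diffusion_map on the matrix of spectra, then pass the resulting coordinates and the known redshifts to fit_linear. The sketch does not address how to choose epsilon or the number of retained modes, and it omits the out-of-sample extension needed to predict redshifts for spectra that were not part of the eigendecomposition.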

Comparing Distributions of Galaxy Morphologies

A principal goal of astronomy is to describe and understand how galaxies evolve as the
Universe ages. To understand the processes that drive evolution, one needs to investigate the
connections between various properties of galaxies, such as mass, star-formation rate (SFR),
and morphology, in a quantitative manner. The last of these properties, morphology,
refers to the two-dimensional appearance of a galaxy projected onto the plane of the sky.

Inferring the Evolution of Galaxy Morphology

In astronomy, one of the major goals is to put tighter constraints on parameters in the Lambda-CDM
model, which is currently the standard model describing the evolution of the Universe after the Big
Bang. One way to work towards this goal is to estimate how galaxy structure and morphology evolve;
we can then compare what we observe with rates predicted by the standard model via simulation.