Research Topics:
In the last few years we have worked on the following broad research themes:
Learning complex high-dimensional models from small-sample data is a common problem in genomics and proteomics, and it requires appropriate regularization. To this end, we have proposed a two-way James-Stein-type shrinkage procedure for the estimation of large-scale covariance matrices. This gives rise to corresponding plug-in estimators of partial correlations and regression coefficients. For large-scale multiple testing we proposed a unified procedure for the estimation of false discovery rates. We also investigated the problem of feature selection in high-dimensional prediction and methods for entropy estimation.
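As a rough illustration of the shrinkage idea, the sketch below shrinks the off-diagonal entries of an empirical correlation matrix toward zero with an analytically estimated intensity, then plugs the result into the usual formula for partial correlations. It is a minimal sketch under stated assumptions, not our published implementation: the function names `shrink_correlation` and `partial_correlations` are hypothetical, the shrinkage target (the identity matrix) is an assumption, and the separate shrinkage of the variances that makes the full procedure "two-way" is omitted.

```python
import numpy as np


def shrink_correlation(X):
    """Shrink the empirical correlation matrix of X (n samples x p variables)
    toward the identity. Returns (shrunken matrix, shrinkage intensity).

    Assumption: the intensity is estimated as the summed estimated variance of
    the off-diagonal correlations divided by their summed squares, clipped to
    the interval [0, 1].
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Standardize each column to mean 0 and unit sample variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # W[k, i, j] = Xs[k, i] * Xs[k, j]; needs O(n * p^2) memory, fine for a sketch.
    W = Xs[:, :, None] * Xs[:, None, :]
    W_bar = W.mean(axis=0)
    R = n / (n - 1) * W_bar  # empirical correlation matrix
    # Estimated variance of each entry of R.
    var_R = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)
    off = ~np.eye(p, dtype=bool)
    lam = var_R[off].sum() / (R[off] ** 2).sum()
    lam = float(min(1.0, max(0.0, lam)))
    R_shrunk = (1.0 - lam) * R  # scale all entries, then restore the diagonal
    np.fill_diagonal(R_shrunk, 1.0)
    return R_shrunk, lam


def partial_correlations(R):
    """Plug-in partial correlations from an invertible correlation matrix."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)
    pcor = (pcor + pcor.T) / 2.0  # remove tiny numerical asymmetries
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Usage with simulated data: far fewer samples than variables.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10, 30))
R_demo, lam_demo = shrink_correlation(X_demo)
pcor_demo = partial_correlations(R_demo)
print("shrinkage intensity:", lam_demo)
```

Whenever the estimated intensity is positive, the shrunken matrix is positive definite even with fewer samples than variables, so it can be inverted to obtain partial correlations.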
We have developed and implemented a series of statistical approaches for the inference of large-scale networks from gene expression and other "omics" data, both for static and for dynamic data. For example, we suggested network selection procedures for graphical Gaussian models, chain graphs, and vector autoregressive models. In addition, using information from DNA-protein binding networks, we proposed a method for the prediction of transcription factor activities.
We investigated methods for the analysis of gene sets and proposed new tools for gene ranking and biomarker discovery. Furthermore, we developed methods for the classification of gene expression data using shrinkage discriminant analysis or partial least squares analysis. To monitor gene expression during the cell cycle, we developed an exact statistical test to identify periodically expressed genes.
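The text above does not name the exact test for periodicity. As an illustrative assumption, the sketch below uses Fisher's g-test, a classical exact test for a single periodic component in a time series under a Gaussian white-noise null. The function name `fisher_g_test` is hypothetical, and the code is a minimal sketch for one expression profile, not a description of our software.

```python
import numpy as np
from math import comb, floor


def fisher_g_test(x):
    """Fisher's g statistic and its exact p-value for a time series x.

    g is the largest periodogram ordinate divided by the sum of all ordinates
    at the Fourier frequencies k/n for k = 1..m, with m = (n - 1) // 2
    (the zero frequency and, for even n, the Nyquist frequency are excluded).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    m = (n - 1) // 2
    if m < 1:
        raise ValueError("need at least 3 time points")
    # Periodogram ordinates up to a constant factor, which cancels in g.
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2
    ordinates = spec[1:m + 1]
    g = ordinates.max() / ordinates.sum()
    # Exact null distribution:
    # P(G > g) = sum_{j=1}^{floor(1/g)} (-1)^(j-1) C(m, j) (1 - j g)^(m-1)
    upper = min(m, floor(1.0 / g))
    p = sum(
        (-1) ** (j - 1) * comb(m, j) * (1.0 - j * g) ** (m - 1)
        for j in range(1, upper + 1)
    )
    # Clip to [0, 1] to guard against rounding in the alternating sum.
    return g, float(min(1.0, max(0.0, p)))


# Usage with a simulated profile: 48 time points, period of 12 time points.
t = np.arange(48)
rng = np.random.default_rng(7)
profile = np.sin(2 * np.pi * t / 12) + 0.5 * rng.normal(size=t.size)
g_demo, p_demo = fisher_g_test(profile)
print("g =", g_demo, "p-value =", p_demo)
```

When many genes are screened this way, the resulting p-values still need a multiple-testing correction, for example through false discovery rates.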
Evolution is a powerful process at the molecular level. We have developed a series of statistical methods for use in evolutionary genetics. Most recently, we proposed a general non-parametric reversible jump MCMC approach to infer population size history from genealogical trees using coalescent theory. Earlier work focused, for example, on molecular dating and the reconstruction of gene trees from sequence data.