
Jinko Graham
In science, we collect data to develop and evaluate theories about how things work. To address scientific questions, we design studies and then examine the resulting data to find and interpret signal amid noise. Statistics provides a powerful framework and set of methods to help in these efforts. As a statistician, my interests lie at the interface with genomic science. Though this interface is ever-changing, several fundamental aspects remain constant. For example, one guiding principle is that genomic data on individuals' DNA sequence variation reflect their underlying genealogical relationships. These relationships can tell us about individual disease susceptibility for traits that run in families, and so are useful for mapping disease genes. Another constant is that, in genomic studies, high-throughput measurement technologies produce massive data sets of different types. Often, these data sets can be integrated to yield additional scientific insights. However, big data bring challenges in statistical interpretation: which patterns are real, and which are blind alleys? Such blind alleys can arise from unrecognized biases in sampling, or from seeing chance patterns in random data that have no systematic component. Does the way the data have been collected allow us to answer our research questions? In my research, I try to incorporate fundamental genetic principles into statistical study designs and models for genomic data, in order to improve the scientific interpretation of such data. Recent developments in statistical computing and in Bayesian modelling of data structures with complex dependencies have enabled and enriched this effort. Additionally, in recent collaborations with experts in neuroimaging and pediatrics, advances in high-dimensional data analysis have allowed us to integrate genomic data with imaging and clinical data.
As a statistical geneticist, I focus on developing and evaluating analytic tools that uncover patterns in genomic data while accounting for random variation, complex data dependencies, and the way the data have been sampled.
