Modeling the effect of genetic variation on gene expression:
My most recent work focuses on characterizing the landscape of genetic variants that affect gene expression, because these regulatory variants are believed to play an important role in common diseases. We have collected RNA-sequencing data for nearly one thousand European individuals. By combining these data with genotypes for each individual, we have identified thousands of novel associations between genetic variation and diverse aspects of gene expression. Using this extensive catalog of associations, we have trained Bayesian latent variable models that incorporate features derived from genomic annotations to characterize, and predict, the consequences of regulatory variants.
(paper and data coming soon!)
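As an illustration of the kind of association such a catalog is built from, a single-variant expression association can be tested by regressing a gene's expression on genotype dosage (0, 1, or 2 copies of the alternate allele). The sketch below is a generic least-squares test, assuming one gene and one variant with no covariates; it is not the pipeline or the Bayesian latent variable models described above, and the helper name `eqtl_association` is hypothetical.

```python
import math

def eqtl_association(genotypes, expression):
    """Hypothetical helper: fit expression ~ alpha + beta * dosage by least squares.

    Returns (beta, t_statistic). Illustrative sketch only; real analyses add
    covariates, normalization, and multiple-testing control.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype dosage does not vary; slope is undefined")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    beta = sxy / sxx                     # effect of one extra allele copy
    alpha = mean_e - beta * mean_g       # intercept
    rss = sum((e - alpha - beta * g) ** 2 for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta, beta / se

beta, t = eqtl_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
```

Here expression rises by about one unit per copy of the allele, so the fitted slope is 1.0 with a large positive t-statistic.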
Active learning for discovery of interactions in complex human traits:
Interactions, in which particular combinations of genetic variants have non-additive effects on a trait, may also play a major role in human disease. Unfortunately, many genome-wide studies are underpowered to identify any significant interactions. However, our analysis of interaction patterns in three human disease traits suggests that non-additive effects are not distributed at random but instead follow predictable, biologically meaningful patterns, including enrichment between genes with known relationships. I have worked on a novel active learning method, Guided Adaptive Interaction Testing (GAIT), which leverages these patterns to automatically prioritize candidate interactions and dramatically reduces the burden of multiple hypothesis testing. GAIT identifies a large number of interactions in data from a variety of diseases, substantially improving our understanding of the mechanisms that drive them.
(under review)
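To make "non-additive" and "multiple testing burden" concrete: for two binary variants, an additive model predicts that the mean trait value of carriers of both variants equals the sum of the single-variant effects, so the interaction contrast m11 - m10 - m01 + m00 is zero. Testing every pair of variants also forces a very strict Bonferroni threshold. The sketch below, assuming binary carrier coding and Bonferroni correction, illustrates only these two ideas; it does not implement GAIT's active learning, and all function names are hypothetical.

```python
import math
from statistics import mean

def interaction_contrast(trait, a, b):
    """Hypothetical helper: deviation from additivity for two binary variants."""
    cells = {(i, j): [] for i in (0, 1) for j in (0, 1)}
    for y, x1, x2 in zip(trait, a, b):
        cells[(x1, x2)].append(y)
    if any(not values for values in cells.values()):
        raise ValueError("every genotype combination must be observed")
    m = {key: mean(values) for key, values in cells.items()}
    # Zero under a purely additive model; nonzero signals an interaction.
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]

def n_pairwise_tests(n_variants):
    """Number of distinct variant pairs if every pair is tested."""
    return math.comb(n_variants, 2)

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff controlling family-wise error at alpha."""
    return alpha / n_tests

print(bonferroni_threshold(n_pairwise_tests(10_000)))  # all pairs
print(bonferroni_threshold(1_000))                     # a prioritized subset
```

With 10,000 variants there are 49,995,000 pairs, so each test needs p below roughly 1e-9, whereas testing 1,000 prioritized pairs only requires p below 5e-5. This shows why prioritization eases the correction; how GAIT chooses and tests candidates is not reproduced here.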
A network-based framework for identifying disease risk variants:
Disease variants with small effects on risk are often buried among many spurious associations in genome-wide association studies (population-level studies of genetic variation in disease). A key observation, however, is that multiple co-functional or pathway-connected genes often affect the same trait. We leverage this observation to improve our power to detect disease variants. We developed PriorNet, a flexible regression-based framework that incorporates a Markov Random Field prior on gene relevance constructed from diverse sources of gene network and pathway information. This approach improves the identification of disease-relevant genes, particularly those with small effect sizes.
(presented at MLCB 2012)
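One common way to write a Markov Random Field prior over binary gene-relevance indicators z (1 = relevant) is a pairwise model with log prior h * sum(z) + J * sum of z_i * z_j over network edges, where J > 0 rewards connected genes being relevant together. The sketch below assumes exactly this form, with hypothetical parameters `h` and `J`; PriorNet's actual parameterization and how it combines network sources may differ. It shows how a gene's prior probability of relevance rises with the number of relevant neighbors.

```python
import math

def mrf_log_prior(z, edges, h, J):
    """Unnormalized log prior of a binary relevance vector z (assumed pairwise MRF)."""
    return h * sum(z) + J * sum(z[i] * z[j] for i, j in edges)

def conditional_relevance(i, z, edges, h, J):
    """P(z_i = 1 | all other z_j) under the assumed MRF above."""
    active_neighbors = 0
    for a, b in edges:
        if a == i:
            active_neighbors += z[b]
        elif b == i:
            active_neighbors += z[a]
    # Log-odds of relevance = h + J * (number of relevant neighbors).
    return 1.0 / (1.0 + math.exp(-(h + J * active_neighbors)))

# Hypothetical 3-gene chain 0 - 1 - 2 where genes 0 and 2 are relevant.
z = [1, 0, 1]
edges = [(0, 1), (1, 2)]
p = conditional_relevance(1, z, edges, h=-1.0, J=2.0)
```

With a negative baseline `h`, an isolated gene is a priori unlikely to be relevant, but two relevant neighbors raise gene 1's log-odds to h + 2J = 3, so a weak association signal at that gene can be boosted by the network.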
Bayesian structure learning for causal gene networks:
We developed methods to learn intricate networks describing the joint effects of hundreds of genes on complex traits in yeast (with Jonathan Weissman, HHMI, UCSF). Recently developed interventional experimental methods have enabled large-scale measurement of quantitative genetic interactions (GIs) in yeast, which report functional dependencies between pairs of genes. Using these measurements, we developed a Bayesian structure-learning method based on Annealed Importance Sampling that specifies a distribution over networks according to their agreement with the GI data. Applying this method to a recent protein-folding GI dataset in yeast, we showed that detailed multi-gene networks can be reconstructed on a large scale from GI data, providing testable hypotheses in the genetics of complex traits.
(Battle MSB 2010)
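Annealed Importance Sampling moves samples gradually from an easy distribution to a hard one through a sequence of intermediate distributions, accumulating importance weights whose average estimates the ratio of normalizing constants. The method above applies AIS to distributions over network structures; the sketch below shows only the generic mechanics on a one-dimensional toy problem whose answer is known, assuming a geometric annealing path f0^(1-b) * f1^b and one random-walk Metropolis step per temperature. The function names, target, and tuning values are hypothetical choices for illustration.

```python
import math
import random

def log_f0(x):
    # Normalized standard normal log density, so Z0 = 1.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def make_log_f1(mu, sigma):
    # Unnormalized Gaussian target; its true normalizer is sqrt(2*pi)*sigma.
    def log_f1(x):
        return -0.5 * ((x - mu) / sigma) ** 2
    return log_f1

def ais_log_z(log_f1, n_runs=500, n_temps=200, step=0.5, seed=0):
    """Estimate log(Z1 / Z0) by annealing from f0 to f1 along f0^(1-b) * f1^b."""
    rng = random.Random(seed)
    betas = [k / n_temps for k in range(n_temps + 1)]
    log_weights = []
    for _ in range(n_runs):
        x = rng.gauss(0.0, 1.0)  # exact draw from f0
        log_w = 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            # Importance weight update: log f_b(x) - log f_{b_prev}(x).
            log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
            # One random-walk Metropolis step that leaves f_b invariant.
            y = x + rng.gauss(0.0, step)
            log_ratio = (1.0 - b) * (log_f0(y) - log_f0(x)) + b * (log_f1(y) - log_f1(x))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x = y
        log_weights.append(log_w)
    # Average the weights in log space (log-mean-exp) for numerical stability.
    top = max(log_weights)
    return top + math.log(sum(math.exp(w - top) for w in log_weights) / n_runs)

estimate = ais_log_z(make_log_f1(mu=1.0, sigma=0.5))
exact = math.log(math.sqrt(2.0 * math.pi) * 0.5)
```

The estimate should land close to the exact value (about 0.226). Using many small temperature steps keeps successive distributions similar, which keeps the importance weights well behaved; over network structures, the Metropolis move would instead propose local edits to a graph.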