Adaptation is the fundamental process in evolution but we still know surprisingly little about its population genetics. Does adaptation typically involve a few or many molecular variants? Are such variants already present in the population or do they first need to arise by de novo mutations? What is the typical strength of natural selection that drives adaptation, and how quickly can adaptation occur?

A quantitative theory of adaptation has long seemed elusive due to the presumption that molecular adaptation is idiosyncratic and infrequent, and adaptive changes in the genome therefore typically occurred so far in the past that they are now obscured by subsequent changes. However, my studies and those of others have shown that molecular adaptation can be much more rapid, frequent, and repeatable than previously assumed. Moreover, contrary to the long-held belief that only polygenic traits can adapt quickly due to selection acting on preexisting polymorphisms, it appears that adaptation can often be rapid despite relying on specific new mutations. Even more intriguingly, the same adaptive mutations often arise repeatedly within short time intervals, suggesting that adaptation is not generally limited by the availability of individual adaptive mutations.

These findings challenge our understanding of molecular adaptation, but they also bring up fascinating opportunities for its study: the apparent rapidity and repeatability of adaptation, combined with the entirely new kinds of deep population genomic data that are collected explicitly through time and space, will allow us, for the first time, to study adaptation directly as it unfolds. My research aims at developing the methodologies required for such studies and applying them to population genomic data from natural and experimental populations. Below is a list of specific research projects I am currently involved in:

1. Rapid evolution of pesticide and drug resistance

During my postdoctoral research I studied the puzzle of extremely rapid adaptation in the evolution of pesticide and drug resistance. We know that in these cases mutational target sizes are small and that adaptation does not involve standing genetic variation at a large number of loci. Nevertheless, pesticide and drug resistance evolves quickly and recurrently. In one of the first studies to investigate rapid adaptation using deep population sequencing, we analyzed the evolution of pesticide resistance in the fruit fly Drosophila melanogaster at the gene Ace, which encodes acetylcholinesterase, the major target of most commonly used insecticides [1]. By sequencing the Ace gene in a large number of flies, we showed that this multi-step adaptation, which involves four point mutations at highly conserved sites, arose repeatedly from de novo mutations within only a few years after the introduction of pesticides. Moreover, we found that resistance mutations arose repeatedly even within continents, implying that adaptive mutations arise in fruit flies at a much higher rate than previously thought.

Our results suggest that adaptation in D. melanogaster is not actually limited by the availability of individual adaptive mutations, which has profound implications for the population dynamics of adaptive alleles. In this non-mutation-limited regime, adaptation produces so-called "soft" selective sweeps, in which several adaptive mutations of independent origin sweep through the population at the same time. Such soft sweeps are very different from the classical "hard" sweeps of mutation-limited scenarios, in which only a single adaptive mutation arises and sweeps through the population.
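The contrast between the two regimes can be made concrete with a small sketch. It relies on a standard approximation from the soft sweep literature that is based on the Ewens sampling formula, not on a result derived in this text: if Theta denotes the population-scaled rate at which the adaptive mutation arises, the chance that n sampled adaptive alleles all trace back to a single origin is the product of i / (i + Theta) for i from 1 to n - 1. The function name and the example values of Theta are illustrative assumptions.

```python
def prob_soft_sweep(theta: float, n: int) -> float:
    """Approximate probability that n sampled adaptive alleles stem from
    more than one independent mutational origin (Ewens-sampling-based
    approximation; theta is the population-scaled adaptive mutation rate)."""
    if theta < 0 or n < 1:
        raise ValueError("theta must be >= 0 and n must be >= 1")
    p_single_origin = 1.0
    for i in range(1, n):
        p_single_origin *= i / (i + theta)
    return 1.0 - p_single_origin


# Hypothetical values of theta spanning the mutation-limited regime
# (theta << 1) and the non-mutation-limited regime (theta >= 1).
for theta in (0.01, 0.1, 1.0, 10.0):
    print(f"theta = {theta:>5}: P(soft sweep in sample of 20) = "
          f"{prob_soft_sweep(theta, 20):.3f}")
```

Under this approximation, hard sweeps dominate when Theta is much smaller than 1, and soft sweeps become the expected outcome once Theta approaches or exceeds 1.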

The presumption of mutation-limitation often stems from estimates of effective population sizes, inferred from levels of neutral diversity. Such estimates are typically low, as they can be strongly affected by rare and ancient population bottlenecks. However, we showed that rapid adaptation should only depend on recent population sizes, which are typically much larger. This can explain why in many species adaptation is not actually mutation-limited and soft sweeps should be common [2].
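The gap between the two notions of population size can be illustrated with a minimal sketch. It assumes the textbook approximation that the long-term effective size reflected in neutral diversity behaves like the harmonic mean of the per-generation population sizes; the size trajectory below is entirely made up for illustration.

```python
from statistics import harmonic_mean


def long_term_effective_size(sizes_per_generation):
    """Harmonic mean of per-generation sizes, a textbook approximation
    for the effective size that governs levels of neutral diversity."""
    return harmonic_mean(sizes_per_generation)


# Hypothetical history: one ancient bottleneck generation of 1,000
# individuals followed by 99 generations at 1,000,000 individuals.
history = [1_000] + [1_000_000] * 99
recent_size = history[-1]

print(f"long-term effective size: {long_term_effective_size(history):,.0f}")
print(f"recent population size:   {recent_size:,}")
```

A single bottleneck generation pulls the harmonic mean more than tenfold below the recent size, even though the supply of new adaptive mutations today is set by the recent size.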

Together with Richard Neher from the Max-Planck Institute in Tuebingen I developed a new approach to measure the selection coefficients of hard and soft sweeps from deep population diversity data [3]. In contrast to previous methods, which typically analyze the reduction in diversity caused by a sweep, our method utilizes the novel variation that arises from mutations occurring on the sweeping haplotypes. When applying this method to HIV populations, we again observed several examples of strong adaptation involving both hard and soft sweeps.

2. Signatures of hard and soft sweeps in population genomic data

Together with Nandita Garud, Erkan Buzbas, and Dmitri Petrov, I analyzed whole-genome population data of D. melanogaster to assess whether recent adaptations in this species more commonly involved hard or soft sweeps [4]. Our results indicate that strong adaptation indeed primarily shows the signatures of soft sweeps. With Ben Wilson, I am currently modeling how the dramatic fluctuations in population size that fly populations undergo each year are expected to affect the likelihood of observing soft sweeps.

3. The role of positive selection in human evolution

The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most apparent genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked recurrent deleterious mutations, known as background selection.

In a project with David Enard and Dmitri Petrov we analyzed human polymorphism data from the 1000 Genomes Project and detected signatures of pervasive positive selection once we corrected for the effects of background selection [5]. We showed that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed near functionally consequential amino acid substitutions. Furthermore, we found that amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as the presence of unusually long and frequent haplotypes and specific distortions in the site frequency spectrum.

We then used my forward simulation program SLiM [6] to show that the observed signatures require a high rate of strongly adaptive substitutions in the vicinity of the amino acid changes. We further demonstrated that the observed signatures of positive selection correlate more strongly with the presence of regulatory sequences, as predicted by ENCODE, than with the positions of amino acid substitutions. Our results establish that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson that adaptive divergence is primarily driven by regulatory changes.

4. Incomplete sweeps and balancing selection

One striking result of recent studies is that adaptation often produces only incomplete sweeps, where the adaptive allele does not become fixed in the population. For example, when fruit flies were evolved over 600 generations of laboratory selection for accelerated development, many polymorphisms changed their frequencies in response to selection, yet none of these variants ever reached fixation [7]. Consistent with this observation, when we measured the population frequencies of polymorphisms in D. melanogaster at different times of the year in the wild, we found hundreds of polymorphisms that systematically cycle between seasons, often showing frequency differences on the order of 20% or larger.

One possible mechanism to generate incomplete sweeps is heterozygote advantage, which can cause an adaptive mutation to be maintained at an intermediate population frequency. We recently derived an intriguing theoretical explanation for why heterozygote advantage should indeed be common during adaptation in diploid species [8]. One way to see this is to consider that when a new mutation first arises in a diploid population, it primarily exists in heterozygotes. In order for the variant to become more common in the population, it thus needs to be beneficial in heterozygotes. However, the mutation does not actually need to be beneficial in homozygotes. If selection is stabilizing and mutations are sufficiently large, homozygotes can often "overshoot" the fitness optimum. These mutations will be maintained at intermediate population frequencies and result in adaptive dynamics that differ quite substantially from the classic picture. Strikingly, in this scenario adaptation promotes rather than exhausts genetic variation.
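The overshoot argument can be sketched numerically. The example below is a toy illustration rather than the model of [8]: it assumes a single trait under Gaussian stabilizing selection, a wild type sitting at distance d from the optimum, and a mutation that shifts the phenotype by a in heterozygotes and 2a in homozygotes. All names and parameter values are hypothetical. The balanced frequency uses the standard single-locus result for overdominance.

```python
import math


def gaussian_fitness(distance: float, width: float = 1.0) -> float:
    """Fitness of a phenotype at a given distance from the optimum."""
    return math.exp(-distance ** 2 / (2 * width ** 2))


def genotype_fitnesses(d: float, a: float, width: float = 1.0):
    """Fitness of wild type, heterozygote, and mutant homozygote when the
    wild type is at distance d and each mutant copy moves the phenotype
    by a towards (and possibly beyond) the optimum."""
    return (gaussian_fitness(d, width),
            gaussian_fitness(d - a, width),
            gaussian_fitness(d - 2 * a, width))


def balanced_frequency(w_wt: float, w_het: float, w_hom: float):
    """Equilibrium frequency of the mutant allele under heterozygote
    advantage, or None if the heterozygote is not the fittest genotype."""
    if w_het <= w_wt or w_het <= w_hom:
        return None
    return (w_het - w_wt) / (2 * w_het - w_wt - w_hom)


# Large mutation: the homozygote overshoots the optimum.
print(balanced_frequency(*genotype_fitnesses(d=1.0, a=0.8)))
# Small mutation: the homozygote is still fitter than the heterozygote,
# so no balanced polymorphism is predicted.
print(balanced_frequency(*genotype_fitnesses(d=1.0, a=0.3)))
```

In this toy setting the large-effect mutation invades because it helps heterozygotes, but it stalls at an intermediate frequency because homozygotes land on the far side of the optimum.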

5. The effects of linkage under frequent adaptation

Under the paradigm of the neutral theory of molecular evolution, where the bulk of natural molecular variation is assumed to be selectively neutral, the effects of linkage between different polymorphisms, so-called Hill-Robertson interference (HRI), are generally neglected in population genetic models. However, recent studies show that in many species adaptation appears to be much more frequent than assumed by the neutral theory. In D. melanogaster, for example, applications of the McDonald-Kreitman (MK) test suggest that more than 50% of the amino-acid-changing substitutions in this species were adaptive, implying that HRI from recurrent selective sweeps might also be common. In addition, there is accumulating evidence that many polymorphisms in natural populations are slightly deleterious, and such polymorphisms are expected to generate another kind of HRI, so-called background selection.

These findings raise the question of whether it is indeed reasonable to always neglect HRI when modeling evolutionary dynamics, and to what extent population genetic methods built on this assumption are biased under realistic scenarios. Since the MK test itself assumes that most observable polymorphisms are selectively neutral, this also raises a fundamental problem of consistency.
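For readers unfamiliar with the MK framework, the widely used point estimate of the fraction of adaptive amino acid substitutions is alpha = 1 - (Ds * Pn) / (Dn * Ps), where Dn and Ds are the numbers of nonsynonymous and synonymous substitutions and Pn and Ps the numbers of nonsynonymous and synonymous polymorphisms. The sketch below computes this estimate; the counts are invented for illustration and do not come from any of the studies cited here.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Standard MK-based estimate of the fraction of adaptive
    nonsynonymous substitutions: 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts in which all polymorphisms are effectively neutral.
print(mk_alpha(dn=100, ds=200, pn=50, ps=200))

# Same divergence, but with extra nonsynonymous polymorphisms of the kind
# that slightly deleterious mutations would contribute.
print(mk_alpha(dn=100, ds=200, pn=80, ps=200))
```

The second call shows why the neutrality assumption matters: additional nonsynonymous polymorphism that never contributes to divergence lowers the estimate even though the underlying number of adaptive substitutions is unchanged.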

In a recent paper [9], I used my forward simulation program SLiM to simulate the evolution of entire chromosomes under a range of parameter values relevant to humans and other organisms. I then applied various forms of the MK test to the population genomic data resulting from these simulations and studied how accurately these methods re-infer the true evolutionary parameters used in the simulations. Strikingly, I found that the MK test can substantially underestimate the true rate of adaptation even when adaptation is only moderately frequent.

The bigger claim of the paper is that the effects of linked selection cannot be simply swept under the rug by introducing effective parameters, such as effective population size or effective strength of selection, and then using these effective parameters in formulae derived from the diffusion approximation under the assumption of free recombination.

6. Mutation, purifying selection, and demography

In order to fully understand the role of adaptation, it is essential to first understand the non-adaptive forces that shape evolution, such as mutation, purifying selection, and demography. During my PhD at the Max-Planck-Institute for Molecular Genetics in Berlin, I investigated the elementary patterns of mutational processes by applying analytic and modeling methods together with comparative genomics approaches to genome-level datasets. These studies yielded interesting findings on how insertions and deletions contribute to genome evolution [10,11], how they shape statistical properties of genomes [12,13], and how this can affect commonly used bioinformatics methods such as sequence alignment [14,15,16].

At Stanford, I developed a new method to estimate the rates and patterns of mutation from the low-frequency polymorphism data gathered in deep sequencing projects [17]. My method overcomes many of the problems of indirect estimates from divergence or heterozygosity, which typically suffer from unknown selective and demographic biases.

We also developed a maximum-likelihood framework to infer the strength of purifying selection under complex demographies and applied this approach to a particular family of transposable elements in D. melanogaster [18]. This study highlighted the importance of accounting for demographic history when inferring selection.

In another research project we investigated the interplay between mutational biases and purifying selection [19]. Surprisingly, we were able to show that mutational biases can cause constrained sequences to evolve faster than expected under neutrality, provided that selection is weak and mutational biases favor the states that selection disfavors. We investigated how this phenomenon can, in practice, affect comparative genomics methods used for the detection of constraint. This study demonstrated that accounting for mutational biases and weak selection is necessary to accurately infer regions of the genome evolving under purifying selection.
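The counterintuitive effect of mutational bias can be reproduced in a toy calculation. The sketch below is not the model of [19]; it assumes a single site with two states, a preferred and a disfavored one, mutation rates u (preferred to disfavored) and v (disfavored to preferred), and the standard weak-selection result that a mutation with scaled selection coefficient S fixes at S / (1 - exp(-S)) times the neutral rate. All names and numerical values are hypothetical.

```python
import math


def relative_fixation_rate(S: float) -> float:
    """Fixation probability of a new mutation relative to a neutral one,
    for population-scaled selection coefficient S."""
    if abs(S) < 1e-12:
        return 1.0
    return S / (1.0 - math.exp(-S))


def substitution_rate(u: float, v: float, S: float) -> float:
    """Equilibrium substitution rate at a two-state site where the
    disfavored state carries a scaled fitness cost S."""
    to_disfavored = u * relative_fixation_rate(-S)
    to_preferred = v * relative_fixation_rate(S)
    # At stationarity the flux in both directions is equal, so the total
    # rate is twice the one-way flux.
    return 2.0 * to_disfavored * to_preferred / (to_disfavored + to_preferred)


for u, v in ((1.0, 1.0), (10.0, 1.0)):
    neutral = substitution_rate(u, v, 0.0)
    constrained = substitution_rate(u, v, 1.0)
    print(f"u={u}, v={v}: constrained/neutral rate = {constrained / neutral:.2f}")
```

Without mutational bias the weakly constrained site evolves more slowly than a neutral one, as usually assumed. With a tenfold bias towards the disfavored state, the same weak constraint makes the site evolve faster than neutral, which is the situation in which conservation-based scans would miss it.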