Simulating Evolution II — First Look at the Alleles Underlying Adaptation

Science is supposed to be about testing and refuting hypotheses, and theorists are supposed to be scientists, so from where do theorists get their hypotheses? Often we just find them lying around in the literature, waiting to be formalized and quantified so they can be put to the test; often we discover them through collaboration. Rarely, I think, do theorists in biology dream up useful hypotheses from our imaginations; contact with nature, either through our own efforts or mediated by experimental biologists, is pretty much required.

This current project–the simulated evolution of entirely fictitious gene networks–is partly an attempt to cut out the middleman and give me access to the raw patterns of adaptation in nature. The idea that evolution in artificial systems can give us real insight into biology has a surprisingly long and checkered history, which I won’t attempt to summarize–suffice it to say, you can disagree with the whole enterprise and be in good company. But let’s suspend the philosophical argument, and resolve to judge evolutionary simulation by the fruits that it bears, bringing us back to the question from the last post–what determines the rate of adaptation of a gene network? I already have some guesses, but since this project is about generating new insights, I want to let the data tell me what to look at. And that means finding good ways to visualize the data.

[Figure: ecoli_3_plot]

Here’s an example of a visualization designed to help me link genetic changes to fitness increases (you may have to click it open to see everything going on). I chose the most immediately interesting of our ten populations, namely, the one with the greatest number of apparent substitutions, and also the highest total fitness. And wow, is it crazy–but let me explain what you’re looking at. The bottom row shows mean fitness over time; all the rows above it plot allele frequencies at all the relevant loci. Each locus is labeled depending on its type, with “R” standing for regulatory and “P” for phenotypic–more on that later. Each significant allele at a locus gets assigned a color, allowing us to easily see sequential substitutions. The plot also shows the frequency of the locus itself as the area of white vs grey.

So let’s try to digest this a bit. Our population starts with an organism with six genes: R1-R3 and P1-P3. Remember that this is an asexual haploid, so all these genes are fully linked. A deletion of R3 arises pretty quickly and sweeps to fixation, evidently conferring a fitness advantage (compare row 3 to the change in the bottom plot; also, note the rapid sigmoidal shape of the change in the locus frequency). P3 is lost even faster. At about the same time, an allele at P1 rises to near-fixation, then crashes; did this allele depend on R3 for its beneficial effect? More mutations fix at P1, R1, and R2, then some more action: R4 is born by duplication, seemingly driving a substitution at R1 in the process. R5 and R6 follow in quick succession; comparing their ascents with the fitness curve, we can infer that each conferred an advantage on the order of a percent or so. Then, more substitution, including some odd action and potentially interacting events at R1 and R2, and a very slow change at R4 that, maybe, sets off some further craziness at R2. Note how little fitness is increasing in this latter portion, and the indications that we’re not done: is R5 on its way out? Has R2 finally settled down?
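One rough way to put a number on "an advantage on the order of a percent or so" is to read the selection coefficient off the slope of a sweep. The sketch below is an illustration, not the R code behind the plot. It assumes a haploid asexual population in which the sweeping type has a constant relative fitness of 1 + s against everything else, so that the log-odds of its frequency rise by ln(1 + s) each generation. The function names and the test trajectory are hypothetical.

```python
import math


def estimate_selection_coefficient(freqs):
    """Estimate s from per-generation frequencies of a sweeping allele.

    Assumes constant relative fitness 1 + s, so logit(p) is linear in time
    with slope ln(1 + s). Fits that slope by ordinary least squares.
    """
    # Frequencies of exactly 0 or 1 have no finite logit, so skip them.
    points = [(t, math.log(p / (1.0 - p)))
              for t, p in enumerate(freqs) if 0.0 < p < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two frequencies strictly between 0 and 1")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    return math.exp(slope) - 1.0


def deterministic_sweep(p0, s, generations):
    """Idealized sweep with no drift: p' = p(1 + s) / (1 + s p)."""
    freqs = [p0]
    for _ in range(generations):
        p = freqs[-1]
        freqs.append(p * (1.0 + s) / (1.0 + s * p))
    return freqs


if __name__ == "__main__":
    trajectory = deterministic_sweep(p0=1e-4, s=0.01, generations=2000)
    print(estimate_selection_coefficient(trajectory))
```

Because every gene here is fully linked, a slope measured this way reflects the advantage of the whole genome carrying the allele relative to its competitors at that moment, so it is best treated as a rough guide rather than the effect of the allele in isolation.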

Just in case you’re not totally confounded, let me add that these are only the significant changes–defined as those loci, and those alleles, that are present in at least 10,000 copies over the whole course of the experiment. Beneath the comparatively calm surface of this visualization are thousands of unsuccessful mutations, most of which went extinct only a few generations after they were born. We can ignore those unsuccessful variants if we only want to know what happened–but can we if we want to know why it happened?
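The significance cutoff can be sketched as a simple filter. The version below assumes that "10,000 copies over the whole course of the experiment" means the copy counts summed across all recorded generations, which is one possible reading; the data layout (a dictionary from allele name to a list of per-generation counts) and the allele names are hypothetical.

```python
def significant_alleles(counts_by_allele, min_total_copies=10_000):
    """Keep alleles whose copy counts, summed over all recorded generations,
    reach the threshold. Everything else is treated as an unsuccessful mutation.
    """
    return {allele: counts
            for allele, counts in counts_by_allele.items()
            if sum(counts) >= min_total_copies}


if __name__ == "__main__":
    # Hypothetical per-generation copy counts in a population of 10,000.
    history = {
        "R3_deletion": [1, 40, 900, 6000, 9900, 10000],  # sweeps to fixation
        "P1_variant": [1, 3, 2, 0, 0, 0],                # lost within a few generations
    }
    print(sorted(significant_alleles(history)))
```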

Two points: first, we already see evidence of interaction between alleles at different loci, or epistasis. While we’re stuck right now with inferring epistasis from coincidences between changes, we can dig a lot deeper later on. Second, there’s a big chunk of R code underlying this visualization: it’s complicated, takes around a minute to run for a single population, and could well still have bugs. Although everything about these populations is represented in the computer and therefore fully knowable, we can still only view what’s happening through the lenses of a tool.

So, what does this all add up to? There’s a lot going on here, but what catches my eye on this day is the role of deletion and duplication. What excites me is that these changes in gene number are clearly driven by the immediate benefit of the change, but that these additions and subtractions might shape the future evolvability of the network. Does adding genes by duplication improve the chances of finding new adaptations later on? Do a few early, largely random events determine the eventual success of an evolving network? Next week, I’ll start looking at statistical approaches to these questions.


4 thoughts on “Simulating Evolution II — First Look at the Alleles Underlying Adaptation

  1. Kim

    Jeremy, I had a few questions about this as well as your previous post.

    Just out of curiosity, you mentioned in the last post the idea of evolution by adaptation in computer programs, and I wonder if you could elaborate on example programs that evolve as they run, or what exactly you mean by this? Being new to programming, I’d not thought about it before, but I didn’t think it was possible for a program to do anything outside of what it is hard-coded to do?

    Secondly, and more to the science: are these all cases where an existing population suddenly has the environment change around it rather than colonizing a new environment (and therefore having fewer individuals)? And have you started to look at whether the results differ depending on how different the new environment is from the population’s original one? If the environment is much different and selection much stronger, my intuition is that the gene network would have to adapt much quicker, or the population isn’t going to make it.

    I’m also just curious what your population sizes are and if these are changing in any way relative to what you’re seeing happen in the gene networks?

    1. jdraghi

      Hi Kim, thanks for the questions!
      You’re right that programs only do what we tell them to do (at best!), so it might be better to think about an evolving algorithm inside a program. The algorithm does something like sort a list of numbers; the program surrounding it tests different versions of the algorithm, breeds together the ones that work well, and eventually stops when some condition is met for having evolved a good algorithm.
      We’re using soft selection, where we always maintain a population of 10,000 adults no matter how badly they’re doing in the environment. So, I agree that this might look a lot different if these were colonizers in some ecological scenario where the population had to adapt quickly to even survive. Good idea for future work.

  2. Adrian Chira

    Interesting project. I have interest in testing evolution through computer programs myself. Here are my questions:

    1) Regarding what you said: “The idea that evolution in artificial systems can give us real insight into biology has a surprisingly long and checkered history, which I won’t attempt to summarize” – if you could still provide a few references that would be great.

    2) How fit is the initial algorithm you are starting from? Does it just arrange numbers randomly or is there any sorting involved?

    3) What are the criteria for testing the different versions of the algorithm and determining which ones have a fitness advantage? Do you just compare how well sorted the resulting list of numbers is? Do you also consider how fast the algorithms work (a secondary criterion when two algorithms achieve the same level of sorting)?

    4) How applicable do you think your findings are to cases where the task to be achieved is more complex? For example, regarding Avida, “when the researchers took away rewards for simpler operations, the organisms never evolved an equals program” (http://discovermagazine.com/2005/feb/cover), meaning that it failed to scale to more complex scenarios unless they are made up of smaller steps each with an adaptive advantage.

    Thanks,
    Adrian

    1. jdraghi

      Adrian, appreciate your interest. I’d be curious to hear about your work. Some brief replies:

      1) Kenneth De Jong’s Evolutionary Computation has a brief historical overview of work dating back to the 60s. I really enjoyed Chris Adami’s text on Artificial Life, though I don’t have it in front of me to see how much historical overview there is. Then, there are things like Conway’s game of life and cellular automata in general, culminating (if that’s the word) in Wolfram’s book. I would amend the word “checkered”; I really wanted to express the point that, to me, there have been a lot of smart and curious people who have poked around with exploring biological questions in computers, and their efforts haven’t really been integrated into mainstream academic biology.

      2&3) I think I led you astray as to the fitness scheme here: it’s not a sorting algorithm at all, but a simulated gene network that takes a signal (here, a very simple one) and produces a phenotype (a trait, expressed as a numerical vector). The match between the phenotype and a specified optimum determines the fitness. I try to regulate the starting fitness by choosing genotypes randomly (of course, there are some details there) and rejecting those genotypes outside a narrow range of initial fitnesses, centered around 0.1.

      4) Great question, and definitely a big focus of the ongoing project–stay tuned!
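The fitness scheme and the soft selection described in the replies above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the simulation used in the post: the "network" is reduced to one weight per trait passed through tanh, fitness is taken to be exp(-squared distance to the optimum), mutations are small Gaussian changes to the weights, and gene duplication and deletion are left out entirely. All function names and parameter values are hypothetical; only the starting fitness near 0.1 and the constant number of adults come from the replies.

```python
import math
import random


def phenotype(genotype, signal):
    """Toy stand-in for the gene network: each trait is tanh(weight * signal)."""
    return [math.tanh(w * signal) for w in genotype]


def fitness(genotype, signal, optimum):
    """Assumed fitness in (0, 1]: decays with squared distance to the optimum."""
    pheno = phenotype(genotype, signal)
    dist2 = sum((p - o) ** 2 for p, o in zip(pheno, optimum))
    return math.exp(-dist2)


def sample_founder(rng, signal, optimum, target=0.1, tolerance=0.02,
                   max_tries=100_000):
    """Rejection sampling: draw random genotypes until one falls in a
    narrow band of fitness around the target."""
    for _ in range(max_tries):
        genotype = [rng.gauss(0.0, 1.0) for _ in optimum]
        if abs(fitness(genotype, signal, optimum) - target) <= tolerance:
            return genotype
    raise RuntimeError("no genotype found in the target fitness range")


def next_generation(rng, population, signal, optimum, mutation_sd=0.05):
    """Soft selection: parents are drawn in proportion to fitness, and the
    number of adults stays the same however poorly they are doing."""
    weights = [fitness(g, signal, optimum) for g in population]
    parents = rng.choices(population, weights=weights, k=len(population))
    # Every offspring gets a small Gaussian change to each weight.
    return [[w + rng.gauss(0.0, mutation_sd) for w in parent]
            for parent in parents]


if __name__ == "__main__":
    rng = random.Random(0)
    optimum = [0.5, -0.5, 0.8]   # hypothetical target phenotype
    signal = 1.0                 # hypothetical, very simple input signal
    founder = sample_founder(rng, signal, optimum)
    population = [founder] * 1000  # the post uses 10,000 adults
    for _ in range(50):
        population = next_generation(rng, population, signal, optimum)
    mean_fitness = sum(fitness(g, signal, optimum) for g in population) / len(population)
    print(len(population), mean_fitness)
```

Under hard selection the population size would instead depend on absolute fitness, which is the colonization scenario raised in the first comment.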
