Science is supposed to be about testing and refuting hypotheses, and theorists are supposed to be scientists, so where do theorists get their hypotheses? Often we just find them lying around in the literature, waiting to be formalized and quantified so they can be put to the test; often we discover them through collaboration. Rarely, I think, do theorists in biology dream up useful hypotheses purely from imagination; contact with nature, either through our own efforts or mediated by experimental biologists, is pretty much required.
This project, the simulated evolution of entirely fictitious gene networks, is partly an attempt to cut out the middleman and give me access to the raw patterns of adaptation in nature. The idea that evolution in artificial systems can give us real insight into biology has a surprisingly long and checkered history, which I won’t attempt to summarize; suffice it to say, you can disagree with the whole enterprise and be in good company. But let’s set the philosophical argument aside and judge evolutionary simulation by the fruit it bears, which brings us back to the question from the last post: what determines the rate of adaptation of a gene network? I already have some guesses, but since this project is about generating new insights, I want to let the data tell me what to look at. And that means finding good ways to visualize the data.
Here’s an example of a visualization designed to help me link genetic changes to fitness increases (you may have to click it open to see everything going on). I chose the most immediately interesting of our ten populations, namely the one with the greatest number of apparent substitutions and also the highest total fitness. And wow, is it crazy. But let me explain what you’re looking at. The bottom row shows mean fitness over time; the rows above it plot allele frequencies at each relevant locus. Each locus is labeled according to its type, with “R” standing for regulatory and “P” for phenotypic (more on that later). Each significant allele at a locus gets its own color, so sequential substitutions are easy to see. The plot also shows the frequency of the locus itself, that is, the fraction of the population carrying it, as the area of white versus grey.
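To make the plot’s layers concrete, here is a minimal Python sketch (not the R code behind the figure) of the two quantities it shows for each locus in a single generation: how often the locus itself is present, and how often each allele at it appears. It assumes a hypothetical representation in which each genome is a dict from locus name to allele id, with deleted loci simply absent. Allele frequencies are taken over the whole population, so the alleles at a locus sum to that locus’s frequency, which is one way to let the colored allele areas nest inside the locus area.

```python
from collections import Counter, defaultdict


def locus_and_allele_frequencies(population):
    """Return (locus_freq, allele_freq) for one generation.

    population: list of genomes; each genome is a dict mapping a locus name
    (e.g. "R1", "P3") to an allele id. A locus missing from a genome has been
    deleted in that lineage. (Hypothetical representation, for illustration.)
    """
    n = len(population)
    locus_counts = Counter()
    allele_counts = defaultdict(Counter)
    for genome in population:
        for locus, allele in genome.items():
            locus_counts[locus] += 1
            allele_counts[locus][allele] += 1
    # Fraction of individuals that carry each locus.
    locus_freq = {locus: c / n for locus, c in locus_counts.items()}
    # Allele frequencies relative to the whole population, so the alleles at
    # a locus sum to that locus's frequency.
    allele_freq = {
        locus: {allele: c / n for allele, c in counts.items()}
        for locus, counts in allele_counts.items()
    }
    return locus_freq, allele_freq


population = [
    {"R1": "a", "R3": "x"},
    {"R1": "b"},
    {"R1": "a"},
    {"R1": "a", "R3": "y"},
]
locus_freq, allele_freq = locus_and_allele_frequencies(population)
print(locus_freq)         # {'R1': 1.0, 'R3': 0.5}
print(allele_freq["R1"])  # {'a': 0.75, 'b': 0.25}
```

Calling this once per recorded generation and stacking the results gives the trajectories that each row of the figure draws.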
So let’s try to digest this a bit. Our population starts with an organism with six genes: R1-R3 and P1-P3. Remember that this is an asexual haploid, so all these genes are fully linked. A deletion of R3 arises pretty quickly and sweeps to fixation, evidently conferring a fitness advantage (compare row 3 with the change in the bottom plot; also note the rapid, sigmoidal shape of the change in locus frequency). P3 is lost even faster. At about the same time, an allele at P1 rises to near-fixation, then crashes; did this allele depend on R3 for its beneficial effect? More mutations fix at P1, R1, and R2, and then there’s more action: R4 is born by duplication, seemingly driving a substitution at R1 in the process. R5 and R6 follow in quick succession; comparing their ascents with the fitness curve, we can infer that each conferred an advantage on the order of a percent or so. Then come more substitutions, including some odd and possibly interacting events at R1 and R2, and a very slow change at R4 that may set off some further craziness at R2. Note how little fitness increases in this latter portion, and the signs that we’re not done: is R5 on its way out? Has R2 finally settled down?
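The “percent or so” estimate can be made a little more explicit. For a haploid allele with a constant selective advantage s (relative fitness 1 + s) and deterministic dynamics, the frequency update is p' = p(1 + s) / (1 + ps), which means the log-odds ln[p / (1 - p)] grows by exactly ln(1 + s) per generation. That is why a clean sweep looks sigmoidal, and why the slope of the log-odds gives a rough estimate of s. The Python sketch below assumes all of that, plus one generation per time step; the function names are hypothetical. Real sweeps in an asexual population will deviate wherever competing or hitchhiking mutations on linked backgrounds are in play, so treat this as a back-of-the-envelope check, not the analysis behind the figure.

```python
import math


def sweep_trajectory(p0, s, generations):
    """Deterministic haploid sweep: p' = p(1 + s) / (1 + p s) each generation."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs


def estimate_s(freqs, lo=0.05, hi=0.95):
    """Estimate s from the least-squares slope of logit(p) against generation.

    Only frequencies strictly between lo and hi are used (the steep middle
    of the sigmoid), avoiding the extremes, where the log-odds of real data
    is dominated by noise.
    """
    pts = [(t, math.log(p / (1 - p))) for t, p in enumerate(freqs) if lo < p < hi]
    if len(pts) < 2:
        raise ValueError("need at least two frequencies between lo and hi")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    var = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = cov / var  # per-generation increase in log-odds = ln(1 + s)
    return math.exp(slope) - 1


traj = sweep_trajectory(p0=1e-4, s=0.01, generations=3000)
print(round(estimate_s(traj), 4))  # 0.01
```

On a noiseless trajectory the fit recovers s exactly; on simulated data, a sweep that bends away from a straight line in log-odds is itself a hint that something else (another sweep, or an epistatic partner) is going on.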
Just in case you’re not totally confounded, let me add that these are only the significant changes, defined as those loci, and those alleles, that are present in at least 10,000 copies over the whole course of the experiment. Beneath the comparatively calm surface of this visualization are thousands of unsuccessful mutations, most of which went extinct only a few generations after they arose. We can ignore those unsuccessful variants if we only want to know what happened, but can we ignore them if we want to know why it happened?
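The significance cut-off is simple to state in code. This Python sketch assumes a hypothetical census format (one dict per generation, mapping an allele or locus id to its number of copies in that generation) and reads “present in at least 10,000 copies over the whole course of the experiment” as the count summed across all generations. If the threshold is instead meant per generation (a peak count, say), the sum would become a max.

```python
from collections import Counter


def significant(census, threshold=10_000):
    """Return the ids whose copy number, summed over every generation, reaches threshold.

    census: list with one dict per generation, mapping an allele (or locus) id
    to its number of copies in that generation. Hypothetical format.
    """
    total = Counter()
    for generation in census:
        total.update(generation)  # Counter.update adds counts rather than replacing them
    return {ident for ident, copies in total.items() if copies >= threshold}


census = [
    {"P1:a": 6_000, "P1:b": 12, "R3:x": 9_000},
    {"P1:a": 5_000, "P1:c": 3, "R3:x": 999},
]
print(sorted(significant(census)))  # ['P1:a']
```

A mutation that dies out within a few generations never gets anywhere near the threshold, which is why the plot looks comparatively calm.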
Two points: first, we already see evidence of interaction between alleles at different loci, or epistasis. For now we’re stuck inferring epistasis from coincidences between changes, but we can dig a lot deeper later on. Second, there’s a big chunk of R code underlying this visualization: it’s complicated, takes around a minute to run for a single population, and could well still have bugs. Although everything about these populations is represented in the computer and therefore fully knowable, we can still only view what’s happening through the lens of a tool.
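For reference, a standard way to put a number on two-locus epistasis in a haploid, once genotype fitnesses can be measured directly rather than read off coincident sweeps, is the log-fitness deviation below. Here a and b are ancestral alleles at two loci, A and B are derived ones (a deletion can play the role of a derived allele), and w denotes the fitness of each genotype. This is a textbook measure offered as a sketch, not necessarily the one this project will adopt.

$$
\varepsilon = \ln w_{AB} - \ln w_{Ab} - \ln w_{aB} + \ln w_{ab}
$$

A value of zero means the two mutations combine multiplicatively; a nonzero value means each one’s effect depends on the other’s presence. The sharpest case is sign epistasis, where a mutation is beneficial on one background and deleterious on another (for example, $w_{Ab} > w_{ab}$ but $w_{AB} < w_{aB}$). The P1 allele that crashed after R3 was deleted would fit that pattern if, as speculated above, its benefit depended on R3.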
So, what does this all add up to? There’s a lot going on here, but what catches my eye today is the role of deletion and duplication. What excites me is that these changes in gene number are clearly driven by their immediate benefit, yet these additions and subtractions might also shape the future evolvability of the network. Does adding genes by duplication improve the chances of finding new adaptations later on? Do a few early, largely random events determine the eventual success of an evolving network? Next week, I’ll start looking at statistical approaches to these questions.