could maybe radically optimize simple neutral models by deferring genome generation #608

@bhaller

An idea I had on my walk today. When an offspring individual gets generated, right now SLiM spends a lot of time creating the Individual object and generating that individual's haplosomes from the haplosomes of the parents, with recombination and mutation. In a simple neutral model with no recombination() callbacks, no modifyChild() callbacks, no script asking for the mutations possessed by individuals, no changes in the recombination rate or mutation rate over time, etc., it seems like it would be possible to defer virtually all of that work. Just record, into a "deferred individuals table", the pedigree ID of the new individual and its parent(s) and whether it was generated by cloning, selfing, or crossing (basic info like that, columns in a data table). And then move on. Keep parents around if they are referenced by this table, so that you still have their genetic information if you need it later.

At some future point (perhaps the end of the simulation, or perhaps whenever the preconditions for this mode of operation are violated) you can generate the genetics that are needed, on demand. To do that, you start from the extant individuals; if one is deferred, go up to its parents; if one of them is deferred, go up to that individual's parents; and so on, until you reach individuals for whom you have genetics. Then generate the offspring that are needed, drawing recombination breakpoints, doing the recombination, generating the mutations, etc., just as you would have at the time. Continue doing that, unwinding your ascent up the pedigree, until you get down to the bottom again and have generated the genetics for the extant individual.
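To make the bookkeeping concrete, here is a rough Python sketch of the "deferred individuals table" and the on-demand walk up and down the pedigree. Everything in it is hypothetical and is not SLiM code: the names `DeferredPedigree`, `DeferredRecord`, `defer`, and `materialize` are made up for illustration, haplosomes are plain lists of founder labels, each gamete gets a single recombination breakpoint, and mutation and the pruning of unreferenced parents are left out. It also draws its random numbers at materialization time rather than at birth time, so it would be statistically equivalent to, not identical to, the non-deferred run.

```python
import random
from dataclasses import dataclass


@dataclass
class DeferredRecord:
    """One row of the hypothetical deferred individuals table."""
    parent1: int
    parent2: int   # equal to parent1 for cloning and selfing
    mode: str      # "cross", "self", or "clone"


class DeferredPedigree:
    def __init__(self, genome_length, seed=0):
        self.length = genome_length
        self.rng = random.Random(seed)
        self.genetics = {}   # pedigree ID -> (haplosome1, haplosome2)
        self.deferred = {}   # pedigree ID -> DeferredRecord

    def add_founder(self, ind_id, hap1, hap2):
        self.genetics[ind_id] = (list(hap1), list(hap2))

    def defer(self, ind_id, parent1, parent2=None, mode="cross"):
        """Record a birth without generating any genetics."""
        if parent2 is None:
            parent2 = parent1
        self.deferred[ind_id] = DeferredRecord(parent1, parent2, mode)

    def _gamete(self, parent_id):
        # Simplification: exactly one breakpoint, random initial strand.
        h1, h2 = self.genetics[parent_id]
        if self.rng.random() < 0.5:
            h1, h2 = h2, h1
        bp = self.rng.randrange(self.length + 1)
        return h1[:bp] + h2[bp:]

    def _generate(self, rec):
        if rec.mode == "clone":
            h1, h2 = self.genetics[rec.parent1]
            return (list(h1), list(h2))
        # "self" draws two gametes from the same parent; "cross" from two.
        return (self._gamete(rec.parent1), self._gamete(rec.parent2))

    def materialize(self, ind_id):
        """Walk up to ancestors that have genetics, then unwind downward."""
        stack = [ind_id]   # explicit stack: pedigrees can be very deep
        while stack:
            cur = stack[-1]
            if cur in self.genetics:
                stack.pop()
                continue
            rec = self.deferred[cur]
            missing = [p for p in (rec.parent1, rec.parent2)
                       if p not in self.genetics]
            if missing:
                stack.extend(missing)   # ascend first
                continue
            self.genetics[cur] = self._generate(rec)
            del self.deferred[cur]
            stack.pop()
        return self.genetics[ind_id]


ped = DeferredPedigree(genome_length=10, seed=42)
ped.add_founder(0, [0] * 10, [1] * 10)
ped.add_founder(1, [2] * 10, [3] * 10)
ped.defer(2, 0, 1, mode="cross")
ped.defer(3, 2, mode="clone")
ped.defer(4, 3, mode="self")
ped.defer(5, 0, 1, mode="cross")   # leaves no descendants
extant = ped.materialize(4)        # generates 2, 3, and 4, but never 5
```

In this toy run individual 5 stays in the deferred table and its genetics are never generated, which is where the savings would come from.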

For individuals that leave descendants, this is the same amount of work, or even a little bit more, since there's a bit of overhead for maintaining the "deferred individuals table" and doing the walk up and down the tree. The savings come from the idea that, given enough generations, most individuals leave no pedigree descendants; most lineages go extinct eventually. So you end up generating only a small fraction of the individuals you would have generated.

In a way this is similar to the concept of recapitation and neutral mutation overlay. But it would work even without tree-sequence recording; in fact, it could build up the relevant tree-sequence recording information in the same deferred fashion, I think, and so it could do a tree-seq burn-in without needing to deal with hybrid-simulation complications. It would have various advantages. It would produce a true SLiM pedigree (WF or nonWF) based upon the SLiM-generated patterns of mating rather than a coalescent-style pedigree (in terms of things like discrete time, polytomies, etc.). It could be influenced by SLiM scripted dynamics (monogamous mating, spatiality, etc.) as long as those dynamics didn't need to consult the genomes of the individuals. And SLiM could do it automatically, I think; it could detect that the conditions were met and defer offspring generation behind the scenes, and then whenever the conditions were violated later on (or when the end of the simulation was reached), it could transparently generate the genetic information needed to bring things back to a "normal" SLiM state. The whole thing could, I think, be totally invisible to the user.

This scheme depends on a vague idea I have that over sufficient time most individuals don't leave any pedigree descendants. So if a complete pedigree, starting from the original individuals and going forward in time, would contain N individuals, but only M of those individuals appear in the simplified pedigree where all branches of the pedigree that went extinct are removed, then my hope is that if this optimized regime lasted a large number of generations – 10,000, or 100,000, etc. – the fraction M/N would get somewhat small (assuming, say, a randomly-mating WF population), making this optimization worthwhile – maybe very worthwhile. My vague understanding is that this is true, but I am probably thinking of the related claim that most individuals don't leave any genetic descendants (even if they do still appear in the pedigree of ancestors of the extant individuals). Clearly M/N is smaller for genetic descendants than for pedigree descendants, but the genetics-based optimization is much harder to do. (I think I had a somewhat similar idea, months or years back, that @petrelharp shot down, in fact – using recorded tree-seq information to defer the SLiM generation of individuals, such that we wouldn't bother generating the ancestors at all if they left no genetic descendants. That turned out to be hard. :->) But if M/N is sufficiently small even for pedigree descendants, over sufficiently long time periods, then this idea remains viable. And it might be particularly viable if something about the script being run reduces Ne – spatiality, monogamy, social dominance hierarchies, etc. So it might be very useful in some contexts even if it isn't much of a win for a vanilla panmictic WF model.
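Whether M/N actually gets small could be checked empirically before writing any SLiM code. Below is a rough Python sketch that simulates only the pedigree of a randomly-mating WF population and measures M/N by tracing back from the final generation. The function name `surviving_fraction` and its parameters are made up for illustration; the model is a bare assumption (constant size, parents drawn uniformly at random, selfing allowed by chance, no sexes), and the `biparental=False` option is included only so that the uniparental (clonal) case can be compared against crossing.

```python
import random


def surviving_fraction(pop_size, generations, biparental=True, seed=1):
    """Estimate M/N for a toy WF pedigree.

    N = every individual ever created (founders included);
    M = those that are extant at the end or are pedigree ancestors
        of at least one extant individual.
    """
    rng = random.Random(seed)

    # parents[g][i] holds the parent indices (in generation g) of
    # individual i in generation g + 1.
    parents = []
    for _ in range(generations):
        gen = []
        for _ in range(pop_size):
            p1 = rng.randrange(pop_size)
            p2 = rng.randrange(pop_size) if biparental else p1
            gen.append((p1, p2))
        parents.append(gen)

    # Start from all extant individuals and walk backward in time,
    # collecting the set of ancestors needed in each earlier generation.
    needed = set(range(pop_size))
    m = len(needed)
    for g in range(generations - 1, -1, -1):
        previous = set()
        for i in needed:
            previous.update(parents[g][i])
        needed = previous
        m += len(needed)

    n = pop_size * (generations + 1)
    return m / n


ratio_cross = surviving_fraction(100, 200, biparental=True)
ratio_clone = surviving_fraction(100, 200, biparental=False)
print(f"M/N with crossing: {ratio_cross:.3f}")
print(f"M/N with cloning:  {ratio_clone:.3f}")
```

Swapping in a different parent-choice rule (monogamy, spatial neighborhoods, a dominance hierarchy) would show how much a given mating scheme changes the fraction.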

Seems like a good idea? Any thoughts, @philippmesser or @petrelharp or anybody else? Any roadblocks to doing this? I suppose this idea probably isn't new (since so few ideas are new); is anybody aware of a forward simulator that does this already? (If so, I want to go read about it! :->)
