Reconstructing large regions of an ancestral mammalian genome in silico

 

Mathieu Blanchette

 

Most modern mammalian lineages are believed to have arisen from a series of rapid speciation events near the Cretaceous-Tertiary boundary. We show that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that, with currently available methods, we can expect to get 98% of the bases correct when reconstructing megabase-scale euchromatic regions of a eutherian ancestral genome from the genomes of approximately 20 optimally chosen modern mammals. Using actual genomic sequences from 19 extant mammals, we reconstruct 1.1 Mb of ancient genome sequence around the CFTR locus. Detailed examination suggests that the reconstruction is accurate and that it allows us to identify features in modern species, such as remnants of ancient transposon insertions, that were not identified by direct analysis. By tracing the predicted evolutionary history of the bases in the reconstructed region, we estimate the amount of DNA turnover due to insertion, deletion and substitution in the different placental mammalian lineages since the common eutherian ancestor; these estimates vary considerably between lineages. This talk will focus on the algorithmic issues of this reconstruction.
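
The intuition for why a rapid radiation favors reconstruction can be illustrated with a toy calculation. The sketch below is not the reconstruction method discussed in the talk. It assumes an idealized star phylogeny in which every lineage descends independently from the ancestor and changes a given base with the same probability, and it ignores alignment, insertions and deletions. The function name `majority_accuracy` and the value 0.3 for the per-lineage change probability are hypothetical and chosen only for illustration. The quantity computed is the probability that more than half of the lineages still carry the ancestral base, which is a conservative lower bound on how often a simple vote across species would recover it.

```python
from math import comb


def majority_accuracy(n_lineages: int, p_change: float) -> float:
    """Probability that more than half of n independent lineages
    still carry the ancestral base (toy star-phylogeny model)."""
    keep = 1.0 - p_change
    # Smallest number of unchanged lineages that forms a strict majority.
    threshold = n_lineages // 2 + 1
    # Upper tail of a Binomial(n_lineages, keep) distribution.
    return sum(
        comb(n_lineages, k) * keep**k * p_change ** (n_lineages - k)
        for k in range(threshold, n_lineages + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-lineage change probability, for illustration only.
    for n in (1, 5, 19):
        print(n, round(majority_accuracy(n, 0.3), 4))
```

Under these assumptions the bound rises as independent lineages are added, which matches the qualitative claim that many lineages diverging at nearly the same time each provide a separate view of the ancestral sequence.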