Molecular Data in Phylogenetics

Genomes vary as wildly as animal forms. They are fluid: genes are lost, bases are rearranged, entire sequences are duplicated. Within all this variation, however, there are commonalities that link taxa together. These are conserved genes: ones so important that a mutation will likely destroy the organism's ability to function. These change relatively little over evolutionary time.

Anatomy is similarly variable: think of the difference between a honey bee and a human. Traditionally, we have looked for morphological similarities among taxa in order to group them together and uncover their ancestry. The key is to look for plesiomorphies and apomorphies.

Plesiomorphy: ancestral state. E.g.: all vertebrates have backbones; therefore, the vertebrate ancestor had a backbone.

Apomorphy: derived state. E.g.: all tetrapods have four limbs; having four limbs is a derived state (apomorphy) within vertebrates, and the ancestral state (plesiomorphy) for tetrapods.

This is done for a large number of traits, and a tree is then computed based on the parsimony principle (the simpler, the better!): the preferred tree is the one that requires the fewest character changes. This is how fossils are classified, and it is the traditional way of grouping organisms.
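To make the parsimony idea concrete, here is a minimal sketch that counts the fewest changes a single trait needs on a given tree, using Fitch's small-parsimony method. The taxa, the trait coding (1 = four limbs, 0 = no four limbs) and the function name `fitch_score` are my own illustrative assumptions, not taken from any particular study, and a real analysis would score many traits and search over many candidate trees.

```python
def fitch_score(tree, states):
    """Return (possible_states, changes) for one trait on a rooted binary tree.

    tree   : a leaf name (str) or a tuple (left, right) of subtrees.
    states : dict mapping leaf name -> observed state of the trait.
    """
    if isinstance(tree, str):
        # A leaf: its state is observed, no change needed yet.
        return {states[tree]}, 0

    left_set, left_changes = fitch_score(tree[0], states)
    right_set, right_changes = fitch_score(tree[1], states)

    shared = left_set & right_set
    if shared:
        # Both subtrees can agree on a state: no extra change.
        return shared, left_changes + right_changes
    # No state in common: one change must have happened here.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical trait coding: 1 = has four limbs, 0 = does not.
limbs = {"shark": 0, "trout": 0, "frog": 1, "human": 1}

# Two candidate trees (illustrative only).
tree_a = ("shark", ("trout", ("frog", "human")))
tree_b = (("shark", "frog"), ("trout", "human"))

score_a = fitch_score(tree_a, limbs)[1]
score_b = fitch_score(tree_b, limbs)[1]
print("tree_a changes:", score_a)
print("tree_b changes:", score_b)
```

The tree that groups frog and human together needs a single origin of four limbs, while the alternative needs two changes, so parsimony prefers the first.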

Molecular data works similarly. Sequences are downloaded and compared; the more similar the sequences, the more related the organisms. With the rise of gene sequencing technology, there is a tremendous amount of data coming in: gene sequences, protein sequences, even entire genomes are available for free in large online databases. The question remains though: can we use this data to reconstruct the tree of life?
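As a rough illustration of "the more similar the sequences, the more related the organisms", the sketch below computes the proportion of differing sites between already-aligned sequences (often called the p-distance). The three sequences and taxon names are made up for the example, and `p_distance` is a hypothetical helper; real studies first align the sequences and usually correct the raw distance with a substitution model.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped sites
        compared += 1
        if a != b:
            differing += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Made-up aligned sequences, for illustration only.
aligned = {
    "taxon_A": "ATGCCGTA",
    "taxon_B": "ATGCCGTT",
    "taxon_C": "ATGACGAT",
}

# Pairwise distances: smaller means more similar.
distances = {
    (x, y): p_distance(aligned[x], aligned[y])
    for x, y in combinations(aligned, 2)
}
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)
```

On this toy data, taxon_A and taxon_B come out as the closest pair, which a distance-based method would group together first.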

The answer is yes, theoretically. All organisms carry with them three records of their ancestry: their genome, their anatomy and their development. The genome is the source of information for development, which leads to the morphology of the organism. In that view, development is the crossroads at which genetics and morphology meet.

But while master developmental genes are generally conserved, their products are not. Development is a cascade, with components interacting and influencing each other. The pathways leading from the gene to the phenotype have, over evolutionary time, been altered. We cannot infer phylogenetic relationships between organisms based on just one gene, no matter how fundamental it is. The same gene in one organism may not have the same function in another; for example, humans and sponges share the same (in us, very important) gene for collagen production, but sponges cannot produce collagen.

The standard practice among molecular phylogeneticists nowadays is to use as many relevant sequences as possible, pushing for both quality and quantity. This is not really a solution to the problem, in my opinion, but that is not a suitable discussion for here. The reason why molecular data is so often used to reconstruct phylogenies is practicality: it is simply much less labor-intensive than analysing specimens for morphological characters.

That does not make it correct, though. Although the majority of molecular and morphological trees agree with each other, some (in my opinion) embarrassing results can come out of molecular phylogenetics. For example, one study found that sponges were not the most basal animals, a view which cannot possibly be supported by looking at sponges: they really are proto-metazoans. Another, more prominent example is the finding that myriapods are closely related to the chelicerates. The only character they share is that some of them use venom to incapacitate their prey. Zoologically, this grouping makes no sense at all. But it still gets supported by molecular analyses (and loses that support when those analyses include morphological traits).

To summarise, there is an enormous amount of raw data coming in from the genomics side. It is tempting to simply say that the genotype is directly linked to the phenotype, and that simply looking at representative genetic sequences allows us to infer relationships between organisms. But it isn’t that simple: the genome is very fluid, often changing in ways that we do not understand. Until we understand how changes in representative genes affect the organism, we cannot use them to reconstruct morphology. And the final point is that morphology is the only hard, physical evidence we have of how organisms evolved. By looking at the genetic sequences of a bird and a crocodile, there is no way to imagine a dinosaur. This is why morphology (i.e. fossils) must take priority in any attempt to reconstruct the Tree of Life.
