Molecular Clocks, Part 1: Hypothesis

This is meant as an introduction to the molecular clock hypothesis. As a disclaimer, I have to say that I am a zoologist and palaeontologist, not a specialist on genomes. Anything stated in this series is written from the perspective of organismal biology.

The molecular clock hypothesis was first proposed by Zuckerkandl and Pauling in the 1960s. It came as a by-product of their research into protein structure. What they noticed was that the phylogenetic distance between organisms was directly reflected in their protein sequences: the more distantly related two organisms are, the more their versions of the same protein differ.

Genetic Distance. Kumar, 2005.

But that is nothing more than molecular phylogenetics. Where does the clock part come in?

They hypothesised that for any given macromolecule the rate of evolution is approximately constant in all evolutionary lineages. Integral to this idea is the neutral theory of molecular evolution, proposed later by Kimura, which states that most of the mutations that persist and spread through a population are neither advantageous nor deleterious, but effectively neutral, and get fixed by genetic drift rather than by selection. Although controversial at first, it is now widely accepted.

Theories of Molecular Evolution. Bromham & Penny, 2003.

The reason why this is important is simple: when deciding what genes to use, we need to find those where only neutral mutations are kept, so the end result is not affected by natural selection.

But we’re getting ahead of ourselves. The hypothesis still needs to be tested. Is the rate of molecular evolution, i.e. the rate at which neutral mutations accumulate, approximately constant?

Substitution Rate. Kumar, 2005.

As you can see, a best-fit line can easily be drawn. There is a positive correlation between genetic difference and time. But there are also some outliers, and they are not anomalies. They are the biggest problem with the molecular clock and we need to identify where they come from.
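To make the "clock" idea concrete, here is a minimal sketch of how such a best-fit line gets used: fit a slope to pairs with known divergence times, then read off the time for a pair with no fossil date. The calibration numbers below are made up for illustration and are not taken from Kumar's figure; the function names are mine as well.

```python
def fit_clock_rate(times_myr, distances):
    """Least-squares slope of a line through the origin: distance = rate * time."""
    numerator = sum(t * d for t, d in zip(times_myr, distances))
    denominator = sum(t * t for t in times_myr)
    return numerator / denominator


def estimate_divergence_time(distance, rate):
    """Invert the clock: time = distance / rate."""
    return distance / rate


# Hypothetical calibration points (NOT real data): divergence time in Myr
# taken from the fossil record, and the pairwise genetic distance in
# substitutions per site for the same pair of taxa.
calibration_times = [10.0, 50.0, 100.0, 200.0]
calibration_distances = [0.02, 0.11, 0.19, 0.41]

# The pairwise rate counts changes along both lineages since the split,
# so the per-lineage rate would be half of this value.
rate = fit_clock_rate(calibration_times, calibration_distances)

# A pair of taxa with a measured distance but no fossil date.
undated_distance = 0.30
estimated_time = estimate_divergence_time(undated_distance, rate)

print(f"pairwise rate: {rate:.5f} substitutions/site/Myr")
print(f"estimated divergence: {estimated_time:.0f} Myr ago")
```

Note that the whole estimate rests on the slope being the same for the undated pair as for the calibration pairs. That is exactly the assumption under test here.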

Differences Between Proteins.

The diagram above shows the difference in substitution rates between different proteins. It is clear that the rate depends on the protein’s function. Histones are the proteins that DNA is wrapped around to package it in the nucleus, and they are therefore absolutely critical. Their function is directly linked to their structure (it all works through complex chemical and physical interactions between the DNA molecule and the histone protein), and any modification to that structure will almost certainly be deleterious. On the other hand, hemoglobin can tolerate more mutations, as it mainly needs to keep its affinity for oxygen. The substitution rates reflect these differences perfectly.

But they also make our job more difficult. Using histones to date a recent divergence is useless, because the differences are negligible. Using fibrinopeptides to date the origin of the animals is just as pointless, because they change so quickly that the signal is overwritten many times over. We need to find the genes and proteins that are suited to our question, as there is no global clock for proteins.
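A rough sketch of why the protein has to match the timescale. It uses a simple Poisson model for the expected fraction of differing sites, p = 1 - exp(-2rt). The three rates are assumed orders of magnitude chosen for illustration, not values read off the diagram, and the 100-site protein length is arbitrary.

```python
import math

# Illustrative rates in amino-acid substitutions per site per year.
# Assumed orders of magnitude for this sketch only.
RATES = {
    "histone": 1e-11,
    "hemoglobin": 1e-9,
    "fibrinopeptide": 1e-8,
}


def expected_difference(rate_per_year, divergence_years):
    """Expected fraction of differing sites between two lineages.

    Simple Poisson model: p = 1 - exp(-2 * rate * time). The factor 2
    counts substitutions along both lineages since their split.
    """
    return 1.0 - math.exp(-2.0 * rate_per_year * divergence_years)


def expected_table(divergence_years, sequence_length=100):
    """Expected number of differing sites per protein for one divergence time."""
    return {
        name: expected_difference(rate, divergence_years) * sequence_length
        for name, rate in RATES.items()
    }


recent = expected_table(5e6)   # a split 5 million years ago
ancient = expected_table(6e8)  # a split 600 million years ago

for name in RATES:
    print(f"{name:15s} recent: {recent[name]:7.2f}   ancient: {ancient[name]:7.2f}")
```

Under these assumptions the slow protein shows far less than one expected difference per 100 sites for the recent split, so there is nothing to measure. The fast one is essentially saturated for the ancient split, so every further substitution lands on a site that has already changed.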

But what about the second part of the hypothesis, about the rate being constant in all evolutionary lineages? This is, for lack of a better word, bullshit.

Relative Rate Differences. Kumar, 2005.

The above diagram shows the substitution rate difference between mammalian taxa. There is absolutely no pattern to be seen, and it’s the same story with any other lineage: there is no global clock for organisms.
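One common way to check for this kind of rate difference is a relative rate comparison: take two sister lineages A and B plus an outgroup, and ask how much change each has accumulated since they split. Both have had exactly the same amount of time, so under a strict clock the two branch lengths should be equal. The sketch below uses the standard three-point formula for branch lengths; the distances are hypothetical and I am not claiming this is how the diagram was produced.

```python
def branch_lengths(dist_ab, dist_a_out, dist_b_out):
    """Split three pairwise distances into the branches leading to A and to B.

    Assumes the distances are additive on the tree ((A, B), outgroup).
    Returns the amount of change on A's branch and on B's branch since
    their common ancestor.
    """
    branch_a = (dist_ab + dist_a_out - dist_b_out) / 2.0
    branch_b = (dist_ab + dist_b_out - dist_a_out) / 2.0
    return branch_a, branch_b


def rate_ratio(dist_ab, dist_a_out, dist_b_out):
    """How many times faster A has evolved than B over the same time span."""
    branch_a, branch_b = branch_lengths(dist_ab, dist_a_out, dist_b_out)
    return branch_a / branch_b


# Hypothetical distances in substitutions per site (NOT real data).
clocklike = rate_ratio(dist_ab=0.30, dist_a_out=0.45, dist_b_out=0.45)
skewed = rate_ratio(dist_ab=0.30, dist_a_out=0.50, dist_b_out=0.40)

print(f"clock-like pair, rate ratio A/B: {clocklike:.2f}")
print(f"skewed pair, rate ratio A/B:     {skewed:.2f}")
```

A ratio close to 1 is what the hypothesis predicts. Anything far from 1 means one lineage's clock has been ticking faster than the other's, and no fossil calibration is needed to see it.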

There are four main reasons for this. The first two have to do with DNA repair. DNA gets repaired whenever cell division takes place, so the substitution rate depends on generation time. But repair only works on physical damage to the DNA molecule; once a mutation is in place, it cannot be detected and does not get repaired. The more damage you have, the more mutations you get. One source of damage, and therefore of mutation, is metabolism, which releases free radicals. They react with anything, including DNA, and are a big source of mutations. So the substitution rate also varies with the metabolic rate. Finally, we have the size of the population: in small populations, genetic drift plays a much larger role than in large populations, and can lead to deleterious mutations getting fixed instead of getting thrown out (nearly neutral theory, see second diagram).
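The population size effect can be put into numbers with Kimura's diffusion approximation for the fixation probability of a single new mutation. The sketch below assumes a diploid population whose effective size equals its census size N, and the selection coefficient of -0.001 is an arbitrary illustrative value for a slightly deleterious mutation.

```python
import math


def fixation_probability(population_size, selection_coefficient):
    """Probability that a single new mutation eventually reaches fixation.

    Kimura's diffusion approximation, starting from frequency 1 / (2N):
        P = (1 - exp(-2s)) / (1 - exp(-4Ns))
    A neutral mutation (s = 0) fixes with probability 1 / (2N).
    """
    n, s = population_size, selection_coefficient
    if s == 0:
        return 1.0 / (2 * n)
    exponent = -4.0 * n * s
    if exponent > 700:
        # exp() would overflow a float; the probability is effectively zero.
        return 0.0
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


s_deleterious = -0.001  # slightly deleterious, illustrative value

for n in (100, 10_000, 100_000):
    neutral = fixation_probability(n, 0)
    deleterious = fixation_probability(n, s_deleterious)
    print(f"N = {n:>7}: deleterious fixes at {deleterious / neutral:.3g} "
          f"times the neutral rate")
```

With these assumptions, the slightly deleterious mutation behaves almost like a neutral one in the smallest population, while in the larger ones selection removes it with near certainty. The same mutation therefore ticks the clock in one lineage and not in another.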

These factors range in scale from the biochemical to the population level. They are also completely unpredictable, in part due to their stochastic nature, but also due to lack of data (we don’t know how large dinosaur populations were, for example).

This was a purely theoretical look at the molecular clock. It does seem like a very shaky hypothesis, but for some reason it has persisted. The reason for that is practicality, as we shall see in the next post. But the very shaky ground on which it stands should not be forgotten. In fact, it should be called out, as some results from this method are scandalously bad, but somehow get accepted (mainly by others within the same field).

