I received an e-mail asking why building phylogenies is so hard, if it’s just looking for similarities between organisms. The reason is that building phylogenies is not looking for similarities between organisms.
A phylogenetic tree is a hypothesis of the relationships between the organisms represented in it. It is strictly an evolutionary diagram, since those organisms evolved in a certain pattern: they weren’t created ex nihilo, they have an evolutionary pedigree that we try to uncover. The way to do that is to look for homologies, characters that only some of the organisms share. These are likely to be present in only those organisms because they evolved in their last common ancestor, meaning that these organisms descend from that ancestor and are thus more closely related to each other than to the rest. (Convergent evolution and reduction of characters throw big wrenches in the process; that’s why we use as much data as possible, to uncover a large number of possible homologies.)
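As a rough illustration of that reasoning, here is a minimal Python sketch. The character matrix, the taxa and the function names (`candidate_groups`, `support`) are all made up for the example, and real phylogenetic software does far more than this. It only picks out characters shared by some but not all of the taxa and counts how many characters back each proposed group.

```python
from collections import Counter

# Hypothetical toy matrix: character -> {taxon: 1 if present, 0 if absent}.
# The codings are illustrative only, not a real dataset.
CHARACTERS = {
    "hair":           {"bat": 1, "mouse": 1, "sparrow": 0, "crocodile": 0},
    "mammary glands": {"bat": 1, "mouse": 1, "sparrow": 0, "crocodile": 0},
    "wings":          {"bat": 1, "mouse": 0, "sparrow": 1, "crocodile": 0},
    "feathers":       {"bat": 0, "mouse": 0, "sparrow": 1, "crocodile": 0},
    "backbone":       {"bat": 1, "mouse": 1, "sparrow": 1, "crocodile": 1},
}


def candidate_groups(characters):
    """Map each character shared by some, but not all, taxa to its carriers."""
    groups = {}
    for name, states in characters.items():
        carriers = frozenset(t for t, present in states.items() if present)
        # A character found in one taxon, or in every taxon, groups nothing.
        if 1 < len(carriers) < len(states):
            groups[name] = carriers
    return groups


def support(characters):
    """Count how many characters propose each group of taxa."""
    return Counter(candidate_groups(characters).values())


for group, n in support(CHARACTERS).most_common():
    print(sorted(group), n)
```

In this toy matrix, two characters support bat + mouse and one (wings) supports bat + sparrow. The wings are the convergent wrench, and they get outvoted only because more characters were sampled.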
Homologies are not synonymous with similarities. Homologies are evolutionarily important units, while similarities are arbitrary.
Classifying organisms by their homologies yields a systematic, or phylogenetic, classification (this is why the field is called systematics).
A classification based on similarities does not necessarily have an evolutionary component to it. For example, you can classify organisms by locomotion type and end up with birds and bats clustered together, which makes little evolutionary sense. You can classify bacteria by pathogenicity, even though pathogenicity has evolved convergently in many bacterial classes. Such classifications do have their uses, but they’re not evolutionary, and cannot be shown on a phylogenetic tree.
However, they can be shown on a dendrogram, which can be made to look nearly identical to a phylogenetic tree, since it’s a very intuitive data visualisation. When constructing such similarity-based dendrograms, you can use the same tools and algorithms as for constructing a phylogenetic tree, but you must realise that the relationships you are supposedly uncovering are not evolutionarily sound, since you’re making up categories to suit you and whatever you’re categorising.
This is the critical distinction between a phylogenetic tree and a tree that clusters similar things together: the former is based on evolution, the latter on custom categories.
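To make the birds-and-bats example concrete, here is a small Python sketch of a similarity-based dendrogram. The trait table and function names are invented for illustration. The clustering is plain average linkage, the same procedure that is known as UPGMA when it is used to build trees. The output is a nested tuple rather than a drawing.

```python
from itertools import combinations

# Hypothetical locomotion traits, picked to suit the classification,
# not because they are homologies.
TRAITS = {
    "sparrow":   {"flies": 1, "walks on four legs": 0},
    "bat":       {"flies": 1, "walks on four legs": 0},
    "mouse":     {"flies": 0, "walks on four legs": 1},
    "crocodile": {"flies": 0, "walks on four legs": 1},
}


def hamming(a, b):
    """Number of traits on which two taxa differ."""
    return sum(TRAITS[a][k] != TRAITS[b][k] for k in TRAITS[a])


def leaves(node):
    """Flatten a nested-tuple cluster into its list of taxa."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def average_distance(x, y):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def cluster(taxa):
    """Repeatedly merge the two closest clusters (average linkage)."""
    nodes = list(taxa)
    while len(nodes) > 1:
        x, y = min(combinations(nodes, 2), key=lambda p: average_distance(*p))
        nodes = [n for n in nodes if n != x and n != y]
        nodes.append((x, y))
    return nodes[0]


print(cluster(TRAITS))
# (('sparrow', 'bat'), ('mouse', 'crocodile'))
```

The algorithm runs happily and returns a perfectly tree-shaped result that pairs the sparrow with the bat. Nothing in the procedure knows or cares whether the input characters carry any evolutionary meaning.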
However, one can treat the issue abstractly. For example, I am building a program that will let the user enter their favourite movie, genre, director, etc., and will then spit out a series of recommendations. This program is based on a massive database treated as a phylogenetic data matrix. Films from the same director are very likely to be clustered together. This is because they’re similar (directors tend to reuse themes and staff), but you can also look at them as having evolved from a common ancestor, with the director as a homology.
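A toy version of the idea might look like the Python sketch below. To be clear, this is not the actual program: the film entries are placeholders, the character names are assumptions, and ranking by the raw number of shared character states is just the simplest scoring I could pick for illustration.

```python
# Placeholder film "data matrix": each film is scored for the same characters.
# Titles, directors and composers are hypothetical.
FILMS = {
    "Film A": {"director": "Director X", "genre": "noir", "composer": "Composer P"},
    "Film B": {"director": "Director X", "genre": "noir", "composer": "Composer Q"},
    "Film C": {"director": "Director Y", "genre": "noir", "composer": "Composer Q"},
    "Film D": {"director": "Director Y", "genre": "comedy", "composer": "Composer R"},
}


def shared_characters(a, b):
    """Number of characters for which two films have the same state."""
    return sum(FILMS[a][k] == FILMS[b][k] for k in FILMS[a])


def recommend(title, n=2):
    """Return the n films sharing the most character states with `title`."""
    others = [t for t in FILMS if t != title]
    # Sort by shared characters (descending), then by title to break ties.
    return sorted(others, key=lambda t: (-shared_characters(title, t), t))[:n]


print(recommend("Film A"))
# ['Film B', 'Film C']
```

Film B comes first because it shares both director and genre with Film A, which is the clustering-by-director effect described above.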
But that’s more of a philosophical and practically irrelevant distinction. In biology (and comparative linguistics!), homologies and similarities are not synonymous. Homologies can often be similarities, but similarity is not part of the definition of homology.