Building Cladograms 2: Plotting in Mesquite

So you have a complete character matrix in Mesquite. Now comes the time for magic. You have three choices: exporting the matrix to plot your tree in R or Phylip, or sticking with Mesquite for the plotting. Plotting in Mesquite is the most user-friendly of the bunch, and all the options are great. However, it may not be as thorough as the other programs: it’s a tool more for comparing and analysing trees than for building phylogenies. But it’s more than enough for most intents and purposes (and, let’s face it, if you’re advanced enough to be wanting more… you wouldn’t be reading this guide :P).

Anyway, right off the bat, you have plenty of choices for how you can build a tree from your character matrix. Go to Taxa&Trees/Tree Inference/TreeSearch/Mesquite Heuristics and look at all the options there. Most of them aren’t needed, as they’re meant for the analysis of already-plotted trees. The ones listed below are the only reliable ones to use.

Note that we will not consider the cluster analyses, as we are doing cladistics, not numerical taxonomy. The cluster analyses are useful with molecular data and are phenetic. I use them when I build identification keys. But never for a proper cladistic analysis.

Some general notes: always choose the Stored Matrix option when asked for your source of characters. When asked how many trees should be searched for, the standard is 10000; if you’re just doing a quick-and-dirty analysis, 100 is fine. In either case, make sure you have something else to do and by all means don’t sit there with bated breath. If you must, you can always interrupt the analysis and restart it later by going to the Tree window and clicking on Tree/Alter Tree/Search for Better Tree.

Tree Value Using Character Matrix:

These methods consider the entire character matrix, and are the ones you will use for a complete cladistic analysis.

Treelength: This will regard the shortest tree, i.e. the one needing the fewest character-state changes (steps) in total, as the best one, as it requires the least evolution (the parsimony principle). You will be asked to choose a tree rearrangement algorithm and given the choice between SPR and NNI.
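To make "tree length" a bit more concrete, here is a minimal sketch of how steps can be counted for unordered characters using Fitch's algorithm. This is not Mesquite's own code: the nested-tuple tree format, the toy matrix and the function names are all assumptions made up for the illustration.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, steps) for one character.

    tree: a taxon name (str) or a pair of subtrees (rooted, binary).
    states: dict mapping taxon name -> state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    steps = left_steps + right_steps
    shared = left_set & right_set
    if shared:
        return shared, steps                   # no change needed at this node
    return left_set | right_set, steps + 1     # one extra step


def tree_length(tree, matrix):
    """Sum the Fitch steps over every character (column) in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, states)[1]
    return total


# Hypothetical toy matrix: 4 taxa, 3 binary characters, no missing data.
matrix = {"A": "000", "B": "001", "C": "110", "D": "111"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, matrix))  # 4
print(tree_length(tree_2, matrix))  # 5
```

Under the treelength criterion, tree_1 would be preferred here because it explains the same matrix with one step fewer.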

Subtree Pruning and Regrafting (SPR) is an algorithm that takes every possible subtree and places it in another position, then recalculates the value of the entire tree. By elimination, bad moves get thrown out and you eventually winnow it down to the best trees. It’s very effective, but it takes a long time, since the value for the entire tree has to be recalculated at every move.

Nearest Neighbour Interchange (NNI) is an algorithm that selects a random internal branch and swaps a subtree on one side of it with a subtree on the other side, with the value of the tree recalculated after each exchange. It’s not as exhaustive as SPR, but it is faster. If your taxon sampling and matrix are thorough enough, there will be no noticeable difference in accuracy.
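If you want to see what a single NNI move looks like, the sketch below lists every tree that is one interchange away from a starting tree. It only illustrates the rearrangement itself, not Mesquite's search routine, and the nested-tuple tree format and function name are assumptions for the example.

```python
def nni_neighbours(tree):
    """Yield every rooted binary tree that is one NNI move away.

    tree: a taxon name (str) or a pair of subtrees.
    """
    if isinstance(tree, str):
        return
    left, right = tree
    # Interchange across the branch leading to the left child.
    if not isinstance(left, str):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    # Interchange across the branch leading to the right child.
    if not isinstance(right, str):
        c, d = right
        yield ((left, c), d)
        yield ((left, d), c)
    # Moves deeper in the tree leave the rest of the tree untouched.
    for new_left in nni_neighbours(left):
        yield (new_left, right)
    for new_right in nni_neighbours(right):
        yield (left, new_right)


start = (("A", "B"), ("C", "D"))
for neighbour in nni_neighbours(start):
    print(neighbour)
```

A search would score each of these neighbours, keep the best one if it beats the current tree, and repeat from there. Because each tree only has a handful of NNI neighbours, each round is cheap, which is where the speed advantage over SPR comes from.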

Consistency Index: The Consistency Index (CI) is a metric calculated by dividing the minimum number of steps the characters could possibly require by the number of steps they actually require on the tree. In other words, it’s a measure of how close a tree is to being perfectly parsimonious: a CI of 1 means no homoplasy at all. Again, you will be asked to use either SPR or NNI, except that this time, the good trees will be accepted not by their lengths, but by their CI.
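As a quick worked example of the CI, here is a small sketch. It assumes unordered characters with no missing data, so the minimum number of steps for a character is its number of observed states minus one; the toy matrix, the function name and the observed step counts are made up for the illustration.

```python
def consistency_index(matrix, observed_steps):
    """CI = minimum possible steps / steps actually required on the tree."""
    n_chars = len(next(iter(matrix.values())))
    min_steps = 0
    for i in range(n_chars):
        observed_states = {row[i] for row in matrix.values()}
        # An unordered character with k states needs at least k - 1 changes.
        min_steps += len(observed_states) - 1
    return min_steps / observed_steps


# Hypothetical toy matrix: 4 taxa, 3 binary characters.
matrix = {"A": "000", "B": "001", "C": "110", "D": "111"}

# Assumed tree lengths for two hypothetical trees: 4 steps and 5 steps.
print(consistency_index(matrix, 4))  # 0.75
print(consistency_index(matrix, 5))  # 0.6
```

The minimum is fixed by the matrix, so for a given dataset the shortest tree is also the one with the highest CI.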

Retention Index: The Retention Index (RI) measures how much of the similarity in the matrix that could be explained by common ancestry is actually retained as such on the tree, so, like the CI, it goes down as homoplasy goes up. The actual formula is (max number of steps – number of steps)/(max number of steps – min number of steps), where the max and min are the most and the fewest steps the characters could require on any tree. Again, this is the metric used by the tree rearrangement algorithm to keep the good trees.
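Here is the RI formula as a small sketch, under the same kind of assumptions as before: unordered characters, no missing data, and a made-up toy matrix and step counts. For such a character, the maximum number of steps is the number of taxa that do not have the most common state.

```python
from collections import Counter


def retention_index(matrix, observed_steps):
    """RI = (max steps - observed steps) / (max steps - min steps)."""
    n_chars = len(next(iter(matrix.values())))
    min_total = 0
    max_total = 0
    for i in range(n_chars):
        counts = Counter(row[i] for row in matrix.values())
        min_total += len(counts) - 1
        # Worst case: every taxon lacking the commonest state changes on its own.
        max_total += sum(counts.values()) - max(counts.values())
    # Note: undefined (division by zero) if no character is informative.
    return (max_total - observed_steps) / (max_total - min_total)


# Hypothetical toy matrix: 4 taxa, 3 binary characters.
matrix = {"A": "000", "B": "001", "C": "110", "D": "111"}

# Assumed tree lengths for two hypothetical trees: 4 steps and 5 steps.
print(round(retention_index(matrix, 4), 3))  # 0.667
print(round(retention_index(matrix, 5), 3))  # 0.333
```

For this matrix the maximum is 6 steps and the minimum is 3, so a tree of 4 steps gives (6 – 4)/(6 – 3) and a tree of 5 steps gives (6 – 5)/(6 – 3).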

Boolean Tree Value:

This class of methods should only be used when you have some expectation that needs to be fulfilled, because only trees that fulfill your conditions will be returned. Note that you can also use these to filter the trees you get from your previous analyses, to only get the ones you consider as valid.

Selected Taxa Form A Clade: If you know for sure that two taxa are related, but the analysis will probably fail to group them (in cases of rampant homoplasy, where unrelated taxa share the same characteristics and so will likely form an erroneous clade), then you can specify that you only want trees returned in which your known sister taxa form a clade. Note that you can only define one clade at a time. For each clade, go to the Taxa window and select the taxa in your clade, then go to the Tree Inference menu and select Boolean Tree Value/Selected Taxa Form a Clade. You’ll be asked for the tree rearranger as well. If you need more than one defined clade, you will have to do another analysis, and then put the tree together (we’ll look at how that’s done in the Analyses post).
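The logic behind this kind of boolean filter is simple enough to sketch. The code below is only an illustration of the idea, not Mesquite's implementation; the nested-tuple trees and the function names are assumptions for the example.

```python
def leaf_set(tree):
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def forms_clade(tree, taxa):
    """True if some node in the tree has exactly the selected taxa below it."""
    taxa = frozenset(taxa)
    if leaf_set(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(forms_clade(child, taxa) for child in tree)


# Three hypothetical candidate trees for taxa A-D.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]

# Keep only the trees in which A and B form a clade.
kept = [tree for tree in trees if forms_clade(tree, {"A", "B"})]
print(len(kept))  # 2
```

The second tree is thrown out because A and B end up on different sides of it, which is exactly the sort of tree this option is meant to exclude.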

Selected Taxa Convex In Tree: Similar to above, but less strict: the selected taxa only have to be convex, meaning they would form a clade if the tree were rerooted.

Tree Value Satisfies Criterion: With this option, you can specify exactly what your tree is supposed to have: only trees with a good (defined by you) RI or CI will be returned. Or you can be specific and define what CI and RI you want for a specific character (under Tree Value Using Character). This will guarantee that you get exactly the trees that are useful for you, but is a bit unwieldy, since you have to know exactly what you’re looking for in the first place. It depends on your research question.

Once the analysis is run, all the returned trees (100 or 10000, or whatever number ends up being filtered by the boolean) are saved in a tree block. You can analyse them all separately, and we’ll look at that in a later post, but if you want a final phylogeny, you have to build a summary consensus tree.

To do this, go to Taxa&Trees/Make New Trees Block from/Consensus Tree. Choose Stored Trees as the tree source (if you have multiple tree blocks, you will choose between them after the algorithm menus, so don’t worry about that), and you will be given the choice between three algorithms for calculating a consensus tree.

Majority Rule Consensus: Choose this, and the next window will make you choose the “required frequency of clades”, a number between 0 and 1 (1 = 100%; default is 0.5 = 50%). The way this algorithm summarises the trees is by keeping only those clades present in at least x% of the trees (where x is the number you choose above). Another option, “consider tree weights”, should also be clicked: this leads to a more reliable consensus tree, by allowing the length of the trees to be factored in, instead of just having a summary based on binary acceptance/rejection of the trees. I also like to have as much data as possible, so it does no harm to also click on the “write group frequency list” option, which will tell you how common each clade is.
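To show what the required frequency does, here is a minimal sketch that counts how often each clade turns up in a set of trees and keeps the ones above the threshold. It ignores tree weights, and how ties exactly at the threshold are handled is my assumption rather than Mesquite's documented behaviour; the tree format and function names are likewise made up for the example.

```python
from collections import Counter


def clades(tree):
    """Return (taxa below this node, set of all clades in this subtree)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def majority_rule_clades(trees, required_frequency=0.5):
    """Map each sufficiently frequent clade to its frequency across the trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree)[1])
    n = len(trees)
    # Assumption: a clade must exceed the threshold, or be in every tree.
    return {
        clade: count / n
        for clade, count in counts.items()
        if count / n > required_frequency or count == n
    }


# Three hypothetical trees for taxa A-D.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]

for clade, freq in majority_rule_clades(trees, 0.5).items():
    print(sorted(clade), round(freq, 2))
```

With these trees, the clades A+B and A+B+C each appear in two trees out of three and so survive at 0.5, while A+C and A+B+D appear only once and are dropped. Raising the required frequency to 0.9 leaves only the clade containing all four taxa. The dictionary of frequencies is the same kind of information the group frequency list gives you.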

Semistrict Consensus: This will approve of those clades that are present in at least one tree, and not contradicted by clades in other trees.

Strict Consensus: This will approve only of those clades present in every single tree.

There is no general rule for what consensus to choose; it depends on your specific question. For general uses, I run all the analyses (majority rule at four required frequencies: 0.2, 0.5, 0.8 and 0.9) and pick the most reliable tree of those.

Once you have your consensus tree, you can mess around with its look by going through the Drawing menu. Try the different tree forms and find the one that is clearest. If you want to display anything more than general topology, make sure that the “Branches Proportional To Length” option is turned on. I will give details on how to produce a publication-ready tree in a later post, but there’s no harm in experimenting by yourself.

To export your finished tree as a graphics file, go to File/Save Tree as PDF… and save it. At first glance, the PDF seems like a useless, extremely low-resolution file. You need to open it in a vector graphics program (Inkscape is free; Adobe Illustrator is the commercial equivalent), and save it as a graphics file from there. Note that you can also edit everything in Inkscape/Illustrator, since the entire tree and all the labels are drawn as paths. Again, experiment by yourself (or wait until my explanatory post).

In the next post in the series, we will plot a tree using the function available in the PHYLIP line of programs instead of Mesquite.

