The Importance of Taxon Sampling: An Example from the Snout Butterflies (Lepidoptera: Libytheinae)

Continuing from yesterday’s theme of injecting some personal remarks (and to make it 3 posts in a row on butterflies, for no particular reason), I want to note something about this Wahlberg et al. (2005) tree (opens in a new window/tab!) that I reprinted in each of the posts, specifically to point out a supposedly large error in it.

If you look at the orange branching, you will see Libythea placed as a sister group to Danaus. Libythea is a genus in the Libytheinae, Danaus in the Danainae, and both are in the Nymphalidae family. As you can tell from the title, this post is about the Libytheinae, and I will come out and say directly that this section of the tree is completely wrong.

It stands in opposition to every single morphological and molecular analysis of the Nymphalidae, where the Libytheinae unanimously come out as the basalmost nymphalids, rather than derived nymphalids as in this cladogram.

Explaining this faulty positioning is easy: insufficient taxon sampling. It’s a problem that has classically plagued all molecular phylogenetics and is one of the main sources of its problems (though not the largest of them). At its most basic, this means that a study does not use enough species, or species from a broad enough spectrum.

In the case of the Wahlberg et al. study, the broad range is more than satisfactory, but there aren’t enough species to resolve lower-level systematics. The breadth allows them to resolve higher levels, such as families and superfamilies; the thin sampling within each family is why the relationships within the families are not quite correct. The tree is still correct if you zoom out enough. As an analogy, think of a pink rose you steal for your girlfriend. It isn’t “really” pink – the pink colour comes from the fact that you’re looking at it from afar. If you were to zoom in, you would find a collection of red and white dots. But it doesn’t matter to you, because it’s still pink at the scale that you’re using. Same here: the intrafamily relationships (zoomed in) aren’t important, only the interfamily relationships (zoomed out) are.

This highlights the importance of study design in systematics, both morphological and molecular (although we morphologists are much less guilty of this, because we have no problem studying every single species in our study group, unlike the mollies who have to watch their costs). Even though there are many gene sequences available now, and the cost of sequencing new ones keeps falling, one still has to decide how broadly to cast the taxon sampling net. It is a well-known mantra that the more species, the better (Zwickl & Hillis, 2002). But if you’re doing a tree for the insects, for example, it won’t help to have 100 Drosophila species and one from every other order. Nor would it help to take one from every single family – you would still get a disproportionate number of beetles, flies and hemipterans compared to stoneflies. Ultimately, the very first questions you have to ask yourself are: what is my hypothesis? And what is the goal of my study? If you’re making a broad survey of butterflies, like Wahlberg et al. (2005), you will need to have representatives from every butterfly clade. The more basal, the better (this is why any general insect tree that uses only Drosophila to represent the flies can automatically be discarded). If you were doing only the Nymphalidae (e.g. Freitas & Brown Jr., 2004), then you ideally should get one representative from each genus. For smaller families, you take every single species.
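For the computationally inclined, the balancing idea can be sketched in a few lines of Python. This is only an illustration, not a method from any of the papers cited here: the function name `balanced_sample`, the `per_clade` cap and the toy taxon lists are all made up, and a real study would pick representatives deliberately (basal lineages first) rather than at random.

```python
import random


def balanced_sample(taxa_by_clade, per_clade, seed=0):
    """Pick up to `per_clade` species from every clade.

    Hypothetical helper: caps over-represented clades so that one
    species-rich group cannot swamp the rest of the data set.
    """
    rng = random.Random(seed)  # fixed seed so the pick is repeatable
    sample = {}
    for clade, species in sorted(taxa_by_clade.items()):
        k = min(per_clade, len(species))  # small clades contribute everything
        sample[clade] = sorted(rng.sample(species, k))
    return sample


# Toy data with invented placeholder names, not real species lists.
toy_insects = {
    "Diptera": ["Drosophila_sp%d" % i for i in range(1, 101)],
    "Coleoptera": ["beetle_%d" % i for i in range(1, 41)],
    "Plecoptera": ["stonefly_1", "stonefly_2"],
}

picked = balanced_sample(toy_insects, per_clade=5)
for clade, species in picked.items():
    print(clade, len(species))
```

With a cap of 5, the 100 Drosophila placeholders shrink to 5 while both stoneflies are kept. The cap itself is the design decision: set it according to the level of the tree you actually want to resolve.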

All of these may seem obvious, but catch me in a jovial mood, hand me a strong drink and touch the trigger (say the word Myriochelata), and I will rant for hours about just how many flawed systematic studies get published all the time, studies that forget (or purposely neglect?) such basic information. So I feel I have to repeat these things all the time.

By the way, in case I didn’t make this clear, the Wahlberg et al. (2005) study is not flawed – it answered its study question (seeing if there are differences between molecular and morphological analyses of the butterflies as a group; giving the broad relationships within the butterflies), and gave very good results. You just have to know which results to read. The interfamily relationships are good, but the species-level relationships are not. If they had included more nymphalid species, especially libytheines and danaines, then the relationships within that family would have come out correctly (see below).

Also, keep in mind that it is my professional opinion that the faulty position is due to taxon sampling, but I have not re-run their datasets with more data or anything like that. I’m just speaking from experience. If I’m wrong, then I will retract the criticism.

In any case, I used the Libytheinae as the example. Might as well introduce them. Before anything, the cladogram above shows their accepted position at the base of the Nymphalidae (Freitas & Brown Jr., 2004).

It’s a bit of a strange choice, since the Libytheinae are a tiny, monophyletic (Kawahara, 2003) subfamily, with fewer than 20 species in 2 genera (Libythea in the Old World, Libytheana in the New World). Unlike their distant pierid cousins, they haven’t undergone any recent host shifts, using only the ulmacean genus Celtis as caterpillar fodder (Ackery et al., 1995).

The easiest way to recognise them (besides having a pictorial field guide) is to look at the labial palpus, which is extremely elongated and serves as camouflage by resembling a leaf petiole – this is where their common name, snout butterflies, comes from (Google Images, look at the mouth). Otherwise, you will have to find specimens of both sexes and look at the legs: males have a reduced foreleg, females do not.

Their fossil record, as with most butterflies, is scant and I only know of two species, both from the Florissant Lagerstätte, Colorado.

Research Blogging Paper:

Wahlberg, N., Braby, M., Brower, A., de Jong, R., Lee, M., Nylin, S., Pierce, N., Sperling, F., Vila, R., Warren, A., & Zakharov, E. (2005). Synergistic effects of combining morphological and molecular data in resolving the phylogeny of butterflies and skippers. Proceedings of the Royal Society B: Biological Sciences, 272 (1572), 1577-1586. DOI: 10.1098/rspb.2005.3124


Ackery PR, Smith CR & Vane-Wright RI. 1995. Carcasson’s African Butterflies.

Freitas AVL & Brown Jr. KS. 2004. Phylogeny of the Nymphalidae (Lepidoptera). Systematic Biology 53, 363-383.

Kawahara AY. 2003. Rediscovery of Libythea collenettei Poulton and Riley (Nymphalidae: Libytheinae) in the Marquesas, and a description of the male. Journal of the Lepidopterists’ Society 57, 81–85.

Wahlberg N, Braby MF, Brower AVZ, de Jong R, Lee M-M, Nylin S, Pierce NE, Sperling FAH, Vila R, Warren AD & Zakharov E. 2005. Synergistic effects of combining morphological and molecular data in resolving the phylogeny of butterflies and skippers. Proc. R. Soc. B 272, 1577-1586.

Zwickl DJ & Hillis DM. 2002. Increased taxon sampling greatly reduces phylogenetic error. Systematic Biology 51, 588-598.
