Forum Paper
Corresponding author: Jose L. Fernandez-Triana ( cnc.braconidae@gmail.com ) Academic editor: Lyubomir Penev
© 2022 Jose L. Fernandez-Triana.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Fernandez-Triana JL (2022) Turbo taxonomy approaches: lessons from the past and recommendations for the future based on the experience with Braconidae (Hymenoptera) parasitoid wasps. ZooKeys 1087: 199-220. https://doi.org/10.3897/zookeys.1087.76720
A recent paper (
Although somewhat subjective, turbo taxonomy can be characterized as the rapid description of many species in “fast” papers (as compared to the “slower” pace of traditionally produced taxonomic papers). This is usually accomplished using a combination of tools and approaches to automate and expedite dealing with the material examined, e.g., morphological traits quickly assessed and scored, often with brief descriptions and/or descriptions generated using software packages, high-quality illustrations, a heavy reliance on molecular and other data (e.g., biological, distributional) to differentiate and diagnose species. The combination of techniques for species recognition and description at least partially intersects with another concept, that of “integrative taxonomy”, sensu
The main difference between
Describing new species based only or mostly on molecular data is not new.
Thus, the novelty of the Sharkey et al. paper is hardly the approach itself but rather the scaling up of the work to a mammoth monograph in which more than 400 new species were described. That is indeed a first. And, as quoted from the very first sentence of their introduction, the authors presented their article as a way to “further refine methods to overcome the taxonomic impediment of ichneumonoid biodiversity” (
In the months following that paper, the scientific community has engaged in lively discussions about “how useful” such descriptions are, whether they in fact impede the cataloguing of biodiversity, “how valid” (from the ICZN perspective) those species are, and general issues about the future of taxonomy, and the shortcomings of BINs and even BOLD (e.g.,
In this Forum Paper I discuss some of the above issues, present alternative/complementary ideas from my perspective, and include a detailed proposal on how to approach turbo taxonomy in a hyperdiverse group such as braconid parasitoid wasps, balancing rapid species descriptions with a higher use value of the final product(s). I do not claim to have better or newer insights than others, and I certainly do not pretend to have any definitive answers, but perhaps my comments could be useful because a) I am a braconid researcher, like the main authors of the
There are many published papers that discuss the need for, and possibilities of, speeding up taxonomy by using newer technologies such as DNA barcoding. Unfortunately, most of those papers present somewhat general discussions or are intended only as a proof of concept, without actually applying it to describing new species. In many cases, DNA barcoding is presented as a useful and comparatively rapid tool to distinguish species, often revealing a much higher species diversity than previously thought based on morphological study and/or revealing complexes of cryptic species. However, things usually stop there, and the next step is not taken, i.e., the new taxa are not described in those papers praising how much DNA barcoding brings to the taxonomist’s table. I would consider those papers examples of “talking the talk” but not necessarily “walking the walk” (in the sense presented here: https://knowyourphrase.com/talk-the-talk). It is important to stress that this statement does not apply to the four braconid experts and coauthors of the
But the truth is that comparatively few works could have the turbo taxonomy label applied to them. Examples include lichens (
Selection of published Braconidae papers (2005–2021) that could be considered examples of turbo taxonomy. For the sets of data in columns 5–9, “–” means such data were not present in the paper, “+” means that they were used but only in a very basic and limited way, and “++” means that they were used to a substantial degree. ACG = Area de Conservación de Guanacaste, Costa Rica.
| Paper | Subfamily/genus covered | Main geographical area | Total species/new species described | Use of dichotomous keys | Use of morphological data | Use of illustrations | Use of molecular data | Use of other data |
|---|---|---|---|---|---|---|---|---|
| | 11 subfamilies of Braconidae | ACG | 416/403 | – | – | + | ++ | + |
| | Doryctinae/Heterospilus | Costa Rica | 286/280 | ++ | ++ | ++ | – | – |
| | Microgastrinae/Apanteles | ACG | 205/186 | ++ | ++ | ++ | ++ | ++ |
| | Rogadinae/Aleiodes | Thailand | 186/179 | ++ | ++ | ++ | ++ | – |
| | Microgastrinae/Glyptapanteles | ACG/Ecuador | 136/136 | ++ | ++ | ++ | ++ | ++ |
| | Microgastrinae/Apanteles | China | 97/48 | ++ | ++ | ++ | – | + |
| | Agathidinae/Alabagrus | ACG | 87/66 | ++ | ++ | ++ | ++ | ++ |
| | Microgastrinae/Dolichogenidea | China | 67/39 | ++ | ++ | ++ | – | + |
| | Macrocentrinae/Macrocentrus | Nearctic | 54/13 | ++ | ++ | ++ | – | + |
| | Microgastrinae/Hypomicrogaster | ACG | 45/40 | ++ | ++ | ++ | – | ++ |
| | Microgastrinae/Pseudapanteles | ACG | 36/25 | ++ | ++ | ++ | ++ | ++ |
| | Microgastrinae/Dolichogenidea | China | 34/26 | ++ | ++ | ++ | – | + |
| | Microgastrinae/Microplitis, Snellenius | ACG | 33/28 | ++ | ++ | ++ | ++ | ++ |
| | Agathidinae/Zelomorpha | ACG | 19/18 | – | – | + | ++ | ++ |
What is somewhat surprising (or worrisome?) is that few of the researchers who have published a paper that could be considered turbo taxonomy have continued to do so afterwards, i.e., they have not produced additional monographs in the same turbo taxonomy style. Based on my admittedly non-exhaustive online searches, I can only mention Riedel and colleagues for weevils (
One may then ask, if turbo taxonomy is touted as “the way to move forward” in taxonomy, why are there so few adopters of the approach, and even fewer who repeat their efforts in subsequent papers? In my opinion the answer is simple: because turbo taxonomy still requires a significant amount of invested work and time, and it is not as easy and rapid as one might think or as it is purported to be in papers advocating for those revolutionary taxonomic approaches. A simple search of author names reveals that most of the published turbo taxonomy papers have been done primarily by graduate students (M.Sc. and Ph.D.) or postdoctoral fellows. They represent some of the more enthusiastic, hard-working, and “overperformer” researchers in the taxasphere, a great combination of youth, energy, and a desire/need to advance their careers. They certainly put in the effort needed to accomplish their turbo taxonomy feats, and they deserve all the praise for that. But could those papers become the “new normal” for taxonomy? I would argue that it is unrealistic to expect that turbo taxonomy papers can be produced effortlessly and quickly, much less in a sustained way, at least those closer to “traditional taxonomy” in the sense of providing keys and morphological descriptions.
I believe that
First let us look at what has been accomplished with turbo taxonomy relative to Braconidae during the past 15 years or so (2005–present). Table
Four of the large papers provide identification keys, “traditional” (i.e., morphology-based) species descriptions (as opposed to only DNA-based ones), and multiple illustrations of all or most species. The only exception to this is the paper of
The pattern among the shorter papers is mostly similar, with
An interesting comparison can be drawn between the
The examples in Table
There is no question that these papers could have been produced faster and easier if a minimalistic approach, such as those of
Thus, when considering papers that claim to be “fast” because they rely only on DNA-based descriptions, one must also consider the hidden but significant amount of work done prior to the taxonomic study. If the time, expertise, and resources needed to obtain all the previous information on which the taxonomy is based were accounted for, those papers would suddenly appear less quick and easy to produce than advertised, at least relative to ACG studies.
Beyond time and resources not being properly assessed in a paper employing only DNA-based descriptions, there is a bigger issue. And that is the fact that any user of such a paper must, by default, obtain DNA data for their own specimens before any meaningful comparison can be made with the species dealt with in that paper. Otherwise, it is not possible to conclude if a specimen at hand belongs to a previously “DNA-described” species or is new. Thus, “DNA-only description” papers force users to do “DNA-only identifications”.
There is no problem with that, say some enthusiastic supporters of turbo taxonomy and DNA barcoding. It will actually democratize taxonomy because technical knowledge of a taxon, including the associated morphological jargon used to describe it (e.g., number of setae on propodeum or sculpture on mesoscutum), would no longer be required. What used to be the domain of a relatively few taxonomists would become mostly unnecessary, because “soon” everyone would be able to use a device, à la Star Trek tricorder (https://en.wikipedia.org/wiki/Tricorder), to identify species. It would allow even schoolchildren to rapidly identify the caterpillar they found in their backyard, or farmers in Central America to recognize which pests or parasitoid wasps are found in their crops. It all looks so nice and promising!
I fully agree that DNA barcoding democratizes taxonomy, because it does somewhat reduce the need for trained taxonomists to do routine identifications (e.g.,
Meanwhile, what we have is the fact that DNA-based taxonomy is not accessible or affordable to everyone (see further analyses and/or other perspectives in
Never mind the schoolchildren or farmers: arguably most researchers worldwide cannot afford the current costs and associated logistical challenges mentioned above of obtaining DNA-based identifications for every specimen they may need or want to identify (e.g., Srivathsan et al. 2021). If obtaining DNA barcodes (or any other molecular marker) becomes the only route to a scientific name, then taxonomy could become even less accessible and democratic than when using “traditional” techniques such as microscopes and dichotomous keys. At present it is certainly valid to argue that the cost of traditional, morphology-based taxonomy is largely a “front end” cost borne mainly by the taxonomist, whereas DNA-only taxonomy imposes significant “back end” costs on users.
In addition to the cost, and who pays it, there is also the problem of the almost two million species described in the pre-molecular era, many with no associated DNA data. Those species cannot simply be ignored, as has been claimed to be the case in the
In the end, it comes down to the practicality and benefits/damages that minimalistic (extreme?) taxonomic approaches, such as those relying only on DNA barcodes for species description and recognition, bring. Do future revisions to be produced really need to ignore morphology and previously described species to instead rely entirely or almost exclusively on DNA barcodes, with the “justification” of describing species faster because of the biodiversity crisis? Or is it possible to build upon the works of
What I propose below is a workflow and guidelines for preparing turbo taxonomy papers, including estimated times for each task. The main motivation is to provide an alternative to
I do not pretend to reinvent the wheel, e.g., see Riedel et al. (2013),
Simplified key(s) and diagnostic descriptions, with a minimum set of morphological traits, will be prepared. The morphological traits, ideally chosen by a specialist in the taxon, need not be numerous but ideally should be easily and quickly assessed and scored (i.e., not requiring dissections, slide preparation, or other labour-intensive techniques). It is understood that DNA evidence is likely being used in most turbo taxonomy studies because of a perceived lack of differential morphological features for the group, and that morphology will not necessarily suffice to tell every species apart. However, morphology should at least be able to place most (ideally all) species within some sort of smaller group of species. A “species group”, as here considered, is based on some simple, diagnosable trait(s), e.g., “all species with legs brown or black versus all species with legs yellow”, and does not necessarily have to be monophyletic.
The morphology component of the taxonomic revision should serve as the minimum piece of information to allow someone with a basic knowledge of the taxonomic group and simple equipment such as a microscope to recognize a species or species group if no other source of information, such as DNA, is available. [This statement may not be applicable in some groups, such as nematodes, fungi, etc. The present paper was mostly written thinking of insects, and it is mainly directed to groups where morphology has some role in recognizing species or groups.]
Although diagnostic descriptions should be as short as possible based on easily observable features, each species should be illustrated as fully as possible with images showing body areas from different angles in order to document the features important for differentiating species in the group (e.g., coloration, sculpture, etc.) and those features that are otherwise not described. Ideally, illustrations should be based on the holotype or specimens compared with the holotype; if a species is thought to be variable morphologically, then specimens showing the perceived range in variation should also be photographed.
In species complexes with very similar or cryptic morphology, additional effort does not necessarily need to be spent trying to separate them based on detailed study of morphology or morphometrics, but instead other non-morphological criteria (see below), if known, could be used to help distinguish the species.
The estimated time needed for the morphological work is 5 hours per species. This includes scoring and writing the species description based on minimum morphological traits, and also includes studying intraspecific variation and making a few measurements of relevant structures. All of these steps should take, on average, less than one hour per species, the exception being species with many available specimens and/or significant morphological variation. To account for extremes, an estimate of two hours of work per species is considered here. Photographing a species (4–8 shots of a specimen, to capture different angles) can be done in one hour depending on the number of specimens per species imaged, and the photographic equipment and montaging software used. Preparing a plate of images can be done in less than one hour. Estimating the time to prepare a simplified key is very difficult, and here a conservative estimate of one hour per species in the key is proposed. [Obviously, the calculations for this point do not include the years of taxonomic experience that are required to be able to describe a species in 5 hours. This is indeed another “hidden prior work” and time to factor in. However, it would not only apply similarly to both turbo taxonomy and any other taxonomic approaches but also it would be very difficult, if not impossible, to calculate; thus, that factor is not included here. One simple observation from that problem would be that we still need to have more trained taxonomists to do the work of describing new species!].
DNA barcoding and/or any other molecular marker will be a very important criterion to recognize and diagnose species, and for morphologically cryptic or very similar species, it may be the primary criterion. Species will be characterized as much as possible by their corresponding Barcode Index Number (BIN) (for a definition of BIN see
Where a species is primarily defined and identified by DNA barcodes because, e.g., basic morphology is insufficient or inconclusive, such “DNA-only species” must include sequences from at least two different specimens (to exclude potential definition of a species based on a single sequence, which could be a lab contamination, a chimera, or any other error). Where a species is defined by a combination of traits (morphological, biological, etc.), a less stringent molecular criterion is acceptable, and a single DNA barcode can be sufficient.
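The minimum molecular criterion above can be written as a simple check. The sketch below is a hypothetical illustration, not part of any published workflow: the `SpeciesHypothesis` record and its fields are assumptions, and it interprets a “DNA-only species” as one for which basic morphology is not diagnostic and no other supporting evidence (e.g., host data) is available. It also assumes that a species defined by a combination of traits is still expected to have at least one barcode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeciesHypothesis:
    """Hypothetical record of the evidence supporting one species hypothesis."""
    name: str
    barcoded_specimens: int        # specimens with a usable DNA barcode
    morphology_diagnostic: bool    # basic morphology is enough to recognize the species
    other_evidence: List[str] = field(default_factory=list)  # e.g. host or distribution data

    def is_dna_only(self) -> bool:
        # Assumed interpretation: DNA is the only criterion left when morphology
        # is insufficient and no biological or other supporting traits exist.
        return not self.morphology_diagnostic and not self.other_evidence

    def min_barcodes(self) -> int:
        # DNA-only species need at least two sequenced specimens, guarding against
        # a single contaminated, chimeric, or otherwise erroneous sequence;
        # species defined by a combination of traits can rest on one barcode.
        return 2 if self.is_dna_only() else 1

    def meets_molecular_minimum(self) -> bool:
        return self.barcoded_specimens >= self.min_barcodes()
```

For instance, a species with a single barcode and no diagnostic morphology fails the check, while the same single barcode passes once host data are recorded for it.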
The estimated time needed for the molecular tasks is 5 hours per species. Sampling tissue for DNA barcoding from dry, pinned specimens is straightforward and takes less than 10 minutes per specimen. However, the associated requirements for preparing a 96-well plate and submitting it to the lab for processing may involve many other tasks, e.g., taking one image per specimen and providing some details of the specimen for the BOLD database (when specimen tissue is sequenced by the Canadian Centre for DNA Barcoding). A conservative estimate of 30 minutes per specimen is proposed. Because, as discussed above, it is usually necessary to have DNA barcodes of more than one specimen per species, the estimate here includes 3 hours per species for specimen preparation. This estimate will vary significantly if specimens are prepared in batches smaller or larger than one 96-well plate (which accommodates 95 specimens). Basic analysis of DNA barcodes (Neighbour-Joining trees as generated in BOLD) can be done quickly, but more complex and comprehensive analyses will take longer; a conservative estimate of 2 hours per species is proposed here.
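The molecular estimate can be reproduced with a few lines of arithmetic. In this sketch the function names are hypothetical, and the figure of about six specimens per species is an inference (3 hours of preparation at 30 minutes per specimen), not a number stated in the text; the plate count simply rounds up to whole plates of 95 specimens.

```python
import math

SPECIMENS_PER_PLATE = 95   # a 96-well plate accommodates 95 specimens, as stated above


def plates_needed(n_specimens: int) -> int:
    """Number of 96-well plates needed to submit n specimens."""
    return math.ceil(n_specimens / SPECIMENS_PER_PLATE)


def molecular_hours(specimens_per_species: int,
                    prep_minutes_per_specimen: float = 30.0,
                    analysis_hours: float = 2.0) -> float:
    """Per-species hours: specimen preparation plus barcode analysis."""
    prep_hours = specimens_per_species * prep_minutes_per_specimen / 60
    return prep_hours + analysis_hours
```

With these defaults, `molecular_hours(6)` returns 5.0, matching the 5-hour estimate, while `plates_needed(190)` returns 2 and `plates_needed(191)` returns 3, showing how a batch just over a full plate adds a partly empty one.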
Any extra information that contributes to recognizing or identifying a species based on ecological or ethological traits should be used as additional evidence supporting species delimitation, but not as the single source to describe a species. Examples in Braconidae include host data, parasitoid ecology, wasp seasonality, etc.
The estimated time needed for the ecological/ethological tasks is 1 hour per species, though this greatly depends on the available information for each taxon; it could be significantly less or even zero. This and the following are probably the least accurate time estimates of the list.
The minimum standard should be broad geographical distribution (e.g., biogeographical region or country), although detailed locality data are preferable. Information on habitat (e.g., collected in a rainforest) or finer details (e.g., collected in forest understory, on leaves of plant X) should also be provided when available. Distribution data can be used as supplementary evidence supporting species delimitation and/or recognition, but not as the single source to describe a species.
The estimated time needed for the distribution data task is 1 hour per species, depending on the number of specimens to be data-mined and their geographic breadth, i.e., the amount of data available, and how much of that information is already databased.
Details of the name-bearing specimens (primary types) should be provided that minimally meet International Code of Zoological Nomenclature (ICZN) publication requirements, such as type depository, but also including the specimen’s unique identifier, specimen sex, country and other information on type specimen label(s) (photographs of such labels can be included), and any other detail (e.g., “specimen in good condition” or “missing a leg”) that facilitates the unambiguous recognition of the name-bearing type(s). The ZooKeys guidelines mentioned above are a great standard to follow.
For paratypes and other non-type specimens, considerably abbreviated data can be included. For example, just mentioning the unique identifiers for each specimen instead of detailing all the data for every specimen data is sufficient, as long as the unique identifiers are linked to a publicly available database or dataset where more detailed information is available.
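The two levels of specimen detail described above (full data for primary types, identifiers only for paratypes) can be represented as simple records. This is a hypothetical sketch: the field names, the citation format, and the example identifiers (`MX-0001`, “Museum X”) are illustrative assumptions, not ZooKeys or ICZN formats, and the abbreviated paratype list assumes every identifier resolves in a public database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrimaryType:
    """Hypothetical record for a name-bearing specimen (holotype)."""
    unique_id: str                   # specimen's unique identifier
    depository: str                  # institution holding the type
    sex: str
    country: str
    label_data: str                  # transcription of the specimen label(s)
    condition: Optional[str] = None  # e.g. "Missing a leg"

    def citation(self) -> str:
        # Full citation: every field that helps recognize the type unambiguously.
        parts = [f"{self.sex}, {self.country}", self.label_data,
                 f"{self.depository}, {self.unique_id}"]
        if self.condition:
            parts.append(self.condition)
        return ". ".join(parts) + "."


def paratype_citation(unique_ids: List[str]) -> str:
    """Abbreviated paratype list: identifiers only, assumed to resolve in a public database."""
    noun = "specimen" if len(unique_ids) == 1 else "specimens"
    return f"Paratypes: {len(unique_ids)} {noun}: " + ", ".join(sorted(unique_ids)) + "."
```

For example, `PrimaryType("MX-0001", "Museum X", "Female", "Costa Rica", "Label text").citation()` gives `"Female, Costa Rica. Label text. Museum X, MX-0001."`, while the paratypes reduce to a sorted list of identifiers.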
The estimated time needed for dealing with specimen details is 1 hour per species, depending on the number of specimens and prior databasing. If most specimens are already databased, as is becoming more the norm in many collections, then the time may be less than 10 minutes for every primary type and another 10 minutes to record the unique identifiers of all other specimens.
Previously described species should not be ignored, i.e., all species treated in a new paper should not, by default, be considered as new species if there are prior available names. Instead, effort should be made to incorporate the previously described species including a reasonable effort to locate and study their types and/or authenticated material. Admittedly, there will be instances when this is not possible and the only data available is just a prior, possibly uninformative, and very short description. However, even if only incomplete information is available for previously described species this should be discussed in the paper as far as possible. Two hypothetical examples are discussed below.
The most extreme example would be that of a previously described species known only from a holotype that has since been lost, with an original description only a few words long and of no diagnostic use. Such a species should still be dealt with, for example: “Species A cannot be run through our key because it is impossible to assess morphological traits X, Y, and Z used in the key, and the only known specimen is lost. Thus, it is not possible to determine whether the name applies to one of the new species described here, but for practical purposes we assume that is not the case.” Statements like that would make clear to the user/reader that such names cannot presently be assigned, and may never be, while still allowing progress in describing new species.
Most cases will be less extreme than the above, with most previously described species being able to be placed within some context of the taxonomic revision, i.e., compared with the new species being described. Included should be at least some sort of basic statement such as: “Species B can only be run to couplet 3 of our key, as characters X and Y (from our key) cannot be assessed for that species, and therefore the name could potentially apply to species C, D, or E (new species being described in our paper), but for practical purposes we assume it is none of them”. Again, this method reduces the potential number of names that could (eventually) be found to be synonyms (as at least the species keyed out through the first two couplets would not), while still enabling the new, better characterized species to be recognized.
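The procedure in these hypothetical examples (running a poorly known species through a key as far as its characters allow, then listing every species that remains possible below the couplet where it stops) can be sketched as a walk over a dichotomous key. The key, characters, and species names below are hypothetical and only mirror the Species A and Species B examples; real keys often use compound character statements that a single lookup would not capture.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A lead points either to another couplet (int) or to a terminal species name (str).
Lead = Union[int, str]

# Hypothetical dichotomous key: couplet number -> (character, {state: lead}).
KEY: Dict[int, Tuple[str, Dict[str, Lead]]] = {
    1: ("leg colour", {"yellow": 2, "brown or black": "species F"}),
    2: ("propodeum", {"smooth": "species G", "sculptured": 3}),
    3: ("character X", {"state 1": "species C", "state 2": 4}),
    4: ("character Y", {"state 1": "species D", "state 2": "species E"}),
}


def terminals(lead: Lead) -> Set[str]:
    """All species that can be reached from a given lead."""
    if isinstance(lead, str):
        return {lead}
    _, leads = KEY[lead]
    found: Set[str] = set()
    for nxt in leads.values():
        found |= terminals(nxt)
    return found


def run_key(observed: Dict[str, str], start: int = 1) -> Tuple[Optional[int], Set[str]]:
    """Run a (possibly incomplete) set of observations through the key.

    Returns the couplet where the run stopped (None if a species was reached)
    and the species the name could potentially apply to.
    """
    lead: Lead = start
    while isinstance(lead, int):
        character, leads = KEY[lead]
        state = observed.get(character)
        if state not in leads:
            # Character cannot be assessed: the name may apply to any species below here.
            return lead, terminals(lead)
        lead = leads[state]
    return None, {lead}
```

With `run_key({"leg colour": "yellow", "propodeum": "sculptured"})` the run stops at couplet 3 and returns species C, D, and E, the situation described for Species B; with an empty set of observations, as for Species A, it stops at couplet 1 and every species in the key remains possible.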
In these two hypothetical cases, the previously described species are not ignored, even if their status can never be properly assessed. Thus, the new taxonomic revision would bring together all available information, including presenting the shortcomings and gaps in our current knowledge of some species.
The estimated time needed for dealing with previously described species is, conservatively, 2 hours per species, though it will depend on all factors discussed above.
The sum of all the time estimates above yields a total of 15 hours per species. That is roughly two working days per species, or 2.5 species per week. Rounding down to 2 species per week and assuming 50 working weeks per year, one arrives at an estimate of 100 new species described in one full-time year of work by a turbo taxonomy practitioner.
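The arithmetic behind the 100 species/year figure can be laid out explicitly. This sketch uses the per-task hours given in the workflow above; the 37.5-hour working week (five 7.5-hour days) is an assumption, chosen because it reproduces the “roughly two days per species” and “2.5 species per week” figures, since the text does not state the length of the working week.

```python
import math

# Per-species hours for each task, as estimated in the workflow above.
HOURS_PER_SPECIES = {
    "morphology": 2 + 1 + 1 + 1,        # description, photographs, plate, key
    "molecular": 3 + 2,                 # specimen preparation, barcode analysis
    "ecology/ethology": 1,
    "distribution": 1,
    "specimen details": 1,
    "previously described species": 2,
}


def annual_output(hours_per_week: float = 37.5, weeks_per_year: int = 50):
    """Return (hours per species, species per week, species per year)."""
    total = sum(HOURS_PER_SPECIES.values())
    per_week = hours_per_week / total
    # Round down to whole species per week before scaling to a year, as in the text.
    per_year = math.floor(per_week) * weeks_per_year
    return total, per_week, per_year
```

With the defaults, `annual_output()` returns `(15, 2.5, 100)`. Changing `hours_per_week` shows how sensitive the annual figure is to the time actually available for revisionary work.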
However, how accurate is this estimate? Are there examples of this in the real world, or is the above just a theoretical, futile exercise?
It is difficult to get actual data from previous turbo taxonomy papers on the time it took to complete the work, because this is rarely (if ever) stated by the author(s). But some information is available, and some can be inferred.
I have no exact knowledge of how much time it took
Many of the other larger papers listed in Table
Fortunately, I can provide a more accurate estimate for my own work revising Apanteles (Braconidae) in Mesoamerica (
Another factor to consider is that a rate of 100 species/year can only be achieved by treating species “in bulk”, i.e., if the intended revision includes many new species. But not all taxonomic groups have hundreds of undescribed species, and a taxonomic revision of “just” a dozen species would not be as time-efficient. Furthermore, most people cannot spend 100% of their time doing taxonomic revisions. Even Ph.D. students have other things to do than just taxonomic revisions! Thus, a rate of 100 species/year is, in my opinion, a very high and somewhat unfair standard to expect, much less to meet consistently year after year, at least with current technology.
However, regardless of the actual time used for any taxonomic revision, efficiencies can be realized, such as including brief descriptions instead of traditional, longer, and more comprehensive ones, as proposed above. Going back to the real-world example of my own Apanteles paper, for that work I measured and scored 49 morphological characters (altogether more than 15,000 measurements). Many of those characters ultimately proved uninformative for distinguishing species, being repetitive, too variable, or too subjective or complex to assess. In retrospect, the keys were also unnecessarily long, and some species were almost impossible to tell apart based on the keys alone (Eduardo Shimbori, pers. comm.). Looking back, eight years after I completed that paper in 2013, I see many inefficiencies in my work, and much superfluous data that could have been eliminated. Had I chosen fewer morphological characters and simplified the keys, the work could have been completed more quickly without diminishing its final quality. Had I adopted an approach similar to my proposed “cookbook recipe” above, the species would have been recognized mostly by DNA and host data, and the keys would have been constructed to serve a more basic and limited function than what I had intended, while still retaining some utility to recognize basic species groups. Of course, one could argue that the potential value of any character cannot be known until it has been analyzed. One cannot know that there are “x” useful characters, and what they are, prior to studying them. This is what research is all about. Perhaps the “useless” time spent on some measurements is actually an example of what is necessary and a part of all taxonomic revisions, unless morphological features are completely ignored.
One example of how work can be reduced and made faster but still retain value is the case of the Apanteles leucostigmus species group, which comprises 39 species and is, by far, the largest and most difficult group of Apanteles to recognize and separate species in Mesoamerica. The key from
Details of the key to the Apanteles leucostigmus species group as it appeared in
The above example, which I chose because it was the most difficult and problematic group of the Apanteles revision, illustrates how a mostly-but-not-only DNA based paper could be constructed in a more time-effective way. Other Apanteles groups from that
I do not pretend that my suggestions above will “solve” the problem of describing millions of additional species in a short period of time. Even a “fast” pace of 100 species/year per taxonomist would still take a few hundred years to finish the task, a luxury we cannot afford, or would require a significant increase in the number of professional taxonomists (an unlikely scenario). There is no easy or simple answer to the necessity (and urgency!) of accelerating taxonomic inventories. My opinion is that it will require wide adoption of current and future technological advances, but also some consensus-building within the taxonomic community on how to move forward, and perhaps even broader involvement of citizen science. The present paper must be seen only as a modest attempt to provide some alternatives, even if insufficient. For some different perspectives and opinions on these topics, I recommend reading what the reviewers of the present paper had to say (Suppl. material
It is very telling to see how many strong reactions a single paper has awakened in just a few months after its publication (or two papers, if we account for
The authors cited in the previous paragraph have discussed, in a more coherent, compelling, and convincing way than I probably could, the dangers and shortcomings of approaches such as those of
I want to acknowledge the reviewers Marko Mutanen (University of Oulu, Oulu, Finland), Stefan Schmidt (Bavarian State Collection, Munich, Germany), Brian V. Brown (Natural History Museum of Los Angeles County, Los Angeles, USA), and Istvan Miko (University of New Hampshire, Durham, USA), as well as the subject editor Lyubo Penev (Pensoft), for providing additional ideas and suggestions that not only improved the final version of this paper but also contributed significantly to the wider dialogue about taxonomy and the biodiversity crisis. I am also very grateful to John Huber and Gary Gibson (Canadian National Collection of Insects, Ottawa, Canada) for their excellent reviews of earlier versions of the manuscript. This work was supported by Project J-002276 “Systematics of beneficial arthropods in support of resilient agroecosystems”, from Agriculture and Agri-Food Canada.
Comments from the reviewers
Data type: docx file
Explanation note: Comments from the reviewers of the manuscript, ordered as they were submitted. All reviewers agreed to be named in this supplementary file.