Research Article
Corresponding author: Jacqueline Heckenhauer (jacqueline.heckenhauer@senckenberg.de)
Academic editor: Ana Previšić
© 2023 Jacqueline Heckenhauer, Ernesto Razuri-Gonzales, Francois Ngera Mwangi, Julio Schneider, Steffen U. Pauls.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Heckenhauer J, Razuri-Gonzales E, Mwangi FN, Schneider J, Pauls SU (2023) Holotype sequencing of Silvatares holzenthali Rázuri-Gonzales, Ngera & Pauls, 2022 (Trichoptera, Pisuliidae). ZooKeys 1159: 1-15. https://doi.org/10.3897/zookeys.1159.98439
While DNA barcodes are increasingly provided in descriptions of new species, whole mitochondrial and nuclear genomes are still rarely included. This is unfortunate because whole genome sequencing of holotypes allows perpetual genetic characterization of the most representative specimen for a given species. Thus, de novo genomes are invaluable additional diagnostic characters in species descriptions, provided the structural integrity of the holotype specimens remains intact. Here, we used a minimally invasive method to extract DNA from the holotype of the recently described caddisfly species Silvatares holzenthali Rázuri-Gonzales, Ngera & Pauls, 2022 (Trichoptera: Pisuliidae) from the Democratic Republic of the Congo. A low-cost next-generation sequencing strategy was used to generate the complete mitochondrial genome and a draft nuclear genome of the holotype. In their current form, the data are an important extension of the morphological species description and are valuable for phylogenomic studies.
Caddisflies, extended specimen, holotype genomics, taxonomy
In zoology, especially when considering invertebrates, new species are often not recognized as such in the field due to the minute size of the structures used to differentiate them from already described species. Intensive treatment (e.g., preparation and preservation) and careful examination of the collected specimens are required to determine if they are indeed undescribed. In addition, many new species are discovered in regions of the world where the scientific infrastructure is insufficient to guarantee high-quality, unfragmented DNA in collected specimens. Such was also the case for the holotype of Silvatares holzenthali Rázuri-Gonzales, Ngera & Pauls, 2022 (Trichoptera: Pisuliidae) (
The holotype specimen (SMFTRI00018633) was collected by FNM in the eastern D.R. Congo in 2017 and preserved in locally produced 80% ethanol. By the time the specimen was identified as representing a new species, it had been transferred into new ethanol, examined multiple times under the stereoscope, and shipped between countries. Without the possibility of cooling the preservative or the specimen in the D.R. Congo, it was clear that the DNA of this specimen would be inferior to what could be extracted from a freshly caught caddisfly specimen preserved in high-quality ethanol and with uninterrupted cooling. However, the scenario described for the holotype of S. holzenthali is the norm rather than the exception. In this paper, we show that it is both possible and valuable to generate a genomic resource from holotypes, even if the quality of the starting DNA is far from ideal.
Many initiatives are currently trying to harness recent technological developments to sequence and produce reference genomes for all species on Earth (
Another limitation of many genome sequencing initiatives is that they generally do not focus on the holotype of a species. However, in the currently accepted type-based taxonomy, the holotype (or, if necessary, the designated lectotype or neotype) serves as a species’ reference. For many species, sequencing a reference genome from the holotype is not a viable option. Many type specimens are old, and all type specimens are rare and of singular value, requiring special care and non-invasive DNA extraction methods for genome sequencing. Thus, reference genome sequencing initiatives that require ample amounts of high-quality DNA for long-read sequencing technologies are logically and correctly focused on less valuable specimens, ideally from the locus typicus or from a paratype. Nevertheless, sequencing the holotype of a species allows for the genetic characterization of the most representative specimen for a given species as a permanent digital reference. Here we show that using a minimally invasive method to extract DNA from poorly preserved specimens allows taxonomists to capture and present the genetic characterization of the holotype while maintaining most of its morphological and structural integrity.
Genomic DNA was extracted from two legs as described in
We used two different approaches to estimate the genome size. First, we used a k-mer distribution-based method. For this, k-mers were counted with JELLYFISH v.2.3.0 (
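The idea behind a k-mer distribution-based estimate can be illustrated with a minimal Python sketch. It assumes a two-column histogram (k-mer multiplicity, number of distinct k-mers), which is the layout a k-mer counter such as JELLYFISH can export, and it uses the simplest textbook estimator: total k-mers divided by the depth of the main peak. The function names, the error cutoff of 3, and the toy histogram are hypothetical, and this is not necessarily the model applied in this study.

```python
def parse_histogram(text):
    """Parse 'multiplicity count' lines into a list of (multiplicity, count) tuples."""
    pairs = []
    for line in text.strip().splitlines():
        multiplicity, count = line.split()
        pairs.append((int(multiplicity), int(count)))
    return pairs


def estimate_genome_size(histogram, min_multiplicity=3):
    """Return (genome size, peak depth) from a k-mer histogram.

    K-mers seen fewer than `min_multiplicity` times are treated as
    sequencing errors and ignored (the cutoff is an assumption).
    """
    kept = [(m, c) for m, c in histogram if m >= min_multiplicity]
    if not kept:
        raise ValueError("no k-mers left after filtering")
    # Depth of the main peak: the multiplicity with the most distinct k-mers.
    peak_depth = max(kept, key=lambda pair: pair[1])[0]
    # Total number of k-mer occurrences in the retained part of the histogram.
    total_kmers = sum(m * c for m, c in kept)
    return total_kmers / peak_depth, peak_depth


# Toy histogram, invented for illustration only.
toy = """
1 5000
2 800
3 50
4 100
5 200
6 100
7 50
"""
size, peak = estimate_genome_size(parse_histogram(toy))
print(f"peak depth: {peak}, estimated genome size: {size:.0f} bp")
```

Model-based tools additionally fit heterozygous and repeat peaks, which this sketch deliberately ignores.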
The mitochondrial genome was first assembled with the raw reads using NOVOplasty v.4.2 (
Nuclear genome assembly was conducted in Spades v.3.14.1 (
Illumina sequencing yielded 160 534 832 raw short reads, totaling 24.1 Gb of data. Of these reads, 3.3% were identified as contaminated (2.7% Homo sapiens, 0.6% bacteria, 0.1% viruses, 0.03% other). Over-represented k-mers were successfully removed using autotrim.pl v.0.6.1 (Fig.
K-mer analysis based on raw read data estimated the genome size to be 531.15 Mb, with a heterozygosity of 37.7% (Fig.
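As a rough consistency check, the sequencing depth implied by the reported figures can be computed by dividing the raw data amount by the estimated genome size. The following minimal Python sketch uses the 24.1 Gb, 160 534 832 reads, and 531.15 Mb stated in the text; it assumes 1 Gb = 10^9 bases, and the function name is hypothetical. Because the data amount is rounded, the derived values are approximate.

```python
RAW_BASES = 24.1e9        # raw data amount as reported (24.1 Gb, rounded)
RAW_READS = 160_534_832   # number of raw short reads as reported
GENOME_SIZE = 531.15e6    # k-mer-based genome size estimate (531.15 Mb)


def sequencing_depth(total_bases, genome_size):
    """Average number of times each genome position is covered by raw data."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


depth = sequencing_depth(RAW_BASES, GENOME_SIZE)
mean_read_length = RAW_BASES / RAW_READS  # derived here, not stated in the text

print(f"approximate depth: {depth:.1f}x")
print(f"implied mean read length: {mean_read_length:.0f} bp")
```

The result is a depth of about 45×, before any reads are removed by contamination filtering or trimming.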
The NOVOplasty assembly resulted in a 17 205 bp-long and circularized contig (Fig.
Standard abbreviations are given for protein-coding (yellow), transfer (pink), and ribosomal RNA (red) genes. The control region is shown in gray. Orientation of genes is indicated by direction of arrows.
The nuclear genome assembly contains 298 265 scaffolds with a total length of 534.50 Mb, an N50 of 2 549 bp, and a GC content of 35.27%. Of the reads, 99.07% mapped back to the assembly. The BUSCO search with 2 124 Endopterygota orthologs resulted in 74.7% BUSCOs; of these, 44.7% were complete (44.3% single, 0.4% duplicated), and 31% were fragmented. Blobtools detected no contamination based on GC content and coverage distribution (Fig.
Taxon-annotated GC-coverage (TAGC) plots for the nuclear genome assembly. Scaffolds are represented with circles. Colors indicate the best match to the corresponding taxonomic annotation (grey= no hits, blue= Arthropoda, for other colors see legend in the figure, upper right box). The distribution of the total span (kb) of contigs for a given GC proportion or coverage is given in the upper- and right panels, respectively.
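The summary statistics reported for the nuclear assembly (total length, N50, GC content) follow standard definitions, which the minimal Python sketch below illustrates. The N50 is the length of the scaffold at which the cumulative length, summed from the longest scaffold downward, first reaches half of the total assembly length. The function names and the toy scaffolds are hypothetical; this is not the software used in this study.

```python
def n50(lengths):
    """Length of the scaffold at which the running total reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffolds given")


def gc_percent(sequences):
    """GC content in percent, counting only unambiguous bases (N is ignored)."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Toy scaffolds, invented for illustration only.
scaffolds = ["ATGCATGCAT", "GGCCAATT", "ATATNN", "GC"]
lengths = [len(s) for s in scaffolds]

print(f"total length: {sum(lengths)} bp")
print(f"N50: {n50(lengths)} bp")
print(f"GC content: {gc_percent(scaffolds):.2f}%")
```

For a real assembly, the scaffold sequences would be read from the assembly FASTA file rather than written inline.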
While the morphology of the genus Silvatares has been described extensively, fewer than a handful of partial genes have been published or uploaded to NCBI GenBank. For example, cadherin, cytochrome oxidase subunit 1 (COX1), and the 28S large subunit and 18S small subunit ribosomal RNA are available for S. ensifera (MN364796, KX291165, KX106901, AF436522, AF436172, AF436293, MN296628, AF436410); carbamoylphosphate synthase domain protein, isocitrate dehydrogenase, RNA polymerase II, and COX1 for Silvatares sp. (KC559510, KC559654, KC559734, KC559575); COX1 for S. collyrifer Barnard, 1934 (KX291056); and COX1 for S. thrymmifer Barnard, 1934 (MN344469; MN344493) (
Here, we present the mitogenome and a draft nuclear genome assembly based on ~45× sequencing coverage of short-read data. This genome assembly is admittedly far from the quality standards of a reference genome; however, we argue that this genomic resource is still an invaluable addition to the characterization of the holotype of Silvatares holzenthali. The genome assembly reported in this study includes all the partial genes that had hitherto been sequenced for other Silvatares species, as well as 74.7% of the 2 124 benchmarking universal single-copy orthologs in Endopterygota. Additionally, this assembly provides the complete mitogenome, including the barcode markers. This highlights that for a few hundred dollars we can produce much more genomic information on type specimens than the “DNA barcode,” which has already become an important addition to many morphological species descriptions (e.g.,
In other taxa, holotype genomes have already been published. In a pioneering study,
The draft genome assembly generated in this study can be applied in population genetic studies, for example, to assess the heterozygosity of the type specimen as a proxy for population genetic variation at the time of sampling (
While we think a de novo genome or a genomic resource of any kind is an invaluable added resource and an important additional diagnostic character in species descriptions, priority should be given to preserving the integrity of type specimens. Most methods for extracting DNA ultimately cause at least minimal structural damage to the holotype. In our case, this damage (removing and clearing the abdomen; DNA extraction from two legs) was necessary to recognize and identify the holotype as a new species. No additional damage was done to extract DNA for generating the genome. However, in other situations, the methods used to preserve and store insects may not always allow for generating a de novo genome from holotypes without causing significant additional damage to the type. In such cases, we maintain that priority should be given to safeguarding the structural integrity of the holotype specimen, and that genomic information should instead be obtained, where possible, from a paratype or from a duplicated structure on the holotype.
Field work and taxonomy were funded by a German Science Foundation grant (DFG PA1617/4-1) to SUP. The genome sequencing is a result of the LOEWE Centre for Translational Biodiversity Genomics funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). JH was supported by the LOEWE Centre for Translational Biodiversity Genomics. We thank Tilman Schell (LOEWE-TBG, Frankfurt) for his advice on bioinformatic methods.
Commands used in the study
Heckenhauer, J., Rázuri-Gonzales, E., Mwangi, F.N., Schneider, J., Pauls, S. U. (2022) Holotype sequencing of Silvatares holzenthali (Trichoptera: Pisuliidae)
Data type: Bioinformatic commands
Genomic DNA degradation assessment on a TapeStation 2200
Data type: DNA Extraction quality control