Research Article
Corresponding authors: Ge-Xia Qiao (qiaogx@ioz.ac.cn), Jun Chen (chenj@ioz.ac.cn). Academic editor: Maria Elina Bichuette
© 2024 Hao Meng, Yingnan Wang, Ge-Xia Qiao, Jun Chen.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Meng H, Wang Y, Qiao G-X, Chen J (2024) Mitochondrial genome data provide insights into the phylogenetic relationships within Triplophysa dalaica (Kessler, 1876) (Cypriniformes, Nemacheilidae). ZooKeys 1197: 43-55. https://doi.org/10.3897/zookeys.1197.116342
Owing to the detrimental effect of formaldehyde on DNA, ethanol has replaced formalin as the primary preservative for animal specimens. However, specimens may still undergo short-term formalin fixation during field collection. An increasing number of studies have successfully extracted and sequenced DNA from formalin-fixed specimens. Here, DNA was extracted from five specimens of Triplophysa dalaica (Kessler, 1876) and subjected to high-throughput sequencing. Four of the specimens underwent short-term fixation in formalin and were subsequently transferred to ethanol; one was stored continuously in ethanol. No significant differences in DNA quality or quantity were observed among these samples. Following assembly and annotation, five mitochondrial genomes ranging in length from 16,569 to 16,572 bp were obtained. Previously published data from other individuals and species were also included in the phylogenetic analyses. In the reconstructed trees, all eight individuals of T. dalaica form a monophyletic group within the Triplophysa branch. The group is divided into three clades: (1) samples from the Yellow River, (2) those from the Yangtze River, and (3) those from the Haihe River and the Lake Dali Nur. This study sheds initial light on the phylogeographic relationships among populations of T. dalaica and will support future research on its evolutionary history.
High-throughput sequencing, mitogenome assembly, phylogeny, stone loach
Biological specimens preserved in museums represent a reservoir of valuable data. They provide information required in various fields such as taxonomy, geographic distribution, population dynamics, and climate change (
Formalin, an aqueous solution of formaldehyde, has been widely used as a preservative for specimens of invertebrates, fish, amphibians, and reptiles. However, it is a formidable challenge to extract hDNA from formalin-fixed specimens due to formalin’s propensity to induce three forms of DNA damage in specimens: (1) fragmentation, (2) cross-linking between DNA and protein molecules, and (3) modification of DNA bases (
Owing to the deleterious impact of formaldehyde on genetic material and its inherent toxicity, ethanol has become the preferred preservative in place of formalin. An increasing number of museums have transferred historical specimens preserved in formalin to ethanol. Moreover, because of limitations during fieldwork, specimens may undergo temporary fixation in formalin and later be transferred to ethanol for preservation. This study attempts to process such formalin-to-ethanol samples and subject them to high-throughput sequencing (HTS).
Triplophysa is one of the most diverse genera within the family Nemacheilidae, with over 140 documented species in FishBase (
Within the genus, Triplophysa dalaica (Kessler, 1876) is endemic to China. It was originally described from specimens collected in the Lake Dali Nur, an alkaline lake located in Inner Mongolia, China (43.38°N, 116.72°43′E). In addition to the Lake Dali Nur and its surrounding lakes and rivers, T. dalaica is also distributed in freshwater systems such as the Yellow River (
The current study involves five mitogenomes obtained through HTS from T. dalaica specimens and three T. dalaica mitogenomes that were either published or assembled from released HTS data, along with 16 published mitogenomes of other species as outgroups. The analyses aim to confirm the taxonomic status of T. dalaica and to reconstruct the phylogenetic relationships among populations from different geographic origins.
In this study, five specimens of Triplophysa dalaica were selected from the National Animal Collection Resource Center, representing individuals originating from three distinct rivers (Fig.
ID | Location | Water System | Data Acc. No. | Source |
---|---|---|---|---|
YeR1 | 34.66°N, 107.04°E | Yellow River | OR857523 | This study |
YeR2 | 34.94°N, 106.72°E | Yellow River | OR857524 | This study |
YaR1 | 33.85°N, 107.46°E | Yangtze River | OR857525 | This study |
YaR2 | 33.85°N, 107.46°E | Yangtze River | OR857526 | This study |
HaR3 | 40.32°N, 113.30°E | Haihe River | OR857527 | This study |
HaR1 | Hebei Province | Haihe River | KY945353 | Submitted by Feng et al. |
HaR2 | 35.91°N, 113.86°E | Haihe River | SRX8097844 | |
LDN1 | 43.38°N, 116.66°E | Lake Dali Nur | SRX8097848 | |
To minimize the potential contamination, the entire DNA extraction process was conducted in a laboratory that had not previously been exposed to fish samples. Fin clips, approximately 5 mm in length from the tip of the right pectoral fin, were utilized. For formalin-fixed samples, a modified version of the protocol outlined by a previous study (
High-throughput sequencing (HTS) was conducted on an Illumina platform with a PE150 strategy at Berry Genomics (Beijing, China) and Novogene Bioinformatics Technology Co., Ltd (Beijing, China). In addition to the five individuals sequenced for this study, HTS data for two T. dalaica individuals from the Lake Dali Nur and the Haihe River (Fig.
The reads were subjected to quality control using fastp v. 0.23.4 (
The reads were mapped to the reference mitogenome of T. dalaica in NCBI (accession number KY945353) using Geneious v. 9.1.8 (Biomatters Ltd, Auckland, New Zealand) at medium-low sensitivity with 10 iterations. To eliminate nuclear mitochondrial DNA segments (NUMTs), the mapped reads were then assembled de novo using the Geneious assembler. This approach yielded a contig approximately 16 kb in length. Both ends of the contig were subsequently inspected manually and joined, resulting in the circular mitochondrial genome.
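The end-joining step above was carried out by manual inspection in Geneious. Purely as an illustration of the underlying idea, the following Python sketch looks for a stretch of sequence repeated at both ends of a linear contig and trims one copy so that the sequence represents a single turn of the circle. The function names and the `min_overlap` threshold are hypothetical and are not part of the original workflow.

```python
def find_terminal_overlap(contig: str, min_overlap: int = 20) -> int:
    """Return the length of the longest suffix of `contig` that equals its prefix.

    Only overlaps of at least `min_overlap` bases and at most half the contig
    length are considered. Returns 0 if none is found.
    """
    max_len = len(contig) // 2
    for k in range(max_len, min_overlap - 1, -1):
        if contig[-k:] == contig[:k]:
            return k
    return 0


def circularize(contig: str, min_overlap: int = 20) -> str:
    """Trim the duplicated end so the sequence covers the circle exactly once."""
    k = find_terminal_overlap(contig, min_overlap)
    if k == 0:
        raise ValueError("no terminal overlap found; contig may be incomplete")
    return contig[:-k]
```

An exact-match search like this is a simplification: real contig ends may contain sequencing errors, which is one reason manual inspection remains a sensible choice.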
Following the assembly, the mitogenome was annotated on GeSeq (
In addition to the five mitogenomes obtained in our study and the two assembled from published HTS data, an additional mitogenome of T. dalaica from the Haihe River (Fig.
The sequences of the 13 protein-coding genes (PCGs), excluding stop codons, were extracted from the mitogenomes and aligned based on their translated amino acid sequences with Geneious v. 9.1.8. These alignments were concatenated with indels preserved, resulting in a total alignment length of 11,427 bp. Subsequently, the optimal partitioning scheme and substitution model for different genes and codon positions were determined by PartitionFinder 2 (
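The translation-guided alignment was performed in Geneious. As a minimal sketch of the general technique, assuming each coding sequence has already been translated and the proteins aligned, the Python functions below thread each unaligned coding sequence back onto its aligned protein (three bases per residue, three gap characters per protein gap) and then concatenate the per-gene codon alignments. The function names are hypothetical.

```python
def backtranslate_alignment(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS (no stop codon) onto its aligned protein."""
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length must be three times the residue count")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Each protein gap becomes a three-base gap; each residue consumes one codon.
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


def concatenate(gene_alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene codon alignments in order; every gene must have the same taxa."""
    taxa = list(gene_alignments[0])
    return {t: "".join(gene[t] for gene in gene_alignments) for t in taxa}
```

Because gaps are always inserted in multiples of three, codon positions stay in frame across the concatenated alignment, which is what makes partitioning by codon position possible afterwards.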
The phylogenetic analysis was performed using both maximum-likelihood (ML) and Bayesian-inference (BI) methods based on the best-fitting partition strategy. For the ML analysis, 1,000 rapid bootstrap replicates were conducted with the GTR+I+G substitution model to assess the support values using RAxML v. 8.2.12 (
Using the alignments of the 13 PCGs, the divergence times among the three main clades of T. dalaica were estimated with a Markov chain Monte Carlo (MCMC) approach in BEAST v. 2.7.6 (
Because there is no solid fossil record of Triplophysa, a fossil of the genus Cobitis was used as time calibration for the tree (Suppl. material
DNA was successfully extracted from the five specimens of Triplophysa dalaica. Agarose gel electrophoresis and fragment analysis on an Agilent 5400 revealed varying degrees of degradation in the DNA from all of these samples, with most fragments shorter than 4 kb. When nearly equal amounts of fin tissue were used, the DNA concentrations obtained from all five samples fell within the range of 10–30 ng/μL (Suppl. material
High-throughput sequencing of the five samples yielded an average of approximately 5 Gb of data per sample (Suppl. material
The above eight mitochondrial genomes displayed an average base composition of A: 28.13%, T: 28.14%, G: 18.04%, C: 25.63%, and the GC content was 43.67%. The compositions did not differ significantly among genomes (Chi-squared test, df = 6, p = 0.634). Following de novo assembly, seven mitochondrial genomes were annotated with GeSeq. Each of these mitochondrial genomes included 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes, 13 protein-coding genes (PCGs), and one non-coding control region. The 13 PCGs in the eight T. dalaica mitochondrial genomes spanned 11,421 bp (including stop codons), encoding a total of 3,800 amino acids. Among these, 1,754 nucleotide and 149 amino acid variable sites were observed. All nucleotide mutations were substitutions, and no insertions or deletions were detected (Suppl. material
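The summary statistics in this paragraph follow from simple counting; for instance, the reported GC content is the sum of the G and C percentages (18.04% + 25.63% = 43.67%). The Python sketch below shows one way such figures could be computed from sequences and from an alignment. It is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, and ambiguous bases and gaps are simply ignored.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, T, G and C among unambiguous positions."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def gc_content(seq: str) -> float:
    """GC content in percent, as the sum of the G and C percentages."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def count_variable_sites(alignment: list[str]) -> int:
    """Count alignment columns with more than one distinct state.

    Gaps and the ambiguity symbols N and ? are not treated as states.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    variable = 0
    for column in zip(*(s.upper() for s in alignment)):
        states = {c for c in column if c not in "-N?"}
        if len(states) > 1:
            variable += 1
    return variable
```

The same column-wise count applies to amino acid alignments if the translated sequences are passed in instead of nucleotides.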
The coding sequences (CDS), excluding stop codons, of the 13 mitochondrial PCGs from these eight T. dalaica individuals, along with 13 other Triplophysa species, two Cobitidae species, and one Gastromyzontidae species, yielded an alignment 11,427 bp in length. This alignment was divided into 39 user-defined partitions based on the three codon positions of each of the 13 PCGs. The best scheme consisted of four partitions, each with its own best-fitting substitution model, according to PartitionFinder 2 (Suppl. material
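The 39 starting partitions are simply 13 genes multiplied by three codon positions. As a hedged sketch, the Python function below generates such blocks from a list of gene names and in-frame lengths, written in the `name = start-end\3;` style that PartitionFinder data blocks use as far as we are aware. The function name is hypothetical, and the per-gene lengths are not given in the text, so any values passed in are placeholders.

```python
def codon_position_blocks(genes: list[tuple[str, int]]) -> list[str]:
    """Build one data block per codon position for consecutive genes.

    `genes` holds (name, aligned_length) pairs in concatenation order;
    every length must be a multiple of three so the reading frame is kept.
    """
    blocks = []
    start = 1  # 1-based coordinate of the first column of the current gene
    for name, length in genes:
        if length <= 0 or length % 3 != 0:
            raise ValueError(f"{name}: length must be a positive multiple of 3")
        end = start + length - 1
        for pos in range(3):
            # Every third column, starting at codon position pos + 1.
            blocks.append(f"{name}_pos{pos + 1} = {start + pos}-{end}\\3;")
        start = end + 1
    return blocks
```

Calling the function with 13 (name, length) pairs returns 39 blocks, matching the number of user-defined partitions described above.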
Subsequently, employing the best-fitting partitioning scheme, phylogenetic trees for all 24 taxa were reconstructed using both RAxML (with a unified GTR + I + G model) and MrBayes (with individual partition-specific best-fitting substitution models). The resulting majority rule consensus trees (Fig.
A. The majority rule consensus tree constructed using MrBayes based on the CDS of 13 mitochondrial PCGs (excluding stop codons) of eight Triplophysa dalaica individuals and outgroup species, totaling 11,427 bp. The topology of the tree closely resembles that constructed by RAxML. Posterior probabilities (from MrBayes) and bootstrap values (from RAxML) for branches are depicted as two different colored rectangles, one above the other. B. Details of the clade containing the eight T. dalaica individuals in the phylogenetic tree. Numerical values on branches represent posterior probabilities and bootstrap values, respectively. The dashes represent values less than 50. The red dots indicate divergence times estimated by the MCMC approach with 95% HPD.
Within the branch of T. dalaica (Fig.
The divergence time between T. dalaica and its sister clade containing T. dorsalis and T. stoliczkai was estimated at 11.35 Ma (95% HPD: 8.75–14.6 Ma; Suppl. material
In this study, DNA was extracted from four samples that underwent short-term formalin fixation and one sample continuously preserved in ethanol. Notably, there were no significant differences in DNA concentration or fragmentation among these samples, and mitogenomic sequences were successfully assembled. This suggests that short-term formalin fixation (for around 30 days) may not significantly contribute to DNA degradation. Therefore, when ethanol is unavailable in the field because of acquisition or transportation constraints, short-term formalin fixation may be considered an acceptable approach. However, prolonged immersion in formalin would lead to irreversible DNA damage. (
Triplophysa dalaica has been reported to inhabit the Lake Dali Nur, surrounding rivers and lakes of Inner Mongolia, as well as the Yellow River, the Haihe River, and the Yangtze River. The geographic sources of the samples analyzed in this study encompass all four of the mentioned regions. All T. dalaica individuals form a monophyletic branch. This suggests that these eight individuals represent the same taxon, which belongs to the genus Triplophysa. Within the clade, the individuals from the Yangtze River (YaR1 and YaR2) and the Yellow River (YeR1 and YeR2) form two well-supported clusters, signifying the genetic distinctiveness of these two populations. They exhibit a sister relationship, suggesting that their most recent common ancestor diverged from clade III. Additionally, the sampling sites of the Yangtze and the Yellow Rivers are geographically adjacent, with a straight-line distance of approximately 100 km on the map. This spatial proximity is consistent with their smaller genetic distance.
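The straight-line distance quoted above can be checked against the sampling coordinates listed in the specimen table. The Python sketch below uses the haversine formula for great-circle distance; the coordinates are taken from the table, while the function name, the `SITES` dictionary, and the choice of a spherical Earth with a mean radius of 6371.0088 km are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

# Decimal-degree coordinates (latitude N, longitude E) from the specimen table.
SITES = {
    "YeR1": (34.66, 107.04),
    "YeR2": (34.94, 106.72),
    "YaR1": (33.85, 107.46),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    for yellow in ("YeR1", "YeR2"):
        d = haversine_km(*SITES[yellow], *SITES["YaR1"])
        print(f"{yellow} to YaR1: {d:.0f} km")
```

With these coordinates, the YeR1 and YaR1 sites come out on the order of 100 km apart, in line with the figure given in the text, and YeR2 lies somewhat farther from the Yangtze site.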
Clade III includes individuals from the Haihe River (HaR1, HaR2, and HaR3) and the Lake Dali Nur (LDN1), indicating close relationships among these individuals. The divergence-time estimation with mitochondrial PCGs also suggested that they differentiated recently. However, a prior study used the software PSMC to analyze the demographic history of two of these individuals from whole-genome resequencing data, and the change in effective population size implied that populations from the Lake Dali Nur and the Haihe River might have diverged approximately 1 million years ago (
In the divergence-time analysis, these three clades were estimated to have diverged 7–10 Ma. The divergence is relatively deep compared to the intraspecific differentiation in other Triplophysa species (
We thank the collectors of the specimens, Drs Yahui Zhao, Yingchun Xing, Haibo Liu, Xuejian Li, Chengyi Niu, and Jie Bai, who also provided information on specimen collection and preservation. Yongqiang Wang, Zhiyun Chen, Huanshan Wang, Bo Cai, and other staff of museums and institutes were interviewed about the current status of wet specimen preservation. We thank Dr Baocheng Guo and his group for valuable suggestions on this study.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This research was supported by the National Science & Technology Fundamental Resources Investigation Program of China (grant no. 2019FY101800), and the National Animal Collection Resource Center, China.
Conceptualization: HM. Data curation: HM. Formal analysis: HM, YW. Funding acquisition: JC, GXQ. Investigation: HM. Methodology: HM, YW. Project administration: JC. Resources: GXQ, YW, JC. Supervision: JC. Visualization: HM. Writing – original draft: HM. Writing – review and editing: YW, GXQ, JC.
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Supplementary data
Data type: pdf
Explanation note: table S1. Collecting and extraction information of the five Triplophysa dalaica samples; table S2. Summary of reads used for the mitochondrial genome assembly; table S3. Summary of the published mitogenomes used in phylogenetic analyses; table S4. Variable sites of the 13 PCGs in eight T. dalaica mitochondrial genomes; table S5. The best partition scheme and substitution model by PartitionFinder 2; figure S1. Divergence time among mitogenomes in this study with MCMC approach.