The complete mitochondrial genome of Microphysogobio elongatus (Teleostei, Cyprinidae) and its phylogenetic implications

Abstract
Mitochondria are important eukaryotic organelles that carry their own genetic material. In this study, we sequenced and analyzed the complete mitogenome of a small cyprinid fish, Microphysogobio elongatus (Yao & Yang, 1977). The mitogenome of M. elongatus is a typical circular molecule of 16,612 bp containing 13 protein-coding genes (PCGs), 22 transfer RNA genes, two ribosomal RNA genes, and a 930 bp control region. The base composition of the M. elongatus mitogenome is 30.8% A, 26.1% T, 16.7% G, and 26.4% C. All PCGs use the standard ATG start codon with the exception of COI, which starts with GTG. Six PCGs terminate with complete stop codons, whereas seven PCGs (ND2, COII, ATPase 6, COIII, ND3, ND4, and Cyt b) terminate with incomplete (T or TA) stop codons. All tRNA genes exhibit typical cloverleaf secondary structures with the exception of tRNA Ser(AGY), whose dihydrouridine arm forms a simple loop. The phylogenetic analysis divided the subfamily Gobioninae into three clades with relatively robust support and showed that Microphysogobio is not monophyletic. The complete mitogenome of M. elongatus provides a valuable resource for future studies of the molecular phylogeny and/or population genetics of Microphysogobio.


Introduction
The genus Microphysogobio Mori, 1934, a group of small gudgeons in the subfamily Gobioninae, was originally established by Mori (1934) for M. hsinglungshanensis Mori, 1934 (Sun et al. 2021). Currently, this genus comprises approximately 30 species that are widely distributed in East Asia, including China, Vietnam, Mongolia, Laos, and the Korean Peninsula (Jiang et al. 2012; Huang et al. 2016; Huang et al. 2017). The prominent lip papillae have been considered a diagnostic character for defining the genus Microphysogobio and distinguishing it from other genera of Gobioninae (Yue 1998). Molecular phylogenetic studies have confirmed that the subfamily Gobioninae is monophyletic (Tang et al. 2011; Zhao et al. 2016). However, the phylogenetic relationships of Microphysogobio and related genera have not been fully resolved, and they remain a long-standing issue in the classification of Gobioninae.
The typical vertebrate mitogenome is approximately 15-18 kb in length and consists of 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes, and one non-coding control (D-loop) region (Wolstenholme 1992; Boore 1999). Mitochondrial DNA is small, present in multiple copies, maternally inherited, conserved in its gene products, free of introns, fast evolving, and rarely recombines (Boore 1999; Xiao and Zhang 2000). These properties have made it widely used in species identification, molecular evolution, and phylogenetic studies (Imoto et al. 2013; Sharma et al. 2020). Historically, single mitochondrial markers, such as the Cyt b gene and the D-loop (Wang et al. 2002; He and Chen 2006), were used to study evolutionary relationships. More recently, with advances in sequencing technology and data analysis methods, complete fish mitogenomes have been accumulating in public databases (Miya and Nishida 2000; Miya et al. 2003; Saitoh et al. 2006; Yamanoue et al. 2007; Kim et al. 2009).
Microphysogobio elongatus (Yao & Yang, 1977) is a small, benthic freshwater fish that is widely distributed in China (Yue 1998; Wang 2019). However, little is known about M. elongatus; previous studies have focused on resource surveys and taxonomy (Li et al. 2012; Liu et al. 2013). In this study, we sequenced, annotated, and characterized the complete mitochondrial genome of M. elongatus. Additionally, we reconstructed a mitogenomic phylogeny of Gobioninae comprising 103 species and subspecies, based on the 13 PCGs, to confirm the taxonomic status of M. elongatus and its relationships within Gobioninae.

Ethics statements
No specific permits are required to collect gobionine fishes from public areas. The field collections did not involve endangered or protected species, and the collection site is not a protected area.

Sample collection and DNA extraction
Individuals of M. elongatus were collected from Jiangkou County, Guizhou Province, China (27°46'12"N, 108°46'56"E), in August 2019. The specimens were preserved in 95% ethanol and stored at -20 °C until DNA extraction. Genomic DNA was extracted using a standard high-salt method (Sambrook et al. 1989). DNA integrity was assessed by 1% agarose gel electrophoresis, and DNA concentration and purity were determined using an Epoch 2 Microplate Spectrophotometer (BioTek Instruments, Inc., Vermont, USA).
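Spectrophotometric readings such as those mentioned above are usually converted to concentration and purity with two simple relations. The sketch below assumes the conventional factor of 50 ng/μL per A260 unit for double-stranded DNA (readings normalized to a 1 cm path length) and treats an A260/A280 ratio near 1.8 as a rough purity benchmark; neither the factor nor the example readings come from this study, and the function names are hypothetical.

```python
# Minimal sketch: dsDNA concentration and purity from absorbance readings.
# Assumes readings are blank-corrected and normalized to a 1 cm path length.

DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional approximation for dsDNA


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Return the dsDNA concentration in ng/uL."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return A260/A280; values near 1.8 are commonly taken as protein-free DNA."""
    return a260 / a280


if __name__ == "__main__":
    a260, a280 = 0.50, 0.27  # hypothetical readings of a 1:10 dilution
    print(f"{dsdna_concentration(a260, dilution_factor=10):.0f} ng/uL")
    print(f"A260/A280 = {purity_ratio(a260, a280):.2f}")
```

In practice the plate reader software performs these conversions; the sketch only makes the arithmetic explicit.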

PCR amplification and sequencing
The entire mitogenome of M. elongatus was amplified as overlapping PCR fragments using 14 primer pairs designed with Primer Premier v. 5.0 (Lalitha 2000) from the mitogenome of M. kiatingensis (GenBank accession number NC_037402). The primers used in this study are provided in Suppl. material 1: Table S1. Each PCR was carried out in a total volume of 35 μL containing 17.5 μL of 2×Taq Plus Master-Mix (CoWin Biosciences, Beijing, China), 1 μL of each primer (10 μM), and 1.0 μL of template DNA (100 ng). Cycling conditions were as follows: initial denaturation at 95 °C for 5 min; 35 cycles of 95 °C for 30 s, 42-55 °C for 30 s, and 72 °C for 1-2 min; and a final extension at 72 °C for 10 min. Amplification products were separated by electrophoresis on 1% agarose gels, and fragment lengths were estimated by comparison with the DL2000 DNA marker (TaKaRa, Japan). The PCR products were sequenced on an ABI PRISM 3730 sequencer (Sangon Biotech Co., Ltd, China).
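As a quick check on the reaction recipe above, the volumes and final concentrations can be balanced as follows. This worked example assumes that the remainder of each 35 μL reaction was made up with nuclease-free water, which the text does not state explicitly.

```latex
\begin{aligned}
V_{\mathrm{water}} &= 35 - 17.5 - 2 \times 1 - 1.0 = 14.5~\mu\mathrm{L},\\
c_{\mathrm{primer}} &= 10~\mu\mathrm{M} \times \frac{1~\mu\mathrm{L}}{35~\mu\mathrm{L}} \approx 0.29~\mu\mathrm{M}\ \text{(each primer)},\\
c_{\mathrm{mix}} &= 2\times~\text{(stock)} \cdot \frac{17.5~\mu\mathrm{L}}{35~\mu\mathrm{L}} = 1\times.
\end{aligned}
```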

Mitogenome annotation and sequence analysis
The mitogenome was initially assembled with SeqMan (DNAStar; DNASTAR Inc., Madison, WI, USA) and then manually proofread against the sequencing chromatograms. The assembled sequence was annotated using MitoAnnotator on the MitoFish homepage (Iwasaki et al. 2013). All tRNA genes were identified with the tRNAscan-SE search server (Lowe and Chan 2016) and the MITOS WebServer (Bernt et al. 2013). The base composition, codon usage, and relative synonymous codon usage (RSCU) of all PCGs were calculated using MEGA v. 6.0 (Tamura et al. 2013). Strand asymmetry was calculated using the formulae AT-skew = (A - T) / (A + T) and GC-skew = (G - C) / (G + C) (Perna and Kocher 1995).
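The composition and skew formulas above, together with RSCU, are straightforward to compute directly. The minimal Python sketch below is not MEGA's implementation and its function names are hypothetical; RSCU follows the usual definition (the observed count of a codon divided by the mean count of all codons for the same amino acid), and the caller must supply the synonymous-codon groups, for example from the vertebrate mitochondrial genetic code.

```python
from __future__ import annotations

from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, T, G and C; ambiguous symbols (N, gaps) are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ATGC")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def strand_skews(seq: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) as (A - T)/(A + T) and (G - C)/(G + C)."""
    comp = base_composition(seq)
    at_skew = (comp["A"] - comp["T"]) / (comp["A"] + comp["T"])
    gc_skew = (comp["G"] - comp["C"]) / (comp["G"] + comp["C"])
    return at_skew, gc_skew


def rscu(codon_counts: dict[str, int],
         synonymous_groups: dict[str, list[str]]) -> dict[str, float]:
    """RSCU = observed codon count / mean count of its synonymous codons."""
    values: dict[str, float] = {}
    for codons in synonymous_groups.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        mean = total / len(codons)
        for c in codons:
            # An amino acid that never occurs gets RSCU 0 for all its codons.
            values[c] = codon_counts.get(c, 0) / mean if mean else 0.0
    return values


if __name__ == "__main__":
    print(strand_skews("AAATTGGCCC"))  # A=3, T=2, G=2, C=3 -> (0.2, -0.2)
    # Toy counts for phenylalanine (TTT, TTC); not data from this study.
    print(rscu({"TTT": 30, "TTC": 10}, {"Phe": ["TTT", "TTC"]}))
    # -> {'TTT': 1.5, 'TTC': 0.5}
```

An RSCU above 1 marks a codon used more often than expected under equal synonymous usage, which is how codon-usage bias is usually read from such tables.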

Phylogenetic analysis
For phylogenetic analysis, the mitogenomes of 103 gobionine taxa were downloaded from GenBank, and Acheilognathus omeiensis (NC_037404.1), Rhodeus ocellatus (NC_011211.1), and R. sinensis (NC_022721.1) were used as outgroups. Species used in the analysis are listed in Suppl. material 2: Table S2. The 13 PCGs shared by all mitogenomes were extracted using PhyloSuite v. 1.1.16 (Zhang et al. 2020), aligned separately using MAFFT v. 7.313 (Katoh and Standley 2013), and concatenated into a single matrix. The optimal partitioning scheme and the nucleotide substitution model for each partition were selected with PartitionFinder v. 2.1.1 (Lanfear et al. 2017) using the corrected Akaike information criterion (AICc) and the greedy search algorithm. Bayesian inference (BI) was performed in MrBayes v. 3.2.6 (Ronquist et al. 2012) using the models selected by PartitionFinder. Two independent runs of four Markov chain Monte Carlo (MCMC) chains (one cold and three heated) were run for two million generations, sampling every 100 generations. The first 25% of sampled trees were discarded as burn-in, and a 50% majority-rule consensus tree was constructed from the remainder. Maximum likelihood (ML) analysis was performed using IQ-TREE v. 1.6.8 (Nguyen et al. 2015) with 10,000 ultrafast bootstrap replicates (Minh et al. 2013). All of these programs are integrated into PhyloSuite v. 1.1.16 (Zhang et al. 2020). The phylogenetic trees were visualized using FigTree v. 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
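Concatenating separately aligned genes into one matrix while recording where each gene starts and ends is what makes per-gene partitioning possible in tools such as PartitionFinder and MrBayes. The following Python sketch illustrates that bookkeeping under simple assumptions: every taxon is present in every gene, the per-gene alignments already exist, and the gene and taxon names are placeholders. It is not PhyloSuite's code; the printed charset lines merely follow the NEXUS style.

```python
from __future__ import annotations


def concatenate(alignments: dict[str, dict[str, str]]
                ) -> tuple[dict[str, str], list[tuple[str, int, int]]]:
    """Join per-gene alignments (gene -> {taxon: aligned seq}) into a supermatrix.

    Returns the supermatrix (taxon -> sequence) and a list of
    (gene, start, end) partitions with 1-based inclusive coordinates.
    """
    genes = list(alignments)
    taxa = sorted(alignments[genes[0]])
    pieces: dict[str, list[str]] = {t: [] for t in taxa}
    partitions: list[tuple[str, int, int]] = []
    start = 1
    for gene in genes:
        block = alignments[gene]
        if set(block) != set(taxa):
            raise ValueError(f"taxon sets differ in {gene}")
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene} is not aligned: unequal sequence lengths")
        length = lengths.pop()
        for t in taxa:
            pieces[t].append(block[t])
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {t: "".join(parts) for t, parts in pieces.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Toy, already-aligned fragments; real PCG alignments are far longer.
    toy = {
        "ATP8": {"taxon_1": "ATGCCA", "taxon_2": "ATG--A"},
        "ND6": {"taxon_1": "TTAGGC", "taxon_2": "TTGGGC"},
    }
    matrix, parts = concatenate(toy)
    for gene, s, e in parts:
        print(f"charset {gene} = {s}-{e};")
```

Checking taxon sets and alignment lengths before joining catches the two most common causes of a silently misaligned supermatrix.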

Genome organization and nucleotide composition
The complete mitochondrial genome of M. elongatus, reported here for the first time, was 16,612 bp long. It was annotated and submitted to GenBank (accession number MN832777). It consisted of 13 PCGs, 22 tRNA genes, two rRNA genes, and one control region (Fig. 1; Table 1). All mitochondrial genes were encoded on the heavy strand (H strand), except ND6 and eight tRNAs (Table 1). The gene content and arrangement were conserved and typical of Microphysogobio mitochondrial genomes (Hwang et al. 2014; Lin et al. 2014; Cheng et al. 2015). Neighboring genes overlapped at six junctions, totaling 21 bp, with individual overlaps ranging from 1 to 7 bp. The longest overlaps (7 bp each) were located between ATP8 and ATP6 and between ND4L and ND4. A total of 65 intergenic nucleotides (IGN) were dispersed across 13 locations, with spacers ranging from 1 to 31 bp (Table 1). The longest intergenic spacer was located between tRNA Asn and tRNA Cys. Such overlapping and intergenic regions are very common in fish mitochondrial genomes (Wang et al. 2020).
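Overlaps and intergenic spacers such as those summarized above follow directly from the annotated gene coordinates. A minimal sketch, assuming 1-based inclusive coordinates sorted along the genome; the coordinates are invented for illustration and are not the values in Table 1, and the wrap-around junction of the circular molecule is ignored for brevity.

```python
from __future__ import annotations


def junctions(genes: list[tuple[str, int, int]]) -> list[tuple[str, str, int]]:
    """Return (left gene, right gene, gap) for each pair of adjacent genes.

    genes holds (name, start, end) with 1-based inclusive coordinates, sorted
    by start. A positive gap counts intergenic nucleotides; a negative gap is
    an overlap of -gap bp.
    """
    return [(left, right, start - end - 1)
            for (left, _, end), (right, start, _) in zip(genes, genes[1:])]


def summarize(genes: list[tuple[str, int, int]]) -> dict[str, int]:
    """Count overlapping junctions and spacer sites and total their lengths."""
    gaps = [gap for _, _, gap in junctions(genes)]
    overlaps = [-g for g in gaps if g < 0]
    spacers = [g for g in gaps if g > 0]
    return {
        "overlap_junctions": len(overlaps),
        "overlap_bp": sum(overlaps),
        "spacer_junctions": len(spacers),
        "spacer_bp": sum(spacers),
    }


if __name__ == "__main__":
    # Hypothetical coordinates chosen only to show a 7 bp ATP8/ATP6 overlap.
    toy = [("ATP8", 1, 165), ("ATP6", 159, 842), ("COIII", 842, 1626),
           ("tRNA-Gly", 1628, 1699)]
    print(junctions(toy))
    print(summarize(toy))
```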
The nucleotide composition of the M. elongatus mitogenome was 30.8% A, 26.1% T, 16.7% G, and 26.4% C, making the genome slightly A+T rich (56.9% A+T; Table 2). The PCGs, rRNAs, and tRNAs were likewise slightly A+T rich (Table 2). The control region, known as an A+T-rich region, had the highest A+T content in the mitogenome (68.1%) (Table 2). The skew statistics revealed a positive AT-skew and a negative GC-skew across the whole mitogenome (Table 2), indicating a bias toward A over T and toward C over G.
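As a consistency check, the whole-genome skews can be recomputed from the rounded percentages reported above. Because Table 2 is presumably based on raw base counts, its values may differ slightly in the last decimal place.

```latex
\begin{aligned}
\text{AT-skew} &= \frac{30.8 - 26.1}{30.8 + 26.1} = \frac{4.7}{56.9} \approx 0.083,\\
\text{GC-skew} &= \frac{16.7 - 26.4}{16.7 + 26.4} = \frac{-9.7}{43.1} \approx -0.225.
\end{aligned}
```

The positive AT numerator (A exceeds T) and the negative GC numerator (C exceeds G) are what the stated A and C biases refer to.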

Protein-coding genes and codon usage
The 13 PCGs totaled 11,423 bp. The longest PCG was ND5 (1836 bp), and the shortest was ATP8 (165 bp) (Table 1). The average base composition of the 13 PCGs was 28.7% A, 28.2% T, 16.2% G, and 26.9% C (Table 2). All PCGs started with the typical ATG codon except COI, which used GTG. Six PCGs (ND1, COI, ATPase 8, ND4L, ND5, and ND6) terminated with a complete stop codon. The remaining seven terminated with an incomplete stop codon (TA- or T-), which is presumably completed to TAA by post-transcriptional polyadenylation at the 3' end of the mRNA (Ojala et al. 1981).
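The completion of truncated stop codons by polyadenylation can be mimicked in a few lines. The sketch below assumes that the coding sequence starts at its start codon and reads in frame, and it checks the completed terminal codon against the stop codons listed for the vertebrate mitochondrial code (NCBI translation table 2: TAA, TAG, AGA, AGG). The function names are hypothetical and the example sequences are toy strings, not M. elongatus genes.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def polyadenylate(cds: str) -> str:
    """Append A's so a 3' end left as T or TA becomes a full codon (e.g., TAA)."""
    cds = cds.upper()
    remainder = len(cds) % 3
    return cds + "A" * (3 - remainder) if remainder else cds


def has_stop(cds: str) -> bool:
    """True if the in-frame sequence ends in a vertebrate mitochondrial stop codon."""
    return len(cds) % 3 == 0 and cds[-3:] in VERT_MITO_STOPS


if __name__ == "__main__":
    for toy in ("ATGGCCT", "ATGGCCTA", "ATGGCCTAA"):
        completed = polyadenylate(toy)
        print(toy, "->", completed, has_stop(completed))
```

All three toy inputs end as ATGGCCTAA: a trailing T gains AA, a trailing TA gains one A, and a complete TAA is left unchanged.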

Transfer and ribosomal RNAs
The mitogenome of M. elongatus contains 22 tRNAs interspersed across the circular genome, ranging in length from 68 bp (tRNA Cys) to 76 bp (tRNA Leu (UUR) and tRNA Lys) (Table 1). Secondary-structure prediction showed that all tRNAs can fold into typical cloverleaf structures except tRNA Ser(AGY), in which the dihydrouridine (DHU) arm did not form a stable structure (Suppl. material 7: Fig. S7). This atypical secondary structure has been commonly observed in other fishes (Zhong et al. 2018). The average base composition of the tRNAs was 28.4% A, 26.9% T, 23.5% G, and 21.2% C (Table 2).
The mitogenome of M. elongatus contains two rRNA genes, 12S rRNA and 16S rRNA, which were 960 bp and 1692 bp long, respectively (Table 1). As in other fishes (Broughton et al. 2001), the 12S rRNA was located between tRNA Phe and tRNA Val, and the 16S rRNA between tRNA Val and tRNA Leu (UUR) (Table 1). The average base composition of the two rRNAs was 34.2% A, 20.0% T, 21.2% G, and 24.6% C, giving an A + T content of 54.2% (Table 2). The lengths and A + T contents of these two rRNAs were well within the ranges observed in other Microphysogobio mitogenomes (Lin et al. 2014; Hwang et al. 2014; Cheng et al. 2015).

Mitochondrial control region
The mitochondrial control region (CR), or D-loop, regulates the replication and transcription of the mitogenome (Boore 1999). The CR of M. elongatus was 930 bp long and located between tRNA Phe and tRNA Pro. Alignment with homologous sequences revealed three conserved domains within the CR (Suppl. material 8: Fig. S8): the termination-associated sequence (TAS), the central conserved sequence blocks (CSB-F, CSB-E, and CSB-D), and the conserved sequence blocks (CSB-1, CSB-2, and CSB-3), as seen in most fish mitogenomes (Broughton et al. 2001).

Mitochondrial phylogeny within Gobioninae
We reconstructed the phylogeny of gobionine fishes based on the 13 concatenated protein-coding genes. The optimal partitioning scheme for the dataset and the best-fitting substitution model for each partition are provided in Suppl. material 4: Table S4. The BI and ML analyses yielded the same topology and differed only in nodal support (Bayesian posterior probabilities and ML bootstrap values) (Fig. 2, Suppl. material 9: Fig. S9). Excluding Squalidus gracilis majimae, Gobioninae was divided into three clades (tribe Sarcocheilichthyini, tribe Gobionini, and the Hemibarbus-Squalidus group) (Fig. 2), consistent with previous phylogenetic studies (Tang et al. 2011; Zhao et al. 2016).
The Hemibarbus-Squalidus group includes Belligobio, Hemibarbus, and Squalidus (BS = 99%, PP = 100%) and occupied the basal position within Gobioninae. This is consistent with the morphology-based hypothesis that Hemibarbus and Belligobio might represent a primitive lineage of Gobioninae (Bǎnǎrescu 1992). Hemibarbus and Belligobio are morphologically similar, and Bǎnǎrescu and Nalbant (1973) therefore treated Belligobio as a subgenus of Hemibarbus. Earlier single-gene phylogenies of Gobioninae also recovered a close relationship between Squalidus and Hemibarbus (Yang et al. 2006; Liu et al. 2010; Tang et al. 2011). Nonetheless, the placement of S. g. majimae outside this group suggests that its classification should be further revised.
The tribe Gobionini includes Gobiobotia, Xenophysogobio, Saurogobio, Pseudogobio, Platysmacheilus, Biwia, Microphysogobio, Romanogobio, Abbottina, Acanthogobio, Gobio, and Ladislavia (BS = 85%, PP = 97%). Within this tribe, Ladislavia taczanowskii occupied the basal position, and an earlier mitochondrial phylogeny likewise supported including Ladislavia in this group (Tang et al. 2011). Bǎnǎrescu and Nalbant (1973) suggested that Acanthogobio is a morphologically derived species of Gobio, which our results support. Microphysogobio was not recovered as monophyletic, because Biwia, Romanogobio, and Platysmacheilus are nested within it; this agrees with previous studies based on mitochondrial and nuclear genes (Yang et al. 2006; Tang et al. 2011). Morphologically, P. exiguus and Microphysogobio share a single row of pharyngeal teeth, which has been interpreted as reflecting an evolutionary reduction in the number of tooth rows (Yu and Liu 2011). The taxonomic status of Microphysogobio therefore remains uncertain, because its putative member species are broadly polyphyletic.
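A statement such as "Microphysogobio is not monophyletic" reduces to a simple check on a rooted tree: a group is monophyletic only if some node subtends exactly its members and nothing else. The minimal sketch below uses nested tuples as a stand-in for a parsed tree; the toy topology and the "Genus_species" labels are hypothetical, chosen to mimic another genus nested inside Microphysogobio, and do not reproduce the tree in Fig. 2.

```python
from __future__ import annotations

from typing import Iterator, Union

# A rooted tree is either a leaf label or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset[str]:
    """All leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree) -> Iterator[frozenset[str]]:
    """Yield the leaf set below every node, including leaves and the root."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, group: set[str]) -> bool:
    """True if some node subtends exactly the taxa in group."""
    return any(clade == group for clade in clades(tree))


def genus_members(tree: Tree, genus: str) -> set[str]:
    """Leaves labeled 'Genus_...' (a hypothetical naming scheme)."""
    return {leaf for leaf in leaf_set(tree) if leaf.startswith(genus + "_")}


if __name__ == "__main__":
    toy = (("Microphysogobio_a", ("Biwia_b", "Microphysogobio_c")), "Gobio_d")
    micro = genus_members(toy, "Microphysogobio")
    print(is_monophyletic(toy, micro))  # False: Biwia_b is nested inside
```

The answer depends on rooting, which is why the trees in this study are rooted with Acheilognathus and Rhodeus outgroups before monophyly is assessed.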
The tribe Sarcocheilichthyini includes Coreius, Coreoleuciscus, Gnathopogon, Paracanthobrama, Gobiocypris, Pungtungia, Pseudopungtungia, Pseudorasbora, Rhinogobio, and Sarcocheilichthys (BS = 86%, PP = 100%). In our trees, Pungtungia herzi was nested within Pseudopungtungia, a grouping that has been proposed in an earlier study (Kim et al. 2013). Our results and those of Kim et al. (2013) indicate that the genus Pseudopungtungia is polyphyletic and that its taxonomic status is unstable. The placement of Gobiocypris within Gnathopogon supports treating Gobiocypris as a subgenus of Gnathopogon (Tang et al. 2011). Paraleucogobio was also nested within Gnathopogon, so we speculate that Paraleucogobio might likewise be a subgenus of Gnathopogon. Surprisingly, Sarcocheilichthys biwaensis and S. variegatus microoculus were separated by almost zero branch length in the phylogenetic tree. Komiya et al. (2014) suggested multiple colonization events of Lake Biwa by S. biwaensis and S. v. microoculus and supported the rapid speciation of S. biwaensis from an ancestral S. v. microoculus form. We therefore surmise that mitochondrial introgression has probably occurred between S. biwaensis and S. v. microoculus; introgressive hybridization is not rare between closely related species (Yang et al. 2006).

Conclusions
In the present study, we sequenced and described the complete mitogenome of M. elongatus (16,612 bp), which contains the 37 genes and one control region typical of vertebrate mitogenomes. Its characteristics are largely consistent with those reported for other Microphysogobio mitogenomes. Gobioninae was resolved into three major lineages, and the phylogenetic trees strongly supported the non-monophyly of Microphysogobio. These results should be useful for further investigation of the evolutionary relationships within Gobioninae.