Complete mitochondrial genome sequence of Lepus yarkandensis Günther, 1875 (Lagomorpha, Leporidae): characterization and phylogenetic analysis

Abstract Lepus yarkandensis is a national second-class protected animal endemic to China and distributed only in the hot and arid Tarim Basin in Xinjiang. We sequenced and described the complete mitogenome of L. yarkandensis to analyze its characteristics and phylogeny. The species’ DNA is a 17,047 bp circular molecule that includes 13 protein-coding genes (PCGs), two rRNA genes, 22 tRNA genes, and one control region. The overall base composition was as follows: A, 31.50%; T, 29.40%; G, 13.30% and C, 25.80%, with a high A+T bias of 60.9%. In the PCGs, ND6 had deviation ranges for AT skew (–0.303) and GC skew (0.636). The Ka/Ks values of ND1 (1.067) and ND6 (1.352) genes were >1, indicating positive selection, which might play an important role in the adaptation of L. yarkandensis to arid and hot environments. The conserved sequence block, the central conserved domain, and the extended termination-associated sequences of the control region and their features were identiﬁed and described. The phylogenetic tree based on the complete mitogenome showed that L. yarkandensis was closely related to the sympatric Lepus tibetanus pamirensis. These novel datasets of L. yarkandensis can supply basic data for phylogenetic studies of Lepus spp., apart from providing essential and important resource for further genetic research and the protection of this species.


introduction
The Yarkand hare (Lepus yarkandensis) is endemic to China and is restricted to scattered oases around the Taklamakan Desert in the Tarim Basin of Xinjiang (Luo 1988;Smith et al. 2008Smith et al. , 2018. These hares live in hot, arid environments with scarce food and open terrain. Thus, this species is highly morphologically specialized, with smaller bodies, longer ears, and larger tympanic bullae than other Lepus species in China (Shan et al. 2011;Wu et al. 2011). This species is also listed as a second-class protected animal (Wang 1998). Several studies have been published on L. yarkandensis, including its morphology, skull morphometrics, genetic diversity, and genetic structures based on partial mitochondrial DNA (mtDNA) markers, microsatellites, and several nuclear genes (Li et al. 2005;Li et al. 2006;Aerziguli et al. 2010;Shan et al. 2011). The complete mtDNA sequence of L. yarkandensis has been reported (Huang et al. 2019), but without the details given of its characteristics, particularly those adapting to such extremely arid environments.
Characterized by small size, stable gene content, high evolutionary rate, relatively conserved gene arrangement, high information content, and maternal inheritance, animal mitogenomes are powerful tools used to investigate molecular evolution, phylogenetic relationships, and protective biology for many animals (Yu et al. 2017;Song et al. 2019;Hu et al. 2020;Wu et al. 2020).
In the present study, we successfully sequenced and characterized the complete mtDNA of L. yarkandensis, including its base composition, gene structure, and arrangement of protein-coding genes (PCGs) and a control region. We also constructed a phylogenetic tree based on complete mitogenome sequences to elucidate the relationship of L. yarkandensis with other Lepus spp. Therefore, this study provides essential scientific data and contributes to population genetics, adaptation, and phylogenetic studies of L. yarkandensis.

Materials and methods
A male adult L. yarkandensis was collected from Alar, Xinjiang, China (40°34'00"N, 81°19'33"E) on 24 December 2016. Complete mtDNA was extracted from muscle tissue using standard phenol-chloroform (Psifidi et al. 2010). The complete mitogenome of the species was sequenced by next-generation sequencing using an Illumina HiSeq platform by Hengchuang Gene Technology Co., Ltd (Shenzhen, China) and assembled using SOAPdenovo 12.04 (Luo et al. 2012). The genome structure was mapped using the CGView software (Stothard et al. 2005). The complete mitogenome sequences of 25 other lagomorph species were downloaded from GeneBank (Table 1). The base composition, Ka and Ks (Ka, Ks, Ka/Ks) values, and composition skew were analyzed using MEGA7, together with the following formulas: AT skew = [A -T]/ [A + T] and GC skew = [G -C] / [G + C] (Perna et al. 1995). A conserved sequence block (CSB) in the control region was identified based on previously published se- quence data from several mammals (Sbisà et al. 1997). All tRNA secondary structures, except for tRNA-Ser (AGN), were verified using the tRNAscan-SE Webserver (Lowe and Eddy 1997). A phylogenetic tree was constructed by neighbor-joining (NJ) using MEGA7 and Bayesian analysis using MrBayes (Ronquist et al. 2012;Kumar et al. 2016). An NJ tree was constructed with default settings. Bayesian analyses were performed using MrBayes v. 3.2.6 ×64 for the best-fit model, GTR+I+F+G4, as determined by IQ-TREE (Nguyen et al. 2015). With the final model, analyses were run for 5,000,000 generations.

Mitochondrial genome organization
The mitogenome of L. yarkandensis was a circular, double-stranded DNA molecule 17047 bp in size (GenBank accession number: MG279351) which is slightly longer than reported L. yarkandensis (MN450151) with 17011 bp (Huang et al. 2019). It contained all 37 typical vertebrate mitogenomes-13 PCGs, two rRNA genes, 22 tRNA genes, and one control region-among which 28 genes were encoded on the heavy strand (H strand), except for eight tRNA genes and the ND6 gene ( Fig. 1; Table 2). Eleven overlapping nucleotides with lengths ranging from 1 bp to 47 bp were present in the L. yarkandensis mitogenome, comprising a total length of 140 bp, with the longest nucleotide located between ND4 and tRNA-His. Moreover, 70 bp of intergenic spacer sequences spread over 12 regions in the hare mitogenome, ranging from 1 bp to 32 bp in size, with the longest was located between tRNA-Asn and tRNA-Cys (Table 2).

Genome composition and skewness
AT skew, GC skew, and A + T content were selected as parameters for investigating the pattern of the mitogenome nucleotide composition (Wei et al. 2010;Hassanim et al. 2005). The L. yarkandensis mitogenome had a base nucleotide composition of 31.50% for A, (Overlap is denoted as "-". Spacer regions are denoted as "+". No overlap or interval is denoted as "0".) 29.40% for T, 13.30% for G, and 25.80% for C, with an A+T bias of 60.90%. Moreover, A and C were more popular than T and G with overall AT skew = 0.034 and GC skew = -0.320 in the entire L. yarkandensis mitogenome (Table 3). These overall genome composition and skewness are highly similar to those of other Lepus spp., such as L. yarkandensis (MN450151), Lepus coreanus and Lepus tolai (Yu et al. 2015;Huang et al. 2019;Shan et al. 2020). However, in species such as Caenorhabditis elegans, Ascaris suum, and Mytilus edulis, different AT and GC skew values were determined-negative AT skew and positive GC skew (Perna et al. 1995). In Arbacia lixula and Anopheles cracens, both AT and GC skews were negative (Perna et al. 1995;Mao et al. 2019). Moreover, an AT-rich region is typically observed in vertebrates (Quinn et al. 1993;Zhao et al. 2016;Sarvani et al. 2018). Thus, this variation in AT and GC skews shows a degree of similarity within the same genus but not in different classes, which can also be used as an auxiliary reference for evaluating phylogenetic relationships.

Protein-coding genes
The total length of PCGs in the L. yarkandensis mitogenome was 11,417 bp, and its base composition was 30.50% for A, 30.90% for T, 12.00% for G, and 26.50% for C with an A+T bias of 61.40%. Among the 13 PCGs, 12 were located on the heavy strand (H strand), whereas ND6 was located on the light strand (Tables 2, 3), as observed in other Lepus species (Ding et al. 2014;Shan et al. 2020). The skewness of the entire PCGs in L. yarkandensis (Table 3) indicated a higher occurrence of T than A, with a negative AT skew (-0.007), and C than G with a negative GC skew (-0.337) ( Table 3). The negative AT skew value was inconsistent with that for most mammalians, which had positive AT skew values (Sarvani et al. 2018;Priyono et al. 2020). However, the result of the current study is highly similar to the result obtained for Camelus dromedarius (both AT and GC skews were negative), a heat-tolerant mammal (Sarvani et al. 2018;Manee et al. 2019).
To further estimate and understand the level of base bias between all PCGs, we calculated the AT and GC skew ratios for each PCG in the mtDNA genome of L. yarkandensis (Fig. 2). All values for the skewness of GC (except for ND6) in PCGs were negative, with C being more prevalent that G in the nucleotide composition. The ATP6, ATP8, ND2, and ND3 genes had positive AT skews, whereas the remaining genes (9 of 13) had negative values. Notably, ND6 had deviation ranges for AT skew (-0.303) and GC skew (0.636) when compared with the other 12 PCGs in the L. yarkandensis mtDNA sequence, and the deviation range is highly similar to some mammalians, such as Moschiola indica, Camelus dromedarius, and Bubalus quarlesi (Sarvani et al. 2018;Manee et al. 2019;Priyono et al. 2020).
As with the vertebrate mtDNA genome, the majority of PCGs in the L. yarkandensis mitogenome used ATG as the start codon, although ND2, ND3, and ND5 used ATT as the start codon. Most PCGs used typical stop codons (TAA for ND2, COI, COII, ATP8, ATP6, ND4L, and ND5; TAG for ND6 and COII), whereas a small  (Table 2), which could be completed via posttranscriptional polyadenylation (Anderson et al. 1981;Ojala et al. 1981). Both PCGs of our L. yarkandensis (MG279351) and reported Yarkand hare (MN450151) have identical start and end codons, but different skewness. The Ka, Ks, and Ka/Ks values of PCGs were estimated using substitution rates (Fig. 3). If Ka/Ks > 1, a positive selection effect was considered; if Ka/Ks = 1, a neutral effect was assumed; and if Ka/Ks < 1, purification selection was considered (Hurst et al. 2002). Except for ND1 and ND6, all PCGs in L. yarkandensis had average Ka/Ks values < 1, indicating purification selection. Meanwhile, for ND1 and ND6, Ka/Ks > 1 indicated positive selection. The function of the mitochondrial genome is crucial because it mainly undergoes evolutionary neutral or purifying selection. Other studies have reported that mitochondrial genes are also influenced by positive selection, particularly in animals adapting to harsh environments (Luo et al. 2008;Hichem et al. 2017;Jin et al. 2018). In the present study, positive selection in ND1 and ND6 might be beneficial to organisms and may confer to L. yarkandensis the ability to adapt to harsh and arid environments.

Control region
The control region 1604 bp in length was organized between trnP and trnF genes in the L. yarkandensis mitogenome (Table 2; Fig. 4). In vertebrate mitogenomes, the control region is a noncoding segment and consists of several control elements. These elements regulate genome replication and transcription (Boore 1999). In the current study, we successfully identified several highly conserved domains within the control region of the L. yarkandensis mitogenome-conserved sequence blocks (CSB) I-III, con- served domain (CD), and extended termination associated sequence (ETAS) I-II-on the basis of their homology with other members of Lagomorpha and mammals (Elisabetta et al. 1997) (Table 4; Fig. 4). Characteristic motifs were used to detect the CSB domains: CSBI (GACATA), CSBII (CAAACCCCCC), and CSBIII (TGCCAAAC-CCCAAAAAC) (Gemmell et al. 1996;Elisabetta et al. 1997). We found from the sequence alignment results among hares and other mammals (Elisabetta et al. 1997) that more variations existed in Yarkand hare, including base insertions and deletions in the whole control region. CD was conservative with a narrow length range. The ETAS and  CSB regions widely varied in the length of the control region, which is also the main reason for variations in mitogenome size in different species (Xu et al. 2012).
In CSB regions, CSB1 and CSB3 were relatively conservative, and CSB2 widely varied in L. yarkandensis. This finding contradicted the results for Felis catus and Mustelidae species (Elisabetta et al. 1997l;Zhang et al. 2009). In the present study, an ACCCC motif in the ETAS I sequence of L. yarkandensis was found, similar to that of the horseshoe bat ). In some taxa such as species of Mustelidae, cattle, and Cervidae, the sequences were GCCCC (Zhang et al. 2009;Douzary et al. 1997). Between CSB I and CSB II, a number of short tandem repeat motifs, which commonly characterize mitogenomes, were observed in the L. yarkandensis mitogenome (Ren et al. 2009). The short repeat CGTCTACGCGCACGTACAC-CCA was 22 bp with 14 repetitions (Table 4), whereas the long repeat ACAATACT-GACATAGCACTCAGCCTTTTATTTTTCCTCCAACAGGCATAACCCTAAT-TAAATTTTTCCAAAAAAAA occurred twice. Similarly, the short repeats CSB3 CGTCTACGCGCACGTACACCCA in L. yarkandensis (Fig. 4) occurred twice, which was also found in other Lepus species in this study. Notably, tandem repeats have been described in the control region of metazoans (Lunt et al. 1998;Rand et al. 1993;Yokobori et al. 2004) and the family Veneridae.

Transfer RNAs and ribosomal RNAs
Except for tRNA-ser (AGY), which lacked a D stem, the other 21 tRNAs formed complete secondary structures (Suppl. material 1). Aberrant loops have been found in some tRNA genes. These mismatches could be rectified by the post-transcriptional RNA-editing mechanism to maintain tRNA functions (Tomita et al. 2002).

Phylogenetic analysis
We constructed NJ and Bayesian trees based on the complete mtDNA genome of L. yarkandensis in this study and 25 other lagomorphs published on NCBI (Fig. 5). The topological structures of both trees were consistent and supported by high bootstrap values. The phylogenetic tree confirmed the existence of three distinct lineages-hares, rabbits, and pikas-which is consistent with Smith et al. (2018). In the present study, L. yarkandensis was not closely related to neither Lepus europaeus nor Lepus americanus but was closely related to Lepus tibetanus in Xinjiang, China. The latter was misnamed as Lepus capensis pamirs in our previous study (Shan et al. 2015) and was renamed by Smith et al. (2018). Our L. yarkandensis and L. yarkandensis (MN450151) were clustered on the same branch. One reason for this close relationship could be the relatively close habitat. Lepus t. pamirensis are mainly distributed in the Pamir plateau of southeastern Kashgar, Xinjiang, China, bordering the Tarim Basin. The L. yarkandensis sample used in the current study was from Alar City in western Tarim Basin, which is near the L. t. pamirensis distribution. Another reason could be similarly extreme environments. Both habitats are dry with scarce rainfall and a lack of food (Shan et al. 2011). However, the phylogenetic relationship between L. yarkandensis and L. t. pamirensis remains uncertain, as hybridization has occurred between them (Wu et al. 2011). Further analysis with more samples and more extensive markers is required.