The complete mitochondrial genome of Orancistrocerus aterrimus aterrimus and comparative analysis in the family Vespidae (Hymenoptera, Vespidae, Eumeninae)

Abstract To date, only one mitochondrial genome (mitogenome) in the Eumeninae has been reported worldwide, and this is the first report from China. The mitogenome of O. a. aterrimus is 17 972 bp long and contains 38 genes (13 protein-coding genes (PCGs), 23 tRNA genes, and two rRNA genes), together with a long non-coding region (NCR) and a control region (CR). The mitogenome has an A + T content of 79.43%. Its 13 PCGs use ATN as the initiation codon, except for cox1, which uses TTG; nine PCGs use the complete termination codon TAA, and four (cox2, cox3, nad4, and cytb) have the incomplete stop codon T. Twenty-two of the 23 tRNAs can form the typical cloverleaf secondary structure, the exception being trnS1. The CR is 1 078 bp long with an A + T content of 84.69%, and comprises 28 bp tandem repeat sequences and a 13 bp T-stretch. There are two gene rearrangements: an extra trnM2 located between trnQ and nad2, and trnL2 located upstream of nad1. Among all rearrangements reported in mitogenomes of the family Vespidae, the translocation between trnS1 and trnE appears only in Vespinae, and the translocation of trnY appears in Polistinae and Vespinae. More codons are absent from the 13 PCGs in Polistinae than in either Vespinae or Eumeninae. This study reports the complete mitogenome of O. a. aterrimus, compares mitogenome characteristics, and constructs phylogenetic relationships within the family Vespidae.


Introduction
Animal mitochondrial genomes (mitogenomes) have been widely used in studies of molecular evolution, population genetic structure, and phylogeny because of their stable gene content, rapid evolutionary rate, relatively conserved gene arrangement, maternal inheritance, and infrequent recombination (Wolstenholme 1992; Saccone et al. 1999; Oliveira et al. 2008; Li et al. 2017). The family Vespidae has more than 5000 known species worldwide, which are divided into six subfamilies, Euparagiinae, Masarinae, Eumeninae, Stenogastrinae, Polistinae, and Vespinae (Carpenter 1993), but their phylogenetic relationships have not been settled. Eleven mitogenome sequences have been reported in the Vespidae (seven in the subfamily Vespinae, three in Polistinae, and one in Eumeninae) (Table 1). The subfamily Eumeninae contains more than 3600 species worldwide, more than half of the known species of Vespidae. Species of Eumeninae, also known as potter wasps, are solitary and mostly catch caterpillars in farmlands, forests, and orchards as food for their offspring, so they can directly control caterpillar pests. To date, only one species of this subfamily (Abispa ephippium) has had its mitogenome published. Orancistrocerus aterrimus aterrimus, the species studied in this work, belongs to the Eumeninae and is widely distributed in China (Jiangsu, Anhui, Fujian, Jiangxi, Hunan, Guangxi, Chongqing, Sichuan, and Yunnan), Laos, and Vietnam (Li 1985; Selis 2018).
In the present study, the complete mitogenome of O. a. aterrimus was sequenced using Illumina sequencing, and its characteristics were analyzed, including gene rearrangements, nucleotide composition, and codon usage. More importantly, the phylogenetic relationships of 12 vespid species are reconstructed from their mitogenomes and discussed, based on the nucleotide sequences of the 13 PCGs and using both Maximum Likelihood (ML) and Bayesian Inference (BI) methods. The study updates mitogenome-based phylogenetic research and provides a basic framework of mitogenome information in Vespidae for further research on the phylogenetic relationships of genera and subfamilies in this family.

Sample collection and DNA preparation
The specimens of O. a. aterrimus were collected from Yangshuo County, Guangxi Province, preserved in 100% ethanol, and stored at -20 °C. Total DNA was extracted from the muscle tissue of a single adult specimen using the DNeasy DNA Extraction Kit (QIAGEN) in accordance with the manufacturer's instructions. The concentration of genomic DNA in the extraction product was assayed on a Qubit fluorometer using a dsDNA High-sensitivity Kit (Invitrogen).

Mitogenome sequencing and assembly
The Illumina TruSeq library was constructed from the gDNA with an average insert length of 480 bp. The library was sequenced on a full run of an Illumina HiSeq 2500 with 500 cycles of paired-end sequencing (250 bp reads). After removal of adapters and of unpaired, short, and low-quality reads, the remaining high-quality reads were used for de novo assembly with IDBA-UD (Peng et al. 2012). IDBA-UD was run with a similarity threshold of 98% and minimum and maximum k values of 80 and 240 bp, respectively. To identify the mitogenome assemblies in the pooled sequencing files, two different fragments of mtDNA (cox1 and rrnS) were amplified as bait sequences by standard PCR using primers designed with reference to Simon et al. (2006). The assemblies were searched with BLASTN against the bait sequences, and the assembly with a matching rate of 100% was confirmed as the mitogenome of O. a. aterrimus. The identical or near-identical overlapping terminal regions of the mitogenome sequence were examined and circularized in Geneious (http://www.geneious.com/).
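The circularization step can be illustrated with a minimal Python sketch. It is not the procedure implemented in Geneious; it only handles exactly identical terminal regions (near-identical overlaps are not considered), and the function names and the `min_len` threshold are assumptions made for this illustration.

```python
def terminal_overlap(contig: str, min_len: int = 20) -> int:
    """Length of the longest exact overlap between the start and the end
    of a contig, or 0 if no overlap of at least min_len bases exists."""
    max_k = len(contig) // 2
    for k in range(max_k, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig: str, min_len: int = 20) -> str:
    """Drop the redundant terminal copy so each base of the circle occurs once."""
    k = terminal_overlap(contig, min_len)
    return contig[:-k] if k else contig


# Toy example (made-up sequence): the first 8 bases are repeated at the end.
core = "ACGTTGCAAGGCTTAACCGGTATCGA"
linear = core + core[:8]
print(terminal_overlap(linear, min_len=5))
print(circularize(linear, min_len=5) == core)
```

A contig whose ends overlap in this way is consistent with a circular molecule that the assembler has read past its arbitrary starting point.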

Sequence annotations and analysis
PCGs and rRNA genes were aligned with those of other published Vespidae mitogenomes using Clustal X (Thompson et al. 1997). The locations and secondary structures of the majority of the tRNA genes were identified with tRNAscan-SE Search Server v.1.21 (Lowe and Eddy 1997), and the remaining tRNAs were identified by comparison with the tRNAs of other known Vespidae species (Song et al. 2016). The CR and the tandem repeat sequences were analyzed with Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html) (Benson 1999). Base composition and codon usage in all 12 Vespidae mitogenomes were calculated with MEGA v6.0 (Tamura et al. 2013).

Phylogenetic analysis
Eleven known mitogenome sequences of the family Vespidae and the mitogenome sequence of Formica selysi (KP670862) of the family Formicidae were downloaded from GenBank, and that of O. a. aterrimus was produced in the present study (Table 1). The phylogenetic tree of the 12 Vespidae mitogenome sequences was constructed using the ML and BI methods with MEGA 6.0 (Tamura et al. 2013) and MrBayes 3.1.1 (Huelsenbeck and Ronquist 2001), respectively, with Formica selysi (KP670862) as the outgroup. The nucleotide sequences of the 13 PCGs were used for phylogenetic inference, and the best-fitting substitution model was selected using MrModeltest 2.3 (Nylander 2004). Bootstrap values were calculated from 1000 replications, and the confidence values of the topology are high.

Genomic organization
The complete mitogenome of O. a. aterrimus is a circular, double-stranded DNA molecule of 17 972 bp. It contains 38 genes (13 PCGs, 23 tRNAs, and two rRNAs), together with a control region (CR) and a long non-coding region (NCR) (Figure 1); 24 genes are located on the majority strand (J-strand) and the other 14 on the minority strand (N-strand) (Table 2). An extra trnM2 and a long NCR were found in the mitogenome. The gene trnM2 is 67 bp long and located at positions 2 142-2 208, between trnQ and nad2. The NCR is 1 946 bp long and located at positions 128-2 073, between trnM1 and trnQ.
Apart from the NCR (1 946 bp), there are 14 intergenic spacers totalling 174 bp, of which the longest (48 bp) is located between nad4L and trnT. In addition, overlaps totalling 24 bp were identified among 12 genes, with individual overlaps ranging from 1 to 8 bp.
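The lengths quoted above follow from the 1-based, inclusive coordinates by simple arithmetic, which the short Python sketch below checks. The helper names are hypothetical and introduced only for this illustration; the coordinates are those given in the text.

```python
def feature_length(start: int, end: int) -> int:
    """Length in bp of a feature given 1-based, inclusive coordinates."""
    return end - start + 1


def gap_between(upstream_end: int, downstream_start: int) -> int:
    """Bases lying strictly between two features on the same coordinate axis.
    A negative value means the two features overlap by that many bases."""
    return downstream_start - upstream_end - 1


print(feature_length(2142, 2208))  # trnM2
print(feature_length(128, 2073))   # NCR
print(gap_between(2073, 2142))     # interval between the NCR and trnM2, which holds trnQ
```

The same two helpers are enough to recompute intergenic spacers and overlaps from an annotation table such as Table 2.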

Gene rearrangements
The gene order of the 13 PCGs and two rRNAs in the O. a. aterrimus mitogenome is consistent with that of the putative hymenopteran ancestor, the sawfly Perga condei (Hymenoptera: Symphyta: Pergidae) (Castro and Dowton 2005). However, there are two tRNA rearrangements in the mitogenome (Figure 2), namely an extra trnM2 and the position of trnL2 upstream of nad1, giving the novel gene order trnL2 - nad1 - rrnL - trnV - rrnS - CR - trnI - trnM1 - trnQ - trnM2 - nad2 (Figure 2). In the mitogenome of Abispa ephippium, another species in the subfamily Eumeninae, the rearranged gene orders are trnL2 - trnM1 - trnQ - trnM2 - trnI, trnL1 - trnL1 - trnL1 - trnL1, and trnS2 - nad1 (Figure 2). In the subfamily Polistinae, the translocation between nad1 and trnL1 is present in the three reported species. In addition, trnY is translocated in Parapolybia crocea; trnQ, trnM, and trnY are lost in the Polistes humilis mitogenome; and in the Polistes jokahamae mitogenome, trnD is upstream of trnK and trnI, trnQ, and trnY are missing (Figure 2) (Song et al. 2016; Peng et al. 2017). In the subfamily Vespinae, apart from the incomplete mitogenomes of Vespula germanica and Vespa bicolor, the same rearrangements occur in four of the other reported species: the translocation of trnY and the translocations between trnQ and trnM, between trnS1 and trnE, and between nad1 and trnL2 (CUN). Dolichovespula panda differs from these four species in that the translocation between trnS1 and trnE is replaced by a shuffling of trnN and trnE (Figure 2) (Fan et al. 2017; Kim et al. 2017a; Kim et al. 2017b; Nizar et al. 2017). In general, the rearrangement frequency in Eumeninae is lower than in both Vespinae and Polistinae. The rearrangement of tRNAs is a typical event in the mitogenomes of Hymenoptera (Dowton and Austin 1999; Dowton et al. 2009; Chen et al. 2016).

Abbreviations of the gene names are as follows: nad1-6 and nad4L, nicotinamide adenine dinucleotide hydrogen dehydrogenase subunits 1-6 and 4L; cox1, cox2, and cox3, cytochrome c oxidase subunits; cytb, cytochrome b; atp6 and atp8, adenosine triphosphate synthase subunits 6 and 8; rrnL and rrnS, large and small rRNA subunits; CR, control region; NCR, non-coding region.

Nucleotide composition
To date, the nucleotide compositions of ten complete mitogenomes have been reported in the family Vespidae. In the subfamily Eumeninae, the overall A + T contents of the O. a. aterrimus and Abispa ephippium mitogenomes are 79.43% and 80.61%, respectively (Table 3). A presumably universal feature is that the A + T content of tRNAs and rRNAs is higher than that of PCGs. In addition to the A + T content, two other parameters, AT-skew and GC-skew, have been widely used to describe the nucleotide composition of mitogenomes (Enrico et al. 2011). The AT-skew of the O. a. aterrimus mitogenome (-0.005) is close to 0, and the GC-skew (-0.216) is negative. Base composition bias plays an important role in research on the mechanisms of replication and transcription of mitogenomes (Wei et al. 2010).
Among the PCGs of the 12 Vespidae species (including two incomplete mitogenomes), cox1 has the lowest A + T content of the 13 PCGs, ranging from 70.18% (Vespa mandarinia) to 75.29% (P. humilis) (Figure 3). The A + T content of atp8, nad2, and nad4L is the highest (Figure 3). This result again confirms that cox1 is relatively conserved, which is why it has been used so often in phylogenetic analyses of other insects (Rivera and Currie 2009; Santos et al. 2015). In addition, the T content is commonly higher than the A content, and the C content is slightly higher than the G content (Figure 3).
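As an illustration of these composition statistics, the following Python sketch computes A + T content together with the two skews, assuming the commonly used definitions AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C). The text does not spell these formulas out, so they are stated here as an assumption, and the function name and toy sequence are invented for the example.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """A+T content, AT-skew and GC-skew of one strand of a nucleotide sequence.
    Ambiguous bases (e.g. N) are ignored. Assumes the sequence contains
    at least one A or T and at least one G or C."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "at_content": (a + t) / (a + t + g + c),
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence (not real data): A=3, T=4, G=1, C=2.
stats = base_composition("AAATTTTGCC")
print(stats)
```

Under these definitions, a negative AT-skew means that T outnumbers A on the strand analysed, and a negative GC-skew means that C outnumbers G.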

Protein coding genes
Of the 13 PCGs of the O. a. aterrimus mitogenome, nine are encoded on the J-strand and the other four on the N-strand. The total length of the PCGs is 11 122 bp. All PCGs use the conventional start codon ATN except for cox1, which uses TTG, an initiation codon also employed in other insects (Sheffield et al. 2008; Li et al. 2012a). Nine PCGs in the O. a. aterrimus mitogenome use the complete termination codon TAA (nad2, cox1, atp8, atp6, nad3, nad5, nad4L, nad6, and nad1), and the other four have the incomplete stop codon T (cox2, cox3, nad4, and cytb). In general, the termination codons of PCGs in insect mitogenomes are TAA or incomplete T (Ojala et al. 1981; Li et al. 2012a).
There are 3697 codons in total in the O. a. aterrimus mitogenome, excluding termination codons, which is within the range of codon numbers commonly found in insect mitogenomes (3585-3746) (Cha et al. 2007). According to the relative synonymous codon usage (RSCU), all 12 Vespidae species frequently use UUU, UUA, AUU, and AUA (Figure 4), which contributes to the high A + T content of the PCGs in Vespidae mitogenomes. CUG is absent from the O. a. aterrimus mitogenome, and CGC and AGC are absent from that of A. ephippium. Some codons are also lacking in other species of Vespidae. For example, CGC and AGC are absent in Vespa orientalis; CUG, GCG, and CGC in V. bicolor; and CCG, ACC, ACG, GCG, UGC, and CGC in Dolichovespula panda. Several codons are missing in Polistes jokahamae, namely CUG, GUC, ACG, GCG, CGC, CGG, and AGC, and CUG, GUC, GCG, CGC, and GGC are lacking in P. humilis (Figure 4). Thus, more codons are absent in Vespinae and Polistinae than in Eumeninae.
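The RSCU and absent-codon comparisons above can be sketched in Python as follows. RSCU is taken here in its usual sense, the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. This is not the MEGA implementation; the function names are hypothetical, only two codon families are shown, and the counts are invented for illustration.

```python
def rscu(codon_counts: dict, families: dict) -> dict:
    """Relative synonymous codon usage for each codon in the given families.
    families maps an amino-acid label to its list of synonymous codons."""
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(codon, 0) for codon in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(codons)
        for codon in codons:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


def absent_codons(codon_counts: dict, families: dict) -> list:
    """Codons of the given families that never occur in the counts."""
    return sorted(
        codon
        for codons in families.values()
        for codon in codons
        if codon_counts.get(codon, 0) == 0
    )


# Two synonymous families from the invertebrate mitochondrial code; counts are made up.
families = {"Phe": ["UUU", "UUC"], "Leu(UUR)": ["UUA", "UUG"]}
counts = {"UUU": 90, "UUC": 10, "UUA": 30}
print(rscu(counts, families))
print(absent_codons(counts, families))
```

An RSCU above 1 marks a codon used more often than expected under equal usage, which is the pattern reported for the A- and U-rich codons in Figure 4.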

Transfer RNA and ribosomal RNA genes
There are 23 tRNAs in the O. a. aterrimus mitogenome, including an extra trnM2, and their lengths range from 60 bp (trnS1) to 72 bp (trnK); other insects usually have 22 tRNAs (Boore 1999; Chen et al. 2015). Of the 23 anticodons, 21 coincide with those of the majority of insect mitogenomes (Lee et al. 2008; Hua et al. 2016), but those of trnI and trnS1 change from CCT to GAT and from GCT to TCT, respectively. Except for trnS1, the 22 tRNAs can fold into the typical cloverleaf secondary structure. The secondary structure of trnS1 lacks the dihydrouridine (DHU) arm, which is reduced to a simple loop (Figure 5), a common phenomenon in metazoan mitogenomes (Wolstenholme 1992; Li et al. 2012b). There are 20 mismatches in 13 tRNAs, comprising 18 unmatched GU base pairs, one unmatched AG, and one unmatched UU (Figure 5).
The rrnL gene is 1 363 bp long and is located between nad1 and trnV; the rrnS gene is 788 bp long and is located on the minority strand between trnV and the CR. The A + T content of the two rRNA genes (rrnL and rrnS) is 84.29% (Table 3).

A control region and a non-coding region
The CR plays an important role in regulating the replication and transcription of mitogenomes (Taanman 1999; Saito et al. 2005). The CR of the O. a. aterrimus mitogenome is 1 078 bp long and located between rrnS and trnI. The A + T content of this region (84.69%) is higher than that of the other regions of the O. a. aterrimus mitogenome. There is a 28 bp tandem repeat unit (TATTCCATTTAAGTTCGTAAAAACTAAT) that occurs more than eight times in the O. a. aterrimus mitogenome. Tandem repeat structures in the CR differ among species (Peng et al. 2017). There is also a poly-T stretch of 13 bp, which may act as a recognition site for the initiation of replication in mitogenomes (Andrews et al. 1999). In the O. a. aterrimus mitogenome, an NCR is situated at positions 128-2 073 (1 946 bp) between trnM1 and trnQ; NCRs have been reported in most insect mitogenomes (Saito et al. 2005; Cameron et al. 2008; Jiang et al. 2016). The A + T content of the NCR is 73.69%, and a 97 bp segment close to the trnQ gene has a markedly higher A + T content of 90.72%. In addition, two tandem repetitive sequences are found in the NCR, repeated 17 and 18 times, respectively.
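The two CR features described above, a tandemly repeated unit and a poly-T stretch, can be located with a simple exact-match scan, sketched here in Python. This is only an illustration and not the algorithm of Tandem Repeats Finder, which also tolerates mismatches and indels; the function names and the toy sequence are assumptions, while the 28 bp unit is the one quoted in the text.

```python
import re


def max_tandem_copies(seq: str, motif: str) -> int:
    """Largest number of exact, back-to-back copies of motif found in seq."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(seq):
        best = max(best, len(match.group(0)) // len(motif))
    return best


def longest_poly_t(seq: str) -> int:
    """Length of the longest uninterrupted run of T in seq."""
    return max((len(run) for run in re.findall(r"T+", seq)), default=0)


unit = "TATTCCATTTAAGTTCGTAAAAACTAAT"  # 28 bp repeat unit quoted in the text
# Toy sequence (not the real CR): three copies of the unit and a 13 bp T run.
toy = "GGC" + unit * 3 + "ACG" + "T" * 13 + "A"
print(len(unit), max_tandem_copies(toy, unit), longest_poly_t(toy))
```

Because only perfect copies are counted, a scan of this kind gives a lower bound on the copy number that a mismatch-tolerant tool would report.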

Conclusions
Among the nine complete mitogenomes reported in the family Vespidae, the two species of the subfamily Eumeninae have more genes (38 and 41) than the other seven species of Polistinae and Vespinae (34-37 genes). Rearrangements of tRNAs are common in Vespidae, but the rearrangement patterns differ among subfamilies. The translocation between trnS1 and trnE occurs only in the subfamily Vespinae, and the same rearrangements are found in the four complete mitogenomes of Vespa mandarinia, V. ducalis, V. orientalis, and V. velutina nigrithorax. The translocation of trnY occurs in both Vespinae and Polistinae, whereas the location of trnY in Eumeninae is consistent with that in the sawfly Perga condei. Fewer codons are absent in Eumeninae than in Vespinae and Polistinae. The phylogenetic results based on the mitogenomes show that O. a. aterrimus and Abispa ephippium belong to Eumeninae, and that Eumeninae is the sister group of (Polistinae + Vespinae). Lastly, the results of this study suggest that Eumeninae may have diverged earlier than both Polistinae and Vespinae, which is consistent with previous research based on morphology.