The complete mitochondrial genome of Pentatoma rufipes (Hemiptera, Pentatomidae) and its phylogenetic implications

Abstract Pentatoma rufipes (Linnaeus, 1758) is an important agroforestry pest widely distributed in the Palaearctic region. In this study, we sequence and annotate the complete mitochondrial genome of P. rufipes and reconstruct the phylogenetic trees for Pentatomoidea using existing data for eight families published in the National Center for Biotechnology Information database. The mitogenome of P. rufipes is 15,887-bp-long, comprising 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a control region, with an A+T content of 77.7%. The genome structure, gene order, nucleotide composition, and codon usage of the mitogenome of P. rufipes were consistent with those of typical Hemiptera insects. Among the protein-coding genes of Pentatomoidea, the evolutionary rate of ATP8 was the fastest, and COX1 was found to be the most conservative gene in the superfamily. Substitution saturation assessment indicated that neither transition nor transversion substitutions were saturated in the analyzed datasets. Phylogenetic analysis using the Bayesian inference method showed that P. rufipes belonged to Pentatomidae. The node support values based on the dataset concatenated from protein-coding and RNA genes were the highest. Our results enrich the mitochondrial genome database of Pentatomoidea and provide a reference for further studies of phylogenetic systematics.


Introduction
The mitochondrion is a semi-autonomous organelle with its own genetic material, known as the mitochondrial genome (mitogenome) (Nass and Nass 1963). The mitogenome is widely used in the fields of molecular evolution, phylogenetic analysis, molecular ecology, biogeography, and population genetics because of its advantages of small size, stable genetic composition, and maternal inheritance (Ballard and Whitlock 2004;Simon and Hadrys 2013;Cameron 2014;Yuan and Guo 2016). Insects, as the most diverse, numerous, and widely distributed animals on Earth, are hotspots in mitogenome research (Boore 1999). To date, mitochondrial genome research has been very extensive, covering all orders of insects (Cameron 2014). Insect mitogenomes are covalently closed, double-stranded, circular DNA molecules (14-20 k bp long), and usually contain a control region and 37 genes: 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (12S rRNA and 16S rRNA) genes (Boore 1999;Cameron and Whiting 2008;Cameron 2014). The structure of mitogenome in most known insects is stable, and the gene arrangement is relatively conservative, which are consistent with the genome composition and arrangement of the most typical insect mitochondrial genome, namely Drosophila yakuba Burla (Clary and Wolstenholme 1985).
Pentatomoidea, one of the most commonly encountered groups in Hemiptera, includes 1,410 genera and 8,042 species which are widely distributed worldwide (Rider et al. 2018). Pentatomoid insects have diverse feeding habits, although the majority are herbivorous. Some cause huge economic losses, such as Dolycoris baccarum (Linnaeus) and Halyomorpha halys Stål. In addition, some pentatomoid insects are predatory, including most of the species of Asopinae; and a few groups are suspected to be fungus feeders, such as members of the Canopidae and Megarididae (Rider et al. 2018;Zhao et al. 2018). Classification of the superfamily Pentatomoidea has long been contentious; and different scholars have distinct opinions. For example, Schaefer (1993) divided Pentatomoidea into 16 families, whereas Henry (1997) recognized 17 families, placing Eumenotidae and Thyreocoridae at the family level. Grazia et al. (2008) supported the monophyly of Pentatomoidea and most of the included families, which was based on morphological characters and molecular markers (16S rRNA, 18S rRNA, 28S rRNA and COI); Wu et al. (2016) reconstructed the phylogenetic relationships of 16 families within Pentatomoidea using 18S and 28S rDNAs sequences and showed that Cydnidae and Tessaratomidae might be polyphyletic; Lis et al. (2017) combined 28S+18S rDNA sequence, questioned the monophyleticity of the "cydnoid" complex of pentatomoid families (Cydnidae, Parastrachiidae, Thaumastellidae, and Thyreocoridae), and demonstrated the polyphylicity of Cydnidae. Recently, many taxonomists reorganized the families, genera, and species of Pentatomoidea, and divided Pentatomoidea into 18 families (Rider et al. 2018). With the development of next-generation sequencing (NGS), an increasing number of pentatomoid mitogenome sequences have been obtained, which provide the possibility of resolving the phylogenetic relationships among the superfamily at the genetic level (Yuan et al. 2015b;Bai et al. 2018;Zhao et al. 2018). Furthermore, Wu et al. (2017) confirmed the monophyly of Scutelleridae (based on 18S + 28S rDNAs + 13PCGs), and Liu et al. (2019) reconstructed the phylogeny of Pentatomomorpha and Pentatomoidea based on PCGRNA and PCG12R-NA. However, despite the abundance of species in the superfamily, only 97 species have complete or nearly complete mitogenomes published in the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/2020.07); these represent only eight families. Moreover, there has been no discussion about the phylogenetic position of Pentatoma species, except for the description of Pentatoma semiannulata (Motschulsky) mitogenome by Wang et al. (2021). Therefore, it is necessary to determine more mitogenome sequences of Pentatoma species to better understand its phylogenetic relationships.
Pentatoma rufipes (Linnaeus, 1758) (Hemiptera, Heteroptera, Pentatomidae) is a medium-sized to large, dark brown insect with reddish-orange spots and bright orange legs (Hsiao 1977;Bantock and Botting 2013). These insects are widely distributed in the Palearctic region (Ling and Zheng 1987;Fan and Liu 2012). They can damage oak, poplar, elm, hawthorn, apricot, pear, and other trees, and they constitute an important agricultural and forestry pest (Hsiao et al. 1977;Powell 2020). There are also records of P. rufipes preying on Zygaena filipendulae L.(Lepidoptera, Zygaenidae) (Hamilton and Heath 1976). Previous studies on P. rufipes mostly focused on its physiological and morphological characteristics (Ling and Zheng 1987;Neupert et al. 2009), with limited molecular data on the mitochondrial COI and COII genes (Bu et al. 2005;Liang 2009), along with some studies identifying biological characteristics and potential control strategies (Peusens and Beliën 2012;Powell 2020).
In this study, we sequenced and annotated the mitogenome of P. rufipes and analyzed its mitogenome in detail, including the genome structure, nucleotide composition, and codon usage, and constructed RNA secondary structures. In addition, we combined the complete mitogenome of P. rufipes with the existing data for the eight families of Pentatomoidea to explore the phylogenetic position of P. rufipes.

Sample collection
Adult Pentatoma rufipes specimens were collected in Baiji Hill (Tonghua City, Jilin Province, China; 41°58.14'N, 126°06.58'E) on 24 July 2015. All samples were immediately placed in absolute ethanol and stored in a freezer at -20 °C until DNA extraction. Specimen identification was performed by Qing Zhao. The voucher specimen is maintained at the Institute of Entomology of Shanxi Agricultural University (voucher number: SXAU 007; Taigu, China). The complete mitogenome of P. rufipes has been submitted to GenBank (accession number: MT861131).

DNA extraction and sequencing
Whole-genome DNA was extracted from the thoracic muscle of adult samples using the Genomic DNA Extraction Kit (Sangon Biotech, Shanghai, China). The mitogenomes were sequenced using the whole-genome shotgun method on the Illumina Miseq platform (Personalbio, Shanghai, China), with 400-bp inserts and paired-end model. A5-miseq v. 20150522 (Coil et al. 2015) andSPAdes v. 3.9 (Bankevich et al. 2012) were used to assemble the data.

Genome annotation and sequence analysis
After assembly, the complete mitogenome was manually annotated using Geneious v. 8.1.4 software (Kearse et al. 2012). A reference sequence of Eurydema gebleri Kolenati for annotation was obtained from the basic local alignment search tool (BLAST) in the NCBI database. The boundaries of the PCGs were determined using Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) on the NCBI website. MEGA v. 7.0 (Kumar et al. 2016) was used to translate the proteins to verify the start codons, stop codons, and amino acid sequences and to ensure the accuracy of the sequences. We annotated tRNA sequences using tRNAscan-SE (http://lowelab. ucsc.edu/tRNAscan-SE/) (Lowe and Eddy 1997) and MITOS (http://mitos.bioinf. uni-leipzig.de/index.py/) (Bernt et al. 2013) with the invertebrate mitochondrial code. The boundaries of rRNA genes were completed according to the positions of adjacent genes and published rRNA gene sequences from Pentatomidae insects in GenBank (Boore 2006). The codon usage, base composition, and amino acid composition of the mitogenome were analyzed using MEGA v. 7.0. The skew of the nucleotide composition was calculated with the formulas: AT-skew = (A -T) / (A + T) and GC-skew = (G -C) / (G + C) (Perna and Kocher 1995).

Phylogenetic analyses
In this study, we selected the mitogenomes of P. rufipes, representative species from eight other Pentatomoidea families, and two Coreoidea species (outgroup) to analyze the phylogenetic position of P. rufipes and the phylogenetic relationships within Pentatomoidea. All species included in this analysis are listed in Table 1. The nucleic acid sequences of the 13 PCGs were extracted using Geneious v. 8.1.4. All PCGs were translated into their amino acid sequences and aligned using MUSCLE with default parameters in MEGA v. 7.0 (Edgar 2004). The tRNA and rRNA genes were also aligned using the MUSCLE algorithm in MEGA v. 7.0. The resulting alignments were concatenated into a combined matrix.
To determine if the sequences contained phylogenetic information, we tested nucleotide substitution saturation, and plotted transition and transversion substitutions against the TN93 distance for all datasets before reconstructing the phylogenetic trees using DAMBE v. 4.5.32 (Xia and Xie 2001;Xia and Lemey 2009). The optimal sub-  (Ronquist et al. 2012) under the GTR+G+I substitution model with four independent Markov chains run for 10,000,000 generations and stopped when the average standard deviation value was below 0.01. The first 25% of trees were discarded as burn-ins, and the remaining trees were used to construct a 50% majority-rule consensus tree (Zhao et al. 2018). The phylogenetic trees were constructed using three types of datasets: (1) all codon positions of the 13 PCGs; (2) the 13 PCGs, excluding the third codon position (PCG12); and (3) the PCGs, 22 tRNA genes, and two rRNA genes (PCGRNA).

Genomic features
The mitochondrial genome of Pentatoma rufipes is 15,887-bp-long and contains a control region and 37 genes comprising 13 PCGs, 22 tRNA genes and two rRNA genes ( Fig. 1; Table 2). Among these genes, 14 genes are located on the minority strand (N-strand), including four PCGs (ND5, ND4, ND4L, and ND1), eight tRNA genes (trnQ, trnC, trnY, trnF, trnH, trnP, trnL1(CUN), and trnV), and two rRNA genes (12S rRNA and 16S rRNA genes), whereas the remaining 23 genes are encoded on the majority strand (J-strand). The mitogenome is compact, with a total of nine gene overlaps, ranging in length from 1 to 8 bp; the longest overlap is between trnW and trnC. Furthermore, there were 16 gene spacers from 1 bp to 23 bp, comprising 116 bp in total; the longest spacer region falls between trnS2 and ND1.

Nucleotide composition and codon usage
The base content and skewness of the genes in the P. rufipes mitogenome is shown in Table 3. The base composition of the entire sequence is in the order of A(42.0%)>T( 35.7%)>C(12.4%)>G(9.9%), with a bias toward A + T. This bias was observed in all genetic elements, with an A + T content of 77.1% in PCGs, 77.7% in tRNAs, 79.8% in rRNAs, and 78.7% in the control region. The complete genome also shows a clear AC-skew (AT-skew = 0.08, GC-skew = −0.11), indicating a greater abundance of A than T and of C than G. The preference for nucleotide composition is also reflected in codon use. The relative synonymous codon usage values for the P. rufipes mitogenome are summarized in Figure 2 and Table 4. Figure 3 shows the amino acid composition of the P. rufipes mitogenome. The most common amino acids are Phe, Leu, Ile, and Met, and their most abundant codons (UUU for Phe, UUA for Leu2, AUU for Ile, and AUA for Met) are all composed of A and/or T. For each amino acid, the most commonly used coded codons are NNA and NNU, reflecting the skew of the nucleotide composition toward AT. In addition, the most frequently used codons do not strictly correspond to the tRNA anticodons for most amino acids.

PCG regions
Most P. rufipes PCGs share the ATN start codon (five with ATG, three with ATT, two with ATA, and one with ATC), except for COX1 and ATP8, which start with TTG. COX1 and COX2 sequences terminate with a single T, and the stop codon for the remaining genes is TAA. The AT content (77.1%) of the 13 PCGs exceeded the GC content (22.9%), and the AT bias is moderately negative (absolute value: 0.1-0.2).  In addition, we calculated the synonymous substitutions (Ks), non-synonymous substitutions (Ka), and the Ka/Ks ratios of the 13 PCGs from Pentatomoid insects. We also compared the evolutionary rates of the 13 PCGs (Fig. 4). The evolutionary rate of ATP8 was the fastest, followed by that of ND6, and the COX1 gene was the most conservative with the slowest rate. The evolutionary rates of the other genes were in the order of ND2 > ND4 > ND5 > ND4L > ATP6 > ND3 > ND1 > COX2 > COX3 > CYTB. Moreover, the Ks values of the 13 PCGs were all greater than the Ka values, and the Ka/Ks ratio was <1, which indicates that the genes are subject to purifying selection.
tRNA genes, rRNA genes, and the control region We detected 22 tRNA genes, which can transport all 20 amino acids, in the mitogenome of P. rufipes. There are two tRNAs each for leucine and serine: trnL1 (CUN) and trnL2 (UUR), and trnS1 (AGN) and trnS2 (UCN), respectively. The anticodons of trnL are TAA and TAG, and the anticodons of trnS are ACT and TGA. The 22 tRNA genes span 1,481 bp, between 62 and 74 bp in length. Although trnS1 lacks a dihydrouridine arm, the other tRNA genes all have the classic clover leaf secondary structure. In addition to the typical base pairs (A-U and G-C), some wobble G-U pairs appear in these secondary structures, which can form stable chemical bonds between G and U; In addition, atypical pairing of U-U and U-C is also found (Fig. 5).
The two P. rufipes rRNA genes (12S rRNA and 16S rRNA) are encoded on the N-strand. The 16S rRNA gene is located between trnL1 (CUN) and trnV, which is  1,277 bp in length, and there is no gene overlap between 16S rRNA and the two tRNA genes. The 12S rRNA gene (821 bp) is located between trnV and the control region, similar to the published pentatomid mitogenomes. The base content of the rRNA genes is in the order of T (44.2%) > A (35.6%) > G (12.6%) > C (7.6%). The ATskews are negative, and the GC-skews are positive. The complete secondary structures of the 12S rRNA and 16S rRNA genes are shown in Figures 6, 7, respectively. The control region of the mitogenome of P. rufipes is located between the 12S rRNA gene and trnI. The control region is 1,180 bp long, making it the longest noncoding region in the mitogenome, and has an A + T content of 78.7%. The AT- skew and GC-skew in the control area are -0.03 and -0.28, respectively, indicating that the content of T is higher than that of A and the content of C is higher than that of G.

Saturation test
To eliminate the negative effect of the substitution saturation in the phylogenetic analysis, saturation tests on the three data sets were conducted. Nucleotide sequence substitution saturation is usually determined by analyzing the relationship between the transition and transversion values against the corresponding corrected genetic distance. In all tests, the Xia saturation index (Iss) was below the critical values for a symmetric (Iss.cSym) and asymmetric (Iss.cAsym) topology (Fig. 8). The values for base transition and transversion were linearly associated with the corrected genetic distance, indicating that the nucleotide sequences of these three datasets were not saturated, making them suitable for constructing phylogenetic trees.

Phylogenetic analyses
We reconstructed the phylogenetic trees of eight families in Pentatomoidea from three datasets (PCGRNA, PCG, and PCG12) using Bayesian inference method. The topological structures of the trees were similar, especially PCG and PCG12 showed similar family-level relationships (Figs 9-11). Among the three trees, the Bayesian posterior probability value of the phylogenetic tree based on the PCGRNA data-   set was the highest. Phylogenetic analysis based on PCGRNA data showed that P. rufipes and Dinorhynchus dybowskyi Jakovlev were closely related, these two species formed sister groups with E. gebleri, and P. rufipes and Graphosoma rubrolineatum (Westwood) had the farthest relationship. However, the results in the phylogenetic analysis based on PCG data were somewhat different from the above. In this analysis, P. rufipes and E. gebleri were the most closely related species, and they were sister groups with D. dybowskyi.

Discussion and conclusions
In this study, we sequenced the complete mitogenome of P. rufipes using NGS technology, revealing a mitogenome that is 15,887-bp-long containing 37 genes. The order of the 37 genes is consistent with other published mitogenome of Hemiptera (Hua et al. 2009;Zhang et al. 2018;). There are three obvious overlapping regions in mitochondrial genome of P. rufipes. The longest overlap located between trnW and trnC, which is 8 bp in length, and the overlap bases are AAGCTTTA. This overlap also showed in most species of Pentatomidae (Yuan et al. 2015a and. The other two pairs of genes, namely ATP8/ATP6 and ND4/ND4L, overlap by 7 bp, and both overlap bases are ATGATAA, which is consistent with other hemipteran insects Zhao et al. 2020). The longest spacer region falls between trnS2 and ND1, which is consistent with the findings of other studies (Hua et al. 2008;. The difference of mitogenome size between P. rufipes and other species of Hemiptera due to the length difference of the noncoding region.
The composition of the four bases in the P. rufipes mitogenome suggested highly unbalanced (A>T>C>G). The nucleotide composition shows an obvious AT preference, and the entire genome shows AT-skew and CG-skew. The above characteristics of mitogenome base composition of P. rufipes are ubiquitous to all sequenced species of Pentatomidae. The preference of bases composition is generally considered to be caused by asymmetric mutation and selection pressure of the four bases (Brown et al. 2005). Consistent with most species of Hemiptera, the PCGs of this species use the common triplet codon ATN as the start codon, TAA and a single T as the stop codon (Hua et al. 2008;. The secondary structures of tRNAs for P. rufipes is conserved and trnS1 lacks DHU arm, these features meet the character of metazoan mitochondrial genomes (Wolstenholme 1992). In addition to the typical Watson-Crick pairing (G-C and A-U), there are also some typical pairings such as U-G, U-C and U-U. Some scholars have proposed that those tRNAs with non-Watson-Crick matches can be transformed into fully functional proteins through post-transcriptional mechanisms (Chao et al. 2008;Pons et al. 2014). The rRNA secondary structure of this species is also conserved. The 12S rRNA sequence includes three domains and the 16S rRNA sequence includes six domains (domain III is absent), which is similar to pentatomoid insects.
The phylogenetic result suggested that there are some different topology compared to other studies, but we infer that the possible reasons are as follows: first, the number and taxon of samples selected are different. In this study, the phylogenetic relationship between Pentatoma and Plautia was relatively close, and they were far from Graphosoma. However, when the phylogenetic tree was constructed with Pentatoma semiannulata, the relationship between Pentatoma and Graphosoma was closer (Wang et al. 2021). Second, the selection of outgroup also affects the topological structure of phylogenetic tree. Comparing our results with Zhao et al. (2017) and Liu et al. (2019), because of the three studies choose different species as the outgroup, we got different phylogenies. Third, different molecular markers also might affect phylogenetic relationships. Grazia et al. (2008) supported the monophyly of Pentatomoidea and most of the included families based on morphological characters and molecular markers (16S rRNA, 18S rRNA, 28S rRNA, and COI). Lis et al. (2012) constructed similar phylogenetic trees to our study using 12S and 16S rDNA datasets. Tian et al. (2011) (based on Hox genes), Liu et al. (2019) (based on PCGRNA and PCG12RNA) and Li et al. (2005) (based on 18S rDNA and COX1 sequence) also put forward their own opinions on the phylogenetic relationship of the superfamily. Our three topologies revealed that the Bayesian posterior probability of the tree based on PCGRNA sequences was significantly higher than that of the trees based on the PCG data, indicating that inclusion of tRNA and rRNA genes improves the accuracy of the analysis, which is consistent with the findings of the study conducted by Cameron et al. (2007Cameron et al. ( , 2009.
In summary, the mitogenome of P. rufipes has typical sequence structures, and the gene content, nucleotide composition, codon usage, RNA structures, and rates of PCGs evolution are similar to those of other published pentatomid genomes. The mitochondrial genome of P. rufipes reveals the phylogenetic location of Pentatoma, indicating that the mitogenome can be used to reveal phylogenetic relationships among different taxonomic levels of insects. However, more insect mitogenomes should be sequenced, which would provide more insight into the phylogenetic relationships of species from different taxa.