﻿Characterization of the complete mitochondrial genome of the longhorn beetle, Batocerahorsfieldi (Coleoptera, Cerambycidae) and its phylogenetic analysis with suitable longhorn beetles

﻿Abstract Mitochondrial genome analysis is an important tool for studying insect phylogenetics. The longhorn beetle, Batocerahorsfieldi, is a significant pest in timber, economic and protection forests. This study determined the mitochondrial genome of B.horsfieldi and compared it with the mitochondrial genomes of other Cerambycidae with the aim of exploring the phylogenetic status of the pest and the evolutionary relationships among some Cerambycidae subgroups. The complete mitochondrial genome of B.horsfieldi was sequenced by the Illumina HiSeq platform. The mitochondrial genome was aligned and compared with the existing mitochondrial genomes of Batoceralineolata and B.rubus in GenBank (MF521888, MW629558, OM161963, respectively). The secondary structure of transfer RNA (tRNA) was predicted using tRNAScan-SE server v.1.21 and MITOS WebSever. Thirteen protein-coding genes (PCGs) and two ribosomal RNA gene sequences of 21 longhorn beetles, including B.horsfieldi, plus two outgroups, Dryopsernesti (Dryopidae) and Heterocerusparallelus (Heteroceridae), were analyzed. The phylogenetic tree was constructed using maximum likelihood and Bayesian inference methods. In this study, we successfully obtained the complete mitochondrial genome of B.horsfieldi for the first time, which is 15 425 bp in length. It contains 37 genes and an A + T-rich region, arranged in the same order as the recognized ancestor of longhorn beetles. The genome of B.horsfieldi is composed of 33.12% A bases, 41.64% T bases, 12.08% C bases, and 13.16% G bases. The structure, nucleotide composition, and codon usage of the new mitochondrial genome are not significantly different from other longhorn mitochondrial genomes. Phylogenetic analyses revealed that Cerambycidae formed a highly supported single clade, and Vesperidae was either clustered with Cerambycidae or formed a separate clade. Interestingly, B.horsfieldi, B.rubus and B.lineolata were clustered with Monochamus and Anoplophora species in both analyses, with high node support. Additionally, the VesperidaeSpiniphilusspinicornis and Vesperussanzi and the 19 Cerambycidae species formed a sister clade in the Bayesian analysis. Our results have produced new complete mitogenomic data, which will provide information for future phylogenetic and taxonomic research, and provide a foundation for future relevant research.


Introduction
The longhorn beetle family Cerambycidae is one of the larger families in Coleoptera, with about 40 000 recorded species worldwide. As of 2005, the recorded number of longhorn beetle species in China had surpassed 3100 (Zhang 2011). Many of the Cerambycidae are notorious pests, as they damage crops, forests, orchards and garden plants. Batocera horsfieldi (Hope, 1839) is a significant pest in timber, economic and protection forests. This species is mainly distributed in Asian countries such as China, Vietnam, India, Myanmar, etc. (Xiao 1992). It has a wide range of host plants, including Populus tomentosa Carr., Juglans regia L., Eucalyptus robusta Sm. (Zhuo et al. 2022). Adults of B. horsfieldi fed on the tender bark and leaves of supplemental host plants, such as Rosa cymosa Tratt., Rosa multiflora Thunb., Fraxinus chinensis Roxb. (Xiao 1992), while larvae bore into the trunk and feed on the woody tissue of the host plants. Larvae of B. horsfieldi burrow inside tree trunks, causing the host's xylem to decay and become hollow. The entry holes are blocked by sawdust and excrements, making it difficult to control with pesticide. Due to their biological habits, larvae are prone to causing host tree weakness, fungal infections, wind breaks and even the death of the entire host tree (Zhuo et al. 2022).
The mitochondrial genome is widely used in population genetics, phylogenetics and molecular evolution studies due to its maternal inheritance, stable genome composition, relatively conservative gene arrangement, and rare occurrence of recombination (Wolstenholme 1992;Nelson et al. 2012;Driscoll et al. 2015;Feng et al. 2015). With the development of next-generation sequencing technology in the mitochondrial genome, it is possible to obtain more molecular markers for phylogenetic analysis, resulting in more robust relationships (Li et al. 2015;Fagua et al. 2018). Tracking the expansion pathways and dynamic hybridization of arthropods is often accomplished through mitochondrial sequence analysis (Rubinoff et al. 2010;Moore et al. 2013;Wang et al. 2017). The mitochondrial genome is widely used for phylogenetic inference in animals from high to low taxonomic levels (Li et al. 2016;Zheng et al. 2018), including tribe-level (Nie et al. 2018), subfamily-level (Nie et al. 2020) and family-level (Nie et al. 2021) relationships of beetles. To date, only seven species of Batocera have had their mitochondrial genomes sequenced. Therefore, this study reports the first complete mitochondrial genome sequence of B. horsfieldi using second-generation sequencing technology and compares it with the 20 existing mitochondrial genome sequences of other longhorn beetles in GenBank. In addition, we inferred the phylogenetic relationships of 21 longhorn beetle species to better study the phylogeny of Cerambycidae. The establishment of this genome resource will provide information for future phylogenetic and taxonomic research, providing a foundation for future relevant research.

Sample collection and DNA extraction
The specimens of B. horsfieldi, three male adults and one female adult, were collected from the host plant Populus tomentosa in Xishan Forest Park, Nanchong City, Sichuan Province, China, on June 17, 2022 (30.803038°N, 106.063072°E, alt. 480 m), and were deposited in the Laboratory of Forest Conservation, College of Life Science, China West Normal University (Voucher No. SCNC-BH-20220617.1-3 for male adult, SCNC-BH-20220617.4 for female adult). These specimens were preserved in 95% ethanol at -24 °C for long-term storage in the specimen collection room at China West Normal University. The total genomic DNA was extracted from muscle tissue of individual specimens using the Ezup Column Animal Genomic DNA Purification Kit (Shanghai, China), following the manufacturer's instructions. For sequencing, the extracted DNA was stored at -24 °C.

DNA sequencing, mitogenome assembly and annotation
Next-generation sequencing and assembly were performed by Beijing Aoweisen Gene Technology Co. Ltd (Beijing, China) to obtain the complete mitogenome of B. horsfieldi for the first time. After quantification of the total genome DNA, a whole-genome shotgun strategy was employed for sequencing on the Illumina HiSeq platform. A total of 14 348 102 paired-end reads with a read length of 150 bp were obtained from the sequencing process. Among these, 22 966 sequences were used for mitochondrial genome assembly. Based on the invertebrate genetic code, such as Batocera lineolata Chevrolat, 1852 (MF521888), B. lineolata (MW629558) and B. rubus Linnaeus, 1758 (OM161963), the assembly of B. horsfieldi was carried out by referring to Hahn et al. (2013), and the gene annotation was performed using Geneious v.11.0.2 by referring to Greiner et al. (2019). The tRNA genes were re-verified by tRNAScan-SE (Lowe and Eddy 1997) and MITOS WebSever (Bernt et al. 2013). The base composition and relative synonymous codon usage (RSCU) values were calculated using Shengxin Cloud Online Server. Strand asymmetry was calculated in terms of formulae: ATskew = (A -T)/(A + T) and GC-skew = (G -C)/(G + C) (Perna and Kocher 1995).

Phylogenetic analyses
A total of 21 complete mitochondrial genomes of longhorn beetles, including one newly sequenced species (B. horsfieldi), as well as the complete mitochondrial genomes of Dryops ernesti DesGozis, 1886 and Heterocerus parallelus Gebler, 1830, were used in this study (Table 1). These species belong to 13 longhorn genera, with Heterocerus parallelus (Heteroceridae) and Dryops ernesti (Dryopidae) as outgroups. The utilization of tRNA genes in phylogenetic analysis is relatively infrequent due to their short length and highly conserved characteristics (Chen 2005), thus we constructed the phylogenetic tree based on the 13 PCGs and rRNAs by using PhyloSuite v.1.4.4, and analyzed the tree by maximum likelihood (ML) and Bayesian inference (BI) methods with different best-fit substitution models. The sequences were aligned using MAFFT(Multiple Alignment using Fast Fourier Transform)and trimmed by trimAl. The bestfit substitution models used in the ML and BI analyses were calculated using ModelFinder v.2.3. Maximum likelihood analyses were run with 1000 ultrafast bootstrap and 1000 SH-aLRT replicates to estimate node reliability. Bayesian analyses were run with two independent chains spanning 1 million generations, four Markov chains, sampling at every 100 generations, and with a burn-in of 25%. The phylogenetic tree was visualized and edited using the iTOL online server (https://itol.embl.de/).

Mitogenome organization and composition
The complete mitochondrial genome of B. horsfieldi is 15 425 bp in length and is a closed circular molecule (Fig. 1). It contains 37 genes, consisting of 22 transfer RNAs, two ribosomal RNAs, 13 protein-coding genes (PCGs), and an A+T-rich region, which is a feature shared with other typical Cerambycidae mitogenomes. Of the 13 PCGs, 9 (ND2, ND3, ND6, COI, COII, COIII, ATP8, ATP6, and CYTB) are encoded on the J strand, along with 14 tRNAs (Ile, Met, Trp, Leu2, Lys, Asp, Gly, Ala, Arg, Asn, Ser1, Glu, Thr, and Ser2), and a control region. The remaining four PCGs (ND5, ND4, ND4L, and ND1), eight tRNAs (Gln, Cys, Tyr, Phe, His, Pro, Leu1, and Val), and two rRNAs (12S rRNA and 16S rRNA) are encoded on the N strand (Fig. 1). The gene order of the new mitogenome follows the ancestral arrangement proposed for all arthropods (Braband et al. 2010). The overall mitogenome is composed of 33.12% A, 41.64% T, 12.08% C and 13.16% G, with a highly biased A+T content of 74.76% (Table 2). The calculation of asymmetrically distributed nucleotide content was determined by the formulas AT-skew = (A -T)/(A + T) and GC-skew = (G -C)/(G + C). The AT-skew and GC-skew of the B. horsfieldi mitogenome are -0.11 and 0.04, respectively (Table 2), indicating a negative AT-skew and a positive GC-skew. Overall, the nucleotide composition and genomic structure of B. horsfieldi exhibit typical features of the mitogenomes of insects in the order Coleoptera.

PCGs and codon usage
The total length of the 13 PCGs in the mitochondrial genome of B. horsfieldi is 11 150 bp, accounting for 72.29% of the entire genome. Among the 13 PCGs, four genes (ND4, ND4L, ND5, and ND1) are encoded on the N strand, while the other nine genes (COI, COII, COIII, ATP8, ATP6, NAD2, NAD3, NAD6, and CYTB) are encoded on the J strand. Among the 13 PCGs, the longest gene is COX1, but its A+T content is the lowest at 66.30%, while the shortest gene is ATP8, but its A+T content is the highest at 87.18%. The start codons for COI, COII, ATP8, ND5, and ND6 are ATT, while the start codon for ND1 is TTG. The start codons for COIII, ATP6, ND3, ND4, ND4L, and CYTB are ATN (N represents G or C) ( Table 3). Seven of the 13 PCGs share the typical termination codons TAA and TAG, while CYTB uses CTC as the termination codon, and the remaining four PCGs use a single T residue as the termination codon (Table 3). It is widely believed that the incomplete codon structure signifies the stop of protein translation in insects and other invertebrates (Cheng et al. 2016). Besides, the calculation of relative synonymous codon usage (RSCU) values and the analysis of base composition bias are important references for studying the replication and transcription mechanisms of mitochondrial genomes (Wei et al. 2010). We summarized the usage of relative synonymous codons. The results showed that AGA, UUA, UAA, and UCU are the four most commonly used codons (Fig. 2).

tRNA genes, rRNA genes and A+T-rich region
The total length of the 22 tRNAs in B. horsfieldi is 1450 bp, ranging from 61 to 70 bp (Table 3). The nucleotide composition of the tRNAs is 38.55% A, 37.79% T, 10.07% C, and 13.59% G. The AT-skew and GC-skew are 0.01 and 0.15, respectively ( Table 2). Eight of the 22 tRNA genes are located on the N strand, while the other 14 are located on the J strand. The tRNA secondary structure of B. horsfieldi was forecasted by the tRNAScan-SE server v.1.21.
The outcomes revealed that all tRNA genes conform to the standard cloverleaf structure, with the exception of tRNA-ser1, which is lacking a characteristic feature -the dihydrouridine arm (Fig. 3). The result is aligned with previous studies, such as those on stingray [Acroteriobatus annulatus, Acroteriobatus blochii (Van Staden et al. 2022)] and beetles [Coomaniella copipes, Coomaniella dentata and Dicerca corrugata (Huang et al. 2022)]. Moreover, the lengths of the 16S rRNA and 12S rRNA genes are 1283 bp and 751 bp, respectively ( Table 2). The 16S rRNA is located between tRNA-Leu and tRNA-Val and is 1283 bp, constituting 34.53% A, 39.91% T, 7.17% C, and 18.39% G. The 12S rRNA is located between tRNA-Val and the A+T-rich region and is 751 bp in length. The nucleotide composition of the 12S rRNA is 33.95% A, 43.14% T, 7.46% C, and 15.45% G. The AT-skew and GC-skew of the RRNs are -0.09 and 0.41, respectively (Table 2). Compared with other longhorn beetles, the positions and characteristics of the 16S rRNA and 12S rRNA genes are similar. The A+T-rich region of B. horsfieldi is located between trnI and rns genes and is 791 bp in length, with an A+T content of 87.1%, an AT-skew of 0.01, and a GC-skew of -0.06 under normal conditions.

Phylogenetic analysis
Thirteen protein-coding genes (PCGs) and two ribosomal RNA (rRNA) sequences were utilized to construct the phylogenetic trees of B. horsfieldi via maximum likelihood (ML) and Bayesian inference (BI) methods (Figs 4, 5). Based on the results obtained from BI, the monophyly of Cerambycidae was once again confirmed, as all the species belonging to this family formed a highly supported single clade. However, the results obtained from ML showed that Ceramby-

Discussion
The complete mitochondrial genome sequence of B. horsfieldi obtained in this work enriches the species data of Batocera, Cerambycidae. Comparative analysis and phylogenetic analysis of the sequence features of Cerambycidae mitochondrial genomes were carried out by combining 13 protein-coding genes (PCGs) and two ribosomal RNA (rRNA) data from 21 species of longhorn beetles publicly available on NCBI. The results of this study will lay the foundation for population genetics research on B. horsfieldi and phylogenetic reconstruction of the Cerambycidae. The secondary structures of tRNAs in Cerambycidae mitochondrial genomes have a typical cloverleaf structure, except for the absence of the D-arm in tRNA-ser1 (AGN) . The absence of the D-arm in trnS1 (AGN) is commonly found in other animal and insect orders, such as Lepidoptera, Diptera and Hemiptera (Wolstenholme 1992;Liao et al. 2010). Based on the start and stop codons of the protein-coding regions, most of the mitochondrial protein-coding genes in Cerambycidae begin with the typical ATN start codon, with only a few species of ND1 having a special start codon TTG, and the stop codons are generally the common TAN and incomplete T codons . Incomplete stop codons (T or TA) are common in animal mitochondrial genomes (Boore 1999) and Coates et al. (2005) suggest that they can be completed to TAA by the A at the 3' end of the transcript. There has been ongoing controversy regarding the start codon for COI, and the use of different start or stop codons can result in variations in the number of base pairs between genes (Nie and Yang 2014). According to Sheffield et al. (2008), the start codon for COI in the suborder Polyphaga is either AAT or AAC; however, a reannotation was conducted in this study, which revealed that ATT is the start codon for COI in B. horsfieldi. Interestingly, in a recent study (Mei et al. 2022), the start codon for COI was reported to be ATG, which is similar to our findings. The A+Trich region is the most important non-coding region in mitochondrial genomes, with extremely high A+T content. Length variation in this region can be very large, even among closely related species (Zhang et al. 1995;Zhang and Hewitt 1997). Batocera horsfieldi has an A+T-rich region of 791 bp, which results in a significantly longer mitochondrial genome compared to other genomes.
Currently, traditional taxonomic studies of longhorn beetles are mainly based on morphological features; however, there are still many controversies due to the complexity and instability of the morphological features. In this study, a phylogenetic tree was constructed using BI and ML methods based on 13 PCGs and 2 rRNAs, exhibiting differences with previous related studies. For instance, our results suggest that A. glabripennis and A. chinensis belong to the same branch, whereas in the study by Li et al. (2011), they were not grouped together. Furthermore, our findings indicate a closer relationship between M. sartor urussovii and M. alternatus alternatus, which contrast with the results of the study conducted by Yang et al. (2017). This work aimed to analyze the phylogenetic relationships of 21 longhorn beetle species and two outgroups (H. parallelus and D. ernesti) using mitochondrial genome data. Although the ML and BI trees constructed from 13 PCGs and two rRNA sequences did not exhibit identical topologies, their expression results were quite similar. Notably, the node support values of BI trees were consistently higher than those of ML trees for the same dataset, which has been observed in many previous studies of other taxa (Yuan et al. 2016;Chen et al. 2018;Li et al. 2019). Our results indicate that mitochondrial genome sequences are useful tools for inferring relationships among Cerambycidae. Furthermore, the technological development of sequencing and assembly of mitochondrial genomes will facilitate future work in mitochondrial genome sequencing (Crampton-Platt et al. 2015;Bourguignon et al. 2017). Increasing the sampling of taxa and genome sequencing will further help resolve the classification issues of longhorn beetles. Although this study has contributed one new complete mitochondrial genome to the phylogeny of Cerambycidae, the interrelationships among subfamilies and tribes still require more data to be completely determined. These questions will be well addressed in the future when sufficient numbers of complete mitochondrial genomes of longhorn beetles are accumulated. Although our study data cannot fully resolve the phylogenetic relationships within Cerambycidae, it provides a reference for further research on the phylogeny of Cerambycidae.

Conclusions
In this study, we successfully obtained the complete mitochondrial genome of Batocera horsfieldi for the first time, which is 15 425 bp in length. The mitochondrial genome of B. horsfieldi exhibits a molecular pattern similar to that of other Cerambycidae species, comprising 37 gene segments, including 22 transfer RNAs, two ribosomal RNAs, 13 protein-coding genes (PCGs), and an A + T-rich region typical of other Cerambycidae mitogenomes. Four of the 13 PCGs are encoded on the N strand, while the remaining nine genes are encoded on the J strand. The start codons of COI, COII, ATP8, ND5, and ND6 are ATT, while that of ND1 is TTG, and the start codons of COIII, ATP6, ND3, ND4, ND4L, and CYTB are ATN (N represents G or C). Seven of the 13 PCGs share typical stop codons (TAA and TAG), while CYTB uses CTC as a stop codon and the remaining four PCGs use a single T residue as a stop codon. All tR-NAs exhibit typical cloverleaf structures except for tRNASer1. Phylogenetic analyses revealed that Cerambycidae formed a highly supported single clade, and Vesperidae was either clustered with Cerambycidae or formed a separate clade. Interestingly, B. horsfieldi, B. rubus and B. lineolata were clustered with Monochamus and Anoplophora species in both analyses, with high node support values. Additionally, Spiniphilus spinicornis and the 20 longhorn beetles formed a sister clade in the Bayesian analysis.