The complete mitochondrial DNA sequence of the pantropical earthworm Pontoscolex corethrurus (Rhinodrilidae, Clitellata): Mitogenome characterization and phylogenetic positioning

Abstract Pontoscolex corethrurus (Müller, 1857) plays an important role in tropical soil ecosystems and has been widely used as an animal model for a large variety of ecological studies in particular due to its common presence and generally high abundance in human-disturbed tropical soils. In this study we describe the complete mitochondrial genome of the peregrine earthworm P. corethrurus. This is the first record of a mitochondrial genome within the Rhinodrilidae family. Its mitochondrial genome is 14 835 bp in length containing 37 genes (13 protein-coding genes (PCG) 2 rRNA genes and 22 tRNA genes). It has the same gene content and structure as in other sequenced earthworms but unusual among invertebrates it hasseveral overlapping open reading frames. All genes are encoded on the same strand. Most of the PCGs use ATG as the start codon except for ND3 which uses GTG as the start codon. The A+T content of the mitochondrial genome is 59.9% (31.8% A 28.1% T 14.6% G and 25.6% for C). The annotated genome sequence has been deposited in GenBank under the accession number KT988053.


Introduction
Excluding a few aquatic taxa, earthworms (Annelida: Clitellata) are mostly terrestrial and include ca. 5,500 species (Blakemore et al. 2006). Believed to have originated in the Guyana Shield (Righi 1984), the earthworm Pontoscolex corethrurus (Müller, 1857) is a globally distributed species found in most tropical regions. It mainly occurs in humandisturbed areas and can be used as an indicator of ecosystem disturbance (Brown et al. 2006), and is commonly used in ecotoxicological studies (e.g. Buch et al. 2011;Buch et al. 2013;Da Silva et al. 2016). The species formerly belonged in the Glossoscolecidae family, but was recently allocated to the Rhinodrilidae family by James (2012), following the phylogeny of James and Davidson (2012). It is also the most well-known earthworm species in the humid tropics, frequently used in ecological and agronomic studies (Bhattacharjee and Chaudhuri 2002;Buch et al. 2013;Chapuis-Lardy et al. 2010;Dupont et al. 2012;Hamoui 1991;Marichal et al. 2010). Being a geophagous endogeic species, P. corethrurus shows high plasticity regarding its tolerance to soil physicochemical characteristics, including variable moisture, high temperatures, exceptionally high carbon dioxide and low oxygen levels, and is capable of inhabiting nutrient-poor soils (Cunha et al. 2014;Hamoui 1991;Lavelle et al. 1987), as well as rotten logs (Buch et al. 2011).
Molecular data have become increasingly important in recent years. In animals, the mitochondrial DNA (mtDNA) typically contains 37 genes, encoding 13 proteins for the enzymes required for oxidative phosphorylation, the two ribosomal RNA units (rRNA), and 22 transfer RNAs (tRNAs) necessary for the translation of the proteins encoded by mtDNA (Anderson et al. 1981;Boore 1999;Zhao et al. 2015). Remarkable progress has been made over the past several years in the field of the molecular systematics of annelids. Compared with individual genes, the mitochondrial genome is still a promising tool for inferring phylogenetic relationships due to its high content of information, and has been applied in some phylogenetic studies involving earthworms (Zhang et al. 2015;Zhang et al. 2016b).
In this study, we sequenced the complete mtDNA sequence of P. corethrurus for the first time and analyzed its structure. Additionally, we conducted phylogenetic analyses based on the mitochondrial sequence data available elsewhere with the purpose of investigating the phylogenetic position of P. corethrurus within Clitellata. The information reported in this article will facilitate further investigations of phylogenetic relationships of different Annelida species.

Sample collection and DNA extraction
A group of clitellate (adult) P. corethrurus was collected in São Miguel Island (Azores, Portugal) inside pineapple greenhouses (Locality: Fajã de Baixo, 37°45'12.2"N, 25°38'21.3"W) during January 2015. Animals were euthanized in 10% ethanol and preserved in 96% ethanol for later work. A piece of body wall tissue was used for genomic DNA extraction using standard phenol/chloroform (Sambrook and Russell 2001) procedure followed by ethanol precipitation and kept at 4°C for subsequent use.

Mitochondrial DNA amplification
The complete P. corethrurus mitogenome was amplified using seven sets of primers (Table 1) designed based on sequences retrieved from a previous study (Cunha et al. 2014).
Long PCR targets were amplified using different combinations of the primer sets, and initially sequenced with the same forward or reverse primers. Subsequent primer walking method was used to close the sequencing gaps. To ensure the accuracy of the sequence, every two contiguous segments overlapped by at least 80 bp. PCRs were performed using ~40 ng of DNA and 0.4 μM forward and reverse primers, 0.2 mM dNTP mix and 1.25 U Platinum HiFi DNA polymerase buffered with 1X Mg-free buffer (Thermofisher Scientific, UK). PCR amplification buffer was supplemented with MgCl 2 to achieve a final concentration of 1.75 mM in a total volume of 25 μl reaction mixture. The reaction was denatured at 95°C for 3 min, cycled 35 times at 95°C for 30 s, 30 s at the required primer annealing temperature and 72°C for 1 min per 1000 bp depending on target fragment length. Negative controls were included in all PCR amplifications to confirm the absence of contaminants. Before sequencing, PCR cleanups were performed using Exo-SAP-IT (Amersham Pharmacia, UK) reagents. Exonuclease 1 (0.25 μl) and Shrimp Alkaline Phosphatase (0.5 μl) were mixed with the PCR product (10 μl) and incubated at 37°C for 45 min followed by 80°C for 15 min. DNA was sequenced using ABI PRISM ® BigDye v3.1 Terminator Sequencing technology (Applied Biosystems, USA) on an ABI PRISM ® 3100 DNA automated Sequencer.

Sequence editing and analysis
Sequence trace files were corrected and aligned with the MEGA v.7 (Kumar et al. 2016). The sequence overlap and mitogenome assembly were performed using CLC main workbench v.6 (Qiagen). The annotation of the 13 protein-coding genes and two rRNA genes were determined using the MITOS v.2 web server (Bernt et al. 2013) and manually curated using other published annelid mitogenomes as shown in Table 2, whereas the tRNA genes were identified using the program tRNAscan-SE 1.21 (Lowe and Eddy 1997). The annotated genome sequence was deposited in GenBank under accession number KT988053.

Phylogenetic analyses
To clarify the phylogenetic position of P. corethrurus within the Clitellata, the complete mitogenome sequences of eight representative Clitellata species (Table 2) were incorporated together with the presently obtained P. corethrurus mitogenome sequence for phylogenetic analysis. Phylogenetic analyses were based on 13 protein-coding genes and the two rRNA units, which were aligned separately using MEGA v.7 (Kumar et al. 2016) with minor manual adjustments and then concatenated. The possible bias of substitution saturation at each codon position of protein-coding genes and two rRNA genes was investigated using DAMBE v.4.5.57 (Xia and Xie 2001) and MEGA v. 7 (Kumar et al. 2016). Two different methods, Bayesian inference (BI) and maximum likelihood (ML) were used to construct the phylogenetic tree. Bayesian analyses were undertaken with MrBayes v.3.1.2 (Ronquist and Huelsenbeck 2003) under the best-fit model of nucleotide evolution selected in MrModeltest v.2.3 (Nylander 2004), using the Assignment Index Criterion (AIC). Analyses were run for 1,000,000 generations, and sampled every 100 generations to assess convergence. Trees that produced non-stationary log-likelihood values were discarded as part of a burn-in procedure and combined the remaining trees that resulted in convergent log-likelihood scores from both independent searches. These trees were used to construct a consensus tree.
Maximum likelihood analysis (ML) was performed with MEGA v.7 (Kumar et al. 2016). Initial tree(s) for the heuristic search were obtained automatically by applying Neighbour-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, alpha parameter = 0.9143)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 17.7596% sites). The analysis involved nine mitogenome sequences ( Table 2). All positions containing gaps and missing data were eliminated. The bootstrap consensus tree inferred from 1000 replicates was taken to represent the evolutionary history of the taxa analysed (Felsenstein 1985).

Mitochondrial genomic structure
The mitochondrial genome of P. corethrurus was determined to be 14 835 bp in length, comprising 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), two ribosomal RNAs (rRNAs), and one putative control region with a length of 318 bp (Figure 1).
The mitochondrial genome structure is detailed in Table 3. Gene order and orientation are similar to the previous earthworm mitochondrial genomes (Boore and Brown 1995;Zhang et al. 2015) but slightly smaller and more condensed (with several intergenic overlaps, see Table 3). The gene organization is similar to other earthworm species (e.g. Lumbricus terrestris: Boore and Brown 1995).
The nucleotide composition is asymmetric (31.9% A, 27.9% T, 14.9% G, and 25.3% for C) with an overall A+T content of 59.9%. One remarkable trait of metazoan mitogenomes is the strand-specific bias in nucleotide composition (Hassanin et al. 2005;Reyes et al. 1998). Such bias is measured as G/C-skew (G%-C%)/(G%+C%) and A/T-skew (A%-T%)/(A%+T%), respectively (Perna and Kocher 1995). The overall GC-and AT-skews of the H-strand of P. corethrurus mitogenome were -0.258 and 0.066, respectively, indicating a compositional bias associated with an excess of C over G nucleotides and a slight excess of A over T nucleotides on the H-strand.  (Müller, 1857). Gene order and positions are shown, including the putative control region. IUPAC single letter codes are used to identify transfer RNA. The L1, L2, S1, and S2 transfer RNAs are differentiated on the basis of their anti-codons TAG, TAA, TCT, and TGA, respectively.

Protein-coding genes
The P. corethrurus genome contained the expected 13 protein-coding genes with a total of 11,131 bp in size, accounting for 75.03% of the whole mitogenome. Most of the PCGs are initiated with ATG codons, except for ND3 gene, which uses GTG as the initiation codon. Six PCGs (COX1, ATP8, COX3, ND6, ND3, and ND2) are terminated with an incomplete codon T or TA, which could be completed to TAA by polyadenylation posttranscriptionally (Ojala et al. 1981). COX2 and ND1 use TAG as a termination codon.
Nucleotide composition and codon usage frequencies were calculated from a concatenated sequence of all protein-coding genes on the H-strand. The base composition of protein-coding genes revealed a negative bias for A (14.4%), especially at second codon positions (12.9%, Table 4). For all protein genes, T was the most frequent nucleotide at the first and third positions whereas G was most frequent at the second position.

Ribosomal and transfer RNA genes
Like other mitochondrial genomes (Inoue et al. 2000;Zardoya et al. 1995), twentytwo tRNA genes were identified (Supplementary figure 1). The tRNA genes were scattered throughout the mitochondrial genome and ranged in size from 60 to 67 bp (Table 3). The P. corethrurus mitogenome also contained a small subunit of rRNA and a large subunit of rRNA, which were 789 bp and 1212 bp in length, respectively. As in other Clitellata genomes, these genes were located between the tRNA Met and tRNA Val genes and between tRNA Val and tRNA Leu genes, respectively (Zhang et al. 2016b).

Non-coding regions
As shown in Table 3, there are 22 intergenic spacer regions, ranging in size from -7 to 4 bp observed in P. corethrurus. As in most Clitellata, the major non-coding region in P. corethrurus mitochondrial genome was located between tRNA-Arg and tRNA-His . It was determined to be 318 bp in length, less than other reported Clitellata species (Zhang et al. 2016a), and it had a base composition that was rich in A and T (A+T=67.6%).

Phylogenetic analyses within the Clitellata
The phylogenetic trees (the 50% majority-rule consensus tree is shown in Figure 2) were highly consistent regardless of the analytic method used, and were statistically supported by high posterior probability and intermediate bootstrap values. This phylogenetic analysis represented the first investigation of P. corethrurus relationships within the Clitellata based on the complete mitogenome. As indicated by the tree, different species from the same family clustered together (Megascolecidae: M. guillemi, D. schmardae, A. gracilis, P. excavatus and T. birmanicus), and the species from Lumbricidae and the P. corethrurus formed a monophyletic group. The species D. japonica belongs to the Moniligastridae, the sister group to Crassiclitellata (earthworms), which explains its phylogenetic position. The Moniligastridae are not Crassiclitellata because they have a single cell layer in the clitellum.

Conclusion
For the first time, the sequencing, annotation and analysis of the mitochondrial genome of a member of Rhinodrilidae was completed. The mitogenome of P. corethrurus was found to be 14,835 bp in length and showed a similar composition in size, low GC content and gene order to earthworm mitogenomes already available. The complete mitogenome reported here is expected to allow for further studies of the P. corethrurus phylogeny and for analyses on the taxonomic status of the family Rhinodrilidae.

Declaration of interest section
The authors report no conflicts of interest and are responsible for the content and writing of the paper.