The mitochondrial genome of the land snail Cernuella virgata (Da Costa, 1778): the first complete sequence in the family Hygromiidae (Pulmonata, Stylommatophora)

Abstract The land snail Cernuella virgata (da Costa, 1778) is widely considered as a pest to be quarantined in most countries. In this study, the complete mitochondrial genome of Cernuella virgata is published. The mitochondrial genome has a length of 14,147 bp a DNA base composition of 29.07% A, 36.88% T, 15.59% C and 18.46% G, encoding 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. The complete nucleotide composition was biased toward adenine and thymine, A+T accounting for 69.80%. Nine PCGs and 14 tRNA genes are encoded on the J strand, and the other four PCGs and eight tRNA genes are encoded on the N strand. The genome also includes 16 intergenic spacers. All PCGs start strictly with ATN, and have conventional stop codons (TAA and TAG). All tRNAs fold into the classic cloverleaf structure, except tRNAArg, tRNASer(UCN), tRNASer(AGN) and tRNAPro. The first three lack the dihydrouridine arm while the last lacks the TψC arm. There are 502 bp long noncoding regions and 418bp long gene overlaps in the whole mitochondrial genome, accounting for 3.54% and 2.95% of the total length respectively. Phylogenetic analyses based on the sequences of the protein coding genes revealed a sister group relationship between the Hygromiidae and the Helicidae.


Introduction
The land snail Cernuella virgata (da Costa, 1778), also known as the Mediterranean white snail or Common white snail, is endemic to the Mediterranean and western Europe, and has been introduced to America, Australia and Morocco (Barker 2004). The snail is omnivorous, feeding on detritus and plant matter, such as bark, stems and leaves of various green plants. Not only does it destroy agricultural crops, such as beans, cereal, various fruits and vegetables, it also can spread zoonotic food-borne parasitic diseases. For example, the species acts as intermediate host for the terrestrial trematode parasite Brachylaima cribbi (Kerney and Cameron 1979;Butcher and Grove 2006). Because of its remarkable adaptability and the severe damage it causes to agriculture, the natural environment and humans, the snail is considered a serious pest in the USA, Australia, Japan, Chile and other countries (Dennis 1996;Barker 2004;USDA 2008;MOA and AQSIQ 2012). One ship carrying barley from Australia was refused entry and berthing by Chile because of the presence of this snail causing huge economic losses (USDA 2008). It is also one of the more important quarantine terrestrial mollusks in America. To prevent invasion and proliferation, the U.S. government has invested considerable human and financial resources to eradicate the snails in Washington, Michigan and North Carolina (USDA 2008). Recently, Chinese ports have intercepted snails in barley, rapeseed and other consignments from abroad. Owing to its great harm, the snail was listed in "The People's Republic of China entry plant quarantine pest list" by the government in 2012 to prevent its introduction (MOA and AQSIQ 2012).
The metazoan mitochondrial (mt) genome usually comprise 37 genes and some noncoding regions, such as 13 protein coding genes (PCGs) (COI−COIII, Cytb, ND1−ND6, ND4L, ATP6 and ATP8), two ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes and the AT-rich region or control region (Wolstenholme 1992;Boore 1999). It has been extensively used to study the origin of species, phylogeography and population genetic structure and so on due to its small genome size, fast evolution, uniparental inheritance and lack of extensive recombination (Saccone et al. 1999;Elmerot et al. 2002). To date, only nine species from the order Stylommatophora have been determined as dispersing in Helicidae (Terrett et al. 1995;Groenenberg et al. 2012;Gaitán-Espitia et al. 2013), Bradybaenidae (Yamazaki et al. 1997;Deng et al. 2014), Clausiliidae (Hatzoglou et al. 1995), Succineidae (White et al. 2011), Achatinidae (He et al. 2014) and Camaenidae (Wang et al. 2014). However, there are no reports on the mt genome of the family Hygromiidae. In this work, the complete mt genome of the snail C. virgata was obtained firstly using primer walking and shotgun sequencing techniques based on PCR. Studying the mitochondrial genome of C. virgata can not only offer more worthwhile information for phylogeny but also be applied to molecular alignment and identification in international plant quarantine measures.

Specimen collection and DNA isolation
Adult snail was intercepted from barley shipments imported to China from southern Australia on 1 March 2012 and stored at -20 °C in the Key Laboratory of Molluscan Quarantine and Identification of AQSIQ, Fujian Entry-Exit Inspection & Quarantine Bureau, Fuzhou, Fujian, China (FJIQBC). Voucher specimens (FJIQBC000123) were deposited in FJIQBC. Total genomic DNA was obtained from approximately 50 mg fresh foot tissue, using the DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer's instructions.

DNA sequencing
The entire genome was successfully amplified by polymerase chain reaction (PCR) in overlapping fragments with four pairs of mitochondrial universal primers chosen from previous works (Palumbi et al. 1991;Folmer et al. 1994;Merritt et al. 1998;Hugall et al. 2002) and four pairs of perfectly matched primers designed from sequenced short fragments with Primer Premier 5.0 (Table 1). Short PCRs (< 2 kb) were performed using Takara Taq DNA polymerase (TaKaRa, Dalian, China), with the following cycling conditions: 30s at 94°C, followed by 35 cycles of 10s at 94°C, 50s at 40°C or 45°C, and 1 min at 72°C. The final elongation step was continued for 10 min at 72°C. Long range PCRs (> 4 kb) were performed using Takara Long Taq DNA polymerase (Ta-KaRa, Dalian, China) under the following cycling conditions: 1 min at 94°C, followed by 40 cycles of 10s at 98°C, 50s at 60°C, 4−8 min at 68°C, and the final elongation step at 72°C for 6 min. The PCR products were checked by spectrophotometry and 1.0% agarose gel electrophoresis.
The BigDye Terminator Sequencing Kit (Applied Biosystems, San Francisco, CA, USA) and the ABI PRIMER TM 3730XL DNA Analyzer (PE Applied Biosystems) were used to sequence short fragments from both directions after purification. For the long fragments, the shotgun libraries of C. virgata were constructed, and the positive clones were then sequenced using the above kit and sequenator with vector-specific primers BcaBest primer M13-47 and BcaBest Primer RV-M.

Genome annotation and inference of secondary structure
To control sequencing errors, each partial sequence was evaluated at least twice. Annotations and editing procedures of the mitochondrial genomes of C. virgata were performed in MEGA5.0. Mitochondrial PCGs and rRNA genes were identified by  Yang et al. 2014). The limits of both protein coding and rRNA genes were adjusted manually based on location of adjacent genes, and the presence of start and stop codons. The tRNA genes were located using DOGMA (Wyman et al. 2004) and tRNAscan-SE v.1.21 (Lowe and Eddy 1997), while others that could not be determined by DOGMA and tRNAscan-SE were identified by comparison with other land snails (Terrett et al. 1995;Yamazaki et al. 1997;Groenenberg et al. 2012;Gaitán-Espitia et al. 2013;Wang et al. 2014). The base composition and codon usage were analyzed with MEGA 5.0 (Tamura et al. 2007). AT skew and GC skew were used to describe strand asymmetry according to the formulae AT = [A−T]/[A+T] and GC = [G−C]/[G+C] (Perna and Kocher 1995).

Phylogenetic analyses
Phylogenetic analyses were performed based on 15 complete mt genomes of gastropods from GenBank (Table 2) using maximum likelihood (ML) method. Two species from Basommatophora and Opisthobranchia were selected as outgroups. A DNA alignment with 10,362 bp length was inferred from the amino acid alignment of 13 PCGs using MEGA 5.0 (Tamura et al. 2007). The selection of best-fit-substitution model for ML estimation was performed using MEGA 5.0 with corrected Akaike information criterion (AIC). Node supports for ML analyses were calculated through 1000 bootstrap replicates. All other settings were kept as default.

Genome structural features
The entire circular genome was 14,147 bp in length (GenBank: KR736333), containing 13 PCGs, 22 tRNA genes and two rRNA genes ( Figure 1). Twenty-four genes were encoded on the majority coding strand (J strand) while 13 genes were encoded on the minority coding strand (N strand) (tRNA Gln , tRNA Leu(UUR) , tRNA Asn , tRNA Arg , tRNA Glu , tRNA Met , tRNA Ser(UCN) , tRNA Thr , ATP6, ATP8, ND3, COIII and SrRNA) ( Table 3). The nucleotide composition of the whole genome was biased toward adenine and thymine, accounting for 69.80% of base composition (Table 4). Gene overlaps with a total of 418 bp have been found at 14 gene junctions; the longest overlap (85 bp) existed between ND5 and ND1. In addition, 502 nucleotides were dispersed in 16 intergenic spacers, the largest of which was 149 bp long between tRNA Trp and tRNA Gly . Additionally, two long spacers of 77 bp and 76 bp each were found between ND4L and ND1, tRNA Ser(UCN) and tRNA Ser(AGN) , respectively. There were seven close gene junctions with no intergenic spacers or overlap (Table 3).

Protein coding genes
The total length of all PCGs was 10, 977 bp, accounting for 77.59% of the entire mt genome (Table 4). All PCGs started strictly with the Start Codon ATN (four with ATG, five with ATT, and four with ATA) and ended with the conventional stop codons TAA or TAG. (Table 3).  Genes with underline illustrate the direction of transcription from 3' to 5', and without underline revealing from 5' to 3'. Numbers and overlapping lines within the circle indicate PCR fragments amplified for sequencing (see Table 1).
Codon usage could reveal nucleotide bias. NNA and NNU as codons were used frequently in most PCGs. Additionally, the codons TTT (phenylalanine), TTA (leucine) and ATT (isoleucine) composing A and T were used widely (Figure 2).

Transfer RNA genes
The length of tRNA genes ranged from 53 to 69 bp.The 22 tRNA genes typically found in metazoan mt genomes were also discovered in C. virgata; eleven of them were determined by tRNAscan-SE and eight of them were determined by DOGMA. Another three tRNA genes that could not be detected by the above two programs were identified and passed through comparisons with known patterns of previous research  Fourteen tRNA genes were encoded on the J strand and the remainder on the N strand. Most tRNA genes could be folded into classic clover leaf structures except for tRNA Arg , tRNA Ser(UCN) and tRNA Ser(AGN) , which lack the dihydrouridine arm. The gene tRNA Pro has a loop in its TψC arm (Figure 3).
In some tRNA genes, non-Watson-Crick matches and aberrant loops had been found. For example, a total of 41 unmatched base pairs existed in some tRNAs, and 18 of them were G-U non-classical pairs, most of which existed in Discriminator nucleotide, anticodon arm and Dihydrouridine arm (Figure 3).

Ribosomal RNA genes
The rRNA genes of C. virgata encompassed the lrRNA and srRNA genes with a length of 1,013 bp and 699 bp, repsectively. The former was situated between tRNA Val and tRNA Leu(CUN) and the latter was located between tRNA Glu and tRNA Met (Table 3).

Noncoding regions
In the mitochondrial genome of C. virgata, there are 16 noncoding regions with total 502 bp length, accounting for 3.54%. The longest was 149bp, between tRNA Trp and tRNA Gly . The shortest was 1 bp existing three regions, respectively locating tRNA Pro and tRNA Ala , tRNA His and tRNA Gln , ND2 and tRNA Lys (Table 3).

Phylogenetic reconstruction
The ML tree (Figure 4) presented nine major clades containing the families Helicidae, Hygromiidae, Camaenidae, Bradybaenidae, Succineidae, Clausiliidae, Achatinidae, Lymnaeidae and Aplysiidae. The four bradybaenid species and three helicid species each formed a clade and a sister pair. In addition, we found that Camaenidae and Bradybaenidae each were monophyletic and also in a sister group relationship with each other.

Discussion
The length of mt genome of C. virgata was 304 bp longer than Camaena cicatricosa and 97 bp longer than Cornu aspersum. All gene directions showed similarity to the sequenced mt genome of C. cicatricosa, but gene order was different, especially with respect to the positions between CYTB and ATP8 genes (Gaitán-Espitia et al. 2013;Wang et al. 2014). The overall mt genome of C. virgata was loose particularly, with more and longer intergenic spacers.
In the study of mt genome of C. cicatricosa, GTG is the start codon of the COII gene, and COI and ND6 genes of C. aspersum start with TTG (Gaitán-Espitia et al. 2013;Wang et al. 2014). From previous studies we can see that most start signals of land snails were consistent with C. virgata factually, but ATC, TTA, TTG, CTT, TCG and CGA as start signals have been found (Raay and Crease 1994;Crease 1999;Yamazaki et al. 1997;Yu et al. 2007;Groenenberg et al. 2012;Gaitán-Espitia et al. 2013;Wang et al. 2014). Conventional stop codons TAA and TAG have been found in all PCGs of C. virgata, which corresponds to C. cicatricosa (Wang et al. 2014). However, COII, CYTB, ND3 and ATP8 genes of C. aspersum from the family Helicidae ended with T, and this phenomenon has also been discovered in other snails as well (Terrett et al. 1995;Hatzoglou et al. 1995;Yamazaki et al. 1997;White et al. 2011;Groenenberg et al. 2012;Wang et al. 2012;Gaitán-Espitia et al. 2013). Some authors suggested that this nucleotide exchange was caused by post-transcriptional polyadenylation (Ojala et al. 1981;Cha et al. 2007).
Usually, in the tRNA, the Acceptor arm (7 bp) and Anticodon arm (5 bp) were conservative in size (Kinouchi et al. 2000). However, the length of Acceptor arm of tRNA Leu(CUN) in C. virgata was distinctive, with only 4 bp in size. The Anticodon arm of tRNA Ser(AGN) (8 bp) and all Anticodon loops (7 nucleotides) was coincident with the snail C. cicatricosa (Wang et al. 2014). The remaining arms and loops changed apparently in size comparing to that of other land snails (Hatzoglou et al. 1995;Groenenberg et al. 2012;Wang et al. 2014). Some non-Watson-Crick matches existed in all tRNA, including G-U pairs, A-C mismatch, U-C mismatch etc. Tomita et al. (2001) raised that these mismatches may can be rectified by post-transcriptional RNAediting mechanism to hold tRNA function.
Noncoding regions are assumed to splice recognition sites during the process of transcription (He et al. 2005). In the previous sequenced complete mt genome of the order Stylommatophora, the noncoding regions range from 1 bp to 65 bp (Hatzoglou et al. 1995;Terrett et al. 1995;Yamazaki et al. 1997;White et al. 2011;Groenenberg et al. 2012;Gaitán-Espitia et al. 2013;Deng et al. 2014;Wang et al. 2014) except Achatina fulica with 551 bp length (He et al. 2014). In metazoan mt genomes, these noncoding regions are normal. The longest one can be called control region or AT-rich region (Boore 1999). Usually, changes in length of the whole mt genome are mainly caused by difference of the control region (Zhang and Hewitt 1997). However, the control region may not be aligned accurately in gastropods (Groenenberg et al. 2012) except in A. fulica which included a 551 bp putative control region (POR) between COI and tRNA Val (He et al. 2014). Another ten sequenced stylommatophoran species may possess short putative control region located in different places (Hatzoglou et al. 1995;Terrett et al. 1995;Yamazaki et al. 1997;White et al. 2011;Groenenberg et  al. 2012; Gaitán-Espitia et al. 2013;Deng et al. 2014;Wang et al. 2014;Huang et al. 2015;2015). The PORs of C. virgata, Mastigeulota kiangsinensis and Dolicheulota formosensis are situated adjacent to tRNA Trp , at 149 bp, 216 bp and 245 bp respectively. The PORs of C. cicatricosa (29 bp) and Succinea putris (48 bp) were located between COIII and tRNA Ile . Two other helicid species had PORs located between COIII and tRNA Ser with lengths of 158-186 bp, whereas the PORs of Albinaria caerulea (65 bp), Aegista diversifamilia (93 bp), Cylindrus obtusus (395 bp) and Euhadra herklotsi (78 bp) were specific, respectively between ND3 and tRNA Ser , tRNA Met and tRNA Ser , ND5 and tRNA Ala , tRNA Se r (UCN) and tRNA Ser(AGN) . The absence of a control region was consistent with other gastropods (Deng et al. 2014;Wang et al. 2014;Yang et al. 2014). In the present study, the longest noncoding region was 149 bp, which was the second longest one by far.
Three species in the Helicidae were sister groups and consistent with previous works (Gaitán-Espitia et al. 2013). However, the systematics of Camaenidae, Helicidae and Bradybaenidae are complicated and have not been fully resolved; systematic and phylogenetic studies based on analyses of morphological and molecular markers have produced inconsistent results (Scott 1996;Cuezzo 2003;Wade et al. 2007;Hirano et al. 2014). More complete taxon sampling need to be prepared to assess the phylogenetic relationship of these three families.