The mitochondrial genome of the land snail Camaena cicatricosa (Müller, 1774) (Stylommatophora, Camaenidae): the first complete sequence in the family Camaenidae

Abstract The complete mitochondrial (mt) genome of the snail Camaena cicatricosa (Müller, 1774) has been sequenced and annotated in this study. The circular genome is 13,843 bp in size and represents the first camaenid mt genome, with a base composition of 31.9% A, 37.9% T, 13.5% C and 16.7% G. Gene content, codon usage and base organization are largely similar to those of the previously sequenced mt genomes from Stylommatophora, whereas the gene order differs, especially in the positions of tRNACys, tRNAPhe, COII, tRNAAsp, tRNAGly, tRNAHis and tRNATrp. All protein coding genes use the standard initiation codons ATN except for COII, which uses GTG as its start signal. The conventional stop codons TAA and TAG have been assigned to all protein coding genes. All tRNA genes possess the typical clover leaf structure, except that the TψC arm of tRNAAsp and the dihydrouridine arm of tRNASer(AGN) form only a simple loop. Shorter intergenic spacers have been found in this mt genome. A phylogenetic analysis based on protein coding genes shows a close relationship between Camaenidae and Bradybaenidae. The presented phylogeny is consistent with the monophyly of Stylommatophora.


Introduction
The mitochondrial (mt) genome of metazoans usually comprises 37 genes, including 13 protein coding genes (PCGs) (COI−COIII, Cytb, ND1−ND6, ND4L, ATP6 and ATP8), two ribosomal RNA (rRNA) genes, and 22 transfer RNA (tRNA) genes (Boore 1999). It also contains noncoding regions, such as the AT-rich region and short intergenic spacers (Wolstenholme 1992). The mt genome is characterized by small size (13−36 kb), maternal inheritance, lack of recombination, conserved genomic organization and a rapid evolutionary rate compared with the nuclear genome (Avise 1994). It has been widely used in studies of systematics, phylogenetics, phylogeography and population structure across diverse taxonomic groups (White et al. 2011; Gaitán-Espitia et al. 2013; Menegon et al. 2014). The mt genome is the most popular genetic marker, although its use in systematic research remains debated (Delsuc et al. 2003; Cameron et al. 2004; Talavera and Vila 2011; Simon and Hadrys 2013; Cameron 2014). In recent years, next generation sequencing technologies have accelerated the development of mt genomics. The mt genomes of many vertebrates and insects have been sequenced and well studied (Boore 1999; Hahn et al. 2013). However, studies on molluscan mt genomes are relatively scarce (Kurabayashi and Ueshima 2000; Boore et al. 2004; Grande et al. 2008). Only 80 mt genomes of gastropods had been deposited in GenBank as of September 20, 2014.
Camaenidae, one of the most diverse families, was erected by Pilsbry in 1893 (Pilsbry 1893−1895). Camaenids mainly feed on green plants and humus, and often damage crops, landscape plants and forests, reducing both yield and quality. They can also transmit zoonotic food-borne parasitic diseases and thereby harm human and animal health (Zhou et al. 2007). When humans are infected by ingesting snails, the nervous system can be injured (Liang and Pan 1992). Camaenids also play an important part in agricultural production and human activities as food, drugs, arts and crafts (Chen and Gao 1987). Camaena cicatricosa (Müller, 1774), the type species of the type genus Camaena (Albers, 1850), occurs only in China, where it is distributed in Guangdong, Guangxi, Guizhou, Yunnan and Hainan. The adult shell is large, thick and depressed conic. This snail feeds on a broad range of fruits, vegetables, leaves and weeds (Xiao 1989).
The mt genomes of land snails are similar to those of other invertebrates in containing 37 genes. Since the first mt genome, that of Albinaria caerulea, was obtained in 1995 (Hatzoglou et al. 1995), only ten mt genomes from eight species in the order Stylommatophora had been determined prior to this study: three species in Helicidae (Terrett et al. 1995; Groenenberg et al. 2012; Gaitán-Espitia et al. 2013), two in Bradybaenidae (Yamazaki et al. 1997; Deng et al. 2014), one in Clausiliidae (Hatzoglou et al. 1995), one in Succineidae (White et al. 2011) and one in Achatinidae (He et al. 2014).
Although some phylogenetic studies have been carried out on Camaenidae, they have mostly focused on shell morphology or single gene fragments (Scott 1996; Wade et al. 2007). Evidence from complete mt genomes is still limited. We selected C. cicatricosa as our subject not only because of its relatively wide distribution and varied morphology, but also because it is the type species of the type genus Camaena. We analyzed nucleotide composition, codon usage and compositional biases, and constructed models of the secondary structure of the tRNAs. We also discuss its phylogenetic relationships with other representative gastropods. This mt genome is the first to be described in the family Camaenidae and can therefore offer useful information for work on other camaenids.

Genomic DNA extraction, PCR amplification and DNA sequencing
Adults of C. cicatricosa were collected from Xishan Park in Guiping (23°23'58"N, 110°3'46"E), Guangxi, China on November 2, 2013. Specimens were initially preserved in 100% ethanol in the field, and then stored at -20 °C at the Fujian Entry-Exit Inspection & Quarantine Bureau (FJCIQ). Total genomic DNA was extracted from the pedal muscle tissue of a single individual using the DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer's instructions. The voucher specimen (FJCIQ 18483) is deposited at the Key Laboratory of Molluscan Quarantine and Identification of AQSIQ, Fujian Entry-Exit Inspection & Quarantine Bureau, Fuzhou, Fujian.
The entire genome was successfully amplified by polymerase chain reaction (PCR) in overlapping fragments with four pairs of mitochondrial universal primers from previous works (Palumbi et al. 1991; Folmer et al. 1994; Merritt et al. 1998; Hugall et al. 2002), and four pairs of perfectly matched specific primers designed from short fragments sequenced in this study (Table 1). Short PCRs (< 2 kb) were performed using Takara Taq DNA polymerase (TaKaRa, Dalian, China), with the following cycling conditions: 30 s at 94 °C, followed by 35 cycles of 10 s at 94 °C, 50 s at 40 °C or 45 °C, and 1 min at 72 °C. The final elongation step was continued for 10 min at 72 °C. Long range PCRs (> 4 kb) were performed using Takara Long Taq DNA polymerase (TaKaRa, Dalian, China) under the following cycling conditions: 1 min at 94 °C, followed by 40 cycles of 10 s at 98 °C, 50 s at 60 °C and 4−8 min at 68 °C, with a final elongation step at 72 °C for 6 min. The PCR products were checked by spectrophotometry and 1.0% agarose gel electrophoresis.
After purification, short fragments were sequenced in both directions using the BigDye Terminator Sequencing Kit (Applied Biosystems, San Francisco, CA, USA) and the ABI PRISM 3730XL DNA Analyzer (PE Applied Biosystems), with internal primers for primer walking. For the long fragments, shotgun libraries of C. cicatricosa were constructed, and the positive clones were then sequenced using the same kit and sequencer with the vector-specific primers BcaBest Primer M13-47 and BcaBest Primer RV-M.

Genome annotation and inference of secondary structure
Raw sequences were proofread and assembled into contigs with BioEdit v.7.0.5.3 (Hall 1999). The tRNA genes were identified with tRNAscan-SE Search Server v.1.21 (Lowe and Eddy 1997) and DOGMA (Wyman et al. 2004), while those that could not be determined by these two tools were predicted by comparison with the published sequences of other land snails (Terrett et al. 1995; Yamazaki et al. 1997; Groenenberg et al. 2012; Gaitán-Espitia et al. 2013; He et al. 2014; Deng et al. 2014). The PCGs and rRNA genes were annotated by BLAST searches in GenBank against published mitochondrial sequences of terrestrial snails.
PCGs were aligned with Clustal X (Thompson et al. 1997). The nucleotide composition and codon usage were analyzed with MEGA 5.0 (Tamura et al. 2011). Strand asymmetry was denoted by skew values, which were calculated according to the formulas: AT skew = [A−T]/[A+T] and GC skew = [G−C]/[G+C] (Perna and Kocher 1995).
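The skew formulas above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of MEGA or any other tool used in this study. The example applies the formulas to the whole-genome base composition reported for C. cicatricosa, so the resulting values are derived here from the percentages in the text and are not taken from Table 4.

```python
def skew(x: float, y: float) -> float:
    """Return (x - y) / (x + y); x and y may be counts or percentages."""
    total = x + y
    if total == 0:
        raise ValueError("skew is undefined when both values are zero")
    return (x - y) / total


def sequence_skews(seq: str) -> dict:
    """Compute AT and GC skew for one strand of a DNA sequence."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    return {"AT_skew": skew(a, t), "GC_skew": skew(g, c)}


# Whole-genome composition (in %) as reported in the text.
composition = {"A": 31.9, "T": 37.9, "C": 13.5, "G": 16.7}
at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])
print(f"AT skew = {at_skew:.3f}, GC skew = {gc_skew:.3f}")
```

With the reported composition this gives an AT skew of about -0.086 and a GC skew of about 0.106 for the sequenced strand as a whole. Strand-specific or gene-class-specific values would require the corresponding subsequences.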
Phylogenetic analyses were performed on 11 representative gastropod mt genomes from GenBank (Table 2) using maximum likelihood (ML) and maximum parsimony (MP) methods. One species of Opisthobranchia was selected as the outgroup. A DNA alignment of 9,892 bp was inferred from the amino acid alignment of the 13 PCGs using MEGA 5.0 (Tamura et al. 2011). The best-fit substitution model for ML estimation was selected in MEGA 5.0 using the corrected Akaike information criterion (AICc). Node support for the ML and MP analyses was calculated from 1000 bootstrap replicates. All other settings were kept at their defaults.
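For readers unfamiliar with the corrected Akaike information criterion, the following Python sketch shows the standard small-sample formula, AICc = 2k − 2 ln L + 2k(k+1)/(n−k−1). It is an illustration only: the function names, the two candidate models and their log-likelihoods are hypothetical, and the way MEGA 5.0 defines the sample size n is not stated in the text and is assumed here to be the number of alignment sites.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected AIC for a model with k free parameters and sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def best_model(fits: dict, n: int) -> str:
    """Return the name of the model with the lowest AICc.

    fits maps a model name to a (log_likelihood, k) tuple.
    """
    return min(fits, key=lambda name: aicc(fits[name][0], fits[name][1], n))


# Hypothetical log-likelihoods and parameter counts, for illustration only.
candidate_fits = {
    "model_A": (-1000.0, 10),
    "model_B": (-995.0, 20),
}
print(best_model(candidate_fits, n=500))
```

In this toy case the simpler model is preferred, because the gain in likelihood of the second model does not offset the penalty for its additional parameters.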

Results and discussion
The complete mt genome of C. cicatricosa was a double-stranded circular molecule of 13,843 bp in length (GenBank: KM365408). It contained 13 PCGs, 22 tRNA genes and two rRNA genes, as in other mt genomes of land snails from the order Stylommatophora. The genes were divided between the two strands, with 24 genes on the majority coding strand (J strand) and the others on the minority coding strand (N strand) (Fig. 1). However, the gene arrangement differed from that of the known land snails in the order Stylommatophora, especially in the locations of tRNACys, tRNAPhe, COII, tRNAAsp, tRNAGly, tRNAHis and tRNATrp (Fig. 2). Gene overlaps totaling 242 bp were found at 16 gene junctions, and the longest overlap (50 bp) was between ND6 and ND5. In addition, 144 nucleotides were dispersed among 14 intergenic spacers, ranging from 1 bp to 29 bp. The 29 bp noncoding region was situated between COIII and tRNAIle, and 1 bp spacers occurred at three gene junctions (Table 3).
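Overlap and spacer totals of this kind can be derived from gene coordinates such as those listed in Table 3. The Python sketch below is one way to do this for a circular genome. The gene names and coordinates are invented toy values, not the annotation of C. cicatricosa, and the sketch assumes 1-based inclusive coordinates with no gene spanning the origin.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), 1-based and inclusive


def junction_gaps(genes: List[Gene], genome_len: int) -> List[Tuple[str, str, int]]:
    """Return (gene, next_gene, gap) for each junction of a circular genome.

    A positive gap is an intergenic spacer, a negative gap is an overlap,
    and zero means the two genes are directly adjacent.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = []
    for i, (name, _start, end) in enumerate(ordered):
        next_name, next_start, _ = ordered[(i + 1) % len(ordered)]
        if i == len(ordered) - 1:
            # Close the circle: the first gene follows the last one.
            next_start += genome_len
        gaps.append((name, next_name, next_start - end - 1))
    return gaps


def summarize(gaps: List[Tuple[str, str, int]]) -> dict:
    """Count junctions and total base pairs for overlaps and spacers."""
    overlaps = [-gap for _, _, gap in gaps if gap < 0]
    spacers = [gap for _, _, gap in gaps if gap > 0]
    return {
        "overlap_junctions": len(overlaps),
        "overlap_bp": sum(overlaps),
        "spacer_junctions": len(spacers),
        "spacer_bp": sum(spacers),
    }


# Toy annotation on a 100 bp circle (hypothetical values).
toy_genes = [("geneA", 1, 30), ("geneB", 26, 60), ("geneC", 63, 100)]
print(summarize(junction_gaps(toy_genes, genome_len=100)))
```

In the toy example, geneA and geneB overlap by 5 bp, a 2 bp spacer separates geneB and geneC, and geneC is directly adjacent to geneA across the origin.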

Protein coding genes
The total length of the PCGs was 10,941 bp, accounting for 79.04% of the whole mt genome (Table 4). Most PCGs started with ATN initiation codons (four with ATG, three with ATT, and five with ATA), except for the COII gene, which started with GTG (Table 3). In other invertebrates, ATC, TTA, TTG, CTT and TCG have also been found as unconventional start signals (Raay and Crease 1994; Crease 1999; Yamazaki et al. 1997; Yu et al. 2007; Groenenberg et al. 2012). The conventional stop codons TAA and TAG were assigned to all PCGs (Table 3), whereas an incomplete terminator signal (T) has been found in other snails (Terrett et al. 1995; Hatzoglou et al. 1995; Yamazaki et al. 1997; White et al. 2011; Groenenberg et al. 2012; Gaitán-Espitia et al. 2013).
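A tally of initiation and termination codons like the one above can be produced from the annotated coding sequences. The following Python sketch is a minimal illustration: the function names and the three toy sequences are hypothetical, and each sequence is assumed to be given 5' to 3' on its coding strand. The last line simply recomputes the coding proportion from the two lengths reported in the text.

```python
from collections import Counter
from typing import Dict, List, Tuple

COMPLETE_STOPS = {"TAA", "TAG"}


def start_and_stop(cds: str) -> Tuple[str, str]:
    """Return the first and last three bases of a coding sequence."""
    cds = cds.upper()
    return cds[:3], cds[-3:]


def tally_codons(cds_by_gene: Dict[str, str]) -> Tuple[Counter, List[str]]:
    """Count start codons and list genes lacking a complete TAA/TAG stop."""
    starts = Counter()
    no_complete_stop = []
    for gene, cds in cds_by_gene.items():
        start, stop = start_and_stop(cds)
        starts[start] += 1
        if stop not in COMPLETE_STOPS:
            no_complete_stop.append(gene)
    return starts, no_complete_stop


# Toy coding sequences (hypothetical, far shorter than real genes).
toy_cds = {
    "geneA": "ATGAAATAA",
    "geneB": "GTGCCCTAG",
    "geneC": "ATAGGGT",  # ends in an incomplete stop signal (T)
}
starts, no_complete_stop = tally_codons(toy_cds)
print(dict(starts), no_complete_stop)

# Coding proportion from the lengths reported in the text.
print(f"{100 * 10941 / 13843:.2f}%")
```

Genes flagged by the second return value would be candidates for an incomplete terminator such as the single T reported in other snails. In C. cicatricosa no gene falls into this class.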

Transfer RNA genes
The 22 tRNA genes typically found in metazoan mt genomes were also present in C. cicatricosa, and 18 of them were determined by tRNAscan-SE (Lowe and Eddy 1997) and DOGMA (Wyman et al. 2004). The other four tRNA genes, which could not be detected by the two programs, were identified and drawn by comparison with known patterns from previous studies (Terrett et al. 1995; Grande et al. 2002; Groenenberg et al. 2012; Gaitán-Espitia et al. 2013). Fourteen tRNA genes were encoded on the J strand and the remainder on the N strand. Most tRNA genes could be folded into classic clover leaf structures, with the exception of tRNAAsn and tRNASer(AGN), in which the TψC arm and the dihydrouridine (DHU) arm, respectively, simply formed a loop (Fig. 3). The length of the tRNA genes ranged from 53 to 65 bp (Table 3). All amino acid acceptor (AA) arms (7 bp), anticodon (AC) loops (7 bp) and AC arms (5 bp) were almost invariant, whereas the other arms and loops varied considerably in size. In addition, non-Watson-Crick matches and aberrant loops were found in some tRNA genes. For example, a total of 73 unmatched base pairs existed in some tRNAs, and 38 of them were G-U pairs, situated in the AA stem (13 bp), the T stem (10 bp), the AA stem (8 bp) and the DHU stem (7 bp). The remaining five base pairs included U-U mismatches, U-C mismatches, A-C mismatches, A-G mismatches and A-A mismatches (Fig. 3). Nevertheless, a post-transcriptional RNA-editing mechanism may rectify these mismatches to maintain tRNA function (Tomita et al. 2001).

Figure 1. The mt genome of Camaena cicatricosa. The tRNA genes are labeled with the IUPAC-IUB single letter amino acid codes. Underlined genes are transcribed from 3' to 5', and genes without underline from 5' to 3'. Numbers and overlapping lines within the circle indicate the PCR fragments amplified for sequencing (see Table 1).

Ribosomal RNA genes
The rRNA genes, comprising the large rRNA subunit (lrRNA) and the small rRNA subunit (srRNA), are presumed to fill the spaces between their flanking genes (Boore 2001). The lrRNA gene was situated between tRNAVal and tRNALeu(CUN), showing 78.23% similarity with those of Euhadra herklotsi and Mastigeulota kiangsinensis. The srRNA gene was located between tRNAGlu and tRNAMet (Fig. 1). Their lengths were determined to be 997 bp and 682 bp, respectively (Table 3).

Base composition and codon usage
Like other snail mt genomes, the nucleotide composition of the C. cicatricosa mt genome was clearly biased toward adenine and thymine (A = 31.90%, T = 37.90%, C = 13.50%, G = 16.70%). The entire mt genome had a high A+T content of 69.80%, with 69.32% in PCGs, 71.41% in tRNA genes and 72.42% in rRNA genes. Nucleotide bias is also reflected in codon usage. Codons ending in A or U (NNA and NNU) were used frequently in most PCGs. Furthermore, the widely used codons TTT (phenylalanine), TTA (leucine), ATT (isoleucine) and ATA (methionine) were all composed solely of A and T. In particular, codon usage was biased toward codons with A or T in the third position (Fig. 4). The nucleotide composition of metazoan mt genomes usually demonstrates an obvious strand bias (Hassanin et al. 2005; Hassanin 2006) that can be described by AT and GC skews (Perna and Kocher 1995). For the PCGs of C. cicatricosa, the skew statistics showed a strong TA skew with nearly equal G and C content on the N strand, and a strong GC skew on the J strand. The tRNA genes on the J strand showed GC and TA skews that clearly exceeded the values on the N strand (Table 4). The AT and GC skews of the C. cicatricosa mt genome thus differ from the strand biases typical of metazoan mtDNA (generally positive AT skew and negative GC skew for the J strand, and the reverse for the N strand in most metazoans).
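The third-position bias described above can be quantified by counting codons in the coding sequences. The Python sketch below is a minimal illustration with hypothetical function names and a toy sequence; it is not the procedure implemented in MEGA 5.0. Codons are written in DNA form, so the NNU codons of the text appear as NNT.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def third_position_at_fraction(counts: Counter) -> float:
    """Fraction of codons whose third base is A or T."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    at_ending = sum(n for codon, n in counts.items() if codon[2] in "AT")
    return at_ending / total


# Toy sequence: TTT, TTA, ATT, ATA and GGC (hypothetical).
toy_counts = codon_counts("TTTTTAATTATAGGC")
print(toy_counts.most_common())
print(third_position_at_fraction(toy_counts))
```

Four of the five toy codons end in A or T, giving a fraction of 0.8. Applying the same functions to the concatenated PCGs of a genome would give the overall third-position A+T proportion.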

Noncoding regions
The noncoding regions of the C. cicatricosa mt genome consisted of some short intergenic spacers. These short sequences may act as splicing recognition sites during transcription (He et al. 2005). In the sequenced complete mt genomes of the order Stylommatophora, the short intergenic spacers range from 1 bp to 65 bp (Hatzoglou et al. 1995; Terrett et al. 1995; Yamazaki et al. 1997; White et al. 2011; Groenenberg et al. 2012; Gaitán-Espitia et al. 2013; Deng et al. 2014), except in Achatina fulica, which has a 551 bp long noncoding region (He et al. 2014). In C. cicatricosa, however, the longest noncoding region was only 29 bp. The short lengths of the noncoding regions indicate that the mt genomes of stylommatophorans are quite compact.
A large noncoding region, called the control region or AT-rich region, is commonly seen in metazoan mt genomes (Boore 1999). Variation in the size of the entire mt genome can largely be attributed to the number of tandem repeats in the control region (Zhang and Hewitt 1997), which may be caused by replication slippage (Levinson and Gutman 1987; Fumagalli et al. 1996). Nevertheless, a putative control region (POR) could not be aligned confidently in gastropods (Groenenberg et al. 2012), except in A. fulica, which has a 551 bp POR between COI and tRNAVal (He et al. 2014). The other eight stylommatophoran species may possess short POR regions located adjacent to COIII (Hatzoglou et al. 1995; Terrett et al. 1995; Yamazaki et al. 1997; White et al. 2011; Groenenberg et al. 2012; Gaitán-Espitia et al. 2013; Deng et al. 2014). The POR regions of three helicid species and M. kiangsinensis were located between COIII and tRNASer with lengths of 158-189 bp, whereas those of the other three species were located between COIII and tRNAIle with lengths of 42-47 bp. The 29 bp noncoding region of C. cicatricosa was also located between COIII and tRNAIle, but it was shorter than those of the other stylommatophorans.
(Scott 1996; Cuezzo 2003; Wade et al. 2007; Hirano et al. 2014). A final assessment of the systematic relationships of the three families is still pending and requires more complete taxon sampling.