Nuclear ribosomal internal transcribed spacer 1 (ITS1) variation in the Anastrepha fraterculus cryptic species complex (Diptera, Tephritidae) of the Andean region

Abstract The nuclear ribosomal internal transcribed spacer 1 (ITS1) was sequenced for Anastrepha fraterculus (Wiedemann, 1830) originating from 85 collections from the northern and central Andean countries of South America including Argentina (Tucumán), Bolivia, Perú, Ecuador, Colombia, and Venezuela. The ITS1 regions of additional specimens (17 collections) from Central America (México, Guatemala, Costa Rica, and Panamá), Brazil, Caribbean Colombia, and coastal Venezuela were sequenced and together with published sequences (Paraguay) provided context for interpretation. A total of six ITS1 sequence variants were recognized in the Andean region comprising four groups. Type I predominates in the southernmost range of Anastrepha fraterculus. Type II predominates in its northernmost range. In the central and northern Andes, the geographic distributions overlap and interdigitate with a strong elevational effect. A discussion of relationships between observed ITS1 types and morphometric types is included.


Introduction
The Anastrepha fauna of the central and northern Andes and Andean periphery is extremely rich, but incompletely known, and includes the poorly understood Andean populations of the serious economic pest species A. fraterculus (Wiedemann), which is interpreted as a cryptic species complex as recently reviewed by Hernández-Ortiz et al. (2012, 2015. The taxonomic structure of the A. fraterculus cryptic species complex remains inadequately resolved, even as to the number of species involved and their distributional patterns. As part of an ongoing project to improve the identification of Anastrepha species, an effort to better resolve and understand the taxonomic structure of this complex was undertaken utilizing sequence analysis of select mitochondrial and nuclear gene regions, with emphasis on the fauna of the Andean region. Preliminary Sanger sequencing of potentially useful DNA regions for a subset of A. fraterculus specimens from two sites in the central Andes of Peru representing "low" and "high" elevation populations (Peru colony, Lima, (n=5) and Cusco, Calca, (n=5) respectively) included the mitochondrial cytochrome oxidase I (COI, in part, including the Folmer fragment) and ribosomal r16S (in part), and the nuclear ribosomal r18S (in part), internal transcribed spacer (ITS, complete), and r28S (in part, including the expansion domains D1-D8). It also included the nuclear protein coding genes elongation factor 1 (EF1∝, in part) and the carbamoyl phosphate synthetase domain (CAD,in part). These were also sequenced for other Anastrepha fraterculus species group taxa including A. suspensa (Loew), A. distincta Greene, A. ludens (Loew), and A. obliqua (Macquart).
This initial survey established that taxonomically informative and diagnostic sequence polymorphism was present in the ribosomal internal transcribed spacer 1 (ITS1). Further sequence analysis then concentrated on ITS1 and geographical coverage was expanded to include the entire project region. We here report ITS1 sequence polymorphism patterns, their geographical correlations, and taxonomic implications for Andean populations of A. fraterculus.

Methods
The majority of specimens in this study were adult females. Taxonomic determinations as A. fraterculus were based on adult morphological criteria following Norrbom et al. (2012 onwards). After removal of tissue for DNA analysis specimens were deposited in the FSCA as vouchers.
The geographical region chosen for the initial stage of this project included the central and northern Andes and Andean periphery of Bolivia, Perú, Ecuador, Colombia, Venezuela, and Argentina. For comparison, limited numbers of specimens were also available from outside of the core region including México, Guatemala, Costa Rica, Caribbean Colombia (Isla de San Andrés), southern and southeastern Brazil, and coastal Venezuela (Los Caracas), as well as ITS1 sequences (NCBI) derived from A. fraterculus originating from localities in Paraguay (Lopes 2010, Lopes et al. 2013. Most specimens originated from McPhail-type fruit fly traps using yeast hydrolysate bait with, or without, borate preservative. Lesser numbers were collected using multi-lure traps (ML) or reared from host fruits. Trapped or reared specimens were stored in alcohol (70-100% isopropanol or ethanol); or in the case of some archival FSCA samples, stored frozen at -80 °C from as early as 1988. Acronyms for organizations that provided specimens or DNA sequences are as follows: ITS1 sequences were constructed for 299 Anastrepha fraterculus specimens representing 85 collections ( Figure 1) from the central and northern Andes including Perú (n=119; SENASA, FSCA, IAEA-IPCL, this project), Bolivia (n=28; SENASAG, DSA, FSCA, this project), Colombia (n=125; ICA, IAEA-IPCL, this project), Ecuador (n=14; FSCA, this project), Venezuela (n=7; FSCA), and Argentina (n=6; IAEA-IPCL).
Paraguay was represented by ITS1 sequences deposited in the NCBI (HQ829865 to HQ829879) (Lopes 2010, Lopes et al. 2013. Each collection listed below is described by country, collection number, locality, geographical coordinates (latitude and longitude) when provided, date, collector, other relevant label data, number of specimens, and sex (male=M, female=F, unknown=?):  ITS1 in Anastrepha is highly base-biased with ~84-85% AT in the fraterculus species group. In addition, poly(A)/poly(T) subsequences of up to 20 or more bases are predominant. Initial PCR and Sanger sequencing of ITS1 resulted in what could be interpreted as substantial intra-individual polymorphism and/or PCR polymerase artifacts (slippage). Cloning of ITS1 PCR amplicons followed by comparative re-PCR/ Sanger sequencing or direct Sanger sequencing of cloned DNA indicated that (1) PCR polymerase slippage-type artifacts were predominant (this does not rule out some level of intra-individual polymorphism) and (2) that the BigDye Terminator 3.1® (Thermo Fisher Scientific, Waltham, MA USA) sequence by synthesis (SBS) chemistry is highly resistant to polymerase slippage at this level of AT bias. In addition, it was found that less than perfect DNA quality can greatly increase polymerase artifacts.
A considerable effort was made to reduce PCR polymerase artifacts using high processivity polymerases, modified PCR conditions, and reaction adjuncts. The polymerases with the best performance included Phusion® (Thermo Fisher Scientific) under modified PCR conditions, KAPA® Hi Fi (KAPABiosystems, Wilmington, Massachusetts USA), and the KOD-based Platinum® Pfx (Thermo Fisher Scientific). No perfect solution was found; however, the overall level of PCR slippage in most non-degraded DNA samples was reduced to a point where confident analysis was possible. Homopolymer regions having greater than 14 or so A or T bases tended to exhibit PCR slippage even under optimal conditions, as also did sufficiently large repeats such as (ATT) n with n>5. Since polymerase artifacts can originate from either PCR or replication, or both, it is possible, if not probable, that homopolymeric or repeat regions are exhibiting some level of intra-individual polymorphism. These regions, fortunately, tended to be localized as single isolated homopolymers or repeats towards the 5' and/or 3' ends of ITS1 in the A. fraterculus cryptic species complex allowing bidirectional resolution.
In addition, individual DNA samples were found to often react differently to PCR optimizations, particularly in the all too common situation of trapped specimens having a significant probability of degraded DNA. The use of high-salt PCR buffers with relatively high annealing temperatures was generally useful in these cases; however, PCR reaction adjuncts such as TMAC (tetra ammonium carbonate) used to preferentially improve AT bonding while significantly reducing polymerase slippage and artifacts in some cases often increased polymerase artifacts in other samples having the same ITS1 sequence. This latter effect generally occurred in cases of poor DNA quality. Resource limitations precluded PCR optimization for individual samples except in special cases. As a general strategy, KAPA® HiFi became the standard polymerase for PCR.
Primers utilized for PCR and sequencing are given in Table 1: ADL 18sF was based upon the 18s sequences NCBI EU179519 (A. ludens) and AF187101 (A. fraterculus); ADL 5.8sR was modified from CAS5p8sB1d (Ji et al. 2003); ADL ITS1 internal F and R were constructed from complete ITS1 sequences of A. fraterculus generated by direct sequencing using the above PCR primers. The primers internal to ITS1 allowed improved sequence quality for regions subject to polymerase slippage artifacts.
Primers supplied in lyophilized form by Integrated DNA Technologies (IDT) were reconstituted in 1X pH 8.0 Tris-EDTA (TE) (Thermo Fisher Scientific) to a 100µM stock solution. Working solutions were diluted in HyClone® nuclease-free water (Thermo Fisher Scientific) to 10µM for PCR amplification and/or 2.5µM for Sanger reactions. Experimentally, modified primers were synthesized incorporating 3' phosphorothioate linkages to counter exonuclease degradation but this was not found to be helpful and was not continued.
Samples for DNA extraction generally consisted of 1 or more legs with attached muscle, or in some cases the complete thorax, frozen or in alcohol. The latter were air dried in a laminar flow hood to remove alcohol, then in both cases the samples were chopped, frozen in LN 2 , and powdered with a Mini-Beadbeater-96 (Biospec Products, Inc., Bartlesville, Oklahoma USA) using a glass bead. DNA extraction and cleanup utilized the DNeasy Blood and Tissue Kit ® (QIAGEN, Venlo, Netherlands) following the manufacturer's protocol with overnight digestion by Proteinase K using the supplied ATL buffer. Final elution was in 50µL of AE buffer. PCR amplifications were carried using a GeneAmp® PCR System 9700 thermal cycler (Thermo Fisher Scientific) with the temperature program recommended by KAPA Biosystems for the KAPA HiFi hotstart polymerase: initial denaturation 95 °C/2 min followed by denaturation 98 °C/20 sec, annealing at 65 °C/15 sec, and extension at 72 °C/15 sec for 30 cycles with no final extension. PCR reactions also followed manufacturer's recommendations with a total volume of 20 µL, 0.3 µM final primer concentration, and 1U of polymerase. Mineral oil overlays (10 µL) were used. Visualization of amplification products was by planar gel electrochromatography on a 2-2.5% agarose gel and TAE buffer. Gels were stained using SYBR Safe (Thermo Fisher Scientific). If the PCR amplicons were deemed acceptable, then the PCR reactions were cleaned up by spin column using the Roche High Pure PCR Product Purification Kit (Roche Diagnostics Corporation, Indianapolis, Indiana USA) following the manufacturer's recommended protocol and the DNA concentration quantified by a NanoDrop® (Thermo Fisher Scientific) µv spectrophotometer prior to Sanger reaction assembly.

Primer Primer Sequence 5'-3' ADL 18sF
TAA CTC GCA TTG ATT AAG TCC C ADL 5.8sR GAT ATG CGT TCA AAT GTC GAT G ADL ITS1 internal F GAT TGA ATG ATA AGT TAA TTT GTT CAC ADL ITS1 internal R GTT GCG AAT GTC TTA GTT CAA C The Sanger sequencing utilized the BigDye® Terminator™ 3.1 chemistry (Thermo Fisher Scientific) following the manufacturer's recommendations for a 0.25X reaction mix and thermocycler temperature program. Sanger reaction products were cleaned of unincorporated dye terminators using the BigDye® Xterminator™ Purification Kit (Thermo Fisher Scientific) with a modified protocol using PCR tubes rather than a 96-well plate. Separation of Sanger reaction products and visualization utilized an Applied Biosystems 3100-Avant Genetic Analyzer (Thermo Fisher Scientific) upgraded to 3130 specifications.
Sequence Scanner v1.0 (Thermo Fisher Scientific) was used for base calling of raw sequences with visual interpretation for final decisions. Base editing and manual alignment, and cluster analysis utilized MEGA6: Molecular Evolutionary Genetics Analysis software version 6.01 (Tamura et al. 2013). Overall similarity between sequences was inferred by UPGMA (unweighted pair group method with arithmetic mean) cluster analysis (Sneath 1973) with distances computed by the maximum composite likelihood method (Tamura et al. 2004) in number of base substitutions per site with gaps eliminated.
Given the high variability in DNA quality, even between individual specimens from the same collection, a triage-type sequencing strategy evolved in which individual samples were evaluated at multiple stages of the analysis and rejected if they failed to meet certain experimentally determined criteria. This included extracted DNA concentration as determined by µv spectroscopy, visual intensity of amplicons from high stringency PCR reaction conditions, and initial single primer reverse strand sequencing of the r5.8s-r18s region. Only approximately 1/2 of the DNA from trapped specimens resulted in acceptable ITS1 sequence quality.
A subset of, or all, samples from each locality having the best DNA quality were sequenced using the full set of four primers to increase coverage and to allow bi-directional coverage of the regions showing possible intra-individual polymorphism and/or PCR slippage artifacts. Generally in these regions a consistent dominant sequence was present and could be reconstructed with bi-directional coverage and/or manual deconvolution of the overlapping sequences. The accuracy of this approach was verified during method development by direct sequencing of cloned PCR products.

Results
The polymorphic ITS1 region of the Andean A. fraterculus cryptic species complex ranges from 534-563 nucleotides (nt) in length and is significantly AT-biased at approximately 84% AT (Figure 2, all base numbers with respect to the hypothetical alignment in this figure). In this group, ITS1 can be considered as a mix of A/T homopolymers of variable length with intervening base interruptions including G and/or C and bounded by 5' and 3' interrupted poly(A) subregions 52-89nt (bases 1-123) and 44-46nt (bases 588-637) in length, respectively, having individual A homopolymers up to 22 bases. At the 3' end of the 5' poly(A) region (bases ~125-150) is a small variable region. The majority of sequence polymorphism in the intervening subregion between the bounding interrupted poly(A) regions is concentrated in 2 highly variable sequence regions, here designated HRV I (bases 369-418) and HRV II (bases 516-556) respectively ( Figure 2). These regions are polymorphic in sequence within the A. fraterculus cryptic species complex with respect to single nucleotide polymorphisms (SNPs) as well as in the lengths of repeats and/or progressive repeats. Within the relatively conserved regions between HRV I, HRV II, and the bounding poly(A) regions are occasional scattered SNPs as well as 2 unique expansions or insertions.
The ITS1 sequences of the Andean A. fraterculus cryptic species complex can be placed into 4 groups, here designated as ITS1 sequence types TI, TII, TIII, and TIV, with a variant of TI designated TIa and TIII further subdivided into 3 subtypes designated TIIIA, TIIIB, and TIIIC (Table 2). Representative sequences have been deposited in the NCBI. The pattern of overall similarity among these sequence types is visualized by UPGMA clustering (Figure 3).
The ITS1 sequence type TI has a polymorphic region of length 534 or 537nt and is characterized by a unique CA to CATCACATATA expansion located approximately  Overall similarity inferred by UPGMA (unweighted pair group method with arithmetic mean) cluster analysis (Sneath and Sokal (1973) of Andean Anastrepha fraterculus ITS1 sequence types (489nt). Distances were computed by the maximum composite likelihood method (Tamura et al. (2004) in number of base substitutions per site with gaps eliminated. 40nt from the 5' end of HVRI (bases 301 to 309), a 5' poly(A) region unique within this complex, a unique repeat expansion (ATT) 2 to (ATT) 3 starting at base 363, and unique SNPs at bases 142, 351, 401, and 415. Inter-individual polymorphism is present in an (ATT) n repeat with n=5 or 6 (the latter designated TIa in Figure 2) within HVRII starting at base 542 in the alignment. The geographical distribution of TI in the central Andean region ranges from the Chiquitano forests into the eastern Andean dry valleys to at least 2000 m in Bolivia north to the higher elevation eastern dry valleys of the Cusco-Ayacucho region of Peru above ca. 2800 m. This ITS1 sequence type also extends south to Argentina (Tucumán), and east to Paraguay (Lopes 2010, Lopes et al. 2013) and at least southern and southeastern Brazil (Figure 4).
Geographical variation in the HVRII (ATT) n repeat may be present with the n=5 variant (TI) most common in SE Brazil and n=6 (TIa) predominant in the Peru dry valleys; however, sample sizes are insufficient to determine if this is a real trend.
The ITS1 sequence type TII polymorphic region is 554nt in overall length and characterized by a unique expansion of TGG to TGGTGGGTGG starting at base 214, located 86nt from the 3' end of the 5' poly(A) region. The 5' poly(A) region is unique in sequence within the A. fraterculus complex; however, the presence of consecutive A 22 and A 15 homopolymers (estimated lengths) increases the likelihood of PCR artifacts. The putative A deletions of TATATA to TTT at the 3' end of this region is consistent but must be considered hypothetical. Additional unique ITS1 sequence elements includes a possible (T) 6 to (T) 7 expansion starting at base 251, SNPs at bases 149,191,408,and 630,and the (ATT) 4 repeat starting at base 539 in HVRII. An inter-individual polymorphism (A or T) was observed at base 418 and is indicated by the IUPAC ambiguity code W in the alignment.
The geographical distribution of TII in this study was restricted to the periphery of the northern Andes including Caribbean coastal sites of Colombia and Venezuela, and south along the eastern foothills of the Cordillera Oriental of Colombia to at least the Ecuador border ( Figure 4). This sequence variant also characterizes the specimens from Mexico, Guatemala, Costa Rica, and the Isla de San Andrés (Colombia).
The ITS1 sequence type TIII polymorphic region ranged from 550-563nt in length. The three sequence subvariants in this group are very close in overall ITS1 similarity ( Figure 3) but polymorphic in HVRI with a common CAATATA sequence starting at base 375. The geographical distribution of TIII was restricted to the central and northern Andean region ( Figure 4).
Subtype TIIIA with the ITS1 polymorphic region of length 563nt, is the most divergent of the three with a TAAAAA to (TAAAAA) 3 expansion starting at base 42 in the 5' poly(A) region, unique insertions in HVRI including an A (base 389) and TA (at bases 416 to 417), and a possible A expansion at base 577.
This sequence subtype characterizes A. fraterculus of the lower elevations of the Andean periphery in the Pacific plain and Andean foothills and valleys of Peru (generally less than ~1200 m in elevation), from Ica in the south (eradicated by SENASA south of Lima?) through western Ecuador to the north into at least SW Colombia. Specimens with this sequence were also seen from the eastern Andean valleys of Peru from the Kosñipata Valley of Cusco to at least Amazonas ( Figure 4).
The ITS1 sequence subtypes IIIB (length of polymorphic region 550nt) and IIIC (length 551nt) are very similar differing by 2 SNPs in the 5' poly(A) region, an SNP at base 210, a complex expansion ATA to AATATA in HVRI starting at base 381, and putative homopolymer length polymorphisms towards the 3' end of the ITS1 polymorphic region (bases 606 and 621). The geographical distributions of these subtypes overlap with scattered collection localities from the Cordillera de Mérida in Venezuela south to the Andes of Ecuador ( Figure 4) at elevations above ~1200 m. Specimens having these sequences were collected as larvae; so far no trapped adults have been seen. Larvae having the subtype IIIB sequence were collected from coffee (Coffea L. sp., Colombia) and toronja blanca, or pomelo (Citrus maxima (Burm.) Merr., Ecuador), and together with IIIC from Andean blackberry, (Rubus glaucus Benth.) in Venezuela.
The ITS1 sequence type TIV polymorphic region length is 553nt. This type is rather more generalized in ITS1 sequence within the A. fraterculus cryptic species complex, perhaps closer to the fraterculus species group groundplan than the others. Type TIV is characterized by a unique G interruption at base 99 in the 5' poly(A) region, unique SNPs at bases 133, 160, 341, and 419, and by a unique repeat expansion (ATT) 2 to (ATT) 3 in HVRII starting at base 510.
The geographical distribution of TIV in the northern Andes extends from the Cordillera de Mérida of Venezuela, the Cordilleras Central and Oriental of Colombia, south to at least the northern Andes of Ecuador ( Figure 4) generally above ~900 to 1000 m. In addition, specimens of A. fraterculus with this ITS1 sequence were collected in the vicinity of Echarate, in the southeastern Andean foothills of Peru at about 960 to 1400 m.

Discussion
The ITS1 sequence types were not randomly distributed and they present a geographical pattern that is generally consistent with previous concepts about the A. fraterculus cryptic species complex of the Andean region.
ITS1 sequence type TI is widespread in South America from at least SE Brazil west into northern Argentina and north in the eastern Andes to at least the Cusco region of Perú. Specimens having this ITS1 sequence type include those from the IAEA, IPCL colonies originating from Argentina (Tucumán, sampled 2014 and 2015) and Brazil (Vacaria, sampled 2010, and Piracicaba sampled 2014, 2015, as well as the published sequences of specimens from Paraguay (Lopes 2010, Lopes et al. 2013. Concurrently, this type seems identifiable with Anastrepha sp. 1 (Selivon et al. 2004) (=Anastrepha sp. 1 aff. fraterculus Yamada & Selivon 2001, Selivon et al. 2005) and morphotype 1 of Hernández-Ortiz et al. (2012).
Specimens of the A. fraterculus complex having ITS1 sequence type TII were restricted to Mesoamerica and northeastern South America and include specimens from the IAEA, IPCL, colony originating from México (Xalapa, sampled 2013). A specimen with this ITS1 sequence originated from the los Caracas, Venezuela collection (1988) on which the "Vz-Lowland" population of Steck (1991) was based. TII appears to correspond to the "Mexican" form of Hernández-Ortiz et al. (2012), and likely includes specimens from "lowland" Venezuela. The latter were apparently not compared with the Mesoamerican A. fraterculus by Hernández-Ortiz et al. (2012).
The identity of the group having the ITS1 sequence type TIII is more complicated. Subtype TIIIA appears to be well defined and includes specimens from the IAEA IPCL, Perú colony (sampled 2014) and can probably be identified with the Anastrepha sp. 4 aff. fraterculus of Selivon et al. (2004) and the Peruvian morphotype of Hernández-Ortiz et al. (2012). Anastrepha fraterculus having the TIIIB and TIIIC sequence subtypes, however, are of uncertain taxonomic status and identity. Specimens from the highland Venezuela A. fraterculus collections of Steck (1991) include those with TIIIB, TIIIC, and TIV ITS1 sequences; unfortunately, specimens analyzed in the earlier isozyme analysis were destroyed during the process preventing molecular analysis of the same individuals. In Colombia, both subtype IIIB and type IV ITS1 sequences were found in specimens reared from larvae collected from coffee near Bogotá (this project); specimens from an IAEA, IPCL colony from Tolima, Colombia (sampled 2015) include subtype TIIIC. Both subtypes and type IV are present in collections of A. fraterculus from the Andes of Ecuador as well. It is not clear if a polymorphic ITS1 species is present in the highlands of the northern Andes, or if multiple sympatric species exist. Only specimens having ITS1 sequence type TIV were seen in trap samples (ICA) from Colombia, as well as from the IAEA, IPCL colony from Colombia (Ibagué, sampled 2014), the latter now lost.
Specimens having the TIV ITS1 sequence pattern seem to fit the "Andean" form of Hernández-Ortiz et al. (2012), but it remains unclear how those with the TIIIB and TIIIC sequence types fit into this morphotypic scheme. Steck (1991) failed to find isozyme heterogenity in the same collections of A. fraterculus from the Cordillera de Mérida, Venezuela, having specimens with the TIIIB, TIIIC, and TIV ITS1 sequence types. This is consistent with ITS1 sequence polymorphism. Additional collections from this region will be required to clarify this.
These results indicate that ITS1 sequence patterns can help resolve taxonomic structure in the Anastrepha fraterculus cryptic species complex, at least at some level. It remains to determine the lower limits of this resolution, ie. if a finer level of taxonomic structure in A. fraterculus exists in the Andean region. In addition, the sampling here was not geographically exhaustive. Significant parts of the Andean region remain poorly sampled including, but not restricted to, Ecuador and Venezuela. Beyond the Andean region; moreover, the diversity of the Anastrepha fauna in general and A. fraterculus in particular, of Amazonia and Guayana remain poorly known.