Congruence between cytochrome oxidase I (COI) and morphological data in Anuraphis spp. (Hemiptera, Aphididae) with a comparison between the utility of the 5’ barcode and 3’ COI regions

Abstract The discrimination of species in the genus Anuraphis is particularly difficult due to the overlap of morphological characters. In this study, we used the 5’ (barcode) and 3’ regions of cytochrome oxidase I (COI) to test their utility in the identification of species in this genus as well as closely related species. Both regions were useful to discriminate all the species tested. However the non-barcode 3’ region resulted in higher resolution and support for species relationships when the data were analyzed using both Maximum Likelihood and MrBayes. We propose the development of an integrated database that encompasses morphological, molecular, life-cycle, host plant and bibliographic information to facilitate and increase the accuracy of aphid identification.


Introduction
Aphids are sap-sucking insects. Currently there are 5012 valid species (Favret 2014) associated with plants belonging to various botanical groups. Many species have a heteroecious life cycle that includes alternating between a primary host plant (usually a tree) and a secondary host (usually an herbaceous species). The genus Anuraphis Del Guercio presently ascribed to the tribe Macrosiphini includes a small number of taxonomically welldefined species, A. subterranea (Walker, 1852), A. farfarae (Koch, 1854), A. catonii Hille Ris Lambers, 1935, A. pyrilaseri Shaposhnikov, 1950 Stroyan, 1982, A. ferulae Shaposhnikov, 1995 andA. shaposhnikovi Cocuzza, 2003. In addition, Remaudière andRemaudière (1997) reported four other nominal species (i.e., A. capparidis Nevsky, 1929, A. cortusae Nevsky, 1929, A. floris Monzen, 1934and A. katsurae Shinji, 1952. However, the generic placement of A. capparidis has been questioned by Blackman and Eastop (2006) who noted that, based on the original description, this is probably not an Anuraphis species but an immature Aphis sp. The recognized Anuraphis species are distributed in the Ponto-Mediterranean area of the western Palaearctic region. A common trait of almost all Anuraphis species is the use of Apiaceae as host plants, with the exception of A. farfarae that feeds on Asteraceae (Tussilago, Petasites and Hieracium). Some populations of A. subterranea, A. pyrilaseri, A. farfarae and A. catonii have been shown to be heteroecious holocyclic with Pyrus spp. (Rosaceae) as primary host plants (Shaposhnikov 1951;Kolesova 1972;Lampel and Meyer 2007). However, some populations of A. farfarae (Shaposhnikov & Sharov, 1978), and probably other species, are solely anholocyclic on secondary host plants. For A. cachryos, A. shaposhnikovi and A. ferulae the primary host plants remain to be determined.
A. farfarae (pear-colt's foot aphid) and A. subterranea (pear-hogweed aphid) have been reported in the literature as pests of pear, where they cause direct damage to young foliage in spring (Kolesova 1972). However, damage due to their infestation has a negligible effect on production (Alford 2014).
All species belonging to the genus Anuraphis are morphologically similar to each other but easily discriminated from other genera. The main morphological features of the genus are an almost flat frontal profile, as a result of the minimally developed antennal tubercles, and a short cauda. Moreover, Anuraphis shares with a few other genera of Macrosiphini a typical spinulose ornamentation of siphunculi and a welldeveloped, often almost complete set of dorsal tubercles (both marginal and spinal). However, as already reported for other groups of aphids, the morphometric similarity among Anuraphis species leads to an overlap that renders their discrimination to species level difficult (Stroyan 1984;Heie 1986). Barbagallo and Cocuzza (2003) published a morphological key to discriminate viviparous morphs (for both apterae and alate) of Anuraphis species and a discriminant function to separate A. subterranea and A. shaposhnikovi. However, the discrimination of A. subterranea and A. shaposhnikovi using only morphological characters requires the skills of an experienced researcher, especially when specimens are collected on primary host plants or when the secondary host is unknown.
In some genus (e.g. Aphis), a recurrent and difficult problem in using only morphological characters to identify aphids is that for many species there are insufficient diagnostic characters, resulting in their identification being partially based on host plant association and life cycle characteristics (Stroyan 1984;Heie 1986). However, due to incomplete and/or missing knowledge of many aphid/plant associations, the use of this criterion to identify aphid species, could lead to misidentification (Stroyan 1984;Coeur d'acier et al. 2007). Many studies have used the 5' region of the cytochrome oxidase I gene (COI), more commonly referred to as the DNA barcode region, as a useful tool to discriminate various groups of insects (Hebert et al. 2003a, b, Deng et al. 2012Derocles et al. 2012;Williams et al. 2012;Julsirikul et al. 2013), including aphid species (Coeur d'acier et al. 2008;Foottit et al. 2008Foottit et al. , 2009aMiller and Foottit 2009;Wang and Qiao 2009;Kim et al. 2010;Lee et al. 2011;Zhang et al. 2010Zhang et al. , 2011Wang et al. 2011;Chen et al. 2013;Massimino Cocuzza and Cavalieri 2014). However, especially in some insect groups such as Aphididae, the DNA barcode region, due to low genetic diversity at this marker, was no more informative than morphological characters (Foottit et al. 2008;Lee et al. 2011). For instance, results obtained using the COI barcode region with adelgids were inadequate for the purpose of discriminating species that were morphologically indistinguishable or belonged to a species-complex (Žuroková 2010). Other studies have shown that the COI barcode region discriminated 96% of aphid taxa tested (Foottit et al. 2008).
Ideally the description of a species should result from a synthesis of information that encompasses morphological, molecular, biological, biogeographical, physiological, ecological and bibliographical data (Dayrat 2005;De Salle 2006;Waugh 2007;Padial et al. 2010;Taylor and Harris 2012), however, this compendium of information is lacking for the great majority of species.
This study was undertaken to improve the current taxonomic knowledge of the various taxa belonging to the genus Anuraphis by testing the utility of the COI gene, specifically comparing the widely used barcode 5' region with the much less studied 3' region, as a molecular tool for their identification. A further goal is to compare the results obtained with the COI gene to those previously published using only morphological characters (Barbagallo and Cocuzza 2003).

Materials and methods
This study was conducted with seven species (Table 1) belonging to the genus Anuraphis. Unfortunately, it was not possible to include A. ferulae, a species recorded only from Tajikistan on Ferula sp. When possible, species were collected in different geographic locations and on different host plants. Taxonomic nomenclature follows Remaudière and Remaudière (1997). Two samples of Nearctaphis bakeri (Cowen, 1895) were included in the analysis. The genus Nearctaphis is considered the vicariant (or sister) Nearctic relative of Anuraphis, from which it differs morphologically due to the lack of spinal tubercles, and biologically by the use of Malus sp. as a primary host plant  Table 1. For each sample, 5-6 apterae and alate individuals were slide-mounted for morphological identification. Specimens were morphologically identified by S. Barbagallo using characters in the keys provided by Heie (1992), Barbagallo and Cocuzza (2003) and Blackman (2010). Specimen slides are stored in the Aphididae collection of S. Barbagallo (Department of Agriculture, Food and Environment, University of Catania). Whole aphid specimens for DNA sequencing were stored in 95% ethanol at -20 °C, those used for morphological observations were stored in 70% ethanol and at room temperature.
Total genomic DNA was extracted by macerating entire single individuals using the DNeasy Blood & Tissue kit (Qiagen®, Hilden, Germany) in 50 µl of extraction buffer and stored at -20 °C. To compare the utility of the 5', barcode region, and the 3' region of COI we amplified the following regions: for the 5' end, a 600 bp region using primers LCO1490 and HCO2198 (Folmer et al. 1994, widely used on a variety of organisms as well as aphids (Hebert et al. 2003, Coeur D'acier et al. 2008Kim et al. 2010;Lee et al. 2014), for the 3' end, a 648 bp fragment using primers C1-J-2195 and TL2-N-3014 (Simon et al. 1994), found to be informative in several aphid studies (Coeur d'acier et al. 2008;Massimino Cocuzza and Cavalieri 2014). PCR reactions were performed using 8.5 µl of buffer premix 2x F (FailSafe tm PCR Premix Selection Kit -Epicentre Technologies) 1 µl of each primer (10 µM), 0.5 µl Taq polymerase (Life Technologies) and 2 µl DNA template (quantified in 6-18 ng/ µl) in a total volume of 21 µl. The cycle conditions for primer set LCO1490 and HCO2198 was 94 °C for 3 min (initial denaturation), followed by 35 cycles of 94 °C for 30 s (denaturation), 48 °C for 1 min (annealing) and 72 °C for 1 min (extension). Primer set C1-J-2195 and TL2-N-3014 conditions were 96 °C for 5 min (initial denaturation) and 35 cycles of 96 °C for 5 s (denaturation), 45 °C for 1 min (annealing), 72 °C for 1 min (extension). PCR products were run in 1.6% agarose gels stained with Syber Safe DNA gel stain (Life Technologies). PCR products were sequenced at BMR genomics (Padua, Italy) or at the W. M. Keck Center at the University of Illinois (Urbana-Champaign, IL) and run on an ABI PRISM 3730XL DNA analyzer (Life Technologies Corporation, Carlsbad, CA, USA). For each sample 2-8 individuals were sequenced, and one representative sequence for each sample was subsequently chosen. Sequences of Anuraphis available in Genbank and or BOLD databases were utilized in the analysis and are identified in Table 1 by their accession number.
The COI sequences were edited manually using BioEdit (Hall 1999) or Sequencher v. 5.0 (GeneCodes Corporation, AnnArbor, MI, USA). Nucleotide sequences were translated using EPoS (Griebel et al. 2008) to check for stop codons (Zhang and Hewitt 1996). Sequence divergences were calculated using the p-distance model as suggested by Srivathsan and Meier (2012), and a neighbour-joining (NJ) tree (Saitou and Nei 1987), as implemented in MEGA 6 (Tamura et al. 2011), was used to visualize the distance matrix among taxa and population samples. The Bayesian phylogenetic analysis was conducted using Mr.Bayes v 3.2.1 (Ronquist et al. 2012) implementing the GTR + I model of sequence evolution selected by JModel test 2.1.4 (Posada 2008) based on the Akaike information criterion (AIC). Beginning with random trees, four independent runs with four Markov chains were run for 25,000,000 generations. Bayesian trees were sampled every 1000 th generations. All other parameters were set at default. Convergence was assessed using TRACER 1.6 (Rambaut et al. 2014) using a 25% burn in value. Posterior probabilities (pp) and the consensus trees were computed in MrBayes. The Bayesian analysis was run on the CIPRES Science Gateway (Miller et al. 2010). A maximum likelihood analysis was also performed using RAxML v. 8 (Stamatakis 2014) with the GTR +I model; clade support for the maximum likelihood tree was determined in RAxML by bootstrap, based on 1000 pseudoreplicates.

Results
COI was easily amplified for all specimens analysed using the primers indicated above. No frame shift or premature stop codons were detected.
The five prime end (5') constituted a 601 base pair (bp) fragment. With total bp frequencies of 75.3% for A/T and 24.7% for G/C. These latter results concur with those found for other aphid species (Shufran et al. 2000;Wang et al. 2011). The 5' end showed that there were 533 conserved and 125 variable nucleotides with 92 of the latter being parsimony informative. The overall average distance for the 5' end of the COI gene was 5.8, ranging from 0 (samples within a species) to 11.7 across species.
The three prime end (3') sequences analysed consisted of 648 bp with frequencies of 74.9% A/T and 25.1% G/C. The 3' end showed that there were 521 constant and 127 variable sites of which 111 were parsimony informative. The percentage of variable sites was slightly higher for the 3' (19.6%) than the 5' end (18.99%).
The clade including A. farfarae and A. pyrilaseri shows a genetic distance between the two species of 3.2% when using the 3'end and 1.7% when using the 5' end of COI. The various samples of A. farfarae were highly similar, regardless of host plant, locality and COI region examined. Similarly, the populations of A. pyrilaseri showed low genetic variability (0.3%). Differences in body colour, possibly due to host plant effects, as well as differences in dorsal abdominal sclerotisation, do not correlate with the low genetic diversity observed with the COI gene. The various samples of A. subterranea showed no genetic differences, regardless of their geographic origin, host plant or COI region used for the analysis. Genetic difference (3.7% with 3' and 4.7% with 5' region) between A. subterranea and A. shaposhnikovi clearly distinguishes the two species, despite the small morphological differences observed (length of ultimate rostral segment and number and distribution of abdominal spinal tubercles). A. shaposhnikovi and A. catonii showed the lowest genetic divergence (<1%) regardless of the COI region considered. However, while with 5' COI barcode showed a pairwise distance of 0.2%, the 3' region showed a difference of 0.8%.
A result similar to the one based on COI was found using a multivariate discriminant analysis with 16 morphometric characters (Barbagallo and Cocuzza 2003) and graphically as Mahalanobis' generalized distance (Fig. 3). The dendrogram indicates a distinction of A. subterranea and A. shaposhnikovi, and the similarity between the latter species and A. catonii.

Discussion
The molecular analysis based on the 3' and 5' COI gene regions indicates that the genus Anuraphis is a homogeneous taxonomic group. However, COI also provides information to distinguish the taxa at the species level as evidenced by the level of support, 89% bootstrap or more, on the likelihood tree (Fig. 2a). Thus, the analysis using COI confirms the species delimitation concepts previously reported using a multivariate analysis of morphological features (Barbagallo and Cocuzza 2003). The division of Anuraphis species in two groups (one clade consisting of A. farfarae and A. pyrilaseri, a second clade including A. subterranea, A. cachryos, A. shaposhnikovi and A. catonii) is easily observable by comparing the phylogenetic trees and Mahalanobis' generalized distance. The COI-based molecular analysis permitted a better discrimination of A. shaposhnikovi and A. subterranea than the multivariate analysis based on morphometric features. It is useful that the COI gene can also differentiate A. subterranea and A. catonii, because the taxonomic status of the latter species has been questioned. Hille Ris Lambers (1935), regarded A. catonii as a subspecies of A. subterranea. The only morphological difference between A. subterranea and A. catonii noted by Stroyan (1950) was in the number of secondary rhinaria on the antennae of alatae, more numerous in the former species. However, Blackman (2010) has reported other morphological differences between these two species, both in apterae and alatae. Biologically, it has been  shown that when transferred to Pastinaca sativa, the nymphs of A. catonii can reach adulthood (Stroyan 1959);conversely, Shaposhnikov (1951) observed that nymphs of A. catonii transferred from pear survive on Pimpinella sp. but not on Pastinaca sativa. A further intricacy was the recovery by Kolesova (1972) of a sample of A. catonii on P. sativa, although this could be a case of misidentification. Barbagallo and Cocuzza (2003) reported that A. shaposhnikovi, collected on Magydaris pastinacea has slight morphological differences from those developing on Opopanax chironium, (i.e., the length of the last rostral segment and the number of abdominal spinal tubercles). The putatively fixed nature of the morphological differences is confirmed by the COI analysis and can be the result of intraspecific variability and possibly geographic isolation, since M. pastinacea occurs in very restricted areas of Sicily and Sardinia. Another interesting observation is the low genetic divergence observed between A. catonii and A. shaposhnikovi, a similarity already evidenced in the morphological analysis (Barbagallo and Cocuzza 2003). These species may have diverged recently from a common ancestor as a result of differences in the habitats of their respective host plants. The genus Pimpinella is typical of herb-rich areas and wooded pastures, whereas O. chironium prefers uncultivated dry land with a Mediterranean climate (Pignatti 1982). The phenomenon of host-races as a first step leading to speciation has been repeatedly observed in phytophagous insects (Drès and Mallet 2001) and is common in aphids (Sunnucks et al. 1997;Margaritopoulos et al. 2007), especially in populations that have partially or totally lost the sexual generation in favour of continuous parthenogenetic reproduction. Host-plant use may represent a food resource niche that favours the speciation process of species in sympatry (Peccoud et al. 2010). Moreover, low genetic diversity at the COI level is typical of taxa with recent ecological divergence (Jimbo et al. 2011) and can explain the low genetic divergence (<1%) reported in some aphid groups (Foottit et al. 2008;Lee et al. 2011;Massimino Cocuzza and Cavalieri 2014). Lee et al. (2014) found that the COI barcode region was not helpful in the identification of 7% of the aphid species they examined. This lack of resolution could be resolved by the development of additional molecular markers with higher diversity, leading to greater accuracy in species identification (Lozier et al. 2009;Sano and Akimoto 2012;Chen et al. 2013;Lee et al. 2014  case of A. catonii and A. shaposhnikovi the genetic difference, albeit low, was consistently observed in all samples analysed. We observed a difference in genetic distances when using the 5' barcode or the 3' regions of COI. Most "barcode" studies on aphids are carried out using the 5' region of COI that has produced some ambiguous results (Foottit et al. 2008;Žuroková et al. 2010;Lee et al. 2011). This study demonstrates that in Anuraphis the 3' COI region has a higher capacity of discrimination. In the case of A. catonii and A. shaposhnikovi the difference recorded with the 3' (0.8%) and 5' regions (0.2%) is crucial, especially when considering that a distance of 0.5% in aphids is usually considered as the "borderline" between species (Massimino Cocuzza and Cavalieri 2014; Rakauskas et al. 2014). However, low genetic difference in species that are morphologically different is not an unknown phenomenon in aphids. For example, despite Aphis hederae Kaltenbach, 1843 and Aphis newtoni Theobald, 1927 having well-defined morphological and biological differences, they have a low interspecific divergence (0.17%) in the 5' COI region (Lee et al. 2014). The genetic results observed here in Anuraphis spp. closely mirror previous morphometric findings. The lack of appreciable differences in morphological characters is a phenomenon well known in various groups of aphids (Stroyan 1984;Foottit 1997;Wang et al. 2011) and this peculiarity can easily lead to the misidentification of species (Coeur d'acier et al. 2007). Because of this difficulty, there is a need for methods of investigation that can be used in conjunction with classic morphometric analysis. Confirming the finding of previous studies on aphids (Foottit et al. 2008;2009c), the present study indicates that the COI gene may significantly aid in the correct identification of aphid species, especially in cases where morphological characters are insufficient to clarify taxonomic status. Morphometrics and the COI gene can be used in parallel to improve the discrimination of aphid species. However, an identification-integrated system that links molecular data, morphological features, life cycle, host plant, photos (in vivo and on slides) and a bibliography for each aphid species would further facilitate and improve the accuracy of aphid species determination.