Revealing the biodiversity of Chilean birds through the COI barcode approach

Abstract The mitochondrial cytochrome c oxidase subunit I (COI) gene is an effective molecular tool for the estimation of genetic variation and the identification of bird species. This molecular marker is used to differentiate among Chilean bird species by analyzing barcodes for 76 species (197 individuals), comprising 28 species with no previous barcode data and 48 species with sequences retrieved from the BOLD and GenBank databases. The DNA barcodes correctly identified 94.7% of the species analyzed (72 of 76 species). Mean intraspecific K2P distance was 0.3% (range 0–8.7%). Within the intraspecific divergence range, three species, Phrygilus gayi, Sephanoides sephanoides and Curaeus curaeus, showed relatively high intraspecific divergence (1.5–8.7%), possibly due to the presence of a species complex or geographic isolation of sub-populations. Mean interspecific K2P distance was 24.7% (range 1.3–43.5%). Consequently, the intraspecific K2P distance showed limited overlap with interspecific K2P distance. The mean intraspecific divergence in our study was similar to that found in temperate regions of South America (0.24%). However, it was approximately one order of magnitude lower than values reported for bird species in tropical regions of northern South America (1.8–2.13%). This result suggests that bird species from Chile show low levels of genetic structure and divergence. The small overlap between intra- and inter-specific distances implies that COI barcodes could be used as an effective tool to identify nearly all the Chilean bird species analyzed.


Introduction
Birds are among the animal groups that have been subjected to extensive DNA barcoding. Currently, DNA barcodes are publicly available for 41% of known bird species in the world, with data for nearly 4300 species from 37 of 39 recognized avian orders (Barreira et al. 2016). The reported accuracy of species-level DNA barcoding is 93%-99% for birds, confirming its efficacy in discriminating among avian species and its potential as a tool for assigning unknown avian individuals to a species. The accuracy of this method in birds is attributable to fact that the maximum intraspecific distance is typically smaller than the minimum interspecific distance in these species. This so-called barcode gap (Meyer and Paulay 2005) means that the COI gene has the power to delineate species boundaries.
In recent years, a number of DNA barcoding studies have assessed the efficacy of using COI data to identify South American birds. Analyses of hundreds of bird species from countries in this region, such as Argentina (Kerr et al. 2009), Brazil (Chaves et al. 2015), and Ecuador and French Guiana (Milá et al. 2012), have shown that COI sequences are highly accurate for species-level identification (93-98%). Findings for birds from other geographic areas show similar levels of accuracy (Barreira et al. 2016). Moreover, these studies reveal deep intraspecific genetic divergence in Neotropical birds, likely associated with a more complex pattern of regional divergence than in the North American avifauna (Tavares et al. 2011;Milá et al. 2012). Thus, in some birds, intraspecific differences overlap with interspecies differences, especially in populations that include multiple sub-species or in samples from large geographic areas, ecoregions, or areas of endemism (Tavares et al. 2011). Therefore, the current taxonomy likely underestimates the biodiversity of the Neotropical avifauna. Given the complex genetic divergence patterns reported in this region, likely associated with a high incidence of non-monophyletic species, the general utility of DNA barcoding across different biogeographic regions of South America merits further attention.
The Chilean avifauna comprises of 443 species, if we only consider species that are residents or regular visitors (Couve et al. 2016), or those that meet the criteria of at least five records in the national territory (Martínez-Piña and González-Cifuentes 2017). These birds belong to 65 families distributed across broad altitudinal (0-6000 m) and latitudinal (18°S-56°S) gradients in various ecoregions of the country. This avifauna represents around 13% of all Neotropical bird species, estimated at 3370 species (Newton 2003). The Chilean avifauna is characterized by a fusion of South American relic elements (e.g., Pygarrhichas and Furnariidae) of Cenozoic origin with North American elements (e.g., members of the genus Phrygilus) that arose during the Pleistocene (Díaz 2005;González and Wink 2008;Álvarez-Varas et al. 2015). During this period, glacial cycles profoundly affected species distribution and gene flow patterns in populations throughout the world (Shepard and Burbrink 2009). Some studies have shown that Pleistocene climate fluctuations may have altered the distributions, sizes, and genetic structures of avian populations (Lovette 2005;van Els et al. 2012;Lougheed et al. 2013). Most of the elements that compose the Chilean avifauna originated during the Pleistocene, due to population differentiation events that occurred in paleorefuges in the Altiplano, central Chile, Patago nian temperate forests, and other areas (Cracraft 1985;Vuilleumier 1991;Díaz 2005;Masselo et al. 2011). These geological processes, along with the geographic isolation of Chile imposed by geographic barriers such as the Andes, the Pacific Ocean, and the northern desert, resulted in depleted species richness and a high level of endemism in relation to the size of this geographic area (Kelt et al. 2016). For example, around of 2.3% of terrestrial avifauna of Chile are endemic species (Kelt et al. 2016) that are largely restricted to certain ecoregions such as the Atacama Desert, Mediterranean forests, Valdivian temperate rain forests, Patagonian steppe, and dry Puna (Jaramillo 2014). Of note is that natural selection and local adaptation mechanisms associated with this geographic isolation seem to have also played an important role in the diversification of Chilean avifauna along the Andes mountains (Campagna et al. 2011;Álvarez-Varas et al. 2015).
Little is known about the population structure and genetic diversity of the taxa that currently compose the diversity of birds in Chile (Colihueque and Gantz 2019). Recent molecular studies of Chilean birds using COI sequences have provided insight into phylogenetic, phylogeographic and taxonomic issues (Masello et al. 2011;González 2014;Álvarez-Varas et al. 2015;Colihueque et al. 2015). This molecular tool also offers the opportunity for an in-depth evaluation of the genetic differentiation within and between Chilean bird species and to test its species-level resolution for bird identification. The particular evolutionary history of Chilean birds, associated with glacial events and isolation imposed by strong geographic barriers, has likely affected the gene flow and genetic variability of many species. Therefore, this approach may provide clues that would clarify, for example, the taxonomic status of various species, divergence patterns across different ecoregions, and the speciation process in Chile. Here, we examine the pattern of barcode divergence in a significant proportion of Chilean bird species, based on new COI sequences and sequences previously published in the Barcode of Life Data Systems (BOLD) (http://www.barcodinglife.org/) and GenBank databases.

Sampling
We obtained samples from across central and southern Chile, mainly from the Cachapoal (34°S) and Osorno, Ranco, and Valdivia provinces (40°-41°S). Samples corresponded to dead birds found along the highways which were collected between 2012 and 2019 by volunteers and the authors; date and site of collection (at least at province level) were recorded immediately. Collection sites were georeferenced using Google Earth based on locality names. Information about vouchers and collection sites is available in Suppl. material 2: Table S1. The author A. Gantz, given his broad expertise in ornithology, performed the species identification according to standard diagnostic criteria based on morphology and feather coloration. Identification guides of Chilean birds (Jaramillo 2014;Couve et al. 2016) were also used to assist in the identification task. After identification, the specimens were photographed, and deposited in the bird collection of the Laboratorio de Biología Molecular y Citogenética of the Universidad de Los Lagos ( Table S1). These specimens are publicly available for further investigation. A set of photographs of representative analyzed specimens is provided in Suppl. material 1. Tissue samples were taken mainly from the pectoral and femoral muscles as the size of these muscles facilitates dissection. Tissue samples were then fixed in 80% ethanol. DNA was extracted from the fixed muscle tissue using the phenolchloroform method, as described in Taggart et al. (1992). Extracts were standardized at 100 ng/μL using Tris-EDTA buffer, pH 8.0. To obtain COI sequences of birds from Chile, we searched the BOLD Public Data Portal using the search terms [Aves Chile]. Subsequently, the recovered records were assessed for correct collection site (i.e., Collected in: Chile) and also by checking the correct geographic coordinates within Chile using Google Earth. The same verification process was used for COI sequences recovered from GenBank. The addition of these sequences enhances assessment of COI gene variability at the intraspecific and interspecific levels. Based on the location of collection sites, some species cover a wide latitudinal range across Chile. For example, Thinocorus orbignyianus covers a range from Tarapaca (20°S) to Santiago (33°S) (ca. 1700 km), and Vanellus chilensis from Santiago (33°S) to Magallanes (53°S) (ca. 2200 km). Species nomenclature follows the Clements et al. (2016) taxonomy. We analyzed 116 individuals from 42 species. In total we obtained sequences from 68 individuals of 32 species, including 28 species not previously barcoded. Lists of the newly sequenced specimens with those from BOLD and GenBank, as well as information about vouchers and collection sites, are available in Suppl. materials 2 (Table S1) and 3 (Table S2), respectively.

Data analyses
The sequences obtained were aligned and edited using GENEIOUS 4.0.2 software (Biomatters Ltd.). Base substitution saturation, a phenomenon that may decreases the amount of phylogenetic information contained in a sequence dataset, was tested based on the index of substitution saturation (ISS) (Xia et al. 2003), which assumes a critical index of substitution saturation (ISSc) that defines a threshold for significant saturation in the data. This analysis was performed using DAMBE v. 5.3.105 (Xia 2013). For all sequence comparisons, the Kimura 2-parameter distance model (K2P) (Nei and Kumar 2000) was used. This metric was chosen because its performance for species identification is equivalent to other models (Collins et al. 2012). This genetic distance was calculated using MEGA 5.05 software (Tamura et al. 2011). Pairwise sequence divergence was calculated separately for intraspecific and interspecific distances as well as for intrageneric comparisons. Mean intraspecific distances were calculated for species that were represented by at least two specimens. Intrageneric K2P genetic distances were calculated based on at least two species for a particular genus. The best-fit nucleotide substitution model was determined using jModelTest 2.1. (Darriba et al. 2012) based on Bayesian Information Criterion (BIC). The best model was then used with maximum likelihood (ML) analyses to construct a ML tree. The consistency of topolo-gies (nodal support) was estimated using a bootstrap approach with 1000 bootstrap replications (Felsenstein 1985). Phylogenetic trees were rooted with one representative of Great Tinamu (Tinamus major). To include all sequences developed in this study and as many as sequences as possible from BOLD and GenBank, we used 443-bp sequence length alignments. A species was considered distinguishable by DNA barcode if: a) it was monophyletic (i.e., the species formed a single cluster) and b) it did not share a barcode with any other species.

Sequences dataset and model of nucleotide substitution
Barcodes were analyzed for a total of 76 unique species (197 individuals), including 32 species sequenced in this study (68 individuals) and 48 species (129 individuals) from BOLD and GenBank (see Suppl. material 2, 3 -respectively). These sequences represent 17.2% (76/443) of known species from Chile. This dataset comprised 36 families. The COI sequences of new specimens ranged in size from 464 to 750 bp, with a mean length of 655.3 bp. The average number of sequences per species was 2.6 (range 1-17), with 43 species (56.6%) presenting between 2 and 17 sequences. After aligning the 197 sequences, a dataset of 443-pb sequence length were obtained. For this alignment, 193 variable positions and 182 parsimony-informative sites were found. Overall nucleotide frequencies were: A (24.7%), T (26.5%), C (33.3%) and G (15.4%). The best fit-model of nucleotide substitution was Hasegawa-Kishino-Yano model (HKY) with a fraction of invariable sites and gamma distribution (HKY+I+G) (BIC value = 22738.5785). No evidence of base saturation was found, since the ISS value (0.219) was significantly lower (T = 15.4, d.f. = 443, P < 0.0001) than the observed ISS.c value (0.696).

Species identification
The COI barcode correctly identified 94.7% of the species studied (72 of 76 species). That is, these 72 species had unique DNA barcodes that did not overlap with the barcodes of any other species. Interspecific K2P distance ranged from 1.3 to 43.5% (mean 24.7%). In most cases, the ML tree, as shown in Figure 1, reflected a relatively low within-species divergence as compared to between-species divergence. Most of the terminal groups included specimens of the same species in a single cluster, as expected for samples of the same species in a monophyletic group. Bootstrap values were above 80, except for Curaeus curaeus, Phrygilus gayi and Phrygilus plebejus (Figure 1). The phylogenetic analysis showed a close phylogenetic relationship in the Phalacrocorax and Spheniscus genera, i.e., some species clustered together within the same branch. For example, P. atriceps clustered together with P. magellanicus and S. humboldti grouped in the same cluster with S. magellanicus. In addition, these species pairs exhibited a very similar COI barcode given that the K2P distance was relatively low (3.8% and 2.2%, respectively). These clusters were relatively well-supported (86-90% bootstrap support), we considered that this result may suggest the occurrence of reciprocal nonmonophyly. However, this conclusion should be treated with caution due to the low number of COI sequences analyzed and, therefore, further analysis of additional samples will be necessary to support this result.

Comparisons with previous studies on Neotropical birds
The mean intraspecific divergence in our study (0.3%) was ca. one order of magnitude lower than values reported for bird species in tropical regions of northern South America (1.8 and 2.13%), reported by Chaves et al. (2015) and Milá et al. (2012), respectively. The mean intraspecific divergence in our study was also approximately one-third lower than the value reported by Tavares et al. (2011) (0.9%) for birds  distributed throughout South America, including temperate areas of Argentina. However, as shown in Figure 4 our result was similar to the value reported for birds in temperate regions of South America, particularly from Argentina (0.24%) (Kerr et al. 2009). Moreover, our analysis revealed that only 7% of the intraspecific K2P distances exceeded 1.5% K2P, similar to reported findings for Argentine birds, in which 5.4% of species analyzed had a maximum intraspecific distance above 1.5% (Kerr et al. 2009). However, the proportion of species with intraspecific divergence above 1.5% recorded in this study was lower than that reported for birds distributed throughout South America (18.6%) (Tavares et al. 2011).

Discussion
The mean intraspecific distances recorded in this analysis of Chilean birds (0.3%, based on 43 species) are largely consistent with other reported values for birds in temperate regions of South America, particularly Argentina (0.24%) (Kerr et al. 2009). However, this value was approximately one order of magnitude lower than those found for bird species in tropical regions of northern South America, such as in Brazil (1.8%) (Chaves et al. 2015) and in Ecuador and French Guiana (2.13 %) (Milá et al. 2012). Thus, our results, along with those of Kerr et al. (2009), indicate that birds in temperate regions of South America show less intraspecific genetic variation than birds from tropical regions of the continent. As in other studies, in our case the COI sequence was highly accurate for species-level identification of birds (94.7%), due to a marked barcoding gap. This figure is consistent with previous data indicating that this mitochondrial DNA marker can typically identify 93% or more of the bird species distributed throughout the world, including birds in South America and the northern hemisphere (Barreira et al. 2016). This level of resolution represents a good performance for a single genetic marker. Therefore, the COI sequence may provide a cost-effective tool for screening of biodiversity. DNA barcoding studies of birds from temperate regions of South America have reported that a relatively small number of species show deep divergence. For instance, Kerr et al. (2009) reported that only 21 of 389 species from Argentina (5.4% of the species studied) showed a maximum intraspecific distance above 1.5%. On the other hand, bird species from tropical regions of South America show substantial genetic divergence. For example, Chaves et al. (2015) reported that 11.6% of Brazilian species had an intraspecific distance above 8.1%. Milá et al. (2012) observed that 75% of species from northern South America had a mean intraspecific divergence above 1%, and more than 50% of species had intraspecific lineage distances above 3%. In our case, few of the species studied (7%) showed a mean intraspecific distance higher than 1.5%. Given that divergence analyses may be affected by various factors such as incomplete or geographically restricted sampling (Meyer and Paulay 2005), further analysis is needed to corroborate our results. Nevertheless, our data are consistent with a previous report on birds from temperate regions of South America (Kerr et al. 2009), supporting the notion that bird species in these regions show low levels of genetic structure and divergence as compared to those from tropical regions. Previous COI studies of Chilean birds that focused on phylogenetic relationships also have also produced evidence of limited intraspecific divergence. For example, three species from different orders exhibited only slight intraspecific genetic divergence, such as Cyanoliseus patagonus (Masello et al. 2011), Aphrastura masafuerae (González 2014), and Tyto alba (Colihueque et al. 2015). Studies based on different molecular markers (ISSR markers) have found similar patterns; Aphrastura spinicauda, for instance, showed low levels of genetic diversity among populations distributed across the country (González and Wink 2010).
The limited numbers of bird species with deep divergence in temperate regions of South America may reflect distinct a pattern of regional biodiversity in comparison with tropical avifauna. Alternatively, as noted by Weir and Schluter (2007), the divergence pattern of birds distributed throughout the Americas may reflect a general latitudinal gradient in species diversity. Under this scenario, it would be expected that birds throughout the continent at higher latitudes (toward the poles) would tend to show lower levels of intraspecific divergence than those at lower latitudes (toward the equator). Factors involved in this divergence pattern may include different extinction and speciation rates in the two regions (Weir and Schluter 2007) and greater intraspecific genetic variation in tropical regions (Smith et al. 2017). Thus, the low level of genetic divergence recorded in our study could be related to a general pattern for birds that inhabit the temperate regions of South America. However, it should also be noted that southern South America (above 38°S) was affected by strong glaciation processes in the Quaternary (Hulton et al. 2002). The occurrence of these processes is thought to have impacted various species in southern Chile, including the avifauna (Vuilleumier 1991), by fragmenting their geographical distribution (Villagrán 1990). Therefore, the incidence of local phenomenon related to such climatic fluctuations may also be involved. During glaciation, species survived in refuges and then recolonized sites after resolution of glacial events. As a result, genetic drift, founder effect, or selection may have modified the genetic structure of many species. In the boreal avifauna, the literature suggests that repeated Pleistocene glaciations events may have produced a rapid rate of diversification (Weir and Schluter 2004). Although little is known about the glaciation process associated with the genetic variation of the Chilean avifauna, recent studies suggest that glaciation impacted the genetic structure of specific bird species such as ovenbirds (González and Wink 2010). This notion is based on the finding that within this species, populations currently inhabiting paleorefuge sites show greater genetic variation than populations located in regions that were covered by ice sheets during the Last Glacial Maximum 21,000-14,000 years ago (González and Wink 2010). In addition, the varied genetic patterns of Phrygilus species are also consistent with the environmental history of southern South America, including vicariant events and climate changes (Campagna et al. 2011;Álvarez-Varas et al. 2015). Further sampling from a wider latitudinal range will be necessary to produce a more complete view of the genetic structuring and divergence pattern of the bird species analyzed in this work, especially those that include a southern or Patagonian distribution. However, the low levels of genetic variation observed in some species with Patagonian distributions, such as Vanellus chilensis and Charadrius alexandrinus (currently known as Charadrius nivosus) would reveal a genetic structure shaped by glaciation. This conclusion is concordant with the evolutionary history of several Patagonian bird species, whose speciation processes were closely associated with Pleistocene glaciations (Vuilleumier 1991).
A practical utility of DNA barcoding lies in the use of divergence values as a preliminary screening of taxonomic diversity, for example, to screen for within-species divergence. Future work can follow up by examining unusual cases, especially species that show deep divergence. This approach may be useful in understanding the genetic structure of Curaeus curaeus, Phrygilus gayi and Sephanoides sephanoides, as these two species showed the largest intraspecific distances among the species analyzed (above 1.3%). A possible interpretation of this result is that these distances reflect a species complex. In fact, the maximum likelihood tree indicated that P. gayi individuals formed at least 3 clusters rather than a cohesive unit, suggestive of different lineages. Álvarez-Varas et al. (2015) have also noted this type of divergence pattern in this species, characterized by genetic clusters associated with different distribution ranges among northern Chilean populations (lowlands vs. Altiplano or highlands). Deep intraspecific divergence in birds may be attributable to the dispersal capacity of the species and/or the presence of barriers to gene flow and environmental heterogeneity. In the case of P. gayi, the available evidence indicates that its genetic structure appears to be attributable to ecological factors and a limited dispersal capacity of this species than to geographical factors per se, as compared to other species of Phrygilus (Álvarez-Varas et al. 2015). For S. sephanoides, our finding contrasts with data from a phylogenetic study of the species based on a different set of molecular markers (Cyt b and ND2) (Roy et al. 1998), which reported non-significant genetic differentiation, estimated as a homogeneity of haplotypes, among north-central and Juan Fernández Islands popu-lations. The misidentification of specimens of this species could be an explanation of for this strong genetic divergence. However, given that samples analyzed showed the typical diagnostic characteristics related to specimen size, feather coloration pattern and other features, this is unlikely. For example, both sequenced specimens show similar feather coloration (upper head and back of metallic green and whitish belly with iridescent green at flank) and body length (ca. 10 cm), which was concordant with the diagnostic criteria reported in ornithology guides of Chile (Jaramillo 2014;Couve et al. 2016). This set of characteristics yields a well-differentiated phenotype as compared to other sympatric species of hummingbirds, such as Patagona gigas (Jaramillo 2014). Therefore, it would be difficult to confuse this species with S. sephanoides. In sum, there is little doubt as to the conspecificity of the samples. Further analyses with new samples will help to confirm the marked level of divergence observed in S. sephanoides. However, given that the individuals analyzed were sampled from distant locations of the country (central and southern Chile, separated by a distance of ca. 800 km), it seems likely that the high level of divergence found in S. sephanoides may be related to isolation-by-distance.
The finding that the greatest intrageneric divergence was found in Charadrius (16.1%) and Phrygilus (14.3%) is noteworthy. In the case of the Phrygilus genus this divergence pattern has been interpreted in the context of a marked phylogeographic structure, which is associated with broad altitudinal and latitudinal distributions of species across the Andean mountains (Campagna et al. 2011;Álvarez-Varas et al. 2015). For instance, some species of this group, such as P. alaudinus, P. atriceps and P. unicolor, show a genetic differentiation mediated by allopatric mechanisms in response to specific geographic barriers. In contrast, some genera studied in this work showed low levels of genetic divergence, such as Larus (1.3%) and Spheniscus (2.2%). Although there is no genetic data for the Larus genus in Chile, our results are consistent with data reported for other Laridae species in the northern hemisphere, which show scarce genetic divergence. In fact, the lack of genetic differentiation within this group, reflected in a high sequence similarity (99.8%), gives rise to overlapping barcode clusters with one or more related species (Johnsen et al. 2010). In the case of Spheniscus, the evidence obtained in Chile indicate that this genus exhibit lower genetic divergence among species than other penguin genus (e.g., Pygoscelis, ca. three fold less variation), based on the analysis of complete mtDNA genomes of a small sample size (Ramos et al. 2018). Future studies in Spheniscus with more comprehensive sample sizes will be required to better support the interspecific genetic variation registered in this study based on COI sequence.
In conclusion, this study indicates that DNA barcoding with COI markers is highly accurate for identifying Chilean bird species, as the barcode sequence for nearly every species studied was markedly distinct from that of any other species. Our analysis identified significant interspecific divergence, roughly two orders of magnitude higher than the intraspecific values observed, reflecting a clear barcode gap. In addition, most of the species analyzed showed low intraspecific divergence. This pattern is consistent with data for birds from other temperate regions of southern South America but contrasts with studies on birds from tropical regions of South America, which often show deep intraspecific divergence. Thus, these data reflect the existence of different evolutionary patterns associated with specific regions within the continent. We hope that this step-by-step effort focused on obtaining and assessing the DNA barcodes of the Chilean avifauna, will be useful for increasing knowledge of national biodiversity. This approach may also facilitate the establishment of a dataset of avian barcodes from Chile, enhancing the scope of local studies.