﻿Biogeographic factors contributing to the diversification of Euphoniinae (Aves, Passeriformes, Fringillidae): a phylogenetic and ancestral areas analysis

﻿Abstract Factors such as the Andean uplift, Isthmus of Panama, and climate changes have influenced bird diversity in the Neotropical region. Studying bird species that are widespread in Neotropical highlands and lowlands can help us understand the impact of these factors on taxa diversification. Our main objectives were to determine the biogeographic factors that contributed to the diversification of Euphoniinae and re-evaluate their phylogenetic relationships. The nextRAD and mitochondrial data were utilized to construct phylogenies. The ancestral distribution range was then estimated using a time-calibrated phylogeny, current species ranges, and neotropical regionalization. The phylogenies revealed two main Euphoniinae clades, Chlorophonia and Euphonia, similar to previous findings. Furthermore, each genus has distinctive subclades corresponding to morphology and geography. The biogeographic results suggest that the Andean uplift and the establishment of the western Amazon drove the vicariance of Chlorophonia and Euphonia during the Miocene. The Chlorophonia lineage originated in the Andes mountains and spread to Central America and the Mesoamerican highlands after the formation of the Isthmus of Panama. Meanwhile, the ancestral area of Euphonia was the Amazonas, from which it spread to trans-Andean areas during the Pliocene and Pleistocene due to the separation of the west lowlands from Amazonas due to the Northern Andean uplift. Chlorophonia and Euphonia species migrated to the Atlantic Forest during the Pleistocene through corridors from the East Andean Humid Forest and Amazonas. These two genera had Caribbean invasions with distinct geographic origins and ages. Finally, we suggested taxonomic changes in the genus Euphonia based on the study’s phylogenetic, morphological, and biogeographic findings.


Supplemental material.
 Table S1.List of specimens studied and their provenance.
 Table S3.Results after filtering and clustering nextRAD data from Euphoniinae samples and outgroup samples.
 Table S4.Sample coverage for nextRAD data from Euphoniinae samples and outgroup samples.
 Table S5.Substitution models for nucleotide data partitions selected using the BIC in PartitionFinder.
 Text S1.Laboratory process and sequence preparation for nextRAD sequencing.Laboratory process and sequence preparation for nextRAD sequencing and quality filtering and Denovo alignment results.
 Text S1.Laboratory process and sequence preparation for nextRAD sequencing.
We extracted total genomic DNA from the tissue samples using the DNeasy tissues kit (Qiagen, Valencia, CA, USA) or the phenol: chloroform protocol (Hillis et al. 1996).The quality of DNA extractions was verified using gel electrophoresis.The DNA concentration was determined with a Qubit 3 fluorometer (ThermoFisher).The RAD sequence data was obtained using the nextRAD protocol by the company SNPsaurus (http://snpsaurus.com/) .We sent at least 50 ng of clean DNA at a concentration of ~5.0-10.0ng/ul (maximum of 25 ng/ul).Genomic DNA was converted to nextRAD genotypes using sequencing libraries (SNPsaurus, LLC) as in Russello et al. (2015).Genomic DNA was first fragmented with Nextera Flex reagent (Illumina, Inc), which also ligates short adapter sequences to the ends of the fragments.The Nextera reaction was scaled for fragmenting 20 ng of genomic DNA.Fragmented DNA was then amplified for 27 cycles at 74 degrees, with one of the primers matching the adapter and extending 10 nucleotides into the genomic DNA with the selective sequence GTGTAGAGCC.Thus, only fragments starting with a sequence that could be hybridized by the selective sequence of the primer were efficiently amplified.The nextRAD libraries were sequenced on a single lane of an Illumina HiSeq 4000 with a single-end 150 bp protocol (University of Oregon).
 Text S2.Quality filtering and Denovo alignment results.
We found a strong relationship between the missing data and phylogenetic distances, with the highest values of Pearson correlations.The heatmap in figure 1 shows the threshold value test for these six clusters in which Pearson's correlation coefficient is less than 0.87.We found a strong relationship between the missing data and phylogenetic distances for 0.88 to 0.90 clust threshold values and the lowest correlation with 0.87 (Figure 2).This suggests that at 0.88, the most divergent alleles oversplit from their loci, which increases missing data.The graphs of the mean bootstrap values did not indicate significant differences between the clust threshold value.However, for phylogenies with only 10% of missing loci, the mean bootstrap values show more variation.The first 3 PCs' cumulative variance decreased from 0.85 to 0.90 clust value, with a breakpoint at 0.87 and a rapid decline beginning at 0.88 (3).This pattern suggests that as we increase the clustering value, the alignment variance decreases, probably due to the presence of more homozygous loci.For the fourth metric, we found a faster increase in total loci and total SNPs from 0.88 to 0.9 clust value (4 and 5).While heterozygous mean and percent of heterozygous sites decreased when increasing from 0.88 to 0.9, the highest rate of heterozygous loci was at a clust threshold value of 0.85 (6 and 7).The goal of optimizing a de novo alignment for Rad seq data is to minimize the Laboratory process and sequence preparation for nextRAD sequencing and quality filtering and Denovo alignment results.
presence of paralogous loci.This requires a balance between the total loci and heterozygosity.Low clustering values tend to include more paralogs in the alignment, which increases heterozygosity and decreases the number of loci.Meanwhile, higher clustering values drop the most divergent alleles as loci, which increases the missing data and decreases heterozygosity.Under this reasoning and based on the metric values, we set 0.87 as the optimal clustering for this alignment.The final alignment recovered 2,570 loci, 369,000 pb, and 37.43 % missing sites.The total SNPs recovered are 57, 575 and 37.26% missing sites.Laboratory process and sequence preparation for nextRAD sequencing and quality filtering and Denovo alignment results.

2.
Pearson correlation between genetic distance and missingness.
Laboratory process and sequence preparation for nextRAD sequencing and quality filtering and Denovo alignment results.
Laboratory process and sequence preparation for nextRAD sequencing and quality filtering and Denovo alignment results.

Table S1 .
List of specimens studied and their provenance.

Table S3 .
Results after filtering and clustering nextRAD data from Euphoniinae samples and outgroup samples.

Table S4 .
Sample coverage for nextRAD data from Euphoniinae samples and outgroup samples.

Table S5 .
Substitution models for nucleotide data partitions selected using the BIC in PartitionFinder.