DNA barcoding of Dutch birds

Abstract The mitochondrial cytochrome c oxidase subunit I (COI) can serve as a fast and accurate marker for the identification of animal species, and has been applied in a number of studies on birds. We here sequenced the COI gene for 387 individuals of 147 species of birds from the Netherlands, with 83 species being represented by > 2 sequences. The Netherlands occupies a small geographic area and 95% of all samples were collected within a 50 km radius from one another. The intraspecific divergences averaged 0.29% among this assemblage, but most values were lower; the interspecific divergences averaged 9.54%. In all, 95% of species were represented by a unique barcode, with 6 species of gulls and skua (Larus and Stercorarius) having at least one shared barcode. This is best explained by these species representing recent radiations with ongoing hybridization. In contrast, one species, the Lesser Whitethroat Sylvia curruca showed deep divergences, averaging 5.76% and up to 8.68% between individuals. These possibly represent two distinct taxa, S. curruca and S. blythi, both clearly separated in a haplotype network analysis. Our study adds to a growing body of DNA barcodes that have become available for birds, and shows that a DNA barcoding approach enables to identify known Dutch bird species with a very high resolution. In addition some species were flagged up for further detailed taxonomic investigation, illustrating that even in ornithologically well-known areas such as the Netherlands, more is to be learned about the birds that are present.

introduction DNA barcoding is used as an effective tool for both the identification of known species and the discovery of new ones (Hebert et al. 2003, Savolainen et al. 2005. The core idea of DNA barcoding is based on the fact that just a small portion of a single gene, comprising a 650 to 700 bp fragment from the first half of the mitochondrial cytochrome c oxidase subunit I gene (COI), shows a lower intraspecific than interspecific variation. An attribute which characterizes a threshold of variation for each taxonomic group, above which a group of individuals does not belong to the same species but instead forms an intraspecific taxon. In other words, the recognition of patterns in sequence diversity of a small fragment from the mtDNA genome has led to an alternative approach for species identification across phyla.
Initially, DNA barcodes were proposed for the Animal Kingdom in 2003, when Hebert and colleagues tested a single gene barcode to identify species and coined the term 'DNA barcoding' (Hebert et al. 2003). Since that time COI sequences have been used as identifiers in the majority of animal phyla including vertebrates (e.g. Hebert et al. 2004, Ward et al. 2005, Kerr et al. 2007, Smith et al. 2008, Nijman and Aliabadian 2010, Luo et al. 2011) and invertebrates (Hajibabaei et al. 2006, Bucklin et al. 2011, Hausmann et al. 2011. In recent years, the practical utility of DNA barcodes proved to be an appealing tool to help resolve taxonomic ambiguity (Hebert et al. 2004(Hebert et al. , 2010, to screen biodiversity (e.g. Plaisance et al. 2009, Naro-Maciel et al. 2009, Grant et al. 2011, and to support applications in conservation biology (Neigel et al. 2007, Rubinoff 2006, Dalton and Kotze 2011. Birds are among the best-known classes of animals and thus provide a taxonomically good model for analyzing the applicability of DNA barcoding. In the last seven years some 30 scientific papers have been published on the DNA barcoding of bird species, which combined have been cited 500 times (V. Nijman, unpubl. data April 2013). Most of the studies have shown that from this small fragment of DNA, individuals have been identified down to species level for 94% of the species in Scandinavian birds (Johnsen et al. 2010), 96% in Nearctic birds (Kerr et al. 2009a), 98% in Holarctic birds (Aliabadian et al. 2009) and 99% in Argentinean and South Korean birds (Kerr et al. 2009a, Yoo et al. 2006. Species delineation relying on the use of theshold set to differentiate between intraspecific variation and interspecific divergence has been criticized as leading to too unacceptable high error rates especially in incompletely samples groups (Meyer and Paulay 2005). However, even the critics of DNA barcoding concede that DNA barcoding holds promise for identification in taxonomically well-understood and thoroughly sampled clades. Birds are taxonomically well-known, especially those of the Western Palearctic to which the Netherlands, our study area, forms part. As noted by Taylor and Harris (2012), compared to other taxa that have been subjected to DNA barcoding, DNA barcoding studies of birds tend to represent aggregations of very large number of bird species barcodes. These often include (near) cosmopolitan species with samples from distant geographic locations potentially increasing the amount of interspecific variation in COI.
Here we explore the efficiency of identifying species using DNA barcoding from a large set of sympatric bird species in the Netherlands. Compared to previous studies on birds, our study area covers a very small geographic area, allowing to directly test the functionality of DNA barcoding 'in one's backyard'.

Sampling
The Netherlands is a small, densely populated country in northwestern Europe, with a land surface area of some 34,000 km 2 , and ornithologically it is arguable one of the best-covered countries (Sovon 2002). The tissue samples used for sequencing were collected from breeding areas in the Netherlands, excluding oversees dependencies. Given the small size of the country some 95% of the samples were collected within a 50 km-radius of each other. Samples were part of the tissue collection of the Zoological Museum of Amsterdam (ZMA), which were recently relocated and deposited in the Naturalis Biodiversity Center, Leiden. Most were collected in the period 2000-2012 by a network of volunteers, ringers, airport staff, and bird asylums; no birds were specifically collected or killed to be included in the collection of the ZMA. Species and subspecies identification was based on morphology and when necessary, external measurements. These identifications were done by authors HvB and CSR, with the help of Tineke G. Prins. Individual birds were frozen upon arrival to be thawed and skinned at a later date, and indeed many birds arrived frozen. Samples were mostly taken from the bird's pectorial muscles, because of its size and easy access, and stored in 96% ethanol. Species nomenclature follows the taxonomy of Dickinson (2003). The complete list of sampled specimens including information about vouchers and trace files is available from the project 'Aves of the Netherlands' at the BOLD website (http://www.barcodinglife.com/).

PCR and sequencing
The tissue samples were subsampled and subjected to DNA extraction using DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's protocol. PCR and sequencing reactions were performed, mainly following the same protocols described in Förschler et al. (2010), but with some minor modifications. Polymerase chain reaction (PCR) amplifications were initially performed using standard primers BirdF1 (TTCTCCAACCACAAA-GACATTGGCAC) and BirdR1 (ACGTGGGAGATAATTCCAAATCCTG). When amplification was unsuccessful, alternate reverse primer BirdR2 (ACTACATGTGAGA-TGATTCCGAATCCAG) was used in combination with BirdF1 or alternate primer pair CO1-ExtF (ACGCTTTAACACTCAGCCATCTTACC) and CO1-ExtR (AACCAG-CATATGAGGGTTCGATTCCT) was used (Hebert et al. 2004, Johnsen et al. 2010).
All PCRs were run under the following thermal cycle program: 3 min at 94 °C followed by 40 cycles of 15 s at 94 °C, 30 s at 50 °C and 40 s at 72 °C, and a final elongation of 5 min at 72 °C. For each reaction the PCR mixture consisted of 2.5 µl Qiagen Coral Load 10 × PCR buffer, 1.0 µl of each 10mM primer, 0.5 µl 2.5 mM dNTPs, 0.25 µl 5U/ µl QiagenTaq DNA polymerase, 18.75 µl milliQ and 1.0 µl template DNA for a total volume of 25 µl. Bi-directional sequencing was performed for all specimens at Macrogen. We checked the possible amplification of pseudogenes (Numts) by translating the protein coding genes into amino acids sequences, but we did not observe any unexpected stop codons, frameshifts or unusual amino acidic substitutions. Furthermore we amplified a longer sequence of the COI gene with primers (CO1-ExtF and CO1-ExtR) for selected samples, and also here we did not see any indication of pseudogene co-amplification. Lijtmaer et al. (2012) found that, in birds, full-length COI pseudogenes are uncommon noting that they might be more frequently encountered when working with avian blood samples as opposed to muscle tissue samples (as used in here).

Data analysis
Sequences shorter than 500 bp and containing more than 10 ambiguous nucleotides were excluded from the analyses. All sequences have been deposited in GenBank (Accession numbers KF946551 to KF946937). A full list of the museum vouchers, for all specimens applied in this study, is provided in Appendix - Table 1. For all sequence comparisons, the Kimura 2-parameter (K2P) model was used, because it is shown to be the best metric to compare closely related taxa (Nei and Kumar 2000, but for a contrasting view see Srivathsan and Meier 2012). Average intraspecific distances were calculated for those species that were represented by at least two specimens using Mega v5.1 software (Tamura et al. 2011).
For a group of birds that expressed a larger than expected intraspecific variation, the Sylvia warblers, we created a phylogenetic tree and created a haplotype network. We chose GTR+G+I as the best-fitting model of nucleotide substitution based on its Akaike's information criterion as implemented in JModelTest v0.1.1 (Posada 2008). A maximum likelihood (ML) tree was constructed in PAUP* v4.0b10 (Swofford 2002) using a heuristic search with the tree-bisection-reconnection branch-swapping algorithm and random addition of taxa. Relative branch support was evaluated with 500 bootstrap replicates (Felsenstein 1985). A minimum spanning haplotype network was constructed using a statistical parsimony network construction approach as implemented in TCS software package (Clement et al. 2000). This programme calculates the number of mutational steps by which pairwise haplotypes differ and computes the probability of parsimony (Templeton et al. 1992) for pairwise differences until the probability exceeds 0.95. The number of mutational differences associated with the probability just before the 0.95 cut-off point is then the maximum number of mutational connections between pairs of sequences justified by the parsimony criterion; these justified connections are applied in the haplotype network (Clement et al. 2000).
In general, 95% of species (134 species) showed a unique DNA barcode (these included the 58 species for which we only sequenced single individuals), while six congeneric species shared the same barcode and the mean intraspecific distance of them fell well below the threshold of species based on distance-based criterion (Hebert et al. (2003) 10 x rule). These congeneric species mostly included circumpolar species with close morphological similarities (Table 2).  Although most species possessed low intraspecific distances, one species showed high intraspecific K2P-distances clearly above the threshold of 2 to 3 per cent sequence divergence in our data set. This is the Lesser Whitethroat Sylvia curruca, with a mean interspecific divergence of 5.76% and a maximum interspecific distance of 8.68%. Two subspecies occur in the Netherlands, i.e. the Western Lesser Whitethroat S. c. curruca and, as a migrant, the Northeastern Lesser Whitethroat S. c. blythi. Both are morphologically somewhat distinct, with compared to the nominate S. c. blythi having a paler top of the head, separated from face by a white supercilium, and geographically the nominate occupies the western part of the species range and S. c. blythi the eastern part. A maximum likelihood tree for these two taxa based on Kimura 2-parameter is presented in Figure 2. Two different haplotype networks, one each for S. c. curruca and S. c. blythi were recovered by TCS (Figure 3), and given the large genetic distances between their haplotypes, the two taxa are not included in the same haplotype network.

Discussion
We here present the results of a modest effort to barcode the avifauna of the Netherlands. In terms of DNA barcoding of birds, the Netherlands form the southernmost part of one of the most densely sampled regions globally (Lijtmaer et al. 2012: figure  1). In addition, many of the species that overwinter in the country originate equally well-sampled regions to the north. As such our study adds to a growing number of studies allowing us to build up comprehensive public libraries of bird barcodes. Combined these allow us to explore new lines of scientific inquiry and practical applications (Hebert et al. 2010, Lijtmaer et al. 2012, but see Ebach and Carvalho 2010). The collection of our samples was done as part of the museum's standard collection man- agement of newly obtained material, and as such sample collection was inexpensive and required little effort in terms of manpower. All birds were collected and processed in the Netherlands and did not require specific permits other than the ones already required to curate the collections.
Recently, Taylor and Harris (2012) expressed the opinion that proponents of DNA barcoding consistently fail to recognize its limitations (including, but not restricted to, the functioning of COI as a universal barcoding gene, whether its use is to be restricted to species identification only or whether it has a role in species discovery and delimitation and the failure to have sufficient systems in place to deal with the large amounts of data generated), do not evolve their methodologies, and do not embrace the possibilities that next-generation sequencing offers. We agree that DNA barcoding will not offer a panacea for all the issues Taylor and Harris (2012) raised, or indeed some of its earlier critics (Will et al. 2005, Moritz and Cicero 2004) but we point out that for this was probably never the intention of DNA barcoding when envisaged some ten years ago. Irrespective of the aims and goals of DNA barcoding as a 'global enterprise' (Ebach and Carvalho 2010), we found it a useful tool in our studies on birds (cf. Baker et al. 2009). The bird collection of the Zoological Museum Amsterdam, and our sample reported in this study, was well-curated by knowledgeable staff, with a very high degree of taxonomic certainty attached to each individual specimen. We see immense value to having a DNA barcoding dataset linked to this reference collection. As such this work has added to the growing library of DNA barcodes of bird species of the world and subsequent improvement in our knowledge of biodiversity.
The mean intraspecific divergences found in the birds of the Netherlands (0.29%, based on 147 species) is congruent with that of for instance Argentina (0.24%, 500 species), North America (0.23%, 643 species) and the Holarctic (0.24%, 566 species) (Kerr et al. 2009a, Aliabadian et al. 2009). More importantly, like other studies on birds, the efficiency of DNA barcode sequences to identify species is high, showing a clear barcoding gap (Figure 1), and overall it seems that for birds typically 95% or more of the species can be identified (Hebert et al. 2003, Johnsen et al. 2010, Kerr et al. 2009a, b, Yoo et al. 2006, Aliabadian et al. 2009).
Most DNA barcoding studies of birds flag a small number of deep divergences (e.g. Johnsen et al. 2010, Kerr et al. 2009b, Aliabadian et al. 2009, Nijman and Aliabadian 2013, in our study involving the two subspecies of Sylvia curruca, where the two lineages diverge almost 6%. Similar results were found by Olsson et al. (2013) when analyzing the cytochrome b gene for these two taxa, with distances in the order of 11-14%. Based on COI sequences, the two taxa appear to be sister taxa, albeit with a relatively low support ( Figure 3), but no other members of the Sylvia curruca were included in the analysis. In contrast, having included a range of other members of this complex, Olsson et al. (2013) found curruca and blythi not to be sister taxa. Olsson et al. (2013: 81) concluded that while "due to their morphological similarity it is unclear where their ranges meet, [o]ur data suggest that blythi is a valid taxon, not closely related to curruca. It has its closest relatives to the south-east [Asia], and may have colonised the eastern taiga from this direction, ultimately coming into contact with curruca". When it comes to drawing conclusion from their work with respect to taxomomy, Olsson et al. (2013) were, in our view correctly, cautious. They noted that the Sylvia curruca complex comprised up to 13 taxa with little consensus as to circumscription and taxonomic rank. Of these, morphologically some taxa are very similar, including S. c. curruca and S. c. blythi, and the apparent conflict between morphology and phylogeny (based in their case on cyt b and in our study on COI) can be explained in different ways. One would be to accept the single mitochondrial gene trees at face value in which case the morphological similarities in pelage coloration may be a result of parallel evolution possibly in response to adaptations to similar temperate forest habitats -both taxa are then best treated as different species. Alternatively, the mitochondrial gene trees do not reflect the species tree and, based on morphological similarities, S. c. curruca and S. c. blythi are best treated as sister taxa (either as one or two species). Their divergent position on the mitochondrial gene tree, and the large genetic distances between these taxa, are due to ancient mitochondrial introgression. In either case, working with single mitochondrial markers cannot not resolve this issue and a more integrative approach ideally involving the analysis of nuclear genes is paramount.
Those cases where we found species sharing the same DNA barcodes were small in number but not insignificant. Seven of the eight cases involved closely related gulls with partially overlapping ranges, or allopatric distributions, that are part of a recent Holarctic radiation (Liebers-Helbig et al. 2010). Alternatively, the the sharing of DNA barcodes may be due to hybridization or, perhaps less likely, misidentification. Likewise, skuas are part of a recent radiation with, just like gulls, frequent hybridization between species (Ritz 2009). DNA barcoding using a relative slowly evolving maternally inherited gene, with, compared to other mitochondrial genes, small amounts of rate heterogeneity (Pacheco et al. 2011), will, on its own, not be able to differentiate between these taxa.
We conclude that DNA barcoding approach makes it possible to identify known Dutch bird species with a very high resolution. Although some species were flagged for further detailed taxonomic investigation, our study reaffirms once more that a short segment of COI gene can be used to handle large number of taxa and aid in detecting overlooked taxa and hybridizing species with low deep barcode divergences. ments: combined their efforts greatly improved the quality and clarity of the work. Our molecular work is funded in part by the Fonds Economische Structuurversterking. We dedicate this paper to the memory of Jan Wattel, former curator of birds at the Zoological Museum Amsterdam, who passed away in March 2013. Hebert