(C) 2011 David E. Schindel. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The Division of Birds, National Museum of Natural History, Smithsonian Institution in Washington, DC, has obtained and released DNA barcodes for 2,808 frozen tissue samples. Of the 1,403 species represented by these samples, 1,147 species have not been barcoded previously. This data release increases the number of bird species with standard barcodes by 91%. These records meet the data standard of the Consortium for the Barcode of Life and they have the reserved keyword BARCODE in GenBank. The data are now available on GenBank and the Barcode of Life Data Systems.
DNA barcoding, GenBank, BOLD, genomics
The Division of Birds, National Museum of Natural History of the Smithsonian Institution (USNM), has released approximately 2800 DNA barcode data records into the public domain through GenBank and the Barcode of Life Data Systems (BOLD). These records were derived from the Division’s extensive collection of frozen tissues that are linked to voucher specimens in the Museum. The data adhere to the DNA barcode data standard (
This ‘Project Description’ has been submitted as part of a policy of rapid data release for genomic data known as the Fort Lauderdale Principles (
• Urge funding agencies to require the early and rapid release of large genomic datasets that represent research infrastructure with significant potential for use by the research community beyond the data producers;
• Encourage data producers to publish Project Descriptions such as this one to state their intended use of a newly released dataset within a stated, reasonable period of time;
• Propose that researchers should be expected to refrain from using the data for the purposes and during the interval stated in the Project Description, but should be free to use the data for other applications with proper citation of the Project Description or other references to the dataset.
A full description of the dataset is in preparation with the goal of publication as a 'data release paper' in ZooKeys before June 2012, in accordance with guidelines issued by ZooKeys (
The data release paper will also discuss the relationship between clusters based on barcode data variability and the taxonomic names attached to the voucher specimens from which the DNA barcodes were derived. The taxonomic identifications in the GenBank records have undergone screening relative to each other and there are some uncertainties associated with some species-level determinations. These will be investigated more carefully by re-examining voucher specimens and analyzing the barcode sequences relative to other public barcode records. All species determinations will be resolved by the time of publication of the full data release paper.
Data resources
Data are deposited in GenBank under accession numbers JQ173884-JQ176686 (http://www.ncbi.nlm.nih.gov/nuccore?term=JQ173884:JQ176686[accn]). The full dataset is also available on BOLD at http://www.barcodinglife.org as project name ‘USNMY’ under ‘Published Projects’.
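As an illustration of how the accession range above can be handled programmatically, the following minimal sketch enumerates the accession numbers between the two endpoints and rebuilds the GenBank query URL quoted in the text. The function names `expand_accessions` and `nuccore_range_url` are hypothetical helpers introduced here, and the sketch assumes that both endpoints share the same letter prefix and a fixed-width numeric suffix. It performs no network access.

```python
from urllib.parse import quote


def expand_accessions(first: str, last: str) -> list[str]:
    """List every accession from `first` to `last`, inclusive.

    Assumption: both accessions consist of the same letter prefix
    followed by a numeric suffix of the same width (e.g. JQ173884).
    """
    prefix = first.rstrip("0123456789")
    if last.rstrip("0123456789") != prefix or len(first) != len(last):
        raise ValueError("accessions must share a prefix and a width")
    width = len(first) - len(prefix)
    start = int(first[len(prefix):])
    stop = int(last[len(prefix):])
    if start > stop:
        raise ValueError("first accession must not exceed the last")
    # Zero-pad so the suffix keeps its original width.
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


def nuccore_range_url(first: str, last: str) -> str:
    """Build the nuccore range query in the form given in the text."""
    term = f"{first}:{last}[accn]"
    # Percent-encode the square brackets; keep the colon readable.
    return "http://www.ncbi.nlm.nih.gov/nuccore?term=" + quote(term, safe=":")


if __name__ == "__main__":
    accessions = expand_accessions("JQ173884", "JQ176686")
    print(accessions[0], accessions[-1], len(accessions))
    print(nuccore_range_url("JQ173884", "JQ176686"))
```

The list contains every number in the range; whether each number corresponds to a record in this dataset should be checked against GenBank itself.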
Contents of the dataset
The dataset represents samples from 27 countries (Argentina, Australia, Botswana, Brazil, Gabon, Greece, Guyana, Iceland, Johnston Atoll, Mariana Islands, Mexico, Mongolia, Myanmar, Pakistan, Panama, Papua New Guinea, Philippines, Puerto Rico, Russia, South Korea, St. Vincent, Swaziland, Sweden, United Kingdom, United States, Uruguay, and the former Soviet Union).
Each GenBank record in the dataset carries the BARCODE keyword that indicates compliance with CBOL’s barcode data standard. Accordingly, each record includes the following data elements required by the standard:
• The name of the approved BARCODE region (COI in this case).
• A species level identification. All names can be found in the Integrated Taxonomic Information System (
• A structured identifier of the voucher specimen using the Darwin Core triplet consisting of institutional acronym, collection code, and specimen ID number.
• Country of origin.
• Forward and reverse primer sequences.
• A DNA sequence based on forward and reverse sequencing reactions with at least 75% coverage of the standard barcode region as specified in
In addition, many records include the following data fields that are strongly recommended by the standard:
• Latitude and longitude of collecting locality
• Date of collection
• Name of collector
• Name of identifier
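The required data elements listed above lend themselves to a simple automated check. The sketch below is illustrative only and is not the validation procedure used by GenBank, BOLD, or CBOL. The field names, the colon separator in the Darwin Core triplet, and the way coverage is measured (unambiguous bases divided by a caller-supplied region length) are all assumptions made for this example.

```python
# Hypothetical field names for the required elements of a barcode record.
REQUIRED_FIELDS = (
    "marker", "species", "voucher", "country",
    "forward_primer", "reverse_primer", "sequence",
)


def parse_triplet(voucher: str, sep: str = ":") -> tuple[str, str, str]:
    """Split a voucher identifier into (institution, collection, specimen ID).

    Assumption: the three parts are joined by `sep`.
    """
    parts = [part.strip() for part in voucher.split(sep)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"voucher is not a three-part identifier: {voucher!r}")
    return parts[0], parts[1], parts[2]


def check_record(record: dict, region_length: int,
                 min_coverage: float = 0.75) -> list[str]:
    """Return a list of problems found in `record`; empty means none found.

    `region_length` is the length of the barcode region in base pairs and
    must be supplied by the caller; it is not hard-coded here.
    """
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]

    voucher = record.get("voucher")
    if voucher:
        try:
            parse_triplet(voucher)
        except ValueError as exc:
            problems.append(str(exc))

    sequence = record.get("sequence", "")
    if sequence:
        # Count only unambiguous bases as covering the region (assumption).
        called = sum(base in "ACGT" for base in sequence.upper())
        if called / region_length < min_coverage:
            problems.append("sequence covers less than the required fraction")
    return problems
```

A record that passes returns an empty list; the recommended fields (coordinates, collection date, collector, identifier) are deliberately not checked because the standard does not require them.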
Use of early release data
The authors invite the research community to examine and analyze the data in their current form with the following understandings:
• As with all data released on GenBank, the National Center for Biotechnology Information places no restriction on their use or distribution.
• The authors intend to publish a descriptive paper summarizing the dataset and its implications for bird barcoding and any taxonomic issues arising from the data. Publication of this data release paper is anticipated by 1 June 2012. In accordance with the Fort Lauderdale Principles (
• Use of this dataset for purposes other than those described above is welcome and encouraged, contingent on proper citation of this publication.
• The authors invite members of the community to examine the data and test their accuracy relative to other datasets. We welcome your comments, suggestions and corrections. BOLD 3.0 includes the capability to submit annotations to data submitters and we encourage readers to use this new system to submit observations on this dataset.
• The species determinations are not yet final. Some of the species identifications may change by the time of publication of the data release paper (anticipated by 1 June 2012).
All laboratory procedures were performed in the Laboratories for Analytical Biology, Museum Support Center, National Museum of Natural History, Smithsonian Institution, in Suitland, MD. The authors thank Amy Driskell for her supervision of the process. The frozen tissue collection of the NMNH Division of Birds is a globally important research resource that has been built up over the past decade by dedicated, visionary researchers. This and other barcoding projects are designed to add value to their important contributions for future generations of researchers. Among the many builders of the collection we acknowledge Storrs Olson, James Dean and Carla Dove. Dr. Dove deserves special thanks for her early and continuing work on bird barcoding and her encouragement of this project.