AnthWest, occurrence records for wool carder bees of the genus Anthidium (Hymenoptera, Megachilidae, Anthidiini) in the Western Hemisphere

Abstract This paper describes AnthWest, a large dataset that represents one of the outcomes of a comprehensive, broadly comparative study on the diversity, biology, biogeography, and evolution of Anthidium Fabricius in the Western Hemisphere. In this dataset a total of 22,648 adult occurrence records comprising 9657 unique events are documented for 92 species of Anthidium, including the invasive range of two introduced species from Eurasia, A. oblongatum (Illiger) and A. manicatum (Linnaeus). The geospatial coverage of the dataset extends from northern Canada and Alaska to southern Argentina, and from below sea level in Death Valley, California, USA, to 4700 m a.s.l. in Tucumán, Argentina. The majority of records in the dataset correspond to information recorded from individual specimens examined by the authors during this project and deposited in 60 biodiversity collections located in Africa, Europe, North and South America. A fraction (4.8%) of the occurrence records were taken from the literature, largely California records from a taxonomic treatment with some additional records for the two introduced species. The temporal scale of the dataset represents collection events recorded between 1886 and 2012. The dataset was developed using SQL Server 2008 R2. For each specimen, the following information is generally provided: scientific name including identification qualifier when species status is uncertain (e.g. “Questionable Determination” for 0.4% of the specimens), sex, temporal and geospatial details, coordinates, data collector, host plants, associated organisms, name of identifier, historic identification, historic identifier, taxonomic value (i.e., type specimen, voucher, etc.), and repository. 
For a subset of the database records, namely bees associated with threatened or endangered plants (~0.08% of total records) and specimens collected as part of unpublished biological inventories (~17%), georeferencing is presented only to the nearest degree, and the information on floral host, locality, elevation, month, and day has been withheld. This database can potentially be used in species distribution and niche modeling studies, as well as in assessments of pollinator status and pollination services. For native pollinators, this large dataset of occurrence records is the first to be developed simultaneously with a species-level systematic study.

Funding: National Science Foundation grants DEB-0742998 and DBI-0956388.
Study area description: The database covers a wide range of ecosystems in North and South America, from -62° to 79° latitude and from -174° to -22° longitude. A large portion of the North American records are from xeric regions (Great Basin, Colorado Plateau, Mojave, Sonoran, and Chihuahuan Deserts) and Mediterranean California, while most South American records are from xeric regions on the flanks of the Andes (Figs 1, 2). No records of Anthidium are known from the Caribbean islands. Much of the dataset comes from general bee collecting. Additional material from the western United States comes from multi-year, intensive, systematic bee faunal studies in protected landscapes.
While most species of Anthidium occupy few ecoregions (< 5), some species, such as A. tenuiflorae Cockerell, are widespread, occurring in as many as 41 ecoregions. Based on WWF (World Wide Fund for Nature) designations (Olson and Dinerstein 2002), many Anthidium have distributions that include critical, endangered, or vulnerable ecoregions as well as relatively stable or intact ones (Table 1). Known distributions of 16 species lie largely or entirely within critical or endangered ecoregions, with at least 90% of collection records from such areas. An additional 22 species have at least 90% of collection records from vulnerable ecoregions. Few native Anthidium species (8.8%) span both the Nearctic and Neotropic realms.
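The 90% criterion used above amounts to a per-species proportion test. The Python sketch below is illustrative only: it assumes each collection record has already been assigned the conservation status of its WWF ecoregion (for example through a spatial join in a GIS), and the status labels ("critical", "endangered", "vulnerable", "intact") are hypothetical placeholders rather than the exact WWF category strings.

```python
from collections import Counter
from typing import Iterable


def species_mostly_in(
    records: Iterable[tuple[str, str]],
    statuses: set[str],
    threshold: float = 0.9,
) -> list[str]:
    """Return species with at least `threshold` of their records in ecoregions
    whose conservation status is one of `statuses`.

    `records` holds (species, ecoregion_status) pairs, one per collection record.
    """
    totals = Counter()  # all records per species
    inside = Counter()  # records per species falling in the selected status classes
    for species, status in records:
        totals[species] += 1
        if status in statuses:
            inside[species] += 1
    return sorted(sp for sp, n in totals.items() if inside[sp] / n >= threshold)


if __name__ == "__main__":
    toy = [("sp1", "critical")] * 9 + [("sp1", "vulnerable")] + [("sp2", "intact")] * 3
    print(species_mostly_in(toy, {"critical", "endangered"}))  # prints ['sp1']
```

The text does not say whether proportions were computed over all records or over unique collection events; this sketch counts every record once.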
Design description: The purpose of this dataset is to make available data associated with bees of the genus Anthidium in the Western Hemisphere. The dataset was developed during the course of a species-level revision of the genus . Most records come from specimens deposited in the first author's host institution or acquired on loan from multiple bee depositories, primarily in North America but also in South America and Europe (Fig. 3). Permitting issues limited access to some South American institutions. All such specimens were identified by V.H. Gonzalez and/or T. Griswold. Additional California records from  were captured for all species whose taxonomic concept was not modified in . After identification, individual specimens were processed by a team of assistants at the USDA-ARS Bee Biology & Systematics Laboratory (BBSL) and entered into the US National Pollinating Insects Database (USNPID) using data entry forms in which each specimen received a unique identifier (see below). These forms used authority files for bees, locations, collectors, and plants. Locations not already georeferenced in the database were georeferenced using Google Earth™ (http://earth.google.com/) or GEOlocate (http://www.museum.tulane.edu/geolocate/), with coordinates recorded as decimal latitude and longitude in the WGS84 datum. Where a specimen label carried georeferencing as UTM coordinates; township, range, and section; or degrees-minutes-seconds, it was converted to this format, and the original label georeferencing was retained in the location authority files. Records were analyzed geospatially using ArcGIS and WWF Biotic Regions.
Figure 1. Collecting intensity of Anthidium by ecoregion in the Western Hemisphere. Number of collection events (unique combinations of date, latitude, and longitude) per WWF ecoregion (Olson et al. 2001).
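Two operations in this description are simple enough to sketch: converting degrees-minutes-seconds label coordinates to signed decimal degrees, and counting collection events as unique date and coordinate combinations per ecoregion, as defined in the Figure 1 caption. The Python below is an illustration under stated assumptions: the record keys (eventDate, decimalLatitude, decimalLongitude, ecoregion) are hypothetical, ecoregion assignment is assumed to have been done beforehand, and the conversion performs no datum shift (for example from NAD27 to WGS84) and does not handle UTM or township-range-section input, which require dedicated GIS tools.

```python
from collections import defaultdict
from typing import Any, Iterable


def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert a degrees-minutes-seconds coordinate to signed decimal degrees.

    South latitudes and west longitudes become negative. No datum
    transformation is applied.
    """
    hemi = hemisphere.upper()
    if hemi not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemi in {"S", "W"} else value


def events_per_ecoregion(records: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Count collection events, i.e. unique (date, latitude, longitude)
    combinations, within each ecoregion."""
    events: dict[str, set] = defaultdict(set)
    for r in records:
        key = (r["eventDate"], r["decimalLatitude"], r["decimalLongitude"])
        events[r["ecoregion"]].add(key)
    return {eco: len(keys) for eco, keys in events.items()}


if __name__ == "__main__":
    print(dms_to_decimal(36, 30, 0, "N"))  # 36.5
    print(dms_to_decimal(116, 52, 12, "W"))
```

Because coordinates are compared exactly, two records from the same site georeferenced at different precision would count as separate events; the text does not say how such cases were handled.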
Twenty-two records (<0.1%) were excluded from the biotic regions analysis because of questionable identification and/or label data. Databasing processes for the USNPID have evolved over the 25 years since its initiation. Processing, originally considered too costly, has since been incorporated into the databasing process. Verbatim label data capture, originally performed only for holotypes, was expanded first to loaned specimens and now to all retroactive data capture. When the validity of an entry field is questioned, the verbatim information is checked before the specimen is pulled from the collection, saving time and avoiding potential handling hazards. The addition of tracking data (e.g., date of record entry, date of record modification, and the person entering the record) and the use of authority tables were essential to data quality, yet added negligible data capture costs. The data underpinning the analysis reported in this paper are deposited at GBIF, the Global Biodiversity Information Facility, http://ipt.pensoft.net/ipt/resource.do?r=anthidium.
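The tracking fields and authority tables mentioned above can be illustrated with a small relational schema. AnthWest itself was built in SQL Server 2008 R2; the sketch below substitutes SQLite through Python's standard sqlite3 module so that it runs anywhere, and every table and column name in it (collector, specimen, entered_by, entered_at, modified_at) is hypothetical rather than taken from the USNPID.

```python
import sqlite3

SCHEMA = """
-- Authority table: each collector name is stored once and referenced by id.
CREATE TABLE collector (
    collector_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);

-- One row per specimen, keyed by its unique catalog number.
CREATE TABLE specimen (
    catalog_number  TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    collector_id    INTEGER NOT NULL REFERENCES collector(collector_id),
    entered_by      TEXT NOT NULL,                            -- who entered the record
    entered_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when it was entered
    modified_at     TEXT                                      -- last modification
);

-- Stamp the modification time whenever a specimen row is updated.
CREATE TRIGGER specimen_touch AFTER UPDATE ON specimen
BEGIN
    UPDATE specimen SET modified_at = CURRENT_TIMESTAMP
    WHERE catalog_number = NEW.catalog_number;
END;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_specimen(conn: sqlite3.Connection, catalog_number: str, scientific_name: str,
                 collector: str, entered_by: str) -> None:
    """Insert a specimen, reusing the collector's authority entry if it already exists."""
    conn.execute("INSERT OR IGNORE INTO collector(name) VALUES (?)", (collector,))
    (collector_id,) = conn.execute(
        "SELECT collector_id FROM collector WHERE name = ?", (collector,)
    ).fetchone()
    conn.execute(
        "INSERT INTO specimen(catalog_number, scientific_name, collector_id, entered_by) "
        "VALUES (?, ?, ?, ?)",
        (catalog_number, scientific_name, collector_id, entered_by),
    )
```

Because specimens refer to collectors by id, each name is entered and checked once in the authority table rather than retyped on every record, which is one way authority tables can improve data quality at little extra entry cost.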

Taxonomic coverage
General taxonomic coverage description: The coverage of this dataset includes all 92 species of the bee genus Anthidium known to occur in the Western Hemisphere, including two introduced species. Anthidium belongs to the tribe Anthidiini and is among the most diverse genera of the family Megachilidae. Based on the materials used in nest construction, anthidiines are broadly classed into two groups, carder bees and resin bees. While resin bees are generically diverse in the Western Hemisphere, Anthidium is the sole representative of carder bees in the Americas. As such, this dataset documents an entire functional group of bees in the Americas. The greatest numbers of records are for two widespread western North American species, A. utahense Swenk (2409 records) and A. mormonum Cresson (1615 records) (Fig. 4). Some other species are rare in collections, but it is unknown whether they are also rare in nature; at least A. multispinosum likely has a restricted distribution. No Anthidium species in the Western Hemisphere has been formally listed as threatened or endangered.
Anthidium are occasionally associated with rare, threatened, or endangered plants. Only a handful of bee records associated with state- and/or federally listed plants are included in the dataset (Table 2). For these records, georeferences are published only to the nearest degree, and the floral host, month, and day fields are withheld.
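The withholding rule described here and in the Abstract (coordinates reduced to the nearest degree; floral host, locality, elevation, month, and day withheld) can be expressed as a small transformation applied before publication. The Python sketch below is illustrative only: the field names (decimalLatitude, decimalLongitude, floralHost, and so on) and the boolean "sensitive" flag are assumptions loosely modeled on Darwin Core, not the actual USNPID schema.

```python
from typing import Any

# Hypothetical field names; the actual USNPID/AnthWest columns may differ.
WITHHELD_FIELDS = ("floralHost", "locality", "elevation", "month", "day")


def generalize_sensitive(record: dict[str, Any]) -> dict[str, Any]:
    """Return a public copy of a sensitive record: coordinates are rounded to
    the nearest whole degree and the withheld fields are blanked (set to None)."""
    public = dict(record)  # shallow copy; the source record is not modified
    # round() without ndigits returns an int; exact .5 values round to the even degree
    public["decimalLatitude"] = round(record["decimalLatitude"])
    public["decimalLongitude"] = round(record["decimalLongitude"])
    for field in WITHHELD_FIELDS:
        if field in public:
            public[field] = None
    return public


def publish(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Generalize only the records carrying the hypothetical 'sensitive' flag."""
    return [generalize_sensitive(r) if r.get("sensitive") else dict(r) for r in records]


if __name__ == "__main__":
    example = {"decimalLatitude": 36.46, "decimalLongitude": -116.87,
               "locality": "example locality", "month": 5, "day": 3, "sensitive": True}
    print(publish([example])[0])
```

Rounding keeps each published coordinate within half a degree of the original; whether the database rounds or truncates is not stated in the text.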
All specimens in this dataset have been reviewed by the authors or belong to easily determined taxa reviewed by experts in bee taxonomy (e.g., John Ascher for some AMNH material; A. A. Grigarick and L. A. Stange for California records in ). Records with questionable determinations, label information, or data have been withheld.

Temporal coverage
Records in AnthWest span more than a century, from May 1886 to February 2012, with the majority from the past four decades (Fig. 2). In temperate North America, here restricted to Canada and the United States, Anthidium is most active during late spring and summer, and most records are from May through August. In alpine regions (> 3000 m) the season narrows to May through September, with most records from June through August and a peak in July.

Datasets
Dataset description: AnthWest is a result of a broadly comparative study on the diversity, biology, biogeography, and evolution of bees in the genus Anthidium in the Western Hemisphere. The dataset includes 22,648 occurrence records for 92 species of Anthidium, including two species introduced from Eurasia. Each record consists of the species name, locality, collector's name, collection date, latitude, longitude, host plants, associated organisms, name of identifier, taxonomic value (i.e., type specimen, voucher, etc.), and repository. When coordinates for collection sites were not provided on the label, they were determined using Google Earth™ (http://earth.google.com/) or GEOlocate (http://www.museum.tulane.edu/geolocate/). To ensure data quality, most records in the dataset correspond to individual specimens examined by the authors during this project, representing 60 biodiversity collections in Europe, Africa, North and South America (Fig. 3). A small fraction (4.8%) of the occurrence records were extracted from the literature, and only literature records with a high degree of certainty in the identification were included. The vast majority of these published records were taken from the rigorous study of California Anthidiini by . Their records were included for all Anthidium species except A. atripes and A. emarginatum, which in  are recognized as species complexes. The balance, 30 records of the introduced A. manicatum and A. oblongatum, were included because these are distinctive species that cannot be confused with any native species or with each other. As with most other bees, floral resources are essential for the reproductive success of Anthidium. The quarter (24%) of AnthWest records that include floral visits indicate a broad array of floral visitation. 
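The inclusion rule for literature records can be stated as a small predicate. In the Python sketch below, the binary source flag (California treatment versus other publications) and the use of full binomials as keys are simplifying assumptions for illustration; the species sets themselves come directly from the text.

```python
# Treated as species complexes in the California study; those published records were not used.
CALIFORNIA_EXCLUDED = {"Anthidium atripes", "Anthidium emarginatum"}
# Introduced species distinctive enough that other published records were accepted.
INTRODUCED = {"Anthidium manicatum", "Anthidium oblongatum"}


def include_literature_record(scientific_name: str, from_california_treatment: bool) -> bool:
    """Decide whether a published occurrence record enters the dataset."""
    if from_california_treatment:
        return scientific_name not in CALIFORNIA_EXCLUDED
    # Outside the California treatment, only the two introduced species qualify.
    return scientific_name in INTRODUCED


if __name__ == "__main__":
    print(include_literature_record("Anthidium utahense", True))
    print(include_literature_record("Anthidium utahense", False))
```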
Visitation spans 56 plant families and over 100 plant species, but Fabaceae and Boraginaceae dominate the dataset, together accounting for 75% of the floral records (Fig. 5).
Analysis of plant records at the generic level likewise shows the dominance of Fabaceae and Boraginaceae: all of the ten most frequent floral associations belong to these two families, and Phacelia, the most visited genus, belongs to Boraginaceae rather than Fabaceae (Fig. 6).
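Figures such as the combined family share and the list of most visited genera are frequency tabulations over the records that carry a floral host. A Python sketch, assuming each floral record has been reduced to a (family, genus) pair; the toy input is invented for illustration and is not AnthWest data.

```python
from collections import Counter
from typing import Iterable


def floral_summary(hosts: Iterable[tuple[str, str]], top_n: int = 10):
    """Return (share of floral records per plant family, top_n most visited genera)."""
    hosts = list(hosts)
    families = Counter(family for family, _ in hosts)
    genera = Counter(genus for _, genus in hosts)
    total = len(hosts)
    shares = {family: n / total for family, n in families.most_common()}
    top_genera = [genus for genus, _ in genera.most_common(top_n)]
    return shares, top_genera


if __name__ == "__main__":
    toy = ([("Boraginaceae", "Phacelia")] * 4 + [("Fabaceae", "Astragalus")] * 2
           + [("Fabaceae", "Lupinus"), ("Asteraceae", "Helianthus")])
    shares, top = floral_summary(toy, top_n=2)
    print(shares["Boraginaceae"] + shares["Fabaceae"], top)
```

Note that the denominator is the number of records with a floral host, not the full dataset, since only about a quarter of records include floral visits.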
Records for 34 name-bearing types of Anthidium are also included in the database.
Study extent: Because this dataset was developed as part of taxonomic revisionary research, sampling was not the focus of efforts; rather, the data represent the aggregate of what we know about the distribution and behavior of Anthidium from existing material. Carder bees are diurnal and are active only when temperatures are well above freezing and only during the growing season, when floral resources are potentially available.
Sampling description: Specimen records captured in AnthWest are the result of: 1) non-systematic collections, usually made during general entomological collecting events or ones focused on bees in general; 2) standardized biodiversity surveys conducted by the USDA Pollinating Insects Research Unit using a combination of net and pan traps; 3) trap nest studies; and 4) studies on the pollination and reproductive biology of threatened or endangered plants.
Quality control: All individual specimens included in this dataset were examined during the course of the taxonomic revision using distribution maps and raw data, following standardized protocols (Figs 7, 8). Records with questionable data on the original insect labels are included in the dataset but are distinguished by notes in the Darwin Core [DWC] field "Identification Qualifier". These records were excluded from the published distribution maps in the species-level revision of the genus . A small fraction (4.8%) of the occurrence records were taken from the literature (see above), largely California records from a taxonomic treatment with some additional records for the two introduced species (Anthidium manicatum and A. oblongatum). These records are flagged in the DWC fields "Associated References" and "Occurrence Remarks" and are denoted by a "PUB" prefix in the catalog number.
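For users of the published file, the flags described above suggest a simple filter. The Python sketch below assumes records are dictionaries keyed by the Darwin Core term names catalogNumber and identificationQualifier; the catalog numbers in the test values are hypothetical, and excluding qualified records mirrors what the revision did for its maps rather than a requirement for every analysis.

```python
from typing import Any, Iterable

Record = dict[str, Any]


def is_literature_record(record: Record) -> bool:
    """Literature-derived records carry a 'PUB' prefix in the catalog number."""
    return str(record.get("catalogNumber", "")).startswith("PUB")


def has_qualifier(record: Record) -> bool:
    """True when identificationQualifier holds any note (e.g. 'Questionable Determination')."""
    return bool(str(record.get("identificationQualifier") or "").strip())


def split_for_mapping(records: Iterable[Record]) -> tuple[list[Record], list[Record]]:
    """Separate records without an identification qualifier from flagged ones."""
    clean: list[Record] = []
    flagged: list[Record] = []
    for r in records:
        (flagged if has_qualifier(r) else clean).append(r)
    return clean, flagged
```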
Step description: Two separate workflows were employed for data capture; they differed fundamentally in the point at which material was identified by the revisionary authors. Retroactive data capture (Fig. 7) incorporated loaned specimens, publication records, and previously non-databased specimens in the U.S. National Pollinating Insects Collection; in all of these, data capture followed identification. Publication records were treated like other retroactive data capture, except that each record represents a summation of males and females with identical collecting event data. Beginning in 2005, new specimen records (Fig. 8) were batch entered into the database for projects and opportunistic collection events alike; specimen identification and the corresponding database update occurred after the record and event metadata had been entered. The workflow for new specimen collections also included more data quality checks by technicians and primary researchers.
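The difference between specimen-level records and publication records can be made concrete by collapsing specimen rows into one row per collecting event with male and female counts, the shape the text describes for publication records. A Python sketch; the fields used as the event key and the "male"/"female" sex codes are assumptions, not the database's actual values.

```python
from collections import Counter
from typing import Any, Iterable, Sequence

# Hypothetical event key; the real definition of "identical collecting event data" may include more fields.
EVENT_FIELDS = ("scientificName", "eventDate", "decimalLatitude", "decimalLongitude", "recordedBy")


def summarize_by_event(
    specimens: Iterable[dict[str, Any]],
    event_fields: Sequence[str] = EVENT_FIELDS,
) -> list[dict[str, Any]]:
    """Collapse specimen records sharing identical event data into one record
    with counts of males and females."""
    tallies: dict[tuple, Counter] = {}
    for s in specimens:
        key = tuple(s[f] for f in event_fields)
        tallies.setdefault(key, Counter())[s["sex"]] += 1
    return [
        dict(zip(event_fields, key), males=counts["male"], females=counts["female"])
        for key, counts in tallies.items()
    ]
```

The summation is one-way: individual specimens cannot be recovered from a summed publication record.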
Purpose: The purpose of this dataset is to make available data associated with bees of the genus Anthidium in the Western Hemisphere. The dataset was developed during the course of a species-level revision of the genus . This dataset can potentially be used in species distribution and niche modeling studies, as well as in assessments of pollinator status and pollination services.

IP Rights: Licenses of use:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. http://creativecommons.org/licenses/by-nc-sa/3.0/ Records with entries in the Darwin Core [DWC] fields "rights" and "rightsholder" indicate specimens that have additional usage rights.
Collection Data: For all collections, including those not listed in the Global Registry of Biodiversity Repositories (www.grbio.org), the institution code listed below is included in the DWC field "owner Institution Code".

AMNH
American
Specimen preservation method and curatorial units: Records represent pinned, dried adult individuals with attached label data, stored in most cases in standard insect museum drawers that are protected from dermestid damage by routine freezing at -20 °C. Reviewed Anthidium specimens were preserved and labeled following the basic process for Hymenoptera outlined in Huber (1998). Newly collected BBSL specimens are given catalog numbers during initial labeling. Material sent for identification and loaned material were given unique catalog numbers after final identification and data entry.