A database on the distribution of butterflies (Lepidoptera) in northern Belgium (Flanders and the Brussels Capital Region)

Abstract
In this data paper, we describe two datasets derived from two sources, which collectively represent the most complete overview of butterflies in Flanders and the Brussels Capital Region (northern Belgium). The first dataset (further referred to as the INBO dataset – http://doi.org/10.15468/njgbmh) contains 761,660 records of 70 species and is compiled by the Research Institute for Nature and Forest (INBO) in cooperation with the Butterfly working group of Natuurpunt (Vlinderwerkgroep). It is derived from the database Vlinderdatabank at the INBO, which consists of (historical) collection and literature data (1830-2001), for which all butterfly specimens in institutional and available personal collections were digitized and all entomological and other relevant publications were checked for butterfly distribution data. It also contains observations and monitoring data for the period 1991-2014. The latter were collected by a (small) butterfly monitoring network in which butterflies were recorded using a standardized protocol. The second dataset (further referred to as the Natuurpunt dataset – http://doi.org/10.15468/ezfbee) contains 612,934 records of 63 species and is derived from the database http://waarnemingen.be, hosted at the nature conservation NGO Natuurpunt in collaboration with Stichting Natuurinformatie. This dataset contains butterfly observations by volunteers (citizen scientists), mainly since 2008. Together, these datasets currently contain a total of 1,374,594 records, which are georeferenced using the centroid of their respective 5 × 5 km² Universal Transverse Mercator (UTM) grid cell. Both datasets are published as open data and are available through the Global Biodiversity Information Facility (GBIF).



Rationale
Butterflies are among the best studied insects in the world and have always attracted the attention of professional researchers, amateur naturalists, butterfly collectors, and the wider public (Kühn et al. 2008). Butterflies are widely considered interesting study systems for ecology, evolution, behaviour, and conservation biology (e.g., Watt and Boggs 2003). Many butterflies have been collected and subsequently stored in museum or private collections. Furthermore, entomologists have often published lists of species observed during excursions to special habitats or have compiled overviews of regional or national butterfly faunas. In Belgium, entomology in general, and lepidopterology in particular, have a long tradition, with the first faunas published only seven years after the country's independence in 1830 (De Selys-Longchamps 1837). Since then, several authors have updated the Belgian butterfly fauna based on collections or observations (e.g., Hackray et al. 1969; De Prins 1998). In 1991, the youth and nature organization Jeugdbond voor Natuur en Milieu (JNM) launched a butterfly project with the aim of publishing a distribution atlas of the butterflies of Flanders, northern Belgium (Daniëls 1991). The first step consisted of collecting all historical collection and literature data. Secondly, a working group was organised in cooperation between JNM, De Wielewaal (which later became Natuurpunt) and the INBO; it set up a citizen science project to obtain as many butterfly observations as possible with good spatial coverage over Flanders. The data gathered during this project (period 1991-1998) were used to compile a first Red List (Maes and Van Dyck 2001) and a distribution atlas of butterflies in Flanders, including the Brussels Capital Region (Maes and Van Dyck 1999).
Recently, both the Red List (Maes et al. 2012) and the distribution atlas (Maes et al. 2013) were updated using recent distribution data recorded through www.waarnemingen.be, a data portal launched by Natuurpunt, the largest nature conservation NGO in Belgium, where citizen scientists can store and keep track of their records. Here, we publish both the historical and the more recent data used for the Red Lists and the distribution atlases as a data paper at a UTM grid cell resolution of 5 × 5 km².

Taxonomic coverage
The datasets cover all 67 indigenous and 3 regular migrant butterfly species. Table 1 gives an overview of the species, together with the number of records present in the respective datasets.
Common names: Butterflies
Table 1. The number of records per species in the two datasets and the sum of the records in both datasets. v = observations with photographic evidence, but the species most probably does not have populations in Flanders. † indicates that a species is considered extinct in Flanders; the year of extinction is also given. Observations after the year of extinction are considered vagrant individuals. M: regular migrant species; (M): the species is indigenous, but the regional population is supplemented by migrant individuals.

Flanders and the Brussels Capital Region
Flanders and the Brussels Capital Region cover an area of 13,522 km² and 162 km², respectively (13,684 km² in total; Figure 1). This area is situated in the north of Belgium and represents 45% of the Belgian territory. Flanders is largely covered by agricultural land and urban areas, while the Brussels Capital Region is mainly urban (Table 2). Both regions have a very high population density (Table 2).

Georeferencing method
All distribution data of butterflies in Flanders and the Brussels Capital Region were attributed to grid cells of 5 × 5 km² of the Universal Transverse Mercator (UTM) projection (Figure 2). The centroids of the 5 × 5 km² grid cells were calculated using the WGS84 datum, with a coordinateUncertaintyInMeters of 3,769 meters (Wieczorek et al. 2004). In total, Flanders and the Brussels Capital Region cover 638 (622 with records) and 9 (all nine with records) grid cells, respectively. The grid cells without records cover only a very small area within Flanders.
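The assignment of a record to its grid cell centroid can be illustrated with a minimal Python sketch. It assumes that the 5 × 5 km cells are aligned to multiples of 5,000 m in UTM easting and northing within a single UTM zone; the function names and the example coordinates are hypothetical and are not taken from the datasets. The sketch stays in UTM metres and does not perform the conversion to WGS84 latitude and longitude. It also computes the purely geometric half-diagonal of a cell (about 3,536 m); the published coordinateUncertaintyInMeters of 3,769 m is larger, presumably because the method of Wieczorek et al. (2004) includes further uncertainty terms, which this sketch does not attempt to reproduce.

```python
import math

CELL_SIZE_M = 5_000  # edge length of one grid cell in metres


def cell_centroid(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the (easting, northing) centroid of the grid cell containing a point.

    Assumes cells are aligned to multiples of cell_size_m in UTM coordinates.
    """
    col = math.floor(easting_m / cell_size_m)
    row = math.floor(northing_m / cell_size_m)
    return ((col + 0.5) * cell_size_m, (row + 0.5) * cell_size_m)


def half_diagonal_m(cell_size_m=CELL_SIZE_M):
    """Distance from the centroid of a square cell to one of its corners."""
    return cell_size_m * math.sqrt(2) / 2


# Hypothetical exact position of an observation, in UTM metres.
centroid = cell_centroid(593_210.0, 5_634_880.0)
print(centroid)
print(round(half_diagonal_m(), 1))
```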

Temporal coverage
The INBO dataset mainly covers the historical museum and literature records (since 1830), butterfly monitoring records (since 1991) and observations (until 2008), while the Natuurpunt dataset covers the recent observations (mostly since 2008). Between 2000 and 2006, a butterfly survey project was organised in the province of West-Flanders (Cuvelier et al. 2007) and in the period 2006-2008, a similar project was undertaken in the Brussels Capital Region by the INBO at the request of Leefmilieu Brussel - BIM (Beckers et al. 2009). Both data sources were integrated into the INBO dataset. Since the NGO Natuurpunt introduced the data portal www.waarnemingen.be for storing observations in 2008, the number of records has strongly increased and now reaches almost 150,000 records per year (Figure 3). The datasets will be updated on a yearly basis.

Sampling methods
Butterfly distribution data were collected in four different ways: i) collection data, ii) literature data, iii) monitoring transect data and iv) observations. Collection data were digitized from the following museum collections: Bosmuseum Groenendaal, Royal Institute for Natural Sciences (Brussels), Agricultural Faculty of Gembloux, Ghent University and the Antwerp Zoo. Furthermore, the private butterfly collections of the following people were also incorporated into the INBO dataset: A. Published observations were searched for in different literature sources (see section "References to literature checked for occurrence data" in the Suppl. material 1) and the source is indicated in the field associatedReferences. Since most of the records in collections and in the literature were only reported at the municipality level, the UTM 5 × 5 km² grid cell containing the centre of the municipality was attributed to the record.
[Figure caption fragment: records are grouped per five-year period (e.g., 1905 = 1901-1905, 1910 = 1906-1910, etc.). Note the different scales on the y-axis for both figures.]
Butterfly monitoring counts were conducted along fixed transects of at most 1 km, consisting of smaller sections, each with a homogeneous habitat (e.g., woodland, hay meadow, dry heathland; see van Swaay et al. 2008; van Swaay et al. 2011 for a detailed description of the monitoring method).
Observations (species, date, location, observer) were recorded by volunteers (citizen scientists) and stored in the INBO dataset (mainly for the period 1991-2007, usually with a resolution of 1 × 1 km² or 5 × 5 km²) or in the Natuurpunt dataset. Since 2011, 69% of the records had a precision of 25 m or less. Because of the increasing popularity of mobile apps using GPS readings in the field, this proportion increased by 5% per year to reach 77% in 2015. The number of observers in the INBO and the Natuurpunt datasets is given in Table 1. The frequency distribution of the recorders per number of records is given in Figure 4.
A list of references that used data described in this paper can be found in the section "Publications based on this dataset" in the Suppl. material 1.

Quality control
The data in both datasets were carefully verified by butterfly experts (including professional entomologists), taking into account collection specimens, the observer's species knowledge, added photographs and the known species lists of the locations. The validation procedure of www.waarnemingen.be is interactive: a team of validators can ask observers for additional information, after which the validator manually adds a validation status. Records that are not manually validated are additionally checked by an automated validation procedure that takes into account the number of manually validated observations within a specified date and distance range. Of the butterfly records submitted to the data portal www.waarnemingen.be, 11% are supported by photographs. The validation status is indicated in the field identificationVerificationStatus.
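The idea behind the automated check can be sketched in Python as follows. This is an illustration only and not the implementation used by www.waarnemingen.be: the class and function names, the thresholds (14 days, 5,000 m, at least three supporting records) and the returned status strings are all assumptions, and the date range is interpreted here as a simple difference in calendar days.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    species: str
    day: date
    x_m: float  # projected coordinates in metres (hypothetical fields)
    y_m: float
    manually_validated: bool = False


def count_supporting(candidate, others, max_days=14, max_distance_m=5_000):
    """Count manually validated records of the same species near the candidate
    in both time and space. The thresholds are illustrative assumptions."""
    support = 0
    for obs in others:
        if not obs.manually_validated or obs.species != candidate.species:
            continue
        day_gap = abs((obs.day - candidate.day).days)
        distance = math.hypot(obs.x_m - candidate.x_m, obs.y_m - candidate.y_m)
        if day_gap <= max_days and distance <= max_distance_m:
            support += 1
    return support


def automated_status(candidate, others, min_support=3):
    """Return a hypothetical status label based on the supporting record count."""
    if count_supporting(candidate, others) >= min_support:
        return "accepted by automated check"
    return "unverified"


validated = [
    Observation("Pieris rapae", date(2014, 7, 5), 1_000, 0, True),
    Observation("Pieris rapae", date(2014, 7, 12), 0, 2_000, True),
    Observation("Pieris rapae", date(2014, 7, 20), 3_000, 4_000, True),
    Observation("Pieris rapae", date(2014, 7, 10), 10_000, 0, True),  # too far away
    Observation("Pieris rapae", date(2014, 7, 10), 100, 100, False),  # not validated
]
candidate = Observation("Pieris rapae", date(2014, 7, 10), 0, 0)
status = automated_status(candidate, validated)
print(status)
```

A real procedure would probably also take the flight period and the rarity of the species into account; the sketch only shows the counting step described above.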

Information withheld
The original databases also hold the observer's name, the exact XY-coordinates and the toponym of each record; these fields are withheld from the published datasets.

Dataset description
The butterfly occurrence data are published as two separate Darwin Core Archives: 1) collection and literature data, observations and butterfly monitoring in Flanders and in the Brussels Capital Region (1830-2014) hosted at the Research Institute for Nature and Forest (INBO) and 2) recent observations from the Natuurpunt data portal (www.waarnemingen.be). The data models used for both datasets are identical, so the datasets can be merged easily. The INBO dataset contains 761,660 records and the Natuurpunt dataset 612,934 records, totalling almost 1.4 million records. The data compiled for the butterfly atlas of the Brussels Capital Region are marked as INBO/LB-BIM in the ownerInstitutionCode field in the INBO dataset.
The distribution of the number of records and species per grid cell for both datasets is given in Figure 5.
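Because both archives share one data model, their occurrence tables can be concatenated directly. The following Python sketch shows one way to do this, assuming tab-delimited occurrence files with a header row of Darwin Core terms such as scientificName. The toy rows, the occurrenceID values and the helper column sourceDataset are hypothetical; sourceDataset is not a Darwin Core term and is added only to keep track of the origin of each record.

```python
import csv
import io
from collections import Counter


def read_occurrences(handle, source_label):
    """Yield occurrence rows as dicts, tagged with the dataset they came from."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        row["sourceDataset"] = source_label  # hypothetical helper column
        yield row


def merge_and_count(handles_with_labels):
    """Concatenate occurrence tables and count the records per species."""
    merged = []
    for handle, label in handles_with_labels:
        merged.extend(read_occurrences(handle, label))
    counts = Counter(row["scientificName"] for row in merged)
    return merged, counts


# Toy stand-ins for the occurrence files of the two archives.
inbo_file = io.StringIO(
    "occurrenceID\tscientificName\teventDate\n"
    "INBO:1\tPieris rapae\t1995-06-01\n"
    "INBO:2\tAglais io\t1996-07-12\n"
)
natuurpunt_file = io.StringIO(
    "occurrenceID\tscientificName\teventDate\n"
    "NP:1\tPieris rapae\t2010-05-20\n"
)

merged, counts = merge_and_count(
    [(inbo_file, "INBO"), (natuurpunt_file, "Natuurpunt")]
)
print(len(merged), dict(counts))
```

With real archives, the StringIO objects would be replaced by file handles opened on the occurrence file extracted from each Darwin Core Archive.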

Usage norms
To allow anyone to use the datasets described here, we released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). Users of published datasets are encouraged to follow the respective norms for data use (http://www.inbo.be/en/norms-for-data-use and http://www.natuurpunt.be/normen-voor-datagebruik [in Dutch]) and to provide a link to the original dataset (http://doi.org/10.15468/njgbmh and http://doi.org/10.15468/ezfbee), whenever appropriate. If used for a scientific paper, it is recommended to cite the dataset following the applicable citation norms (e.g. GBIF 2012) and/or to contact the authors for additional information (dirk.maes@inbo.be, marc.herremans@natuurpunt.be or dimitri.brosens@inbo.be). Dataset issues can also be reported via opendata@inbo.be.