The collection of Bathynellacea specimens of MNCN (CSIC) Madrid: microscope slices and DNA extract

Abstract This is the first published database of a collection of slices and DNA extracts of Bathynellacea Chappuis, 1915. It includes all data on bathynellaceans (Crustacea: Syncarida) collected over the last 48 years (1968 to 2016) on the Iberian Peninsula and Balearic Islands and studied since 1984. It also includes specimens from several European countries (Portugal, Romania, France, Italy, Slovenia, Bulgaria and England), as well as specimens obtained from samples from North America (Montana, Washington, Alaska and Texas), South America (Brazil, Chile and Argentina), Asia (China, Thailand, Vietnam, Mongolia and India), Africa (Morocco and Chad) and Australia (New South Wales, NSW, and Queensland). The samples come from groundwater (caves, springs, wells and the hyporheic habitat associated with rivers) and were obtained both in sampling campaigns and in occasional sampling efforts. The data set includes 3399 records (2657 slices and 742 DNA extracts) corresponding to three families of the order Bathynellacea (Parabathynellidae Noodt, 1965, Leptobathynellidae Noodt, 1965 and Bathynellidae Grobben, 1905), 52 genera and 92 formally described species, in addition to 30 taxa that are under study and thus still unpublished. This represents more than half of all the genera known worldwide (80) and almost one third of the species currently known in the world (329, a number that increases every year). The existence of three families is accepted here, although this is a controversial issue and this paper is not the appropriate context in which to address it. The dataset constitutes an especially relevant collection that includes the holotypes and type series of 43 new species of Bathynellacea (33 from the Parabathynellidae and ten from the Bathynellidae) described by Ana I. Camacho (AIC hereinafter); eleven of these are the type species of new genera described from all around the world, ten belonging to the Parabathynellidae and one to the Bathynellidae.
These new species come from all continents, although 26 of them are from the Iberian Peninsula. The most important feature of this collection is that it has been created and reviewed by a specialist in the group (AIC), and that each specimen, regardless of its form (permanent slice or DNA extract), includes taxonomic, geographical and authorship information. The specialist has been involved in all stages of the process, from field sampling to the digitization of the results presented here, and has worked in close collaboration with the curators responsible for the different collections involved in this project.


General description
Purpose: The collections of the MNCN in Madrid hold the largest collection of Crustacea Bathynellacea in the world, with 3399 records (Figure 1) corresponding to 2657 permanent slices and 742 DNA extracts, together with their relevant taxonomic, geographical and authorship information. Of these, 2169 records (1683 permanent slices and 486 DNA extracts) belong to the Parabathynellidae, 1211 (974 permanent slices and 237 DNA extracts) to the Bathynellidae, and 20 (all DNA extracts) to the Leptobathynellidae (Figure 1). The objective of this work is to highlight the value of this collection by presenting it to the research community. Its importance lies not only in the number of specimens, but also in their taxonomic and spatial representativeness. Also important are the number of types and type series it includes (holotypes and type series of 43 species from all continents) (Figure 2) and their state of preservation, which ensures their future usefulness. There are specimens from 31 of the 80 genera recognized worldwide (Figure 3), belonging to the three families currently known. This adds up to almost one third of all the species known in the world (94 of the 329 species formally described) (Table 1, Figure 4). The collection includes specimens from all continents, from populations in Alaska to the south of Australia, although European species predominate, particularly those from the Iberian Peninsula.
This particular group of crustaceans is slowly showing the true magnitude of its diversity, and the collection presented here is proof of this. It was traditionally considered a rare group with very low diversity, mainly because its habitat (groundwater) is rarely sampled and because its presence and density are, on average, low. This, together with the difficulty humans have in accessing its environment and the complex and time-consuming taxonomic research the group requires, owing to the small size of the species (most are no larger than a millimeter) and the morphological complexity of their numerous appendages (e.g., the male thoracopod VIII, transformed into a copulatory organ), has kept many researchers from devoting their time to its study over the years. Nevertheless, one of the authors (AIC) has devoted over 30 years of work to producing the collection presented here. We are convinced that the relevance of the collection is already reason enough for its publication, especially because of the important information it holds on the Iberian Peninsula and Balearic Islands, currently one of the best-studied regions in terms of bathynellaceans and, linked to this effort, also the region with the highest diversity of this group of crustaceans in the world (Camacho et al., 2014). There are 58 species known from this region, 41 formally described and at least 17 more that have been identified as new species but are pending description.
These include many cryptic species identified thanks to molecular studies (Camacho et al., 2012, 2013a). All of the above are represented by permanent slices in the collection presented here, plus DNA extracts of 41 of the species, although gene sequences are not yet available for all of them. In addition, the collection includes many other European species (66), as well as species from Asia (6), America (9), Australia (8) and Africa (3) (see Table 1 and Figure 5). The present paper offers basic, rigorous and updated taxonomic information that is potentially useful for subterranean biodiversity studies (identifying hotspots) and for ecology and conservation studies, particularly for estimating future global changes, as the specimens recorded range from 1986 to the present.
Table 1. Taxa present (families and genera) and number of species of these genera in the collections of the MNCN and in the world, by continent, with the percentage of world representation in this database. *Oceania = geopolitical region (Australia and New Zealand in this paper). **The total number of world species is approximate, because there are new species under study and "in press", and the number changes every year.
Our aims in publishing this dataset are 1) to describe the Bathynellacea collection of permanent slices and DNA extracts of the MNCN, 2) to present the first data set of a holotype and type series collection of Bathynellacea in the world, 3) to provide information on the diversity and distribution of groundwater fauna in the world and 4) to offer the first dataset of Bathynellacea permanent slices in the world to the scientific community, in the hope of encouraging other researchers to publish their own groundwater fauna datasets. Table 1 shows the taxa present (families, genera and species) in the collections of the MNCN and in the world, by continent, with the percentage represented in the collections. Table 2 includes information on all the new species of Bathynellacea described by the authors, including the catalogue numbers of the holotype and of the DNA voucher from specimens of the type localities (where available) from the classic Crustacea and "Tissue and DNA" collections of the MNCN, and the numbers of specimens in the type series. Table 3 is a short list of the species and localities of Bathynellacea for which there are DNA extracts in the collection of the MNCN. Section 1 of the bibliography lists the publications citing the bathynellaceans included in this dataset.
Study area descriptions/descriptor: The area of study includes the whole world. There are over 200 sites from the Iberian Peninsula and Balearic Islands (Camacho et al.). The samples come from groundwater: caves, springs, wells and the interstitial (hyporheic) environment of epigean rivers, where the stygobiont fauna living in them can be collected.
Table 2. List of species of Bathynellacea with holotypes and type series deposited in the collections (Arthropods and Tissues and DNA) of the Museo Nacional de Ciencias Naturales de Madrid (CSIC) (Spain). (H) Hyporheic habitat, gravel bank of rivers; (*) Genus described by author(s) of this paper. (**) The holotype and type series of new species described from Spain not deposited in MNCN.

Taxa
Design description: This dataset was developed to contribute to the knowledge of Bathynellacea, a group of groundwater Crustacea with a worldwide distribution that has been little studied; to identify endemic fauna at different geographic scales (country, counties and localities); and to highlight the value of this MNCN collection in Madrid and encourage colleagues to publish the results of their work, even when they are less striking. Prior to digitization, the pre-existing taxonomic identifications were reviewed by the specialist AIC. The dataset was exported to Darwin Core v1.2 format and uploaded to the IPT of the GBIF Spanish node (http://www.gbif.es/ipt/resource?r=mncn-artp). The Darwin Core elements included in the dataset structure are listed in the dataset description section.

Taxonomic coverage
General taxonomic coverage description: This is a collection of slices and DNA extracts of Bathynellacea, a group of Crustacea Malacostraca (Figure 6), containing specimens of all species known from Spain and a high percentage of all species known in Europe, as well as some of those described in recent years (2006 onwards) from the other continents (Tables 1, 2 and 3). The collection includes all the samples obtained by AIC in the Iberian Peninsula and Balearic Islands since 1983, as well as material from these areas and from other parts of the world donated to AIC for study, as detailed above. Most of the collection is identified to species level. The specimens not yet identified to species level have been identified to genus or family level.
The three families of the order Bathynellacea (Bathynellidae, Parabathynellidae and Leptobathynellidae) are all represented in the collection, the first two in the form of both DNA extracts and permanent slices (Table 3, Figs 1, 3, 7). The Leptobathynellidae has been found in North America and the southern hemisphere (Asia, Africa and South America) and includes 8 genera and 19 species; the collection of the MNCN contains 20 specimens in the form of DNA extracts, all belonging to a species from southern India, Parvulobathynella distincta Ranga Reddy et al., 2011 (Table 1). All in all, of the 80 genera known worldwide, almost 40% (31 genera) are represented in the collection (Table 1). This corresponds to around 40% of the genera belonging to the families Parabathynellidae (18 genera out of 43) and Bathynellidae (12 genera out of 29), and 13% of the genera of the Leptobathynellidae (Figure 3). Europe is the best-represented continent in the collection, with 90% of its known genera included (18 out of 20), followed by Australia with 45% of the genera (five out of 11). Africa has the lowest representation, with only 14% of its known genera present in the collection (three out of 21). Asia (six out of 29) and America (four out of 19) are equally represented, with 21% of the known genera included in the collection (Figure 4). Within the whole set of specimens included in the collection of the MNCN, the family Parabathynellidae has a higher number of genera included (18) [...] species known (6), there are more species of Bathynellidae in total (11), due to their higher diversity. In the case of Africa, the collection does not include a single genus of the family Bathynellidae. In the case of America, Asia and Australia, only one genus is included (Figure 8). The family Parabathynellidae includes approximately 207 species in total, and 50 of these are preserved in the collection (Tables 1, 2, 3).
Of these, more than half (27 species) are also represented by DNA extracts. There is also a high number of undetermined species, most of them with DNA extracts. The continent most widely represented in the collection is Europe, with 100% of the known genera included and over 75% (31) of all known species (41) (Figure 5). The least represented continent is Asia, with barely 9% of the known species included in the collection (four of 45 species). For the remaining continents, between 13% and 17% of the species are included in this collection. The genus Iberobathynella Schminke, 1973, endemic to the Iberian Peninsula and Balearic Islands, is the most diverse, with 22 species, and also the best represented in the collection, with 20 species. In addition, the collection of the MNCN includes the 3 known species of the genus Paraiberobathynella Camacho & Serban, 1998, the 2 known species of Hexaiberobathynella Camacho & Serban, 1998, and the only known species of the genus Guadalopebathynella Camacho & Serban, 1998. The genus Parabathynella Chappuis, 1926 has a total of three species in all of Europe, and two of them are included in the collection. Finally, the cosmopolitan genus Hexabathynella Schminke, 1972, which includes 23 species worldwide, is represented in the collection by six species, three of them with DNA extracts (Table 3). The Leptobathynellidae, with 19 species known only from North America and the southern hemisphere in Asia, Africa and America, is included in the collection through 20 specimens belonging to a single species.
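The genus-level percentages reported per continent in this section follow directly from the counts given in the text. As a minimal sketch (the names `GENERA_BY_CONTINENT`, `coverage_percent` and `coverage_table` are illustrative and not part of the dataset), they can be recomputed as follows:

```python
# Genus counts as stated in the text: (genera in the collection, genera known).
GENERA_BY_CONTINENT = {
    "Europe": (18, 20),
    "Australia": (5, 11),
    "Africa": (3, 21),
    "Asia": (6, 29),
    "America": (4, 19),
}


def coverage_percent(in_collection: int, known: int) -> int:
    """Share of known taxa held in the collection, rounded to a whole percent."""
    return round(100 * in_collection / known)


def coverage_table(counts: dict) -> dict:
    """Map each region to its rounded coverage percentage."""
    return {region: coverage_percent(held, known)
            for region, (held, known) in counts.items()}


if __name__ == "__main__":
    for region, percent in coverage_table(GENERA_BY_CONTINENT).items():
        print(f"{region}: {percent}%")
```

The same helper applies to the species-level figures, for example 31 of 41 European species of Parabathynellidae.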
The Bathynellidae is less well known across the world than the Parabathynellidae, although in Europe, where its generic and specific diversity is higher, it is the best-known family, as well as the best represented in this collection, with 43 of the 103 species known worldwide included (approximately half of these are dubiously assigned to the genus Bathynella Vejdovsky, 1882, which some authors consider cosmopolitan) (Figures 5, 9). In total, 13 of these species have DNA extracts in the collection (Table 3). There is also a high number of undetermined species, at least 16, and 13 of these have DNA extracts. The collection includes at least 35 European species in total (Table 1); 15 are assigned to the genus Bathynella, but should be revised in the light of the most recent discoveries made with molecular techniques. The collection holds five of the seven species known for the genus Gallobathynella Serban et al., 1971, five of the seven species known for the genus Vejdovskybathynella Serban & Leclerc, 1984, and nine of the ten species assigned to the rest of the European genera. There are DNA extracts of several of these in the collection. The presence of the genus Pacificabathynella Schminke & Noodt, 1988 in the collection is also important, with 4 of the 5 known American species included. In the case of P. yupik Camacho et al., 2015 from Alaska, DNA extracts are also preserved. The rest of the continents have a relatively low representation (Figure 10). The holotype collection and the type series of Bathynellacea housed at the MNCN are worth noting. Table 2 contains a summary of the new taxa (11 genera and 43 species) described by AIC across different families and continents, whose holotypes and type series are deposited in the collections of the MNCN, either as permanent slices in the arthropod collection (Figure 9) or as DNA extracts in the tissue and DNA collection (Figures 2, 11).
The Parabathynellidae includes 33 holotypes and the type series of ten genera from all continents: 20 holotypes come from Spain and belong to the genera Iberobathynella, Guadalopebathynella, Paraiberobathynella, Hexaiberobathynella and Hexabathynella. Four other holotypes belong to new genera and species from Thailand, China and Vietnam, another holotype is a new genus from Montana (USA), and the other eight holotypes correspond to six Australian and two African species (Figure 2). In the case of the Bathynellidae, there are ten holotypes: six Spanish species from two genera (Paradoxiclamousella Camacho et al., 2013a and Vejdovskybathynella), and 4 more from the USA (Montana and Alaska), all from the genus Pacificabathynella Schminke & Noodt, 1988. Table 4 includes all the details of these species and populations, including information on habitat, locality, year of description, the vouchers of the morphological holotypes, the molecular type series and the composition of the type series in terms of number of specimens. For most of the newly described European species of both families, as well as for the two African species and for Pacificabathynella yupik from Alaska, there are DNA extracts in the collection (Figure 11).

Spatial coverage
General spatial coverage: Specimens from all around the world are included, from Alaska (USA) to New South Wales (Australia). Figure 12 includes the number of records per continent, as well as the part corresponding to permanent slices and DNA extracts. The material from the USA comes from a few samples collected in the states of Montana, Washington, Alaska and Texas, and some of the specimens are still pending identification. In total, the database has 200 records (19 corresponding to DNA extracts) from the four species of Bathynellidae and the two species of Parabathynellidae originating from the 18 localities visited in the previously mentioned states. There are also 25 records from three South American localities in Chile, Brazil and Argentina which represent three species in total. The Asian countries included in the collection are China, Thailand, Vietnam and a pair of localities from Mongolia and India, adding up to 149 records corresponding to six species from a total of nine localities. In the case of Africa, there are samples from Morocco (29 records, 12 DNA extracts, and two species in total from two localities) and Chad (41 records, 14 DNA extracts, and with a total of two species from a single locality). Australia is represented by samples from Queensland and New South Wales, adding to a total of 270 records from seven localities that include 13 species in total (some still undetermined).
The most important part of the database is composed of European records, especially from Spain (2064 records, including more than 50 species, with 631 DNA extracts), although other countries are also represented: Italy (256 records, 40 localities and 15 species), France (158 records, 12 DNA extracts, 24 localities and 12 species), Portugal (116 records, 38 DNA extracts, five localities and 11 species), England (28 records, 11 DNA extracts, four localities and a single species), Bulgaria (21 records, three localities and four species), Slovenia (26 records, four localities and two species) and Romania (34 records, seven localities and six species) (Figure 13). In the case of Spain, almost all Autonomous Communities are represented (Figure 14), as well as most of the provinces, although Cantabria (472 records) and Burgos (373 records) are the most widely represented, followed by Asturias (245 records) and Soria, Vizcaya, Huesca and Teruel, with more than 100 records each. There are records for seven of the eight Andalusian provinces (239 records in total): Huelva 76, Sevilla 57, Málaga 41, Almería 35, Córdoba 18, Granada nine and Jaén only three. Cádiz is the only Andalusian province without any information in the database. Madrid has 71 records, Galicia 66, the Balearic Islands (only Mallorca) 57, Navarra 33 and Catalonia only four. The rest of the provinces have relatively few records: León 24, Guadalajara 14, Ávila and Toledo four each, and Salamanca only one. The only Autonomous Communities not present in the database are Extremadura and La Rioja (Table 5).
There are 631 DNA extracts, coming from practically all of these provinces, with the exception of Salamanca, Toledo and Jaén. Again, the highest number of these comes from Cantabria (172 DNA extracts), followed by Asturias (142 DNA extracts) and Burgos (83 DNA extracts). A detailed analysis of the distribution of the species and localities where bathynellaceans live in Spain is available in a previously published data paper (Camacho et al., 2014).
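The Andalusian figures quoted above can be cross-checked with a short tally. This is only an illustrative sketch: the record counts are those stated in the text, while the variable and function names are assumptions introduced for the example.

```python
# Records per Andalusian province, as stated in the text.
ANDALUSIA_RECORDS = {
    "Huelva": 76,
    "Sevilla": 57,
    "Málaga": 41,
    "Almería": 35,
    "Córdoba": 18,
    "Granada": 9,
    "Jaén": 3,
}

# The eight provinces of Andalusia.
ANDALUSIAN_PROVINCES = [
    "Almería", "Cádiz", "Córdoba", "Granada",
    "Huelva", "Jaén", "Málaga", "Sevilla",
]


def total_records(records: dict) -> int:
    """Sum the record counts over all provinces."""
    return sum(records.values())


def provinces_without_records(provinces: list, records: dict) -> list:
    """List the provinces that have no entry in the record table."""
    return [name for name in provinces if name not in records]


if __name__ == "__main__":
    print(total_records(ANDALUSIA_RECORDS))
    print(provinces_without_records(ANDALUSIAN_PROVINCES, ANDALUSIA_RECORDS))
```

The tally reproduces the 239 records reported for Andalusia and identifies Cádiz as the only province without data.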

1983-present
Natural collections description
Parent collection identifier: NA
Collection name: Camacho Collection (AIC), Arthropods Collection and Tissues and DNA Collection
Specimen preservation method: permanent slices (glycerin jelly and paraffin) and frozen DNA extracts in water.
Curatorial unit: 3399 with an uncertainty of 0 (records)

Method step description:
The collection has been digitized with MS Excel software, compatible with Darwin Core 1.2 or Darwin Core 1.4.
Pre-digitization phase: The identification of each specimen from each sample has been reviewed recently, and former imprecisions and the discovery of cryptic species (due, for example, to the use of molecular techniques) have led to the modification of some records in the Excel file used as the starting point for this work. The initial files had few fields for each record, covering specimens, sampling sites and sampling dates (date, locality, province, habitat, collector and the species found, with data on family, genus, species and author).
Digitization phase: Starting from the initial Excel file, the standard fields of a Darwin Core v1.2 database were added as needed, and the geographical data (UTM coordinates) were included. These were taken with a GPS at the time of sampling (PASCALIS samples and all those taken after the year 2000), obtained from grey literature (speleological reports) or published literature (Notenboom and Meijers 1984; Puch 1998) (i.e., the precise GPS location of the entrance of the caves where bathynellid samples have been collected), recorded by the researchers who donated the specimens when possible, or taken from type specimens.
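As a rough illustration of this step, the initial spreadsheet fields listed above can be renamed to Darwin Core terms. This is a sketch under stated assumptions, not the procedure actually used: the term names are those of the current Darwin Core standard (the Darwin Core 1.2 element names used for this dataset may differ), the mapping itself is a guess at a reasonable correspondence, and the example row uses placeholder values rather than real collection data.

```python
# Assumed mapping from the initial spreadsheet fields to current Darwin Core terms.
FIELD_TO_DWC = {
    "date": "eventDate",
    "locality": "locality",
    "province": "stateProvince",
    "habitat": "habitat",
    "collector": "recordedBy",
    "family": "family",
    "genus": "genus",
    "species": "specificEpithet",
    "author": "scientificNameAuthorship",
}


def to_darwin_core(row: dict) -> dict:
    """Rename known fields to Darwin Core terms and build scientificName."""
    record = {FIELD_TO_DWC[name]: value
              for name, value in row.items() if name in FIELD_TO_DWC}
    # Compose the full name from genus, epithet and authorship when present.
    name_parts = [record.get("genus"),
                  record.get("specificEpithet"),
                  record.get("scientificNameAuthorship")]
    full_name = " ".join(part for part in name_parts if part)
    if full_name:
        record["scientificName"] = full_name
    return record


if __name__ == "__main__":
    # Placeholder row: the taxon is cited in the text, the other values are invented.
    example_row = {
        "date": "2000-01-01",
        "locality": "placeholder locality",
        "province": "Alaska",
        "habitat": "hyporheic",
        "collector": "placeholder collector",
        "family": "Bathynellidae",
        "genus": "Pacificabathynella",
        "species": "yupik",
        "author": "Camacho et al., 2015",
    }
    print(to_darwin_core(example_row))
```

Fields that are not in the mapping are dropped rather than guessed, so any additional spreadsheet column has to be mapped explicitly.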
Creation of the dataset: The dataset was exported as a file in Darwin Core 1.2 format. The Darwin Core elements included in the dataset structure are listed in the dataset description section. A Darwin Core table was prepared from the original database project. The field-to-field mapping was fine-tuned with the support of GBIF-Spain's Coordination Unit. The resulting table was imported into the Darwin Test tool (http://www.gbif.es/darwin_test/Darwin_test_in.php, Ortega-Maqueda and Pando, 2008). This tool allows detailed structuring of the metadata of the dataset and also performs a number of quality checks on the data (compliance of the dataset structure with Darwin Core, geographic consistency, date format, etc.; currently over sixty such checks are carried out). Once the potential errors flagged had been checked and corrected, a Darwin Core Archive (DwC-A) was generated, also with the Darwin Test tool. The resulting DwC-A was then uploaded to the GBIF-Spain IPT installation (http://www.gbif.es/ipt/resource?r=mncn-artp). From there, the dataset is made public, registered in GBIF, and indexed and published by the GBIF data portal.
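To give a feel for the kind of quality checks mentioned above (structure, date format, geographic consistency), here is a minimal sketch. It does not reproduce the Darwin Test tool or its rules: the function `check_record`, the choice of required terms and the ISO date convention are all assumptions made for illustration, and the term names follow the current Darwin Core standard.

```python
from datetime import date


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Structural check: terms assumed here to be mandatory for every record.
    for term in ("catalogNumber", "scientificName"):
        if not record.get(term):
            problems.append(f"missing {term}")

    # Date format check: expect an ISO 8601 calendar date (YYYY-MM-DD).
    event_date = record.get("eventDate")
    if event_date:
        try:
            date.fromisoformat(event_date)
        except ValueError:
            problems.append(f"bad eventDate: {event_date!r}")

    # Geographic consistency check: coordinates must be numeric and in range.
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = record.get(term)
        if value is None:
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"non-numeric {term}: {value!r}")
            continue
        if abs(number) > limit:
            problems.append(f"{term} out of range: {number}")

    return problems


if __name__ == "__main__":
    # Placeholder record, not taken from the collection.
    sample = {"catalogNumber": "PLACEHOLDER-1", "scientificName": "Iberobathynella sp.",
              "eventDate": "15/07/2003", "decimalLatitude": "43.2"}
    print(check_record(sample))
```

Returning a list of messages, rather than raising on the first failure, mirrors the workflow described in the text, where all flagged errors are reviewed and corrected before the archive is generated.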
The dataset was transformed to a Darwin Core Archive format with metadata to ensure rapid discovery of this biodiversity resource and its future publication as a citable academic paper (Chavan and Penev, 2011).
Study extent description: The MNCN bathynellacean collection began with the sampling campaigns carried out by AIC in northern Spain for her doctoral thesis from 1983 onwards. Some samples studied by AIC were obtained between 1976 and 1978 by R. Rouch in three short sampling trips to different areas of the Iberian Peninsula. From 1984 to 1986 J. Notenboom, assisted by I. Meijers, and later P. van der Hurk and R. Leys, took groundwater samples throughout Spain, and all the Bathynellacea found in these samples were also donated to AIC for study. In the following years AIC continued to obtain samples of this fauna throughout Spain in the framework of different research projects. Worth noting is the European project PASCALIS (2002-2004), in which AIC and her team conducted intensive sampling of groundwater fauna in the Cantabrian mountain ranges and the north of Burgos, an area where sampling has continued since then, together with C. Puch, substantially increasing the number of Bathynellacea records in Spain. Occasional sampling of particular Parabathynellidae species has been carried out by AIC and C. Puch in Spanish tourist caves in Andalusia, Murcia and Galicia in order to obtain DNA extracts. In addition, since the beginning of the 2000s, AIC has been receiving donations for her research from Spain and from other parts of the world (France, Italy, Bulgaria, England, USA, China, Vietnam, Thailand, Mongolia, Chad and Australia).
The methods used to collect this kind of sample are described in Camacho (1992, 1994). The samples are fixed in the field in 4% formalin or 96% ethanol, or are frozen. Each sample collected is studied under a binocular microscope in order to isolate the bathynellid specimens found.
The specimens used for morphological study are stored in alcohol (70%). The specimens used for molecular study are frozen at -80 °C. A complete dissection of all anatomical parts of the specimens, placed in pure glycerin, is necessary for taxonomic study. Either entire specimens or all parts of a dissected specimen are preserved together in permanent slides and kept in special metal slides. The mounting medium is glycerin gelatin stained with methylene blue, sealed with paraffin (Figure 7). Anatomical examinations are performed using an oil immersion lens (100X) on an interference microscope. The method is modified from Serban's method, personally transmitted to AIC in 1993 and 1995 (Perina and Camacho, 2016).
The specific techniques used in molecular analyses for taxonomic application are detailed in Camacho et al. (2012, 2013a, 2015, 2016).
Quality control description: Systematic reliability and consistency are backed by the experience in Bathynellacea taxonomy of AIC, who made all the identifications. Recently, some of these identifications have been confirmed by molecular data. The validation and cleaning of the associated geographical information was carried out in several steps, as a key part of the digitization process.