Echinoids of the Kerguelen Plateau – occurrence data and environmental setting for past, present, and future species distribution modelling

Abstract The present dataset provides a case study for species distribution modelling (SDM) and for model testing in a poorly documented marine region. The dataset includes spatially explicit data for echinoid (Echinodermata: Echinoidea) distribution. Echinoids were collected during oceanographic campaigns led around the Kerguelen Plateau (+63°/+81°E; -46°/-56°S) since 1872. In addition to the identification of collection specimens from historical cruises, original data from the recent campaigns POKER II (2010) and PROTEKER 2 to 4 (2013-2015) are also provided. In total, five families, ten genera, and 12 echinoid species are recorded in the region of the Kerguelen Plateau. The dataset is complemented with environmental descriptors that are available and relevant for echinoid ecology and SDM. The environmental data were compiled from different sources and modified to suit the geographic extent of the Kerguelen Plateau, using scripts developed with the R language (R Core Team 2015). Spatial resolution was set at a common 0.1° pixel resolution. Mean seafloor and sea surface temperatures, salinity and their amplitudes, all derived from the World Ocean Database (Boyer et al. 2013), are made available for the following six decades: 1955–1964, 1965–1974, 1975–1984, 1985–1994, 1995–2004, 2005–2012. Future projections are provided for several parameters: they were modified from the Bio-ORACLE database (Tyberghein et al. 2012). They are based on three IPCC scenarios (B1, A1B, A2) for years 2100 and 2200 (IPCC, 4th report).


Study extent description
The study area of this dataset includes the Kerguelen Plateau, located at the boundary between the Indian and Southern Oceans, in the flow of the Antarctic Circumpolar Current (Park and Vivier 2011). The plateau is the second largest oceanic igneous province on Earth. It is positioned between 46°S and 62°S latitude, between 63°E and 81°E longitude, and it extends over 500 km from east to west and 2,100 km from north to south for a total surface area of 2 × 10⁶ km² (Cottin et al. 2011).
The Kerguelen Plateau is subdivided into the Kerguelen Islands shelf in the north and the Heard and McDonald Islands shelf in the south. The two shelves are separated by a controlling oceanographic barrier, the Polar Front, the position of which has been repeatedly discussed (Park et al. 2014). Topography and currents also strongly control other environmental parameters (temperature, salinity, chlorophyll a concentration) in the vicinity of the Plateau (Graham et al. 2012, Chacko et al. 2014). The Kerguelen Plateau hosts important economic activities, notably fishing, which raise potential issues for the conservation of marine biodiversity. Exploitation of the marine living resources of the Kerguelen Plateau has been sustainably managed by CCAMLR (Commission for the Conservation of Antarctic Marine Living Resources) and by the TAAF (French Southern and Antarctic Lands) in the French EEZ (Exclusive Economic Zone) with scientific support from the Muséum national d'Histoire naturelle of Paris since 1978 (Duhamel and Williams 2011). In the Australian EEZ, in the south, a similar management system was established in 1979 and was followed by the designation in 2002 of the Heard Island and McDonald Islands (HIMI) Marine Protected Area, one of the world's largest MPAs with an area of 65,000 km² (Welsford et al. 2011).
The Kerguelen Plateau represents a vast marine area challenged by strong anthropogenic and natural pressures. Relatively few scientific programs have studied marine biodiversity of the Kerguelen Plateau, leaving it poorly documented. In this context, environmental descriptors could prove to be useful proxies to infer species distribution when occurrence data are missing (Hemery et al. 2011).
In addition to the study of collection specimens sampled during historical cruises and identified at species level, the present work also provides original data collected during the recent oceanographic campaign POKER II (2010) and during three summer field campaigns of the IPEV program 1044 PROTEKER (2013-2015) led in nearshore areas of the Kerguelen Islands. The spatial extent of the dataset was based on the bathymetric range of echinoids, so that species distribution modelling can be performed with limited extrapolation.

Design description
Our project aimed at improving the robustness of existing modelling approaches in the case of areas for which only poor and heterogeneous biodiversity data are available, a situation prevailing in the region of the Kerguelen Plateau, and generally in the Southern Ocean (Gutt et al. 2012).
Data compilation from various sources implies temporal heterogeneities that may constitute a critical point when building species distribution models (Aguiar et al. 2015). Spatial and sampling heterogeneities are also likely to introduce biases due to differences in sampling strategies and in the gears used during the various cruises. Our objectives were (1) to assess the influence of temporal, spatial, and sampling heterogeneities on species distribution modelling using datasets of echinoid occurrences on the Kerguelen Plateau, (2) to model echinoid distribution on the Kerguelen Plateau for different time periods, and (3) to evaluate potential shifts in species distribution with regard to future projections based on IPCC scenarios (Jueterbock et al. 2013).

Data description
Occurrence data were compiled from many oceanographic campaigns led over a long time period, starting with the Challenger Expedition in 1872 and ending with the recent PROTEKER campaigns that took place between 2013 and 2015 (Table 1). The dataset was modified after  and Saucède et al. (2015a). Specimens from recent cruises (POKER II and PROTEKER) were identified at species level and added to the dataset. Occurrences are presence-only data for which different sampling tools, protocols, and strategies were used. Moreover, the study area was unevenly investigated, sampling effort being stronger in the northern than in the southern part of the Plateau (Figure 1). Accordingly, campaigns and sampling dates are mentioned in the dataset so that spatial and temporal heterogeneities can be taken into account.
The environmental descriptors provided in the dataset were compiled from different sources (Table 2; see Annex). They were selected according to their relevance to echinoid ecology.
Environmental data were formatted with R 3.3.0 (R Core Team 2015) to fit the area where echinoids occur on the Kerguelen Plateau (+63°/+81°E; -46°/-56°S). They were set up to a 0.1° grid-cell spatial resolution with origin fixed at 0 (top left corner). Seafloor temperature, salinity, oxygen and nutrient concentration data were generated from the data of the World Ocean Database (Boyer et al. 2013) combined with depth data. In marine nearshore areas, grid-cells with positive depth values (above sea level) were corrected for accuracy using the ArcGIS Raster Editor Tool (ESRI 2011), based on geographic charts (IGN: National Geographic Institute, EAN: 3282110102707, scale 1/200 000) and raw depth values measured in the field (Saucède et al. 2015b).
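The gridding convention described above can be illustrated with a short sketch. It is not the authors' R script: it is a minimal Python example that assumes the extent quoted in the text (63°E to 81°E, 46°S to 56°S), 0.1° cells, and an origin at the top left corner, so that row 0 is the northernmost row and column 0 the westernmost column. The function name `cell_index` is hypothetical.

```python
import math

# Extent and resolution taken from the text; the row/column convention
# (row 0 = north, column 0 = west) is an assumption of this sketch.
LON_MIN, LON_MAX = 63.0, 81.0
LAT_MAX, LAT_MIN = -46.0, -56.0
CELLS_PER_DEGREE = 10  # 0.1 degree grid cells

N_COLS = int((LON_MAX - LON_MIN) * CELLS_PER_DEGREE)  # 180 columns
N_ROWS = int((LAT_MAX - LAT_MIN) * CELLS_PER_DEGREE)  # 100 rows


def cell_index(lon, lat):
    """Return (row, col) of the cell containing a point, or None if outside."""
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN < lat <= LAT_MAX):
        return None
    # A small epsilon keeps points lying on a cell edge from being pushed
    # into the previous cell by floating-point rounding.
    col = math.floor((lon - LON_MIN) * CELLS_PER_DEGREE + 1e-9)
    row = math.floor((LAT_MAX - lat) * CELLS_PER_DEGREE + 1e-9)
    # Clamp so that points extremely close to the outer edge stay in the grid.
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)
```

Under these assumptions the grid has 100 rows and 180 columns, and an occurrence record can be matched to the environmental value stored at `grid[row][col]`.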
The time coverage of the environmental data extends from 1955 to 2012. Mean annual surface and seafloor temperatures, salinity and their respective amplitudes (i.e., amplitude between mean summer (January to March) and mean winter (July to September) surface and seafloor temperatures and salinities) are available for the following six decades: 1955 to 1964, 1965 to 1974, 1975 to 1984, 1985 to 1994, 1995 to 2004, and 2005 to 2012. Future projections of sea surface temperature, salinity, and amplitude were downloaded from the Bio-ORACLE database (Tyberghein et al. 2012). Projections are based on the IPCC A2, A1B, and B1 scenarios published in the 4th IPCC report (2007). The modelled data correspond to the extrapolated means for two decades: 2087-2096 (here referred to as 2100) and 2187-2196 (here referred to as 2200) (Jueterbock et al. 2013).
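The amplitude definition and the decadal binning can be made concrete with a small sketch. This is an illustration only, not the processing chain used for the dataset: it assumes that the amplitude is computed as the summer mean minus the winter mean for a single grid cell, and the names `seasonal_amplitude` and `decade_of` are hypothetical.

```python
from statistics import mean

SUMMER = (1, 2, 3)  # January to March (austral summer)
WINTER = (7, 8, 9)  # July to September (austral winter)

# Decadal bins as listed in the text; the last bin is shorter (2005 to 2012).
DECADES = [(1955, 1964), (1965, 1974), (1975, 1984),
           (1985, 1994), (1995, 2004), (2005, 2012)]


def seasonal_amplitude(monthly):
    """Summer mean minus winter mean for one cell.

    `monthly` maps a month number (1-12) to a value such as temperature.
    The sign convention (summer minus winter) is an assumption.
    """
    summer = mean(monthly[m] for m in SUMMER)
    winter = mean(monthly[m] for m in WINTER)
    return summer - winter


def decade_of(year):
    """Return the label of the bin containing `year`, or None if not covered."""
    for start, end in DECADES:
        if start <= year <= end:
            return f"{start}-{end}"
    return None
```

For example, a cell with summer values of 4, 5 and 6 °C and winter values of 1, 2 and 3 °C has an amplitude of 3 °C under this convention.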
All the environmental descriptors and metadata sources are detailed in the data catalog (Table 2), and data are provided in ASCII raster format. N/A was set as the no-data value for missing data.
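As an illustration of how such files might be read, the sketch below parses an ASCII raster and turns N/A cells into `None`. It assumes that the files follow the common Esri ASCII grid layout (six header lines, then one line of values per row); the actual header of the published files should be checked before relying on it. The function name `parse_ascii_grid` and the sample values are made up for the example.

```python
def parse_ascii_grid(text):
    """Parse an Esri-style ASCII grid into (header dict, list of rows).

    Assumes six 'key value' header lines, including NODATA_value.
    Cells equal to the no-data token are returned as None.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = {}
    for ln in lines[:6]:
        key, value = ln.split()
        header[key.lower()] = value
    nodata = header["nodata_value"]
    rows = [
        [None if tok == nodata else float(tok) for tok in ln.split()]
        for ln in lines[6:]
    ]
    # Basic shape check against the declared grid dimensions.
    if len(rows) != int(header["nrows"]):
        raise ValueError("row count does not match header")
    if any(len(r) != int(header["ncols"]) for r in rows):
        raise ValueError("column count does not match header")
    return header, rows


# Made-up sample: a 3 x 2 grid with two missing cells.
SAMPLE = """ncols 3
nrows 2
xllcorner 63.0
yllcorner -56.0
cellsize 0.1
NODATA_value N/A
1.5 N/A 2.0
N/A 3.25 4.0
"""

header, rows = parse_ascii_grid(SAMPLE)
```

Keeping missing cells as `None` rather than a numeric sentinel avoids accidentally averaging the no-data value into summary statistics.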

Quality control description
Specimens sampled during the POKER II and PROTEKER 2, 3 and 4 campaigns were all identified by T. Saucède at the species level. Identifications and taxonomic accuracies are based on Anderson (2009). The final compiled dataset was checked for consistency using the WoRMS database (WoRMS Editorial Board 2016) in order to match our data with the most up-to-date taxonomy. The dataset was checked for duplicates and for errors due to overlapping origins, georeferencing mistakes, and species synonymy or misspelling. Only occurrence data identified at the species level were included.
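A duplicate check of the kind described above could look like the following sketch. It is not the procedure used by the authors: the record fields (`species`, `lat`, `lon`, `campaign`), the rounding precision, and the synonym table are all hypothetical, and the coordinates in the example are invented.

```python
def normalise_name(name, synonyms):
    """Collapse extra whitespace and map a synonym to its accepted name."""
    cleaned = " ".join(name.split())
    return synonyms.get(cleaned, cleaned)


def deduplicate(records, synonyms=None, decimals=4):
    """Keep the first record for each (species, lat, lon, campaign) key.

    Coordinates are rounded so that tiny formatting differences between
    sources do not hide a duplicate.
    """
    synonyms = synonyms or {}
    seen = set()
    unique = []
    for rec in records:
        species = normalise_name(rec["species"], synonyms)
        key = (species, round(rec["lat"], decimals),
               round(rec["lon"], decimals), rec["campaign"])
        if key in seen:
            continue
        seen.add(key)
        unique.append({**rec, "species": species})
    return unique


# Invented example records; "Examplus oldname" is a placeholder, not a real taxon.
records = [
    {"species": "Ctenocidaris nutrix", "lat": -49.5, "lon": 70.2, "campaign": "POKER II"},
    {"species": "Ctenocidaris  nutrix ", "lat": -49.5, "lon": 70.2, "campaign": "POKER II"},
    {"species": "Examplus oldname", "lat": -50.0, "lon": 71.0, "campaign": "PROTEKER 2"},
]
cleaned = deduplicate(records, synonyms={"Examplus oldname": "Examplus newname"})
```

In practice the synonym table would be derived from a taxonomic authority such as WoRMS rather than written by hand.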
Environmental data rely on different sources, as reported in Table 2. The range of each variable was examined to check its consistency. Data were not interpolated, in order to limit interpolation biases, and missing data were reported as N/A values.
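A range check on a single variable can be sketched as follows. The function `summarise_layer` and the plausibility bounds passed to it are hypothetical; the sketch only shows the idea of counting missing cells and flagging values outside an expected interval, with missing cells represented as `None`.

```python
def summarise_layer(values, plausible_min, plausible_max):
    """Summarise one variable: missing count, observed range, outliers.

    `values` is a flat list of cell values in which missing cells are None.
    The bounds are supplied by the user and are assumptions, not part of
    the dataset.
    """
    present = [v for v in values if v is not None]
    outliers = [v for v in present if not plausible_min <= v <= plausible_max]
    return {
        "n_missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "n_out_of_range": len(outliers),
    }
```

A layer whose observed minimum or maximum falls outside the expected interval would then be inspected by hand rather than corrected automatically, which is consistent with the choice not to interpolate.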

General taxonomic coverage description
The present dataset focuses on all species of the class Echinoidea (Echinodermata) occurring on the Kerguelen Plateau. Echinoids are common in benthic communities of the Southern Ocean and of the Kerguelen Plateau (David et al. 2005). They are diversified and well studied. Historical data are available since 1872, starting with the Challenger Expedition, and are complemented with recent occurrences collected in nearshore areas of the Kerguelen Islands during the PROTEKER campaigns (2013-2015).
Echinoid studies are relevant to conservation issues. Ctenocidaris nutrix is considered a Vulnerable Marine Ecosystem (VME) indicator species by CCAMLR (Commission for the Conservation of Antarctic Marine Living Resources) and is widely distributed on the Kerguelen Plateau.
On the Kerguelen Plateau, the Class Echinoidea includes five families, ten genera, and 12 species. Species distribution is shown in Figure 2.

Dataset of actual environmental parameters description
Environmental variables in the region of the Kerguelen Plateau were compiled from different sources and are provided in ASCII raster format (Guillaumot et al. 2016). Mean surface and seafloor temperature, salinity and their respective amplitude data are available for the time coverage 1955-2012 and over six decades: 1955 to 1964, 1965 to 1974, 1975 to 1984, 1985 to 1994, 1995 to 2004, and 2005 to 2012. Future projections are provided for several parameters: they were modified after the Bio-ORACLE database (Tyberghein et al. 2012). They are based on three IPCC scenarios (B1, A1B, A2) for years 2100 and 2200 (IPCC, 4th report).