Taxonomic revision of the Malagasy Nesomyrmex madecassus species-group using a quantitative morphometric approach

Abstract
Here we reveal the diversity of a further segment of the Malagasy elements of the ant genus Nesomyrmex using a combination of advanced exploratory analyses of quantitative morphological data. The diversity of the Nesomyrmex madecassus species-group was assessed via hypothesis-free nest centroid clustering combined with recursive partitioning to estimate the number of clusters and determine the most probable boundaries between them. This combination of methods provides a highly automated species delineation protocol based on continuous morphometric data, and thereby obviates the need for subjective interpretation of morphological patterns. Delimitations of clusters recognized by these exploratory analyses were tested via confirmatory Linear Discriminant Analysis (LDA). Our results suggest the existence of four morphologically distinct species, Nesomyrmex flavus sp. n., Nesomyrmex gibber, Nesomyrmex madecassus and Nesomyrmex nitidus sp. n.; all are described here, and an identification key for their worker castes using morphometric data is given. Two members of the newly outlined madecassus species-group, Nesomyrmex flavus sp. n. and Nesomyrmex nitidus sp. n., represent true cryptic species. Geographic maps depicting species distributions and elevational information for the sites where populations of particular species were collected are also provided.


Introduction
The ant fauna of the Malagasy zoogeographical region, i.e. Madagascar and its surrounding islands (Bolton 1994), has recently been the subject of intensive systematic research (Fisher 2009, Blaimer and Fisher 2013, Yoshimura and Fisher 2012, Hita-Garcia and Fisher 2014). Thanks to these efforts to explore Malagasy biodiversity, our knowledge of the island's myrmecofauna has increased considerably. These latest findings support earlier assumptions about the high species diversity of the region. The goal of the current paper is to contribute to this endeavor and clarify the taxonomy of another segment of the Malagasy Nesomyrmex fauna, the Nesomyrmex madecassus species-group. The four species in this group are known to nest above ground in dead twigs of small (pencil-size) diameter. They can be found foraging on tree trunks and occasionally in the leaf litter at higher elevations. There are also occasional records of nests in rotten logs at higher elevations. In general, however, the best approach to collecting these species is to break open small dead twigs. We know little of their biology, but field observations suggest they are generalist scavengers. Morphological diversity is assessed via the taxonomic protocol NC-PART clustering, introduced by Fisher (2016a, 2016b) and based on multivariate analyses of quantitative morphological data. This method incorporates elements of NC-clustering and the partitioning algorithm known as 'part'. The benefits of the combined application of Nest Centroid clustering (NC clustering) and the Partitioning Algorithm based on Recursive Thresholding (PART) were described in detail in Fisher (2016a, 2016b), and its efficiency in species delimitation has been proven in two Nesomyrmex species-groups and in a fragment of the Malagasy Camponotus fauna (Rakotonirina et al. 2016). NC clustering searches for discontinuities in morphometric data by sorting all similar cases into clusters in a two-step procedure.
This technique has proved efficient at pattern recognition within large and complex datasets, but the number of clusters is still subjectively defined based on the obtained dendrogram. The partitioning method PART allows estimation of the number of clusters via recursive application of the Gap statistic algorithm (Tibshirani et al. 2001) and automated assignment of each sample to one of the clusters.
Multivariate evaluation of morphological data has revealed that the N. madecassus species-group comprises four well-outlined clusters in the Malagasy zoogeographical region, all representing species. Two of them, Nesomyrmex gibber (Donisthorpe, 1946) and N. madecassus (Forel, 1892), are already described taxa, but two new species, N. flavus sp. n. and N. nitidus sp. n., are described here based on the worker caste. The latter two species represent true cryptic species (Seifert 2009), which can be convincingly separated by using a combination of morphometric data. We provide a combined key: a traditional, character-based key in which the separation of the two cryptic taxa, N. flavus sp. n. and N. nitidus sp. n., is supported by a character combination. Morphological patterns are linked to geographic maps, and elevations of the sites where populations were collected are also provided as predictor variables.

Material and methods
The group was established earlier by Csősz and Fisher (2015) as one of the four remarkable lineages occurring in the region and was defined as follows: "Pronotal spines absent. Anterodorsal spines on petiolar node absent. Propodeal spines short, lamelliform to absent. Vertex ground sculpture smooth. Vertex main sculpture not defined. Metanotal depression present. Median clypeal notch present or absent. Median clypeal notch shape/depth 0-15 µm. Antennomere count: 12. Absolute cephalic size (CS): 571 µm [405, 785]." In the present study, 18 continuous morphometric traits were recorded in 231 worker individuals belonging to 172 nest samples collected in the Malagasy region.
The material is deposited in the following institutions, abbreviations after Evenhuis (2013). All images and specimens used in this study are available online on AntWeb (http://www.antweb.org). Images are linked to their specimens via the unique specimen code affixed to each pin (e.g. CASENT0101667). Online specimen identifiers follow this format: http://www.antweb.org/specimen/CASENT0101667. Digital color montage images were created using a JVC KY-F75 digital camera and Syncroscopy Auto-Montage software (version 5.0), or a Leica DFC 425 camera in combination with the Leica Application Suite software (version 3.8). Distribution maps were generated in R (R Core Team 2015) via the 'phylo.to.map' function of the package phytools (Revell 2012).
The measurements were taken with a Leica MZ 12.5 stereomicroscope equipped with an ocular micrometer at a magnification of 100×. Measurements and indices are presented as arithmetic means with minimum and maximum values in parentheses. Body size dimensions are expressed in µm. Because worker individuals are far more abundant than queen and male specimens, the present revision is based on the worker caste only. A worker-based revision is further facilitated by the fact that the name-bearing type specimens of the vast majority of existing ant taxa belong to the worker caste. All measurements were made by the first author. For the definition of morphometric characters, earlier protocols (Csősz and Fisher 2015, 2016a, 2016b) were considered. Explanations and abbreviations for measured characters are as follows:

CL
Maximum cephalic length in median line. The head must be carefully tilted to the position providing the true maximum. Excavations of hind vertex and/or clypeus reduce CL.

CW
Maximum width of the head. Includes compound eyes.

CWb
Maximum width of head capsule without the compound eyes. Measured just posterior of the eyes.

CS
Absolute cephalic size. The arithmetic mean of CL and CWb.

EL
Maximum diameter of the compound eye.

FRS
Frontal carina distance. Distance of the frontal carinae immediately caudal of the posterior intersection points between frontal carinae and the torular lamellae. If these dorsal lamellae do not laterally surpass the frontal carinae, the deepest point of the scape corner pits may be taken as the reference line. These pits take up the inner corner of the scape base when the scape is directed caudally and produce a dark triangular shadow in the lateral frontal lobes immediately posterior to the dorsal lamellae of the scape joint capsule.

ML (Weber length)
Mesosoma length from caudalmost point of propodeal lobe to transition point between anterior pronotal slope and anterior pronotal shield. Preferentially measured in lateral view; if the transition point is not well defined, use dorsal view and take the center of the dark-shaded borderline between pronotal slope and pronotal shield as anterior reference point. In gynes: length from caudalmost point of propodeal lobe to the most distant point of steep anterior pronotal face.

MW
Mesosoma width. In workers MW is defined as the longest width of the pronotum in dorsal view excluding the pronotal spines.

MPST
Maximum distance from the center of the propodeal stigma to the anteroventral corner of the ventrolateral margin of the metapleuron.

NOH
Maximum height of the petiolar node. Measured in lateral view from the uppermost point of the petiolar node perpendicular to a reference line extending from the petiolar spiracle to the imaginary midpoint of the transition between dorso-caudal slope and dorsal profile of caudal cylinder of the petiole.

NOL
Length of the petiolar node. Measured in lateral view from the center of petiolar spiracle to dorso-caudal corner of caudal cylinder. Do not erroneously take as the reference point the dorso-caudal corner of the helcium, which is sometimes visible.

PEH
Maximum petiole height. The chord of the ventral petiolar profile at node level is the reference line perpendicular to the line describing the maximum height of petiole.

PEL
Diagonal petiolar length in lateral view; measured from anterior corner of subpetiolar process to dorso-caudal corner of caudal cylinder.

In verbal descriptions of taxa based on external morphological traits, recent taxonomic papers (Fisher 2015, 2016) were considered. Definitions of surface sculpturing are linked to Harris (1979). Body size is given in µm; means of morphometric ratios as well as minimum and maximum values are given in parentheses with up to three digits. Inclinations of pilosity are given in degrees. Definitions of species-groups as well as descriptions of species are presented in alphabetic order.
Statistical framework: hypothesis formation and testing. The present statistical framework follows the procedure applied in Fisher (2016a, 2016b). Advantages and limitations of the procedure are discussed there.
Generating prior species hypotheses via the combined application of NC clustering and PART. This method searches for discontinuities in continuous morphometric data and sorts all similar cases into the same cluster in a two-step procedure. The first step reduces the dimensionality of the data with cumulative linear discriminant analysis (LDA) using nest samples as groups (individuals collected from the same nest are assumed to be genetically closely related, often sisters). The second step calculates pairwise distances between samples using LD scores as input, and the distance matrix is displayed as a dendrogram. NC clustering was done via the packages cluster (Maechler et al. 2014) and MASS (Venables and Ripley 2002).
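The second step of this procedure can be illustrated with a minimal sketch, written in Python rather than the R packages used in the study. It assumes that discriminant scores are already available for each individual (the LDA step is not reproduced), averages them per nest, and merges nests by average linkage. The function names, nest labels and score values are hypothetical and serve only as an illustration.

```python
import math
from collections import defaultdict

def nest_centroids(records):
    """Average the score vectors of individuals that share a nest label."""
    grouped = defaultdict(list)
    for nest, scores in records:
        grouped[nest].append(scores)
    return {
        nest: [sum(col) / len(col) for col in zip(*rows)]
        for nest, rows in grouped.items()
    }

def average_linkage(points, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until only n_clusters remain (agglomerative, average linkage)."""
    clusters = [[name] for name in points]

    def mean_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [
            (mean_dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical per-individual scores (assumed to come from a prior LDA step).
records = [
    ("nest1", [3.1, 0.2]), ("nest1", [2.9, 0.4]),
    ("nest2", [3.3, 0.1]),
    ("nest3", [-2.4, 0.3]),
    ("nest4", [-2.0, -0.1]), ("nest4", [-2.2, 0.1]),
]
centroids = nest_centroids(records)
clusters = average_linkage(centroids, n_clusters=2)
print(clusters)
```

In this sketch the number of clusters is passed in by hand; in the protocol described here that number is estimated by PART rather than chosen by the user.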
The ideal number of clusters was determined by the Partitioning Algorithm based on Recursive Thresholding via the package clusterGenomics using the function 'part', which also assigns observations (i.e. specimens or samples) to partitions. The method estimates the number of clusters in a dataset based on recursive application of the Gap statistic (Tibshirani et al. 2001) and is able to discover both top-level clusters and sub-clusters nested within the main clusters. If more than one cluster is returned by the Gap statistic, it is re-optimized on each subset of cases corresponding to a cluster until a stopping threshold is reached or the subset under evaluation has fewer than 2*minSize cases. Two clustering methods, "hclust" and "kmeans", are used to determine the optimal number of clusters with 1000 bootstrap iterations. The results of PART are mapped on the dendrogram by colored bars via the function 'mark.dendrogram' (Beleites and Sergo 2015). The script is written in R and can be found in the Supporting Information. The script is published by Fisher (2016a, 2016b) and is freely accessible.
Arriving at final species hypothesis using confirmatory Linear Discriminant Analysis (LDA) and LDA ratio extractor. To increase the reliability of species delimitation, hypotheses on clusters and the classification of cases by the exploratory processes were confirmed by LDA with leave-one-out cross-validation (LOOCV). Classification hypotheses were imposed for all samples congruently classified by the partitioning methods, while wild-card settings (i.e. no prior hypothesis imposed on classification) were given to samples that were incongruently classified by the two methods or proved to be outliers.
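The leave-one-out logic can be sketched as follows. This is a simplified illustration in Python, not the script used in the study: a nearest-centroid classifier stands in for LDA so that the example needs only the standard library, and the labels and measurements are hypothetical. Each sample is held out in turn, the classifier is fitted to the remaining samples, and the held-out sample is then classified.

```python
import math

def nearest_centroid(train, query):
    """Assign query to the label whose training centroid is closest.
    NOTE: this is a stand-in for LDA, used only to keep the sketch short."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(samples):
    """Hold out each sample once and return the fraction classified correctly."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid(train, features) == label:
            hits += 1
    return hits / len(samples)

# Hypothetical two-trait measurements (in micrometers) with prior cluster labels.
samples = [
    ([500.0, 380.0], "A"), ([510.0, 385.0], "A"), ([495.0, 378.0], "A"),
    ([420.0, 390.0], "B"), ([430.0, 395.0], "B"), ([425.0, 388.0], "B"),
]
accuracy = loocv_accuracy(samples)
print(accuracy)
```

Samples given wild-card settings would simply be left out of the labelled training set and classified afterwards; that step is omitted here.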
Interpreting discriminant functions as identification tools. In this paper discriminant function analysis is used to determine which variables discriminate between two or more cryptic species. The discriminant functions (D2 and D4) provided in the key and differential diagnoses offer a moderately time-consuming but accurate means of identifying every single individual. The discriminant functions take the following linear form: $D_m = a_1 x_1 + \dots + a_m x_m + c$, where $c$ is a constant, $a_1$ through $a_m$ are the characters in micrometers and $x_1$ through $x_m$ are coefficients. The equation must be calculated with the trait names (e.g. SL) substituted with the length of the corresponding traits in micrometers (e.g. 625). The dimensionless number ($D_m$) returned by the equation falls within the score range of one of the species, which indicates the identity of that particular individual.

Results
Altogether, four clusters were recognized by both clustering algorithms, "hclust" and "kmeans", using the function 'part'. The pattern returned by these partitioning algorithms fits the hierarchical structure seen on the dendrogram generated by NC clustering (Fig. 1). The grouping hypotheses generated by the combination of hypothesis-free exploratory analyses were validated by Linear Discriminant Analysis with leave-one-out cross-validation (LOOCV-LDA). The overall classification success is 98% (Table 1), hence the four-cluster solution is accepted as the final species hypothesis. The four species described here are as follows, in alphabetic order: N. flavus sp. n., Nesomyrmex gibber (Donisthorpe, 1946), N. madecassus (Forel, 1892) and N. nitidus sp. n. Two of the four morphologically diagnosable OTUs, gibber and madecassus, differ in many qualitative characters (e.g. shape of propodeal spines, petiolar node, surface sculpturing etc.), but the two others, flavus and nitidus, represent true cryptic species in the sense of Seifert (2009). Morphometric data for species calculated on individuals are given in Table 2. Three of the four species, N. flavus sp. n., N. madecassus (Forel, 1892) and N. nitidus sp. n., occur in Madagascar, exhibiting different but overlapping geographic distributions (Fig. 2) and elevational ranges (Fig. 3). Nesomyrmex gibber is known to occur only in Mauritius.

[Figure 1. Dendrogram of agnes(x = mpredlda, method = "average"); agglomerative coefficient = 0.74.]

Diagnosis. Workers of N. flavus cannot be confused with those of N. gibber because the conspicuous mesothoracic hump, a diagnostic character of the latter species, is absent in N. flavus workers. This species can be easily separated from dark phenotypes of N. madecassus by color: the dark madecassus phenotypes are dark brown, whereas the workers of N. flavus are light yellow. The morphometric ratio PoOC/CW and the discriminant function D4 help to separate N. flavus from ocher madecassus phenotypes; further details are given in the diagnosis under N. madecassus.

Synopsis of Malagasy members of the
The workers of this species are most similar to those of N. nitidus. The elevational distribution of the two species may provide hints useful for separation (Fig. 3), but the ranges broadly overlap. These taxa represent true cryptic species that cannot be identified based on qualitative characters (i.e. sculpture, shape or color) and, because their ranges overlap, ratios cannot be used for identification either. Therefore, only a discriminant D2 function with a greatly reduced character set yields complete separation (morphometric data are in micrometers):
D2 = +0.0847*SL - 0.0625*MW - 15.038
flavus D2 (n = 61) = +3.09 [+0.98, +5.33]
nitidus D2 (n = 79) = -2.39 [-4.63, +0.19]
For now, this remains the simplest method available to separate workers of these two taxa, but in the future, when more information about these species has been accumulated, we hope to find a reliable and easy-to-use diagnostic trait.
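As a worked illustration, the D2 function can be evaluated for a single worker as in the following sketch. The coefficients and score ranges are copied from the text above; the function names and the SL and MW values are hypothetical and are not taken from the measured material.

```python
# Coefficients and reported score ranges are copied from the diagnosis above.
# Function and variable names are illustrative assumptions.
D2_SL, D2_MW, D2_CONST = 0.0847, -0.0625, -15.038
D2_RANGES = {
    "flavus": (0.98, 5.33),
    "nitidus": (-4.63, 0.19),
}

def d2_score(sl_um, mw_um):
    """Return the D2 score for SL and MW given in micrometers."""
    return D2_SL * sl_um + D2_MW * mw_um + D2_CONST

def classify_d2(score):
    """Return the species whose reported D2 range contains the score.
    Returns None when the score falls outside both reported ranges."""
    for species, (low, high) in D2_RANGES.items():
        if low <= score <= high:
            return species
    return None

# Hypothetical worker: SL = 500 µm, MW = 380 µm.
score = d2_score(sl_um=500.0, mw_um=380.0)
# 0.0847*500 - 0.0625*380 - 15.038 is about 3.562, inside the flavus range.
print(round(score, 3), classify_d2(score))
```

A score that falls in the gap between the two reported ranges (between +0.19 and +0.98) is returned as None rather than forced into either species; how such cases should be treated is left open here.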
Biology and distribution. Nesomyrmex flavus is known to occur in Madagascar's rain forests at high altitudes, between 200 and 1755 m (mean: 1190 m; Fig. 3). It forages in low vegetation, and nests can often be found in dead twigs. It has occasionally been collected in leaf litter (leaf mold, rotten wood) or in rotten tree stumps.
Biology and distribution. Nesomyrmex gibber is endemic to the island of Mauritius. It occurs in rainforests at higher altitudes, between 500 and 800 m (mean: 714 m). This species can be collected on low vegetation and in dead stems.
Diagnosis. Workers of N. madecassus differ from those of N. gibber by having no mesothoracic hump, and from those of N. flavus sp. n. and N. nitidus sp. n. by their dark brown color versus the light yellow hue of the latter two species.
The dark color in madecassus populations is dominant across the entire known distributional area and comprises ~95% of the examined material. However, a rare, lighter-colored madecassus phenotype (ocher phenotype) was also found in a few localities. There is no evidence, other than color, that would support heterospecificity of these two discrete phenotypes of N. madecassus workers, and no correlation was found between elevation and color. Only one mixed sample is known to include both ocher and dark phenotypes. Ocher madecassus phenotypes are darker than the majority of N. flavus and N. nitidus workers and also differ from the latter species by having brown femora and a dark patch on the first gastral tergite.
Nesomyrmex madecassus workers (including ocher phenotypes) can be separated from those of N. flavus and N. nitidus using the ratio of postocular area to cephalic width including compound eyes (PoOC/CW), which yielded only three misclassified cases:
Biology and distribution. This species is known to occur in Madagascar's rain forests at very high altitudes, between 690 and 2150 m (mean: 1538 m; Fig. 3). It forages in low vegetation, and nests can often be found in dead twigs, or rarely in leaf litter (leaf mold, rotten wood) or in rotten tree stumps.

Nesomyrmex nitidus
Diagnosis. Workers of N. nitidus cannot be confused with those of N. gibber because the conspicuous mesothoracic hump, a diagnostic character of the latter species, is absent in N. nitidus workers. This species can also be easily separated from dark phenotypes of N. madecassus based on color: the dark madecassus phenotypes are dark brown, whereas the workers of N. nitidus are light yellow. The morphometric ratio PoOC/CW and the discriminant function D4 help to separate N. nitidus from ocher madecassus phenotypes; further details are given in the diagnosis under N. madecassus.
The workers of this species are most similar to those of N. flavus. The broadly overlapping elevational distribution as well as the qualitative and quantitative traits of N. flavus and N. nitidus workers hamper easy separation. A simplified discriminant D2 function with a greatly reduced character set for safe separation is provided in the diagnosis section of N. flavus.
Biology and distribution. This species typically occurs in Madagascar's rain forests at lower altitudes, between 10 and 1550 m (mean: 383 m; Fig. 3). It forages in low vegetation, and nests can often be found in dead twigs and stems above ground, or rarely in rotten logs at higher elevations.
drasana, Dimby Raharinjanahary). Special thanks are due to Flavia Esteves for help with R. This study was supported by the National Science Foundation under Grant Nos. DEB-0072713, DEB-0344731, and DEB-0842395. Finally, SC was supported by the Schlinger Fellowship at the California Academy of Sciences and an Ernst Mayr Travel Grant to the MCZ.