DNA barcoding reveals twelve lineages with properties of phylogenetic and biological species within Melitaea didyma sensu lato (Lepidoptera, Nymphalidae)

Abstract The complex of butterfly taxa close to Melitaea didyma includes the traditionally recognized species Melitaea didyma, Melitaea didymoides and Melitaea sutschana, the taxa that were recognized as species only relatively recently (Melitaea latonigena, Melitaea interrupta, Melitaea chitralensis and Melitaea mixta) as well as numerous described subspecies and forms with unclear taxonomic status. Here analysis of mitochondrial DNA barcodes is used to demonstrate that this complex is monophyletic group consisting of at least 12 major haplogroups strongly differentiated with respect to the gene COI. Six of these haplogroups are shown to correspond to six of the above-mentioned species (Melitaea didymoides, Melitaea sutschana, Melitaea latonigena, Melitaea interrupta, Melitaea chitralensis and Melitaea mixta). It is hypothesized that each of the remaining six haplogroups also represents a distinct species (Melitaea mauretanica, Melitaea occidentalis, Melitaea didyma, Melitaea neera, Melitaea liliputana and Melitaea turkestanica), since merging these haplogroups would result in a polyphyletic assemblage and the genetic distances between them are comparable with those found between the other six previously recognized species.


Introduction
The complex of butterfly taxa close to Melitaea didyma (Esper, 1779) is widely distributed in the Palaearctic region. This complex includes the traditionally recognized species M. didyma, M. didymoides Eversmann, 1847 andM. sutschana Staudinger, 1892 , the taxa that were recognized as species only recently (M. latonigena Eversmann, 1847, M. interrupta Colenati, 1846 , M. chitralensis Moore, 1901 andM. mixta Evans, 1912) as well as numerous described subspecies and forms with unclear taxonomic status (Higgins 1941, 1955, Hesselbarth et al. 1995, Kolesnichenko 1999, Kolesnichenko et al. 2011). All these taxa are similar in male and female wing pattern and genitalia structure (Higgins 1941). In our opinion, this complex does not include the species M. deserticola Oberthür, 1909, M. ala Staudinger, 1881, M. enarea Frühstorfer, 1917and M. persea Kollar, 1849 which are similar to M. didyma in wing color and pattern but were shown to be distinctly different with respect to genitalia structure (Higgins 1941). The first significant review of this complex was published by Higgins (1941Higgins ( , 1955 in frame of a complete revision of the genus Melitaea. Recently the genus Melitaea was revised by Oorschot and Coutsis (2014). The taxa within the M. didyma complex have a strong morphological variation between individuals of different generations and indistinct clinal variability in wing size and color from north to south (Lvovsky and Morgun 2007). Available cytogenetic (Lukhtanov and Kuznetsova 1989), morphological (Lvovsky and Morgun 2007, Kolesnichenko et al. 2011, Oorschot and Coutsis 2014 and molecular (Wahlberg and Zimmermann 2000, Lukhtanov et al. 2009, Dincă et al. 2015 data show that the M. didyma species complex requires a more detailed taxonomic revision. Here analysis of mitochondrial DNA barcodes is used to demonstrate that this complex is a natural (monophyletic) group consisting of at least 12 major haplogroups strongly differentiated with respect to the gene COI. Then the taxonomy of the M. didyma species complex is discussed.

Material and methods
Standard COI barcodes (658-bp 5' segment of mitochondrial cytochrome oxidase subunit I) were studied. COI sequences were obtained from 85 specimens collected in Afghanistan, Armenia, Austria, Bulgaria, China, Israel, Kazakhstan, Kyrgyzstan, Mongolia, Morocco, Russia, Syria, Tajikistan, Turkey and Uzbekistan. Collection data of these samples are presented in the Suppl. material 1.
Legs from 24 specimens (KT792884-KT792908, see the Suppl. material 2) were processed at the Department of Karyosystematics of the Zoological Institute of the Russian Academy of Sciences. The set of voucher specimens of these butterflies is kept in the Zoological Institute of the Russian Academy of Science (St. Petersburg). DNA was extracted from a single leg removed from each voucher specimen. For DNA extraction we used the GeneJet Genomic DNA Purification Kit (Fermentas) in accordance with the manufacturer's instructions. Extracted DNA samples were stored at -20 °C.
For DNA amplification we used primers LepF 5'-ATTCAACCAATCATAAA-GATATTGG-3' and LepR (5'-TAAACTTCTGGATGTCCAAAAAATCA-3' (de-Waard et al. 2008). The polymerase chain reaction (PCR) was carried out in 25-mL reactions using a DNA Engine thermal cycler (Eppendorf Mastercycler personal), and typically contained 0.5 mM of each primer, 0.8 mM dNTPs, Fermentas PCR buffer with additional MgCl2 to a final concentration of 2 mM and 1.25 units Fermentas Taq DNA polymerase. All reactions were initially denatured at 94 °C for 2 min, and then subjected to 30 cycles of 60 s at 94 °C denaturation, 60 s at 47 °C and 90 s at 72 °C extension. After amplification, double-stranded DNA was purified using GeneJet PCR Purification Kit (Fermentas). Sequencing of double-stranded product was carried out at the Research Resource Center for Molecular and Cell Technologies.
Legs from 61 specimens of Melitaea (HM404715-HM404718, KT874693-KT874751, see the Suppl. material 2) were processed at the Canadian Centre for DNA Barcoding (CCDB, Biodiversity Institute of Ontario, University of Guelph) using standard high-throughput protocol described in deWaard et al. (2008). The set of voucher specimens of these butterflies is kept at the McGuire Center for Lepidoptera and Biodiversity (University of Florida), at the Zoological Institute of the Russian Academy of Science (St. Petersburg) and in Museum for Insects, Pyatigorsk, Russia (Suppl. material 1).
The analysis involved 148 COI sequences (including outgroup). Among them there were 63 published sequences (Wahlberg and Zimmermann 2000, Vila and Bjorklund 2004, Leneveu et al. 2009, Lukhtanov et al. 2009, Dincă et al. 2011, Hausmann et al. 2011, Ashfaq et al. 2013) collected from GenBank (Suppl. material 2). Sequences were aligned using BioEdit software (Hall 1999) and edited manually. Phylogenetic hypotheses were inferred using Bayesian inference (BI), maximum-likelihood (ML) and maximum-parsimony (MP) analyses as described previously (Vershinina andLukhtanov 2010, Talavera et al. 2013a). Briefly, Bayesian analyses were performed using the program MrBayes 3.1.2 (Huelsenbeck and Ronquist 2001) with default settings as suggested by Mesquite (Maddison and Maddison 2015): burn-in=0.25, nst=6 (GTR + I + G). Two runs of 10,000,000 generations with four chains (one cold and three heated) were performed. Chains were sampled every 10,000 generations. The average value of the Potential Scale Reduction Factor (PSRF) was 1.002 and average standard deviation of split frequencies was 0.01492, to the end of the analysis indicating that convergence was achieved, and a good sample from the posterior probability distribution was obtained.
The ML trees were inferred by using MEGA6 (Tamura et al. 2013) with the nucleotide substitution model T92 (Tamura 1992) as suggested by jModelTest (Posada 2008).
MP analysis was performed using a heuristic search as implemented in MEGA6 (Tamura et al. 2013). A heuristic search was carried out using the close-neighbourinterchange algorithm with search level 3 (Nei and Kumar 2000) in which the initial trees were obtained with the random addition of sequences (100 replicates). We used non-parametric bootstrap values (Felsenstein 1985) to estimate branch support on the reconstructed ML and MP tree. Branch support was assessed using 1000 bootstrap replicates.

Results and discussion
This analysis recovered the M. didyma group as a strongly supported monophyletic clade (Fig. 1). Within this group many clades were well supported, whereas some of the relationships were not fully resolved (Figs 2 and 3). Within the complex we identified 12 differentiated major COI haplogroups. All of them showed a strict attachment to the localities (Fig. 4). Therefore in order to designate these haplogroups, we chose the oldest available name that was described from the area of each haplogroup: M. mauretanica Oberthür, 1909, M. occidentalis Staudinger, 1861, M. didyma Esper, 1779 (Larsen 1974, Benyamini 2002, Tshikolovets 2011. However, the name liliputana was preferred since ICZN states priority of the oldest available name (article 23, Principle of Priority).
Here we also report an observation of parapatry between neera and turkestanica in South Altai and Zaisan valley in East Kazakhstan (shown by green color). In this area the distribution ranges of these taxa overlap, however, the taxa are separated ecologically: M. neera is associated with the steppe biotopes and M. turkestanica is associated with deserts.
Sympatry was also found between haplogroups occidentalis and didyma sensu stricto in Spain (shown by yellow, Dincă et al. 2015). However, morphology and ecology of the bearers of these haplogroups were not analyzed in the contact zone. Therefore, evolutionary and taxonomic interpretation of this case of sympatry is difficult. It may represent sympatric distribution of two different species or may be a consequence of mitochondrial introgression between the allopatric pair occidentalis-didyma.  (Higgins 1941, Larsen 1974, Benyamini 2002, Tshikolovets 2011 and to one form (M. mauretanica) whose status (subspecies or individual variations) is unclear (Higgins 1941).
There is good evidence based on analysis of morphology and observations of taxa in sympatry that M. didymoides, M. sutschana, M. latonigena, M. interrupta, M. chitralensis and M. mixta represent true biological species (Higgins 1941, Lukhtanov and Kuznetsova 1989, Hesselbarth et al. 1995, Kolesnichenko 1999, Kolesnichenko et al. 2011. Theoretically, the remainder of the M. didyma complex can be interpreted as a single species M. didyma. However, such an interpretation meets two difficulties. Firstly, such a lumping would result in a polyphyletic assemblage. Monophyly is the basic principle of phylogenetics and taxonomy. The majority of taxonomists currently believe that monophyly, in the narrow sense used by Hennig (Hennig 1966, Envall 2008, Hörandl and Stuessy 2010 is mandatory. Thus avoiding non-monophyletic groups and focusing on monophyletic entities is the preferable option in practical terms (Talavera et al. 2013b). The COI barcodes alone can provide weak evidence for monophyly of taxa since trees inferred from single markers sometimes display relationships that reflect the evolutionary histories of individual genes rather than the species being studied. Mitochondrial introgression ) and Wolbachia infection (Ritter et al. 2013) can lead to additional bias in inferring phylogenetic relationships. Despite these limitations, we argue that, until not falsified, clusters based on DNA barcode monophyly represent preferable primary taxonomic hypotheses than the clusters based on para-or polyphyletic DNA barcode assemblages.
Secondly, the uncorrected p-distances between these taxa are high (from 1.3% between neera and liliputana to 3.9% between liliputana and occidentalis). Although some of them are lower than the 'standard' 2.7-3.0% DNA-barcoding threshold usually used for allopatric taxa as an indicator for their species distinctness (Lambert et al. 2005, Lukhtanov et al. 2015, even the lowest distances are comparable with those found between other six well recognized species. For example, distances between interrrupta, latonigena and mixta and their sympatric/parapatric non-conspecifics are 1.6-1.9% (Table 1).