Phylogenetic utility of ribosomal genes for reconstructing the phylogeny of five Chinese satyrine tribes (Lepidoptera, Nymphalidae)

Abstract Satyrinae is one of twelve subfamilies of the butterfly family Nymphalidae, which currently includes nine tribes. However, phylogenetic relationships among them remain largely unresolved, though different researches have been conducted based on both morphological and molecular data. However, ribosomal genes have never been used in tribe level phylogenetic analyses of Satyrinae. In this study we investigate for the first time the phylogenetic relationships among the tribes Elymniini, Amathusiini, Zetherini and Melanitini which are indicated to be a monophyletic group, and the Satyrini, using two ribosomal genes (28s rDNA and 16s rDNA) and four protein-coding genes (EF-1α, COI, COII and Cytb). We mainly aim to assess the phylogenetic informativeness of the ribosomal genes as well as clarify the relationships among different tribes. Our results show the two ribosomal genes generally have the same high phylogenetic informativeness compared with EF-1α; and we infer the 28s rDNA would show better informativeness if the 28s rDNA sequence data for each sampling taxon are obtained in this study. The placement of the monotypic genus Callarge Leech in Zetherini is confirmed for the first time based on molecular evidence. In addition, our maximum likelihood (ML) and Bayesian inference (BI) trees consistently show that the involved Satyrinae including the Amathusiini is monophyletic with high support values. Although the relationships among the five tribes are identical among ML and BI analyses and are mostly strongly-supported in BI analysis, those in ML analysis are lowly- or moderately- supported. Therefore, the relationships among the related five tribes recovered herein need further verification based on more sampling taxa.


Introduction
The butterfly subfamily Satyrinae, comprising approximately 2,500 described extant species, is amongst the most diverse groups in insects (Ackery et al. 1999). Recently, Marín et al. (2011) summarized the findings of systematic studies on this group (Peña et al. 2006;Peña and Wahlberg 2008;Wahlberg et al. 2009), proposing that the Satyrinae could be divided into nine tribes. However, phylogenetic relationships among them remain mostly unresolved despite they are assigned to four groups (Marín et al. 2011): group one consisting of two Neotropical Morphini and Brassolini; group two including Elymniini, Amathusiini, Zetherini, Dirini and Melanitini; group three including only the Neotropical Haeterini; and group four comprising the speciose Satyrini distributed worldwide. Regarding the group two, phylogenetic relationships of its five tribes remain unresolved except for the well-defined sister relationship of Dirini and Melanitini (Peña et al. 2006;Peña and Wahlberg 2008;Wahlberg et al. 2009;Price et al. 2011; see the figure 1 in Marín et al. 2011). The phylogenetic uncertainty among them can be mainly exhibited in two aspects: one is the weakly supported nodes bearing them; and another is the unstable topologies of trees conducted by different analysis methods (Peña et al. 2006;Peña and Wahlberg 2008;Wahlberg et al. 2009).
It is widely accepted that selecting suitable genetic markers is of great importance in study of molecular systematics. In previous phylogenetic studies on the tribe level relationships of Satyrinae, the protein-coding genes (e.g., mitochondrial COI, and a number of nuclear genes) have been the main source of phylogenetic information (Peña et al. 2006;Peña and Wahlberg 2008;Wahlberg et al. 2009;Price et al. 2011). However, the ribosomal genes, to date have been never considered. The ribosomal genes have already been proven to be informative for phylogenetic analyses in other butterfly groups (e.g., 16s rDNA in Kim et al. 2010; 28s rDNA and 18s rDNA in Jiang et al. 2013).
In order to test the phylogenetic utility of the ribosome genes for constructing the tribe level relationships of Satyrinae which have not been resolved based on morphological and protein-coding sequence data, two ribosomal genes (16s rDNA and 28s rDNA) as well as four additional protein-coding genes (COII,Cytb, are used in our study to reconstruct the phylogeny of the Elymniini, Amathusiini, Zetherini, Melanitini and Satyrini which represent all the major lineages of Chinese satyrines. Besides, we further clarify the taxonomic placement of the Callarge Leech, a satyrine genus which has never been included in previous molecular studies.

Taxon sampling
A total of 30 species were included in the analyses (Table 1). Of these, the 21 in-group species represent all the five satyrine tribes occurring in China. In consideration of previous studies (Freitas and Brown 2004;Peña et al. 2006), other nine species of six subfamilies (Libytheinae, Danainae, Apaturinae, Biblidinae, Calinaginae and Charaxinae) of the family Nymphalidae were selected as outgroup taxa. Among them, Libythea myrrha Fruhstorfer of Libytheinae was used to root the resulting phylogenetic trees, since Libytheinae is widely accepted as the sister group to the rest Nymphalidae (e.g., Ackery et al. 1999;Freitas and Brown 2004;Peña et al. 2006;Peña and Wahlberg 2008). The butterflies studied stem from the specimens in Entomological Museum of Northwest A&F University (NWAFU), Yangling, China. Details of the sampling are presented in Table 1.

DNA extraction, amplification and sequencing
Genomic DNA was extracted from 95-100% ethanol-preserved muscle tissue of two adult butterfly legs, using an EasyPure Genomic DNA Kit according to the manufacturer's instructions (TransGen Biotech Co., Led., Beijing, China). Extracted genomic DNA was eventually dissolved in 80 µL ddH 2 O and kept in a freezer (-20 °C) until it was used for polymerase chain reaction (PCR). Sequences of six nuclear and mitochondrial genes (EF-1α, 28s rDNA, COI, COII, Cytb and 16s rDNA) were amplified through PCR in a total volume of 25 µL. The volume consisted of 12.5 µL CWBIO 2 × Taq MasterMix, 8.5 µL sterile distilled H 2 O, 2.0 µL genomic DNA template and 1.0 µL 10 µM each primer. The primers used and corresponding annealing temperature in PCR as well as references are listed in Table 2. After electrophoretic analysis to ensure the amplification products were the target fragments we needed, the PCR products were subsequently sent to Sunny Biotechnology Co., Ltd. (Shanghai, China) for sequencing with the same primers used in the PCR. All sequences gathered in this study have been deposited in the GenBank.

Sequence analysis and phylogenetic inference
Sequence chromatogram was checked carefully using Chromas Pro software (Technelysium Pty Ltd., Tewntin, Australia). Each protein-coding sequence was translated for confirmation and assignment of codon positions in Primer Premier version 5.00 software (Premier Biosoft International, Palo Alto, CA). Multiple sequences were aligned using MAFFT version 7.037 with the auto strategy (Katoh and Standley 2013) and, if necessary, manual adjustment was made in MEGA version 6.06 (Tamura et al. 2013). Base frequency and the number of variable and parsimony informative sites were calculated in MEGA version 6.06 (Tamura et al. 2013). We investigated the chi-square of homogeneity of base frequencies across taxa for each gene with the program PAUP4.0b10 (Swofford 2002). The aligned ambiguous regions of two non-coding ribosomal genes (i.e. 16s rDNA and 28s rDNA) were retained because these positions might contain some information that is potentially useful for phylogenetic reconstruction (Aagesen 2004;Redelings and Suchard 2009). As proposed by Xia et al. (2003), we performed tests of substitutional saturation based on the Iss (i.e. index of substitutional saturation) statistic for different partitioned dataset with DAMBE version 5.3.74 (Xia 2013). For this method, if Iss is smaller than Iss.c (i.e. critical Iss), we can infer that the sequences have experienced little substitutional saturation (Xia and Lemey 2009).
Maximum likelihood (ML) analysis was performed using the raxmlGUI version 1.3 interface (Silvestro and Michalak 2012) of RAxML version 7.2.6 (Stamatakis 2006). The best-fit substitution model for each gene partition was determined by jModelTest version 2.1.4 (Darriba et al. 2012) under the Akaike Information Criterion (AIC) (Akaike 1974). Clade supports were assessed using the ML + rapid bootstrap algorithm with 1000 bootstrap iterations.
Bayesian inference (BI) analyses were conducted in MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003). The best-fit partitioning schemes and partition-specific substitution models, defined from 16 subsets formed by gene and codon position of the six genes used, were tested using the 'greed' algorithm of program PartitionFinder v1.1  (Lanfear et al. 2012) under the Bayesian information criterion (BIC). Two independent MCMC runs were performed either for 300,000 generations or until the average standard deviation of split frequencies fell below 0.01. The sampling frequency was set as every 100 generations. After the first 25% of the yielded trees were discarded as burn-in, a 50% majority-rule consensus tree with the posterior probability (PP) values was constructed by summarizing the remaining trees. For BI analyses, two different datasets, the full six-gene-dataset and the non-COI + Cytb + COII-3rds-dataset (with 3rd positions removed), were used to examine the phylogenetic utility of the 3rd sites of COI + Cytb + COII, because these sites have suffered substantial saturation (see the results).

Phylogenetic informativeness
We used phylogenetic informativeness (PI) profiles to quantify the relative contribution of each partition to the resulted tree. The peak of the PI distribution is suggested to predict the maximum phylogenetic informativeness for corresponding partition (Owen et al. 2014). The PI profiles were generated with the PhyDesign (Townsend 2007;Lopez-Giraldez and Townsend 2011). For this, the aligned sequences and an ultrametric tree are needed as input files. In the sequence file, the eight partition schemes

Sequence characterization
One hundred and fifty-four sequences of the six genes were obtained for 30 species (Tables 1, 3). The final alignment yields 3,402 bp of the combined sequence data, of which 1,312 are variable and 1,053 are parsimony informative. The Chi-square test reveals no significant base composition heterogeneity among the taxa for any gene fragment, even for the 28s rDNA showing a high level of CG base composition (p = 0.138). In the case of the saturation test, all observed values of Iss are smaller than the Iss.c values for both symmetrical and asymmetrical topologies in all gene fragments. However, when the analysis was taken for each of the three codon positions of coding gene fragments separately, values of Iss for the third codons of all the COI, COII and Cytb genes are smaller than the Iss.c values in both symmetrical and asymmetrical topologies, indicating some of these sites have suffered substantial saturation.

Model selection and phylogenetic reconstruction
Each gene partition shows the GTR + I + G for its best-fit substitution model except the 28s rDNA being the GTR + G, but we imposed the GTR + G for all gene partitions in ML analysis as recommended by Zahiri et al. (2011). For BI analysis, the best partitioning scheme includes eight partitions. Each partition and corresponding parameters used in BI analyses are summarized in Table 4. The ML and BI trees based on the full six-gene-dataset show generally identical topologies (summarized in Figure 1). All tribes included with two or more taxa examined in this study are recovered to be monophyletic mostly with strong support values. The traditional "satyrine" clade consisting of Calinaginae, Charaxinae and Satyrinae is well-recovered by strong bootstrap value (BV) 100 and PP 1.00. The five tribes of Chinese satyrines constitute the Satyrinae clade with BV 93 and PP 1.00. Within this clade, the Satyrini is consistently recovered as sister of others. Then, Amathusiini branches off, and the Zetherini is sister to the sister group (Melanitini + Elymniini), but the relationship between Melanitini and Elymniini is poorly supported by both ML and BI analyses (BV = 42, PP = 0.71). The genus Callarge is nested into the Zetherini, forming a sister group with Penthema Westwood. Chi-square test of base frequency p = 1.000 p = 1.000 p = 0.998 p = 1.000 p = 0.999 p = 0.138 Table 4. The best-fit partitioning schemes and corresponding partition models used in BI analysis.

Phylogenetic informativeness
As shown in Figure 3, the 3rd codon positions of the combined COI, Cytb and COII has the highest phylogenetic signal at all taxonomic levels, and a peak of the PI distribution can be recognized at about the 1/3 position of the tree near the terminal branches. Followed are the 1st codon positions of the combined COI, Cytb and COII.

Phylogenetic informativeness of related genes
The studies of molecular systematics have been increasingly accessible because more genetic markers have been developed with the advances of sequencing technology.
However, how to make informed choice to these markers confuses many systematics (Danforth et al. 2005). In high level systematics of Satyrinae, EF-1α was commonly used and proven to be quite informative in all previous studies (Peña et al. 2006;Peña and Wahlberg 2008;Wahlberg et al. 2009). Our results show the two ribosomal genes (i.e. 16s rDNA and 28s rDNA) have generally the same phylogenetic informativeness with EF-1α (Figure 3), which indicates that the former two genes also contribute well in constructing the tribe level relationships. Moreover, we infer the 28s rDNA would show better informativeness if the 28s rDNA sequence data for each sampling taxon had been obtained in this study. The consistency between the 28s rDNA and EF-1α in phylogenetic utility supports the findings of Danforth et al. (2005) who suggested that the nuclear ribosomal and protein-coding genes should be combined in phylogenetic practices after comparing the substitution patterns between them in other groups of insects. The 16s rDNA have been proven to be informative in high level systematics (e.g. Nazari et al. 2007) and was even recommended as standard marker for insect phylogenetics (Caterino et al. 2000). The high phylogenetic utility of 16s rDNA examined in this study provides support for these proposals. However, this result does not support that mitochondrial gene datasets should not be applied on the deep divergences due to their substantial variation (Lin and Danforth 2004;Danforth et al. 2005).
We do not recommend the use of the 3rd positions of combined COI, Cytb and COII in high level systematics of Satyrinae, although these sites show higher phylogenetic signals than other partitions (Figure 3). On the one hand, our saturation tests show some sites of the 3rd positions of combined COI, Cytb and COII have suffered substantial saturation. These sites may positively contribute to the tip nodes of trees, but for the nodes after the PI profile peak they may become the source of noise deep in the tree and cause homoplasy (Owen et al. 2014). On the other hand, the deep branch pattern of BI tree (Figure 2) generally not change when excluding the 3rd positions of combined COI, Cytb and COII. This result indicates that the 3rd positions of combined COI, Cytb and COII contribute poorly to the tribe level relationships of the trees based on the full six-gene-dataset.

Phylogenetic relationships among related tribes of Satyrinae
In this study, we present the first use of the ribosomal genes in reconstructing the tribe level relationships of the Satyrinae. The "satyrine" clade consisting of Calinaginae, Charaxinae and Satyrinae defined by Peña and Wahlberg (2008) and Wahlberg et al. (2009) are well-supported by our results. Moreover, monophyly of involved Satyrinae with the Amathusiini included is highly supported by all ML and BI analyses based on multiple outgroup taxa, which confirms, at least partially, the findings of Peña et al. (2006) who noted Satyrinae is monophyletic with inclusion of the tribes Morphini, Brassolini and Amathusiini of Morphinae (sensu Ackery et al. 1999) (Peña and Wahlberg 2008;Wahlberg et al. 2009).
Among the five tribes of Satyrinae analyzed, our results recover the Satyrini as the basal lineage with a long-branch split from the rest four tribes, in agreement with the findings of Peña et al. (2006) and Peña and Wahlberg (2008). However, relationships among the remaining four tribes are incongruent with other related studies regardless of the Dirini not included herein. Our results recover their relationships as Amathusiini + (Zetherini + (Elymniini + Melanitini)); whereas other related studies concluded the following relationships: (Elymniini + Melanitini) + (Zetherini + Amathusiini) in both ML and BI analyses of Wahlberg et al. (2009), the Elymniini + Melanitini + Zetherini + Amathusiini in MP analysis of Wahlberg et al. (2009), and the Melanitini + (Zetherini + (Elymniini + Amathusiini)) in BI analysis of Peña and Wahlberg (2008). Although ribosomal genes were used for the first time in our study, and both the ML and BI trees based on the full six-gene-dataset show identical topology, it should be noticed that the nodes in ML analysis describing the tribe level relationships are lowly-or moderately-supported. Therefore, the relationships among the related five tribes recovered herein need further verification based on more sampling taxa.
The monotypic genus Callarge is distributed restrictedly in China and on the northern border of Vietnam. Morphologically, this genus has marked black veins and lacks eyespots on wings. It is currently placed in Zetherini of Satyrinae (Chou 1999;Yuan et al. 2008) by the presence of hairless eyes, the wings without striking eyespots, and the forewing with basal part of vein Sc, posterior vein of discal cell and vein 2A not swollen (Chou 1998). For the first time, we verify the status of the genus based on molecular phylogenetic analyses, and reveal that it is sister to the Penthema in the present study.