A phylogenetic assessment of the polyphyletic nature and intraspecific color polymorphism in the Bactrocera dorsalis complex (Diptera, Tephritidae)

Abstract The Bactrocera dorsalis complex (Tephritidae) comprises 85 species of fruit flies, including five highly destructive polyphagous fruit pests. Despite significant work on a few key pest species within the complex, little has been published on the majority of non-economic species in the complex, other than basic descriptions and illustrations of single specimens regarded as typical representatives. To elucidate the species relationships within the Bactrocera dorsalis complex, we used 159 sequences from one mitochondrial (COI) and two nuclear (elongation factor-1α and period) genes to construct a phylogeny containing 20 described species from within the complex, four additional species that may be new to science, and 26 other species from Bactrocera and its sister genus Dacus. The resulting concatenated phylogeny revealed that most of the species placed in the complex appear to be unrelated, emerging across numerous clades. This suggests that they were placed in the Bactrocera dorsalis complex based on the similarity of convergent characters, which does not appear to be diagnostic. Variations in scutum and abdomen color patterns within each of the non-economic species are presented and demonstrate that distantly-related, cryptic species overlap greatly in traditional morphological color patterns used to separate them in keys. Some of these species may not be distinguishable with confidence by means other than DNA data.


Introduction
common variants, hence difficult to use to identify more atypical specimens. An attempt to account for variation in an interactive CD-ROM key (Lawson et al. 2003) yielded limited success (Clarke et al. 2005). In addition to the described species, there may likely exist cryptic species, hard to distinguish by morphological means, which can be separated with the help of genetic sequencing (e.g. Carew et al. 2011, Dujardin andKitthawee 2013). Clarke et al. (2005), when reviewing the data available at the time, stated that phylogenetic studies using limited taxa and genes may not demonstrate the monophyly of the complex. However, recent molecular phylogenies which include the OFF complex have found that most species form a well-defined monophyletic clade (Krosch et al. 2012, Virgilio et al. 2015. However, these studies only included methyl eugenolattracted species and were limited to six economic species and six, mainly Australian, non-pest species such as B. cacuminata (Hering) and B. opiliae (Drew and Hardy). An alternate, polyphyletic complex was indicated by a phylogeny based on one mitochondrial and two nuclear genes by San , but sampling was limited.
Our goal was to examine the B. dorsalis species complex more broadly than the few frequently targeted pest species. This is accomplished by reporting and analyzing novel molecular and morphological data on 22 non-pest species in the complex, in the context of the main pest species and selected outgroups. These data are used to: (i) determine through phylogenetic analysis if the complex is monophyletic or polyphyletic; (ii) provide diagnostic molecular data for over 25 species for which such data is currently lacking; and (iii) determine the utility of thoracic and abdominal color/pattern variation as species level diagnostic characters.

Taxa sampling
The molecular phylogenies presented here are based on DNA sequences of 53 specimens collected in Asia, Australia, Oceania, the United States and Africa. These specimens include 47 species of Bactrocera belonging to five subgenera (including 24 species from the OFF complex), three species of Dacus, and Ceratitis capitata (Wiedemann) as the outgroup, (Table 1). In addition, we examined the morphology of thousands of specimens of the economic species and over 1,600 specimens of 22 non-economic species in the OFF complex. Two hundred and thirty seven representatives of these, selected to cover a broad range of color variants, were sequenced for the COI gene, as detailed below, to confirm morphological identifications and document intraspecific variation in morphological characters. In addition to examining the color pattern of individual specimens, photographs of the scutum and abdomen were taken, for all the sequenced specimens, and used to compile the variation plates (Figures 2-15). The number of specimens examined and sequenced for individual species are included in the figure captions.

Validation of identification
Our specimens in the OFF complex were initially tentatively identified to species using available resources (Drew and Hancock 1994, Lawson et al. 2003, Drew and Romig 2013. These determinations were then confirmed by comparing pinned representatives and photographic plates of color variation to the large series of specimens used to produce the above publications, deposited in the Queensland Department of Agriculture and Fisheries (QDAF) insect collection (Ecosciences Precinct, Brisbane). The identifications were also confirmed by R.A.I. Drew, an expert on Bactrocera morphol-

DNA extraction, amplification, and sequencing
For each specimen, one to three legs were used for total genomic DNA extraction. The remainder of the specimen was deposited as a voucher in the University of Hawaii Insect Museum (UHIM) for preservation and morphological studies (Table 1). Genomic DNA was extracted using the DNeasy animal blood and tissue extraction kit following manufacturer's protocol (Qiagen, Inc., Valencia, CA). Three different gene regions were amplified: the mitochondrial gene cytochrome c oxidase I (COI, 780 bp) and the nuclear genes, elongation factor-1α (EF-1α, 759 bp) and period (PER, 450 bp). These three genes were selected because each has been demonstrated to be informative in distinguishing populations, species complexes, species, or genera in Diptera (Folmer et al. 1994, Simon et al. 1994, Cho et al. 1995, Bauzer et al. 2002, Moulton and Wiegmann 2004, Barr et al. 2005, Foley et al. 2007, Virgilio et al. 2009, Gibson et al. 2011. Gene amplification followed San . All polymerase chain reaction (PCR) products were visualized on 1% agarose gel and purified using QIAquick spin columns (Qiagen, Inc.) according to the manufacturer's protocol. Bidirectional DNA sequencing was performed at the Advanced Studies of Genomics, Proteomics and Bioinformatics (ASGPB) sequencing facility of the University of Hawaii at Manoa (http://asgpb.mhpcc.hawaii.edu/).

Sequence alignment, nucleotide composition, and phylogenetic analysis
Sequence alignments were performed with the software package Geneious 7.1.7 (Biomatters ltd.). Heterozygosity in the nuclear genes was present in most samples. Ambiguity codes (i.e., notation according to International Union of Pure and Applied Chemistry (IUPAC)) were used to denote heterozygous base pairs, and these codes were used in the subsequent analysis. Sequence alignment for each gene was conducted in Geneious using the Muscle option with default settings (Edgar 2004). We used jModeltest and the Akaike information criterion (Darriba et al. 2012) to determine the most appropriate evolutionary model for each gene in our analysis. Phylogenetic analyses were performed with both Maximum Likelihood and Bayesian Inference. MrBayes 3.2.1 (Ronquist et al. 2012) was used for Bayesian analyses and RaxML (Stamatakis et al. 2008) was used for maximum likelihood (ML). We used jModeltest (Darriba et al. 2012) to determine the most appropriate model for each partition. We concatenated our datasets by gene and used a GTR+ Γ model for each gene in the Bayesian analysis general time reversible model (Tavaré 1986) with gamma distribution of rates (GTRGAMMA) for each gene in our likelihood analysis. We first analyzed each gene separately and subsequently concatenated them into a single dataset partitioned, by gene, using Maximum Likelihood and Bayesian inference. For each individual gene analysis (COI, period, and EF-1α) we ran four independent Bayesian runs in MrBayes 3.2.1 using the default settings. Each run started from a random tree using default priors sampling every one thousand generation for 10 million generations with a relative burn-in of 25%. We used the program Tracer 1.5 (Rambaut and Drummond 2009) to assess convergence of standard deviation in variance for Bayesian analyses. For RaxML analyses, each dataset included 10 ML tree searches with default settings, using a random starting tree to find the tree with the best likelihood score. One thousand Maximum Likelihood bootstrap replicates were conducted in Raxml to assess support for inferred relationships. For the concatenated dataset, we partitioned the data by gene and ran MrBayes using the same settings as the individual gene analyses except the parameters statefreq, revmat, shape, and pinvar were unlinked between partitions. For the Maximum Likelihood analysis of the partitioned concatenated dataset, we ran RaxML using the same settings and analyses for each partition as when genes were analyzed individually. Trees were visualized using FigTree v1.4.0 (Rambaut 2012) and rooted with Ceratitis capitata. COI sequences for all non-economic species in the B. dorsalis complex for which at least four sequences were available were analyzed using the program DNAsp to provide basic population genetic variability summary statistics (Hn, h π, S).

Data Resources
Sequences listed on Table 1, as well as COI sequences for all specimens included on all figure plates, were deposited into GenBank KT591129 to KT591164 and KT594783 to KT595006.

Results
Topological differences between the individual gene trees were not supported with high bootstrap values and posterior probabilities (<50% BS <0.9 PP) and overall individual gene trees were poorly resolved, with COI providing more signal for the more recent divergences (Suppl. material 1) and the nuclear genes providing signal for deep- er relationships (Suppl. material 2, 3). However the concatenated analysis produced a well-resolved tree (Figure 1) which is consistent with previous studies (Krosch et al. 2012, Virgilio et al. 2015. In the concatenated phylogeny, the Zeugodacus group of subgenera (as defined by Drew and Hancock 2000) is sister to Dacus and the Bactrocera+Notodacus+Daculus clades, which themselves are sister taxa. This renders Bactrocera paraphyletic with respect to Dacus, as suggested previously (White 2006, Krosch et al. 2012, Virgilio et al. 2015. However the relationship is not strongly supported in the tree and additional genes and taxa are necessary to fully resolve this relationship. The subgenus Bactrocera is monophyletic in the concatenated phylogeny (100 BS, 1.0 PP). The inclusion of many non-economic OFF complex species in our study shows with high support that despite a similar appearance, the complex is a highly polyphyletic group. Multiple, well-supported, clades (75-100% BS values) in the subgenus Bactrocera contain a mix of species previously thought to belong to the OFF complex and non-OFF complex species. One clear example is the inclusion of non-OFF complex B. bryoniae, B. latifrons, B. limbifera, with B. kohkon- giae, which fits in the OFF complex ( Figure 15 A-C) in a strongly supported (100%, 100% PP value) clade (Figure 1). This indicates that, despite low support for the backbone topology in the subgenus Bactrocera, the polyphyletic nature of the OFF complex is still well supported. The main pest species in the complex (B. carambolae and B. dorsalis, now including B. papayae and B. invadens, see Schutze et al. 2014a) form a monophyletic unit with very little genetic differentiation (<1.3% in COI) between them, and rest within a well defined clade that includes several other species attracted to methyl eugenol (B. occipitalis, B. cacuminata, B. raiensis). Three species, B. melastomatos ( Figure 9F-O), B. osbeckiae ( Figure 10) and B. rubigina ( Figure 14F), were genetically indistinguishable using COI (0.1% pair-wise difference) in the phylogeny, appearing together in a single lineage, despite having very distinctive color patterns. Interestingly, they were slightly more distinct in the nuclear genes (1.1% EF-1α and 1% period pair-wise difference), which was not the case for most species. Population genetic statistics, based on COI sequences, showed high levels of haplotype diversity for most of the non-economic species in the B. dorsalis complex (Table 2).
Color patterns of scutum and/or abdomen (Figures 2-15) varied extensively within some of the species (Figures 2A-J (Figure 2), B. bivittata (Figure 3), B. kohkongiae (Figure  7), B. melastomatos ( Figure 9F-O), B. osbeckiae (Figure 10), and B. propinqua ( Figure  12). Abdomen pattern was confusingly polymorphic, yet scutum remained uniform in B. thailandica (Figure 13). Scutum color and variation followed three basic patterns among species for which series of specimens were examined. In B. bhutaniae (Figure 2), B. bivittata (Figure 3), B. kohkongiae (Figure 7), and B. osbeckiae (Figure 10), scutum was predominantly redbrown with a highly variable dark lanceolate pattern. The pattern was composed of a medial and two lateral bands, generally interrupted at the level of the transverse suture, in B. bhutaniae and B. bivittata (medial band usually narrower and lateral bands very broad). The lanceolate pattern was highly variable in B. kohkongiae, from extensively pale with a narrow medial band to almost entirely dark with light markings restricted to the transverse suture, and B. osbeckiae, from mostly dark fuscous, with red-brown markings at level of postpronotal lobes and along transverse suture, to extensive lanceolate red-brown pattern with a broad medial longitudinal band, which can be faint or absent. In B. cacuminata (Figure 4), scutum was red-brown with a single medial dark band widened at apex of scutum and anteriorly narrowed to a point, and with two short lateral bands pointed anteriorly. A similar pattern was frequently observed in B. propinqua (Figure 12), in which the scutum varied from B. cacuminata-like to uniformly dark with light markings at level of transverse suture and inside postpronotal lobes. Scutum was generally uniformly black, with at most small red-brown markings anterior to lateral postsutural vittae, inside postpronotal lobes and sometimes at the level of prescutellar setae, in B. fuscitibia (Figure 5), B. latilineola (Figure 9), B. paraare- Figure 11. Variation in color pattern of scutum and abdomen in Bactrocera paraarecae Drew and Romig (10 specimens examined and 5 sequenced). Voucher codes are: A ms1295 B ms1296 C ms2040 D ms1294 E ms1110.  (Figure 11), B. thailandica (Figure 13), and B. usitata ( Figure 14H), and frequently with more extensive red-brown markings along transverse suture in B. kanchanaburi ( Figure 6) and B. melastomatos (Figure 9 F-J). The shape and width of lateral postsutural vittae was relatively constant for all species except B. thailandica (Figure 13). Abdomen color for almost all species and variants followed the basic "T-shaped" pattern typical of the B. dorsalis complex, i.e. a black band across the base of tergum III, a narrow to broad medial longitudinal black band covering the entire length of terga III to V, and narrow to broadly expanded lateral black markings on terga III to V. Medial band was broad and lateral markings generally broad along margins of tergum III and narrower on terga IV and V in B. bhutaniae (Figure 2) and B. propinqua (Figure 12), or the markings on terga III and IV expanded and pointed at apex in B. bivittata (Figure 3). Medial band was broad and extended to the base of tergum II and lateral markings broad on terga III, IV, and base of tergum V in B. latilineola (Figure 9 C-E). Medial band was narrow (broad in B. usitata) and lateral markings usually broad along terga III-IV and basal half of tergum V in B. cacuminata (Figure 4), B. kanchanaburi (Figure 6), and B. usitata (Figure 14 G). Medial band was broad and lateral markings moderately to very broad on tergum III and IV, and shining spots on tergum V usually black (fuscous to dark fuscous in most other species) and continuous with lateral black markings in B. fuscitibia ( Figure 5). Medial band was narrow (broad in B. paraarecae) and lateral markings moderately to very broad but diffuse, rather than well defined (as in previous species), in B. kohkongiae (Figure 8), B. melastomatos (Figure 9 K-O), B. osbeckiae (Figure 10), and B. paraarecae (Figure 11). In B. thailandica, medial band was narrow and the extent of lateral markings varied considerably, from very limited to almost entirely covering the terga except traces of red-brown on tergum V, on either side of medial band (Figure 13).

Discussion
The concatenated tree demonstrates that the OFF complex is a highly polyphyletic assemblage of unrelated species. Consistent with other published studies, the methyl eugenol responsive B. dorsalis, B. carambolae, B. occipitalis, B. cacuminata, and B. raiensis, form a well-defined monophyletic unit (Krosch et al. 2012, Virgilio et al. 2015. Because the phylogeny is based on a relatively limited (24%) proportion of all the species included in the OFF complex, adding more species and using multiple genes may reveal scattered clusters of related species, but the proportion of unrelated clades including OFF complex species is likely to remain high.
The widespread conformity of unrelated species to the dorsalis-like appearance is unclear. Color patterns in Dacine fruit flies are assumed to mimic wasps (White 2000), though few actual wasp mimic examples exist in Dacine fruit flies, and the OFF complex appearance is not a particularly convincing wasp imitation when compared to other groups of mimics (sesiid moths, syrphid flies, etc.). Whether the similarity represents convergent evolution or a retained ancestral state requires further investigation.
Except for a handful of well-studied species (e.g. B. dorsalis, B. carambolae, B. cacuminata), the definitions and concepts for the majority of the OFF complex species were based on morphology (mainly color patterns), lure response, and generally limited host fruit records. Only now are we starting to better characterize these species with molecular tools. Most of the non-economic species described by Drew and Hancock (1994) included in our study appear to be valid, confirmed by molecular data and comparison of morphological intraspecific variation with large series of specimens in QDAF (L.L., unpublished observations).
Attraction of B. osbeckiae to cue-lure is a new lure record. Morphological variation in our cue-lure trapped specimens ( Figure 10) closely matched that observed in the QDAF and Bishop Museum (Honolulu, Hawaii, USA) series, which consist of hostreared specimens without male lure records.
Four species, consistent in appearance with the definition of the OFF complex, could not clearly be identified and are referred to here as numbered species. Species 54 (from Chiang Mai, Thailand) and 55 (Luang Nam Tha, Laos and Jinghong, China) look very similar ( Figure 15A-F), yet are genetically distinct (8.85% COI pair-wise difference). They both key to B. irvingiae in Drew and Hancock (1994), but neither can be confidently matched to that species, even after comparison with series of pinned specimens of B. irvingiae and other OFF complex species in the QDAF collection. Also, B. irvingiae was collected further south in Thailand (Khao Yai) than the samples we have. Until fresh host-reared specimens of B. irvingiae can be obtained from the type locality and sequenced, we will defer from describing new species that may in the future turn out to be synonyms. Species 59 (Luang Nam Tha, Laos) and 60 (Jinghong, China) ( Figure 15G-L) could not be definitely determined to species using available resources Hancock 1994, Drew andRomig 2013), and did not match any of the OFF complex species examined in QDAF. They are likely new species, but not described here, due to the lack of distinctive characters and the very small number of specimens available (1 of species 59 and 3 of species 60). With additional survey work and genetic sequencing, a number of additional cryptic species likely will appear.
Bactrocera dorsalis, B. invadens and B. papayae were recently declared conspecific, and are genetically indistinguishable (Schutze et al. 2014a), despite what some consider diagnosable differences (Drew and Romig 2013). We have found a similar genetically indistinguishable situation for B. osbeckiae, B. melastomatos and B. rubigina (Wang and Zhao) in our phylogeny, despite them being very distinct from each other in color pattern ( Figures 9F-O, 10, 14F). This suggests that these species, if distinct, may or may not differ at gene loci other than those sequenced in our study. Bactrocera osbeckiae and B. rubigina are sympatric in Thailand and Southern China (Leblanc, unpublished) and differ in color patterns and wing costal band expansion (Drew and Romig 2013), while B. melastomatos is confined to Peninsular Malaysia, Borneo, Java and Sumatra. Bactrocera osbeckiae and B. melastomatos are biologically close and peculiar, both breeding on flowers rather than fruits of Melastomataceae Hancock 1994, Allwood et al. 1999), while the host range of B. rubigina is not well documented. Similarly, sympatric B. tryoni and B. neohumeralis in Australia are currently genetically inseparable, yet are likely valid biological species, isolated by time of mating (Clarke et al. 2011).
The high degree of intraspecific variation in color pattern severely limits the reliability of dichotomous and interactive keys. The range of variation differs considerably among species, with extreme cases like the scutum of B. dorsalis , and the abdomen of B. thailandica ( Figure 13). Also, variants of unrelated species, such as B. bhutaniae ( Figure 2) and B. bivittata (Figure 3), can overlap and make them hard to distinguish. The full extent of observed variation is easier to demonstrate in plates rather than words when describing a species. We suggest that descriptions of new species in the future should be accompanied by extensive plates showing variation, included in publications or posted as supplementary online material.

Conclusion
The OFF complex was defined by Hardy (1969) and the definition refined by Drew and Hancock (1994). The species and specimens examined in this study fit their definition in all respects, except for scutum color, said to be mostly black (Drew and Hancock 1994) or black (Drew and Romig 2013). Several species included in the complex consistently have extensive pale markings on the scutum (e.g. B. arecae (Hardy and Adachi), B. bivittata, B. cacuminata, B. osbeckiae). Bactrocera dorsalis has a broad range of variation, from entirely black to extensively or almost entirely pale (Schutze et al. 2014b), a form that was described as the now-synonymized, B. invadens (Drew et al. 2005), which was not included in the OFF complex by Drew and Romig (2013). It is likely that at least some of the 21 other species complexes (Drew 1989, Drew andRomig 2013) are also polyphyletic and their morphological diagnostic characters not robust. Nonetheless, the B. dorsalis complex is likely to remain entrenched for some time in future literature, as an informal group referred to as a "collective group" in the International Code of Zoological Nomenclature (http://iczn. org/iczn/index.jsp). Caution must be exercised in literature to not refer to the group as a biological or evolutionary unit.