Research Article
Corresponding authors: Li-Yun Jiang (jiangliyun@gmail.com), Ge-Xia Qiao (qiaogx@ioz.ac.cn)
Academic editor: Roger Blackman
© 2017 Xi-Chao Zhu, Jing Chen, Rui Chen, Li-Yun Jiang, Ge-Xia Qiao.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Zhu X-C, Chen J, Chen R, Jiang L-Y, Qiao G-X (2017) DNA barcoding and species delimitation of Chaitophorinae (Hemiptera, Aphididae). ZooKeys 656: 25-50. https://doi.org/10.3897/zookeys.656.11440
Chaitophorinae aphids are widespread across Eurasia and North America and include some important agricultural and horticultural pests, so accurate and rapid species identification is important. Here, we used three mitochondrial genes and one endosymbiont gene to calculate and analyze the genetic distances within different datasets. For species delimitation, we employed two distance-based methods, threshold with NJ (neighbor-joining) and ABGD (Automatic Barcode Gap Discovery), and two tree-based approaches, GMYC (General Mixed Yule Coalescent) and PTP (Poisson Tree Process). The interspecific genetic divergence was clearly larger than the intraspecific divergence for all four molecular markers. The COI and COII genes were found to be more suitable for Chaitophorinae DNA barcoding. For species delimitation, combining at least one distance-based method with one tree-based method is preferable. Based on the data for Chaitophorus saliniger and Laingia psammae, DNA barcoding may also reveal geographical variation.
Chaitophorinae, distance-based analysis, gnd, mitochondrial genes, tree-based analysis
Aphids from more than 5,000 species (
DNA barcoding based on a short fragment of mitochondrial DNA can provide an effective tool for species diagnosis. In animals, a 658-bp fragment at the 5′ end of mitochondrial cytochrome c oxidase I (COI) was selected as the standard DNA barcode (
All samples were collected and cryopreserved in 95% or 100% ethanol. DNA from one individual per sample was isolated for molecular studies, and three to five individual aphids per collection were mounted on microscope slides for morphological examination. Preserved aphid colonies were examined prior to preparation to ensure that they did not consist of multiple species. Voucher specimens for each sample were identified by G.X. Qiao based on morphological diagnostic features using standard literature-based keys (esp.
Three aphid mitochondrial genes were targeted, cytochrome c oxidase subunit I (COI), cytochrome c oxidase subunit II (COII), and cytochrome b (Cytb), together with one gene from the aphid endosymbiont Buchnera, gluconate-6-phosphate dehydrogenase (gnd) (
Total genomic DNA was extracted from a single aphid per sample. Individuals were selected from the ethanol-preserved material, and the DNA extraction procedure was destructive. Plump adults are the ideal experimental material, but they must be examined under a microscope (Leica DM 2500) to exclude parasitized individuals. DNA was extracted following the Quick-Start protocol of the DNeasy Blood & Tissue Kit (QIAGEN, Dusseldorf, Germany). The DNA solution was then stored at -20 °C for subsequent molecular experiments.
The polymerase chain reaction (PCR) mixture for the amplification of the COI, COII, Cytb, and gnd genes comprised 22 μl of double distilled water (ddH2O), 3 μl of 10× EasyTaq Buffer (+ Mg2+) (TransGen Biotech, Beijing, China), 2.4 μl of 2.5 mM/800 μl dNTPs (TransGen Biotech), 0.6 μl each of 10 pmol/μl forward and reverse primers, 0.4 μl of 5 U/μl EasyTaq DNA Polymerase (TransGen Biotech), and 1 μl of DNA solution, for a total volume of 30 μl.
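The recipe above can be checked and scaled with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: it assumes the primer volume is 0.6 μl for each of the two primers (the reading under which the components sum to the stated 30 μl), and the function name `master_mix` and the 10% pipetting overage are illustrative assumptions.

```python
from decimal import Decimal

# Per-reaction volumes in microlitres, taken from the recipe in the text.
# Assumption: 0.6 ul is used for EACH primer, so that the total is 30 ul.
PER_REACTION_UL = {
    "ddH2O": Decimal("22"),
    "10x EasyTaq Buffer": Decimal("3"),
    "dNTPs": Decimal("2.4"),
    "forward primer": Decimal("0.6"),
    "reverse primer": Decimal("0.6"),
    "EasyTaq DNA Polymerase": Decimal("0.4"),
    "template DNA": Decimal("1"),
}


def master_mix(n_reactions, overage=Decimal("0.10")):
    """Scale every component except the template for n reactions plus overage.

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    factor = Decimal(n_reactions) * (1 + overage)
    return {
        name: volume * factor
        for name, volume in PER_REACTION_UL.items()
        if name != "template DNA"  # template is added to each tube separately
    }


total = sum(PER_REACTION_UL.values())  # total volume of a single reaction
mix = master_mix(8)                    # shared mix for eight reactions

for name, volume in mix.items():
    print(f"{name}: {volume} ul")
```

Decimal arithmetic is used so that the per-reaction total compares exactly with 30 μl rather than being subject to floating-point rounding.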
The PCR conditions differed according to the gene and the specific primers, especially the annealing temperature, which was the most critical factor influencing product quality. The detailed primer information is shown in Suppl. material
The amplification products were checked by 1.5% agarose gel electrophoresis (AGE) and then purified using the EasyPure Quick Gel Extraction Kit (TransGen Biotech). Suitable products were sent to TsingKe Biological Technology, Beijing, China or BGI, Shenzhen, China for bidirectional sequencing.
The returned forward and reverse chromatograms were loaded and then assembled and edited by SeqMan in DNAStar software (DNASTAR, Madison, Wisconsin, USA). The nucleotide sequences were first examined in NCBI by Basic Local Alignment Search Tool (BLAST) (
In addition to the sequences from our 425 samples, we downloaded 245 COI sequences and 1 COII sequence from NCBI. We defined the datasets as COI-670 (all sequences from our research group plus the NCBI sequences), COII-376 (375 internal sequences and 1 NCBI sequence), Cytb-413 (newly obtained for this study), gnd-396 (newly obtained for this study), and COI-338, COII-338, Cytb-338, and gnd-338, which contained only the specimens for which all four gene sequences were acquired.
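The four 338-sample datasets are, in effect, the intersection of the specimens sequenced for each gene. A minimal sketch of that selection step follows; the function name `complete_case_ids` and the toy sample IDs are hypothetical and are not taken from the study.

```python
def complete_case_ids(sequenced_ids_by_gene):
    """Return the sample IDs that have a sequence for every gene."""
    id_sets = [set(ids) for ids in sequenced_ids_by_gene.values()]
    return set.intersection(*id_sets) if id_sets else set()


# Toy example with made-up sample IDs: only s1 and s2 have all four genes.
toy = {
    "COI": ["s1", "s2", "s3", "s4"],
    "COII": ["s1", "s2", "s4"],
    "Cytb": ["s1", "s2", "s3"],
    "gnd": ["s1", "s2", "s4"],
}
shared = complete_case_ids(toy)
print(sorted(shared))
```

Restricting each gene to the same specimens is what allows the markers and methods to be compared under an identical sample composition.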
A neighbor-joining (NJ) (
The Automatic Barcode Gap Discovery (ABGD) (
The General Mixed Yule Coalescent (GMYC) (
The Poisson Tree Process (PTP) (
The 425 samples collected by the group members in recent years were carefully authenticated by examining mounted individuals under the microscope, and all 425 samples were identified to species. The few vouchers with uncertain species identification were sorted into distinct clusters and given the epithet “sp.” to make them convenient for further analysis. A total of 75 morphological species were determined from the full set of 670 samples, and 51 were identified from the 425 mounted samples.
The COI sequences were trimmed to a length of 658 bp, which included 365 conserved sites, 293 variable sites and 258 parsimony-informative sites. The sequences had an average nucleotide composition of 38.0% T, 17.1% C, 34.4% A, and 10.5% G. The COII sequences were trimmed to a final length of 672 bp, among which 399 sites were conserved, 273 sites were variable, and 251 sites were parsimony-informative. The average T, C, A, and G compositions of these sequences were 38.7%, 14.0%, 39.5%, and 7.8%, respectively. The Cytb sequences were 760 bp long, with 420 conserved sites, 340 variable sites and 303 parsimony-informative sites, and consisted of 41.4% T, 15.3% C, 34.3% A, and 9.0% G. The gnd sequences had a total length of 807 bp, with an average nucleotide composition of 37.8% T, 9.8% C, 39.5% A, and 12.8% G, and included 368 conserved sites, 439 variable sites and 417 parsimony-informative sites. All four genes showed a strong bias toward T and A.
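The site counts and base compositions reported above are standard alignment summaries. The sketch below shows one common way to compute them; it is an illustration rather than the software used in the study. It assumes that a column is parsimony-informative when at least two character states each occur in at least two sequences, and that characters other than A, C, G, and T are ignored. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def site_summary(alignment):
    """Count conserved, variable and parsimony-informative columns."""
    conserved = variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) <= 1:
            conserved += 1
            continue
        variable += 1
        # Informative: at least two states, each present in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return conserved, variable, informative


def base_composition(alignment):
    """Return the percentage of T, C, A and G over all sequences."""
    counts = Counter(base for seq in alignment for base in seq if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "TCAG"}


# Toy alignment of four equal-length sequences (made up for illustration).
toy = ["ATGCA", "ATGTA", "ATACA", "ACATA"]
print(site_summary(toy))
print(base_composition(toy))
```

Under these definitions, conserved plus variable sites always equal the alignment length, which matches the reported figures (for example, 365 + 293 = 658 for COI).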
From a total of 425 samples, 425 COI, 375 COII, 413 Cytb, and 396 gnd gene fragment sequences were acquired. The amplification success rates of the markers, in descending order, were COI (100%) > Cytb (97%) > gnd (93%) > COII (88%).
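These percentages follow directly from the sequence counts divided by the 425 samples, as the short check below illustrates. The variable names are illustrative; the counts are those given in the text.

```python
SAMPLES = 425
SEQUENCES = {"COI": 425, "COII": 375, "Cytb": 413, "gnd": 396}

# Success rate per marker, rounded to the nearest whole percent.
rates = {gene: round(100 * n / SAMPLES) for gene, n in SEQUENCES.items()}

# Markers ordered from highest to lowest success rate.
ranking = sorted(rates, key=rates.get, reverse=True)

print(rates)
print(" > ".join(ranking))
```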
Genetic divergences among and within species were assessed by five different metrics. For the interspecific divergences of congeneric species, we used the average interspecific distance, calculated within genera that contained more than one species, and the smallest interspecific distance, the minimal interspecific distance within genera with at least two species. For the intraspecific divergences, three variables (average intraspecific distance, mean theta, and average coalescent depth) were applied. The average intraspecific distance was the average of the genetic distances between samples within species that had at least two individuals. The mean theta was a modified theta: the average pairwise distance scored for species with more than one representative, after excluding individuals that would bias the estimate because of uneven sampling. The average coalescent depth, namely the average of the maximum intraspecific distances, was calculated for species with at least two samples.
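The five metrics can be sketched from a pairwise distance matrix and a sample-to-species map for one genus. The code below is a simplified illustration under stated assumptions, not the authors' procedure: mean theta is taken here as the mean of the per-species average pairwise distances, and the step that excludes unevenly sampled individuals is not implemented; the smallest interspecific distance is taken as the single minimum across all congeneric pairs. The function name, input layout, and toy distances are hypothetical.

```python
from itertools import combinations
from statistics import mean


def divergence_metrics(dist, species_of):
    """Summarise the distances among the samples of one genus.

    dist maps frozenset({sample_a, sample_b}) to a pairwise genetic distance;
    species_of maps each sample ID to its species name.
    """
    intra = {}  # species -> within-species distances
    inter = {}  # frozenset({species_1, species_2}) -> between-species distances
    for a, b in combinations(sorted(species_of), 2):
        d = dist[frozenset((a, b))]
        sp_a, sp_b = species_of[a], species_of[b]
        if sp_a == sp_b:
            intra.setdefault(sp_a, []).append(d)
        else:
            inter.setdefault(frozenset((sp_a, sp_b)), []).append(d)
    pooled = [d for ds in intra.values() for d in ds]
    return {
        "average_interspecific": mean(mean(ds) for ds in inter.values()) if inter else None,
        "smallest_interspecific": min(min(ds) for ds in inter.values()) if inter else None,
        "average_intraspecific": mean(pooled) if pooled else None,
        # Assumption: theta as the mean of per-species mean distances.
        "mean_theta": mean(mean(ds) for ds in intra.values()) if intra else None,
        "average_coalescent_depth": mean(max(ds) for ds in intra.values()) if intra else None,
    }


# Toy genus with two species (A: three samples, B: two samples).
toy_species = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
toy_dist = {frozenset(pair): 0.10 for pair in combinations(toy_species, 2)}
toy_dist.update({
    frozenset(("a1", "a2")): 0.002,
    frozenset(("a1", "a3")): 0.004,
    frozenset(("a2", "a3")): 0.006,
    frozenset(("b1", "b2")): 0.010,
    frozenset(("a1", "b1")): 0.08,
})
metrics = divergence_metrics(toy_dist, toy_species)
print(metrics)
```

In the toy data the smallest interspecific distance (0.08) is far above the average coalescent depth (0.008), which is the pattern a barcode gap would show.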
All five interspecific and intraspecific metrics were determined within genera and species (Table
The inter- and intra-specific genetic distances of congeneric species of Chaitophorinae.
Genus/Dataset (no. species/specimens) | Interspecific: average distance | Interspecific: smallest distance | Intraspecific: average distance | Intraspecific: mean theta | Intraspecific: average coalescent depth |
---|---|---|---|---|---|
Chaitophorus | |||||
COI-670(38/534) | 0.1158±0.0191 | 0.1015±0.0178 | 0.0070±0.0060 | 0.0083±0.0057 | 0.0126±0.0126 |
COII-376(25/283) | 0.0956±0.0246 | 0.0853±0.0207 | 0.0017±0.0019 | 0.0025±0.0019 | 0.0060±0.0054 |
Cytb-413(25/323) | 0.1233±0.0281 | 0.0971±0.0260 | 0.0049±0.0066 | 0.0058±0.0068 | 0.0219±0.0357 |
gnd-396(25/306) | 0.0996±0.0316 | 0.0807±0.0248 | 0.0020±0.0030 | 0.0034±0.0032 | 0.0042±0.0051 |
COI-338(25/253) | 0.1117±0.0286 | 0.0995±0.0215 | 0.0058±0.0044 | 0.0077±0.0033 | 0.0088±0.0071 |
COII-338(25/253) | 0.0950±0.0247 | 0.0855±0.0208 | 0.0018±0.0021 | 0.0027±0.0021 | 0.0062±0.0055 |
Cytb-338(25/253) | 0.1169±0.0310 | 0.0983±0.0270 | 0.0043±0.0064 | 0.0052±0.0067 | 0.0164±0.0337 |
gnd-338(25/253) | 0.0843±0.0264 | 0.0811±0.0255 | 0.0014±0.0016 | 0.0025±0.0014 | 0.0032±0.0033 |
Lambersaphis | |||||
COI-670(1/3) | - | - | 0.0040±0.0028 | 0.0060±0.0000 | 0.0060±0.0000 |
COII-376(1/3) | - | - | 0.0047±0.0033 | 0.0070±0.0000 | 0.0070±0.0000 |
Cytb-413(1/3) | - | - | 0.0053±0.0012 | 0.0053±0.0012 | 0.0070±0.0000 |
gnd-396(1/3) | - | - | 0.0007±0.0005 | 0.0010±0.0000 | 0.0010±0.0000 |
COI-338(1/3) | - | - | 0.0040±0.0028 | 0.0060±0.0000 | 0.0060±0.0000 |
COII-338(1/3) | - | - | 0.0047±0.0033 | 0.0070±0.0000 | 0.0070±0.0000 |
Cytb-338(1/3) | - | - | 0.0053±0.0012 | 0.0053±0.0012 | 0.0070±0.0000 |
gnd-338(1/3) | - | - | 0.0007±0.0005 | 0.0010±0.0000 | 0.0010±0.0000 |
Periphyllus | |||||
COI-670(19/83) | 0.1113±0.0231 | 0.1075±0.0220 | 0.0040±0.0146 | 0.0080±0.0198 | 0.0218±0.0439 |
COII-376(13/53) | 0.0936±0.0299 | 0.0938±0.0282 | 0.0007±0.0014 | 0.0027±0.0015 | 0.0024±0.0024 |
Cytb-413(14/54) | 0.0975±0.0200 | 0.0944±0.0194 | 0.0020±0.0029 | 0.0041±0.0030 | 0.0052±0.0032 |
gnd-396(14/54) | 0.1256±0.0669 | 0.1292±0.0602 | 0.0004±0.0010 | 0.0016±0.0014 | 0.0007±0.0012 |
COI-338(13/53) | 0.0971±0.0258 | 0.0985±0.0248 | 0.0019±0.0032 | 0.0056±0.0032 | 0.0044±0.0035 |
COII-338(13/53) | 0.0936±0.0299 | 0.0938±0.0281 | 0.0007±0.0014 | 0.0027±0.0015 | 0.0024±0.0024 |
Cytb-338(13/53) | 0.0974±0.0203 | 0.0935±0.0206 | 0.0020±0.0029 | 0.0041±0.0030 | 0.0052±0.0032 |
gnd-338(13/53) | 0.1250±0.0679 | 0.1283±0.0632 | 0.0004±0.0010 | 0.0016±0.0014 | 0.0007±0.0012 |
Trichaitophorus | |||||
COI-670(3/3) | 0.1233±0.0200 | 0.1233±0.0200 | - | - | - |
COII-376(3/3) | 0.1103±0.0190 | 0.1103±0.0190 | - | - | - |
Cytb-413(2/2) | 0.1040±0.0000 | 0.1040±0.0000 | - | - | - |
gnd-396(3/3) | 0.1427±0.0162 | 0.1427±0.0162 | - | - | - |
COI-338(2/2) | 0.0990±0.0000 | 0.0990±0.0000 | - | - | - |
COII-338(2/2) | 0.1190±0.0000 | 0.1190±0.0000 | - | - | - |
Cytb-338(2/2) | 0.1040±0.0000 | 0.1040±0.0000 | - | - | - |
gnd-338(2/2) | 0.1600±0.0000 | 0.1600±0.0000 | - | - | - |
Yamatochaitophorus | |||||
COI-670(3/3) | 0.0043±0.0009 | 0.0043±0.0009 | - | - | - |
COII-376(3/3) | 0.0037±0.0021 | 0.0037±0.0021 | - | - | - |
gnd-396(3/3) | 0.0007±0.0005 | 0.0007±0.0005 | - | - | - |
Chaetosiphella | |||||
COI-670(3/24) | 0.0515±0.0418 | 0.0693±0.0490 | 0.0149±0.0128 | 0.0197±0.0111 | 0.0185±0.0165 |
COII-376(3/24) | 0.0372±0.0368 | 0.0563±0.0399 | 0.0083±0.0051 | 0.0091±0.0047 | 0.0090±0.0090 |
Cytb-413(3/24) | 0.0481±0.0455 | 0.0703±0.0498 | 0.0140±0.0123 | 0.0154±0.0120 | 0.0230±0.0220 |
gnd-396(3/23) | 0.0521±0.0608 | 0.0887±0.0627 | 0.0084±0.0068 | 0.0107±0.0058 | 0.0090±0.0090 |
COI-338(3/23) | 0.0512±0.0422 | 0.0693±0.0490 | 0.0147±0.0128 | 0.0202±0.0107 | 0.0165±0.0145 |
COII-338(3/23) | 0.0371±0.0369 | 0.0563±0.0399 | 0.0083±0.0052 | 0.0091±0.0048 | 0.0090±0.0090 |
Cytb-338(3/23) | 0.0480±0.0457 | 0.0703±0.0498 | 0.0142±0.0124 | 0.0158±0.0121 | 0.0230±0.0220 |
gnd-338(3/23) | 0.0521±0.0608 | 0.0887±0.0627 | 0.0084±0.0068 | 0.0107±0.0058 | 0.0090±0.0090 |
Laingia | |||||
COI-670(1/2) | - | - | 0.0640±0.0000 | 0.0640±0.0000 | 0.0640±0.0000 |
COII-376(1/2) | - | - | 0.0680±0.0000 | 0.0680±0.0000 | 0.0680±0.0000 |
Cytb-413(1/2) | - | - | 0.0620±0.0000 | 0.0620±0.0000 | 0.0620±0.0000 |
Sipha | |||||
COI-670(5/17) | 0.0940±0.0250 | 0.0882±0.0320 | 0.0082±0.0127 | 0.0147±0.0139 | 0.0118±0.0159 |
COII-376(2/5) | 0.1115±0.0009 | 0.1110±0.0000 | 0.0027±0.0012 | 0.0027±0.0012 | 0.0040±0.0000 |
Cytb-413(2/5) | 0.1073±0.0013 | 0.1060±0.0000 | 0.0048±0.0031 | 0.0058±0.0024 | 0.0090±0.0000 |
gnd-396(1/4) | - | - | 0.0005±0.0005 | 0.0010±0.0000 | 0.0010±0.0000 |
COI-338(1/4) | - | - | 0.0033±0.0021 | 0.0040±0.0017 | 0.0060±0.0000 |
COII-338(1/4) | - | - | 0.0027±0.0012 | 0.0027±0.0012 | 0.0040±0.0000 |
Cytb-338(1/4) | - | - | 0.0048±0.0031 | 0.0058±0.0024 | 0.0090±0.0000 |
gnd-338(1/4) | - | - | 0.0005±0.0005 | 0.0010±0.0000 | 0.0010±0.0000 |
To examine how frequently different genetic divergences occurred, we drew frequency line charts of the inter- and intra-specific genetic distances based on the 338-sample datasets (Figure
Frequency line charts of inter- and intra-specific genetic distances based on the 338-sample datasets. The x-axis represents the genetic distance, and the y-axis represents the number of occurrences in the whole genetic distance matrix. Each peak is a data point giving a genetic distance and its number of occurrences. The data points on the green and red lines were calculated from the interspecific distances, and the points on the purple and blue lines were calculated from the intraspecific distances. The overlap region, where the inter- and intra-specific divergences cross, is indicated by the red dotted rectangle. Each gene is shown in one chart: the top half was calculated with all 338 samples, and the bottom half was calculated after eliminating the queried samples of Chaetosiphella longirostris.
Given that there were no unambiguous and credible references of genetic thresholds for COII, Cytb, and gnd in aphids, the method of threshold with NJ was applied only in the COI-670 dataset with a threshold of 2% (
The morphological and molecular identification results are shown in Table
Dataset/Method | Morphology | Cluster number | Accurate | Split | Lumped | Partial lumped |
---|---|---|---|---|---|---|
COI-670 | 75 | | | | | |
GMYC | | 89 | 65.17% | 31.46% | 2.25% | 1.12% |
PTP | | 85 | 67.06% | 28.24% | 3.53% | 1.18% |
ABGD | | 81 | 72.84% | 23.46% | 2.47% | 1.23% |
threshold with NJ | | 81 | 72.84% | 23.46% | 2.47% | 1.23% |
COII-376 | 51 | | | | | |
GMYC | | 53 | 83.02% | 13.21% | 1.89% | 1.89% |
PTP | | 48 | 83.33% | 8.33% | 8.33% | 0.00% |
ABGD | | 50 | 82.00% | 12.00% | 6.00% | 0.00% |
Cytb-413 | 48 | | | | | |
GMYC | | 54 | 79.63% | 18.52% | 0.00% | 1.85% |
PTP | | 49 | 81.63% | 12.24% | 2.04% | 4.08% |
ABGD | | 48 | 79.17% | 12.50% | 4.17% | 4.17% |
gnd-396 | 49 | | | | | |
GMYC | | 46 | 86.96% | 4.35% | 8.70% | 0.00% |
PTP | | 45 | 84.44% | 4.44% | 11.11% | 0.00% |
ABGD | | 48 | 87.50% | 6.25% | 4.17% | 2.08% |
COI-338 | 45 | | | | | |
GMYC | | 47 | 89.36% | 8.51% | 0.00% | 2.13% |
PTP | | 49 | 85.71% | 12.24% | 0.00% | 2.04% |
ABGD | | 46 | 91.30% | 6.52% | 0.00% | 2.17% |
COII-338 | 45 | | | | | |
GMYC | | 45 | 93.33% | 4.44% | 2.22% | 0.00% |
PTP | | 45 | 86.67% | 8.89% | 4.44% | 0.00% |
ABGD | | 45 | 86.67% | 8.89% | 4.44% | 0.00% |
Cytb-338 | 45 | | | | | |
GMYC | | 50 | 80.00% | 18.00% | 0.00% | 2.00% |
PTP | | 46 | 80.43% | 13.04% | 2.17% | 4.35% |
ABGD | | 42 | 83.33% | 4.76% | 7.14% | 4.76% |
gnd-338 | 45 | | | | | |
GMYC | | 42 | 92.86% | 0.00% | 7.14% | 0.00% |
PTP | | 41 | 90.24% | 0.00% | 9.76% | 0.00% |
ABGD | | 44 | 93.18% | 2.27% | 2.27% | 2.27% |
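The Accurate, Split, Lumped, and Partial lumped columns compare each molecular cluster with the morphological identifications. The category definitions are not spelled out in this section, so the sketch below rests on assumed definitions: a cluster is accurate if it contains exactly one whole morphological species, split if it contains only part of one species, lumped if it contains two or more whole species, and partial lumped if it mixes species without containing all of them completely. Percentages are taken relative to the number of clusters, which is consistent with each method row of the table summing to about 100%. All names and the toy data are hypothetical.

```python
from collections import Counter

CATEGORIES = ("accurate", "split", "lumped", "partial lumped")


def classify_clusters(clusters, species_of):
    """Label each putative-species cluster against the morphological species."""
    species_sizes = Counter(species_of.values())
    labels = []
    for cluster in clusters:
        in_cluster = Counter(species_of[sample] for sample in cluster)
        # True when every species in the cluster is present with all its samples.
        complete = all(n == species_sizes[sp] for sp, n in in_cluster.items())
        if len(in_cluster) == 1:
            labels.append("accurate" if complete else "split")
        else:
            labels.append("lumped" if complete else "partial lumped")
    return labels


def percentages(labels):
    """Share of clusters in each category, in percent of the cluster number."""
    counts = Counter(labels)
    return {cat: round(100 * counts[cat] / len(labels), 2) for cat in CATEGORIES}


# Toy data: species A is recovered, B is split in two, C and D are lumped.
toy_species = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C", "s6": "D"}
toy_clusters = [{"s1", "s2"}, {"s3"}, {"s4"}, {"s5", "s6"}]
labels = classify_clusters(toy_clusters, toy_species)
print(labels)
print(percentages(labels))
```

As a cross-check on the denominator, 58 accurate clusters out of 89 gives 65.17%, the value listed for GMYC on COI-670; the count of 58 is back-calculated here and is not stated in the source.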
Seventy-five morphological species were identified from COI-670, which included sequences downloaded from NCBI. For COI-670, we obtained 89 putative species by the GMYC approach with a 65.17% accuracy rate, 85 species using PTP with 67.06% accuracy, 81 species using ABGD with 72.84% accuracy, and 81 species by threshold with NJ with 72.84% accuracy. The COII-376 dataset, with only one sequence from NCBI, contained 51 morphological species; 53 putative species were recovered using the GMYC method with an accuracy rate of 83.02%, 48 species using PTP with 83.33% accuracy, and 50 species using ABGD with 82.00% accuracy. The Cytb-413 dataset contained 48 morphological species and generated 54 clusters by the GMYC method with an accuracy rate of 79.63%, 49 species using PTP with 81.63% accuracy, and 48 species using ABGD with 79.17% accuracy. There were 49 morphological species within gnd-396; the GMYC approach found 46 putative species with an accuracy rate of 86.96%, the PTP method found 45 species with an accuracy rate of 84.44%, and the ABGD analysis found 48 species with 87.50% accuracy. COI-338, COII-338, Cytb-338, and gnd-338 were analyzed to compare the results of different genes and methods under the same sample composition. There were 45 morphological species within these 338 samples. The numbers of putative species and the accuracy rates were: COI-338 (GMYC: 47, 89.36%; PTP: 49, 85.71%; ABGD: 46, 91.30%), COII-338 (GMYC: 45, 93.33%; PTP: 45, 86.67%; ABGD: 45, 86.67%), Cytb-338 (GMYC: 50, 80.00%; PTP: 46, 80.43%; ABGD: 42, 83.33%), and gnd-338 (GMYC: 42, 92.86%; PTP: 41, 90.24%; ABGD: 44, 93.18%). The final results are all displayed in NJ trees (see Suppl. material
COI is not the only gene marker used for aphid DNA barcoding; other genes from the mitochondrial genome and from endosymbionts have been used for various aphid groups (
For the different genes, the amplification success rate was COI (100%) > Cytb (97%) > gnd (93%) > COII (88%). Within the 338-sample datasets, the difference between the smallest interspecific distance and the average coalescent depth varied among groups. For Chaitophorus, Periphyllus, and Chaetosiphella (Table
The most important factor in choosing a delimitation method was its identification accuracy across different genes. A better approach therefore has higher identification accuracy and is applicable to a wider range of genes. The accuracies of GMYC, PTP, and ABGD within COI-338, COII-338, Cytb-338, and gnd-338 were ABGD (91.30%) > GMYC (89.36%) > PTP (85.71%), GMYC (93.33%) > PTP = ABGD (86.67%), ABGD (83.33%) > PTP (80.43%) > GMYC (80.00%), and ABGD (93.18%) > GMYC (92.86%) > PTP (90.24%), respectively (Table
Chaitophorus saliniger Shinji is an important pest on willows in East Asia. Based on the tree topology and the results of analyses with different methods (Figure
The analysis results of some species from the COI-670 dataset. The analysis results based on other genes were almost identical. The NJ tree was constructed based on the Kimura 2-parameter (K2P) model, with bootstrap values over 50% displayed. The gray blocks behind the tree represent the putative species: the taxa in the tree corresponding to a single block belong to one putative species. The number of blocks expresses the number of putative species found using each method. A Chaitophorus saliniger B Laingia psammae.
In a similar manner, two samples (Nos. 17613 and 19950) of Laingia psammae Theobald were divided into two independent clades (Figure
From the tree topologies and the results of threshold with NJ, ABGD, GMYC, and PTP, we observed that population differentiation was clearly present within both C. saliniger and L. psammae. Similar findings have been reported in other aphid species (
In this work, the DNA barcoding of Chaitophorinae aphids was investigated. Three mitochondrial genes and one endosymbiont gene were used to calculate and compare the genetic distances within different datasets. For the delimitation of species, two distance-based methods, threshold with NJ and ABGD, and two tree-based approaches, GMYC and PTP, were employed. The interspecific genetic divergence was clearly greater than the intraspecific divergence in all four molecular markers. Additionally, the COI and COII genes were more suitable as Chaitophorinae DNA barcoding markers. Based on the data for Chaitophorus saliniger and Laingia psammae, DNA barcoding may reveal population differentiation driven by geographical distribution.
We are very grateful to all the sample collectors for their assistance, and we thank Fen-Di Yang for preparing the mounted slides of all the voucher specimens. This work was supported by the National Natural Science Foundation of China (Nos. 31620103916, 31572307, 31430078), and the External Cooperation Program of BIC, Chinese Academy of Sciences (No. 152111KYSB20130012).
Table S1
Data type: specimens data
Explanation note: Sample information.
Table S2
Data type: molecular data
Explanation note: Primer information.
Table S3
Data type: molecular data
Explanation note: The analysis results with ABGD of all datasets.
Figure S1
Data type: molecular data
Explanation note: The analysis results of dataset COI-670.
Figure S2
Data type: molecular data
Explanation note: The analysis results of dataset COII-376.
Figure S3
Data type: molecular data
Explanation note: The analysis results of dataset Cytb-413.
Figure S4
Data type: molecular data
Explanation note: The analysis results of dataset gnd-396.
Figure S5
Data type: molecular data
Explanation note: The analysis results of dataset COI-338.
Figure S6
Data type: molecular data
Explanation note: The analysis results of dataset COII-338.
Figure S7
Data type: molecular data
Explanation note: The analysis results of dataset Cytb-338.
Figure S8
Data type: molecular data
Explanation note: The analysis results of dataset gnd-338.