Data Paper
Corresponding author: Niklas Wahlberg (niklas.wahlberg@biol.lu.se). Academic editor: Erik J. van Nieukerken
© 2016 Niklas Wahlberg, Carlos Peña, Milla Ahola, Christopher W. Wheat, Jadranka Rota.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Wahlberg N, Peña C, Ahola M, Wheat CW, Rota J (2016) PCR primers for 30 novel gene regions in the nuclear genomes of Lepidoptera. ZooKeys 596: 129-141. https://doi.org/10.3897/zookeys.596.8399
We report primer pairs for 30 new gene regions in the nuclear genomes of Lepidoptera that can be amplified using a standard PCR protocol. The new primers were tested across diverse Lepidoptera, including nonditrysians and a wide selection of ditrysians. Together, the new gene regions yield 11,043 bp of DNA sequence data, and their variability is similar to that of the nuclear gene regions traditionally used in studies of Lepidoptera. We feel that a PCR-based approach still has its place in molecular systematic studies of Lepidoptera, particularly at the intrafamilial level, and our new set of primers now provides a route to generating phylogenomic datasets using traditional methods.
Molecular systematics, Lepidoptera, phylogenomics, phylogenetics
Post-Sanger sequencing technologies have opened up vast possibilities for acquiring molecular data for inferring phylogenetic relationships among taxa using 100s to 1000s of loci (
For the past two decades, the standard protocol in insect molecular systematics has been to extract genomic DNA from one or two legs of dried specimens, often several years old, which generally yields very low concentrations of DNA. Today, millions of such genomic DNA extracts exist, taken from suboptimally stored specimens by individual researchers and by large facilities such as the Canadian Centre for DNA Barcoding. These extracts have been used as templates for PCR amplification of specific gene regions, followed by Sanger sequencing. Because universal primers have been lacking for more regions, this standard approach has traditionally been restricted to fewer than 10 gene regions. Given this extensive DNA resource, and the difficulty of applying the aforementioned methods to such extracts, here we present an approach for using these extracts in the pursuit of phylogenomic insights.
As DNA sequencing technologies continue to evolve, molecular systematists must judiciously choose the tools best suited to the questions they wish to address. While genome-scale data are certainly useful, such data are expensive and difficult to analyze, and ultimately only a small fraction of them is used. Perhaps most importantly, such large-scale datasets are likely only necessary for resolving deeper evolutionary events, such as the relationships among orders of insects (
Whole genome sequences can now be used to search for suitable gene regions for primer design (e.g.
Here we design and test PCR primers for long exon regions of single-copy, protein-coding genes across Lepidoptera, based on publicly available whole genome sequences of the order. The new gene regions are shown to be phylogenetically informative for Lepidoptera and can be used to complement the eight gene regions that have become standard in Lepidoptera phylogenetics (
Single-copy, protein-coding genes with exons longer than 500 bp were found while manually curating the set of genes listed in
As in
Taxa used to test the primers for amplifying the new gene regions. The last column summarizes the number of new gene regions sequenced for each specimen. See Suppl. material
Voucher code | Family | Genus | Species | Number of new genes sequenced |
---|---|---|---|---|
MM00058 | Micropterigidae | Micropterix | aureatella | 11 |
MM00867 | Nepticulidae | Ectoedemia | occultella | 18 |
MM00943 | Tischeriidae | Tischeria | ekebladella | 18 |
MM02175 | Psychidae | Taleporia | tubulosa | 22 |
MM00030 | Gracillariidae | Gracillaria | syringella | 26 |
MM00306 | Yponomeutidae | Yponomeuta | evonymellus | 27 |
MM00510 | Tortricidae | Tortrix | viridana | 22 |
MM00014 | Schreckensteiniidae | Schreckensteinia | festaliella | 26 |
MM02524 | Epermeniidae | Epermenia | illigerella | 24 |
MM03096 | Pterophoridae | Stenoptilia | veronicae | 22 |
MM00913 | Alucitidae | Alucita | hexadactyla | 19 |
MM03941 | Choreutidae | Choreutis | pariana | 21 |
MM00021 | Urodidae | Wockia | asperipunctella | 17 |
MM00116 | Cossidae | Cossus | cossus | 28 |
MM00125 | Sesiidae | Synanthedon | scoliaeformis | 29 |
MM00312 | Zygaenidae | Adscita | statices | 26 |
MM00034 | Hesperiidae | Pyrgus | malvae | 24 |
MM00042 | Elachistidae | Ethmia | pusiella | 25 |
MM00051 | Pyralidae | Pyralis | farinalis | 24 |
MM00027 | Drepanidae | Thyatira | batis | 28 |
MM00032 | Geometridae | Cyclophora | punctaria | 26 |
MM00394 | Endromidae | Endromis | versicolora | 29 |
MM01170 | Noctuidae | Apamea | crenata | 27 |
MM02696 | Lasiocampidae | Poecilocampa | populi | 24 |
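The last column of this table and the per-gene specimen counts in the gene table that follows are, in effect, the row and column sums of one specimen × gene success matrix (detailed per specimen in Table S1), so both should add up to the same total. A minimal Python sketch of that consistency check, with the counts copied from the two tables; the function name is hypothetical, and it assumes every primer pair was tried on all 24 specimens.

```python
# Per-specimen counts: last column of the taxon table (24 specimens, table order).
genes_per_specimen = [11, 18, 18, 22, 26, 27, 22, 26, 24, 22, 19, 21,
                      17, 28, 29, 26, 24, 25, 24, 28, 26, 29, 27, 24]

# Per-gene counts of successful specimens for the 30 new regions
# (AFG3a ... WD40, in the order of the gene table).
specimens_per_gene = [22, 11, 20, 24, 23, 18, 18, 22, 18, 24, 15, 17, 14, 16, 9,
                      20, 21, 21, 15, 21, 22, 22, 23, 16, 23, 13, 18, 21, 15, 21]


def summarize(row_sums, col_sums):
    """Check that both marginals agree; return (successes, attempts, percent)."""
    total = sum(row_sums)
    if total != sum(col_sums):
        raise ValueError(f"marginals disagree: {total} vs {sum(col_sums)}")
    # Assumption: every specimen was attempted for every gene region.
    attempts = len(row_sums) * len(col_sums)
    return total, attempts, round(100 * total / attempts, 1)


total, attempts, pct = summarize(genes_per_specimen, specimens_per_gene)
print(f"{total} of {attempts} specimen-gene combinations sequenced ({pct}%)")
```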
Sequences were trimmed of primer sequences and aligned by eye, with reference to the amino acid sequences, in BioEdit 7 (
We selected a total of 48 gene regions for primer design (see Supplementary material for alignments), of which 30 were successfully amplified (Suppl. material
Basic information about the new gene regions amplified and sequenced in this study and, for comparison, about the eight traditional gene regions used in many previous studies.
Gene name | Length (bp) | Specimens successfully sequenced | Variable sites (%) | Parsimony-informative sites (%) | Conserved sites (%) | Freq. A (%) | Freq. T (%) | Freq. C (%) | Freq. G (%) | GeneID from Bombyx genome |
---|---|---|---|---|---|---|---|---|---|---|
AFG3a | 336 | 22 | 39.3 | 37.5 | 60.7 | 28.0 | 27.7 | 20.2 | 24.1 | BGIBMGA010088 |
AFG3b | 300 | 11 | 47.3 | 39.7 | 52.7 | 34.9 | 20.9 | 20.7 | 23.6 | BGIBMGA010088 |
ANK13C | 330 | 20 | 49.1 | 38.8 | 50.9 | 33.0 | 28.5 | 16.4 | 22.2 | BGIBMGA007536 |
ArgK | 388 | 24 | 44.6 | 33.5 | 55.4 | 22.9 | 19.0 | 32.1 | 26.1 | BGIBMGA005812 |
Ca-ATPase | 444 | 23 | 37.2 | 30.2 | 62.8 | 24.9 | 21.0 | 30.1 | 24.0 | BGIBMGA000408 |
Ca2 | 410 | 18 | 44.9 | 38.5 | 55.1 | 33.2 | 23.6 | 18.5 | 24.7 | BGIBMGA006603 |
chitinase | 405 | 18 | 47.2 | 40.5 | 52.8 | 25.7 | 27.4 | 23.8 | 23.2 | BGIBMGA008709 |
Cullin5 | 327 | 22 | 48.3 | 41.0 | 51.7 | 33.4 | 28.7 | 17.5 | 20.5 | BGIBMGA011511 |
CycY | 375 | 18 | 39.7 | 35.5 | 60.3 | 29.9 | 31.4 | 17.2 | 21.6 | BGIBMGA005969 |
DDX23 | 303 | 24 | 46.9 | 43.2 | 53.1 | 40.4 | 22.6 | 13.8 | 23.2 | BGIBMGA003429 |
Exp1 | 729 | 15 | 43.6 | 35.8 | 56.4 | 31.4 | 28.2 | 19.5 | 21.0 | BGIBMGA010657 |
FCF1 | 173 | 17 | 49.7 | 42.8 | 50.3 | 32.4 | 27.7 | 16.2 | 23.7 | BGIBMGA010318 |
GLYP | 384 | 14 | 52.3 | 44.0 | 47.7 | 27.2 | 24.8 | 25.1 | 22.9 | BGIBMGA010361 |
KRR1 | 283 | 16 | 47.0 | 39.2 | 53.0 | 35.4 | 26.4 | 18.0 | 20.2 | BGIBMGA005381 |
LeuZip | 372 | 9 | 49.5 | 35.8 | 50.5 | 36.4 | 24.8 | 18.0 | 20.9 | BGIBMGA003300 |
MK6 | 255 | 20 | 52.2 | 45.1 | 47.8 | 32.8 | 28.1 | 18.6 | 20.6 | BGIBMGA005641 |
MMP41 | 285 | 21 | 56.5 | 48.1 | 43.5 | 31.1 | 30.6 | 19.7 | 18.6 | BGIBMGA007574 |
MPP2 | 330 | 21 | 44.9 | 40.3 | 55.2 | 29.0 | 29.4 | 22.6 | 19.1 | BGIBMGA008312 |
NC | 573 | 15 | 48.9 | 39.6 | 51.1 | 32.2 | 29.1 | 17.0 | 21.7 | BGIBMGA005035 |
Nex9 | 420 | 21 | 60.5 | 47.4 | 39.5 | 33.1 | 24.8 | 19.0 | 23.2 | BGIBMGA001032 |
PolII | 360 | 22 | 43.9 | 39.4 | 56.1 | 30.1 | 25.3 | 19.7 | 24.8 | BGIBMGA004994 |
ProSup | 432 | 22 | 58.8 | 47.5 | 41.2 | 25.6 | 27.8 | 21.0 | 25.6 | BGIBMGA004645 |
PSb | 366 | 23 | 54.4 | 45.9 | 45.6 | 24.8 | 23.9 | 26.7 | 24.7 | BGIBMGA000201 |
SARAH | 381 | 16 | 56.4 | 44.9 | 43.6 | 29.2 | 27.8 | 23.3 | 19.7 | BGIBMGA011095 |
Ssu72 | 249 | 23 | 55.0 | 48.2 | 45.0 | 36.0 | 28.1 | 16.0 | 19.9 | BGIBMGA000925 |
TIF3Cb | 324 | 13 | 50.6 | 40.1 | 49.4 | 24.7 | 22.1 | 28.9 | 24.3 | BGIBMGA012851 |
TIF6 | 336 | 18 | 50.0 | 42.6 | 50.0 | 24.4 | 21.4 | 25.5 | 28.8 | BGIBMGA009830 |
UDPG6DH | 405 | 21 | 49.1 | 41.0 | 50.9 | 30.1 | 27.4 | 20.9 | 21.6 | BGIBMGA012188 |
VPS4 | 432 | 15 | 40.7 | 35.4 | 59.3 | 28.9 | 28.9 | 20.1 | 22.1 | BGIBMGA005930 |
WD40 | 339 | 21 | 42.5 | 38.6 | 57.5 | 30.1 | 31.4 | 19.3 | 19.2 | BGIBMGA006243 |
Genes from | | | | | | | | | | |
CAD | 826 | 24 | 52.4 | 42.7 | 47.6 | 35.9 | 28.3 | 14.6 | 21.2 | |
COI | 1476 | 23 | 44.4 | 33.0 | 55.6 | 31.1 | 40.0 | 14.9 | 14.0 | |
EF1a | 1047 | 21 | 34.9 | 27.2 | 65.1 | 25.4 | 23.0 | 27.6 | 24.0 | |
GAPDH | 691 | 12 | 38.9 | 30.8 | 61.1 | 23.6 | 25.8 | 27.3 | 23.3 | |
IDH | 722 | 23 | 48.2 | 41.1 | 51.8 | 31.2 | 27.1 | 19.8 | 21.9 | |
MDH | 407 | 23 | 47.9 | 41.3 | 52.1 | 27.4 | 25.8 | 22.7 | 24.1 | |
RpS5 | 603 | 20 | 38.5 | 34.3 | 61.5 | 25.4 | 24.9 | 24.4 | 25.3 | |
wingless | 400 | 20 | 58.5 | 48.5 | 41.5 | 21.7 | 18.3 | 28.9 | 31.0 | |
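The site percentages in this table follow the usual alignment site classes: a site is variable if it carries at least two different nucleotides, parsimony-informative if at least two of those nucleotides each occur in at least two sequences, and conserved otherwise (hence variable and conserved sites sum to 100% in each row, up to rounding). The source does not say which program produced these values, so the Python sketch below only illustrates the definitions; it assumes that gaps and IUPAC ambiguity codes are simply ignored at each site, which real programs may handle differently, and the function name is hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def site_stats(alignment: list[str]) -> dict[str, float]:
    """Percent variable, parsimony-informative and conserved sites, plus base frequencies.

    Only unambiguous A, C, G and T are counted; gaps and ambiguity codes are ignored,
    so a column without two distinct nucleotides (including an all-gap column)
    counts as conserved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    variable = informative = 0
    base_counts = Counter()
    for i in range(length):
        column = Counter(s[i].upper() for s in alignment if s[i].upper() in NUCLEOTIDES)
        base_counts.update(column)
        if len(column) >= 2:  # at least two different nucleotides at this site
            variable += 1
            # informative: at least two nucleotides each shared by >= 2 sequences
            if sum(1 for n in column.values() if n >= 2) >= 2:
                informative += 1

    def pct(part: int, whole: int) -> float:
        return round(100 * part / whole, 1)

    total_bases = sum(base_counts.values())
    stats = {
        "variable": pct(variable, length),
        "pars_inf": pct(informative, length),
        "conserved": pct(length - variable, length),
    }
    for base in "ATCG":
        stats[base] = pct(base_counts[base], total_bases)
    return stats


# Toy alignment (hypothetical): sites 2 and 4 are informative, site 5 is a singleton.
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGCT"]
print(site_stats(toy))
```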
Primers for the 30 new gene regions, with universal tails attached to the 5’ end (T7 promoter, TAATACGACTCACTATAGGG, on forward primers; T3 promoter, ATTAACCCTCACTAAAGGG, on reverse primers). F = forward, R = reverse. Gene names from Table
Primer | Sequence (5’ to 3’) |
---|---|
AFG3a_F | TAATACGACTCACTATAGGGTGTGAAGAAGCTAAGatwgaratyatggartt |
AFG3a_R | ATTAACCCTCACTAAAGGGTGTTGTTGTATTAAAAccrtccatytchac |
AFG3b_F | TAATACGACTCACTATAGGGTGCTCAAGACGACCtdaaraaratmac |
AFG3b_R | ATTAACCCTCACTAAAGGGCCTGTACCTTCCACGaaytcytcrtamgt |
ANK13C_F | TAATACGACTCACTATAGGGCAAATACAAAATTTTTATATGGAAytdaartgggaytt |
ANK13C_R | ATTAACCCTCACTAAAGGGGCAACTGTTTCTTTTCTAtcytcwcgraadatcca |
ArgK_F | TAATACGACTCACTATAGGGyGAyCCsATCATyGAGGACTACCA |
ArgK_R | ATTAACCCTCACTAAAGGGAGrTGGTCCTCCTCrTTGCACCAvAC |
Ca2_F | TAATACGACTCACTATAGGGAAACAGTGGACtgyttgaaraarttcaayg |
Ca2_R | ATTAACCCTCACTAAAGGGGGTGTGTTGTCGATGaaraayttrtgraa |
Ca-ATPase_F | TAATACGACTCACTATAGGGGAAtacgarccbgaaatgggwaargt |
Ca-ATPase_R | ATTAACCCTCACTAAAGGGcdccrtgrgcggggtcgttraagtg |
chitinase_F | TAATACGACTCACTATAGGGGGTGGGTGCTtayttygtngaatgggg |
chitinase_R | ATTAACCCTCACTAAAGGGTGTCCACAccrtcraaraayttcca |
Cullin5_F | TAATACGACTCACTATAGGGTGTTAGTTAAAGATGCTTTTATGgaygaycchmg |
Cullin5_R | ATTAACCCTCACTAAAGGGTCTTAACCATTCAaccatrtcytcttcyttytc |
CycY_F | TAATACGACTCACTATAGGGgattatgayaartataatccwgaacayaaaca |
CycY_R | ATTAACCCTCACTAAAGGGcattgcytcyaatttytgtgcyctttcytt |
DDX23_F | TAATACGACTCACTATAGGGACAAAAGATAAAGAACGTgargargargchat |
DDX23_R | ATTAACCCTCACTAAAGGGTGATCTTTTTCAgaccartghckrtcatccca |
Exp1_F | TAATACGACTCACTATAGGGgthaataaaytdtttgaattyatgcatga |
Exp1_R | ATTAACCCTCACTAAAGGGggrtaytcttcaaartctttrttdatcat |
FCF1_F | TAATACGACTCACTATAGGGACTGGACATCGtdcarartatgatggayt |
FCF1_R | ATTAACCCTCACTAAAGGGTTGTAGCCACGATGtarcayttrtgytg |
GLYP_F | TAATACGACTCACTATAGGGACTGCGACAAGAAtayttyatgtgygcbgc |
GLYP_R | ATTAACCCTCACTAAAGGGTTCACTCGTTTTTCACCTtcytcytcdat |
KRR1_F | TAATACGACTCACTATAGGGaatgcktggrctatgaaratwcc |
KRR1_R | ATTAACCCTCACTAAAGGGtdataatrtcrcatccwatttcrtc |
LeuZip_F | TAATACGACTCACTATAGGGTGCCTGTCACAAaaygaytggaaryt |
LeuZip_R | ATTAACCCTCACTAAAGGGTTTGACCAGGGTTTttdgcrtarttraa |
MK6_F | TAATACGACTCACTATAGGGTTAGAGAAGGTGATgtntggathtgyatgga |
MK6_R | ATTAACCCTCACTAAAGGGTTCTTTCTGGTGCCATGtanggyttrca |
MMP41_F | TAATACGACTCACTATAGGGGAAAACTGGGGTGCTAAagtdtayttyaaya |
MMP41_R | ATTAACCCTCACTAAAGGGTCACTTTGtttttrttytchccaaawgtcat |
MPP2_F | TAATACGACTCACTATAGGGCACTTCCGAATCccdtggttycartaycc |
MPP2_R | ATTAACCCTCACTAAAGGGCCACAGCAGCTGTGtaytcyttdccraa |
NC_F | TAATACGACTCACTATAGGGgatgaagaaaaycchaaraarttytt |
NC_R | ATTAACCCTCACTAAAGGGacwatdgaccartggaarttcatdgc |
Nex9_F | TAATACGACTCACTATAGGGTGCAACTGCAAgartttgtngaytggatg |
Nex9_R | ATTAACCCTCACTAAAGGGCCCAGTCGTATTTAggytgbtcntcatacat |
PolII_F | TAATACGACTCACTATAGGGCTGAAACACCTACAatggcbathgaytgggt |
PolII_R | ATTAACCCTCACTAAAGGGGCTGTAGGGTTCCATttdgcrtgytcytt |
ProSup_F | TAATACGACTCACTATAGGGGACAACAATCGACtggcayccnaayaa |
ProSup_R | ATTAACCCTCACTAAAGGGCTGTCCAGTgactggaayttyttcatdgc |
PSb_F | TAATACGACTCACTATAGGGGCTGGGAGCTACTggvtgytggtgygaya |
PSb_R | ATTAACCCTCACTAAAGGGAGATGCAGTCTCCAGTGTAGatrtcdckytc |
SARAH_F | TAATACGACTCACTATAGGGGAAGATGGTATGCCTAATAtwcaycchaayat |
SARAH_R | ATTAACCCTCACTAAAGGGGTTCACCTTCTTCACGAggytcccadccna |
Ssu72_F | TAATACGACTCACTATAGGGCAGCTGACAGACCTaaytgttaygarttygg |
Ssu72_R | ATTAACCCTCACTAAAGGGCCGATTGTAGCTTCTtcrtgrttrtcytg |
TIF3Cb_F | TAATACGACTCACTATAGGGGAAAAATCGACCACCTGtaytayaarttyga |
TIF3Cb_R | ATTAACCCTCACTAAAGGGGCCAGCAGTTCTTTAggyttnccvgtcatca |
TIF6_F | TAATACGACTCACTATAGGGCTGTGCGAGTGcarttygaraayaataa |
TIF6_R | ATTAACCCTCACTAAAGGGTGTGTCAGCCAGGatytcytchgtrtc |
UDPG6DH_F | TAATACGACTCACTATAGGGCAGGAACTGTGTtgggtvtaygarcaytg |
UDPG6DH_R | ATTAACCCTCACTAAAGGGTCTTGTGTCGCCTgtrttyttyttraa |
WD40_F | TAATACGACTCACTATAGGGGATCCACTTCACAcaygcyaaraayac |
WD40_R | ATTAACCCTCACTAAAGGGCCTgtccartcacaytcyttytcttg |
VPS4_F | TAATACGACTCACTATAGGGTGATTCTGATGATCCAGAAaaraaraaryt |
VPS4_R | ATTAACCCTCACTAAAGGGCATCCATATCAttvccdacaccttgcatytg |
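Each primer in this table is a fixed universal tail followed by a gene-specific part written with IUPAC nucleotide codes (e.g. r = A/G, y = C/T, n = any base). A small Python sketch, for illustration only, that strips the tail and counts how many distinct oligonucleotides the degenerate part represents; the helper names are hypothetical and letter case is ignored.

```python
# Number of nucleotides each IUPAC code stands for.
IUPAC_BASES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

T7_TAIL = "TAATACGACTCACTATAGGG"  # universal tail on forward (_F) primers
T3_TAIL = "ATTAACCCTCACTAAAGGG"   # universal tail on reverse (_R) primers


def gene_specific_part(name: str, primer: str) -> str:
    """Remove the 5' universal tail expected for this primer's direction."""
    tail = T7_TAIL if name.endswith("_F") else T3_TAIL
    if not primer.upper().startswith(tail):
        raise ValueError(f"{name} does not start with the expected tail {tail}")
    return primer[len(tail):]


def degeneracy(seq: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate sequence."""
    count = 1
    for code in seq.upper():
        count *= IUPAC_BASES[code]
    return count


primers = {
    "AFG3a_F": "TAATACGACTCACTATAGGGTGTGAAGAAGCTAAGatwgaratyatggartt",
    "ArgK_F": "TAATACGACTCACTATAGGGyGAyCCsATCATyGAGGACTACCA",
    "Ca-ATPase_R": "ATTAACCCTCACTAAAGGGcdccrtgrgcggggtcgttraagtg",
}

for name, seq in primers.items():
    core = gene_specific_part(name, seq)
    print(f"{name}: {len(core)} nt gene-specific, degeneracy {degeneracy(core)}")
```

Multiplying the per-position counts gives the size of the primer pool that a degenerate synthesis produces; high values mean each individual sequence is present at a lower effective concentration.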
The variability of the new gene regions appears similar to that of the widely used nuclear gene regions reported in
We report here primers for 30 new nuclear gene regions that can be used to complement existing molecular data for Lepidoptera systematics. Our primers were designed to amplify gene regions across the entire taxonomic range of Lepidoptera and to work on relatively degraded material by amplifying genome segments shorter than 500 bp. Many of these primers are being used successfully in our laboratory in projects on, e.g., the nymphalid subfamily Limenitidinae (Dhungel and Wahlberg in prep.) and the families Geometridae (Brehm et al. in prep.), Choreutidae (Rota et al. in prep.), Limacodidae (Dupont et al. in prep.) and Riodinidae (Seraphim et al. in prep.). The phylogenetic utility of these gene regions will be reported in more detail in those forthcoming papers; in summary, they provide resolution similar to that of the standard gene regions reported in
We would like to stress that the gene regions described here should be seen as complementary to the standard gene regions (
More specifically, several fragments appear unsuitable for nonditrysians (none of the three exemplars that we used amplified AFG3b, chitinase, KRR1, NC, SARAH or VPS4), and the utility of several other fragments for these groups needs further testing (Ca2, GLYP, MPP2, Nex9, PolII, TIF3Cb and UDPG6DH amplified in only one of the nonditrysians tested). On the other hand, 21 fragments amplified in four or more of the six exemplars of Macroheterocera (the exceptions being LeuZip, which amplified in only one of them, and TIF3Cb and VPS4, which amplified in three out of six). The situation is more complex across the lower ditrysians and apoditrysians, which is to be expected since these groups are quite divergent (
In this study, we used traditional Sanger sequencing to obtain the DNA sequences. However, almost all of the amplicons are short enough to be multiplexed and sequenced on a NextGen sequencing platform such as Illumina. The advantage would be the rapid generation of many sequences for a large number of samples. On the other hand, many systematists do not have access to NextGen sequencers, or lack the bioinformatics know-how to process the raw data into usable formats, in which case the traditional PCR-based Sanger sequencing approach is still appropriate.
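Whether an amplicon suits a given short-read run depends on its full length: the trimmed region (the gene table's Length column, which excludes primers since sequences were trimmed of them) plus both tailed primers. The Python sketch below assumes, purely for illustration, paired-end 2 × 300 bp reads that must overlap by at least 20 bp to be merged; neither value comes from this study, and the helper names are hypothetical.

```python
READ_LENGTH = 300   # assumed paired-end read length (bp); not from the study
MIN_OVERLAP = 20    # assumed minimum read overlap needed for merging (bp)


def amplicon_length(forward: str, region_bp: int, reverse: str) -> int:
    """Full product length: tailed forward primer + trimmed region + tailed reverse primer."""
    return len(forward) + region_bp + len(reverse)


def mergeable(length_bp: int, read_length: int = READ_LENGTH,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if reads from the two ends overlap by at least min_overlap bp."""
    return 2 * read_length - length_bp >= min_overlap


regions = {
    # gene: (tailed forward primer, trimmed length in bp, tailed reverse primer)
    "FCF1": ("TAATACGACTCACTATAGGGACTGGACATCGtdcarartatgatggayt", 173,
             "ATTAACCCTCACTAAAGGGTTGTAGCCACGATGtarcayttrtgytg"),
    "Exp1": ("TAATACGACTCACTATAGGGgthaataaaytdtttgaattyatgcatga", 729,
             "ATTAACCCTCACTAAAGGGggrtaytcttcaaartctttrttdatcat"),
}

for gene, (fwd, bp, rev) in regions.items():
    total = amplicon_length(fwd, bp, rev)
    print(f"{gene}: {total} bp, mergeable with assumed 2x{READ_LENGTH} reads: {mergeable(total)}")
```

Requiring overlap is only one design choice; if the two reads are used separately rather than merged, longer amplicons can still be sequenced from both ends.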
The approach we have used is highly conservative, as we sought primer pairs that work under standard conditions. It would thus be possible to design primers for the 18 gene regions that failed under our strict criteria but might work under different conditions. It would also be possible to design primers that amplify longer segments of DNA, although such primer pairs would require fresh samples with little degradation of genomic DNA. More gene regions with exons longer than 500 bp could also be found, although a PCR-based approach becomes less and less efficient as the number of reactions grows. It is quite likely that datasets comprising up to 20 gene regions are sufficient for most phylogenetic studies within families (
We thank Andrew Mitchell for critical comments on the manuscript. This study was funded by a grant from Kone Foundation to JR and the Academy of Finland to NW.
Table S1
Data type: NCBI accession numbers
Explanation note: Sequencing success for each new gene region and specimen. A GenBank accession number indicates successful sequencing; a dash indicates unsuccessful amplification.
Sequences used for designing primers
Data type: Reference sequences
Explanation note: A zip-file containing reference sequences for all 48 gene regions used for designing primers.