Efficacy of the core DNA barcodes in identifying processed and poorly conserved plant materials commonly used in South African traditional medicine

Abstract
Medicinal plants cover a broad range of taxa, which may be phylogenetically distant but morphologically very similar. Such morphological similarity between species may lead to misidentification and inappropriate use. The substitution of a medicinal plant by a cheaper alternative (e.g. a non-medicinal plant species), either through misidentification or deliberately to cheat consumers, is also an issue of growing concern. In this study, we used DNA barcoding to identify commonly used medicinal plants in South Africa. Using the core plant barcodes, matK and rbcLa, obtained from processed and poorly conserved materials sold at the muthi traditional medicine market, we tested the efficacy of the barcodes in species discrimination. Based on genetic divergence, PCR amplification efficiency and the BLAST algorithm, we revealed varied discriminatory potentials for the DNA barcodes. In general, the barcodes exhibited high discriminatory power, indicating their effectiveness in verifying the identity of the most common plant species traded in South African medicinal markets. The BLAST algorithm successfully matched 61% of the queries against a reference database, suggesting that most of the information supplied by sellers at traditional medicinal markets in South Africa is correct. Our findings reinforce the utility of the DNA barcoding technique in limiting false identification that can harm public health.

Introduction
Traditional medicine is regarded as the most famous health care system in the world (WHO 2002), likely because of its accessibility and popularity. Currently, over 80% of the human population around the globe relies on medicinal plants for daily health care (WHO 2002). In Africa, access to modern medical treatment is very limited, largely because facilities are lacking or, where hospitals exist, their services are unaffordable for the majority. As a result, medicinal plants are extensively used to meet people's needs for health care (Staden 1999, Hostettman et al. 2000, WHO 2002, Fyhrquist 2007, Koduru et al. 2007). South Africa has a rich tropical and temperate flora, harbouring approximately 24,000 species, which account for more than 10% of the world's vascular plants (Germishuizen and Meyer 2003). Of this unique diversity, approximately 3000 species (~13%) are used as medicines, with a large number of them exported to other countries, even outside Africa (Van Wyk et al. 1997).
In the recent past, harvesting medicinal plants was the domain of trained traditional healers, well known for their skills as herbalists or diviners who respected customary conservation practices (Cunningham 1993). Today, however, the gathering and trading of medicinal plants is no longer restricted to traditional healers but has entered informal commercial sectors of the South African economy, resulting in an increase in the number of herbal gatherers and traders (Dold and Cocks 2002). Mander (1998) recorded more than 100,000 traditional healers in South Africa. For example, in the Province of KwaZulu-Natal alone, between 20,000 and 30,000 people, mainly women, make their living from trade in non-timber forest products, particularly medicinal plants (Mander 1998). This intensive gathering of plants from the wild poses a serious threat to South Africa's rich biodiversity (Dold and Cocks 2002), increases the risk of extinction (Hoareau and DaSilva 1999) and leads to scarcity of commonly used medicinal plants (Cunningham 1991, Mander 1997, 1998, Dold and Cocks 2002). Species such as Ocotea bullata (Burch.) Baill., Warburgia salutaris (G. Bertol.) Chiov. and Bowiea volubilis Harv. ex Hook. f., which were once abundant, are now threatened with extinction due to over-harvesting in the wild (www.redlist.sanbi.org). In addition, some species such as Cassine transvaalensis (Burtt Davy) Codd and Erythrophleum lasianthum Corbishley are also becoming threatened due to over-harvesting in the wild (Fennel et al. 2004). Given the increasing pressure on medicinal plants, there is a need for greater commitment to efficient controls and better practices that can help preserve medicinal plant diversity in South Africa.
To reach this objective, the primary step requires a reliable tool for accurate plant identification. Traditional plant identification is based on morphological characteristics, which can be problematic, especially for medicinal plants that are mainly traded as dried or processed barks, dried leaves, roots, and stems (Figure 1) in popular markets known in South Africa as muthi markets. As such, traded medicinal plants are devoid of diagnostic features, making morphology-based identification inapplicable (Dold and Cocks 2002). Also, medicinal plants cover a broad range of taxa, which may be phylogenetically distant but morphologically very similar. Such similarity between species may lead to misidentification and inappropriate use (Chen et al. 2010). This is of high concern as it may cause fatalities, especially given that several medicinal plants are poisonous (Watt and Breyer-Brandwijk 1962, Van Wyk et al. 2002, Bruni et al. 2010). For instance, WHO (2004) reported fourteen cases in Hong Kong in which the roots of Gentiana and Clematis species, used for their antiviral qualities, were accidentally substituted with those of Podophyllum hexandrum Royle because of the morphological similarity of their roots. Similarly, a serious case of cardiac arrhythmia was reported as a side effect of the accidental substitution of plantain (Plantago major L.; used as a dietary supplement) with Digitalis lanata Ehrh. (used for heart conditions; WHO 2004). In the early 2000s, large quantities of misidentified plantain were shipped to more than 150 manufacturers, distributors and retailers in the United States over a period of two years (WHO 2004). Another case of misidentification occurred in India, where mustard oil was accidentally contaminated with seeds of Argemone mexicana L., resulting in an epidemic of dropsy (WHO 2004). The misidentification of these seeds could have been avoided if there had been proper quality control of source materials (WHO 2004).
Given such alarming cases of misidentification, developing techniques to assist and support traditional plant identification (e.g. assigning dried barks, roots or leaves to species) is an urgent matter, not only to preserve biodiversity and the traditional knowledge attached to each plant (Yessoufou 2005) but also to secure human health (Chen et al. 2010). From this perspective, we propose that DNA barcoding can assist in distinguishing species and assigning unidentified individuals or any plant organs or materials to species level (Kress et al. 2005, Kress and Erickson 2008, Lahaye et al. 2008, Kesanakurti et al. 2011). DNA barcoding is the use of a short gene sequence from a standardised region of the genome that could, in principle, distinguish between even closely related species (Hebert et al. 2004, Lahaye et al. 2008, Kesanakurti et al. 2011, Van der Bank et al. 2012). Ideally, DNA barcoding studies use fresh or well-preserved materials as sources of DNA. However, this is not always practical, because in many situations the DNA is already degraded: the materials are either already processed or poorly preserved. Such situations include diet analyses (Huang 1972), ancient DNA studies (Pääbo et al. 2004), specimen identification from environmental DNA samples (Gratz 2004) and medicinal materials in muthi markets.
Two DNA regions, rbcLa and matK, were recently proposed as core barcodes (CBOL 2009), with their identification efficacy estimated at 70-80% for land plants. The efficacy of DNA barcodes has rarely been evaluated for plant materials that are poorly stored or already processed; to our knowledge, only one recent study has evaluated this, with regard to animals, where the discriminatory power of a mini-barcode was assessed in processed materials. In this study, we focus on poorly conserved and processed medicinal plant materials sold in a South African muthi market, with specific emphasis on commonly used plants. First, we constructed a DNA barcode library for these medicinal plants using fresh materials. Second, we bought poorly conserved and processed materials sold at the muthi market, and tested the efficacy of the core barcodes in assigning these processed materials to their species using the DNA barcode library as the reference.

Taxon sampling
A total of 108 species belonging to 55 plant families were identified as commonly used medicinal plants in South Africa based on a literature survey (Hutchings et al. 1996, Van Wyk et al. 1997, Van Wyk and Gericke 2000) (see Appendix). We collected these plants from several localities in four Provinces in South Africa: Gauteng, Limpopo, Mpumalanga, and the Western Cape. Our sampling comprised 185 specimens (see Appendix). Collection details, taxonomy, voucher numbers, GPS coordinates, field pictures, and sequence data (matK and rbcLa) are archived online on the Barcode of Life Data Systems (BOLD) (www.boldsystems.org). The voucher specimens for all the taxa as well as GenBank and BOLD accession numbers are listed in the Appendix.
In addition, we included plant materials bought from the Faraday muthi market in Johannesburg, South Africa (henceforth muthi samples). A muthi market is a popular market where trade and services in African traditional medicines are provided to the general public. Materials sold in this market include various plant parts such as dried or fresh leaves, seeds, barks, and roots (Figure 1). These materials are sometimes poorly stored and/or processed (e.g. powdered). In total, we included 18 additional muthi samples in our sampling and recorded their vernacular names (mainly in isiZulu) as provided by the sellers. It was not possible to assign scientific names to the samples at the time of purchase as they were in poor condition or had already been processed.

DNA extraction, amplification, sequencing and alignment
Of the 108 species collected from the wild, leaf samples of 37 species were sent to the Canadian Centre for DNA Barcoding (CCDB) in Canada, where total DNA was extracted and the two core DNA barcodes (matK and rbcLa) were amplified and sequenced according to CCDB protocols. The sequencing for the remaining 71 species was done at the African Centre for DNA Barcoding (ACDB) in South Africa. The 18 muthi samples were also processed and sequenced at the ACDB.
DNA extraction followed the 2× CTAB method (Doyle and Doyle 1987). Polyvinylpyrrolidone (2% PVP) was added to reduce the effect of high polysaccharide concentration in the samples. After precipitation with 100% ethanol, the DNA was stored at -20 °C for a minimum of two weeks (Fay et al. 1998). DNA extracts were purified using QIAquick silica columns (Qiagen Inc., Hilden, Germany) according to the manufacturer's protocol.
For both genes, PCR amplification was performed using ReadyMix Mastermix (Advanced Biotechnologies, Epsom, Surrey, UK). We added 3.2% bovine serum albumin (BSA) to all reactions to stabilise the enzymes, reduce problems with secondary structure, and improve annealing (Palumbi 1996). PCR amplification was performed using either the 9800 Fast Thermal Cycler or the GeneAmp PCR System 9700. The PCR programs were as follows: (a) for rbcLa, pre-melt at 94 °C for 60 s, then 28 cycles of denaturation at 94 °C for 60 s, annealing at 48 °C for 60 s and extension at 72 °C for 60 s, followed by a final extension at 72 °C for 7 min; and (b) for matK, pre-melt at 94 °C for 3 min, then 30 cycles of denaturation at 94 °C for 60 s, annealing at 52 °C for 60 s and extension at 72 °C for 2 min, followed by a final extension at 72 °C for 7 min.
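The two thermal-cycling programs above can be written down as structured data, which makes it easy to compare them or to estimate run length. The following Python sketch is illustrative only: the class and field names are hypothetical, and the computed time covers the programmed holds only, ignoring the ramp time between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """One thermal-cycling program; times in seconds, temperatures in degrees C."""
    name: str
    pre_melt_s: int
    denature_s: int
    anneal_s: int
    anneal_temp_c: int
    extend_s: int
    cycles: int
    final_extend_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times (ramp times are not included)."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.pre_melt_s + self.cycles * per_cycle + self.final_extend_s


# Values transcribed from the protocol described in the text.
RBCLA = PcrProgram("rbcLa", pre_melt_s=60, denature_s=60, anneal_s=60,
                   anneal_temp_c=48, extend_s=60, cycles=28,
                   final_extend_s=7 * 60)
MATK = PcrProgram("matK", pre_melt_s=3 * 60, denature_s=60, anneal_s=60,
                  anneal_temp_c=52, extend_s=2 * 60, cycles=30,
                  final_extend_s=7 * 60)

if __name__ == "__main__":
    for program in (RBCLA, MATK):
        minutes = program.hold_time_s() / 60
        print(f"{program.name}: {minutes:.0f} min of programmed holds")
```

Under these assumptions the matK program runs longer than the rbcLa program, mainly because of its two-minute extension step and the two additional cycles.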
Cycle sequencing reactions were carried out in a GeneAmp PCR System 9700 thermal cycler using the ABI PRISM® BigDye® Terminator v3.1 (Applied Biosystems, Inc., California, USA). Cycle sequencing products were precipitated in ethanol and sodium acetate to remove excess dye terminators, then resuspended in 10 µl HiDi formamide (ABI) before sequencing on an ABI 3130xl Genetic Analyzer (ABI).
Complementary strands were assembled and edited using Sequencher v3.1 (Gene Codes, Ann Arbor, Michigan, USA). All the sequences generated at ACDB and CCDB including those retrieved from BOLD were aligned manually in PAUP* v4.0b10 (Swofford 2002).

Data analyses
All analyses were conducted in the R package Spider. Only species for which sequences of both genes (rbcLa and matK) were available were included in the analyses. First, we calculated K2P interspecific and intraspecific genetic distances and tested the significance of the difference between them using the Wilcoxon rank-sum test. Second, we determined the genetic distance suitable as a threshold with which to test the efficacy of the DNA regions in assigning sequences to species. Third, we tested the identification efficacy of the barcodes for commonly used medicinal plants using three distance-based methods: best close match (Meier et al. 2006), near neighbour, and the species identification method used by BOLD (www.boldsystems.org). The best close match and near neighbour analyses measure the identification efficacy by searching for the closest individuals; the former focuses on a single nearest neighbour match, whereas the latter considers all matches within a specific threshold. The BOLD species identification method performed species delimitation based on a distance cut-off of 1%.
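To make the distance-based steps concrete, the sketch below computes the Kimura 2-parameter (K2P) distance between two aligned sequences and applies a best-close-match rule in the sense of Meier et al. (2006). It is a plain-Python illustration under stated assumptions, not the Spider implementation: the function names, the toy sequences and the 1% threshold in the usage example are all hypothetical, and sites with gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    Raises ValueError for saturated pairs (logarithm undefined).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def best_close_match(query_seq, query_species, library, threshold):
    """Classify one query against a library of (species, sequence) pairs.

    Returns "no id" if the nearest sequence is beyond the threshold,
    "correct" if all nearest sequences belong to the query species,
    "ambiguous" if nearest sequences include the query species and others,
    and "incorrect" otherwise. The query itself must not be in the library.
    """
    distances = [(k2p_distance(query_seq, seq), sp) for sp, seq in library]
    best = min(d for d, _ in distances)
    if best > threshold:
        return "no id"
    closest_species = {sp for d, sp in distances if d == best}
    if closest_species == {query_species}:
        return "correct"
    if query_species in closest_species:
        return "ambiguous"
    return "incorrect"


if __name__ == "__main__":
    # Toy reference library with made-up sequences and placeholder names.
    library = [
        ("sp_A", "ACGTACGTAC"),
        ("sp_A", "ACGTACGTAT"),
        ("sp_B", "ACGTTCGGAC"),
    ]
    print(best_close_match("ACGTACGTAC", "sp_A", library, threshold=0.01))
```

In a leave-one-out test of a reference library, each sequence would be removed from the library in turn and used as the query, and the proportions of the four outcomes would then be tallied across all sequences.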
We then evaluated the ability of the core DNA barcodes to assign poorly conserved or already-processed plant materials to species. For this test, the barcoding technique was applied to all 18 muthi samples. Our procedure consisted of two steps. The first involved the use of vernacular names (in isiZulu) for the muthi samples to identify their scientific names based on Hutchings et al. (1996). The second step was based on the BLAST algorithm implemented in the BOLD identification system (www.boldsystems.org/index.php/IDS_OpenIdEngine) for matK and rbcLa sequences. The BLAST algorithm measures the efficiency of species identification against a global data repository such as BOLD or GenBank (Munch et al. 2008). The program takes a query sequence and matches it against the database selected by the user. The E-value and maximum identity are two statistics that can be used to measure the efficiency of species identification. The results are reported in a ranked list in which the closer the identity of a hit is to 100% and its E-value to 0, the better the identification. The DNA sequences generated from the 18 poorly conserved and degraded muthi samples were BLASTed against the reference database of medicinal plants available on the BOLD system.
As additional evidence for the BLAST test, we included the sequences of the muthi samples (as queries) in the DNA matrix generated for all medicinal plants, and reconstructed a maximum parsimony (MP) phylogeny based on the combined DNA matrix. Our objective here was to trace the positions of the muthi samples (our queries) on the phylogenetic tree. Maximum parsimony analysis was performed using PAUP* v4.0b10 (Swofford 2002). Tree searches were done using heuristic searches with 1000 random sequence additions but keeping only 10 trees. Tree bisection-reconnection was performed with all character transformations treated as equally likely, i.e. Fitch parsimony (Fitch 1971). Support for the groupings was assessed by bootstrap resampling (Felsenstein 1985), also in PAUP* v4.0b10 (Swofford 2002). Node support was interpreted on the following scale: BS 50-74% for weak support and BS 75-100% for strong support (Hillis and Bull 1993, Murphy et al. 2001, Daru et al. 2013).
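The two-step check described above (vernacular name to expected species, then comparison with the top database hit) amounts to a simple concordance tally. The Python sketch below illustrates it; the function names are hypothetical, and the sample and species names are placeholders arranged only to reproduce the counts reported in the Results (18 samples, 2 without sequences, 11 matches, 5 mismatches), not the actual identifications.

```python
def concordance(expected, top_hits):
    """Compare expected species names with top database hits per sample.

    expected: dict mapping sample ID to the species inferred from the
              vernacular name.
    top_hits: dict mapping sample ID to the species of the best hit, or
              None when no sequence could be obtained for that sample.
    """
    outcome = {}
    for sample, expected_name in expected.items():
        hit = top_hits.get(sample)
        if hit is None:
            outcome[sample] = "no sequence"
        elif hit.strip().lower() == expected_name.strip().lower():
            outcome[sample] = "match"
        else:
            outcome[sample] = "mismatch"
    return outcome


def match_rate(outcome, of_all_samples=True):
    """Fraction of matches, over all samples or only over sequenced ones."""
    matches = sum(1 for result in outcome.values() if result == "match")
    if of_all_samples:
        denominator = len(outcome)
    else:
        denominator = sum(1 for r in outcome.values() if r != "no sequence")
    return matches / denominator


# Placeholder data: only the counts come from the text.
expected = {f"sample_{i:02d}": f"species_{i:02d}" for i in range(1, 19)}
top_hits = {}
for i, sample in enumerate(sorted(expected), start=1):
    if i <= 11:
        top_hits[sample] = expected[sample]   # hit agrees with expectation
    elif i <= 16:
        top_hits[sample] = "other_species"    # hit disagrees
    else:
        top_hits[sample] = None               # amplification failed

outcome = concordance(expected, top_hits)

if __name__ == "__main__":
    print(f"{match_rate(outcome, of_all_samples=True):.0%} of all samples")
    print(f"{match_rate(outcome, of_all_samples=False):.0%} of sequenced samples")
```

The choice of denominator matters: the same 11 matches give a different rate depending on whether samples that failed to amplify are counted.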

Results
Based on genetic divergence, rbcLa exhibits the lowest mean interspecific distance (0.08); in contrast, matK exhibits the highest mean interspecific distance, almost double that of rbcLa + matK (0.22 versus 0.119 respectively). From the genetic variation test based on K2P distance for matK, we found that interspecific distance was significantly higher than intraspecific distance (inter median = 0.232 vs. intra median = 0.00; Wilcoxon rank-sum test, p < 0.001; Table 1), indicating that a barcode gap exists for matK. A similar pattern was found for rbcLa, with a highly significant difference between inter- and intraspecific distances (inter median = 0.07 vs. intra median = 0.001, p < 0.001). We also found that when rbcLa and matK were combined, the interspecific distance was significantly higher than the intraspecific distance (inter median = 0.12 vs. intra median = 0.00, p < 0.001). Furthermore, our analyses indicate that a clear barcode gap exists between the ranges of intra- and interspecific distances for all regions (Figure 2). Tajima's K index of sequence divergence, measured as the mean number of substitutions per nucleotide, indicates that matK had the lowest sequence divergence (3%), whereas rbcLa and rbcLa + matK had similar divergence indices of 6% and 5% respectively.
We calculated the optimised genetic distance (threshold) with which the discriminatory power of the different gene regions was evaluated. Apart from rbcLa, for which the optimised threshold was lower than 1%, both matK and rbcLa + matK had optimised thresholds greater than 1% (i.e. 1.44% and 1.25% respectively). Using these cut-offs, we then evaluated the discriminatory power of the different regions. We found that the combination rbcLa + matK provided the best discriminatory power based on the near neighbour and the best close match methods (96% and 97% respectively, Table 2). However, using the BOLD identification criteria, the discriminatory power of the combined regions dropped to 85%, which is close to the 86% for matK alone but higher than that of rbcLa (76%). The application of the BOLD identification criteria also resulted in a higher proportion of ambiguous identifications: rbcLa (23%), matK (10%) and rbcLa + matK (11%). Conversely, the best close match method had the lowest proportion of ambiguous identifications (i.e. 0-7%) for all regions tested.
We then BLASTed (compared) the sequences for the 18 poorly conserved and degraded muthi samples against the BOLD identification system. Two muthi samples proved difficult to amplify, whereas amplification was successful for the remaining 16 samples (Table 3). Of these 16 samples, the BLAST test was successful for 11 (61% of the 18 samples purchased; 69% of those that amplified), meaning that the scientific names recovered from the BLAST test matched the scientific names expected from the vernacular names. However, we found mismatches for five samples. These results are also shown on the MP phylogeny presented in Figure 3.
Table 3 notes: marked specimens are those for which common names or scientific names could not be found in the available literature. IUCN Red List status obtained from http://redlist.sanbi.org.
Figure 3. Phylogram obtained from the maximum parsimony analysis of matK with muthi samples included as "query". Green dots indicate well-supported nodes (bootstrap support > 74%) and red dots indicate low bootstrap support (BS < 74%).

Discussion
The efficiency of a good barcode relies fundamentally on its ability to distinguish between closely related species. This is achieved only when there is more genetic differentiation between species than within species, i.e. when interspecific distance is significantly higher than intraspecific distance (Hebert et al. 2004, Savolainen et al. 2005, Lahaye et al. 2008). We tested this expectation on commonly used medicinal plants using matK and rbcLa. We found that both regions exhibit a significant barcode gap, suggesting that they should be efficient in assigning processed medicinal plants to species level. Further, performance was very high for both the single regions and the core barcode (76-97%), but highest for the core barcode under the near neighbour and best close match methods. Overall, the core barcodes prove reliable in identifying commonly used medicinal plants of South Africa.
In several studies, the discriminatory power of the core barcodes has been questioned (Hollingsworth et al. 2009, Pettengill and Neel 2010, Roy et al. 2010, Wang et al. 2010, Clement and Donoghue 2012, Liu et al. 2012). These studies mainly focused on closely related species or single lineages. A recent study with a similar objective to ours also discounts the potential of the core barcodes in discriminating Chinese medicinal plants (Chen et al. 2010). The authors found a more reliable discriminatory power of 92.7% for ITS2 at the genus and species level across different plant families and closely related taxa. In our study, we did not include ITS2, but we found a similar power of 85% to 96% for the core barcodes (matK and rbcLa) in the context of commonly used South African medicinal plants. Chen et al. (2010) included 400 samples belonging to 326 species in 98 families, covering dicots, monocots, gymnosperms and ferns among Chinese medicinal plants. Such broad sampling likely increased the proportion of closely related species, resulting in the low performance of the core barcodes in their study. In contrast, our sampling is limited to commonly used medicinal plants (~108 species), and this restriction likely increases the chance of sampling less closely related species, leading to the higher performance we found for the core barcodes.
We further tested the performance of the core barcodes by evaluating their identification efficacy on 18 medicinal plant products bought at the Faraday muthi market in Johannesburg, South Africa. The sequences generated from these 18 plant materials were BLASTed against the reference library on the BOLD database system. Given that the plant materials sold at the muthi market were poorly conserved (dried, processed, etc.), we expected a very low percentage of DNA recovery and amplification. One possible explanation for the five samples that yielded false identifications, and the two that failed, is that the samples were mixtures of leaves from multiple species. Such a limitation could be overcome by sequencing all components of mixed DNA samples individually using high-throughput sequencing techniques, e.g. pyrosequencing technology, which is capable of simultaneously detecting many thousands of different sequences in a mixed sample without the need for sub-cloning (Margulies et al. 2005).
Another possible cause of the amplification failure observed for some samples is poor post-harvest preservation, which may result in DNA degradation. Again, such a limitation could be overcome through the search for a 'mini-barcode' (Meusnier et al. 2008). The technique of sliding window analysis is now available for that purpose and has been proven reliable. Given that medicinal plants are often poorly conserved or processed materials, the chance of successful extraction and amplification of long DNA fragments (> 200 bp) is very low (Meusnier et al. 2008). As such, a search for a shorter but informative fragment is necessary if we are to verify the identity of commonly used medicinal plants, which are generally devoid of morphological features.
Furthermore, we found some mismatches between the species identified by the BLAST algorithm and the corresponding species based on vernacular names. Although South African medicinal plants are well documented (e.g. Hutchings et al. 1996, Van Wyk et al. 1997), it remains likely that the mismatches do not result from erroneous claims by plant sellers but from variation in the names used for the same plants across different ethnic groups.
The continual removal of medicinal plants from the wild has become worrisome in southern Africa (Setshogo and Mbereki 2011). Therefore, understanding the scarcity and popularity of plants at the muthi market is the starting point for conservation and for evaluating threatened species (Williams et al. 2000, Setshogo and Mbereki 2011). For instance, Williams et al. (2000) mentioned Helichrysum sp. as scarce and likely to become threatened in the future because of its popularity and demand at the muthi markets. The harvesting of the whole plant, bulb, tuber or roots before the seeds germinate damages the plant more than harvesting only leaves, seeds, bark or fruits (as seen in Figure 1).
Although only about 22% of the muthi samples are currently threatened with extinction (Table 3), continual over-exploitation in the wild might eventually move currently non-threatened species into a threatened category. Therefore, there is an urgent need to conserve medicinal plants by cultivating them in home gardens (Setshogo and Mbereki 2011).
In conclusion, our analyses indicate that most of the information supplied by the sellers at the muthi market was correct. This could be due to the fact that we tested only 18 samples. It therefore remains possible that, with a larger sample size, we might detect important mismatches between the sellers' claims and the products sold. We also propose a continued effort to increase the barcode library of South African medicinal plants and, in case of difficulties due to degraded materials, the use of a pyrosequencing technique in tandem with mini-barcodes. Our suggestions and findings are expected to be of great use in limiting false identification that can harm public health.