Phylogenetic relationships of Malaysia’s long-tailed macaques, Macaca fascicularis, based on cytochrome b sequences

Abstract Phylogenetic relationships among Malaysia’s long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that Cyt b locus is a highly conserved region with only 23% parsimony informative character detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo’s population was distinguished from Peninsula’s population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia’s M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia.


Introduction
Macaca fascicularis (Raffles, 1821) is also known as long-tailed, crab-eating or cynomolgus macaque. This species is well distributed in the countries of Malaysia, Brunei, Bangladesh, Cambodia, Nicobar Islands, Indonesia, Lao PDR, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Vietnam ( Figure 1) (Gumert et al. 2011). There appears to be a hybrid zone between M. fascicularis and M. mulatta (Zimmermann, 1780) in the northern range above mainland Southeast Asia, which makes it difficult to determine the northern distribution limit of M. fascicularis (Fooden, 1996). The distribution of long-tailed macaques was extended to the Pacific Ocean (Palau) (Crombie and Pregill 1999), Indian Ocean (Mauritius) (Trask et al. 2013) and New Guinea (Kemp and Burnett 2003) due to human-mediated introduction of the species to these respective regions in the recent past.
Despite the abundance of genetic studies on or including M. fascicularis, the phylogenetic relationships among Malaysia's long-tailed macaques remain uncertain. However, these phylogenetic data are crucial for conservation management of M. fascicularis as this species is reported as a pest in human settlement areas (Md-Zain et al. 2010b;. For example, M. fascicularis engages frequently in crop-raiding activities, and these behaviors are often reinforced by humans that feed these macaques either directly or indirectly, which leads to unintentional habituation of the species. The annual report by the Department of Wildlife and National Parks (PERHILITAN) (2010) indicated that M. fascicularis is at the top of the human-wildlife conflict species case list. From a recorded 9,286 complaints of wildlife disturbance from various species, complaints on M. fascicularis disturbance were the highest, with 5,930 complaints (63.86%). The phylogenetic relationships of Malaysia's M. fascicularis data are crucial in planning and executing the translocation process of this species in the future, which is one of the major actions in the conservation management and human-wildlife conflict management of the species. By understanding the phylogenetic relationships of Malaysia's crab-eating macaque, the plan for translocation the species can finally be carried out without a risk of losing important genetic diversity or even unique genetic lineages of the species. Phylogenetic studies at the subspecies level are very scarce (Rosli et al. 2014), as many primatologists are making the species group classifications the focal point in their studies. Both Vun et al. (2011) and Blancher et al. (2008) have successfully explained the phylogenetic relationships of Presbytis and M. fascicularis, respectively, at population level using Cyt b sequences, proving that Cyt b is a suitable locus for population level studies of Cercopithecidae. Thus, the objectives of this research are to determine the phylogenetic relationships of Malaysia's M. fascicularis using mitochondrial Cyt b sequences and to test genetically the subspecies classifications as made by Medway (1969), Groves (2001) and Brandon-Jones et al. (2004) based on morphological characteristics.

DNA extraction, polymerase chain reaction (PCR) and sequencing
Altogether, 27 genetic samples (Table 1 and Figure 2) were used in this research. These samples were provided by the Department of Wildlife and National Parks (PERHILI-TAN) and Sabah Parks. The samples derive mainly from feces collected in the original habitats of M. fascicularis. In addition, blood and tissue samples collected from a roadkill specimen of M. fascicularis were also used in this study. Mitochondrial DNA (mtDNA) was extracted from each genetic sample using QIAGEN DNeasy Blood and Tissue Kit, following the manufacturer's protocol. A mtDNA genome from FTA (fast technology for analysis of nucleic acids) sample was extracted using the WHAT-MAN ® GenSolve Recovery Kit, also following the manufacturer's protocol. DNA was extracted from 0.5 g -1.0 g of fecal sample using innuPREP Stool DNA kit (Analytik Jena) following the manufacturer's protocol.  Polymerase Chain Reaction (PCR) was employed in order to amplify the targeted locus in the mtDNA genome, which is a partial fragment of the Cyt b gene, by using Mastercycler® nexus (Eppendorf North America, Inc.). PCR was performed by using Phusion TM Flash High-Fidelity PCR Master Mix (Finnzymes, OY), which has extreme speed (extension times of 15 s/kb or less), high accuracy (proofreads DNA polymerase with a fidelity of 25 X Taq polymerase) and a very high yield in reduced times. Primers used in this study were (L14724) 5'-CGAAGCTTGATATGAAAAACCATCGTTG -3' (Pääbo and Wilson 1988) and (H15149) 5'-AAACTGCAGCCCCTCAGAAT-GATATTTGTCCTCA -3' (Kocher et al. 1989). PCR reactions were carried out under the following parameters: 98°C initial denaturation for 30 seconds, followed by 30 cycles of 98°C denaturation for 10 seconds, 55°C of annealing for 30 seconds, 72°C extension for 30 seconds and the final extension stage at 72°C for 10 minutes. The PCR product was purified using the Vivantis G-F1 PCR Clean-up Kit, and the purified PCR products were sent to the 1 st Base Laboratories Sdn Bhd (Malaysia) for sequencing.

Sequence and phylogenetic analysis
Sequence results obtained from the 1 st Base Laboratories Sdn Bhd were proofread and edited using Bioedit Sequence Alignment Editor, and the Sequence Similarity searches were performed using GenBank BLASTn application to validate the DNA sequences obtained. DNA sequences were submitted to GenBank under accession number KJ592589-KJ592594. Bioedit's ClustalW multiple alignment algorithm was then used to align the sequence results, and sequence analysis and phylogenetic analysis were performed. DNASP 4.0 (Rozas et al. 2003), PAUP 4.0b10 (Swofford 2002) and MEGA version 4.0 (Tamura et al. 2007) software were used for sequence analysis to determine nucleotide diversity (π) and net nucleotide divergence (Da); genetic distance and single nucleotide polymorphisms (SNPs) of the sequences/datasets respectively.
Three methods of phylogenetic tree reconstructions were carried out; the distancebased method (neighbor joining, NJ) using MEGA version 4.0 (Tamura et al. 2007); the character-based method (maximum parsimony, MP) using Phylogenetic Analysis Using Parsimony (PAUP) version 4.0b10 (Swofford 2002) and Bayesian inference using MrBayes 3.1 (Huelsenbeck and Ronquist 2001). The Kimura-2-Parameter model was selected for NJ phylogenetic reconstructions. MP phylogenetic tree was carried out with heuristic search methods and 1000 random stepwise addition with the application of a 50% consensus-majority rule concept (Swofford 2002). In the MP analysis, each transition and transversion was calculated on average. The MP phylogenetic tree was constructed using a tree bisection and reconnection (TBR) algorithm, and all the trees constructed underwent 1000 bootstrap replications to obtain the bootstrap confidence level.
Modeltest version 3.7 software (Posada and Crandall 1998) was used to select the best substitution model for the partial Cyt b sequences using Akaike Information Criterion (AIC). The best substitution model was applied in the Bayesian analysis using MrBayes 3.1.2. software. The most suitable model that fit the data was the HKY+G model with a gamma shape parameter of 0.5455 and base frequencies of 0.2866 for A, 0.3171 for C, 0.1266 for G and 0.2696 for T. We ran Metropolis-coupled Markov Chain Monte Carlo (MCMC) with 300000 generations, with 0.008214 Split Fre-quencies Probability (P) and tree was sampled every 10 generations. The first 25% of the trees obtained in the analysis was discarded as burn-in (7500 trees discarded from total 30000 trees), a majority-rule consensus of remaining trees was constructed and posterior probabilities (PP) were summarized for each branch.

Comparison of the cytochrome b sequence of Malaysia's M. fascicularis with Gen-Bank's sequence
Partial sequences of Cyt b locus in size of 383 bp were successfully sequenced for all 27 genetic samples ( Table 1). The first analysis conducted on the sequences consisted of sequence similarity searches using the GenBank application to validate each sequence obtained was from the correct taxon of the acquired samples and to avoid encountering the data problem of nuclear insertion. All genetic samples matched the target species sequences in GenBank with samples FJ906803.1 (M. fascicularis complete genome) and corresponded to most of the ingroup samples with average query cover and maximum identities scores at 95% and 97%, respectively.

Sequence polymorphism, genetic distance and nucleotide diversity
A total of 27 genetic sequences in a size of 383 bp of Cyt b locus yielded 78 (20.37%) variable sites, of which 34 sites were parsimony informative characters (8.88%). Interestingly, when outgroup samples (M. nemestrina and M. arctoides) were excluded from the analysis, only 23 (6%) variable sites were detected, all of which were parsimony informative characters. From these 23 informative characters, 13 were generated by the inclusion of Borneo samples in the analysis; whereas, if the Borneo samples were excluded, only 10 (2.6%) parsimony informative characters were detected.
Pairwise genetic distances of Cyt b partial sequences were calculated with PAUP 4.0b10 (Swofford 2002) using the Kimura-2-Parameter model ( Table 2). The genetic distance of samples originating from Borneo and Peninsula Malaysia showed a minimum genetic distance of 0.04068 (BM95) and a maximum value as high as 0.049 (ALMFA62, ALMFA63, ALMFA64 and ALMFA65), which is the highest value within the M. fascicularis genetic distance analysis. The genetic distance of samples originating from the East Coast of Peninsula Malaysia and West Coast of Peninsula Malaysia (excluding samples from Borneo) showed the minimum value of genetic distance was 0.008 (BM95), and the maximum value was 0.016 (ALMFA62, ALMFA63, ALMFA64 and ALMFA65). The separation of the island population of M. fascicularis and the mainland of Peninsula Malaysia populations (excluding samples from Borneo) showed a minimum value of genetic distance of 0.008 (MF719, MF720, MF721 and MF722) and a maximum value of genetic distance of 0.016 (MF135, MF136, MF137 and MF138 and M1).  Nucleotide diversity (π) and net nucleotide divergence (Da) were also calculated for the Cyt b sequence obtained using DnaSP v4.0. Two separate analyses for π and Da were conducted based on the origin of samples, by first sorting the sequences (excluding outgroup) according to their locality (states) ( Table 3) and then according to regions (Table 4). The first analysis (Table 3) portrayed that π was the highest between Sabah and other states ranging from 0.019 to 0.025, and the results are consistent with Da, ranging from 0.037 to 0.044. π. The lowest values, 0 for π and Da, were found between Negeri Sembilan and Selangor. The second analysis (Table 4)

Neighbor joining
The NJ phylogeny tree (Figure 3) was generated using Kimura-2-Parameter with 1000 bootstrap replication. The NJ phylogenetic tree showed that samples originating from Borneo remain monophyletic from samples originating from Peninsula Malaysia, supported with 99% bootstrap value. Samples from Peninsula Malaysia were divided into 2 clades; clade A and Clade B. Clade A portrays the separation of samples originating from the East Coast of Peninsula Malaysia from the remaining samples with 64% bootstrap value. Clade B on the other hand is the assemblage of populations of the West Coast of Peninsula Malaysia, supported by 90% bootstrap value. Within Clade B, two further clades were defined, namely island population (Pulau Pinang) and mainland populations (Perak, Negeri Sembilan, Johor, and Selangor) supported by 93% and   (Kujoth et al. 2005). Cyt b is a functional gene that contains both redox centers, Q 0 and Q i , involved in electron transfer (Hatefi 1985;Howell and Gilbert 1988). Thus, even a single point mutation that occurs in this region will probably have a deleterious effect on the functional response of the locus. Macaca fascicularis populations in Malaysia are evidently divided into 2 main clusters: the Borneo populations and Peninsula Malaysia populations. Thirteen parsimony informative characters were detected between these populations from a total of 23 parsimony informative characters. Pairwise genetic distance analysis indicated that the genetic distance between these 2 populations was the highest as compared to other populations in Malaysia with a value of 0.041-0.049, aside from monophyletic states of both populations in all phylogenetic trees. These findings are highly anticipated, considering the vicariance theory on population disjunction as the central thesis. In this scenario, the Borneo populations and Peninsula Malaysia populations are separated by the South China Sea. This separation more than likely caused the interruption of gene flow between both populations, causing them to accrue the major genetic differences observed in this study.
Populations of M. fascicularis in Peninsula Malaysia also form 2 further subgroups: east and west. The population from Kelantan forms a single clade in all phylogenetic trees and genetic analysis in comparison to the rest of the Peninsula Malaysia populations. Thus, the Kelantan population likely represents a unique lineage as compared to other populations from Peninsula Malaysia. Results obtained by Vun et al. (2011) that used the same primers as in this research to study the phylogenetic relationships of genus Presbytis in Malaysia provide a suitable comparison. Vun et al. (2011) obtained a pairwise genetic distance (Kimura-2-Parameter) between Presbytis melalophos robinsoni and Presbytis melalophos siamensis as low as 0.058, and each of these subspecies has uniquely distinct morphological characteristics. In comparison, the pairwise genetic distance observed here between M. fascicularis populations of East and West coast of Peninsula Malaysia are 0.016; however, there are no documented morphological characteristics that differentiate these populations.
The observed genetic differences between populations of M. fascicularis are best viewed as subspecies separations. M. f. laeta (Elliot, 1909) was recorded by Medway (1969) as common on Tioman Island (the East Coast of Peninsula Malaysia), but a survey conducted by the Department of Wildlife and Nature Park, Malaysia, PER-HILITAN (1995) proved otherwise. This particular subspecies was observed based on morphological characters on the mainland East Coast of Peninsula Malaysia by PERHILITAN (1995) as well as Weitzel et al. (1988), whose study also acknowledged the distribution of M. f. laeta in the East Coast of Peninsula Malaysia. Raven (1935), however, acknowledges M. f. argentimembris (Kloss, 1911)  The results of this study are directly relevant to conservation management strategies of M. fascicularis in Malaysia, particularly in terms of the translocation process. The human-macaque conflict has intensified in Malaysia due to habitat degradation and habituation of the species. It is important to recognize the genetic diversity between populations of M. fascicularis so we are able to conserve the unique evolutionary lineages of the species. Thus, PERHILITAN can translocate the target populations of M. fascicularis within the same gene pool of the other populations (the Borneo-Peninsula; east-west; mainland-island and north-south populations). This translocation is important to avoid the loss of genetic diversity that cannot be recovered, or, in extreme cases, can result in hybridization of populations and potential outbreeding depression of the populations (DeSalle and Amato 2004).

Conclusion
The phylogenetic relationships of Malaysia's long-tailed macaques, M. fascicularis, are crucial, as they are at a center of the human-wildlife crisis in the field of primatology, taxonomy and conservation biology. This research has shown that populations of M. fascicularis originating from Borneo and Peninsula Malaysia are distinguishable from each other based on genetic data. Within Peninsula Malaysia, further division occurs between the East Coast and West Coast of Peninsula Malaysia; mainland and island and Northern and Southern Peninsula populations. These data can aid the conservation management planning in terms of translocation of the target pest populations within similar gene pools. Further mtDNA and nuclear DNA studies of M. fascicularis at the population level are required to increase the number of individuals from each locality and increase the number of geographical locations to represent clades (east-west and Peninsula-Borneo), in which biogeographical factors can also be taken into account.