Characterization and comparative analysis of the complete mitochondrial genome of Azygia hwangtsiyui Tsin, 1933 (Digenea), the first for a member of the family Azygiidae

Abstract Azygia hwangtsiyui (Trematoda, Azygiidae), a neglected parasite of predatory fishes, is little-known in terms of its molecular epidemiology, population ecology and phylogenetic study. In the present study, the complete mitochondrial genome of A. hwangtsiyui was sequenced and characterized: it is a 13,973 bp circular DNA molecule and encodes 36 genes (12 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes) as well as two non-coding regions. The A+T content of the A. hwangtsiyui mitogenome is 59.6% and displays a remarkable bias in nucleotide composition with a negative AT skew (–0.437) and a positive GC skew (0.408). Phylogenetic analysis based on concatenated amino acid sequences of twelve protein-coding genes reveals that A. hwangtsiyui is placed in a separate clade, suggesting that it has no close relationship with any other trematode family. This is the first characterization of the A. hwangtsiyui mitogenome, and the first reported mitogenome of the family Azygiidae. These novel datasets of the A. hwangtsiyui mt genome represent a meaningful resource for the development of mitochondrial markers for the identification, diagnostics, taxonomy, homology and phylogenetic relationships of trematodes.


Introduction
The genus Azygia Looss, 1899 is an endoparasitic helminth found in the stomach and intestine of freshwater feral carnivorous fish (Frolova and Shcherbina 1975). This genus includes several species complexes and its type species is Azygia lucii (Müller, 1776), which is a parasite of numerous, but especially esocid and percid, fishes in Europe. Many researchers have added to our knowledge of this cosmopolitan species. To date, species of Azygia are frequently reported from the esophagus, stomach, and intestine of a wide range of predatory fishes from Asia, Europe and North America, including China, Japan, India, Russia, Germany, and North America (Tubangui 1928;Tsin 1933;Moravec and Sey 1989;Marcogliese and Cone 1996;Besprozvannykh 2005;Jadhav et al. 2011;Pallewad et al. 2015;Womble et al. 2016;Nagasawa and Katahira 2017).
Azygia hwangtsiyui Tsin, 1933 is a member of the family Azygiidae Odhner, 1911 and is often overlooked; it is parasitic in the gastrointestinal tract of species of the family Channidae Fowler, 1934 but caused only slight clinical signs, including malnutrition and weight loss. In China, Azygia hwangtsiyui-infected freshwater predatory fishes have been described from Shandong, Heilongjiang, Jiangsu, Fujian, Sichuan and Hunan Provinces (Tsin 1933;Zmejev 1936;Ma 1958;Tang and Tang 1964;Kiang 1965;Chen 1973;Wang 1985;Cheng 2011). It has a mainly inland distribution and utilizes freshwater snail species (e.g. Vivipara quadrata (Benson, 1842)) as intermediate hosts (Tang and Tang 1964) and develops into adults in the gastrointestinal tract of predatory fish species such as Ophiocephalus argus Cantor, 1842 and Channa asiatica (Linnaeus, 1758) (Tsin 1933;Besprozvannykh 2005).
Morphology is the most commonly used method for species identification and differentiation of metazoans and is widely adopted globally by parasitologists and taxonomists. A huge disadvantage of using morphological criteria, however, is that it is difficult to identify and distinguish closely related and cryptic species. Although the family Azygiidae was erected more than a century ago, its situation, and that of several species of Azygia, is still controversial and uncertain. Manter (1926) pointed out that Azygia is the only genus in the family then presenting systematic confusion: Azygia longa (Leidy, 1851) in North America may be a synonym of A. lucii in Europe (Manter 1926), and Van Cleave and Mueller (1934) reported that Azygia acuminata Goldberger, 1911 and A. longa should be considered conspecific. Nevertheless, due to the discovery of some life histories of members of Azygia, A. lucii and A. longa have been recognized as two distinct species (Szidat 1932;Sillman 1953;Sillman 1962).
Mitochondrial (mt) genome and nuclear ribosomal DNA sequences are effective molecular tools for taxonomic identification, phylogeny and biogeographical research (Bernt et al. 2013;Le et al. 2019). However, only a partial cytochrome oxidase subunit 1 protein sequence (AIY67834) of Proterometra macrostoma Horsfall, 1933 (Azygiidae) is currently available in GenBank. None of the mitochondrial genome data have been sequenced for a member of the family Azygiidae. Therefore, we determined the complete mitochondrial genome sequence of A. hwangtsiyui as a basis for the future definition of strain-and species-specific markers, and for assessing mitogenomics in resolving the interrelationships of trematodes.

Sampling and DNA extraction
The specimens of flatworms were isolated from the stomach of their definitive host, in this case snakehead fish (Ophiocephalus argus (Cantor, 1842)) obtained from east Dongting Lake in Yueyang, Hunan province, China (29°22'N, 113°06'E). Azygia hwangtsiyui was morphologically identified according to the original and other descriptions (Tsin 1933;Tang and Tang 1964;Zhang et al. 1999;Besprozvannykh 2005), using a stereomicroscope and a light microscope. Furthermore, single samples were confirmed molecularly as A. hwangtsiyui based on sequencing of 1370 bp 28S rDNA sequence. The parasites were completely washed in water, preserved in 99% ethanol, and stored at 4 °C until genomic DNA extraction. Total genomic DNA extraction was performed from an intact specimen with the TIANamp Micro DNA Kit (Tiangen Biotech, Beijing, China), according to the manufacturer's instructions.

Mitogenome annotation and analysis
According to sequence chromatograms, all raw fragments were quality-proofed using CHROMAS (https://www.technelysium.com.au) to remove ambiguity codes and low-quality bases. Whenever the quality was sub-optimal, sequencing was repeated until the amplicon is the consensus sequence. Before manual assembly of the entire mitochondrial genomic sequence, identification of all amplicons was performed by BLASTN check (Altschul et al. 1990). The mt genome of A. hwangtsiyui was aligned against the mt genome sequences of other promulgated digenean mitogenomes utilizing multiple sequence alignment software MAFFT version 7.149 (Katoh and Standley 2013) to identify genetic boundary. Protein-coding genes (PCGs) were predicted with Open Reading Frame Finder (https://www.ncbi.nlm.nih.gov/orffinder/) adopting echinoderm and flatworm mitochondrial codes, and examining the nucleotide alignment against the reference mtDNA in trematode Dicrocoelium chinense Tang et Tang, 1978 (NC_025279.1). Whole tRNAs were inferred with the detection results of ARWEN (Laslett and Canback 2008) and MITOS web server (Bernt et al. 2013). Two rRNA (rrnL and rrnS) were founding by comparison with those of published fluke mitogenomes. Codon usage and relative synonymous codon usage (RSCU) for 12 PCGs of A. hwangtsiyui were computed by PHYLOSUITE (Zhang et al. 2020), and its operation results were imported into GGPLOT2 program (Wickham 2016) to make figures of the RSCU. Tandem repeats in the non-coding regions were determined with Tandem Repeats Finder software version 4.09 (Benson 1999), and the prediction of their secondary structures were performed by the MFOLD web server (Zuker 2003). The annular diagram of A. hwangtsiyui mitogenome was plotted with mitochondrial genome data visualization tool MTVIZ (http://pacosy.informatik.unileipzig.de/mtviz/mtviz).

Phylogenetic analysis
For phylogenetic analyses, we utilized translated and concatenated amino acid sequences of twelve protein-coding genes for 49 Platyhelminthes including A. hwangtsiyui mitogenome determined in this study. Two tapeworm species, Cloacotaenia megalops (Nitzsch in Creplin, 1829) (NC_032295.1) and Dibothriocephalus latus (Linnaeus, 1758) (NC_008945.1) were included as outgroup taxa representing two different families. Species information including systematic positions and GenBank accession numbers is provided in Suppl. material 2: Table S2. The PHYLOSUITE program was used to extract twelve PCGs from the GenBank files, export fasta files with translated amino acid datasets, and align datasets in bulk using integrated applet MAFFT with normalalignment mode. Phylogenetic analyses were performed using Bayesian Inference (BI) and Maximum Likelihood (ML) methods. Assessment of the best-fit evolutionary model for dataset was conducted via ModelGenerator v0.8527 (Keane et al. 2006). BI in MrBayes version 3.2.6 (Ronquist et al. 2012) was carried out under the MtRev matrix of amino acid substitution, and was analyzed with 1 × 10 7 metropolis-coupled Monte Carlo Markov Chain (MCMC) generations. Two independent runs with four simultaneous MCMC chains (one cold and three heated chains) were conducted for 1 × 10 7 million generations, sampling every 10,000 generations and discarding the initial 25% generations as burn-in. ML analysis in PHYLOSUITE was performed using MtART+I+G matrix with 1000 bootstrap replicates.

Codon usage, transfer RNAs, and ribosomal RNAs
For the A. hwangtsiyui mitogenome, codon ends in G or T were more continual than those ending in A or C. The most frequently used start codon in protein-coding genes was ATG (for six PCGs), secondly was GTG (for five PCGs), which resembles that of the most frequent extrapolated start codons for mitogenome protein-encoding genes of digenean species (Chen et al. 2016). The least-used start codon was TTG (only one PCGs). Likewise, the most-used terminal codon was TAG (for nine PCGs), followed by TAA (for two PCGs). Only one of 12 protein-coding sequence (nad5) was terminated with abbreviated T stop codon (Table 1). Although incomplete stop codons (T or TA) frequently occur in cestodes and nematodes, they were rarely presented in flukes other than D. chinensis, Dicrocoelium dendriticum (Rudolphi, 1819), and Postharmostomum commutatum (Dietz, 1858) (Liu et al. 2014b;Fu et al. 2019). The codon UUU (Phe, 10.17%), UUG (Leu, 8.17%), and GUU (Val, 7.19%) were the most frequently occurring codons in protein-coding genes. Leucine, valine, and phenylalanine are the most-used amino acids, with frequency of 15.96%, 13.38%, and 11.09%, respectively. The least-used codons were CGA (Arg, 0.06%) and GAC (Asp, 0.15%), and the least frequent utilized amino acid was glutamine (1.01%) (Suppl. material 5: Figure S2). As most of digenea mitogenome sequences, the mitogenome of A. hwangtsiyui possessed 22 commonly found tRNAs, with the exception of that of P. westermani Korean isolate (23 tRNAs), and P. westermani Indian isolate (24 tRNAs) (Biswal et al. 2014). In A. hwangtsiyui, tRNA-Gly (trnG) is located between NCR1 and NCR2 (Fig. 1). The size of ribosomal RNA genes (rrnL and rrnS) in mitochondrial DNA of A. hwangtsiyui are 975 bp and 740 bp, respectively ( Table 2). The upstream and downstream of rrnL and rrnS are cascaded with trnT and cox2 genes, respectively, and are detached from each other by trnC, as in all reported platyhelminths to date (Littlewood et al. 2006, Le et al. 2016.

Gene arrangement
Comparative analysis of gene arrangement among 47 selected digenean taxa, two gene blocks (cox1-trnT-rrnL-trnC-rrnS-cox2-nad6 and cytb-nad4L-nad4-trnQ) are shared  T  rrnL  C  rrnS  cox2  nad6  Y  L1  S2  N  I  F  atp6  nad2  A  L2  R  nad5  G  cox3  E  H  cytb  nad4L  nad4  Q  K  nad3  D  nad1  V  P  M  W  S1   cox1  T  rrnL  C  rrnS  cox2  nad6  Y  L1  S2  N  I  F  atp6  nad2  A  L2  R  nad5  G  cox3  E  H  cytb  nad4L  nad4  Q  K  nad3  D  nad1  V  P  M  W  S1   cox1  T  rrnL  C  rrnS  cox2  nad6  Y  L1  S2  R  nad5  G  cox3  E  H  cytb  nad4L  nad4  Q  F  M  atp6  nad2  A  D  nad1  N  P  I  by all selected taxa (Fig. 2). Disregarding P. heterotremus and members of the family Schistosomatidae and Fasciolidae Raillet, 1895, the gene order of the remaining digenea taxa is virtually identical with the exception of the translocation of trnE and trnG among the remaining members of selected digenea representatives in family level. Intriguingly, there is the translocation of trnE and trnG within different species of family Fasciolidae. The translocations of three tRNAs (trnS1, trnS2 and trnS) can be discovered even between taxa of the same subgroup. Gene order of the Brachycladium goliath (Van Beneden, 1858) mt genome (the only representative of family Brachycladiidae Faust, 1929) is nearly same as that of P. westermani (Troglotrematidae Ward, 1918) except for the relocations of trnY between trnG and cox3, and trnE to the position between trnN and trnP. The groups of Schistosomatidae show a massive gene reorganization of protein-coding genes and tRNAs compared with other sequenced digenea mitogenome, which is in accord with previous finding reported by Littlewood et al. (2006).

Mitogenome-derived phylogeny
To assess phylogenetic relationships among available flatworms, we utilized concatenated amino acid sequence dataset representing 12 protein-coding genes of A. hwangtsiyui, 46 other digenean representatives, and two tapeworm species (C. megalops and D. latus) for analyzing molecular-based phylogeny. In this study, the topological structure is divided into two large clades: one consists of seven members of the family Schis-tosomatidae; and the other clade comprises 40 members from 16 families including the family Azygiidae (A. hwangtsiyui) (Fig. 2). The topological structure shows that A. hwangtsiyui (Azygiidae) is identified as the most basal lineage of the Digenea, but separated from C. complanatum (Clinostomidae Lühe, 1901), and Cyathocotyle prussica Mühling, 1896(Cyathocotylidae Poche, 1926. Phylogenetic analyses of all complete digenea mtDNAs confirmed taxonomic and previous phylogenetic assessments Kostadinova and PéRez-del-Olmo 2014;Fu et al. 2019). The intricate structure and varying content of the family Azygiidae still awaits investigation of relationships based on a much wider taxon sampling and more mitogenome datasets.

Figure S1
Authors: Yuan-An Wu, Jin-Wei Gao, Xiao-Fei Cheng, Min Xie, Xi-Ping Yuan, Dong Liu, Rui Song Data type: molecular data Explanation note: Secondary structure of tandem repeats in non-coding region 2 (NCR2) of Azygia hwangtsiyui mitochondrial genome. Copyright notice: This dataset is made available under the Open Database License (http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use this Dataset while maintaining this same freedom for others, provided that the original source and author(s) are credited. Link: https://doi.org/10.3897/zookeys.945.49681.suppl4