Objective: To evaluate the evolution of the pathogen Mayaro virus, causing Mayaro fever (a mosquito-borne disease) and to perform selective pressure analysis and homology modelling. Methods: Nine different datasets were built, one for each protein (from protein C to non-structural protein 4) and the last one for the complete genome. Selective pressure and homology modelling analyses were applied. Results: Two main clades (A and B) were pointed in the maximum likelihood tree. The clade A included five Brazilian sequences sampled from 1955 to 2015. The Brazilian sequence sampled in 2014 significantly clustered with the Haitian sequence sampled in 2015. The clade B included the remaining 27 sequences sampled in the Central and Southern America from 1957 to 2013. Selective pressure analysis revealed several sites under episodic diversifying selection in envelope surface glycoprotein E1, non-structural protein 1 and non- structural protein 3 with a posterior probability P≤0.01. Homology modelling showed different sites modified by selective pressure and some protein-protein interaction sites at high interaction propensity. Conclusion: Maximum likelihood analysis confirmed the Mayaro virus previous circulation in Haiti and the successful spread to the Caribbean and USA. Selective pressure analysis revealed a strong presence of negatively selected sites, suggesting a probable purging of deleterious polymorphisms in functional genes. Homology model showed the position 31, under selective pressure, located in the edge of the ADP-ribose binding site predicting to possess a high potential of protein-protein interaction and suggesting the possible chance for a protective vaccine, thus preventing Mayaro virus urbanization as with Chikungunya virus.

Mayaro virus (MAYV), belongs to Togaviridae family, triggers a febrile arthralgia syndrome close to Dengue and Chikungunya fever. The virus isolated in Trinidad in 1954 is a mosquito-borne alphavirus reported in the South and Central America[1].

The genome is a single-strand RNA of 11.5 kb[2],[3]. It has two different open reading frames. The first one translates four non- structural proteins (NSP1–4), whereas the second open reading frame encodes a single polyprotein, subsequently being processed to five several proteins: capsid protein (C), two envelope surface glycoproteins (E1 and E2), and two small peptides (E3 and 6k)[4],[5].

MAYV causes Mayaro fever (MF), a neglected endemic syndrome of tropical Americas. After 7–12 d following a mosquito bite as incubation time, patients develop fever, rash, headache, and arthralgia that persist for several weeks[6]. The most prominent symptoms, represented by joint pain and edema, are typical of the acute phase of the disease but can persist chronically for several months[7],[8]. Even though the disease is not severe and none case of death has been reported, MF cause significant morbidity especially in rural population[9], in rare case also with hemorrhagic manifestations[10]. The disease can determine temporary incapacitation to work and hospitalization in some cases[1].

Two different genotypes have been described during MAYV epidemics, the D genotyped prevalently isolated in Brazil and the L genotype with a more wide distribution[1],[10]. Reproducing the path of transmission followed by Chikungunya virus (CHIKV), Dengue virus (DENV) and Zika virus (ZIKV) in the Western countries and Indian regions, MAYV has the potentiality to become a global pathogen, due to its transmission mediated by urban mosquitoes such as Aedes aegypti[11]. Similarly to CHIKV and ZIKV infection, MAYV epidemics can be undiagnosed during DENV outbreaks and incorrectly misdiagnosed as DENV infection in about 1% of cases[12]. Case definition is clinical definition plus laboratory diagnosis of MAYV[1].

MAYV identification is difficult to be achieved due to the relative short duration of the viremic phase[10]. The available treatment for MF is based on non-steroidal anti-inflammatory medications and chloroquine, and no vaccine or specific antiviral drugs are available at the moment[13]. For this reason, MAYV infection should represent an emerging public health threat, and improved surveillance and preventive measures should be provided to mitigate the already burden on health systems in Latin America so as to limit the spread of the virus infection.

In this study, the genetic diversity of MAYV has been investigated with the aim to estimate the phylogenetic relationships between MAYV strains circulating worldwide. Since MAYV is considered an emergent pathogen, evolutionary analyses have been performed to obtain a comprehensible overview positive-selection analyses and homology modelling has been also performed to evaluate the pathogen evolution and the virus evolution consequences in the protein recognition by host immune response. Homology modelling of MAYV proteins has been applied to allow mapping of sites under pressure and to put forward structural hypotheses about possible consequences of mutations on virus protein recognition by host immune response.

2. Materials and methods

2.1. Sequence dataset

Nine different datasets were assembled, eight for each protein [C, E1, E2, E3, non-structural protein (NSP) 1, NSP2, NSP3, and NSP4] and one for the complete genome, including sequences dated from the year 1955 to the year 2015. All the datasets contained 33 sequences for each protein and the complete genome. All the sequences and complete genome were downloaded by the availability from the NCBI database (http://www.ncbi.nlm.nih.gov/) and all the different genes were analyzed separately. All sequences were aligned using MAFFT software v.7 and manually edited using Bioedit software[14],[15]. Modeltest software version 3.7[16] was used to choose the best evolutionary model.

2.2. Phylogenetic signal and maximum likelihood analysis

The phylogenetic signal has been evaluated analyzing randomly 10 000 groups of four sequences (quartets), by the likelihood mapping method using TREE-PUZZLE[17]. The dots inside the equilater triangles correspond to the likelihood of the possible trees. In the triangle, the corners represent the trees with fully resolved topologies, the epicenter area represents the star-like phylogeny, and the areas on the sides correspond to the network-like phylogeny (presence of recombination or conflicting phylogenetic signals). For substitution/saturation analysis assessment, the Xia’s test was used with the transitions/transversions ratio vs. divergence graph in DAMBE [http://dambe.bio.uottawa.ca/DAMBE/]. The percentage of constant sites and parsimony-info sites were estimated using MEGA7.

The maximum likelihood tree was inferred with the evolutionary model previously selected with ModelTest, by using Phyml[18]. The phylogenetic tree was confirmed with the bootstrap analysis (bootstrap values >75%) as reliability test for the branches.

2.3. Selective pressure analysis

Adaptive Evolution Server (http://www.datamonkey.org/) was used to predict the ratio ( ω ) of non-synonymous to synonymous substitution rates for each codon, with ω >0 indicating diversifying or positive selection[19]. The following selection analyses were conducted according to the best-fit substitution model, also determined within the Server: adaptive branch-site random effects likelihood [aBSREL][20] model for the detection of lineage-specific selection, fast unconstrained Bayesian approximation [FUBAR][21] for inferring site-specific pervasive selection, Bayesian unrestricted test for episodic diversifying selection [BUSTED][21] across the region of interest, and the mixed effects model of evolution [MEME] to identify episodic selection at individual sites[21]. Sites were considered to have been subjected to statistically significant positive or negative selection based on the following cut-offs: likelihood- ratio test (LRT) P≤0.05 for BUSTED and aBSREL and posterior probability (PP)] >0.90 for FUBAR, and LRT≤0.05 [with P≤0.01] for MEME. The reference sequence used to trace the amino acid position found under selection was Accession Number: KX496990.1.

2.4. Homology modelling and structural analysis

Homology modeling relied on the software Modeller version 9.17[22]. Template identification was carried out with the programs Blast[23], HHPred[24] or Phyre2[25]. Suitable templates were considered those structures sharing the highest sequence similarity to the target protein, solved at the best resolution available, and with maximum sequence coverage. Sequence alignments were calculated with Clustal Omega[26] and were displayed or edited with the Jalview editor[27]. Ten models of each target protein were built at the highest refinement level and that one showing the lowest value of the Modeller target function, which is indicative of model quality, was considered the representative model. ProsaII[28] and Procheck[29] were utilized for model validation, while protein structure visualization and analysis were carried out with the graphic software PyMOL[30] or Chimera[31]. B-cell epitopes were predicted with a set of publicly available servers[32] listed in [Table 1]. Proteinprotein interaction site prediction used the meta-PPISP server.

As a first step, the phylogenetic signal for all the genomic regions of the MAYV (C, E1, E2, E3, NSP1, NSP2, NSP3, NSP4, and complete genome) by likelihood mapping [Figure 1]A,[Figure 1]B,[Figure 1]C,[Figure 1]D,[Figure 1]E,[Figure 1]F,[Figure 1]G,[Figure 1]H,[Figure 1]I was tested. The percentage of likelihood points in the central area of the triangles was 12.9% for the first dataset (Capsid gene), 7.5% for E1 gene, 3.4% for E2 gene, 19.9% for E3 gene, 3.8% for NSP1 gene, 3.1% for NSP2 gene, 7.0% for NPS3 gene, 9.4% for NSP4 gene, and 1.2% for the main dataset, complete genome region. All the datasets showed sufficient phylogenetic signal (30%). All the genetic signal values and phylogenetic information of the MAYV datasets are shown in [Table 2]. The percentage of the Parsimony-Info sites ranged from 1.74% (E2 dataset)-19.65% (E1 dataset); instead the percentage of the constant sites ranged from 73.62% (E1 dataset)–81.72% (NSP1 dataset).

Figure 1: Likelihood mapping of all datasets.Each dot represents the likelihoods of the three possible unrooted trees for a set of four sequences (quartets) selected randomly from the dataset: dots close to the corners or the sides represent, respectively, tree-like, or network-like phylogenetic signal in the data. The central area of the likelihood map represents starlike signal. The percentage of dots in the central area is given at the basis of each map.

In the maximum likelihood tree, two main clades (A and B) were highlighted [Figure 2]. The clade A included six sequences, sampled from 1955 to 2015, of which five from Brazil and one from Haiti. The Brazilian sequence sampled in 2014 clusters with the Haitian sequence sampled in 2015 with high bootstrap value (100%), suggesting a very close phylogenetic relationship. The clade B included the remaining 27 sequences sampled in the Central and Southern America (Bolivia, Brazil, French Guiana, Peru, Trinidad and Tobago and Venezuela) from 1957 to 2013. Inside the clade B, there is a statistically supported clade (B1) where one Peruvian sequence sampled in 2010 can be considered outgroup of the clade B1. Inside the clade B1, there are several statistically supported clusters, grouping mostly according the geographic location.

Figure 2: Maximum likelihood tree of Mayaro virus.° along the branches indicating a statistical value from bootstrap (>70%). The legend for the location classification is in the left corner of the figure.

Genetic variability was determined by nucleotide sequencing of fragments ranging from 198 nt for the E3 protein to 2 394 nt for the NSP2 protein. The α parameter of the γ distribution for all the proteins analyzed was <1, demonstrating L-shape characteristic of the distribution and proposing that rate heterogeneity of nucleotide substitution is across sites. Selective pressure analyses performed considering all the analyzed genomic regions (C, E1, E2, E3, NSP1, NSP2, and NSP4) highlighted eight sites (107, 215, 243, 11, 231, 497, 251, and 64) under strong negative selection (by using FUBAR) indicated by ω <0, suggesting high degree conservation in the genomic region analyzed [Table 2]. Using BUSTED, it has been highlighted episodic diversifying selection only in E3 and NSP4. No branch-specific episodic diversifying selection was found using aBSREL. Few sites with episodic diversifying selection were found in E1, NSP1 and NSP3 with LRT P≤ 0.01. There was one on each gene: amino acidic position 300 in E1, amino acidic position 242 in NSP1 and amino acidic position 31 in NSP3 (using MEME).

3.3. Homology modeling and structure analysis

Homology model of the MAYV E1 protein (GenBank code ANY58848) was built using the templates represented by the E1 envelope proteins from Semliki Forest virus (SFV) (PDB code 2ALA, resolution 3.0 Å) and CHIKV (PDB code 3N44, 2.35 Å). The sequence alignment on which the modeling was based is displayed in [Figure 3]. MAYV E1 shares 76% and 60% sequence identity with SFV and CHIKV proteins, respectively. [Figure 4] reports the superposition between the best MAYV E1 model and the structures of the template proteins, and it also shows the location of sequence position 300 which was found to be under selective pressure. This site is located in the domain III of the E1 protein within a β -strand of the immunoglobulin-like fold, before a conserved Cys involved in a disulfide bond. A multiple sequence alignment of a set of 58 homologous E1 proteins from various alphaviruses shows that position 300 can hold Ser (with frequency 58%), Glu (18%), Lys (12%), Thr (5%), Leu (3%) or Ala (1%). Interestingly, MAYV E1 displays a two-residue deletion in correspondence of positions 344–345 of the SFV and CHIKV E1 proteins [Figure 3]. This deletion occurs within a β -strand of the β -barrel of the immunoglobulin fold and may induce a partial local conformational rearrangement of the MAYV E1 loop encompassed by the sequence positions 346–349 of domain III. Epitope prediction [Table 2] of protein E1 variants which possess Thr or Leu at position 300 was carried out using 11 methods listed in [Table 1][32]. Epitope prediction is quite inaccurate and for that reason a consensus approach is generally recommended[32]. Overall, although no clear consensus is evident, position 300 is predicted to be part of a region possessing potential epitope propensity. In fact, the 5th and 6th methods out of the 11 prediction methods employed indicate the occurrence of a potential epitope with Leu or Thr at position 300, respectively [Table 2]. Since the region bearing Thr is predicted as a potential epitope by six methods, it may be conceived that the occurrence of this residue can attribute a marginally higher epitope propensity. No suitable template was found for the protein NSP1 and for that reason no homology model could be built in our study. Modelling of the macrodomain of MAYV NSP3 protein (Genbank code ALJ56198.1) utilized the following templates: a fragment of the NSP from Sindbis virus (PDB code 4GUA, resolution 2.85 Å); macrodomain from CHIKV (3GPG, 1.65 Å) and macrodomain of venezuelan equine encephalitis virus in complex with ADP-ribose (3GQO, 2.60 Å). The complex between MAYV NSP3 and ADP-ribose was predicted following the geometry of the complex reported in 3GQO. MAYV NSP3 shares 52%, 66% and 56% sequence identity with 4GUA, 3GPG, and 3GQO, respectively. According to the homology model, sequence position 31 of the MAYV NSP3 is predicted at the mouth of the macrodomain binding site of the ADP-ribose [Figure 5] in proximity of residues (for example Asn24 and Val33) considered to be important for ADP-ribosylhydrolase activity in the homologous CHIKV NSP3 macrodomain[44]. Prediction of protein-protein interaction sites attributed position 31 a high interaction propensity [Figure 5].

Figure 4: Structural superposition among the MAYV E1 homology model (orange cartoon) and the homologous proteins from Semliki Forest virus (grey) and Chikungunya virus (green).Cyan subunit is the E2 protein from Chikungunya virus. Side chain of Leu at positions 300 of MAYV E1 is reported as red stick model. Disulfide bonds are displayed as yellow sticks. Red areas in the β -strands of the Semliki Forest and Chikungunya virus E1 proteins denote the two-residue insertions found with respect to the MAYV E1 sequence.

Figure 5: Prediction of structural protein position.[A] Structural superposition between the homology model of the MAYV NSP3 macrodomain (light cyan cartoon 0 and the template macrodomain from Venezuelan Equine Encephalitis virus [orange]. The ligand ADP-ribose is depicted by yellow sticks. Arrow indicates the position 31 occupied by an Asp residue. [B] Surface of the MAYV macrodomain colored according to the protein-protein interaction potential using a scale ranging from blu (low potential) to red (high potential). Arrow marks the position 31. ADP-ribose is displayed as in panel A.

MAYV has been reported in several tropical countries in the Central and South America, as Brazil[1],[45]. MAYV has the potentiality to become a global pathogen, being its transmission mediated by urban mosquitoes such as Aedes aegypti, as happened in the recent past for CHIKV[46],[47]. This analogy can also be supported by similar clinical symptoms. Indeed the illness, which can cause chills, malaise, headache, stomach pains and joint pains, is transmitted by that same vector Aedes aegypti mosquito which transmits ZIKV, DENV, yellow fever virus and CHIKV. As recently reported in CHIKV and ZIKV epidemics, MAYV outbreaks can pass undiagnostigsate during DENV epidemics[46],[48]. Indeed the clinical symptoms of Mayaro are so similar to Chikungunya and several other ailments that it is easy to be misdiagnosed. Moreover dual infections from these viruses can further complicate accurate diagnosis. MAYV is the one of the long list of recently emerging viral pathogens that has received global attention due to various outbreaks in tropical areas and its rapid spread in several countries. Maximum likelihood analysis confirmed the MAYV previous presence in Haiti and successful spread to areas such as the Caribbean and USA[49].

Selective pressure analysis performed in this study revealed a strong presence of negatively selected sites in all genomic regions analyzed, suggesting a probable exclusion of deadly polymorphisms in functionally genes. At the same time, this analysis highlighted a few sites with episodic diversifying selection in E1, NSP1 and NSP3 with LRT P≤0.01 located in the amino acidic position 300 in E1, amino acidic position 242 in NSP1 and amino acidic position 31 in NSP3.

The contemporary lack of presence of high percentage of positively selected sites also suggests and enforces the idea of highly adapted phenotypes. Probably, the alternation cycle between arthropod vector and human and no human hosts may impose a sort of barrier to non- synonymous mutations in important genes. Amino acid conservancy within MAYV sequences can be considered in combination with potential immune escape amino acid mutations. The homology modeling analysis showed localized high mutation frequency in the structural and non-structural proteins (NS3 and NS5) corresponding to a considerable alteration in the protein stability.

The position of the site 300 under positive selective pressure has been mapped onto the predicted three-dimensional structure of the MAYV E1 protein. In particular, model inspection indicates that this site is located in the domain III of the E1 protein within a β -strand of the immunoglobulin-like fold, before a conserved Cys involved in a disulfide bond[50],[51]. In the E1 envelope protein of Sindbis virus, belonging to the Alphavirus genus, the position equivalent to the MAYV E1 300 was proven to be part of a set of transitional epitopes that become accessible only after virus particle exposures to agents, such as heat, low pH and the like[50],[51]. Change in epitope accessibility was observed also upon virus particle exposure to susceptible cells, which suggests that a conformational modification of the envelope proteins takes place during cell binding. Moreover, the structure site of the SFV E1 protein equivalent to the MAYV Leu 300 is involved in E1-E1 contact within the virus protein shell[52]. The high similarity between the MAYV E1 and the homologous protein sequences from other alphaviruses suggests that the position 300 may have properties similar to those attributed in the other viruses. Moreover, within the limits of intrinsic inaccuracy, B-cell epitope prediction suggests that the region surrounding the position 300 might have an immunogenic potential. Interestingly, Thr at position 300 apparently may attribute a higher epitope potential than the presence of Leu at the same position. A multiple sequence alignment of a set of 58 homologous E1 proteins from various alphaviruses shows that position 300 can in other viruses hold other residues mostly polar such as Ser (with frequency 58%). Furthermore, MAYV E1 displays a two-residue deletion in correspondence of positions 344–345 of the SFV and CHIKV E1 proteins. This deletion occurs within a β -strand of the β -barrel of the immunoglobulin fold and may induce a partial local conformational rearrangement of the MAYV E1 loop encompassed by the sequence positions 346–349 of domain III. All these observations hint at the potential existence of an ongoing fine process of MAYV E1 protein adaptation to cope with host immunological surveillance.

MAYV protein NSP3 contains a N-terminal domain homologous to the highly conserved macrodomain[53], which, in CHIKV, has been demonstrated to possess ADP-ribosylhydrolase activity essential for virus replication and virulence[44]. According to the homology model, the position 31 found to be under selective pressure is located in the edge of the ADP-ribose binding site and is predicted to possess a high potential of protein-protein interaction. It may be conceived that this position is involved in the recognition of the intracellular targets of the MAYV macrodomain and consequently is subjected to functional constraints. Currently, no licensed vaccines are available for MF, and the only control strategy is based on human exposure to potential vectors reduction. Two attempts of vaccine are described in literature, one in human diploid cells and the other in murine models[54]. A protective vaccine could decrease the potential risk of MAYV urbanization through adaptat–ve mutation that enhance the vectorial ability of transmission, as happened with CHIKV[55] and could be of great impact on public health.

MAYV has many undiscovered features of concern, and since it was first isolated in Trinidad in 1954, it has been linked with limited epidemics in Southern America[56] near the Amazon rainforest. MAYV infection syndrome includes a group of symptoms such as arthralgias, fever, headache, myalgias, rash, and sporadically nausea and vomiting[57]. MAYV infections are underdiagnosed because of a possible misleading with other mosquito-borne virus infections, especially dengue fever, endemic in common areas. Recently, CHIKV has added disorientation in the diagnosis, especially due to the prolonged arthralgia connected with CHIKV and MAYV cases[56]. Our findings confirmed that MAYV is spreading in Brazil since its isolation in the 1950s[49]. A serious concern arises from the easy interaction and proclivity for MAYV/DENV co-infections. That phenomenon should be the subject of careful studies and include other arboviruses from the same geographic regions to clarify the nature of the MAYV epidemic or other similar pathogens. In addition, in light of these considerations, conducting molecular and evolutionary studies are important to reveal the potential abilities of this virus to trigger a significant epidemic, and help to elucidate the MAYV way of transmission by Aedes and Haemagogus spp. mosquitoes.