Ebola virus is evolving but not changing: no evidence for functional change in EBOV from 1976 to the 2014 outbreak

The Ebola epidemic is having a devastating impact in West Africa. Sequencing of Ebola viruses from infected individuals has revealed extensive genetic variation, leading to speculation that the virus may be adapting to the human host and accounting for the scale of the 2014 outbreak. We show that so far there is no evidence for adaptation of EBOV to humans. We analyze the putatively functional changes associated with the current and previous Ebola outbreaks, and find no significant molecular changes. Observed amino acid replacements have minimal effect on protein structure, being neither stabilizing nor destabilizing. Replacements are not found in regions of the proteins associated with known functions and tend to occur in disordered regions. This observation indicates that the difference between the current and previous outbreaks is not due to the observed evolutionary change of the virus. Instead, epidemiological factors must be responsible for the unprecedented spread of EBOV.


Intoduction
The current Ebola epidemic in West Africa is characterized by an unprecedented number of infections, and has resulted in over 8,000 fatalities to date. A recent study using whole-genome sequencing methods 1 identified high levels of variation in the Ebola virus (EBOV), much of it unique to the 2014 outbreak. Specifically, 341 substitutions were found of which 35 are nonsynonymous, i.e., amino acid altering. These changes, coupled with those that occurred prior to the current outbreak, indicate that the virus is evolving rapidly within humans. This presence of the non-synonymous substitutions in the EBOV genomes has led to concern that functional adaptation of the virus to the human host has already occurred 1,2 , accounting for the unusual scale and severity of the current outbreak.
According to the neutral theory of molecular evolution, mutations may be subject to positive selection if they are beneficial, purifying selection if they are deleterious, or may evolve neutrally if the functional effect is neutral or nearly-neutral 3,4 . Gire et al suggest that the amino acid replacements observed in the 2014-15 EBOV population are due to incomplete purifying selection 1 . If this is the case, we would expect replacements to be deleterious and their presence in the population to be due to selection having insufficient time to remove them. By contrast, positive selection may arise if any replacement increases viral fitness. This may be in the form of replacements that stabilize the protein structure, changes that have a beneficial effect on molecular function, or that permit the virus to escape the immune system. Neutral evolution would occur if replacements have minimal effect on fitness. We expect such replacements would be found predominantly in regions of the proteins that are not associated with defined functions, (i.e., away from active sites and interaction sites with other molecules), have minimal effect on protein structure, either positive or negative, and not be in sites subject to selection from the immune system. Neutral evolution is the null hypothesis, and should be assumed in the absence of evidence for either positive or purifying selection.
We use the available sequence data to investigate whether there are deleterious, functional or adaptive change in the EBOV genome. EBOV is a negative stranded RNA virus with a genome comprising seven genes. One of these genes (GP) gives rise to two protein products via transcriptional editing 5 : a membrane-bound glycoprotein and a secreted protein. Additional functional diversity arises from vp40 existing in a number of conformational forms, which have different functions 6 . Protein structural data are available for at least part of six of the seven EBOV proteins, including three conformational forms for vp40 6-12 . These data allow us to investigate the structural and functional effects of amino-acid replacements in the virus.

Results
We first investigated whether there is evidence for positive selection acting in any of the seven EBOV genes, using all of the available data (1976 to present). We calculated the ratio between non-synonymous (dN) changes and synonymous changes (dS, non-amino acid altering). A dN/dS ratio close to 1 indicates lack of selection, <1 purifying selection, and >1 that positive selection (adaptive evolution) has taken place. For all seven genes we find few sites with evidence for positive selection (figure 1 and Suppl Table 1). However, except for one case, sites where dN/dS>1 have very low estimates of dS, suggesting that dN/dS ratios are unreliable in this context. For the exception, codon 430 in GP, the single-likelihood ancestor counting (SLAC) model estimates dN/dS to be 4.4. This replacement is not specific to the 2014 outbreak and in a region of the protein predicted to be intrinsically disordered. Disordered regions are known to be permissive for residue changes because they do not have well-defined structures and so are relatively unconstrained 13 . There are also sites with dN but no dS values such that the dN/dS ratio cannot be calculated. As the dN/dS ratio is a relatively conservative method for detecting functional change we computationally characterized all individual amino acid replacements.
For the amino acid replacements that fall within a region of known protein structure (figure 2) we assessed the likely functional effect of the substitution. To do this we used phylogenetic methods to reconstruct the most likely sequence of the most recent common ancestor and so the likely evolutionary trajectory. Firstly, we assess the goodness-of-fit of side chains in replacements in all proteins 14 . We find that all residue changes can be accommodated in the relevant protein structure in a low energy conformation ("rotamer") with no substantial van der Waals overlaps (figure 3A). Secondly we used an empirical potential to predict the likely structural effect of a residue change 15 . For all replacements that can be assessed the ΔΔG of folding associated with the amino-acid replacement has a small magnitude compared to the background distribution (figure 3B) indicating that the likely effect on protein structure is minimal. We conclude that all replacements are compatible with their protein structure, and are unlikely to either increase or decrease protein stability.
For the GP, VP30 and VP40 proteins, the available protein structures allow identification of protein-protein interaction interfaces, i.e., those regions of amino acid sequence responsible for virus to virus and virus to host binding. In total 247 interface residues were identified (from all three proteins, including multiple VP40 conformation forms 6 ). Overwhelmingly, replacements in EBOV sequences, whether from the 2014 or earlier outbreaks, are found at sites that do not comprise interaction interfaces. The single exceptions is a replacement of aspartate 47 in GP1 to glutamate, which is not specific to the 2014 outbreak 1 . Aspartate 47 makes an intra-chain salt bridge with lysine 588 in GP2. Structural modeling of this aspartate to glutamate replacement in GP1 indicates that there is a low energy conformation for glutamate that is able to make the same interaction with no van der Waals overlaps, suggesting this is a relatively minor change (figure 3C). Note, the crystallographic temperature factors are high for this part of the protein structure (many are >100), indicating that the exact positioning of atoms is uncertain. Overall, we conclude that there is no evidence the identified protein interaction interfaces are being disrupted or otherwise altered by the observed amino acid replacements.
As structures are not available for all regions we next investigated the similarity of properties associated with the replacement residues. We find all are conservative for both hydropathy and volume, indicated by the small magnitude of the changes relative to the background distribution (see figure 3 supplement). Larger magnitude alterations are predominantly found in regions of the proteins that are predicted to be intrinsically disordered which are permissive for residue changes. These changes are therefore unlikely to affect structure or function.
Interestingly, half of the amino-acid replacements (88/177) are in regions of proteins predicted to be intrinsically disordered, despite these regions constituting only 27% of the protein sequence (figure 2). In the GP protein the central disordered section corresponds to a mucin-like protein which is highly glycosylated 5 . Within disordered sequences such as mucin the general character of the amino acid residue is important for function, but, glycosylation sites aside, specific interactions are not made, allowing a range of different amino acids to contribute to the same functional role 16 . Disordered regions have been implicated in the formation of new protein interactions 17 and may permit viruses to explore novel host perturbations via "sticky" interactions with host proteins 13 , although predicted disorder is similar for all EBOV outbreaks (figure 4).
A number of experimental studies have identified specific residues that are important for various functions. These include 18 residues in GP that are important for viral entry [18][19][20] , four residues in VP30 that are required for nucleocapsid incorporation 8 , 23 residues in VP24 that are implicated in a range of protein-protein interactions 11 , four residues in VP40 that make direct contact with RNA 7 and a further two residues that are essential for budding 6 , and three residues in VP35 that are required for binding RNA 10 . None of these residues are replaced in any of the EBOV lineages sequenced to date, indicating that none of these functions are likely to have been altered, either positively or negatively.

Discussion
Collectively, the structural and functional data indicate that the observed amino acid replacements are not found in regions of the protein that directly contribute to known functions, and, with the exception of GP aspartate 47, are not in any identified binding interface. None are likely to either stabilize or destabilize the protein structure. The non-synonymous changes are physicochemically conservative, and predominantly cluster in regions predicted to be intrinsically disordered and comparing predictions of these disordered regions for the different EBOV outbreaks indicates this is not changing substantially.
With the exception of a small number of low-frequency frame-shifting intrahost single nucleotide variants, we find no evidence for deleterious mutations that would have been consistent with incomplete purifying selection, as postulated by Gire et al. 1 . We find no evidence of changes that are likely to be adaptive. This result is corroborated by analysis from a phylogenetic perspective 21 that similarly shows no evidence of positive selection in the 2014 outbreak . In addition we find no evidence of adaptive change in any of the EBOV sequences from past outbreaks. Although it is not possible to rule out functional change in the intergenic regions, we conclude that none of the non-synonymous substitutions observed to date are likely to affect protein structure or function in any way, be it positive or negative. Thus the null hypothesis of neutral evolution cannot be rejected, and is the most reasonable explanation of the observed sequence diversity.
The lack of functional distinction between the current and previous Ebola outbreaks emphasizes the importance of human-centric epidemiological factors over the molecular biology and evolution of the virus. The main factor that differs between the 2014 outbreak and those that have occurred previously is the establishment of infections in relatively densely populated areas compared with previous outbreaks, coupled with poor health facilities 22 . Human population growth and progressive urbanization have created efficient pathways for viral transmission 23 despite a comparably low basic reproductive number, R 0 22 .
Reconstructing the likely genomes of the most recent common ancestor of each outbreak demonstrates the high degree of similarity of the progenitor virus of each outbreak. This similarity points to a stable viral population in the animal reservoir, which is most likely to be fruit bats 23 . The zoonotic nature of EBOV and functional stability of the virus suggest that future transmission of similarly virulent potential are highly likely. Given the dense and highly connected nature of the human population, identification of the animal reservoir, surveillance and early intervention will be the key to prevention. The relative functional stability of EBOV suggests that intervention strategies such as with drugs or vaccinations may be more successful than for other RNA viruses.
Strategies for containing EBOV in West Africa 24 have been suggested, but are predicated on lack of adaptation of the virus. While evolution of a change in mode of transmission of EBOV is extremely unlikely, due to the highly specific and intricate nature by which viruses interact with their hosts. Of more concern is the possibility that the R 0 , the number of transmissions per infection, increases. Due to the unusually deadly nature of EBOV, a milder virus with even a substantial drop in virulence would still frequently result in deadly infections on an epidemic, let alone pandemic, scale. Despite observing no functional change between 1976 and mid-2014, future functional change is possible, emphasizing the need for continued monitoring of viral evolution.
The sequences were aligned and phylogenetic trees were estimated using the WAG substitution model 25 , implemented in RAXML 26 . Ancestral sequences of EBOV proteins was reconstructed using maximum likelihood, implemented by FastML 27 . Using ancestral reconstruction, the evolutionary pathway for every EBOV sequence in our data set was traced to the last common ancestor, and the sequence of every internal node was compared with that of its ancestor.
The energy change for all amino acid replacements that fall within regions of known structure was predicted using an empirical force field as implemented in FoldX software (version 3 Beta 6). Mutant structures were generated using the "build model" function in FoldX by mutating the native residue in the wild type with the 19 other possible amino acid residues for each position where a non-synonymous mutation was observed. Where the sequence of the native protein differed from that of the crystal structure, the structure was predicted using Modeller 28 .
Goodness-of-fit of replacement residues in the context of the protein structure was calculated using Probe 14 after addition of hydrogen atoms with Reduce 29 . Probe was also used to identify residues in interaction interfaces. In each case all low energy conformations ("rotamers") 30 were assessed, and the rotamer with the best Probe score used. For replacement valine 325 to isoleucine in VP35, we use the conformation of χ1=-45° χ2=-50°, which is not the bottom of the energy well, but is nevertheless a favorable side chain conformation.
The SLAC method as implemented by the Datamonkey webserver 31 (http://www.datamonkey.org) were used to estimate dN (nonsynonymous) and dS (synonymous) rates for each protein. This method estimates the dN and dS rates for each codon site and compare the observed rates with null expectations based on the used nucleotide substitution model.
Residue volumes were taken from the Zamyatnin 32 and hydropathy scales from Kyte and Doolittle, 33 . Disordered regions of proteins were predicted by DISOPRED 34