An estimated 1.67 million viruses exist on Earth, yet we know of only 4,000. Recognizing viruses in animals that live in close contact with people can help avert future pandemics, provided detection occurs before widespread human-to-human transmission. Our project pioneered virus discovery efforts across the globe and identified more than 900 novel viruses in wildlife species. These viruses were found in wildlife species that are routinely seen in close association with humans, which allows for possible transmission of these viruses to humans. Hence, it is crucial to understand the risk these viruses pose to human health by understanding their ecological and evolutionary propensity to infect humans.
Ecological approach: getting clues from well-studied viruses and their hosts
Host-pathogen networks provide significant insights into the ecology of viruses and their hosts. Clusters within these networks reveal the host predilection of pathogens (viruses in our case). The number of connections and the connectedness of pathogens in a network can help us identify pathogens that are likely to have wider host breadth and hence a higher risk of infecting humans. We compiled a database from online repositories and published literature to develop a network of 576 well-recognized zoonotic and non-zoonotic animal viruses. In this network, each virus was represented as a node (orange circle in Figure 1), and two viruses were connected to each other if they shared the same host.
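To make the network construction concrete, the following is a minimal sketch of how a unipartite virus network could be derived from virus-host records. The function name `build_virus_network` and the toy records are hypothetical and chosen only for illustration; they are not the project's database or code.

```python
from collections import defaultdict
from itertools import combinations


def build_virus_network(virus_hosts):
    """Return an adjacency map: virus -> set of viruses sharing a host with it."""
    # Invert the records so each host lists the viruses detected in it.
    host_to_viruses = defaultdict(set)
    for virus, hosts in virus_hosts.items():
        for host in hosts:
            host_to_viruses[host].add(virus)

    # Every virus is a node, even if it ends up with no connections.
    adjacency = {virus: set() for virus in virus_hosts}

    # Connect each pair of viruses found in the same host.
    for viruses in host_to_viruses.values():
        for a, b in combinations(sorted(viruses), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy records (not real data): virus -> set of host species.
toy_records = {
    "virus_A": {"bat_sp1", "human"},
    "virus_B": {"bat_sp1"},
    "virus_C": {"rodent_sp1", "human"},
    "virus_D": {"bird_sp1"},
}

network = build_virus_network(toy_records)
for virus in sorted(network):
    print(virus, sorted(network[virus]))
```

In this toy example, virus_A is linked to virus_B through a shared bat host and to virus_C through humans, while virus_D shares no host and remains an isolated node.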
We employed machine learning models that use network characteristics to predict the connections between these viruses, i.e. the nodes of the network, and to predict which taxonomic groups of hosts, including humans, two connected viruses will share. The predictive ability of these models gave us insights into key quantifiable metrics that we can estimate for novel viruses to understand their zoonotic risk and host predilections.
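As a rough illustration of predicting connections from network characteristics, the sketch below scores unlinked virus pairs with the Jaccard overlap of their neighbor sets, a common link-prediction baseline. This is a simple stand-in heuristic, not the machine learning models used in the study, and it does not cover the prediction of shared taxonomic groups. The function names and the toy network are hypothetical.

```python
def jaccard_score(adjacency, u, v):
    """Share of neighbors that u and v have in common (0.0 if both have none)."""
    union = adjacency[u] | adjacency[v]
    if not union:
        return 0.0
    return len(adjacency[u] & adjacency[v]) / len(union)


def rank_candidate_links(adjacency):
    """Score every currently unlinked pair and sort from most to least likely."""
    nodes = sorted(adjacency)
    scored = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adjacency[u]:
                scored.append((jaccard_score(adjacency, u, v), u, v))
    # Highest score first; ties are broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored


# Hypothetical toy network (not real data): virus -> viruses sharing a host.
toy_network = {
    "virus_A": {"virus_B", "virus_C"},
    "virus_B": {"virus_A"},
    "virus_C": {"virus_A"},
    "virus_D": set(),
}

ranked = rank_candidate_links(toy_network)
for score, u, v in ranked:
    print(f"{u} - {v}: {score:.2f}")
```

Here virus_B and virus_C receive the top score because both are already connected to virus_A. A trained model would combine many such network features rather than rely on a single one.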
Figure 1: The figure shows the modeling procedure and methods implemented in the study. Orange dots represent known viruses in the observed (Gc) and predicted (Gpredicted) networks, and blue dots represent novel viruses in the predicted network (Gpredicted). Virus-host networks: Gc represents a unipartite observed network of known zoonotic and non-zoonotic viruses, with nodes representing viruses and edges representing shared hosts. Gpredicted represents the predicted unipartite network generated after predicting possible linkages between 531 novel viruses (blue) and known viruses. The node size is proportional to the betweenness centrality.
How novel viruses behave in the host-virus network and clues for understanding zoonotic risk:
Models generated predictions for 531 novel viruses, allowing us to estimate how well connected these viruses will be in the network, representing their host breadth and possible risk of zoonotic transmission.
Novel viruses are estimated to be more host-specific than well-recognized viruses. When it comes to their host predilections, our models predicted that some novel viruses will form clusters by infecting similar types of host species. This information is crucial for guiding further surveillance of novel viruses, especially for understanding their ecology and transmission in wildlife. For example, let's look at model host predictions for novel coronaviruses that are found in bats, as shown in the figure below. The first cluster of novel coronaviruses in bats had a higher proportion of predicted species from the Miniopteridae family (Bent-winged bats) but none from Natalidae (Neotropical funnel-eared bats).
Figure 2: Surveillance targets for novel coronaviruses detected in bats based on predicted sharing of hosts with known viruses. The red color represents the cumulative probability with darker red color indicating a higher number of species occurrences from the taxonomical family.
Based on the predicted behavior in the network, we estimated that novel coronaviruses will have greater host plasticity than novel viruses from other families, suggesting that they are more likely to be found in multiple animal species than other viruses. This is based on the higher predicted network degree (number of connections) and betweenness centrality (how often a node lies on the shortest paths between other nodes) for those viruses.
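For readers unfamiliar with the two metrics, the sketch below computes degree and unnormalized betweenness centrality for a small undirected, unweighted network, using Brandes' algorithm for the latter. The toy network and function names are hypothetical; the study's own software and any normalization it applied are not described here.

```python
from collections import deque


def degree(adjacency):
    """Number of connections for each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths (sigma).
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        sigma = {node: 0 for node in adjacency}
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each undirected pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical toy network (not real data): virus_A bridges virus_B and virus_C.
toy_graph = {
    "virus_A": {"virus_B", "virus_C"},
    "virus_B": {"virus_A"},
    "virus_C": {"virus_A"},
    "virus_D": set(),
}

degrees = degree(toy_graph)
betweenness = betweenness_centrality(toy_graph)
for virus in sorted(toy_graph):
    print(virus, degrees[virus], betweenness[virus])
```

In this toy network virus_A has the highest degree and is the only node lying on a shortest path between two others, so it alone has non-zero betweenness.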
Prioritizing novel viruses for further characterization:
Using model predictions, we developed a prioritization score to understand the zoonotic risk of novel viruses. The prioritization score provides a data-driven tool to quantify the ecological and evolutionary trajectory of novel viruses towards zoonotic transmission, with higher scores indicating greater risk. Coronaviruses with high prioritization scores were detected in various bat species from the Phyllostomidae, Hipposideridae, Vespertilionidae, and Pteropodidae families. More surveillance efforts are needed for bat species within these families that are found in South America and Southeast Asia. PREDICT_CoV-78, which was detected in bats and rodents in Southeast Asia, also showed a high prioritization score. This is a rare detection of a novel virus shared across different taxonomic orders.
Previous studies have tried to understand the risk based on expert opinion. Our approach develops a more agnostic and data-driven metric to understand the risk of zoonotic spillover and to prioritize viruses accordingly for further in-vitro and in-vivo characterization. The predictive ability of these models will improve significantly with additional data and the inclusion of other predictive features, including molecular characteristics. Given that our findings provide further evidence for the relationship between higher host plasticity and greater zoonotic potential, key future research directions would involve additional surveillance across a broad taxonomic range to gain insights into newly detected viruses and further inform on spillover risk. The prioritization of novel viruses for further characterization, together with the increased availability of ecological trait and genomic data resulting from these broad surveillance efforts, will improve model predictions. As virus discovery surveillance programs explore the virome, we will soon be inundated with virus data. Understanding the risk from all these new findings will be crucial to ensure that future pandemics are prevented. Tools and approaches like these will pave the way to streamline our understanding of the risks these viruses pose to human and wildlife health.