How does biological novelty come about? This was the question I kept pondering while attending the Santa Fe Institute’s Summer School of Complex Systems in New Mexico back in 2019. In the last year of my PhD and traveling from Scandinavia, I arrived immensely jet-lagged and loaded with questions about protein evolution. Prior molecular research on evolutionary novelty has focused on nucleotide sequence features, and that is what I knew best. This approach has had great success in mapping processes that enable protein evolution, e.g. putative de novo gene emergence and extended open reading frames (see, for example, the community blog posts 1, 2, 3). However, no individual feature exists in a vacuum; each exists in a context, in a network. In Santa Fe I met Brennan, who introduced me to protein research that applied network science. We managed to recruit other evolutionary biologists (Mackenzie, Ashley) and other scientists (Keith, Anshuman, Ludvig, Laura) to discussions surrounding network science, protein evolution, and the emergence of novelty. An interdisciplinary project emerged, in which we conducted research across 7 time zones, 5 nations, and recurring lockdowns (and without any professors).
In our paper in Communications Biology (DOI: 10.1038/s42003-021-02867-8), we study the emergence of novelty at the systems level rather than in individual sequence contexts. A novel protein can be understood and defined either as a fresh de novo sequence arising from previously non-coding DNA, or as an isoform, i.e. a variant similar to a protein that already exists in the system. A growing body of research suggests that spurious and stochastic expression of both RNA and proteins is frequent. In fact, nature does not appear to run short of novel input, whether inferred as spurious expression or as mutations, and much of it appears to be (near) neutral. It would appear that the evolutionary challenge is not to ‘create’ or ‘invent’ novelty, but to retain it. Our question was, therefore: once a novel sequence has emerged, how and in what context is it retained in a biological system? To address this question, we look at protein-protein interaction (PPI) networks.
One of the primary molecular and evolutionary contexts a protein has to handle is its set of immediate interaction partners. Protein interactions yield complex networks that allow us to address the possibility that topological network features influence the evolution of biological systems. In our work, we consider novelty to be the introduction of a new protein (a new “node”) in a PPI network, and we assume that this novel node is not harmful but neutral. This allowed us to conduct a series of computational experiments looking at the structural changes that take place in PPI networks after the introduction of these novel nodes. In particular, we quantified the change in a network’s resilience after novel proteins have been incorporated into the network.
Network resilience (similar to the notions of biological redundancy and robustness) is defined here as tolerance towards perturbations. A resilient network can tolerate multiple perturbations, e.g. mutations that cause some loss of interactions, because high network redundancy means that plenty of connections remain despite a decline in network connectedness. Previous work (4) has quantified PPI network resilience by removing nodes and connections and calculating the altered network resilience using a measure based on the Shannon index (see Figure 1). We built on this work by quantifying network resilience after adding nodes, which simulates how PPI networks may incorporate novel nodes (see Figure 1). We assume that novel proteins can be integrated into existing PPI networks if they do not cause the network to become disconnected, and instead add to the network resilience. We call this the prospective resilience, and we study it using three different node addition mechanisms.
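To make the idea of a Shannon-based resilience measure concrete, here is a minimal sketch in Python. It is not the code from the paper: the function names, the choice to treat a failed protein as an isolated node, the normalization by the logarithm of the network size, and the trapezoidal integration over failure rates are all illustrative assumptions.

```python
import math
import random


def components(nodes, edges):
    """Return the sizes of the connected components among `nodes`."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sizes


def fragmentation_entropy(sizes, n_total):
    """Shannon entropy of component sizes, divided by log(n_total).

    0 means one connected component; values near 1 mean all nodes are isolated.
    """
    if n_total < 2:
        return 0.0
    h = -sum((s / n_total) * math.log(s / n_total) for s in sizes)
    return h / math.log(n_total)


def resilience(nodes, edges, steps=10, trials=20, seed=0):
    """One minus the area under the entropy curve as the failure rate goes 0 -> 1.

    Assumption: a failed node loses all its edges but stays in the network
    as an isolated component.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    n = len(nodes)
    rates = [i / steps for i in range(steps + 1)]
    mean_h = []
    for rate in rates:
        total = 0.0
        for _ in range(trials):
            failed = set(rng.sample(nodes, round(rate * n)))
            kept = [(u, v) for u, v in edges
                    if u not in failed and v not in failed]
            total += fragmentation_entropy(components(nodes, kept), n)
        mean_h.append(total / trials)
    # Trapezoidal rule over the failure-rate grid.
    area = sum((mean_h[i] + mean_h[i + 1]) / 2 * (rates[i + 1] - rates[i])
               for i in range(steps))
    return 1.0 - area
```

Under these assumptions, a densely connected toy network (every protein interacting with every other) scores higher than a chain of the same size, because the chain fragments as soon as an interior node fails. Prospective resilience would then be the same quantity computed after novel nodes have been added.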
Gene expression attachment preferentially links novel nodes to proteins annotated as highly expressed (i.e. the higher a protein’s gene expression, the more likely it is to attach to a novel protein). Since gene expression influences the concentrations of proteins, and concentrations of proteins influence the likelihood of proteins coming across one another in the cell, we study how gene expression distributions may influence the PPI network topology and the effect this has on resilience. We compare this to two null models of attachment: random attachment, where novel proteins have an equal probability of forming interactions with every other protein; and degree-based attachment, where the probability of a novel protein forming an interaction with another protein is proportional to the number of interactions (also called the number of links) that protein already has.
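The three attachment mechanisms differ only in how existing proteins are weighted when a novel protein picks its partners. The sketch below illustrates this with a hypothetical helper, `choose_partners`; the name, the `n_links` parameter, and the rule of drawing distinct partners one at a time are assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import Counter


def choose_partners(edges, expression, mode, n_links=2, rng=random):
    """Pick interaction partners for one novel protein.

    edges:      list of (u, v) interactions in the existing network
    expression: dict mapping each existing protein to its expression level
    mode:       "random", "degree" or "expression"
    """
    proteins = sorted(expression)
    if mode == "random":
        # Null model 1: every protein is equally likely.
        weights = [1.0] * len(proteins)
    elif mode == "degree":
        # Null model 2: weight is the number of existing interactions.
        degree = Counter()
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        weights = [degree[p] for p in proteins]
    elif mode == "expression":
        # Weight is the annotated gene expression level.
        weights = [expression[p] for p in proteins]
    else:
        raise ValueError(f"unknown attachment mode: {mode!r}")

    # Never ask for more distinct partners than have a non-zero weight.
    target = min(n_links, sum(w > 0 for w in weights))
    partners = set()
    while len(partners) < target:
        partners.add(rng.choices(proteins, weights=weights, k=1)[0])
    return sorted(partners)
```

A caller would append an edge from the novel protein to each returned partner and then recompute resilience on the enlarged network. Keeping the weighting separate from the sampling makes it easy to compare the three mechanisms on the same starting network.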
We find that gene expression-based attachment creates a more resilient network than either random or degree-based attachment. Additionally, we observe that network connectedness (or degree) correlates positively with gene expression to some extent. In practical terms, these results imply that novel proteins can be integrated if they i) interact with many existing proteins, or ii) primarily interact with proteins that are more abundant (as inferred from gene expression). We then wanted to understand the effect of spurious biological noise on this resilience. To simulate noise, we assigned randomized gene expression values to a portion of the nodes in the network. We then reran preferential attachment by gene expression and computed network resilience. We found that resilience increased even further after we introduced noise. This tells us that adding biological noise enhances the incorporation of novel proteins into the PPI network. By contrast, if one kept adding nodes by expression-based attachment without noise, the network’s connectivity would come to resemble that generated by degree-based attachment, which we had already shown to be less effective at retaining network resilience than gene expression-based attachment. So, noise enables changes to connectivity across the network that enhance its resilience. This can be understood from the fact that noise (shuffling gene expression values) increases the chance that previously poorly connected clusters of nodes receive novel connections.
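The noise step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used in the paper: the function name `add_expression_noise`, the `fraction` parameter, and the choice to randomize by shuffling values among the selected proteins (so that the overall expression distribution is preserved) are all hypothetical.

```python
import random


def add_expression_noise(expression, fraction, rng=random):
    """Return a copy of `expression` with noise applied to a subset of proteins.

    A random `fraction` of the proteins is selected, and the expression
    values of those proteins are shuffled among them. The input dict is
    left unchanged.
    """
    noisy = dict(expression)
    proteins = sorted(noisy)
    chosen = rng.sample(proteins, round(fraction * len(proteins)))
    values = [noisy[p] for p in chosen]
    rng.shuffle(values)
    for protein, value in zip(chosen, values):
        noisy[protein] = value
    return noisy
```

Feeding the noisy expression values into an expression-weighted attachment step gives lowly expressed, poorly connected proteins an occasional chance to receive novel links, which is the intuition described above.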
In sum, we suggest that biological noise induces novel structures in the PPI network that make the network more resilient. Whether the resilience of PPI networks is a feature selected for per se, rather than a consequence of noise, is unclear. Resilience is not the only factor to consider in PPI network evolution, but it does offer an informative approach to understanding how various PPI networks may tolerate perturbations. As we discuss in our paper, new nodes added to the network may initially be neutral but over time contribute to the network’s features and complexity. We have offered here a mathematically plausible scenario for how biological novelty emerges and may be retained, which may be complemented by considering the further empirical data sources outlined in the paper. Since the project began, Laura has started her PhD in Oxford, Ashley has become a PI in San Antonio, Brennan and April have become doctors and postdocs, and Keith has left Scotland for England, among other unmentioned things. The future will tell how resilient we all are to its perturbations.
Note: Dr. Keith Malcom Smith also contributed to this blog post.
1. Mar Alba. New proteins on the test track, Mar 19, 2018, http://go.nature.com/2IhMenM
2. Mar Alba, William Blevins. Identifying recently evolved genes in yeast, https://go.nature.com/36J5Saf
3. Andreas Lange. New proteins from nowhere -- how evolution shapes the structure and function of a newly emerged protein in flies, https://go.nature.com/3cJl7Cf
4. Marinka Zitnik, Rok Sosič, Marcus W. Feldman, Jure Leskovec. Evolution of resilience in protein interactomes across the tree of life. Proceedings of the National Academy of Sciences, Mar 2019, 116 (10) 4426-4433; DOI: 10.1073/pnas.1818013116