What is the strength of selection on gene expression?
Selection is a tricky thing. While Charles Darwin enshrined natural selection as the dominant force behind evolution, it was difficult to study. This led to confusion in the field, as many phenotypes were looked at through the prism of selection and adaptation, and yet there were few easy and rigorous ways to measure selection or even show it was really there.
In 1983, Russ Lande and Stevan Arnold came out with a technique – phenotypic selection analysis – that provided a nice way to measure selection. Their method was a multiple regression of relative fitness on phenotypes, and the partial regression slope for each trait gave the strength of direct selection on that trait.
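For readers who like to see the mechanics, here is a minimal sketch of that kind of regression in Python, using only the standard library. It is an illustration under stated assumptions, not the analysis code from any particular study: the function names, the two traits and all of the numbers are made up. Fitness is divided by its mean to give relative fitness, each trait is standardized to mean 0 and standard deviation 1, and the fitted slopes are the selection gradients.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale a trait to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def selection_gradients(traits, fitness):
    """Regress relative fitness on standardized traits.

    traits  -- list of trait columns, one list of values per trait
    fitness -- absolute fitness (e.g. seed count) per individual
    Returns one partial regression slope (selection gradient) per trait.
    """
    w_bar = mean(fitness)
    w = [f / w_bar for f in fitness]  # relative fitness, mean 1
    cols = [[1.0] * len(w)] + [standardize(t) for t in traits]
    # Normal equations (X'X) b = X'w for ordinary least squares.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xtw = [sum(p * q for p, q in zip(ci, w)) for ci in cols]
    return solve(xtx, xtw)[1:]  # drop the intercept


# Hypothetical data for six plants (illustrative values only).
flowering_days = [60, 65, 70, 75, 80, 85]
leaf_area = [30, 34, 29, 36, 31, 35]
seeds = [110, 104, 89, 86, 71, 65]

beta = selection_gradients([flowering_days, leaf_area], seeds)
print("selection gradients:", beta)
```

In this toy data set, later flowering goes with fewer seeds and larger leaves with more, so the first gradient comes out negative and the second positive. Because the traits are standardized, the gradients can be compared across traits measured in different units.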
Figure 1. A drone shot of the experiment in the rice fields of the International Rice Research Institute.
This approach proved popular. Thousands of studies were undertaken, and selection was measured across whatever traits and organisms evolutionary ecologists could get their hands on. Most selection on traits was found to be weak, but the distribution had a long tail, with a few traits exhibiting strong selection.
It was in 2000 that I first encountered the technique of phenotypic selection analysis. By that time the method was nearly two decades old, but as a molecular evolutionist it was new to me. I began to wonder – could we measure the strength of selection on gene expression? At that time, however, applying phenotypic selection analysis to gene expression would have been costly and difficult. So I shelved the idea, but I kept it in the back of my mind.
In 2015, I felt that the time had come to try. In the intervening years I had expanded my research into genomics and systems biology, moved to New York University and re-focused much of my work on studying domestication. Domesticated species had always intrigued me, as they offered insights into the workings of the evolutionary process, just as they had for Darwin. Plus, in a world faced with climate change and sustainability issues, studying a crop species offered the possibility that the work could also have societal impact.
A postdoc, Niels Groen, had joined my lab, and I also recruited Steve Franks from nearby Fordham University to collaborate with us. He, in turn, recruited Irina Calic, who had experience working with large genome datasets. To grow the plants and get the phenotype information, I turned to the International Rice Research Institute in my home country, the Philippines. There, we worked with two great collaborators – Gina Vergara and Amelia Henry – who ensured that the large phenotyping effort would happen. Together with other colleagues in our labs, we set out to measure selection on gene expression.
Figure 2. Sampling rice leaves for gene expression measurements.
We began our experiment in early 2016. We planted 220 rice varieties in replicate, mostly traditional landraces, and counted more than 4 million seeds (by hand!) as our proxy for fitness. We also collected other key phenotypes, like flowering time, leaf area and chlorophyll content, and we did it in both normal and drought conditions. For measuring RNA expression in rice leaves, we initially thought to look at just 50 genes using NanoString technology, but then Rahul Satija joined the biology faculty of NYU. Rahul had developed technologies for single-cell transcriptomics that we could tweak to massively multiplex and automate RNA-Seq library construction. He gladly collaborated with us and adapted the single-cell transcriptome protocol, which allowed us to construct 1,320 RNA-Seq libraries at little cost and in two weeks.
With this, we were finally able to do the work I had thought up nearly 20 years ago, but at a more massive scale. In the end, we measured selection differentials for more than 15,000 genes, measuring gene expression in two field environments in more than 1,300 plants. We showed that directional selection on the expression of individual genes is generally weak, and the distribution looks similar to the pattern seen in organismal traits such as morphology and life history. Moreover, we found that under stress (for example, drought), the strength of selection on gene expression increased, and the types of genes under stronger selection changed with environment – plant defense genes in normal environments vs. stress response and flowering time genes in drought.
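To make the idea of a per-gene selection differential concrete, here is a small sketch in Python, again using only the standard library. It assumes the textbook definition, the covariance between relative fitness and a standardized trait, and treats each gene's expression level as the trait. It is not the pipeline used in the paper: the gene labels, expression values and seed counts below are all hypothetical.

```python
from statistics import mean, pstdev


def selection_differential(expression, fitness):
    """Covariance between relative fitness and standardized expression."""
    w_bar = mean(fitness)
    w = [f / w_bar for f in fitness]  # relative fitness, mean 1
    m, s = mean(expression), pstdev(expression)
    z = [(x - m) / s for x in expression]  # standardized expression, mean 0
    # Population covariance, using mean(w) == 1 and mean(z) == 0.
    return mean([(wi - 1.0) * zi for wi, zi in zip(w, z)])


# Hypothetical seed counts for five plants (illustrative values only).
seeds = [120, 95, 140, 80, 110]

# Hypothetical expression levels for three placeholder genes in those plants.
expression = {
    "gene_A": [5.1, 4.2, 6.0, 3.9, 5.0],
    "gene_B": [2.0, 2.1, 1.9, 2.2, 2.0],
    "gene_C": [7.0, 7.5, 6.1, 8.2, 7.1],
}

differentials = {
    gene: selection_differential(values, seeds)
    for gene, values in expression.items()
}

# Rank genes by the absolute strength of selection on their expression.
for gene in sorted(differentials, key=lambda g: abs(differentials[g]), reverse=True):
    print(gene, round(differentials[gene], 3))
```

A positive value means that plants with higher expression of that gene tended to set more seed, and a negative value means the opposite. Running the same calculation separately on plants from each environment is one way to compare the distribution of selection strengths between normal and drought conditions.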
And aside from offering insights into selection on gene expression, our study found 5 genes associated with high fitness under drought. One of them, OsMADS18, we identified as a likely drought-escape gene that gives early flowering lines higher fitness under drought. This evolutionary ecological approach not only helps us understand something as fundamental as selection on gene expression, but may also help in developing new crop varieties adapted to stressful environments.
This opens up the possibility that we can now turn our attention to how selection acts on gene regulation, and help forge the links between genotype and phenotype by thinking about how gene expression evolves. Now that we can measure selection on individual gene expression in a genome-wide fashion, we can do it in other organisms. Many questions remain – how does selection strength compare if we look at expression not just in leaves but in other tissues? How do you integrate expression data from multiple tissues? What features determine the strength of selection on gene expression? How is selection on gene expression linked to the evolution of phenotypes?
By integrating evolutionary ecology, genomics, systems biology and molecular evolutionary analysis, we can start to explore how selection acts on these fundamental molecular phenotypes.
Our paper in Nature can be accessed at https://www.nature.com/articles/s41586-020-1997-2