A natural uORF variant underlying phenotypic diversity

Dissecting natural genetic variation in crops facilitates molecular breeding. To date, most published causal variants underlying phenotypic diversity act through either gene expression or protein sequence polymorphisms. Natural variation at the post-transcriptional level remains poorly understood in plants.

By leveraging genome sequencing, forward genetic analyses, and molecular techniques, we reveal that a naturally occurring SNP that changes the length of an upstream open reading frame in the 5’UTR (uORF) markedly alters the protein abundance of the downstream coding region in a cell-type-specific manner, and thus confers phenotypic diversity in a soybean germplasm collection.

How can a strong and clear association signal be identified in a genome-wide association study (GWAS)?

Although GWAS is an efficient genetic approach for studying natural variation, it is often difficult to obtain a strong and clear association signal for complex traits, which undermines researchers’ confidence in pursuing causal genes or variants. As an alternative, adding a heritable trait to the GWAS model as a covariate may help identify a true, covariate-independent QTL for the trait of interest 1. In our study 2, the locus CPU1 was not detected by GWAS of phosphorus (P) uptake alone. When total root length was added as a covariate, CPU1 was detected. This suggests that the locus regulates P uptake through the P acquisition efficiency of the root rather than through total root length. We then used P uptake per unit root length (P uptake/total root length) as a measure of P acquisition efficiency and detected a single strong and clear GWAS signal: CPU1.
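The covariate and ratio strategies above can be illustrated with a toy single-SNP test. This is a minimal sketch, not the pipeline used in the study: it assumes an additive ordinary least squares model, ignores kinship and population structure (which real GWAS models, such as mixed linear models, account for), and uses made-up data for eight inbred lines. The function names `residualize` and `snp_t_stat` are hypothetical.

```python
"""Toy single-SNP association test, with and without a covariate.

Hypothetical sketch: additive OLS model, no kinship or population-structure
correction, made-up data for 8 inbred lines (genotype coded 0/2).
"""
from math import sqrt
from statistics import mean


def residualize(y, x):
    """Residuals of y after OLS regression on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def snp_t_stat(trait, genotype, covariate=None):
    """t statistic of the SNP effect in trait ~ SNP (+ covariate).

    With a covariate, the Frisch-Waugh-Lovell theorem is used: regressing the
    covariate-residualized trait on the covariate-residualized genotype gives
    the same SNP coefficient and residuals as the full multiple regression.
    """
    n = len(trait)
    if covariate is None:
        my, mg = mean(trait), mean(genotype)
        ry = [y - my for y in trait]
        rg = [g - mg for g in genotype]
        df = n - 2  # parameters: intercept, SNP
    else:
        ry = residualize(trait, covariate)
        rg = residualize(genotype, covariate)
        df = n - 3  # parameters: intercept, covariate, SNP
    sgg = sum(g * g for g in rg)
    beta = sum(g * y for g, y in zip(rg, ry)) / sgg
    rss = sum((y - beta * g) ** 2 for g, y in zip(rg, ry))
    return beta / sqrt(rss / df / sgg)


# Made-up data: root length varies independently of genotype, while
# uptake per unit root length (efficiency) depends on genotype.
genotype = [0, 0, 0, 0, 2, 2, 2, 2]
root_length = [10, 40, 20, 30, 15, 35, 25, 25]
efficiency = [1.00, 1.02, 0.98, 1.01, 1.20, 1.19, 1.22, 1.21]
p_uptake = [r * e for r, e in zip(root_length, efficiency)]

t_raw = snp_t_stat(p_uptake, genotype)
t_cov = snp_t_stat(p_uptake, genotype, covariate=root_length)
t_ratio = snp_t_stat([p / r for p, r in zip(p_uptake, root_length)], genotype)
print(f"P uptake: t = {t_raw:.2f}")
print(f"P uptake + root length covariate: t = {t_cov:.2f}")
print(f"P uptake / root length: t = {t_ratio:.2f}")
```

With these made-up numbers the unadjusted test is weak (|t| below 1), because variation in root length masks the genotype effect, whereas both the covariate-adjusted test and the ratio trait give a much stronger signal.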

Where is the causal variant?

Based on expression profiles and phylogenetic analyses, GmPHF1 was identified and verified as the causal gene underlying CPU1; it encodes a SEC12-like protein that facilitates the trafficking of P transporters from the endoplasmic reticulum to the plasma membrane. But where is the causal variant? We were puzzled to find neither amino acid changes nor significant differences in expression level between the two major haplotypes of GmPHF1. These results indicate that the causal polymorphism affects neither protein sequence nor gene expression. GmPHF1-based association mapping revealed two significant variants in the 5’UTR. Intriguingly, we demonstrated that 5’UTR variation contributes to protein abundance in a cell-type-specific manner.

How does the causal variant give rise to phenotypic diversity?

There are two high-frequency SNPs (minor allele frequency = 0.49) in the 5’UTR. Which one is the causal variant, and how does it affect protein abundance? Examining the 5’UTR sequence, we found an upstream open reading frame (uORF). One SNP changes an amino acid in the uORF, and the other introduces a stop codon. Using recombinant constructs carrying each SNP, we demonstrated that the causal variant is the SNP that introduces the stop codon and thus shortens the uORF. When the uORF start codon was artificially mutated (ATG→AAA), the causal SNP no longer changed protein abundance, showing that the SNP acts in a uORF-dependent manner. A uORF can affect the translation efficiency of the downstream CDS, or mRNA stability by inducing nonsense-mediated mRNA decay 3. In our study 2, steady-state mRNA and protein showed similar spatial distributions in roots, and the causal SNP changed steady-state mRNA levels, suggesting that it acts through mRNA stability.
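The logic for telling the two 5’UTR SNPs apart can be sketched as a simple sequence scan. This is a toy illustration under stated assumptions: the sequence is invented (it is not the GmPHF1 5’UTR), only ATG start codons and the three standard stop codons are considered, and the helper names `find_uorfs`, `apply_snp`, and `peptide_length` are hypothetical. It shows how a missense SNP leaves uORF length unchanged, how a stop-gain SNP shortens it, and how the ATG→AAA mutation abolishes it.

```python
"""Toy scan for ATG-initiated uORFs in a 5'UTR, and the effect of SNPs on them.

Hypothetical sketch: the sequence below is invented, not the GmPHF1 5'UTR,
and only the standard stop codons are considered.
"""
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr: str) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) for every ATG in the UTR (0-based, end exclusive).

    end includes the in-frame stop codon; it is None when no in-frame stop
    occurs within the UTR (the uORF would run into or past the main CDS).
    """
    utr = utr.upper()
    uorfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        end = None
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        uorfs.append((start, end))
    return uorfs


def apply_snp(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base, checking that the reference allele matches."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def peptide_length(uorf: Tuple[int, Optional[int]]) -> Optional[int]:
    """Number of amino acids encoded (stop codon excluded)."""
    start, end = uorf
    return None if end is None else (end - start) // 3 - 1


reference = "GCATGGCTCAAGGTTGGTAGCC"           # uORF: ATG GCT CAA GGT TGG TAG
missense = apply_snp(reference, 12, "G", "C")   # GGT (Gly) -> GCT (Ala)
stop_gain = apply_snp(reference, 8, "C", "T")   # CAA (Gln) -> TAA (stop)
start_dead = reference[:2] + "AAA" + reference[5:]  # ATG -> AAA

for name, seq in [("reference", reference), ("missense", missense),
                  ("stop-gain", stop_gain), ("ATG->AAA", start_dead)]:
    uorfs = find_uorfs(seq)
    print(name, uorfs, [peptide_length(u) for u in uorfs])
```

In this toy sequence the reference uORF encodes a 5-amino-acid peptide; the missense allele keeps that length, the stop-gain allele truncates it to 2 amino acids, and the ATG→AAA mutant has no uORF at all, which is why a stop-gain effect that disappears in the start-codon mutant points to a uORF-dependent mechanism.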

Suggestions and open questions for future studies

  1. If local adaptation within a diverse population is not fully taken into account, a large sample size may not provide high mapping power in GWAS.
  2. Causal genes or variants may be missed when the search is limited to promoter and coding regions.
  3. In our study 2, the uORF’s function does not depend on the small peptide it encodes. Is there uORF variation that acts by changing the sequence of the encoded small peptide, and thereby its interaction with a target protein? This is an interesting question for future studies.
  4. Almost all studies report that uORFs repress translation of the downstream protein. In our study 2, however, both artificially mutating the uORF’s start codon and shortening the uORF through the naturally occurring SNP markedly reduced protein abundance. These results show that this uORF is required for translation of the downstream protein. What is the mechanism, and are such enhancer uORFs widespread?
  5. In our study 2, the uORF functions in a cell-type-specific manner, suggesting that uORFs can be selectively initiated. Inducible uORFs could therefore potentially be used to enhance plant stress resistance without a yield penalty.



  1. Guo, Z. et al. Genetic analyses of lodging resistance and yield provide insights into post-Green-Revolution breeding in rice. Plant Biotechnology Journal 19, 814-829 (2021).
  2. Guo, Z. et al. A natural uORF variant confers phosphorus acquisition diversity in soybean. Nature Communications 13, 1-14 (2022).
  3. Lee, D. et al. Disrupting upstream translation in mRNAs is associated with human disease. Nature Communications 12, 1-14 (2021).