ESPEYB25 15. Editors' Choice Genetics (9 abstracts)
Nat Genet 2025; 57:626-634. PMID: 39994471 doi: 10.1038/s41588-025-02095-4
In Brief: To examine the contribution of rare noncoding genetic variants to biology and disease risk, the authors analysed whole-genome sequencing (WGS) data comprising 1.1 billion variants to identify associations with circulating levels of 2,907 proteins in ~50,000 UK Biobank participants. They identified 604 rare noncoding single variants associated with circulating protein levels.
Comment: To date, the large majority of known disease-altering mutations lie in gene-coding sequences, where they may disrupt the production, stability or function of the gene product. Such mutations are detectable by gene-centric assays, including whole-exome sequencing (WES). By contrast, WGS generates far larger datasets, because 98-99% of the genome is noncoding. A major challenge is to determine which of those noncoding genetic variants might have functional consequences.
The authors take a clever approach to this question, hypothesising that functionally relevant noncoding variants may alter circulating protein levels. The approach takes advantage of widely available UK Biobank data: WGS together with proteomics generated by Olink technology. Noncoding variants associated with protein levels were more likely to occur in regions near genes, in 5′-UTRs and in predicted intronic splice acceptor or donor sites, which are not captured by WES. Observed effect sizes were as large as those seen for rare coding variants.
A crucial warning for those using UK Biobank WGS data is that only one-third of measured proteins had what the authors considered high-quality WGS data across the relevant gene locus, indicating the limited quality of the efficiently produced low-coverage WGS. Further resources are needed to generate high-coverage WGS data at large scale, as well as more comprehensive proteomic measurements.