Could the 3D folding of synthetic gene circuits shape how genes are expressed? Today in @ScienceMagazine we report on how gene syntax shapes the feedback between transcriptional activity and genome folding, with implications for advanced circuit design🧵 (1/n)
Researchers tracked individual diatoms—unicellular photosynthetic microalgae with rigid silica cell walls—as they glided along a surface using mobile adhesive strands that protrude through slits, called raphes, in their cell walls. In PNAS: ow.ly/o98N50YFJwM
When a protein embedding is indistinguishable from noise
Protein language models have become the backbone of computational biology. Feed them an amino acid sequence, and they return a dense vector—a compact numerical fingerprint that downstream models use to predict function, structure, localization, or the effect of a mutation. The assumption, largely unquestioned, is that this fingerprint actually encodes meaningful biology.
Prabakaran and Bromberg challenge that assumption directly. They ask a deceptively simple question: how do you know whether a given embedding actually represents a protein—or whether it's just noise dressed up as a vector?
Their answer is the Random Neighbor Score (RNS). The idea is elegant: generate biologically meaningless sequences by randomly shuffling the residues of real proteins—preserving amino acid composition but destroying all evolutionarily meaningful interactions. Then, for each real protein, measure how many of its nearest neighbors in latent space are these random imposters. A high RNS means the model never learned to place that protein somewhere biologically meaningful.
Applied to ESM-2 and ProtT5 across thousands of proteins, RNS correlates strongly with structural prediction quality: proteins with poorly predicted structures have embeddings nearly indistinguishable from random sequences. Downstream tasks follow the same pattern—contact prediction precision drops roughly 40% for high-RNS proteins, and variant effect prediction falls to near chance. Most sobering: between 19% and 46% of the human proteome is underlearned by current models, depending on architecture. Intrinsically disordered regions fare especially poorly across all architectures tested.
RNS is model-agnostic and computationally cheap—around two minutes on GPU for 10,000 proteins—making it a practical prescreening step before any embedding-based inference.
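The scoring step is simple enough to sketch. Below is a minimal version under stated assumptions: embeddings for the real proteins and their shuffled decoys have already been computed with your model of choice, and the function names and the neighbor count are mine, not the paper's.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def shuffle_sequence(seq, rng):
    """Make a decoy: same amino acid composition, residue order destroyed."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def random_neighbor_score(real_emb, decoy_emb, k=10):
    """For each real protein, the fraction of its k nearest neighbors
    (searched over real + decoy embeddings pooled together) that are decoys.
    Scores near 0 = well-separated from noise; near 1 = indistinguishable."""
    pool = np.vstack([real_emb, decoy_emb])
    is_decoy = np.array([0] * len(real_emb) + [1] * len(decoy_emb))
    nn = NearestNeighbors(n_neighbors=k + 1).fit(pool)
    # Query with the real embeddings; column 0 is the self-match, drop it.
    _, idx = nn.kneighbors(real_emb)
    return is_decoy[idx[:, 1:]].mean(axis=1)
```

On synthetic data where real and decoy vectors are well separated, every score comes out 0; a well-learned proteome should look closer to that end than to 0.5 (pure noise).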
For R&D teams that routinely use protein embeddings to prioritize variants, annotate novel sequences, or screen large libraries, this has immediate consequences. Running RNS before downstream inference flags proteins where predictions are unreliable, reducing the risk of propagating errors into expensive wet-lab campaigns. It also offers a principled way to identify gaps in model coverage—directly actionable for teams building or fine-tuning their own foundation models.
Paper: R. Prabakaran & Yana Bromberg, Nature Methods (2026) — CC BY-NC-ND 4.0 | nature.com/articles/s4159…
#PRESS Evoke phase 3 trials did not demonstrate a statistically significant reduction in Alzheimer's disease progression. Learn more here: novonordisk.com/news-and-media…
BIG ANNOUNCEMENT📣: I haven’t been this excited to be part of something new in 15 years… Thrilled to reveal the passion project I’ve been working on for the past year and a half!🙀🥳

It started from my frustration with the depressing effect the current publishing system has on the well-being of myself, my team, and pretty much every scientist I know (maybe you’ve noticed from my stupid jokes… :) I was exhausted from dealing with the huge delays, reviewers who can be abusive, and how arbitrary it all is. Unfortunately, the most important factors are often WHO your reviewers are and who YOU are... It’s clear we need alternatives, or at least ways to improve the situation.

So, together with a really special and talented team, we developed this idea into “qed”, a platform where you can get CONSTRUCTIVE feedback on your own work or CRITICALLY assess other people’s papers. It can be a real difference maker if many of you join us (thousands have tried it already, but today we release a NEW and much stronger version ;)

Let’s harness qed to put the power back in scientists’ hands: to do, to read & to publish science on our own terms. I’m dying for you to TRY IT, and it’s very simple - just drop in a paper (the link to the website is in the replies👇). It’s completely secure, private, and free, and you get results fast. Please show your support, SHARE, tell your friends, and let’s be the revolution 🫵!
15 years in the making, we confirmed that mitochondria -the powerhouse of the cell- have an unusual localization in patients who experience psychosis (including schizophrenia and bipolar disorders). You’ll never guess what kind of patient cells we used to make this discovery...🧵
We don't know the (phenotypic) consequences of most mutations. Good news, the proteome can help! And it predicts phenotypes in new environments. science.org/doi/10.1126/sc…
Our algorithm dGbyG for predicting standard Gibbs free energy of metabolic reactions and thermodynamic analysis of genome-scale metabolic networks is now online at Cell Systems: authors.elsevier.com/a/1loAF8YyDfuZ…
Right now is the best chance the scientific community has ever had to end the artificial scarcity of academic journals.
Check out my new op-ed urging @NIH @NIHDirector_Jay to disallow the use of taxpayer dollars for journal publication fees — a system both publishers and scientists have played a role in perpetuating.
The last day for public comment on this topic is Monday, Sept 15. It’s time to unleash science.
Links in 🧵
The difference between doing a project and presenting it: an observation can lead down many avenues of exploration before the focus narrows to a specific discovery. Presenting it, in a talk or paper, runs in the opposite direction, with broad perspective framing the specific discovery before and after.
Razor = a rule of thumb for shaving away alternative explanations
Occam's razor = prefer the simplest solution to a problem
The large parameter space of neural networks seems to defy Occam's razor, but a nearly degenerate Fisher information matrix explains their low effective complexity
👉arxiv.org/abs/1905.11027