Manoj K Valluru

1.5K posts

@mkvalluru

https://t.co/7Xqee6zmyZ Postdoc @ Kidney Genetics Group @Ong_Lab @ShefUni_ClinMed

Yorkshire and The Humber · Joined August 2010
524 Following · 338 Followers
Manoj K Valluru reposted
Biology+AI Daily
Biology+AI Daily@BiologyAIDaily·
Compressing the collective knowledge of ESM into a single protein language model @naturemethods
1. The paper argues that “sequence-only” protein language models (PLMs) are not intrinsically capped for variant-effect prediction (VEP); instead, their evolutionary signals are fragmented across model families and can be recovered by making models learn from each other.
2. Key observation: closely related ESM models have complementary blind spots. For example, ESM2 models systematically miss KRAB-domain conservation signals, while ESM1b/ESM1v can miss BRICHOS-domain signals; yet at least one model in the family captures each domain’s mutational sensitivity.
3. They introduce a simple but effective ensemble rule: for each missense mutation, take the minimum log-likelihood ratio (LLR) across models (ESMIN), i.e., “maximum confidence” scoring. This can amplify subtle evolutionary constraints that averaging would dilute.
4. A theoretical analysis explains when min-LLR beats averaging: if pathogenic-variant LLRs are more dispersed across models than benign-variant LLRs (variance asymmetry). The ESM family empirically shows this property, making maximum-confidence aggregation advantageous.
5. ESMIN is evaluated using 11 sequence-only ESM models (ESM1b, five ESM1v, five ESM2; excluding ESM2-15B). It outperforms averaging-based ensembles and improves ProteinGym DMS correlations, with gains occurring in ~50% of assays (versus ~20% for typical ensembles).
6. Main methodological contribution: “maximum-confidence co-distillation.” For each protein, all models score all mutations; the element-wise minimum LLR matrix becomes a teacher signal, and each model is trained (variant-level MSE) to match these confident targets, without MSAs, structures, or population genetics features.
7. Co-distillation substantially improves every participating model, including small ones: ESM2-8M improves on ClinVar AUC from ~0.65 to ~0.88. Several co-distilled single models (e.g., ESM2-3B, ESM1b, ESM2-650M) can even surpass the ESMIN teacher signal (“student surpasses teacher”).
8. Robustness/ablation: improvements persist when training data are heavily reduced and de-homologized. With only ~1% of human proteins (~200 sequences; <30% identity to benchmark proteins), ESM2-35M reaches ~97% (ClinVar) and ~94% (DMS) of its peak co-distilled performance.
9. Iterative procedure: after round 1 (min-LLR co-distillation), additional rounds switch to average-aggregation co-distillation. As models improve, class-conditional variances become more symmetric, making averaging slightly better; after 3 rounds, a single 3B model, named VESM-3B, matches the ensemble.
10. Practical compression: VESM-3B is distilled into smaller models (650M, 150M, 35M) that retain most performance (reported as >98% on Balanced ClinVar and >93% on ProteinGym DMS relative to VESM-3B), enabling high-throughput VEP under limited compute.
11. Clinical benchmark (ProteinGym ClinVar, 2,227 genes): sequence-only VESM models outperform other sequence-only PLMs (including ESM-C) and compete with or surpass methods using MSA/structure/population priors. VESM-3B shows balanced ROC behavior across specificity and sensitivity regimes.
12. AlphaMissense comparison: VESM-3B performance is stable across allele-frequency strata, while AlphaMissense shows strong dependence on MAF (consistent with circularity risks when population frequency informs clinical labels). After excluding variants overlapping AlphaMissense training (gnomAD v2 MAF > 1e-5), all VESM sizes outperform AlphaMissense on AUC and multiple calibrated metrics.
13. Modular use of structure: rather than retraining a joint model, they fine-tune the sequence component of ESM3 using VESM-style sequence-based loss to create VESM3, and combine VESM3 with VESM-3B into a structure-aware ensemble (VESM++). This improves performance on structure-dependent DMS assays (binding/stability/expression) while maintaining strong fitness/activity performance.
14. Cross-domain generalization: despite co-distillation being trained on human proteins, gains transfer strongly to nonhuman DMS assays, with disproportionately large improvements reported for viral proteins, even though ESM3’s released training data excluded viral sequences.
15. Beyond binary pathogenicity: using UK Biobank/Genebass summary statistics for 332 gene–phenotype pairs (blood biochemistry biomarkers), variant-level VESM scores correlate with single-variant effect sizes (β). VESM++ and VESM-3B yield the strongest gene–trait association signals across tested models.
16. Notably, VESM-3B recovers the correct pLoF direction of effect in 98.8% of significant gene–phenotype pairs and identifies many associations not detected by missense burden tests, suggesting utility for quantitative trait interpretation from summary statistics.
📜Paper: doi.org/10.1038/s41592…
#ProteinLanguageModels #VariantEffectPrediction #ComputationalBiology #HumanGenetics #ESM #ClinVar #ProteinGym #DeepMutationalScanning #UKBiobank #MachineLearning
English
1
26
115
18.2K
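The min-LLR ("maximum confidence") rule and the variant-level MSE target described in the Biology+AI Daily summary above can be illustrated with a minimal sketch. This is not the paper's code: the function names, the three "models" and all LLR values are made up for illustration, and real LLRs would come from actual ESM models.

```python
from statistics import mean

def aggregate_llr(llrs_by_model, rule="min"):
    """Combine per-model LLR scores for the same list of mutations.

    llrs_by_model: list of equal-length lists, one list per model.
    rule: "min" (most negative score wins) or "mean" (plain averaging).
    """
    combine = min if rule == "min" else mean
    # zip(*...) walks the models in parallel, one mutation at a time.
    return [combine(scores) for scores in zip(*llrs_by_model)]

def mse_to_teacher(student_llrs, teacher_llrs):
    """Variant-level mean squared error between a student and the teacher signal."""
    return mean((s - t) ** 2 for s, t in zip(student_llrs, teacher_llrs))

# Toy, made-up LLRs for three mutations scored by three hypothetical models.
toy_llrs = [
    [-1.0, -8.0, -0.5],  # hypothetical model A
    [-2.0, -1.0, -0.5],  # hypothetical model B
    [-9.0, -3.0, -1.5],  # hypothetical model C
]

teacher = aggregate_llr(toy_llrs, rule="min")    # element-wise minimum
average = aggregate_llr(toy_llrs, rule="mean")   # element-wise mean
loss_a = mse_to_teacher(toy_llrs[0], teacher)    # how far model A is from the teacher

print(teacher, average, loss_a)
```

In this toy case the first two mutations are each flagged strongly by only one model; the minimum keeps that strong signal, while the mean pulls it toward the less confident scores, which is the dilution effect the summary mentions.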
Manoj K Valluru reposted
Piero Carninci
Piero Carninci@carninci·
We are looking for an outstanding postdoc to join my lab at the Human Technopole in Milan, to tackle big questions in the biology and function of long non-coding RNAs. Details here: careers.humantechnopole.it/job/Postdoc-Ca…
English
1
24
59
6.2K
Manoj K Valluru reposted
Steven Salzberg 💙💛
Steven Salzberg 💙💛@StevenSalzberg1·
"The AI Scientist, writes code, runs experiments, plots and analyses data, writes the entire scientific manuscript, and performs its own peer review." Amazingly, this is not satire! It's from a new paper in @Nature. Maybe they meant to release it on April 1? 🤣
English
1
2
15
4.8K
Manoj K Valluru reposted
BioImagingUK
BioImagingUK@BioImagingUK·
🐍 Learn Python for Image Analysis! UK life scientists: join EuroBioImaging's hands-on beginner course at The Crick, London (23–25 June), running with sites in CZ, PT and SWE! ⏰ Register by 15/4; places are limited! ✈️ Travel funding may be available shorturl.at/v3Vsk
English
0
18
43
3.3K
Manoj K Valluru reposted
Kresten Lindorff-Larsen
Kresten Lindorff-Larsen@LindorffLarsen·
New preprint on how disagreement among variant effect predictors (VEPs) can help guide prioritization of proteins for experimental analysis. We analyse for which proteins VEPs disagree, what features they have, and suggest lack of concordance & clinical data to guide experiments.
bioRxiv Bioinfo@biorxiv_bioinfo

Disagreement among variant effect predictors guides experimental prioritization of target proteins biorxiv.org/content/10.648… #biorxiv_bioinfo

English
3
19
92
10.6K
Manoj K Valluru reposted
Jun Yasuda
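The 3D atlas paper covering all mouse cells (whole body for embryos, all organs for adults), which was also going around my timeline. It made the cover of this week's Cell. cell.com/cell/abstract/…
Translated from Japanese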
1
8
78
5.8K
Manoj K Valluru reposted
Alice Ting
Alice Ting@aliceyting·
We are recruiting! If you are passionate about technology development, protein engineering, computational design, directed evolution, chemical biology - please reach out! (The setting is pretty nice too…)
English
23
67
508
57.9K
Manoj K Valluru reposted
Science Magazine
Science Magazine@ScienceMagazine·
The Human Organ Atlas, a new resource for researchers, clinicians, and educators, is an open-access database of 3D imaging of intact human organs. The portal includes donor samples with conditions from congenital disorders to COVID-19. Learn more in @ScienceAdvances: scim.ag/4bnSEzZ
English
1
62
164
20.4K
Manoj K Valluru reposted
John A Rogers
John A Rogers@ProfJohnARogers·
Mechanical forces play essential roles in biology. Measuring and controlling these forces represent emerging strategies for disease diagnosis and therapeutic intervention, to complement more traditional approaches. This review article, published today in Nature Reviews Bioengineering (nature.com/articles/s4422…), summarizes the status of work in this area -- from organisms, organs and tissues to cells and molecules -- and future directions. Check it out if you’d like to learn more about this field of research and translation. Big thanks to my senior co-authors, Prof. @songli_UCLA, Prof. Jun Chen and Prof. Viola Vogel, for their leadership on this piece, to Dr. Min-Seung Jo (postdoc in the group) for his contributions to our sections on sensors and actuators, and to the other co-authors for their roles.
English
2
12
67
4.9K
Manoj K Valluru reposted
Dr Charlotte Houldcroft
Dr Charlotte Houldcroft@DrCJ_Houldcroft·
Dear students in the UK, if you haven't had your teenage meningitis vaccine or aren't sure if you have had it, PLEASE arrange with your GP to get vaccinated nhs.uk/vaccinations/m… It helps prevent tragic deaths like those that have occurred recently in Kent.
English
50
668
1K
107.9K
Manoj K Valluru reposted
Satya Nadella
Satya Nadella@satyanadella·
We’ve trained a multimodal AI model to turn routine pathology slides into spatial proteomics, with the potential to reduce time and cost while expanding access to cancer care.
English
457
1.9K
11.3K
2.8M
Manoj K Valluru reposted
Nilimesh Das
Nilimesh Das@NilimeshDa6026·
DNAsight: an automated analysis framework for AFM data that integrates machine learning (ML)-based segmentation with modular, base-pair-calibrated quantification of DNA spatial organization, looping, nucleosome spacing, and protein clustering biorxiv.org/content/10.648…
English
0
1
6
497
Manoj K Valluru reposted
Garyk Brixi
Garyk Brixi@garykbrixi·
Evo 2 is out in Nature today, showing that genome language models can predict and design across the full complexity of life, from phages to eukaryotes. A few surprises from the project, including how ignoring trillions of nucleotides was key to getting a good model. 🧵
English
13
208
1K
98.6K
Manoj K Valluru reposted
Veera Rajagopal 
Veera Rajagopal @doctorveera·
Excited to share one of my favorite genetic discoveries made at the Regeneron Genetics Center. We went looking for genetic clues about why some people smoke more than others and found something in an unexpected place: the genomes of Indigenous Mexicans. 1/
Veera Rajagopal  tweet media
English
11
63
283
36.6K
Manoj K Valluru reposted
Ming "Tommy" Tang
Ming "Tommy" Tang@tangming2005·
5 best plugins for Claude Code that most developers don't know about: 1/ superpowers (~54k GitHub stars). Forces Claude to actually think before writing code: brainstorms requirements with you first, writes a spec, then breaks work into 2-5 min tasks.
English
2
2
30
4K
Manoj K Valluru reposted
Hao Yin
Hao Yin@HaoYin20·
#AlternativeSplicing analysis pipeline MAJIQ v2 - VOILA v2 modulizer 👉Build gene splicegraph including unannotated, complex elements (e.g. de novo intron retention) 👉Quantify AS events & modules vs MAJIQ, rMATS turbo, LeafCutter, SUPPA2, Whippet 👉Comparative analysis of GTEx v8 all 53 tissues 👉Focused analysis of Brain Subregion Changing Cassette Exon events @YosephBarash @NatureComms 2023 nature.com/articles/s4146…
English
1
8
28
1.6K
Manoj K Valluru reposted
PKD Foundation
PKD Foundation@PKDFoundation·
Check out this free resource for PKD patients and families 👇 The Mayo Clinic PKD Resource Center offers articles and videos on diagnosis, genetics, lifestyle management, treatment options, and more. Find reliable and expert-led PKD information at mcpress.mayoclinic.org/polycystic-kid…
English
0
2
3
483
Manoj K Valluru reposted
Nature Reviews Genetics
Nature Reviews Genetics@NatureRevGenet·
📢 Our February issue is live! go.nature.com/4b9MfKl Reviews include: how spatial omics has reshaped our understanding of human development and disease; multiplexed assays of variant effect (MAVEs); haplotype phasing and genotype imputation; forensic genetics in the omics era
English
1
6
30
7.4K
Manoj K Valluru reposted
Kidney Research UK
Kidney Research UK@Kidney_Research·
We welcome the latest news from @NICEComms recommending the combined use of two different targeted therapies to treat adults in England living with a particular type of advanced kidney cancer. Kidney cancer is the 11th most common cause of cancer death in the UK; some 28,485 people in England were diagnosed with the condition between 2020 and 2022. Director of policy and public affairs, Alison Railton, said: “This recommendation offers an important treatment option for some people living with advanced renal cell carcinoma. Kidney Research UK provided comments during the consultation phase of this appraisal process. Some types of RCC can be difficult to treat, especially if diagnosed later. This is positive news for people living with this type of cancer when it is untreated and advanced as it may impact their quality of life and long-term survival.” Read our full story here: bit.ly/45p2NKT
English
0
2
4
396
Manoj K Valluru reposted
Nature Methods
Nature Methods@naturemethods·
A Perspective on 3D traction force microscopy provides practical experimental and computational guidance for users. nature.com/articles/s4159…
English
0
3
7
2.6K