Sergey Ovchinnikov

3.8K posts


@sokrypton

Scientist, Assistant Professor @MITBiology, #FirstGen, ProteinBERTologist, 🇺🇦 No Human is illegal. Moving to: https://t.co/sow6IRD3jj

Cambridge, MA · Joined December 2014
3.7K Following · 17.2K Followers
Pinned Tweet
Sergey Ovchinnikov@sokrypton·
I'm excited to share that I'll be joining @MITBiology as an Asst Prof. in Jan 2024! Come join us! 🤓🧪🖥️🧬
Sergey Ovchinnikov retweeted
Biology+AI Daily@BiologyAIDaily·
Protein Diffusion Models as Statistical Potentials

1. The work tackles the persistent challenge that high-confidence protein structure prediction can only be achieved when evolutionary co-variation signals are strong. By formulating protein conformational space as an Energy-Based Model (EBM), the authors circumvent the need for deep multiple sequence alignments, opening the door to de novo folds that are under-represented in current databases.

2. ProteinEBM is a diffusion-based EBM that learns a sequence-conditioned energy function Eθ(x, s) via denoising score matching. Unlike earlier diffusion models that output a direct score, this formulation explicitly represents the score as the gradient of a learned energy, allowing the model to be used both for ranking and for guiding Langevin dynamics (a minimal sketch of this follows after the thread).

3. On a benchmark of thousands of decoy structures from the Rosetta set, ProteinEBM-x achieves a Spearman correlation of 0.838 between energy and TM-score, surpassing Rosetta's 0.757 and matching the performance of AlphaFold-based scoring methods. In zero-shot mutation-stability ranking on ProteinGym, the same model records a Spearman of 0.686, the best result reported to date and far ahead of language-model baselines that rely on evolutionary data.

4. The learned energy landscape can be sampled with Langevin annealing or reverse diffusion. For eleven fast-folding proteins, the model recovers native folds with median RMSD below 3 Å, and the sampled energy funnels closely resemble those generated by Rosetta or all-atom MD, demonstrating that the EBM captures both low-energy basins and kinetic accessibility.

5. ProteinEBM also supports direct folding simulations. Starting from a Ramachandran-random chain, Langevin dynamics guided by the EBM reproduces experimentally observed transition intermediates for Protein G, NuG2, and Protein L, qualitatively matching the order of contact formation reported by φ-value analysis.

6. Because ranking and sampling are decoupled, users can allocate arbitrary compute to search for low-energy structures, a feature that alleviates the computational bottleneck of end-to-end models. The authors demonstrate an end-to-end prediction pipeline that combines large-scale sampling, AF2Rank refinement, and MSA-free inference, achieving higher TM-scores on "easy" targets than AlphaFold-2 or AlphaFold-3 while remaining orders of magnitude faster than existing refinement workflows.

💻 Code: github.com/jproney/Protei…
📜 Paper: biorxiv.org/content/10.648…

#ProteinEngineering #MachineLearning #EBM #AlphaFold #ProteinDesign #ComputationalBiology #DeepLearning #ProteinFolding #StructuralBioinformatics
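The dual use in point 2, one network serving both ranking and sampling, boils down to outputting a scalar energy and taking the score as its negative gradient. A minimal PyTorch sketch of that trick, with a toy MLP standing in for the paper's actual sequence-conditioned network, and made-up step sizes and step counts:

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the learned, sequence-conditioned energy E_theta(x, s);
# the paper's network is far larger and also conditions on the sequence s.
energy_net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1)
)

def energy(x):
    # One scalar per structure: lower energy = more native-like (ranking).
    return energy_net(x).sum()

def score(x):
    # The diffusion score is defined as -dE/dx, so the SAME model that
    # ranks decoys also supplies gradients for sampling.
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(energy(x), x)
    return -grad

# Langevin dynamics guided by the learned energy (illustrative step size).
x = torch.randn(10, 3)  # e.g. 10 pseudo-atoms in 3D
step = 1e-3
for _ in range(200):
    x = x + step * score(x) + (2 * step) ** 0.5 * torch.randn_like(x)

print("final energy:", energy(x).item())
```

Because `energy` returns a scalar, the same forward pass that ranks decoys (point 3) also differentiates into the score that drives the sampling of points 4 and 5.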
Sergey Ovchinnikov retweeted
Christian Dallago@sacdallago·
🧵 We ran the largest head-to-head benchmark of protein binder design methods in the wet lab. Project page: research.nvidia.com/labs/genair/pr… 1 million designs. 127 targets. RFdiffusion, BindCraft, BoltzGen, and Proteina-Complexa — all tested side by side.👇
Sergey Ovchinnikov retweeted
James Roney@jamesproney·
I'm excited to announce some major updates to our ProteinEBM paper with Chenxi Ou and @sokrypton!
Sergey Ovchinnikov retweeted
Mohammed AlQuraishi@MoAlQuraishi·
New OpenFold3 preview out! (OF3p2) It closes the gap to AlphaFold3 for most modalities. Most critically, we're releasing everything, including training sets & configs, making OF3p2 the only current AF3-based model that is functionally trainable & reproducible from scratch 🧵1/9
Sergey Ovchinnikov retweeted
Om Patel@om_patel5·
stop spending money on Claude Code. Chipotle's support bot is free:
Corey Howe@design_proteins·
New personal best: an ipSAE of 0.942 for a de novo binder towards RBX1, using ProteinHunter
Sergey Ovchinnikov retweeted
Clay Kosonocky@kosonocky·
The results are finally in! 🏆💻🧬 I'm thrilled to announce that the manuscript for the Bits to Binders protein design competition is out on bioRxiv! Here's a summary of our findings, including some simple criteria that nearly *double* success rates when applied as a filter 🧵
Sergey Ovchinnikov retweeted
Andre Cornman@ancornman1·
Predicting protein-protein interactions (PPIs) at proteome scale can take months with co-folding models due to the massive all-vs-all comparisons required. We are excited to announce FlashPPI, a contrastive model that predicts proteome-wide physical interfaces in minutes. 1/🧵
Sergey Ovchinnikov retweeted
Yunha Hwang@Micro_Yunha·
Protein–protein interactions (PPIs) are key to discovering and interpreting new biological functions. We’re excited to introduce 𝑭𝒍𝒂𝒔𝒉𝑷𝑷𝑰: a new application of gLM2 that uses genomic language modeling to predict proteome-wide PPIs in microbial genomes in minutes.
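The minutes-versus-months gap comes from amortization: embed each protein once, then score every pair with a single matrix multiply instead of co-folding each pair. A toy numpy sketch of that pattern; the random embeddings, dimensions, and cutoff below are illustrative assumptions, not gLM2's actual representations:

```python
import numpy as np

rng = np.random.default_rng(0)
n_proteins, d = 5000, 256

# Stand-ins for learned per-protein embeddings: in the real model each
# protein is encoded ONCE, instead of co-folding every pair of proteins.
emb = rng.standard_normal((n_proteins, d)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# All pairwise interaction scores in one matrix multiply (O(n^2 d) flops),
# versus n^2 full structure predictions for all-vs-all co-folding.
scores = emb @ emb.T

threshold = 0.2  # illustrative cutoff, not a calibrated value
pairs = np.argwhere(np.triu(scores > threshold, k=1))
print(f"{len(pairs)} candidate interacting pairs")
```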
Sergey Ovchinnikov@sokrypton·
@RichardALJones Well actually... they solved the 2d to 3d problem. 1d to 3d remains unsolved. 🤓 (2d = sparse/noisy covariance signal from input multiple sequence alignment)
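To make the "2d" input concrete: one-hot encode the MSA columns and look at the off-diagonal blocks of the covariance matrix, where co-varying position pairs (the coevolution signal) stand out. A toy sketch with a fabricated four-sequence alignment:

```python
import numpy as np

# Toy MSA: rows are homologous sequences, columns are alignment positions.
# Columns 0 and 2 co-vary (A pairs with D, G pairs with H); that pairwise
# signal is the "2d" covariance input that MSA-based predictors start from.
msa = np.array([list(s) for s in ["ACDA", "ACDA", "GCHA", "GCHG"]])
alphabet = sorted(set(msa.ravel()))
n_seq, L = msa.shape
A = len(alphabet)

# One-hot encode each (position, residue) pair: shape (n_seq, L * A).
onehot = np.zeros((n_seq, L * A))
for i in range(n_seq):
    for j in range(L):
        onehot[i, j * A + alphabet.index(msa[i, j])] = 1.0

# Off-diagonal L x L blocks of the covariance matrix carry the coevolution
# signal; summarize each block by its Frobenius norm.
cov = np.cov(onehot, rowvar=False).reshape(L, A, L, A)
coupling = np.linalg.norm(cov, axis=(1, 3))
print(np.round(coupling, 2))  # positions 0 and 2 couple strongly
```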
Richard Jones@RichardALJones·
Biggest contribution of AI to science so far: AlphaFold, which has solved the problem of predicting protein 3d structure from 1d sequence. But there are other problems of protein folding: understanding mechanisms & pathways, misfolding, & the role of disordered proteins: my blogpost ..
Sergey Ovchinnikov retweeted
Matt DeJong@DejongMatt·
First preprint of the @fordycelab and @Dunn_Lab collaboration! We used high-throughput microfluidics for sequence-strength mapping at the single-molecule level. Our new tech allowed us to discover a fundamental nonequilibrium property of multivalent systems. 1/13
Sergey Ovchinnikov retweeted
Kevin K. Yang 楊凱筌@KevinKaichuang·
We made FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns.
Sergey Ovchinnikov@sokrypton·
@DdelAlamo @ReplicaTricks We actually tried to do something like this before as a reply to a reviewer: (It's an extremely expensive analysis, but it gives back a similar result to our cheaper categorical Jacobian method) x.com/sokrypton/stat…
Sergey Ovchinnikov@sokrypton

A few updates: We compare the categorical jacobian to explicitly computing pseudo-likelihood for all single & double mutations, allowing one to compute epistasis via ΔE(double) - ΔE(single) - ΔE(single), as proposed by @JeannefaustineT et al. We see strong correlations. (1/5)

Sergey Ovchinnikov@sokrypton·
@DdelAlamo @ReplicaTricks I'm guessing the real question is how to properly compute: pl(double mut a,b)? I agree, double masking of a & b would NOT give you this. You need to mutate both positions, then mask each position one-by-one to compute the pseudo-likelihood at the masked position & add these up.
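Written as code, that recipe is: install both mutations, then mask position a and position b one at a time against the mutated background, and sum the masked log-probabilities. A sketch; `masked_logprob` is a hypothetical stand-in for a real masked protein language model (here it returns a deterministic fake value so the example runs end to end):

```python
import hashlib

def masked_logprob(seq, pos):
    # Hypothetical stand-in for a masked protein LM: returns
    # log p(seq[pos] | seq with position `pos` masked).
    # A deterministic fake score so the sketch is runnable.
    h = hashlib.md5(f"{seq}|{pos}".encode()).digest()
    return -(h[0] / 255.0)

def pseudo_ll(seq, positions):
    # Pseudo-log-likelihood at `positions`: mask ONE position at a time,
    # with the rest of the sequence (mutations included) left visible.
    # Double-masking a and b together would NOT give pl(double mutant).
    return sum(masked_logprob(seq, p) for p in positions)

def mutate(seq, muts):
    s = list(seq)
    for pos, aa in muts:
        s[pos] = aa
    return "".join(s)

def epistasis(wt, a, aa_a, b, aa_b):
    # Epistasis = dE(double) - dE(single a) - dE(single b), every dE scored
    # as a pseudo-likelihood change over the SAME positions {a, b}.
    pos = (a, b)
    base = pseudo_ll(wt, pos)
    dE_a = pseudo_ll(mutate(wt, [(a, aa_a)]), pos) - base
    dE_b = pseudo_ll(mutate(wt, [(b, aa_b)]), pos) - base
    dE_ab = pseudo_ll(mutate(wt, [(a, aa_a), (b, aa_b)]), pos) - base
    return dE_ab - dE_a - dE_b

print(epistasis("MKTAYIAKQR", a=2, aa_a="W", b=7, aa_b="F"))
```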
Diego del Alamo@DdelAlamo·
I have a concern with this paper and I want someone who knows more than me to confirm if it is founded or not. The title makes a pretty specific claim about epistasis predictions, but the method does not seem sound for masked LMs (1/3)
Biology+AI Daily@BiologyAIDaily

Beyond additivity: zero-shot methods cannot predict impact of epistasis on protein properties and function

1. The study reveals a critical blind spot in modern protein AI: while 95 state-of-the-art zero-shot models can predict single mutations well, they systematically fail when mutations interact epistatically—where the combined effect of mutations deviates from simple additivity.

2. Using 53 MAVE datasets from ProteinGym, the researchers identified epistatic genotypes by comparing observed effects against expected additive effects, accounting for experimental error (a toy version of this test follows after the thread). For GFP fluorescence and protein thermostability, epistasis is widespread and biologically genuine, not a measurement artifact.

3. The performance gap is stark. Top models like ESCOTT, PoET, and MSA-Transformer achieve Spearman correlations above 0.6 for all genotypes, but collapse to near-zero or negative correlations for epistatic genotypes. Simple linear regression baselines often match or exceed complex deep learning models on epistatic combinations.

4. This exposes a fundamental limitation: protein language models learn evolutionary plausibility from natural sequences, but natural selection only explores functional sequence space. Epistatic combinations—often traversing fitness valleys—lie outside this training distribution, leaving models blind to higher-order mutational interactions.

5. The work highlights that clever feature engineering (evolutionary conservation, structural information) outperforms architectural complexity for epistasis prediction. Yet even structure-aware models like ProSST and ESM-IF1, while top performers on stability, show no consistent advantage across datasets.

6. The implications are profound for protein design and directed evolution. Current zero-shot methods cannot reliably navigate rugged fitness landscapes or predict functional variants along evolutionary paths requiring epistatic mutations. The field urgently needs models trained on multi-mutational data and architectures explicitly modeling non-linear interactions.

💻 Code: github.com/kalininalab/ep…
📜 Paper: biorxiv.org/content/10.648…

#ProteinEngineering #Epistasis #MachineLearning #ProteinGym #VariantEffectPrediction #ComputationalBiology #Bioinformatics #ProteinEvolution #AIforScience #StructuralBiology
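The test in point 2, flagging a genotype as epistatic when its observed effect departs from the sum of its single-mutant effects by more than experimental error, is simple to state in code. The numbers and the 2-sigma cutoff below are invented for illustration, not taken from the paper:

```python
# Illustrative MAVE-style effect sizes, not data from the paper.
single_effects = {"A10V": -0.4, "K25E": -0.7}
double = {"muts": ["A10V", "K25E"], "observed": -2.1, "sem": 0.15}

expected = sum(single_effects[m] for m in double["muts"])  # additive model
deviation = double["observed"] - expected                  # epistasis term

# Flag the genotype as epistatic only when the deviation exceeds the
# experimental error; the 2-sigma criterion is an assumed cutoff.
is_epistatic = abs(deviation) > 2 * double["sem"]
print(f"expected {expected:+.2f}, observed {double['observed']:+.2f}, "
      f"epistatic: {is_epistatic}")
```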

Sergey Ovchinnikov retweeted
Boston Protein Design and Modeling Club
We've got @jeffreyjgray coming to present this month!! Wednesday, February 25th 2026, starting at 7pm EST in Room 181, Building 68, @MIT "Antibody language models vs. biology; protein docking denoising diffusion models vs. physics" bpdmc.org
Bo Wang@BoWang87·
We submitted two cover proposals for our LUMI-lab paper in @CellCellPress. Neither made the cut 😅 But here's a challenge: one was designed by a human artist. One was generated by AI in minutes. Can you tell which is which? 👇
Bo Wang@BoWang87

Welcome to the Lab of the Future! 🧬🤖 Excited to share LUMI-lab, out today in @CellCellPress — a self-driving platform that pairs an AI foundation model with a robotic lab to autonomously discover ionizable lipids for LNP-based mRNA delivery.

The core problem: Designing lipid nanoparticles (LNPs) is hard. The chemical space of ionizable lipids is vast, experimental cycles are slow, and — critically — historical LNP datasets are far too small to train a predictive model from scratch. Most AI approaches in this space hit a wall immediately: not enough data to learn from.

Our solution: lab-in-the-loop foundation model learning. Instead of training on LNP data alone, LUMI starts as a transformer-based foundation model pretrained across broad chemical space, building rich molecular representations before it ever sees a single LNP experiment. Then it enters a closed loop with a robotic synthesis platform: predict → synthesize → assay → update (a toy sketch of this loop follows below). Each round of real wet-lab experiments fine-tunes the model, which then proposes smarter candidates for the next round. The lab isn't just validating AI predictions — it's actively teaching the model, continuously.

What happened when we let it run: LUMI-lab autonomously synthesized and screened 1,700+ ionizable lipids in human bronchial epithelial cells. The top candidate — LUMI-6 — features a brominated lipid tail, a structural motif that had been largely overlooked in LNP design. LUMI found it without being told where to look. When formulated into LNPs and delivered intratracheally to mice, LUMI-6 achieved 20.3% gene editing efficiency in lung epithelial cells — a compelling result for one of the hardest-to-reach therapeutic targets, directly relevant to diseases like cystic fibrosis and alpha-1 antitrypsin deficiency.

Why this matters beyond LNPs: This is a proof of concept for a broader thesis — that foundation model pretraining + active learning + robotic experimentation can overcome the data scarcity bottleneck that plagues AI-driven discovery in biology. You don't need a massive domain-specific dataset to start. You need a model that can generalize, a lab that can generate the right data, and a loop that connects them.

Huge congratulations to first authors Yue Xu, @HAOTIANCUI1, and Kuan Pang, and to the entire @BowenLi_Lab team. Grateful to our collaborators at @UHN and @UofTPharmacy, and to Princess Margaret Cancer Centre Research @PMResearch_UHN.

📄 Paper: cell.com/cell/fulltext/…
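The closed loop (predict → synthesize → assay → update) is the architecturally interesting part, and its shape survives even in a toy Python sketch. Everything below is a stand-in: a 1-D "chemical space" for real lipid chemistry, a nearest-neighbor lookup for the pretrained foundation model, and a noisy quadratic for the robotic wet lab; only the loop structure mirrors the thread's description:

```python
import random
random.seed(0)

# Toy chemical space: each candidate lipid is a single number; the hidden
# assay response peaks at 0.7 (stand-in for synthesis + screening).
pool = [i / 1000 for i in range(1000)]

def assay(x):
    return -(x - 0.7) ** 2 + random.gauss(0, 0.01)

data = []  # (candidate, measured activity) pairs collected so far

def predict(x):
    # Toy surrogate: value of the nearest already-assayed candidate.
    # Stand-in for the pretrained foundation model fine-tuned each round.
    if not data:
        return 0.0
    return min(data, key=lambda d: abs(d[0] - x))[1]

tested = set()
for rnd in range(5):
    untested = [x for x in pool if x not in tested]
    # predict -> pick a batch (greedy picks plus random exploration) ...
    ranked = sorted(untested, key=predict, reverse=True)
    batch = set(ranked[:10]) | set(random.sample(untested, 10))
    # ... synthesize & assay -> update the surrogate with the new results.
    for x in batch:
        data.append((x, assay(x)))
        tested.add(x)
    best = max(data, key=lambda d: d[1])
    print(f"round {rnd}: best candidate {best[0]:.3f} -> {best[1]:.3f}")
```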

Sergey Ovchinnikov retweeted
Ajasja 💻🧬🔬@AjasjaLjubetic·
Want to learn more about PyRosetta, Prosculpt, or making OVO plugins? You are in luck! The recordings from the last two sessions of the protein design workflows meetings are up! youtube.com/watch?v=3Vm12X…
Sergey Ovchinnikov retweeted
Romero lab@romerolab1·
AlphaFold 3 is a game-changer for biomolecular modeling, but the CPU-bound MSA bottleneck is a major hurdle for high-throughput discovery. Today, Romero lab introduces AlphaFast: our new framework that delivers a 22.8x speedup in AF3 inference on a single GPU. 🚀 1/5