Sergey Ovchinnikov

3.8K posts

@sokrypton

Scientist, Assistant Professor @MITBiology, #FirstGen, ProteinBERTologist, 🇺🇦 No Human is illegal. Moving to: https://t.co/sow6IRD3jj

Cambridge, MA · Joined December 2014
3.7K Following · 17.8K Followers
Pinned Tweet
Sergey Ovchinnikov (@sokrypton)
I'm excited to share that I'll be joining @MITBiology as an Asst Prof. in Jan 2024! Come join us! 🤓🧪🖥️🧬
169
148
2K
216.4K
Sergey Ovchinnikov retweeted
Silvi Rouskin (@silvirouskin)
New preprint from the lab! In short, nobody told the model what a stem-loop was. We gave it 50,000 IRES sequences and one job: predict masked nucleotides. That’s it! Albatross is an RNA language model that taught itself the structural logic of viral RNA the same way LLMs learn grammar: from sequence alone. No base pairing rules, no thermodynamics, no structure labels. Check it out: biorxiv.org/content/10.648… and albatrossrna.org
6
26
192
29.8K
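A minimal sketch of the masked-nucleotide setup the post above describes: hide some positions in an RNA sequence and keep the originals as prediction targets. The 15% mask rate, the `<mask>` token, and the function name are assumptions borrowed from common BERT-style practice, not details from the preprint.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the preprint


def mask_nucleotides(seq, mask_rate=0.15, seed=None):
    """Hide a random subset of positions and return (tokens, targets).

    tokens  : list of nucleotides with masked positions replaced by MASK
    targets : dict mapping each masked index to the original nucleotide
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)
    # Always mask at least one position so there is something to predict.
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = rng.sample(range(len(seq)), n_mask)
    tokens = list(seq)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        tokens[i] = MASK
    return tokens, targets


# Toy sequence for illustration only.
tokens, targets = mask_nucleotides("GGGAAACUUCGGUUUCCC", seed=0)
```

A model trained on this objective sees only `tokens` and is scored on how well it recovers `targets`; no structure labels enter the data.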
Dmitry Rybin (@DmitryRybin1)
Some time ago mathematicians proposed that the first thing we should share with alien intelligence is this image:
[image]
137
50
1.4K
400K
Corey Howe (@design_proteins)
New binder scoring metric just dropped: absolute stability prediction with seq-based (ESM3dG) and structure-based (SaProtdG) PLMs. Adding stability prediction boosts ipSAE performance in discriminating binders from non-binders. Congrats @ChoYehlin!

Yehlin Cho (@ChoYehlin)
🚀 Excited to share our new work: Absolute Stability Predictor! 📊: forms.gle/4ZnXZSnTBvaykk… Built the MGnify Stability Dataset (1.8M+ measurements) and developed stability prediction models, together with @grocklin, @KotaroTsuboyama, @sokrypton, and teams.

3
15
116
10.9K
Sergey Ovchinnikov retweeted
Yehlin Cho (@ChoYehlin)
We measured stability for 1.8M diverse protein domains (60–80 aa) from the MGnify metagenomic database, spanning 200k+ sequence families, and created the MGnify Stability Dataset.
1
12
54
4.3K
Sergey Ovchinnikov (@sokrypton)
@janekm @mbeisen Yeah... That wasn't part of the prompt... 😅 Not sure why it (Gemini) added all the "edible" disclaimers; I just asked it to make a slide comparing the sizes of rice and fly eggs.
0
0
1
172
Chris Hayduk (@ChrisHayduk)
GPT 5.5 is an effective autoresearcher in structural biology! I've had goal mode running for over 150 hours straight, looking for topologically inspired architectural changes to improve the performance of AlphaFold2. Performance is strong and improving!
44
135
1.4K
131.7K
Sergey Ovchinnikov (@sokrypton)
@ChrisHayduk Pretty cool! Though be careful... I tried something like this with Claude Code a while back, and after it had run for a couple of days, I woke up to an RMSD of zero! That's when I realized it had started feeding the correct answer to the model as input 😅
2
0
26
2K
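The failure mode in the reply above (a predicted structure matching the reference exactly) can be caught with a cheap sanity check. The sketch below computes RMSD between two coordinate sets and flags a near-zero value as a possible leak. It assumes both structures are already superimposed and have atoms in the same order; the function names and the tolerance are illustrative, not from the thread.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def looks_leaked(predicted, reference, tol=1e-6):
    """Flag a prediction that matches the reference to within tol (hypothetical threshold)."""
    return rmsd(predicted, reference) < tol


# Toy coordinates: the prediction is the reference shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
```

A flagged result is only a prompt to inspect the pipeline's inputs; it does not prove that the target coordinates were passed in.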
Sergey Ovchinnikov retweeted
Protein Data Bank (@PDBeurope)
PDBe-SIFTS is now open source 🎉 Developed in collaboration with @PDBeurope and @uniprot, it enables fast, accurate residue-level mapping between protein sequences and structures, achieving >93% agreement with curated mappings. Get started: github.com/PDBeurope/SIFTS
4
46
192
26.4K
Julian Englert (@julian_englert)
We just made an app that walks you through designing a novel protein with AI from scratch. Takes about 5 minutes, requires zero biology knowledge. ➡️ design-a-protein.com The best part: we will actually synthesize 1000 of those protein designs in the lab and test their real-world function as a therapeutic.
31
183
971
122.5K
Sergey Ovchinnikov (@sokrypton)
@shae_mcl You also need to factor in the cost of sequence databases: isolate genomes (e.g. UniProt) and metagenomic databases (e.g. MGnify, JGI). And the billions of years of natural selection needed to produce these 🙃.
0
0
6
680
Shae McLaughlin (@shae_mcl)
It’s estimated that the Protein Data Bank (PDB) cost around $13B to create. AlphaFold was only possible because of it. If we want ML to solve biology, we should be funding the creation of databases and the development of new assay technologies. ML is nothing without data.
40
176
1.3K
156.6K
Sergey Ovchinnikov (@sokrypton)
@nlarusstone @jboysen0 Sidenote: Pre-AlphaFold, it was thought you needed ~1000 diverse sequences (no two more than 90% identical to each other) around the sequence whose structure you wanted to compute (with simple linear algebra). Post-AlphaFold (and DL), that dropped to ~100.
0
0
2
62
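The "no two sequences more than 90% identical" criterion in the reply above can be sketched as a greedy redundancy filter over an alignment. This is a toy illustration on pre-aligned, equal-length sequences; the function names and the greedy order are assumptions, and real pipelines typically rely on dedicated clustering tools.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_redundant(seqs, max_identity=0.9):
    """Greedily keep sequences at most max_identity identical to every kept sequence."""
    kept = []
    for s in seqs:
        if all(identity(s, k) <= max_identity for k in kept):
            kept.append(s)
    return kept


# Toy alignment: the second entry duplicates the first and is dropped.
msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIVV", "MNPQRSTVWY"]
diverse = filter_redundant(msa)
n_diverse = len(diverse)
```

Comparing `n_diverse` against the rough ~1000 and ~100 figures from the thread gives a quick sense of whether an alignment is deep enough.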
Sergey Ovchinnikov (@sokrypton)
@nlarusstone @jboysen0 If anything, it might be more interesting to quantify the cost of resequencing the UniProt and metagenomic (MGnify/JGI) DBs at a 90% clustering threshold. That being said, if someone solves a structure where sequence databases lack diversity, structure will definitely help (2/2).
1
0
0
43
Sergey Ovchinnikov retweeted
Yeqing Lin (@lin_yeqing)
Introducing Genie 3, a generative protein model that substantially advances the state-of-the-art for binder design, increasing in silico success rates by up to 20x on hard multimeric targets. It also debuts a form of inference-time scaling unobserved in other design models. 🧵1/8
8
110
435
73.6K