GAMA Miguel Angel 🐦‍⬛🔑

4.4K posts

@miangoar

Biologist who navigates the oceans of diversity through space-time | MSc in Biochem/Bioinfo @ibt_unam 🇲🇽 | Protein evo, metagenomics & AI/ML/DL

Mexico · Joined November 2020
1.2K Following · 1.9K Followers
Pinned Tweet
GAMA Miguel Angel 🐦‍⬛🔑
1/12🧵Do you want to learn how to design proteins using AI but don’t know anything about bio? I created a free 10-lesson course on YouTube. It’s now available in Spanish (original) and English (autodubbing w/Kokoro 82M). Here’s an overview of the topics covered in each lecture :)
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Andrew White 🐦‍⬛ @andrewwhite01
hallucinated references will land you a 1-year ban from arxiv now. wow
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Natasha Malpani 👁 @natashamalpani
isomorphic labs just raised $2.1 billion. no human patient has been dosed yet. the first clinical trials are expected end of 2026.
zasocitinib just cleared Phase 3. the first AI-designed drug to do it. zasocitinib was discovered by nimbus therapeutics using @schrodinger's physics-based simulation platform. the model understood molecular interactions from first principles.
it validates one specific thesis: AI works in biology when the search space is large, the functional assay is clear, and you can close the loop between model and experiment.
@IsomorphicLabs’ bet is that this same first-principles approach scales across the entire drug discovery stack. multiple therapeutic areas. multiple modalities. from structure prediction all the way to the clinic.
zasocitinib is the closest validation point this thesis has ever had. it is also a reminder that one layer working does not mean the whole stack works. the 90% Phase 2/3 failure rate has not moved. the virtual cell is unproven. the hardest layers are still ahead. biology does not digitise uniformly.
zasocitinib is the first proof. isomorphic is the next wager.
GAMA Miguel Angel 🐦‍⬛🔑
@danofer That’s interesting. Weird in what sense? For example, something like showing a double descent behavior when training a PLM? Or weird in the sense that a smaller PLM shows better performance than its larger version?
Dan Ofer (Was @ICML,@Worldcon )
@miangoar BTW, I've added a pile of related benchmarks recently and it smells like something worth a paper on its own in terms of the behavior being... weird. (I need extra hours in the night first). Happy to chat!
Danny Williams-Jones @DannyWilliamsJ2
@miangoar Okay so turns out it's already clustered, 901k genomes are represented by 199,923 species clusters, so they only provide assemblies and FAAs for those guys. I've added taxonomy headers to the FASTAs and will upload to Zenodo, separated by phylum
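A minimal sketch of the step described in this post (adding taxonomy to FASTA headers and splitting the records by phylum). The header layout (`phylum=… species=…`), the function names, and the `taxonomy` mapping are all hypothetical; the post does not say how the headers are actually formatted or which tool was used.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_phylum(fasta_by_genome, taxonomy):
    """Annotate each record with taxonomy and group the records by phylum.

    fasta_by_genome: genome_id -> FASTA text (hypothetical input layout)
    taxonomy:        genome_id -> (phylum, species) (hypothetical mapping)
    Returns phylum -> FASTA text with taxonomy appended to every header.
    """
    grouped = defaultdict(list)
    for genome_id, text in fasta_by_genome.items():
        phylum, species = taxonomy[genome_id]
        for header, seq in parse_fasta(text):
            grouped[phylum].append(
                f">{header} phylum={phylum} species={species}\n{seq}\n"
            )
    return {phylum: "".join(records) for phylum, records in grouped.items()}
```

Each value of the returned dictionary could then be written to its own file, one per phylum, before upload.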
sk @compchemm
@MartinPacesa @miangoar The math is already done and it’s pretty brutal. 3-helical > 4-helical > everything else.
GAMA Miguel Angel 🐦‍⬛🔑
Here I briefly discussed a few notable works on the interpretability of protein language models (ESM2, ESMFold). Although I find this type of research very interesting, statements like these make me think a lot, even more so coming from people like Mohammed youtu.be/PvMNlxZv_Bg?si…
Romain Lacombe@rlacombe

“In the long run, interpretability is hopeless. Physics being reducible to equations was an exception, there is no reason for the world to be comprehensible to our puny brains.” – @MoAlQuraishi #ICLR2024

GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Noelia Ferruz @ferruz_noelia
Generative AI models have transformed protein design, but have they helped us understand proteins better? In our perspective, we review XAI for protein language models and ask how we can open these black boxes and extract scientific knowledge from them. Read our manuscript now out at @NatMachIntell!: nature.com/articles/s4225… (with Twitter-less Andrea Hunklinger :))
GAMA Miguel Angel 🐦‍⬛🔑
“entire PDB archive is conservatively estimated at ∼US$20B, assuming an average cost of ∼US$100K for regenerating each experimental structure” academic.oup.com/nar/article/51… For Uniprot, the annual economic value is estimated between €332M - €524M ebi.ac.uk/about/news/ann…
Shae McLaughlin@shae_mcl

It’s estimated that the Protein Data Bank (PDB) cost around $13B to create. Alphafold was only possible because of it. If we want ML to solve biology, we should be funding the creation of databases and the development of new assay technologies. ML is nothing without data.
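As a quick check on the ~US$20B replacement-cost figure quoted in the reply above: dividing it by the assumed ~US$100K per structure gives the number of structures the estimate implies. The function name is illustrative, and the result is only what the two quoted figures imply, not an official count of PDB holdings.

```python
def implied_structure_count(total_usd: int, cost_per_structure_usd: int) -> int:
    """Number of structures implied by a total cost and a per-structure cost."""
    return total_usd // cost_per_structure_usd


# Figures taken from the quoted estimate: ~US$20B total, ~US$100K per structure.
TOTAL_USD = 20_000_000_000
COST_PER_STRUCTURE_USD = 100_000

print(implied_structure_count(TOTAL_USD, COST_PER_STRUCTURE_USD))  # 200000
```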

GAMA Miguel Angel 🐦‍⬛🔑
Correction: strictly speaking, that plot does not directly show the effect of scaling. However, the behavior remains consistent across epochs. E.g., here you can see the effect on zero-shot fitness prediction as a function of the number of parameters x.com/miangoar/statu…
GAMA Miguel Angel 🐦‍⬛🔑@miangoar

1/2 For zero-shot fitness prediction, protein language models hit a wall around Spearman correlation values of ~0.5. But this isn’t because the models are trash; it’s because the protein universe is highly biased and PLMs overfit to the most abundant protein families.
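For readers unfamiliar with the metric: zero-shot fitness prediction is usually scored by rank-correlating model scores with measured fitness values. Below is a minimal, dependency-free sketch of the Spearman correlation (Pearson correlation of average ranks). The scores and fitness values are made-up toy numbers, not output from any PLM or real assay.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: hypothetical model scores per variant and hypothetical measured fitness.
model_scores = [-2.1, -0.3, -1.5, -0.9, -3.0]
measured_fitness = [0.2, 0.9, 0.8, 0.4, 0.1]
rho = spearman(model_scores, measured_fitness)
```

Only the ordering of variants matters here, which is why the metric suits models whose scores are not on the same scale as the assay.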

GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Leonardo V. Castorina @DrLeucine
Finally published in Bioinformatics "From Atoms to Fragments: A Coarse Representation for Efficient and Functional Protein Design" doi.org/10.1093/bioinf… This is just the beginning :)
GAMA Miguel Angel 🐦‍⬛🔑
Friendly reminder: if you want to read a paper from the David Baker Lab but it is behind a paywall, you can go to the Publications tab on their official website, where most papers are available as PDFs. I think this is a great practice for open science. bakerlab.org/publications/
Kevin K. Yang 楊凱筌 @KevinKaichuang
I don't wanna Schmidhuber, but it does feel bad when somebody publishes a paper titled "Protein FID..." 2.5 years after your preprint introducing a "Frechet Protein distance" without any acknowledgment
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
James Evans @profjamesevans
Out today in @ScienceMagazine: with the amazing Haochuan Cui, Yiling Lin, & @LingfeiWu, we analyzed 3.6 million scientists publishing 1960–2020. The findings reshape a century-old debate about age and scientific creativity.
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Mohammed AlQuraishi @MoAlQuraishi
Equivariance is dead😢 Or is it?😈 Genie 3 is out! Our latest protein design model yields SoTA results for binder design & motif scaffolding, greatly improving on BindCraft & Proteina-Complexa It does so using all-atom SE(3)-equivariance on a branched polymer representation👇
Yeqing Lin@lin_yeqing

Introducing Genie 3, a generative protein model that substantially advances the state-of-the-art for binder design, increasing in silico success rates by up to 20x on hard multimeric targets. It also debuts a form of inference-time scaling unobserved in other design models. 🧵1/8
