GAMA Miguel Angel 🐦‍⬛🔑

4.4K posts

@miangoar

Biologist who navigates the oceans of diversity through space-time | MSc in Biochem/Bioinfo @ibt_unam 🇲🇽 | Protein evo, metagenomics & AI/ML/DL

Mexico · Joined November 2020

1.2K Following · 1.9K Followers

Pinned Tweet

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·20 Apr

1/12🧵Do you want to learn how to design proteins using AI but don’t know anything about bio? I created a free 10-lesson course on YouTube. It’s now available in Spanish (original) and English (autodubbing w/Kokoro 82M). Here’s an overview of the topics covered in each lecture :)

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Andrew White 🐦‍⬛@andrewwhite01·23h

hallucinated references will land you a 1-year ban from arxiv now. wow

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Natasha Malpani 👁@natashamalpani·2d

isomorphic labs just raised $2.1 billion. no human patient has been dosed yet. the first clinical trials are expected end of 2026. zasocitinib just cleared Phase 3. the first AI-designed drug to do it. zasocitinib was discovered by nimbus therapeutics using @schrodinger's physics-based simulation platform. the model understood molecular interactions from first principles. it validates one specific thesis: AI works in biology when the search space is large, the functional assay is clear, and you can close the loop between model and experiment. @IsomorphicLabs’ bet is that this same first-principles approach scales across the entire drug discovery stack. multiple therapeutic areas. multiple modalities. from structure prediction all the way to the clinic. zasocitinib is the closest validation point this thesis has ever had. it is also a reminder that one layer working does not mean the whole stack works. the 90% Phase 2/3 failure rate has not moved. the virtual cell is unproven. the hardest layers are still ahead. biology does not digitise uniformly. zasocitinib is the first proof. isomorphic is the next wager.

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·1d

@danofer That’s interesting. Weird in what sense? For example, something like showing a double descent behavior when training a PLM? Or weird in the sense that a smaller PLM shows better performance than its larger version?

Dan Ofer (Was @ICML,@Worldcon )@danofer·5d

@miangoar BTW, I've added a pile of related benchmarks recently and it smells like something worth a paper on its own in terms of the behavior being... weird. (I need extra hours in the night first). Happy to chat!

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·5d

I think this plot from the ProteinBERT paper clearly illustrates this. But if you want to see a more detailed analysis showing how scaling appears to improve only structure-related tasks, check out this other paper. Feature Reuse and Scaling x.com/KevinKaichuang…

multimodali@multimodali

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·1d

@DannyWilliamsJ2 Thank you so much again Danny :)

Danny Williams-Jones@DannyWilliamsJ2·1 May

@miangoar Okay so turns out it's already clustered, 901k genomes are represented by 199,923 species clusters, so they only provide assemblies and FAAs for those guys. I've added taxonomy headers to the FASTAs and will upload to Zenodo, separated by phylum
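As a rough illustration of the step described above (adding taxonomy headers to FASTA files and separating them by phylum), here is a minimal Python sketch. It is not Danny's actual script: the function names, the assumption that the first token of each header is a key into a taxonomy lookup, and the toy IDs are all hypothetical. The only convention taken as given is that GTDB taxonomy strings are semicolon-separated ranks with prefixes such as `d__` and `p__`.

```python
def annotate_fasta(lines, taxonomy):
    """Yield FASTA lines with a taxonomy string appended to each header.

    `taxonomy` maps the first token of a header (hypothetical key choice)
    to a GTDB-style taxonomy string.
    """
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            key = line[1:].split()[0]
            yield f"{line} {taxonomy.get(key, 'unclassified')}"
        else:
            yield line


def phylum_of(tax):
    """Return the phylum name from a GTDB-style string, or 'unknown'."""
    for rank in tax.split(";"):
        if rank.startswith("p__"):
            return rank[3:]
    return "unknown"


# Toy input; the IDs and sequences are made up for illustration.
toy_taxonomy = {"protA": "d__Bacteria;p__Bacillota"}
toy_fasta = [">protA example protein", "MKVLAA"]

for out_line in annotate_fasta(toy_fasta, toy_taxonomy):
    print(out_line)
print(phylum_of(toy_taxonomy["protA"]))
```

Grouping by `phylum_of(...)` before writing would give one output file per phylum; how the real files are keyed and named is not stated in the thread.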

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·29 Apr

The Genome Taxonomy Database (GTDB R11-RS232) has been updated and now contains ~900k microbial genomes representing ~200k species! 🔥 I don’t know how many proteins this corresponds to, but the gtdb_proteins_aa_reps file is around 123 GB! That’s a lot! forum.gtdb.ecogenomic.org/t/announcing-g…

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·1d

@compchemm @MartinPacesa This looks very interesting. What paper or preprint are these plots from?

sk@compchemm·28 Apr

@MartinPacesa @miangoar The math is already done and it’s pretty brutal. 3-helical > 4-helical > everything else.

Martin Pacesa@MartinPacesa·27 Apr

[Media-only post; no text]

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·1d

Here I briefly discussed a few notable works on the interpretability of protein language models (ESM2 and ESMFold). Although I find this type of research very interesting, statements like these make me think a lot, even more so coming from people like Mohammed youtu.be/PvMNlxZv_Bg?si…

Romain Lacombe@rlacombe

“In the long run, interpretability is hopeless. Physics being reducible to equations was an exception, there is no reason for the world to be comprehensible to our puny brains.” – @MoAlQuraishi #ICLR2024

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·1d

Is this the largest funding ever raised by a biotech startup? AFAIK, Xaira Therapeutics, co-founded by David Baker, raised $1B USD.

Isomorphic Labs@IsomorphicLabs

Today we announced our $2.1 Billion funding. This is the catalyst that will bring us to a future of medicine with the power of AI. And we are just getting started, come join our mission.

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Noelia Ferruz@ferruz_noelia·2d

Generative AI models have transformed protein design, but have they helped us understand proteins better? In our perspective, we review XAI for protein language models and ask how we can open these black boxes and extract scientific knowledge from them. Read our manuscript now out at @NatMachIntell!: nature.com/articles/s4225… (with Twitter-less Andrea Hunklinger :))

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Xinming Tu@TuXinming·24 Apr

After our posts on the AlphaGenome PyTorch port + JAX/Haiku finetuning, a lot of people asked the same two things: "how much GPU memory do I need?" and "how long will each step take?" So we (with @abuen_dia, @anshulkundaje, @sara_mostafavi) ran the numbers. 🧬⚙️ 👉 genomicsxai.github.io/blogs/2026-005/

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·5d

@celulacecedista international taxes 🤪

Celula Cedista@celulacecedista·5d

@miangoar UNIPROT has an income stream??? Of what?

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·5d

“entire PDB archive is conservatively estimated at ∼US$20B, assuming an average cost of ∼US$100K for regenerating each experimental structure” academic.oup.com/nar/article/51… For UniProt, the annual economic value is estimated at between €332M and €524M ebi.ac.uk/about/news/ann…

Shae McLaughlin@shae_mcl

It’s estimated that the Protein Data Bank (PDB) cost around $13B to create. Alphafold was only possible because of it. If we want ML to solve biology, we should be funding the creation of databases and the development of new assay technologies. ML is nothing without data.
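The ~US$20B figure quoted above rests on a simple multiplication, so it can be sanity-checked by dividing it back out. The short Python sketch below uses only the two numbers stated in the quoted sentence; the function name is hypothetical, and the result is the structure count implied by that estimate, not an official PDB statistic.

```python
# Back-of-the-envelope check of the quoted PDB replacement-cost estimate.
TOTAL_ESTIMATE_USD = 20e9        # ~US$20B, from the quoted sentence
COST_PER_STRUCTURE_USD = 100e3   # ~US$100K per structure, from the quoted sentence


def implied_structure_count(total_usd, per_structure_usd):
    """Number of structures implied by total cost / cost per structure."""
    return total_usd / per_structure_usd


n_structures = implied_structure_count(TOTAL_ESTIMATE_USD, COST_PER_STRUCTURE_USD)
print(f"Implied number of experimental structures: {n_structures:,.0f}")
```

In other words, the estimate corresponds to roughly 200,000 experimental structures at the assumed average cost.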

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Raktim Mitra@Raktim7879·28 Apr

On a side note, @andrew_favor and I have extended RFdiffusion3 to enable nucleic acid codesign. Code and weights are out at github.com/RosettaCommons…

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·5d

Corrigendum: strictly speaking, that plot does not directly show the effect of scaling. However, the behavior remains consistent across epochs. E.g., here you can see the effect on zero-shot fitness prediction as a function of the number of parameters x.com/miangoar/statu…

GAMA Miguel Angel 🐦‍⬛🔑@miangoar

1/2 For zero-shot fitness prediction, protein language models hit a wall around Spearman correlation values of ~0.5. But this isn’t because the models are trash; it’s because the protein universe is highly biased and PLMs overfit to the most abundant protein families.
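For readers unfamiliar with the metric mentioned above, the following is a minimal pure-Python sketch of how a Spearman correlation between model scores and measured fitness could be computed. It is an illustration only: the function names and the toy numbers are hypothetical, and a real benchmark would typically use a library routine such as `scipy.stats.spearmanr` on actual PLM scores and assay data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy data, made up for illustration: zero-shot scores for five variants
# (e.g. log-likelihood ratios) and their measured fitness values.
model_scores = [-2.1, -0.3, -1.5, -0.9, -3.0]
measured_fitness = [0.2, 0.9, 0.8, 0.4, 0.1]

print(round(spearman(model_scores, measured_fitness), 3))
```

Because only ranks matter, the metric rewards a model for ordering variants correctly even if its raw scores are on a different scale from the assay.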

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Leonardo V. Castorina@DrLeucine·6d

Finally published in Bioinformatics: "From Atoms to Fragments: A Coarse Representation for Efficient and Functional Protein Design" doi.org/10.1093/bioinf… This is just the beginning :)

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·6d

Friendly reminder: if you want to read a paper from the David Baker Lab but it is behind a paywall, you can go to the Publications tab on their official website, where most papers are available as PDFs. I think this is a great practice for open science. bakerlab.org/publications/

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·6d

@KevinKaichuang I suggest using a new term for this phenomenon: "Yangified"

Kevin K. Yang 楊凱筌@KevinKaichuang·30 Apr

I don't wanna Schmidhuber, but it does feel bad when somebody publishes a paper titled "Protein FID..." 2.5 years after your preprint introducing a "Frechet Protein distance" without any acknowledgment

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

James Evans@profjamesevans·8 May

Out today in @ScienceMagazine: with the amazing Haochuan Cui, Yiling Lin, & @LingfeiWu, we analyzed 3.6 million scientists publishing 1960–2020. The findings reshape a century-old debate about age and scientific creativity.

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Mohammed AlQuraishi@MoAlQuraishi·8 May

Equivariance is dead😢 Or is it?😈 Genie 3 is out! Our latest protein design model yields SoTA results for binder design & motif scaffolding, greatly improving on BindCraft & Proteina-Complexa It does so using all-atom SE(3)-equivariance on a branched polymer representation👇

Yeqing Lin@lin_yeqing

Introducing Genie 3, a generative protein model that substantially advances the state-of-the-art for binder design, increasing in silico success rates by up to 20x on hard multimeric targets. It also debuts a form of inference-time scaling unobserved in other design models. 🧵1/8

GAMA Miguel Angel 🐦‍⬛🔑 retweeted

Harris Wang@harriswangnyc·1 May

1/ Excited to share our new paper in Science: “Toward life with a 19-amino acid alphabet through generative artificial intelligence design.” @ColumbiaSysBio @ColumbiaBME @Columbia science.org/doi/10.1126/sc… 🦠🧬🛠️🖥️💥
