Grant Rotskoff

53 posts

@grantrotskoff

Building molecular intelligence @ Stanford

San Francisco · Joined September 2025
126 Following · 395 Followers
Grant Rotskoff @grantrotskoff
The design platform is built using @modal for the backend, so generate IDRs until I run out of API credits $$ @charles_irl
2 replies · 0 reposts · 4 likes · 335 views
Grant Rotskoff retweeted
Michael Albergo @msalbergo
New paper! Presenting Discrete Flow Maps.

Paper: arxiv.org/abs/2604.09784
Blog: malbergo.me/discrete-flow-…

A laughable problem for me these days is that @nmboffi and I share a research brain, and we have had, time and again, a conversation that ends with "ha so I guess we're writing the same paper." Soon we will return to just doing it together :). Here we are doing it again with discrete flow maps and flow language models!

A complete and thorough paper led by @PPotaptchik @json_yim @adhisarav @peholderrieth. We took a bit of time to post it to ensure we understood a few more things about the stability of the loss functions.

Like @osclsd, @FEijkelboom, and @nmboffi, we think this could be a very helpful paradigm for thinking about fast inference and even better alignment!

Here's our version of the story, and I hope it makes clear how green-field this research direction is: we provide a comprehensive picture of the KL losses you can write from the properties of the flow map, some nice geometric proofs about the mean denoiser and the simplex, and find that at this time, the ESD can actually be the most performant, with some caveats. Excited for everyone to work together and push this class of models to their limit!
10 replies · 57 reposts · 383 likes · 71K views
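A minimal sketch of the generic flow-map idea this thread builds on, shown in the continuous setting with a squared-error objective; the paper's discrete-state construction differs, and the names here (`f_theta`, `flow_map_consistency_loss`) are illustrative, not the authors' code. A flow map f(x, s, t) jumps a sample from time s directly to time t, and the semigroup property says two short jumps must compose into one long jump:

```python
import torch

def flow_map_consistency_loss(f_theta, x, s, u, t):
    """Semigroup consistency for a learned flow map f_theta(x, s, t) ~ X_t | X_s = x.
    Jumping s -> t in one call should agree with jumping s -> u, then u -> t."""
    one_jump = f_theta(x, s, t)
    two_jumps = f_theta(f_theta(x, s, u), u, t)
    # Stop gradients through the two-jump "teacher" branch, as in
    # consistency-style distillation.
    return ((one_jump - two_jumps.detach()) ** 2).mean()
```

At inference, a single call f_theta(x, 0, 1) replaces the many small integration steps of a standard flow or diffusion sampler, which is where the fast-inference claim comes from.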
Grant Rotskoff retweeted
Michael Albergo @msalbergo
Every day I feel like this app inches closer and closer to just full AI slop
Jorge Bravo Abad @bravo_abad

Flow matching is emerging as a unifying framework for generative biology.

Biology is full of mappings between states: a healthy cell turning diseased, amino acids folding into a functional protein, a ligand docking into its target. Deriving such transformations analytically is intractable, which is where generative AI steps in, and flow matching is quickly becoming its backbone.

Morehead and coauthors review how flow matching (FM) is reshaping generative modeling in bioinformatics. Unlike diffusion models, FM doesn't force the source distribution to be Gaussian: it learns a time-dependent vector field that transports samples between any two distributions along straight-line, optimal-transport paths. The payoff: fewer inference steps, simulation-free training, and built-in support for geometric priors like SE(3) equivariance, which is essential for 3D biomolecules.

What's striking is how fast FM has spread across biological scales. For molecules, FoldFlow, FrameFlow, and Multiflow generate protein backbones on SE(3)ᴺ manifolds, SemlaFlow produces valid small molecules up to 100× faster than diffusion, and Dirichlet FM handles discrete DNA/RNA sequences. FlowDock and NeuralPLexer3 predict protein–ligand complexes that match or exceed AlphaFold 3 on key benchmarks, while AlphaFlow and MDGen generate conformational ensembles and MD trajectories. At the cellular scale, CellFlow and Meta FM map unperturbed populations to perturbed states, and CryoFM and FlowSDF extend FM to cryo-EM and microscopy.

The deeper point: FM subsumes diffusion models, continuous normalizing flows, and optimal transport as special cases, providing scaffolding for an AI-based virtual cell that simulates molecular, structural, and phenotypic effects of perturbations across scales.

Overall, this signals a shift in what's computationally tractable. Instead of narrow, stage-specific models, FM points to unified conditional generators that design sequences, predict complexes, and model perturbation responses in one framework, shortening wet-lab cycles and making closed-loop, active-learning workflows practical.

Paper: Morehead and coauthors, Nature Machine Intelligence (2026), doi.org/10.1038/s42256…

5 replies · 4 reposts · 160 likes · 42.1K views
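Since the quoted post leans on what flow matching actually optimizes, here is a minimal sketch of one conditional flow-matching training step with straight-line paths; `v_theta`, the tensor shapes, and the pairing of `x0` with `x1` are illustrative assumptions, not details from the Morehead review.

```python
import torch

def flow_matching_loss(v_theta, x0, x1):
    """x0 ~ source distribution (need not be Gaussian), x1 ~ data.
    Regress a time-dependent velocity field onto the straight-line target."""
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)))  # t ~ U(0,1), broadcastable
    xt = (1.0 - t) * x0 + t * x1  # linear interpolant between the paired samples
    target = x1 - x0              # constant velocity along the straight path
    return ((v_theta(xt, t) - target) ** 2).mean()
```

Sampling then integrates dx/dt = v_theta(x, t) from t = 0 to t = 1 starting from a source draw; the straighter the learned paths, the fewer integration steps are needed, which is the "fewer inference steps" payoff the post describes.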
Grant Rotskoff retweeted
Peter Holderrieth @peholderrieth
We release Diamond Maps 💎, unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this!

Accurate guidance has been a notoriously hard problem, but in this work, we're bringing TWO (!) solutions to the table. The recipe for success:

1️⃣ Speed: use distilled models (flow maps, mean flows, consistency models).
2️⃣ Exploration: inject stochasticity to properly explore your search space.

Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond.

Paper: arxiv.org/abs/2602.05993
Code: github.com/PeterHolderrie…

Huge thanks to an amazing team: Douglas Chen, @LucaEyring, @ishin_shah, Giri Anantharaman, @electronickale, @zeynepakata, Tommi Jaakkola, @nmboffi, and @max_simchowitz. It was awesome bringing this to life together!
2 replies · 43 reposts · 243 likes · 56K views
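To make the two-ingredient recipe concrete, here is a hedged sketch of the generic pattern it describes: a distilled one-step model for cheap proposals plus injected noise for exploration. This is a restart-style loop under my own assumptions (`one_step_model`, `reward`), not the Diamond Maps algorithm from the paper.

```python
import torch

def noisy_restart_search(one_step_model, reward, z, n_restarts=4, sigma=0.5):
    """one_step_model: a distilled flow map / consistency model mapping a
    (re)noised input straight to a clean sample in one evaluation.
    reward: scalar score used to pick among stochastic candidates."""
    best = one_step_model(z)
    for _ in range(n_restarts):
        noised = best + sigma * torch.randn_like(best)  # inject stochasticity
        candidate = one_step_model(noised)              # cheap one-step re-denoise
        if reward(candidate) > reward(best):            # keep what the reward prefers
            best = candidate
    return best
```

The distilled model is what makes the repeated re-denoising affordable; the injected noise is what keeps the search from collapsing onto a single mode.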
Grant Rotskoff @grantrotskoff
Super excited to work with @SoojungYang2 and @FutureHouseSF on this ambitious project!
Sam Rodriques @SGRodriques

Soojung Yang @SoojungYang2 previously created approaches to identify rare protein conformational transitions and, in collaboration with Microsoft Research, to efficiently sample equilibrium ensembles at scale. As a FutureHouse Fellow with Grant Rotskoff @grantrotskoff, she will build machine learning models that unify protein structure, thermodynamics, and kinetics, and deploy agentic AI to search variant space and enable biochemistry-informed protein optimization.

2 replies · 2 reposts · 37 likes · 6.6K views
Grant Rotskoff retweeted
Nicholas Boffi @nmboffi
@lowerbad really nice work! but are you sure it's the first?
1 reply · 1 repost · 35 likes · 1.1K views
Grant Rotskoff retweeted
Biology+AI Daily @BiologyAIDaily
Generative design of intrinsically disordered protein regions with IDiom

1. The paper introduces IDiom, a 122M-parameter autoregressive (decoder-only) protein language model trained specifically for intrinsically disordered regions (IDRs), aiming to make rational design possible in a regime where structure-based generative methods do not apply.

2. Key technical idea: fill-in-the-middle training for proteins, with explicit special tokens that separate the N-terminal context, the C-terminal context, and the IDR span (the literal token names did not survive extraction; a hypothetical formatting sketch follows this post). This enables conditional generation of an IDR that fits into a chosen structured protein context, not just unconditional sampling.

3. Training data scale: 37 million IDRs curated from AlphaFold DB v4 using low pLDDT as a disorder proxy (plus filtering and clustering at 90% identity). They augment to 74 million sequences by also creating "context-deleted" records to train unprompted generation of fully disordered proteins (IDPs).

4. Generated sequences are diverse yet IDR-like: maximum identity to the training IDR set broadly peaks around ~60% (not memorized), length distributions match natural IDRs (mostly <100 aa with a tail to ~300 aa), and amino-acid composition recapitulates known disorder biases (e.g., enriched Pro/Ser; depleted bulky hydrophobics and aromatics vs folded CATH domains).

5. Disorder is maintained by structure-prediction checks: ColabFold/AlphaFold pLDDT distributions for generated sequences closely resemble curated AFDB IDRs and experimentally validated DisProt IDRs, both for standalone IDPs and for generated IDRs evaluated within full-protein context.

6. IDiom learns "IDR grammar," not just composition: generations reproduce natural distributions of (i) fraction of charged residues (FCR), (ii) charge patterning/blockiness (κ), (iii) hydropathy patterning (SHD), and (iv) low complexity (SEG). These metrics separate generated IDRs from folded CATH domains and align them with DisProt statistics.

7. Conditioning matters: DisProt-context-prompted generations are consistently closer to DisProt IDRs than unprompted IDPs across multiple metrics (quantified via Wasserstein-1 distances), supporting in-context learning of context-appropriate IDR features.

8. Case study (NPM1): when prompted with NPM1 flanks, IDiom generates many low-identity IDRs that still reproduce the functional charge-block architecture (κ near WT; alternating NCPR blocks), suggesting it can preserve biophysically relevant patterning without copying sequence.

9. Post-training via reinforcement learning: the authors steer IDiom with GRPO (with the DAPO modification) using ProtGPS as a reward model for subcellular localization (nucleolus, chromosomes/chromatin, P-bodies, stress granules). Regularization includes KL-to-base-model, target entropy (to avoid collapse), and target length.

10. RL-induced features are biologically interpretable while staying disordered: nucleolus-targeting sequences become Lys/Arg-rich and show higher κ; chromosome-targeting sequences become Ser/Thr-rich and show strong enrichment of ELM PTM motifs; P-body- and stress-granule-targeting sequences enrich RNA-interaction motifs (RG/RGG, F/YGG, SYG). Importantly, generated sequences remain low-pLDDT, indicating the policy does not drift toward folded-domain priors.

💻 Code: github.com/rotskoff-group…
📜 Paper: biorxiv.org/content/10.648…

#ComputationalBiology #ProteinDesign #IntrinsicallyDisorderedProteins #ProteinLanguageModels #Transformers #ReinforcementLearning #PhaseSeparation #SubcellularLocalization #SyntheticBiology
1 reply · 4 reposts · 11 likes · 1.5K views
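Point 2 above is easiest to see with a formatting sketch. The post's literal special tokens were lost in extraction, so `<N_CTX>`, `<C_CTX>`, and `<IDR>` below are hypothetical placeholders, not IDiom's actual vocabulary:

```python
def make_fim_example(n_context: str, idr: str, c_context: str) -> str:
    """Fill-in-the-middle layout: the model reads both flanks first, then
    learns to generate the disordered span conditioned on them."""
    return f"<N_CTX>{n_context}<C_CTX>{c_context}<IDR>{idr}"

# At inference, prompt with "<N_CTX>{n}<C_CTX>{c}<IDR>" and sample until an
# end-of-sequence marker to obtain an IDR tailored to the structured context.
# The sequences below are illustrative fragments, not real proteins.
print(make_fim_example("MKTAYIAK...", "GSGSPSSP...", "...KDELEHH"))
```

Moving the middle span to the end is what lets a plain left-to-right decoder condition on both flanks at once.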
Grant Rotskoff retweeted
Nicholas Boffi @nmboffi
🤯 big update to our flow map language models paper! we believe this is the future of non-autoregressive text generation.

read about it in the blog: one-step-lm.github.io/blog/
full details in the paper: arxiv.org/abs/2602.16813

we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation. we beat all discrete diffusion baselines at ~8x speed!

v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, Lagrangian, Eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion, but without the factorization error that kills discrete methods at few steps.

beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale. we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time, and it works even at 1-2 steps with our flow map model.

simple, well-understood techniques from continuous flows just work incredibly well in practice for language. we're extremely excited about the future of this class of models. stay tuned for results on scaling, reasoning, and reinforcement-learning-based fine-tuning. 🚀
13 replies · 90 reposts · 472 likes · 72.5K views
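For the autoguidance number in the post, here is a minimal sketch of the underlying trick: guiding a strong model by extrapolating away from a deliberately weakened one. The velocity-field framing and the names `v_strong`/`v_weak` are my assumptions; the flow-map and text-specific machinery from the paper is omitted.

```python
def autoguided_velocity(v_strong, v_weak, x, t, w=1.5):
    """Autoguidance: extrapolate from the weak model toward (and past) the
    strong one; w = 1 recovers the strong model, w > 1 sharpens samples."""
    weak = v_weak(x, t)
    return weak + w * (v_strong(x, t) - weak)
```

Because the guided quantity is a continuous velocity, this kind of linear extrapolation is well defined at every step, which is the "straightforward steering" advantage the post claims over discrete baselines.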
Grant Rotskoff retweeted
vitrupo @vitrupo
Chris Manning says Yann LeCun sees language as a low bandwidth communication channel compared to vision. But the gap between a chimp and a human wasn’t produced by superior eyes. What took off for humans was language. Not just for communication, but as a cognitive tool.
81 replies · 126 reposts · 1.1K likes · 117.1K views
Grant Rotskoff retweeted
Soojung Yang @SoojungYang2
AI for Science workshop @ ICML is happening in Seoul 🇰🇷 this summer! Expect 🔥 discussion on AI Scientists. We’re looking for area chairs and reviewers. Please consider signing up! We welcome contributions from all areas of AI for Science (not limited to AI scientists).
AI for Science @AI_for_Science

😊 We would like to invite researchers to join our 7th AI for Science workshop @icmlconf as reviewers or ACs. Thank you all for your support of AI for Science. Hope to meet more people in Seoul this summer! AC: docs.google.com/forms/d/e/1FAI… Reviewer: docs.google.com/forms/d/e/1FAI…

0 replies · 9 reposts · 99 likes · 17.8K views
Grant Rotskoff retweeted
Kresten Lindorff-Larsen @LindorffLarsen
We are hiring a postdoc in computational biophysics and machine learning studies of intrinsically disordered proteins. We aim to study the function of IDPs by combining coarse-grained MD, ML, and bioinformatics, in collaboration with Tanja Mittag and Rasmus Hartmann-Petersen.
1 reply · 32 reposts · 147 likes · 10K views