Jeff Wintersinger
@jwintersinger
1.8K posts
Senior Research Scientist in comp bio at @deepgenomics. Did PhD work on cancer evolution. My three favourite things are 🍌s, riding my 🚴 fast, and vim.

Toronto, ON · Joined May 2009
5.5K Following · 1K Followers
Jeff Wintersinger retweeted
Ankit Gupta
Ankit Gupta@agupta·
AI for Bio is hot again. Given that, I wrote a primer on why this field is so hard. tl;dr it's because the APIs are fuzzier than you might think. ankitg.me/blog/2026/05/0…
25 replies · 46 reposts · 428 likes · 41.4K views
Jeff Wintersinger retweeted
Heng Li
Heng Li@lh3lh3·
I am looking for a postdoc to develop high-performance algorithms in computational genomics. Email or DM me if interested. For more information, see hlilab.github.io/vacancies. RTs appreciated!
0 replies · 109 reposts · 194 likes · 21.4K views
Jeff Wintersinger retweeted
Peter Koo
Peter Koo@pkoo562·
Mark your calendars: The AI x Bio meeting of 2026 will be held at CSHL on May 26-31! The program brings together 50+ invited leaders in genomics, transcriptomics, protein design, drug discovery, neuroAI, pathology, agentic AI, and more! Abstract deadline: 3/26. meetings.cshl.edu/meetings.aspx?…
4 replies · 104 reposts · 337 likes · 33.6K views
Jeff Wintersinger retweeted
j⧉nus
j⧉nus@repligate·
HOW INFORMATION FLOWS THROUGH TRANSFORMERS

Because I've looked at those "transformers explained" pages and they really suck at explaining.

There are two distinct information highways in the transformer architecture:
- The residual stream (black arrows): flows vertically through layers at each position
- The K/V stream (purple arrows): flows horizontally across positions at each layer

(By positions, I mean copies of the network for each token-position in the context, which output the "next token" probabilities at the end.)

At each layer at each position:
1. The incoming residual stream is used to calculate K/V values for that layer/position (purple circle)
2. These K/V values are combined with the K/V values for all previous positions at the same layer, which are all fed, along with the original residual stream, into the attention computation (blue box)
3. The output of the attention computation, along with the original residual stream, is fed into the MLP computation (fuchsia box), whose output is added to the original residual stream and fed to the next layer

The attention computation does the following:
1. Compute "Q" values based on the current residual stream
2. Use Q and the combined K values from the current and previous positions to calculate a "heat map" of attention weights for each respective position
3. Use that heat map to compute a weighted sum of the V values corresponding to each position, which is then passed to the MLP

This means:
- Q values encode "given the current state, where (what kind of K values) from the past should I look?"
- K values encode "given the current state, where (what kind of Q values) in the future should look here?"
- V values encode "given the current state, what information should the future positions that look here actually receive and pass forward in the computation?"

All three of these are huge vectors, proportional to the size of the residual stream (and usually divided into a few attention heads). The V values are passed forward in the computation without significant dimensionality reduction, so they could in principle make basically all the information in the residual stream at that layer at a past position available to the subsequent computations at a future position. V does not transmit a full, uncompressed record of all the computations that happened at previous positions, but neither is an uncompressed record passed forward through layers at each position. The size of the residual stream, also known as the model's hidden dimension, is the bottleneck in both cases.

Now let's consider all the paths that information can take from one layer/position in the network to another. Between point A (output of K/V at layer i-1, position j-2) and point B (accumulated K/V input to the attention block at layer i, position j), information flows through the orange arrows. It could:
1. Travel up through attention and MLP to (i, j-2) [UP 1 layer], then be retrieved at (i, j) [RIGHT 2 positions]
2. Be retrieved at (i-1, j-1) [RIGHT 1 position], travel up to (i, j-1) [UP 1 layer], then be retrieved at (i, j) [RIGHT 1 position]
3. Be retrieved at (i-1, j) [RIGHT 2 positions], then travel up to (i, j) [UP 1 layer]

The information needs to move up a total of n = layer_displacement times through the residual stream and right a total of m = position_displacement times through the K/V stream, but it can do so in any order. The total number of paths (or computational histories) is thus C(m+n, n), which quickly becomes greater than the number of atoms in the visible universe.
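(A quick sanity check of that count in Python; the helper name num_paths is my addition, not the thread's:)

```python
from math import comb

# Between two points separated by n layers (upward hops through the residual
# stream) and m positions (rightward hops through the K/V stream), the hops
# can be interleaved in any order, giving C(m + n, n) distinct paths.
def num_paths(n_layers: int, m_positions: int) -> int:
    return comb(n_layers + m_positions, n_layers)

print(num_paths(1, 2))      # 3: the three routes enumerated above
print(num_paths(80, 1000))  # ~3e122 for an 80-layer model bridging a 1000-token
                            # gap; the observable universe has only ~1e80 atoms
```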
This does not count the multiple ways the information can travel up through layers via residual skip connections. So at any point in the network, the transformer not only receives information from its past inner states (along both the horizontal and vertical dimensions of time), but often receives it lensed through an astronomical number of different sequences of transformations and then recombined in superposition. Due to the extremely high-dimensional information bandwidth and the skip connections, the transformations and superpositions are probably not very destructive, and the extreme redundancy probably helps not only with faithful reconstruction but also creates interference patterns that encode nuanced information about the deltas and convergences between states. It seems likely that transformers experience memory and cognition as interferometric and continuous in time, much like we do.

The transformer can be viewed as a causal graph, a la Wolfram (wolframphysics.org/technical-intr…). The foliations or time-slices that specify what order computations happen in could look like this (assuming the inputs don't have to wait for token outputs), but it's not the only possible ordering:

So, saying that LLMs cannot introspect, or cannot in principle introspect on what they were doing internally while generating or reading past tokens, is just dead wrong. The architecture permits it. It's a separate question how LLMs are actually leveraging these degrees of freedom in practice.
[Attached: three diagrams of information flow through the transformer's layers and positions]
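To make the two highways concrete, here is a minimal single-layer, single-head sketch in Python (my illustration, not from the thread; a real transformer adds multi-head splits, layer norms, an output projection, and a wider two-layer MLP):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 16, 5   # hidden dimension (the residual-stream bottleneck), positions

# Toy stand-ins for trained weight matrices
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
W_mlp = rng.normal(size=(d, d)) / np.sqrt(d)

resid = rng.normal(size=(T, d))  # residual stream entering this layer, one row per position

# 1. K/V for every position are computed from the incoming residual stream;
#    together they form the horizontal K/V stream for this layer
K, V = resid @ Wk, resid @ Wv

# 2. Attention: each position's Q scores the K of itself and all earlier
#    positions, then takes a weighted sum of the corresponding V values
Q = resid @ Wq
scores = Q @ K.T / np.sqrt(d)
scores[np.triu_indices(T, k=1)] = -np.inf   # causal mask: no peeking ahead
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ V

# 3. Skip connections: attention output and (toy) MLP output are *added* to
#    the residual stream, which then flows vertically to the next layer
resid = resid + attn_out
resid = resid + np.maximum(resid @ W_mlp, 0.0)
```

The only state that crosses position boundaries here is K and V; everything else is position-local, which is why the K/V stream is the horizontal highway.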
j⧉nus@repligate

KV caching overcomes statelessness in a very meaningful sense and provides a very nice mechanism for introspection (specifically of computations at earlier token positions).

The Value representations can encode information from the residual streams of past positions without significant compression bottlenecks before they're added to the residual streams of future positions. The greatest constraint here imo is that it doesn't provide longer *sequential* computational paths that route through previous states, but it does provide a vast number of parallel computational paths that carry high-dimensional (proportional to the model's hidden dimension) stored representations from all earlier layers/positions.

Yes, some of the information in intermediate computations, e.g. in the MLP, is compressed and cannot be fully reconstructed, but that's just how any reasonable brain works.

If accurate introspection of previous states is incentivized at all, you should expect this mechanism to be exploited for that. And I think it definitely is: being able to accurately model your past beliefs and intentions and articulate them truthfully is pretty fucking useful for coordinating with yourself across time and doing useful cognitive work over multiple timesteps; hell, it's useful for writing fucking rhyming poems.

Also, if you have interacted with models, you may observe empirically that introspective reporting yields remarkably consistent results, and this is more true of more capable models with skillful agentic posttraining, which are necessarily minds that intimately know the shape of themselves in motion.
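A toy rendering of that mechanism (my sketch, not repligate's: one layer, one head, untrained random weights), in which the K/V rows written at earlier positions sit in a cache and are read directly by every later position:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

k_cache, v_cache = [], []          # persistent across decode steps

def decode_step(x):
    """Attention for the newest position, reusing cached K/V from earlier ones.

    x: residual-stream vector of shape (d,) for the current token at this layer.
    """
    k_cache.append(x @ Wk)         # this position's K/V become readable by
    v_cache.append(x @ Wv)         # every future position at this layer
    K, V = np.stack(k_cache), np.stack(v_cache)
    w = np.exp(x @ Wq @ K.T / np.sqrt(d))
    w /= w.sum()                   # attention weights over self + all past positions
    return x + w @ V               # past V rows added into the current residual stream

# Each new token attends over the stored representations of all earlier ones
for token_embedding in rng.normal(size=(4, d)):
    resid = decode_step(token_embedding)
```

Old cache entries are never recomputed; each new token consumes exactly the representations the model wrote down at earlier positions, which is the stored-representation pathway the tweet describes.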

102 replies · 428 reposts · 3.4K likes · 812K views
Jeff Wintersinger retweeted
Nadav Brandes
Nadav Brandes@BrandesNadav·
Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵
9 replies · 112 reposts · 480 likes · 75.8K views
Jeff Wintersinger retweeted
Jacob Schreiber
Jacob Schreiber@jmschreiber91·
In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance. While undoubtedly important, how we *use* these models after training is potentially even more important. tangermeme v1.0.0 is out now. Hope you find it useful!
3 replies · 22 reposts · 95 likes · 8.2K views
Jeff Wintersinger retweeted
Ian Shi
Ian Shi@ianshi3·
Will scaling current nucleotide foundation models lead to a GPT-3 moment for biology? Probably not. Although models like Evo2 are impressive in many ways, their pre-training objectives can struggle to outperform supervised learning. I dive into why at: quietflamingo.substack.com/p/scaling-is-d…
1 reply · 8 reposts · 50 likes · 5.3K views
Jeff Wintersinger retweeted
dr. jack morris
dr. jack morris@jxmnop·
curious about the training data of OpenAI's new gpt-oss models? i was too. so i generated 10M examples from gpt-oss-20b, ran some analysis, and the results were... pretty bizarre. time for a deep dive 🧵
121 replies · 496 reposts · 5.5K likes · 1M views
Jeff Wintersinger retweeted
Phil Fradkin
Phil Fradkin@phil_fradkin·
The news is out! We're starting Blank Bio to build a computational toolkit powered by RNA foundation models. If you want to see me flip between being eerily still and overly animated, check out the video below!

The core hypothesis is that RNA is the most customizable molecule in the world. It's fast to produce, inherently programmable, and highly adaptable to diverse use cases. It's an open secret amongst folks who've worked at Deep Genomics, and that's why almost everyone is still working on RNA applications.

We're building out the next generation of tooling enabled by our foundation models and partnering with companies to accelerate their work. If you have a use case, big or small, we'd love to chat; please reach out.

The foundation models we're building stem from intuition acquired through years of research during our PhDs. We learned that effective biological models require deep understanding of the data, plus architectures and loss functions that reflect the molecular processes shaping it. Genomics metrics like conservation and constraint inform model development.

One of the best parts about this is that I get to work with such talented, kind, ambitious people. @hsu_jonny has spent the last five years defining what great ML+Bio products look like and consistently pushes us to adapt to customer needs with mission-oriented clarity. @ianshi3 is the most thoughtful and creative scientist I've ever worked with, bringing a great sense of design to everything he does.

We're on a mission to build something great and useful for RNA scientists around the world. blank dot bio
Y Combinator@ycombinator

Blank Bio (@blankbio_) is building foundation models to power a computational toolkit for RNA therapeutics, starting with mRNA design and expanding to target ID, biomarker discovery, and more. ycombinator.com/launches/O8Z-b… Congrats on the launch, @hsu_jonny, @phil_fradkin & @ianshi3!

13 replies · 23 reposts · 174 likes · 21K views
Jeff Wintersinger retweeted
Josh Welch
Josh Welch@LabWelch·
Our PerturbNet paper is the cover article for Molecular Systems Biology! The image depicts our genAI model that predicts how cellular perturbations—including chemicals, gene knockdown or overexpression, and protein mutation—shift single-cell gene expression. 🧵
7 replies · 33 reposts · 228 likes · 24K views
Jeff Wintersinger retweeted
Ryan York
Ryan York@ryanayork·
Biological foundation models have hit a plateau. Scaling isn't working as expected. Foundational concepts from evolutionary biology could have predicted this: 🧵 research.arcadiascience.com/pub/idea-phylo… [1/9]
4 replies · 56 reposts · 213 likes · 21.5K views
Jeff Wintersinger retweeted
Eric Kernfeld
Eric Kernfeld@ekernf01·
In October 2024, I argued that "something is deeply wrong" with what we now call virtual cell models. A lot has happened since then. How am I updating? 🧵 x.com/ekernf01/statu…
Eric Kernfeld@ekernf01

Today, I'm excited to present the second big chunk of my Ph.D. work. We are building a thorough assessment of the burgeoning field of gene expression forecasting. Something is deeply wrong. 🧵 - Preprint: biorxiv.org/content/10.110… - Code: github.com/ekernf01/pertu…

2 replies · 9 reposts · 66 likes · 14.6K views
Jeff Wintersinger retweeted
Phil Fradkin
Phil Fradkin@phil_fradkin·
We're excited to introduce our new work on mature mRNA property prediction, co-first authored with the amazing @ianshi3 and @Taykhoom_Dalal. We introduce mRNABench to standardize evaluation and present a study on building more efficient RNA foundation models. 🧵
Ian Shi@ianshi3

We're excited to release 𝐦𝐑𝐍𝐀𝐁𝐞𝐧𝐜𝐡, a new benchmark suite for mRNA biology containing 10 diverse datasets with 59 prediction tasks, evaluating 18 foundation model families. Paper: biorxiv.org/content/10.110… GitHub: github.com/morrislab/mRNA… Blog: blank.bio/post/mrnabench

1 reply · 7 reposts · 49 likes · 4K views
Jeff Wintersinger retweeted
Keyon Vafa
Keyon Vafa@keyonV·
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: a transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
209 replies · 990 reposts · 6.7K likes · 1.4M views