Courtney Shearer
@c_sheare · 217 posts

Visiting at GDM Genomics · Harvard PhD SSQB · @DeboraMarksLab · Turtles all the way down

Cambridge, MA · Joined October 2019
1.1K Following · 321 Followers
Courtney Shearer reposted
Sasha Gusev @SashaGusevPosts
Another Claude project: a static site that pulls in GWAS SNP data from Ensembl, multiple public biobanks, Open Targets, GTEx, the eQTL Catalogue, and OMIM.
[image]
5 replies · 27 reposts · 159 likes · 10.8K views
Courtney Shearer reposted
Jorge Bravo Abad @bravo_abad
Mapping 3.4 billion gene circuit designs with AI

Designing synthetic gene circuits is like tuning a complex instrument in the dark. You have dozens of genetic parts—promoters, transcription factors, binding motifs—that must work together precisely, yet each combination behaves unpredictably due to context-dependent molecular interactions. Traditional approaches test circuits one at a time, making optimization painfully slow.

Kshitij Rai and coauthors just changed the rules of the game. Their platform, CLASSIC (Combining Long- And Short-range Sequencing to Investigate genetic Complexity), combines Nanopore and Illumina sequencing to profile over 100,000 multi-kilobase gene circuit designs in a single experiment—then uses machine learning to predict the behavior of billions more.

The workflow is elegant: pooled DNA assembly with barcodes, long-read sequencing to index composition-to-barcode mappings, phenotypic sorting in human cells, and short-read sequencing to link barcodes to function. The result? Quantitative expression data for 121,000 single-input circuits and 128,000 dual-input circuits, used to train neural networks that predict circuit behavior with r² values of 0.86–0.90.

The insights are remarkable. High-fold-change circuits don't emerge from a single "optimal" design but from multiple balanced combinations of medium-activity components. AND-gate logic requires clustered transcription factor binding sites; OR-gates need interspersed patterns. These rules were invisible before—now they're learnable from data.

The message: by scaling the design-build-test-learn cycle by orders of magnitude and combining it with ML, we can finally navigate genetic design spaces too vast for human intuition, accelerating everything from metabolic engineering to cell therapies.

Paper: nature.com/articles/s4158…
[image]
4 replies · 58 reposts · 288 likes · 28.3K views
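The barcode-joining step of a CLASSIC-style pooled screen (long reads map each barcode to a circuit composition, short reads count barcodes per expression-sorted bin) can be sketched as a simple join. All barcodes, part names, bin values, and counts below are invented for illustration; they are not data from the paper.

```python
# Hypothetical sketch of the barcode join in a pooled sort-seq screen.

# Long-read sequencing: barcode -> circuit composition
barcode_to_design = {
    "BC001": ("pMinCMV", "ZF-VP64", "3x_motif"),
    "BC002": ("pMinCMV", "ZF-KRAB", "3x_motif"),
    "BC003": ("pEF1a",   "ZF-VP64", "1x_motif"),
}

# Short-read sequencing after sorting: barcode -> read counts per bin
# (bins ordered from low to high reporter expression)
bin_counts = {
    "BC001": [5, 10, 40, 45],
    "BC002": [70, 20, 8, 2],
    "BC003": [10, 30, 30, 30],
}

BIN_VALUES = [1.0, 2.0, 4.0, 8.0]  # illustrative mean expression per bin

def mean_expression(counts):
    """Read-fraction-weighted mean of the bin expression values."""
    total = sum(counts)
    return sum(c * v for c, v in zip(counts, BIN_VALUES)) / total

# Join the two maps: design -> estimated expression
design_expression = {
    barcode_to_design[bc]: round(mean_expression(counts), 3)
    for bc, counts in bin_counts.items()
}

for design, expr in design_expression.items():
    print(design, expr)
```

At scale, tables like `design_expression` (one row per composition, one column per measured phenotype) are what the neural networks are trained on.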
Courtney Shearer reposted
hardmaru @hardmaru
One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization. We found that if you simply delete them after pretraining and recalibrate for < 1% of the original budget, you unlock massive context windows.
Sakana AI @SakanaAILabs

Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

pub.sakana.ai/DroPE/

We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with long-context fine-tuning.

The core insight of this work challenges a fundamental assumption in Transformer architecture. We discovered that explicit positional embeddings like RoPE are critical for training convergence but eventually become the primary bottleneck preventing models from generalizing to longer sequences. Our solution is radically simple: we treat positional embeddings as a temporary training scaffold rather than a permanent architectural necessity.

Real-world workflows like reviewing massive code diffs or analyzing legal contracts require context windows that break standard pretrained models. While models without positional embeddings (NoPE) generalize better to these unseen lengths, they are notoriously unstable to train from scratch. Here, we achieve the best of both worlds by using embeddings to ensure stability during pretraining and then dropping them to unlock length extrapolation during inference.

Our approach unlocks seamless zero-shot context extension without any expensive long-context training. We demonstrated this on a range of off-the-shelf open-source LLMs. In our tests, recalibrating any model with DroPE requires less than 1% of the original pretraining budget, yet it significantly outperforms established methods on challenging benchmarks like LongBench and RULER.

We have released the code and the full paper to encourage the community to rethink the role of positional encodings in modern LLMs.

Paper: arxiv.org/abs/2512.12167
Code: github.com/SakanaAI/DroPE
49 replies · 241 reposts · 2.5K likes · 345.7K views
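The core intuition (why a model without positional embeddings is not tied to trained sequence lengths) can be shown in a toy numpy sketch: with RoPE the query-key logit changes with the key's position, while without it the logit depends only on content. The RoPE implementation below is a generic textbook version with made-up vectors, not Sakana AI's code.

```python
import numpy as np

def rope_rotate(x, pos, theta=10000.0):
    """Apply a RoPE-style rotation to an even-dimensional vector at position `pos`."""
    half = x.shape[-1] // 2
    freqs = theta ** (-np.arange(half) / half)
    angles = pos * freqs
    x1, x2 = x[:half], x[half:]
    return np.concatenate([
        x1 * np.cos(angles) - x2 * np.sin(angles),
        x1 * np.sin(angles) + x2 * np.cos(angles),
    ])

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# With RoPE, the attention logit varies as the key moves further away ...
logits_rope = [rope_rotate(q, 0) @ rope_rotate(k, p) for p in (1, 100, 10000)]

# ... while without positional embeddings (NoPE) it is position-invariant,
# so no part of the attention math references a trained context length.
logits_nope = [q @ k for _ in (1, 100, 10000)]

print("RoPE logits:", np.round(logits_rope, 3))
print("NoPE logit :", round(float(logits_nope[0]), 3))
```

Position-invariant logits alone do not give a model positional awareness, which is presumably why the recalibration step after dropping the embeddings is needed at all.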
Courtney Shearer reposted
Aviv Spinner @AvivSpinner
Deep learning is cool...but have you tried ✨logistic regression✨? Using just one round of sorting, these models predict affinity boosts and design nanobodies up to ~2500× better. Great work Steffanie, Eddie & team! 🧠🧬 biorxiv.org/content/10.648…
1 reply · 20 reposts · 86 likes · 7.3K views
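The "one round of sorting plus logistic regression" recipe can be caricatured in a few lines of numpy: one-hot encode sequences, fit a logistic model on binary sort labels (kept vs. discarded), then score new variants. The alphabet, sequences, and labels below are invented for illustration; the actual model and data in the preprint differ.

```python
import numpy as np

ALPHABET = "ACDE"  # toy 4-letter alphabet standing in for amino acids

def one_hot(seq):
    """Flatten a sequence into a per-position one-hot feature vector."""
    x = np.zeros(len(seq) * len(ALPHABET))
    for i, aa in enumerate(seq):
        x[i * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return x

# Toy training data: having 'A' at position 0 drives selection
seqs   = ["ACD", "ADE", "CDE", "DCE", "AEC", "EDC"]
labels = np.array([1, 1, 0, 0, 1, 0], dtype=float)
X = np.stack([one_hot(s) for s in seqs])

# Plain logistic regression fit by batch gradient descent
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - labels) / len(labels)

def score(seq):
    """Predicted probability that a variant survives sorting."""
    return float(1.0 / (1.0 + np.exp(-one_hot(seq) @ w)))

print(score("ACE"), score("EDE"))
```

Because the model is linear in the one-hot features, the learned weights double as per-position, per-residue effect estimates, which is part of the appeal over a deep model.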
Courtney Shearer reposted
Alan Amin @AlanNawzadAmin
If you want to do diffusion on discrete data, you have three choices: discrete, Gaussian, or simplicial. How are they related? Which should you use? We theoretically unify all three and train one model to do them all! @AlinaChandra @yucenlily @alex4ali @andrewgwils 1/7
[image]
11 replies · 60 reposts · 396 likes · 60.2K views
Christian Steinmetz @csteinmetz1
I’ve been at NeurIPS this past week. Here are six things I learned:
1. Basically everyone is doing RL
2. And anyone who isn’t doing RL is doing a startup for data (usually for RL)
3. Diffusion LMs are popular enough that you now have to clarify discrete or continuous
4. I joined for the NeurIPS run. Turns out Jeff Dean is a really fast runner
5. AI music isn’t a niche topic anymore, the workshop was standing room only
6. Some people knew Suno, many did not, but a few people showed me full musicals they made with Suno
Can’t wait until the next one.
32 replies · 50 reposts · 917 likes · 78.2K views
Courtney Shearer @c_sheare
Huge paper led by @roseorenbuch! Glad to have contributed. We must find the variants responsible for undiagnosed rare genetic diseases. popEVE could help clinicians diagnose single-variant genetic diseases with life-saving speed and accuracy.
Debora Marks @deboramarks

New paper “Proteome-wide model for human disease genetics” is now live at Nature Genetics: rdcu.be/eRu7K popEVE (pop.evemodel.org) finds the needles in the haystacks of human genetic variation:
0 replies · 0 reposts · 6 likes · 641 views
Courtney Shearer reposted
Debora Marks @deboramarks
Announcing our new protein design server evedesign.bio:
• End-to-end protein design for everyone!
• Analyze your generated library interactively and on 3D structures
• Export codon-optimized DNA sequences for experimental testing.
Developed in collaboration between @deboramarks, @thomas_a_hopf, @SteineggerM, Simon d'Oelsnitz, Chris Sander, Artem Gazizov, @haysunny_hi, Milot Mirdita, Sergio Garcia Busto, Jake Reardon
[GIF]
3 replies · 78 reposts · 376 likes · 38.5K views
Courtney Shearer reposted
Sarah Gurev @sarahgurev
🚨New paper 🚨 Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇 1/12
[image]
1 reply · 39 reposts · 161 likes · 26.1K views
Courtney Shearer reposted
Aviv Spinner @AvivSpinner
1/5 Biological data is noisy, redundant, and ever-growing. 🗣️ In our new paper (first paper of my postdoc!! ⚡️), we track model performance across 14 years of UniRef100 snapshots to ask: how does pLM performance scale with training data?
[image]
1 reply · 23 reposts · 105 likes · 11.5K views
Courtney Shearer reposted
Leo Zang @LeoTZ03
Today marks my last day at @Dyno_Tx. The past three months have been a wonderful first experience in industry, and I’m so grateful for the knowledge I've gained and the support I've received.

I would like to extend a heartfelt thank you to everyone at Dyno, especially the ML Research Team and my manager, Kathy, for their guidance and kindness. I am also deeply appreciative of Dyno's help with my visa situation during this challenging time for international students. I couldn't have imagined a better internship experience.

To anyone in the ML x Bio space, I highly recommend Dyno; they are also actively recruiting for a full-time ML Scientist. Wishing the entire team all the best, and thanks again for everything 🙂
0 replies · 3 reposts · 44 likes · 3.3K views
Courtney Shearer reposted
Alan Amin @AlanNawzadAmin
We can make population genetics studies more powerful by building priors of variant effect size from features like binding. But we’ve been stuck on linear models! We introduce DeepWAS to learn deep priors on millions of variants! #ICML2025 Andres Potapczynski, @andrewgwils 1/7
[image]
1 reply · 5 reposts · 48 likes · 8.1K views