Courtney Shearer
@c_sheare
Visiting at GDM Genomics · Harvard PhD (SSQB), @DeboraMarksLab · Turtles all the way down

Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
pub.sakana.ai/DroPE/

We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with long-context fine-tuning.

The core insight of this work challenges a fundamental assumption in the Transformer architecture: explicit positional embeddings like RoPE are critical for training convergence, but eventually become the primary bottleneck preventing models from generalizing to longer sequences.

Our solution is radically simple: we treat positional embeddings as a temporary training scaffold rather than a permanent architectural necessity.

Real-world workflows like reviewing massive code diffs or analyzing legal contracts require context windows that break standard pretrained models. While models without positional embeddings (NoPE) generalize better to these unseen lengths, they are notoriously unstable to train from scratch. Here, we achieve the best of both worlds: we use embeddings to ensure stability during pretraining, then drop them to unlock length extrapolation during inference.

Our approach unlocks seamless zero-shot context extension without any expensive long-context training, which we demonstrated on a range of off-the-shelf open-source LLMs. In our tests, recalibrating a model with DroPE requires less than 1% of the original pretraining budget, yet it significantly outperforms established methods on challenging benchmarks like LongBench and RULER.

We have released the code and the full paper to encourage the community to rethink the role of positional encodings in modern LLMs.
Paper: arxiv.org/abs/2512.12167
Code: github.com/SakanaAI/DroPE
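
For intuition, here is a minimal sketch of the idea, assuming a standard RoPE decoder layer. This is not the authors' implementation: `apply_rope`, `Attention`, and the `use_rope` flag are illustrative stand-ins. The point is that the same weights run in two modes, with rotary embeddings during pretraining and without them after the drop.

```python
import torch

def apply_rope(x, cos, sin):
    # Standard rotary embedding: rotate channel pairs by a position-dependent angle.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

class Attention(torch.nn.Module):
    def __init__(self, dim, n_heads, use_rope=True):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.proj = torch.nn.Linear(dim, dim)
        self.n_heads = n_heads
        self.use_rope = use_rope  # the "scaffold": on for pretraining, off afterwards

    def forward(self, x, cos, sin):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, -1).transpose(1, 2)
        k = k.view(B, T, self.n_heads, -1).transpose(1, 2)
        v = v.view(B, T, self.n_heads, -1).transpose(1, 2)
        if self.use_rope:  # pretraining mode: positions stabilize convergence
            q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        # With RoPE off, nothing below references position, so sequence length
        # is no longer tied to the rotary frequencies seen during pretraining.
        y = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))

# Same weights, two modes: drop RoPE and run a longer context.
B, T, C, H = 2, 16, 64, 4
theta = 10000 ** (-torch.arange(C // H // 2) / (C // H // 2))
pos = torch.arange(T)[:, None] * theta[None, :]
attn = Attention(C, H)
y_rope = attn(torch.randn(B, T, C), pos.cos(), pos.sin())
attn.use_rope = False                                # DroPE: scaffold removed
y_nope = attn(torch.randn(B, 4 * T, C), None, None)  # longer input, no new tables
```

Since only the position-mixing step is removed and the weights are untouched, the post's claim is that a brief recalibration (under 1% of the pretraining budget) is enough for the model to adapt to the no-position regime.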

🚨 New paper 🚨 Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇 1/12

What if we have a big protein but need a smaller version for delivery or engineering? Introducing SCISOR: we shrink proteins by training a diffusion model to find natural substrings! arXiv: arxiv.org/abs/2511.07390 With @EthanBaron78553, @ruben_weitzman, @deboramarks, @andrewgwils! 1/7
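
As a toy illustration of the "natural substrings" framing only: SCISOR itself trains a diffusion model, but a greedy stand-in makes the objective concrete. `score` below is a hypothetical placeholder for any learned naturalness score.

```python
from typing import Callable

def shrink(seq: str, score: Callable[[str], float], target_len: int) -> str:
    """Greedily delete the residue whose removal hurts the score least."""
    while len(seq) > target_len:
        candidates = [seq[:i] + seq[i + 1:] for i in range(len(seq))]
        seq = max(candidates, key=score)  # keep the best-scoring shorter sequence
    return seq

# Usage with a trivial placeholder score; a real score would come from a model.
small = shrink("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", lambda s: -s.count("Q"), 25)
print(small, len(small))
```

SCISOR's diffusion model presumably replaces this greedy heuristic with a learned generative process over deletions; see the paper for the actual formulation.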

New paper “Proteome-wide model for human disease genetics” is now live at Nature Genetics: rdcu.be/eRu7K popEVE (pop.evemodel.org) finds the needles in the haystacks of human genetic variation:

Super excited about this competition! Bring your own developability predictors or build new ones on our public dataset of 250 antibodies. We're hoping to benchmark the current state-of-the-art models for hydrophobicity, thermostability, and other developability properties 🏆 (1/2)

Best Paper Award Winners: @setlur_amrith, @AlanNawzadAmin, @wanqiao_xu, @EvZisselman

🚨 New paper 🚨 RNA modeling just got its own Gym! 🏋️ Introducing RNAGym, large-scale benchmarks for RNA fitness and structure prediction. 🧵 1/9