Darin Tsui
@darin_tsui
PhD candidate at @GeorgiaTech | ML for bioengineering | @UCSanDiego '23
23 posts · Joined April 2024
89 Following · 81 Followers
Darin Tsui retweeted
Darin Tsui@darin_tsui·
@jatin_n0 Thanks Jatin, appreciate the warm words! Hopefully we get to bump into each other again, and I hope you discover cool things with ProtoMech :)
Jatin Nainani@jatin_n0·
Awesome work and interface! Very curious how they handled output metrics. I feel like I'm going to spend a bunch of time on that UI (though now I need to stop all those CLT runs lol)
Darin Tsui@darin_tsui

🚨Excited to share ProtoMech: a framework for discovering the computational circuits inside protein language models (pLMs)! pLMs like ESM2 are powerful, but the computational mechanisms, or circuits, underlying their predictions remain poorly understood. (1/n)

Darin Tsui@darin_tsui·
@rishabh16_ Thanks a bunch, Rishabh! We have not tried ProtoMech on Progen, although we anticipate that most of our methodology will port over seamlessly. We chose to work with ESM2 mainly because of how ubiquitous it is for computational biologists.
Rishabh Anand@rishabh16_·
@darin_tsui Awesome work! ESM2 is an encoder-style model though ... wondering if you performed any similar analysis of decoder-style models like Progen?
Darin Tsui@darin_tsui·
🚨Excited to share ProtoMech: a framework for discovering the computational circuits inside protein language models (pLMs)! pLMs like ESM2 are powerful, but the computational mechanisms, or circuits, underlying their predictions remain poorly understood. (1/n)
[image]
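The thread doesn't spell out ProtoMech's discovery procedure, but circuit analysis in this literature usually scores model components by ablating them and measuring the effect on the output. A minimal numpy sketch of that idea — the toy network, the head count, and the mean-ablation choice are all illustrative assumptions, not ProtoMech's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-layer "model": y = W2 @ concat(head outputs). "Circuit discovery"
# here means mean-ablating one head at a time and scoring the output change.
W1 = rng.normal(size=(4, 8, 8))   # 4 "heads", each an 8x8 linear map
W2 = rng.normal(size=(1, 4 * 8))

def forward(x, ablate=None, head_means=None):
    outs = []
    for h in range(4):
        o = W1[h] @ x
        if ablate == h:
            o = head_means[h]      # replace with dataset-mean activation
        outs.append(o)
    return float(W2 @ np.concatenate(outs))

X = rng.normal(size=(64, 8))
head_means = [np.mean([W1[h] @ x for x in X], axis=0) for h in range(4)]

x = X[0]
base = forward(x)
effects = {h: abs(base - forward(x, ablate=h, head_means=head_means))
           for h in range(4)}
circuit = sorted(effects, key=effects.get, reverse=True)  # most causal first
print(circuit)
```

Ranking components by causal effect like this is the generic starting point; the interesting part of a framework like ProtoMech is doing it at pLM scale and validating the resulting circuits biologically.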
Darin Tsui@darin_tsui·
ProtoMech circuits also align with biology. Analyzing circuits for kinase activity, NADP+ binding, and GB1 reveals known structural and functional motifs. (4/n)
[two images]
Darin Tsui@darin_tsui·
Can we leverage sparse autoencoders (SAEs) to solve 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝗯𝗶𝗼𝗹𝗼𝗴𝗶𝗰𝗮𝗹 𝗽𝗿𝗼𝗯𝗹𝗲𝗺𝘀? Stop by our #NeurIPS2025 AI4Science workshop poster @ 11:20am on repositioning SAEs for protein function prediction and design! arxiv.org/abs/2508.18567
Darin Tsui@darin_tsui·
🚨NeurIPS spotlight!!! We enable fast, autoregressive protein generation with biologically grounded 𝘀𝗽𝗲𝗰𝘂𝗹𝗮𝘁𝗶𝘃𝗲 𝗱𝗲𝗰𝗼𝗱𝗶𝗻𝗴! Stop by our poster 𝘁𝗼𝗱𝗮𝘆 @ 𝟰:𝟯𝟬𝗽𝗺 (#𝟭𝟲𝟭𝟬)! arxiv.org/abs/2509.21689
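Speculative decoding itself is a standard technique: a cheap draft model proposes several tokens, and the expensive target model verifies them in one pass, accepting each with probability min(1, p/q) and resampling from the residual on rejection. A toy sketch of the generic accept/reject loop — the `draft_probs`/`target_probs` functions are made-up stand-ins, not the paper's biologically grounded drafter:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 4

def draft_probs(ctx):   # cheap proposal model (illustrative stand-in)
    return np.full(VOCAB, 1 / VOCAB)

def target_probs(ctx):  # expensive model whose distribution we want
    p = np.ones(VOCAB)
    p[ctx[-1] % VOCAB] += 2.0
    return p / p.sum()

def speculate(ctx, k=4):
    """Draft k tokens, then accept/reject so output follows the target."""
    drafted, qs, c = [], [], list(ctx)
    for _ in range(k):
        q = draft_probs(c)
        t = int(rng.choice(VOCAB, p=q))
        drafted.append(t); qs.append(q); c.append(t)
    out = list(ctx)
    for t, q in zip(drafted, qs):
        p = target_probs(out)
        if rng.random() < min(1.0, p[t] / q[t]):   # accept draft token
            out.append(t)
        else:                                      # resample from residual
            r = np.maximum(p - q, 0); r /= r.sum()
            out.append(int(rng.choice(VOCAB, p=r)))
            break
    return out

print(speculate([0], k=4))
```

The accept/reject correction is what keeps speculative decoding exact: the output distribution matches the target model while most forward passes are spent verifying a batch of cheap draft tokens.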
Darin Tsui@darin_tsui·
At #NeurIPS? Stop by our poster 𝘁𝗼𝗱𝗮𝘆 @ 𝟰:𝟯𝟬𝗽𝗺 (#𝟭𝟬𝟬𝟵) on explaining genomic and protein sequence models at scale! Excited to meet people and open to fun collaborations!
Darin Tsui@darin_tsui

Excited to announce our paper, SHAP zero, has been accepted into NeurIPS 2025! SHAP zero takes the first steps toward extracting biological insights at scale from machine learning models by amortizing the cost of explanations across large biological datasets. (1/n)

Darin Tsui@darin_tsui·
Excited to be at #NeurIPS this week in San Diego! Please feel free to reach out to me to chat about protein engineering, explainable AI, mechanistic interpretability, and everything in between!
Darin Tsui@darin_tsui·
We then applied SHAP zero to extract epistatic interactions in protein language models. Despite a feature space of more than a trillion interactions, SHAP zero was 7x faster in amortized time and uncovered interactions associated with structural stability. (6/n)
[image]
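SHAP zero's sparse-Fourier amortization is the paper's contribution and isn't reproduced here, but the quantity being amortized — the Shapley value of each sequence position — can be computed exactly for a tiny fitness function by enumerating coalitions. The 3-position toy model and its epistatic term below are illustrative assumptions:

```python
from itertools import combinations
from math import factorial

# Toy "fitness model" over 3 sequence positions: value of a variant given
# which positions carry the mutant residue, including a pairwise
# (epistatic) term between positions 0 and 1.
def f(present):
    v = 0.0
    if 0 in present: v += 1.0
    if 1 in present: v += 0.5
    if {0, 1} <= present: v += 2.0   # epistatic interaction
    return v

def shapley(i, n=3):
    """Exact Shapley value of position i by enumerating all coalitions."""
    players = sorted(set(range(n)) - {i})
    total = 0.0
    for k in range(n):
        for S in combinations(players, k):
            S = set(S)
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += w * (f(S | {i}) - f(S))
    return total

vals = [shapley(i) for i in range(3)]
print(vals)  # interaction credit is split between positions 0 and 1
```

This brute force costs 2^n model evaluations per explanation, which is exactly why amortizing explanations across a large dataset, as SHAP zero does, matters once the feature space reaches trillions of interactions.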
Darin Tsui@darin_tsui·
Excited to announce our paper, SHAP zero, has been accepted into NeurIPS 2025! SHAP zero takes the first steps toward extracting biological insights at scale from machine learning models by amortizing the cost of explanations across large biological datasets. (1/n)
[image]
Darin Tsui retweeted
Amirali Aghazadeh@amiraliagz·
Our new work on the utility of SAEs in Low-N protein function prediction and design tasks @darin_tsui
Biology+AI Daily@BiologyAIDaily

Sparse Autoencoders for Low-N Protein Function Prediction and Design
1. This study explores the use of Sparse Autoencoders (SAEs) for predicting protein function and designing proteins in low-data scenarios, demonstrating significant improvements over existing methods.
2. SAEs trained on fine-tuned ESM2 embeddings consistently outperform ESM2 baselines in fitness prediction tasks, even with as few as 24 sequences, showing their effectiveness in capturing biologically meaningful representations.
3. The study introduces a method to steer predictive latents in SAEs to design high-functioning protein variants, achieving top-fitness variants in 83% of cases compared to designing with ESM2 alone.
4. The authors analyze the best-performing variants in green fluorescent protein (GFP) and the IgG-binding domain of protein G (GB1), uncovering biologically meaningful motifs exploited by SAEs for steering.
5. SAEs achieve higher generalization to unseen variants compared to ESM2 in various low-N fitness extrapolation tasks, including position, regime, and score extrapolation.
6. The study highlights the importance of sparsity in SAEs, which compresses biologically relevant information into a sparse latent space, enhancing performance in low-N regimes.
7. The authors suggest future work could involve expanding the design space by steering multiple latents at once and coupling SAE steering with physics-based tools to optimize for both function and stability.
📜 Paper: arxiv.org/abs/2508.18567…
💻 Code: github.com/amirgroup-code…
#ProteinEngineering #MachineLearning #SparseAutoencoders #ProteinDesign #LowDataScenarios

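As a rough illustration of the encode-steer-decode recipe the summary describes: project an embedding into a sparse latent code, amplify a predictive latent, and decode back to embedding space. The sketch below uses random weights and a tied decoder purely for shape-checking — a real SAE is trained on pLM embeddings with a reconstruction loss plus a sparsity penalty, and the steered latent is chosen for its correlation with fitness, not by `argmax`:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent = 16, 64          # latent dim > model dim (overcomplete)

# Minimal sparse autoencoder: z = relu(W_e x + b), x_hat = W_d z.
W_e = rng.normal(scale=0.1, size=(d_latent, d_model))
b = np.zeros(d_latent)
W_d = W_e.T.copy()                  # tied decoder for simplicity

def encode(x):
    return np.maximum(W_e @ x + b, 0.0)  # ReLU gives a sparse, nonneg code

def decode(z):
    return W_d @ z

x = rng.normal(size=d_model)
z = encode(x)

# "Steering": amplify one latent and decode back to embedding space.
latent = int(np.argmax(z))          # stand-in for a fitness-linked latent
z_steered = z.copy()
z_steered[latent] *= 5.0
x_steered = decode(z_steered)

sparsity = float(np.mean(z > 0))    # fraction of active latents
print(sparsity, float(np.linalg.norm(x_steered - decode(z))))
```

The appeal of the sparse code is that each variant activates only a handful of latents, so nudging one latent moves the embedding along a single (hopefully interpretable) direction rather than perturbing everything at once.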