Linna An

118 posts


@alchemist_an

TTAP @RiceUniversity | computational biochemist | Protein designer @UWproteindesign & natural product biochemist @ChemistryUIUC | open to industry

Joined August 2022
526 Following · 881 Followers
Linna An @alchemist_an
The first guest for the ML4BioChem club comes today: @r_krishna3! See you all online and in person.
0
3
29
1.9K
Linna An reposted
Debora Marks @deboramarks
Meet evedesign: a new open-source ML framework that makes protein design accessible and interoperable.
📢 See our post: deboramarkslab.substack.com/p/evedesign-bi…
Protein design models are powerful, but combining them shouldn’t require custom glue code.
✅ Combine models for multi-objective optimization
✅ Integrate lab-in-the-loop experimental data
✅ 100% secure: run on your own infra, no data sharing
Get started building therapeutics & industrial enzymes today 👇
📄 Paper: biorxiv.org/content/10.648…
💻 Code: github.com/evedesignbio
🌐 Webserver: evedesign.bio
Reach out to collaborate: hello@evedesign.bio
2
64
261
15.1K
Linna An @alchemist_an
This is exciting work! I doubt it will solve drug discovery, but it will be really exciting for enzyme and molecular machine design.
Biology+AI Daily @BiologyAIDaily
Learning the All-Atom Equilibrium Distribution of Biomolecular Interactions at Scale
1. ByteDance AI Drug Discovery and Anew Therapeutics researchers introduce AnewSampling, the first generative foundation framework that faithfully reproduces molecular dynamics (MD) at the all-atom level for sampling the equilibrium distribution of biomolecular interactions, addressing the high computational cost of traditional MD simulations.
2. AnewSampling leverages a novel quotient-space generative framework to ensure mathematical consistency in its modeling, and is trained on AnewSamplingDB, the largest self-curated database of protein-ligand trajectories to date, containing over 15 million conformations across 10,297 unique protein sequences and 27,979 unique ligand SMILES.
3. The framework builds on an AlphaFold3-like architecture with a stratified hybrid fine-tuning strategy: Low-Rank Adaptation (LoRA) for sequence representation modules and full-parameter fine-tuning for the Diffusion Module, alongside a Cluster-Based Template Guidance mechanism to enforce exhaustive exploration of the equilibrium ensemble.
4. In benchmarking on the ATLAS monomer dataset, AnewSampling outperforms all state-of-the-art generative methods across all 13 evaluation metrics, showing unparalleled accuracy in predicting protein flexibility and distributional accuracy for monomeric systems.
5. For protein-ligand dynamics testing across held-out PDB systems, JACS & Merck industrial datasets and an in-house drug discovery pipeline dataset, AnewSampling achieves statistical alignment with ground-truth MD distributions that far surpasses static predictors and MD-enhanced models like Boltz2, with its generated conformations nearly indistinguishable from MD baselines in key metrics.
6. AnewSampling demonstrates emergent enhanced sampling capabilities beyond conventional MD, successfully navigating high energy barriers to recover coupled ligand and side-chain motions in CDK2 systems (1H1R and 1H1S), a major challenge for traditional MD that often requires replica-exchange MD (REMD) to achieve.
7. The model accurately captures subtle ligand-induced conformational shifts in congeneric structure-activity relationship (SAR) series, a critical capability for lead optimization in drug discovery, and maintains high fidelity in modeling non-covalent protein-ligand interactions and global protein backbone dynamics across diverse chemical and conformational spaces.
8. The research team proposes a multi-level assessment strategy for generative biomolecular dynamics models, using metrics like Jensen-Shannon (JS) distance for ligand torsion, Wasserstein (WS) distance for protein-ligand interactions and Spearman correlation for Cα RMSF to rigorously validate physical fidelity at the atomic level.
9. AnewSampling offers unprecedented computational efficiency for exploring biomolecular conformational landscapes, enabling integration into research and industrial drug discovery pipelines and driving a shift toward dynamics-aware design of adaptive inhibitors and functional biomolecules.
10. While AnewSampling achieves significant advances, the researchers note current limitations including reliance on structural templates, limited training data for broader biomolecular interaction types (e.g., protein-nucleic acid) and restriction to fixed thermodynamic environments, outlining future work to address these and enable sequence-only equilibrium distribution prediction.
11. AnewSampling and conventional MD are shown to be complementary: MD provides the critical training data for the generative model, while AnewSampling can accelerate MD by generating diverse initial structural candidates that help bypass energy barriers in physical simulations.
📜 Paper: biorxiv.org/content/10.648…
#AIDrugDiscovery #BiomolecularDynamics #AllAtomModeling #GenerativeAI #ComputationalBiology #MolecularDynamics #ProteinLigandInteractions

0
4
41
5.6K
Linna An reposted
James Roney @jamesproney
I'm excited to announce some major updates to our ProteinEBM paper with Chenxi Ou and @sokrypton!
5
60
363
34.7K
Linna An reposted
Bo Wang @BoWang87
This is really cool (and wild): Scientists simulated a complete living cell for the first time. Every molecule, every reaction, from DNA replication to cell division.
The paper (Luthey-Schulten et al., Cell 2026, doi.org/10.1016/j.cell…), just out today, used JCVI-Syn3A, a synthetic minimal bacterium with fewer than 500 genes. A 3D+time simulation of the full 105-minute cell cycle: DNA replication, protein translation, metabolism, division. Every gene, protein, RNA, and chemical reaction tracked through physical space. It took years to build. Multiple GPUs. Six days of compute time per run.
And this is the simplest possible cell. A human cell has ~20,000 genes. It lives in tissue. It interacts with neighbors. It differentiates. It responds to drugs in ways that depend on context we haven't fully measured.
Mechanistic simulation of the minimal cell costs 6 GPU-days for 105 minutes of biology. You cannot scale that to human cells. The complexity isn't 40x harder. It's exponentially harder.
This is why the field pivoted to data-driven models. You can't hand-encode the regulatory wiring of a human hepatocyte. But you can learn it, if you have the right perturbation data collected across enough diverse biological contexts.
The two approaches aren't competing. Papers like this generate the ground truth that future ML models need for validation. But the path to a clinically useful virtual cell runs through foundation models, not through scaling up mechanistic simulation. Amazing work!
74
518
2.2K
287.7K
Linna An @alchemist_an
We = Linna An (AP@Rice), Cameron Glasscock (AP@Rice), Hayden Stegall (grad@Rice BCB), Precious Castillo (grad@Rice BCB), Yanapat Janthana (grad@Rice SSPB). Sponsors welcome 💸, and speaker nominations welcome!
Linna An @alchemist_an
We are launching the ML4BioChem club to connect scientists through in-person/online seminars and happy hours, to spark new collaborations and beyond. Upcoming speakers 👉 linnaan-lab.github.io/ML4BIOCHEM/ Subscribe via `Email me about events` on the website; we will add you to ml4biochem@rice.edu.

0
0
3
880
Linna An reposted
Hannes Stark @HannesStaerk
Join us on Monday to discuss "Generative Modeling via Drifting" with the author MingYang Deng! A one-step generative model trained with an MMD loss. On Zoom at 9am PT / 12pm ET / 6pm CET: portal.valencelabs.com/starklyspeaking
1
10
168
18.8K
Linna An @alchemist_an
@silvirouskin Same response from my dad. He also asked me when I will be a real professor, lol. My dad means to be nice; he’s just not familiar with the system.
0
0
2
812
Silvi Rouskin @silvirouskin
COMEDIC BIT
“So I started as an assistant professor at Harvard, which I thought was a pretty big deal. My dad hears this and goes, “Assistant professor? So… you assist the professor?” I’m like, “No, Dad, that’s the actual job. I am the professor.” And he’s like, “Right, right, but who’s the real professor?” I try to explain the whole academic ladder to him (assistant professor, associate professor, full professor) and he’s looking at me like I’m explaining Starbucks sizes [tall==small]. He’s just nodding: “So you’re like the intern of professors. Got it.” It’s the job that finally upgraded me from ‘still studying’ to ‘sort of working’ in my dad’s eyes.”
21
10
441
51.8K
Linna An reposted
Hou Chao @houchao1
Our work on protein language models trained on biophysical dynamics was just published in @PNASNews
2
40
230
10.7K
Linna An reposted
Andrew Ng @AndrewYNg
New course: Document AI: From OCR to Agentic Doc Extraction, built with @LandingAI, where I'm executive chairman, and taught by David Park and Andrea Kropp.
Much of the world's data is locked in PDFs, JPEGs, and other documents. This short course shows you how to build agentic workflows that process documents accurately: breaking them into parts, examining each piece carefully, and extracting information through multiple iterations.
Traditional Optical Character Recognition (OCR) captures text but loses context from table headers, chart captions, or reading order of columns. After exploring OCR's limitations, you’ll use LandingAI's Agentic Document Extraction (ADE) framework to process documents. ADE treats pages visually, as images, to parse information and extract fields.
Skills you'll gain:
- Build agents to convert unstructured files into structured Markdown/HTML and JSON
- Use ADE to parse complex data like forms, handwriting, or equations
- Map extracted information to named fields using a specified schema, with bounding boxes for grounding and validation
- Deploy RAG applications with event-driven document processing
Come learn about the best tools for processing documents like financial invoices, medical records, or academic papers intelligently: deeplearning.ai/short-courses/…
70
297
2.1K
197K
Linna An reposted
Leo Zang @LeoTZ03
ByteDance dropped SeedFold (with Linear Triangular Attention) and SeedProteo seedfold.github.io
2
51
423
101.7K
Linna An reposted
AK @_akhaliq
Adversarial Flow Models
3
7
44
8.8K
Linna An reposted
DeepSeek @deepseek_ai
🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale: reasoning-first models built for agents!
🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.
🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.
📄 Tech report: huggingface.co/deepseek-ai/De…
1/n
913
2.5K
16.1K
5.2M
Linna An @alchemist_an
@pranamanam Big congrats! It's fantastic for you in life, science and business all at the same time! Still more to come :)
0
0
1
230
Pranam Chatterjee @pranamanam
Just got back from my wedding (and honeymoon 🇫🇷) -- the most magical (and relaxing) time of my life! To share our special day with my students, past and present, was incredible (pics below)! 🥰 Also, SO MUCH has happened in AIxBio while I was gone! 🫨 I'll share my thoughts! 👇

But first, I can't express how emotional it is (even with a young lab) to see generations of your team come together. @njakimo, my own closest mentor at MIT during undergrad/grad school, Sabrina, my first undergrad when I was a PhD student, my three former undergrads during my postdoc @Harvard: @garykbrixi (Evo2), @kalyanmpalepu @bhat_suhaas (Rhodes), and my amazing current/past students @DukeU and @Penn! 🥰 Even little @_sophia_tang_ took time away from her many projects! ☺️ I am so fortunate to have the best, most brilliant students in the world, who are all such wonderful human beings. 💛

Okay, so back to the field! This week we saw BoltzGen, ODesign, OpenFold, Pearl -- all of which look very strong on all-atom benchmarking, prediction, and design tasks. Huge shoutout to @HannesStaerk and the Boltz team especially for getting experimental validations too! 🧑‍🔬 In the past month, many other models have come out: ProteinHunter for hallucination-based design, multimodal pLMs like Odyssey, and a month back, ProDiT from @aiproteins and Bowen (so so many more that I can't all highlight)! The field is moving (stressfully) quick in the representation/prediction/design tasks in both sequence and structure space. 🌬️ Not all of these tools can or will get used (there's just too many), but the hype-ier you can be, the more adoption you probably will get. That's just the (IMO slightly depressing) state of our field. 😔 Just gotta play the game, right?

We've taken a slightly different route: my lab develops therapeutics internally ourselves (going from theory to in vivo). We know binding/interaction is important (and local) but you just can't ignore other key properties which are more global/contextual: half-life, solubility, toxicity, immunogenicity. ⚕️ Who cares if you can bind if you can't even get your molecule thru pre-clinical studies! That's why, over the past year, @_sophia_tang_ @TongChen321 @yinuo_z98 @SophieVincoff and the team, have been developing multi-objective-guided discrete models like PepTune, MOG-DFM, TR2-D2, AReUReDi (and more soon) that generate Pareto-optimal designs (we've done DNA, mRNA, proteins, and of course, peptides)! ⚖️ Eventually, folks using structure have to figure it out, but when you're only modeling local interactions (and you have to guide generation with global properties), there's bound to be a mismatch. 🙅‍♂️ And that 10 year promise to cure all diseases from @demishassabis? Not happening (not like it was ever going to 🙄).

So where IS the field trending to for the next few years? Clearly in two directions: dynamics and cell state perturbation (please stop calling it Virtual Cells -- what does that even mean 😑) -- both of which should give more "context" to design. And both of these directions NEED advancements in theory (for example, in transition path sampling, my favorite of which are coming from @AlexanderTong7, @bose_joey, @mmbronstein, @ask1729's groups). From us, you've seen @_sophia_tang_ @yinuo_z98's work on our new Schrödinger Bridge frameworks, but more are coming soon to model intermediate and terminal states (cell and protein states) even better! 🫡

Here's the thing, though, these problems don't just need data, they need GROUND-TRUTH DATA. And for dynamics, that can't only be MD data (that's not real and still has all of the real-world inaccuracies of PDB-based structures). For now, to develop algorithms, it's fine (we use it -- it's all we have), but we must allow experimentalists the time to develop methods/assays to capture dynamics in cells/physiologically relevant environments. Otherwise, we will mistakenly claim this problem to be solved and folks will just jump to the next problem. A classic tendency of AIxBio that we have to stop doing. 🛑

Same goes for cell perturbation prediction/design. Single cell data is amazing for hypothesis generation and studying biology at scale (@GametoGen's tech wouldn't be possible without it!). But for ground-truth modeling? NO. In a given cell x gene matrix, like half the genes have counts of 0! You're telling me that's the full state of a cell? 🤦‍♂️ And don't get me started on single cell spatial/epigenomics or other newer modalities with even sparser data modes! The folks hyping single cell foundation models and virtual cells need to take a breather. 🫥

We don't need a next "AlphaFold" moment, but we do need to solve this problem well: predicting and designing state perturbations are incredibly important challenges for bioengineering that we can't just hand-wave. One word: PATIENCE. 🫷 AIxBio isn't patient. Yes, we need to move fast because we can provide great therapies and solutions to many global problems. But if we move TOO fast, our work will be cool, sure, but meaningless. Work with experimentalists and give them space/time to hone their methods/instrumentation/data generation, and you, the algorithm folks, will benefit from the data, and the loop of benefit begins, and we create meaningful outputs! ♻️

And finally, let's CHILL with the hype. ❄️ Please. We're better than this. No model needs a 4k hype vid or launch party. This goes for some VCs too: on the same day you won't write a $500k seed check to a small company with strong pre-clinical data on an asset b/c you don't like the target or some random reason, you'll throw like $500M at a hyped-up AIxBio company -- bc idk, vibes? 💁

Look, let us show our models work, get experimentalists to use them, push good, optimized molecules into the clinic/field, and always be cognizant that we still have a long way to go. We'll get there. For me, I'm just grateful to have such a wonderful team to do it with. 😇
27
6
308
45.6K
Linna An reposted
Silvi Rouskin @silvirouskin
I feel like crap. I didn’t get any grants this year, so I had to let go of my super talented postdoc, someone with a doctorate who’s now unemployed, without benefits or retirement, and still grinding away on a paper... Postdocs are the most unappreciated and undervalued positions at the PhD level: highly qualified, yet paid less than industry techs in Boston (with a BS degree), forced to work weekends without benefits or job security, and at risk of losing everything if a grant isn't funded. Name another PhD role that's treated with less respect and value by the system.
139
222
2.6K
616.1K
Linna An reposted
Corey Howe @design_proteins
I just got laid off this morning from Myriad Genetics.
Looking for new opportunities. RTs appreciated, DMs open.
My experience:
Protein design + engineering
NGS + Bioinformatics + some software dev
Data Science + Analytics
Synbio + Cancer Genetics
17
77
344
26.7K
Linna An reposted
Yehlin Cho @ChoYehlin
Thrilled to announce our new preprint, “Protein Hunter: Exploiting Structure Hallucination within Diffusion for Protein Design,” in collaboration with Griffin, @GBhardwaj8 and @sokrypton.
🧬 Code and notebooks will be released by the end of this week.
🎧 Golden - Kpop Demon Hunters
4
57
294
25.4K