Jiecong Lin

272 posts

@JasonLinjc

Postdoc at @harvardmed @MGHPathology @DFBC_PedCare @HKUniversity, passionate about developing deep learning models to decipher gene regulation

Joined July 2017

1.2K Following · 234 Followers

Pinned Tweet

Jiecong Lin@JasonLinjc·6 Aug

Excited to share our work introducing EPInformer🧬, a scalable and lightweight deep learning framework to predict gene expression by integrating promoter-enhancer sequences with epigenomic signals and chromatin contacts. 📜biorxiv.org/content/10.110… (1/11)

Jiecong Lin retweeted

Ryan Corces@ryancorces·6 Mar

Just posted a preprint on a huge new single-cell study of Parkinson’s and its application to understanding noncoding variation. 🧵below. Led by Shreya Menon and Adam Turner and in collab w/ GP2 (@ASAP_Research), @BelloyMichael, @ZihHuaFang and others. doi.org/10.64898/2026.…

Jiecong Lin@JasonLinjc·23 Dec

@vinyard_m @NatMachIntell Congrats Michael, amazing work!

Michael Vinyard@vinyard_m·22 Dec

How does a stem cell "decide" its fate? Development requires both reliability (consistent cell types) AND flexibility (diverse outcomes from identical progenitors). Cells achieve this by dynamically tuning deterministic drift and stochastic diffusion. New in @NatMachIntell: scDiffEq models state-dependent drift AND diffusion, improving fate prediction by ~8% over SOTA. scDiffEq also enables genome-wide in silico perturbation screens and reveals temporal gene dynamics. 🧵nature.com/articles/s4225…

Jiecong Lin retweeted

Stacie Dodgson, PhD@StacieDodgson·10 Dec

Exciting new joint study out in @Nature today from Mineto Ota (Marson and @jkpritch labs) - Causal modelling of gene effects from regulators to programs to traits nature.com/articles/s4158…

Jiecong Lin retweeted

Niko McCarty.@NikoMcCarty·29 Nov

The model of gene expression taught in school is highly misleading!

Transcription factors are proteins that bind to DNA and then help repress, or activate, the expression of genes. Cells have hundreds of different types of transcription factors, each tuned to regulate different genes based on short snippets of DNA located near those genes.

The basic model, taught in school, says that these transcription factor proteins float around the cell and, when they bump into a DNA sequence, either latch onto it strongly (CORRECT SITE!) or fall off quickly (WRONG SITE) and keep searching. All the other DNA in a cell is basically abstracted away as unimportant or irrelevant; mere background noise.

But again, this model is naive! And a new paper, published in Cell, beautifully shows how the sequences SURROUNDING a transcription factor's binding site also matter a great deal.

This won't be surprising to many biologists, as "cracks" in the standard two-state model began emerging decades(?) ago. Biologists have tagged transcription factors with fluorescent tags and then watched them move around living cells. And they have noticed that when transcription factors land in a "wrong" location in the genome, they skip or hop to a nearby location and repeat this until finally connecting with the "correct" sequence.

So in other words, there are actually three states that a transcription factor can exist in: free-floating, "searching", or "bound." (More technically, transcription factors first do a 3D search, then latch onto DNA and do a 1D search to find the correct location.)

For this new paper, though, scientists exhaustively quantified *how* the sequences flanking a transcription factor binding site influence the search of the protein. They did a huge in vitro experiment, wherein they placed a specific transcription factor with a known binding site, called KLF1, in a huge library of 11,812 different DNA sequences. These sequences had mutated "core" binding sites and variations in the flanking sequences. They also prepared negative controls.

Then, these researchers measured the binding kinetics of KLF1 with each sequence to understand which bases in the flanking sites impact the 1D search. What they found is that KLF1 has a basically flat dissociation rate from its core sequence, but that the PROBABILITY that it finds this sequence depends a lot on the surrounding context.

Even mutations located dozens of bases away from the core site matter a lot, either pushing KLF1 to "hop" faster to find the site, or "trapping" KLF1 and slowing down its search. These flanking sequences can cause up to a 40-fold variation in the affinity of a transcription factor for its target site!

This is just one small part of the paper, though, so I encourage anyone interested to read the whole thing. It is challenging throughout.

Jiecong Lin retweeted

David Kelley@drklly·14 Oct

Excited to share our new paper on predicting gene expression in yeast! We introduce "Shorkie," a supervised ML model that builds off a self-supervised foundation to interpret regulatory DNA. Preprint: biorxiv.org/content/10.110…

Jiecong Lin retweeted

Rex Ma@RexMa9·27 Sep

Excited to share that Ctrl-DNA, our constrained RL + Genomic Language Model system for cell-type–specific regulatory DNA design, co-led with @xingyuchen67, was accepted as NeurIPS 2025 Spotlight (top 3.2%) 🧬✨ Paper: arxiv.org/abs/2505.20578 Code: github.com/bowang-lab/Ctr…

Jiecong Lin retweeted

Anshul Kundaje@anshulkundaje·19 Sep

Excited for a major milestone in a collab effort led by @jengreitz to map enhancers & interpret variants in the human genome: The E2G Portal e2g.stanford.edu collates predictions of enhancer-gene regulatory interactions across >1,600 cell types & tissues. Use cases 👇1/

Jiecong Lin retweeted

Stephen Turner 🦋 @stephenturner.us@strnr·13 Sep

OmniPath (omnipathdb.org): integrated knowledgebase for multi-omics analysis biorxiv.org/content/10.110… 🧬🖥️🧪 Python module github.com/saezlab/pypath R package github.com/saezlab/Omnipa… #Rstats

Jiecong Lin retweeted

Nadav Brandes@BrandesNadav·9 Sep

Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵

Jiecong Lin retweeted

Marinka Zitnik@marinkazitnik·9 Sep

Ever wish you could hit "undo" on disease? 🩺🔄 nature.com/articles/s4155…

Most drug discovery asks: what does this perturbation do to cells? But we can also ask the reverse: which perturbations undo a disease signature and move cells back toward health? That's the idea behind PDGrapher, just published in Nature Biomedical Engineering @natBME @justguadaa @mmbronstein

⚠️ Challenge
Phenotype-based discovery screens compounds forward, measuring whether they improve disease phenotypes. But it does not address which perturbations will reverse disease signatures or capture combinatorial targets that may be essential for restoring health.

💡 Approach
PDGrapher tackles perturbagen discovery: finding the genes, drugs, or combinations that can shift a diseased cell state toward health. It represents disease cell states on molecular networks and asks: which perturbations will move these states toward health? It uses a causally inspired model that overlays gene-expression signatures onto proxy causal graphs. Perturbagens are encoded as edge modifications, allowing the model to reason about how interventions reshape cellular processes.

It is trained with two objectives:
💡 Cycle-consistency loss: ensures predicted perturbations move a diseased state toward its observed treated state
💡 Supervision loss: directly aligns predictions with experimental treatment outcomes

Jiecong Lin retweeted

Caleb Lareau@CalebLareau·3 Sep

This is the single most important resource for genomics research. Earth’s DNA is now at our fingertips. What a heroic effort and mega congrats to @RayanChikhi and @RNA_Life!!!

Rayan Chikhi@RayanChikhi

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of🧬DNA sequencing data🧬 from the far reaches of our planet.🦠🍄🌵 Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open. doi.org/10.1101/2024.0…

Jiecong Lin@JasonLinjc·31 Aug

@KexinHuang5 @jure @EmmanuelCandes @marinkazitnik @anshulkundaje @jengreitz Big congrats, can't wait to see your next contributions to the field!

Kexin Huang@KexinHuang5·29 Aug

👋Life update: I just defended my PhD thesis! Immensely grateful to my advisor @jure and my committee members @EmmanuelCandes, @marinkazitnik, @anshulkundaje, and @jengreitz for their support throughout this journey!

Jiecong Lin retweeted

Jacob Schreiber@jmschreiber91·27 Aug

In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance. While undoubtedly important, how we *use* these models after training is potentially even more important. tangermeme v1.0.0 is out now. Hope you find it useful!

Jiecong Lin retweeted

Eric Nguyen@exnx·26 Aug

In April '25, I shared the origin story of Evo on the TED stage. I talked about the motivation behind generating DNA with AI and how it could change what’s possible. It was an incredible experience. full video: go.ted.com/ericnguyen

Jiecong Lin retweeted

Bo Wang@BoWang87·23 Aug

🚀 OncoGAN is now published in @CellGenomics! We introduce an AI system that generates high-fidelity, privacy-preserving synthetic cancer genomes, now open-access.

🔍 Why OncoGAN?
--No patient data leakage: critical for genomic privacy
--Built-in ground truth: ideal for benchmarking models

🧬 What does it simulate?
--Tumor heterogeneity across donors
--Tissue-specific mutational signatures
--Complex SVs & CNAs
--Realistic VAFs driven by CNAs

💥 Is it useful? Yes!
--Boosted DeepTumour performance
--Validated ActiveDriverWGS on synthetic genomes
--Generated full BAMs with InSilicoSeq & BAMSurgeon

🎁 We’re releasing 800 synthetic genomes across 8 tumor types, along with code and pipeline.
📄 Paper: tinyurl.com/3sw9tmbk
📂 Data: tinyurl.com/28bpd5hs
💻 Code: tinyurl.com/mr3ku653

🙏 Huge thanks to all co-authors! A particular shoutout to the first author, Ander Diaz, who is a postdoc in the lab. @ontariogenomics @CANSSIOntario @MoGen_Grad @OICR_news

Jiecong Lin retweeted

ℏεsam@Hesamation·21 Aug

Tonight’s reading material. A team at DeepMind wrote this piece on how you must think about GPUs. Essential for AI engineers and researchers.

Jiecong Lin retweeted

evo-devo@Xiaojie_Qiu·21 Aug

🚀 Introducing PantheonOS (pantheonOS.stanford.edu): A Fully Open-Source Agent OS for Science

PantheonOS began as a research project in my Stanford lab and has since evolved into a vision to redefine data science in the era of AI, starting with computational biology, especially single-cell and spatial genomics. PantheonOS is a general agent platform built from the ground up. It is arguably the first distributed agent framework designed for scientific data analysis.

🔑 Key Features
1. Multi-Agent Collaboration – Built-in paradigms for distributed, cross-machine cooperation among agents and toolsets.
2. Native Toolset Support – Python, R, Julia, LaTeX, and more, designed for real scientific workflows.
3. Modular & Extensible – Developer-friendly design with shallow wrappers, plus LLM-driven toolset generation.
4. Evolvable Agents – Capable of evolving large-scale code projects to achieve superhuman performance (e.g., evolving upon the original Harmony [I Korsunsky, 2019, Nature Biotechnology] and Scanorama [BL Hie, 2019, Nature Biotechnology] implementations), and even evolving the system itself to adapt to new fields.

🎉 Stepwise Release Strategy
We’re releasing PantheonOS in stages: Pantheon-CLI (today!), followed by Pantheon-Lab, Pantheon-Notebook, Pantheon-Slack, and more.

🌟 Pantheon-CLI Highlights
- We're not just building another CLI tool. We're defining how scientists will interact with data in the AI era.
- Open, Powerful, Python-First – The first fully open-source, endlessly extendable scientific “vibe analysis” framework.
- Mixed Programming Magic – Combine Python, natural language, R, or Julia seamlessly in the same environment.
- PhD-Level Assistant – A command-line agent for complex real-world genomics and beyond, handling workflows at the PhD level.
- Privacy by Design – Run entirely offline with local LLMs; your data never leaves your computer.

✅ Proven Applications (10 Demonstrations)
Computational biology:
1. ATAC-seq: From raw reads to peak matrix
2. RNA-seq: From raw reads to expression matrix
3. Complex single-cell workflows (PhD-level)
4. Hybrid natural language + R for Seurat annotation
5. Learning from web tutorials + invoking single-cell foundation models
6. Cell segmentation on 10x Genomics HD Visium data
And beyond:
7. Mixed Python & R programming examples
8. Molecular docking & structural analysis
9. Exploratory factor analysis for behavioral survey data
10. Customer segmentation & finance analytics

🌐 Learn More & Get Started
Website: pantheonOS.stanford.edu
Pantheon-CLI Documentation: pantheon-cli-docs.netlify.app
GitHub Repo: github.com/aristoteleo/pa…

💬 Join our community:
PantheonOS Slack: pantheonos.slack.com/ssb/redirect
PantheonOS Discord: discord.com/invite/74yzAGYW

Jiecong Lin retweeted

Jacob Schreiber@jmschreiber91·19 Aug

I'm glad that I had a chance to contribute to this wide-ranging article discussing the myriad ways ML is being used in genomics: nature.com/articles/d4158…

Jiecong Lin retweeted

Zaixi Zhang@ZaixiZhang·26 Jul

Introducing STELLA, a Self-Evolving LLM agent that autonomously creates its own tools to navigate and accelerate biomedical research.

🤖 Why It Matters:
⛓️ The Limitation: Most AI agents are fundamentally limited by a fixed set of predefined tools. This is a major bottleneck for complex, real-world scientific discovery.
🚀 The Breakthrough: STELLA shatters this limitation. It doesn't just use tools; it writes its own code to create new ones on the fly, allowing it to adapt and solve novel problems far beyond the scope of static systems.

Top Highlights:
🛠️ Autonomous Tool Creation: At its core, STELLA features a self-evolving architecture that writes and integrates new bioinformatics tools into its own "Tool Ocean," moving beyond any reliance on a predefined library.
🎯 Novel Target Discovery: By creating custom workflows, STELLA successfully identified multiple novel therapeutic targets for Acute Myeloid Leukemia (AML) and melanoma.
🧬 Automated Enzyme Design: Designed novel enzymes with a 3-fold efficiency improvement over the wild type, showcasing its power in creative protein engineering tasks.
🏆 SOTA Performance: Significantly outperforms leading models on major biomedical benchmarks, achieving ~26% on Humanity's Last Exam, ~54% on LAB-Bench: DBQA, and ~63% on LAB-Bench: LitQA.

Explore:
📜 Preprint: arxiv.org/abs/2507.02004
💻 Code: github.com/zaixizhang/STE…

Jiecong Lin retweeted

Kexin Huang@KexinHuang5·21 Jul

🤝Excited to partner with Tamarind @kavi_deniz to build towards an agentic AI protein designer. 🔁 Agentic protein optimization — Starting from a sequence, Biomni iteratively improves thermostability by orchestrating AlphaFold-2, ThermoMPNN, and reasoning over predictions and literature to suggest mutations. 🧠 Natural language control of tools like Boltz-2, Chai-1, and ImmuneBuilder. Learn more: biomni.stanford.edu/blog/tamarind-…
