Jiecong Lin

272 posts

@JasonLinjc

Postdoc at @harvardmed @MGHPathology @DFBC_PedCare @HKUniversity, passionate about developing deep learning models to decipher gene regulation

Joined July 2017
1.2K Following · 234 Followers
Pinned Tweet
Jiecong Lin @JasonLinjc
Excited to share our work introducing EPInformer🧬, a scalable and lightweight deep learning framework to predict gene expression by integrating promoter-enhancer sequences with epigenomic signals and chromatin contacts. 📜biorxiv.org/content/10.110… (1/11)
Michael Vinyard @vinyard_m
How does a stem cell "decide" its fate? Development requires both reliability (consistent cell types) AND flexibility (diverse outcomes from identical progenitors). Cells achieve this by dynamically tuning deterministic drift and stochastic diffusion. New in @NatMachIntell: scDiffEq models state-dependent drift AND diffusion, improving fate prediction by ~8% over SOTA. scDiffEq also enables genome-wide in silico perturbation screens and reveals temporal gene dynamics. 🧵nature.com/articles/s4225…
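The tweet frames cell-fate dynamics as a stochastic differential equation with a deterministic drift term and a stochastic diffusion term, both depending on the current cell state. As a minimal sketch (not scDiffEq's code or API), the Euler–Maruyama loop below simulates dx = f(x) dt + g(x) dW in one dimension. The functions `drift` and `diffusion` are hypothetical toy choices: a double-well drift with two attractor "fates" at x = -1 and x = +1, and noise that is largest near the uncommitted state x = 0. Here both terms are hand-written rather than learned from data.

```python
import numpy as np


def drift(x: np.ndarray) -> np.ndarray:
    """Hypothetical drift f(x) = x - x^3: the negative gradient of a
    double-well potential, so cells are pulled toward x = -1 or x = +1."""
    return x - x**3


def diffusion(x: np.ndarray) -> np.ndarray:
    """Hypothetical state-dependent noise g(x): large near the
    uncommitted state x = 0, small once a cell has committed."""
    return 0.1 + 0.4 * np.exp(-4.0 * x**2)


def simulate(x0, n_steps: int = 500, dt: float = 0.01, seed: int = 0) -> np.ndarray:
    """Euler–Maruyama integration (Itô convention: g is evaluated at the
    current state before each step)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Brownian increment
        x = x + drift(x) * dt + diffusion(x) * dw
    return x


# 1,000 identical progenitors, all starting at the uncommitted state x = 0.
final = simulate(np.zeros(1000))
fraction_plus = np.mean(final > 0)  # share of cells that chose the "+1" fate
print(f"fraction committed to +1: {fraction_plus:.2f}")
```

With drift alone, every cell would stay at x = 0 forever; the noise breaks the symmetry and the drift then commits each cell to one fate, so identical progenitors split between two reliable outcomes. That is the reliability-plus-flexibility balance the tweet describes, in toy form.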
Jiecong Lin retweeted
Niko McCarty. @NikoMcCarty
The model of gene expression taught in school is highly misleading! Transcription factors are proteins that bind to DNA and then help repress, or activate, the expression of genes. Cells have hundreds of different types of transcription factors, each tuned to regulate different genes based on short snippets of DNA located near those genes.

The basic model, taught in school, says that these transcription factor proteins float around the cell and, when they bump into a DNA sequence, either latch onto it strongly (CORRECT SITE!) or fall off quickly (WRONG SITE) and keep searching. All the other DNA in a cell is basically abstracted away as unimportant or irrelevant; mere background noise.

But again, this model is naive! And a new paper, published in Cell, beautifully shows how the sequences SURROUNDING a transcription factor's binding site also matter a great deal.

This won't be surprising to many biologists, as "cracks" in the standard two-state model began emerging decades(?) ago. Biologists have labeled transcription factors with fluorescent tags and then watched them move around living cells. They have noticed that when transcription factors land in a "wrong" location in the genome, they skip or hop to a nearby location and repeat this until finally connecting with the "correct" sequence. In other words, there are actually three states that a transcription factor can exist in: free-floating, "searching," or "bound." (More technically, transcription factors first do a 3D search, then latch onto DNA and do a 1D search to find the correct location.)

For this new paper, though, scientists exhaustively quantified *how* the sequences flanking a transcription factor binding site influence the protein's search. They ran a huge in vitro experiment in which they exposed a transcription factor with a known binding site, KLF1, to a library of 11,812 different DNA sequences. These sequences had mutated "core" binding sites and variations in the flanking sequences. They also prepared negative controls. The researchers then measured the binding kinetics of KLF1 with each sequence to understand which bases in the flanking sites affect the 1D search.

What they found is that KLF1's dissociation rate from its core sequence is essentially flat, but the PROBABILITY that it finds this sequence depends a lot on the surrounding context. Even mutations located dozens of bases away from the core site matter a lot, either helping KLF1 "hop" faster to find the site, or "trapping" KLF1 and slowing down its search. These flanking sequences can cause up to a 40-fold variation in the affinity of a transcription factor for its target site!

This is just one small part of the paper, though, so I encourage anyone interested to read the whole thing. It is a challenging read throughout.
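Why a flat dissociation rate plus a context-dependent "finding" probability still adds up to a large affinity change follows from the textbook definition of the equilibrium dissociation constant. This is a minimal sketch assuming simple 1:1 binding, with k_on standing for an effective association rate that lumps the 3D and 1D search together (a simplification, not the paper's kinetic model); superscripts (1) and (2) denote two flanking contexts around the same core site:

```latex
K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}},
\qquad
\frac{K_D^{(1)}}{K_D^{(2)}}
  = \frac{k_{\mathrm{off}} / k_{\mathrm{on}}^{(1)}}{k_{\mathrm{off}} / k_{\mathrm{on}}^{(2)}}
  = \frac{k_{\mathrm{on}}^{(2)}}{k_{\mathrm{on}}^{(1)}}
```

Under this assumption, with k_off held fixed, a 40-fold spread in K_D across flanking contexts corresponds to a 40-fold spread in the effective k_on, i.e. in how efficiently KLF1 locates the site.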
Jiecong Lin retweeted
David Kelley @drklly
Excited to share our new paper on predicting gene expression in yeast! We introduce "Shorkie," a supervised ML model that builds off a self-supervised foundation to interpret regulatory DNA. Preprint: biorxiv.org/content/10.110…
Jiecong Lin retweeted
Anshul Kundaje @anshulkundaje
Excited for a major milestone in a collab effort led by @jengreitz to map enhancers & interpret variants in the human genome: The E2G Portal e2g.stanford.edu collates predictions of enhancer-gene regulatory interactions across >1,600 cell types & tissues. Use cases 👇1/
Jiecong Lin retweeted
Nadav Brandes @BrandesNadav
Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵
Jiecong Lin retweeted
Marinka Zitnik @marinkazitnik
Ever wish you could hit "undo" on disease? 🩺🔄 nature.com/articles/s4155…

Most drug discovery asks: what does this perturbation do to cells? But we can also ask the reverse: which perturbations undo a disease signature and move cells back toward health? That's the idea behind PDGrapher, just published in Nature Biomedical Engineering @natBME @justguadaa @mmbronstein

⚠️ Challenge
Phenotype-based discovery screens compounds forward, measuring whether they improve disease phenotypes. But it does not address which perturbations will reverse disease signatures, nor does it capture combinatorial targets that may be essential for restoring health.

💡 Approach
PDGrapher tackles perturbagen discovery: finding the genes, drugs, or combinations that can shift a diseased cell state toward health. It represents disease cell states on molecular networks and asks: which perturbations will move these states toward health? It uses a causally inspired model that overlays gene-expression signatures onto proxy causal graphs. Perturbagens are encoded as edge modifications, allowing the model to reason about how interventions reshape cellular processes.

It is trained with two objectives:
💡 Cycle-consistency loss: ensures predicted perturbations move a diseased state toward its observed treated state
💡 Supervision loss: directly aligns predictions with experimental treatment outcomes
Jiecong Lin retweeted
Jacob Schreiber @jmschreiber91
In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance. While undoubtedly important, how we *use* these models after training is potentially even more important. tangermeme v1.0.0 is out now. Hope you find it useful!
Jiecong Lin retweeted
Eric Nguyen @exnx
In April '25, I shared the origin story of Evo on the TED stage. I talked about the motivation behind generating DNA with AI and how it could change what’s possible. It was an incredible experience. full video: go.ted.com/ericnguyen
Jiecong Lin retweeted
Bo Wang @BoWang87
🚀 OncoGAN is now published in @CellGenomics! We introduce an AI system that generates high-fidelity, privacy-preserving synthetic cancer genomes, now open-access.

🔍 Why OncoGAN?
- No patient data leakage: critical for genomic privacy
- Built-in ground truth: ideal for benchmarking models

🧬 What does it simulate?
- Tumor heterogeneity across donors
- Tissue-specific mutational signatures
- Complex SVs & CNAs
- Realistic VAFs driven by CNAs

💥 Is it useful? Yes!
- Boosted DeepTumour performance
- Validated ActiveDriverWGS on synthetic genomes
- Generated full BAMs with InSilicoSeq & BAMSurgeon

🎁 We're releasing 800 synthetic genomes across 8 tumor types, along with code and pipeline.

📄 Paper: tinyurl.com/3sw9tmbk
📂 Data: tinyurl.com/28bpd5hs
💻 Code: tinyurl.com/mr3ku653

🙏 Huge thanks to all co-authors! Particular shoutout to the first author, Ander Diaz, who is a postdoc in the lab. @ontariogenomics @CANSSIOntario @MoGen_Grad @OICR_news
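For readers wondering how copy-number alterations shape variant allele fractions, the textbook relationship for a clonal somatic SNV is VAF = p·m / (p·C_T + C_N·(1 − p)), where p is tumor purity, C_T the tumor copy number at the locus, m the number of mutated tumor copies, and C_N the normal copy number (usually 2). The function below is a minimal sketch of that formula only; it is a standard illustration, not OncoGAN's simulation code, and `expected_vaf` and its parameter names are hypothetical.

```python
def expected_vaf(purity: float, tumor_cn: int, multiplicity: int, normal_cn: int = 2) -> float:
    """Expected VAF of a clonal somatic SNV (hypothetical helper for illustration).

    purity:       fraction of cells in the sample that are tumor cells, in [0, 1]
    tumor_cn:     total copy number of the locus in tumor cells
    multiplicity: how many of those tumor copies carry the variant
    normal_cn:    copy number of the locus in contaminating normal cells
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    if not 1 <= multiplicity <= tumor_cn:
        raise ValueError("need 1 <= multiplicity <= tumor_cn")
    mutant_copies = purity * multiplicity  # only tumor cells carry the variant
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutant_copies / total_copies


# Same 50%-pure sample and same SNV, in three copy-number contexts:
print(round(expected_vaf(0.5, 2, 1), 3))  # diploid, heterozygous -> 0.25
print(round(expected_vaf(0.5, 1, 1), 3))  # single remaining copy (LOH) -> 0.333
print(round(expected_vaf(0.5, 4, 3), 3))  # variant on the amplified allele -> 0.5
```

The point of the toy is that the same mutation at the same purity can show very different VAFs depending only on local copy number, which is why a simulator that models CNAs can produce more realistic VAF distributions.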
Jiecong Lin retweeted
ℏεsam @Hesamation
Tonight’s reading material. A team at DeepMind wrote this piece on how to think about GPUs. Essential for AI engineers and researchers.
Jiecong Lin retweeted
evo-devo @Xiaojie_Qiu
🚀 Introducing PantheonOS (pantheonOS.stanford.edu): A Fully Open-Source Agent OS for Science

PantheonOS began as a research project in my Stanford lab and has since evolved into a vision to redefine data science in the era of AI, starting with computational biology, especially single-cell and spatial genomics. PantheonOS is a general agent platform built from the ground up. It is arguably the first distributed agent framework designed for scientific data analysis.

🔑 Key Features
1. Multi-Agent Collaboration: built-in paradigms for distributed, cross-machine cooperation among agents and toolsets.
2. Native Toolset Support: Python, R, Julia, LaTeX, and more, designed for real scientific workflows.
3. Modular & Extensible: developer-friendly design with shallow wrappers, plus LLM-driven toolset generation.
4. Evolvable Agents: capable of evolving large-scale code projects to achieve superhuman performance (e.g., evolving upon the original Harmony [I Korsunsky, 2019, Nature Methods] and Scanorama [BL Hie, 2019, Nature Biotechnology] implementations), and even evolving the system itself to adapt to new fields.

🎉 Stepwise Release Strategy
We're releasing PantheonOS in stages: Pantheon-CLI (today!), followed by Pantheon-Lab, Pantheon-Notebook, Pantheon-Slack, and more.

🌟 Pantheon-CLI Highlights
- We're not just building another CLI tool. We're defining how scientists will interact with data in the AI era.
- Open, Powerful, Python-First: the first fully open-source, endlessly extendable scientific "vibe analysis" framework.
- Mixed Programming Magic: combine Python, natural language, R, or Julia seamlessly in the same environment.
- PhD-Level Assistant: a command-line agent for complex real-world genomics and beyond, handling workflows at the PhD level.
- Privacy by Design: run entirely offline with local LLMs; your data never leaves your computer.

✅ Proven Applications (10 Demonstrations)
Computational biology:
1. ATAC-seq: from raw reads to peak matrix
2. RNA-seq: from raw reads to expression matrix
3. Complex single-cell workflows (PhD-level)
4. Hybrid natural language + R for Seurat annotation
5. Learning from web tutorials + invoking single-cell foundation models
6. Cell segmentation on 10x Genomics Visium HD data
And beyond:
7. Mixed Python & R programming examples
8. Molecular docking & structural analysis
9. Exploratory factor analysis for behavioral survey data
10. Customer segmentation & finance analytics

🌐 Learn More & Get Started
Website: pantheonOS.stanford.edu
Pantheon-CLI Documentation: pantheon-cli-docs.netlify.app
GitHub Repo: github.com/aristoteleo/pa…

💬 Join our community:
PantheonOS Slack: pantheonos.slack.com/ssb/redirect
PantheonOS Discord: discord.com/invite/74yzAGYW
Jiecong Lin retweeted
Jacob Schreiber @jmschreiber91
I'm glad that I had a chance to contribute to this wide-ranging article discussing the myriad ways ML is being used in genomics: nature.com/articles/d4158…
Jiecong Lin retweeted
Zaixi Zhang @ZaixiZhang
Introducing STELLA, a Self-Evolving LLM agent that autonomously creates its own tools to navigate and accelerate biomedical research. 🤖

Why It Matters:
⛓️ The Limitation: Most AI agents are fundamentally limited by a fixed set of predefined tools. This is a major bottleneck for complex, real-world scientific discovery.
🚀 The Breakthrough: STELLA shatters this limitation. It doesn't just use tools; it writes its own code to create new ones on the fly, allowing it to adapt and solve novel problems far beyond the scope of static systems.

Top Highlights:
🛠️ Autonomous Tool Creation: At its core, STELLA features a self-evolving architecture that writes and integrates new bioinformatics tools into its own "Tool Ocean," moving beyond any reliance on a predefined library.
🎯 Novel Target Discovery: By creating custom workflows, STELLA successfully identified multiple novel therapeutic targets for Acute Myeloid Leukemia (AML) and melanoma.
🧬 Automated Enzyme Design: Designed novel enzymes with a 3-fold efficiency improvement over the wild type, showcasing its power in creative protein engineering tasks.
🏆 SOTA Performance: Significantly outperforms leading models on major biomedical benchmarks, achieving ~26% on Humanity's Last Exam, ~54% on LAB-Bench: DBQA, and ~63% on LAB-Bench: LitQA.

Explore:
📜 Preprint: arxiv.org/abs/2507.02004
💻 Code: github.com/zaixizhang/STE…
Jiecong Lin retweeted
Kexin Huang @KexinHuang5
🤝 Excited to partner with Tamarind @kavi_deniz to build towards an agentic AI protein designer.

🔁 Agentic protein optimization: starting from a sequence, Biomni iteratively improves thermostability by orchestrating AlphaFold-2 and ThermoMPNN and by reasoning over predictions and literature to suggest mutations.

🧠 Natural language control of tools like Boltz-2, Chai-1, and ImmuneBuilder.

Learn more: biomni.stanford.edu/blog/tamarind-…