Mingze Dong

27 posts

@Mingze7316

PhD student @YaleCBB; Integrated Science B.S. @PKU1898

Joined October 2021
340 Following · 251 Followers
Mingze Dong retweeted
Stanford AI+Biomedicine Seminar @Stanford_AI_Bio
We are excited to have @Mingze7316 presenting “Stack: In-context learning of single cell biology” tomorrow! 📍CoDa E160 | 2/3 2:30pm | Stanford + Zoom
Mingze Dong @Mingze7316
@maxkreuzz @abhinadduri @yusufroohani @davey_burke @dhrvji So it's inevitable given current bio data. But since biological foundation models only need to capture biology (not general reasoning), we may not need comparable scale for it to be useful. Plus, we believe we've set up the right framework for scaling as more data comes. (2/2)
Mingze Dong @Mingze7316
@maxkreuzz @abhinadduri @yusufroohani @davey_burke @dhrvji Thanks! You're right about the scale gap. All current human single-cell data is ~10¹⁰ tokens in stack, while modern LLMs train on >10¹³. Our model scales accordingly: 10⁸⁻⁹ params vs >10¹¹⁻¹² for SOTA LLMs—about 10⁶× smaller in total, hence the resource difference. (1/2)
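The "10⁶× smaller in total" figure above is back-of-envelope arithmetic; a quick sketch to check it, using the tweet's rough order-of-magnitude estimates (lower bounds, not measured values):

```python
import math

# Rough orders of magnitude quoted in the thread (not measurements)
stack_tokens = 1e10   # ~all current human single-cell data, tokenized for Stack
llm_tokens   = 1e13   # modern LLMs pre-train on >10^13 tokens
stack_params = 1e8    # Stack: 10^8-10^9 parameters (lower bound)
llm_params   = 1e11   # SOTA LLMs: >10^11-10^12 parameters (lower bound)

# Total training compute scales roughly as params * tokens,
# so the overall gap is the product of the two individual gaps.
gap = (llm_params * llm_tokens) / (stack_params * stack_tokens)
print(f"~10^{round(math.log10(gap))}x smaller in total")  # ~10^6x smaller in total
```

The ~10³× data gap and ~10³× parameter gap multiply, which is where the ~10⁶× total comes from.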
Max @maxkreuzz
This is awesome progress. But the scale compared to LLMs is just funny. STACK: "[...] complete pre-training in 2–3 days on a single H100 GPU." The biggest models from OpenAI, Anthropic and xAI use tens of thousands of H100s (or their equivalent) on multi-month pre-training runs. Saying the difference is "multiple OOMs" is an understatement. Regardless of whether the bottleneck is data, compute or something else, this is exciting. The more I read the more undeniable the potential seems to me — please disagree in the comments if I'm missing something.
Arc Institute @arcinstitute

Predicting cell state in previously unseen conditions such as disease or in response to a drug has typically required retraining for each new biological context. Today, Arc is releasing Stack, a foundation model that learns to simulate cell state under novel conditions directly at inference time, no fine-tuning required.

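Max's "multiple OOMs" claim can be sanity-checked in GPU-days. The Stack figure (one H100 for 2–3 days) is quoted from its paper above; the frontier-run numbers below are illustrative assumptions chosen to match "tens of thousands of H100s" and "multi-month":

```python
import math

stack_gpu_days = 1 * 2.5          # 1 H100 for ~2-3 days, as quoted above
frontier_gpu_days = 20_000 * 75   # assumed: ~20k H100-equivalents for ~2.5 months

ratio = frontier_gpu_days / stack_gpu_days
print(f"~{ratio:,.0f}x, i.e. ~{math.log10(ratio):.0f} orders of magnitude")
# ~600,000x, i.e. ~6 orders of magnitude
```

Under these assumptions the wall-clock compute gap alone is already ~6 orders of magnitude, consistent with the "understatement" remark.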
Mingze Dong retweeted
Arc Institute @arcinstitute
Predicting cell state in previously unseen conditions such as disease or in response to a drug has typically required retraining for each new biological context. Today, Arc is releasing Stack, a foundation model that learns to simulate cell state under novel conditions directly at inference time, no fine-tuning required.
Mingze Dong @Mingze7316
Super proud to present Stack — a foundation model that brings in-context learning to leverage and engineer cellular contexts, through innovations grounded in single-cell biology. Huge thanks to @yusufroohani @abhinadduri @dhrvji @davey_burke and Arc team! A great summary below:
Yusuf Roohani @yusufroohani

Why define conditions, donors, or even *tasks* when we can just use cells themselves to guide model output? Presenting Stack, in-context learning using just cells! Use cell context -> enhance its embedding. Engineer cell context -> modify its state. Led by the brilliant @Mingze7316

Mingze Dong @Mingze7316
Open to DMs / chats about AI for science and academic job opportunities! See my previous work on theoretically grounded single-cell and spatial omics AI models: scholar.google.com/citations?user… — with more to come.
Mingze Dong @Mingze7316
I’ll be at #NeurIPS from Wed–Sun presenting our work openreview.net/pdf?id=oQbTbio… ! We build a high-dim linear model that explains all kinds of phenomena in mask-based pretraining, and from this framework propose R²MAE that improves pretraining across language, DNA, and single-cell.
Mingze Dong @Mingze7316
Thanks to identifiability, SIMVI uniquely enables inference of “spatial effects” at the single-cell level, empowering biological discoveries. Please refer to our manuscript (and the 44-page SI) for more details and applications. Many thanks for the support @YaleCBB @RongFan8 @Klugerlab!
Mingze Dong @Mingze7316
Out in @NatureComms! We tackle a core challenge in spatial omics: reliably disentangling spatial interactions from intrinsic cell properties, which requires identifiability. We built an identifiable deep learning framework, SIMVI (with proofs!), to solve this: nature.com/articles/s4146…
Mingze Dong @Mingze7316
@Ella_Maru Thanks for the question! Short answer: yes. If the lineage is space-independent, the intrinsic variation captures it and disentangles it from the niche; if it is space-dependent, our relevant case study (Fig. 5) shows SIMVI can reveal spatially dependent states and distinguish them from niche effects.
Mingze Dong @Mingze7316
Many thanks to all co-authors whose contributions made this work possible! Please check our manuscript for details and more results: biorxiv.org/content/10.110… N/N.
Mingze Dong @Mingze7316
Summary: scShift demonstrates 4 important properties for next-generation single-cell models: 1) zero-shot transfer, 2) disentanglement, 3) scaling, and 4) unsupervised training. It facilitates analyses of biological states at all levels. The novel idea may lead to various future extensions. 11/N
Mingze Dong @Mingze7316
Thrilled to share our preprint: biorxiv.org/content/10.110…. Long story short: we found a way (scShift) leveraging massive single-cell atlases to build powerful zero-shot biological state extractors. Its performance scales with dataset diversity after an “emergence threshold”. 1/N