Mingze Dong

27 posts

@Mingze7316

PhD student @YaleCBB; Integrated Science B.S. @PKU1898

Joined October 2021
340 Following · 251 Followers
Mingze Dong retweeted
Stanford AI+Biomedicine Seminar @Stanford_AI_Bio
We are excited to have @Mingze7316 presenting “Stack: In-context learning of single cell biology” tomorrow! 📍CoDa E160 | 2/3 2:30pm | Stanford + Zoom
Mingze Dong @Mingze7316
@maxkreuzz @abhinadduri @yusufroohani @davey_burke @dhrvji So it's inevitable given current bio data. But since biological foundation models only need to capture biology (not general reasoning), we may not need comparable scale for it to be useful. Plus, we believe we've set up the right framework for scaling as more data comes. (2/2)
Mingze Dong @Mingze7316
@maxkreuzz @abhinadduri @yusufroohani @davey_burke @dhrvji Thanks! You're right about the scale gap. All current human single-cell data is ~10¹⁰ tokens in stack, while modern LLMs train on >10¹³. Our model scales accordingly: 10⁸⁻⁹ params vs >10¹¹⁻¹² for SOTA LLMs—about 10⁶× smaller in total, hence the resource difference. (1/2)
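The "10⁶× smaller in total" figure above is back-of-envelope arithmetic; a quick sketch to check it, using the tweet's rough order-of-magnitude estimates (lower bounds, not measured values):

```python
import math

# Rough orders of magnitude quoted in the thread (not measurements)
stack_tokens = 1e10   # ~all current human single-cell data, tokenized for Stack
llm_tokens   = 1e13   # modern LLMs pre-train on >10^13 tokens
stack_params = 1e8    # Stack: 10^8-10^9 parameters (lower bound)
llm_params   = 1e11   # SOTA LLMs: >10^11-10^12 parameters (lower bound)

# Total training compute scales roughly as params * tokens,
# so the overall gap is the product of the two individual gaps.
gap = (llm_params * llm_tokens) / (stack_params * stack_tokens)
print(f"~10^{round(math.log10(gap))}x smaller in total")  # ~10^6x smaller in total
```

The ~10³× data gap and ~10³× parameter gap multiply, which is where the ~10⁶× total comes from.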
Max @maxkreuzz
This is awesome progress. But the scale compared to LLMs is just funny. STACK: "[...] complete pre-training in 2–3 days on a single H100 GPU." The biggest models from OpenAI, Anthropic and xAI use tens of thousands of H100s (or their equivalent) on multi-month pre-training runs. Saying the difference is "multiple OOMs" is an understatement. Regardless of whether the bottleneck is data, compute or something else, this is exciting. The more I read the more undeniable the potential seems to me — please disagree in the comments if I'm missing something.
Arc Institute @arcinstitute

Predicting cell state in previously unseen conditions such as disease or in response to a drug has typically required retraining for each new biological context. Today, Arc is releasing Stack, a foundation model that learns to simulate cell state under novel conditions directly at inference time, no fine-tuning required.

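Max's "multiple OOMs" claim can be sanity-checked in GPU-days. The Stack figure (one H100 for 2–3 days) is quoted from its paper above; the frontier-run numbers below are illustrative assumptions chosen to match "tens of thousands of H100s" and "multi-month":

```python
import math

stack_gpu_days = 1 * 2.5          # 1 H100 for ~2-3 days, as quoted above
frontier_gpu_days = 20_000 * 75   # assumed: ~20k H100-equivalents for ~2.5 months

ratio = frontier_gpu_days / stack_gpu_days
print(f"~{ratio:,.0f}x, i.e. ~{math.log10(ratio):.0f} orders of magnitude")
# ~600,000x, i.e. ~6 orders of magnitude
```

Under these assumptions the wall-clock compute gap alone is already ~6 orders of magnitude, consistent with the "understatement" remark.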
Mingze Dong retweeted
Arc Institute @arcinstitute
Predicting cell state in previously unseen conditions such as disease or in response to a drug has typically required retraining for each new biological context. Today, Arc is releasing Stack, a foundation model that learns to simulate cell state under novel conditions directly at inference time, no fine-tuning required.
Mingze Dong @Mingze7316
Super proud to present Stack — a foundation model that brings in-context learning to leverage and engineer cellular contexts, through innovations grounded in single-cell biology. Huge thanks to @yusufroohani @abhinadduri @dhrvji @davey_burke and Arc team! A great summary below:
Yusuf Roohani @yusufroohani

Why define conditions, donors, or even *tasks* when we can just use cells themselves to guide model output? Presenting Stack, in-context learning using just cells! Use cell context -> enhance its embedding. Engineer cell context -> modify its state. Led by the brilliant @Mingze7316

Mingze Dong @Mingze7316
Open to DMs / chats about AI for science and academic job opportunities! See my previous work on theoretically grounded single-cell and spatial omics AI models: scholar.google.com/citations?user… — with more to come.
Mingze Dong @Mingze7316
I’ll be at #NeurIPS from Wed–Sun presenting our work openreview.net/pdf?id=oQbTbio… ! We build a high-dim linear model that explains all kinds of phenomena in mask-based pretraining, and from this framework propose R²MAE that improves pretraining across language, DNA, and single-cell.
Mingze Dong @Mingze7316
Thanks to identifiability, SIMVI uniquely enables inference of “spatial effects” at the single-cell level, empowering biological discoveries. Please refer to our manuscript (and the 44-page SI) for more details and applications. Many thanks for the support @YaleCBB @RongFan8 @Klugerlab!
Mingze Dong @Mingze7316
Out in @NatureComms! We tackle a core challenge in spatial omics: reliably disentangling spatial interactions from intrinsic cell properties, which requires identifiability. We built an identifiable deep learning framework, SIMVI (with proofs!), to solve this: nature.com/articles/s4146…
Mingze Dong @Mingze7316
@Ella_Maru Thanks for the question! Short answer: yes. If the lineage is space-independent, the intrinsic variation captures it and disentangles it from the niche; if it is space-dependent, our relevant case study (Fig. 5) shows SIMVI can reveal spatially dependent states and distinguish them from niche effects.
Mingze Dong @Mingze7316
Many thanks to all co-authors whose contributions made this work possible! Please check our manuscript for details and more results: biorxiv.org/content/10.110… N/N.
Mingze Dong @Mingze7316
Summary: scShift demonstrates 4 important properties for next-generation single-cell models: 1) zero-shot transfer, 2) disentanglement, 3) scaling, and 4) unsupervised training. It facilitates analyses of biological states at all levels. The novel idea may lead to various future extensions. 11/N
Mingze Dong @Mingze7316
Thrilled to share our preprint: biorxiv.org/content/10.110…. Long story short: we found a way (scShift) leveraging massive single-cell atlases to build powerful zero-shot biological state extractors. Its performance scales with dataset diversity after an “emergence threshold”. 1/N