Nazreen Pallikkavaliyaveetil

7 posts

@nazreenpm

Postdoctoral Associate Computational Biology @AI/ML @Yale

New Haven, Connecticut · Joined April 2010
143 Following · 29 Followers
Nazreen Pallikkavaliyaveetil retweeted
David van Dijk (@david_van_dijk)
🚀 We are at @icmlconf in Vienna presenting #cell2sentence (C2S), the first framework for LLM-based single-cell foundation models! 🧬✨ C2S can generate cells from language prompts, interpret cells, and even generate natural language insights directly from data! 🔍💬📊 Stay tuned for some major model releases this week! 🔥📢 icml.cc/virtual/2024/p… biorxiv.org/content/10.110… github.com/vandijklab/cel…
Nazreen Pallikkavaliyaveetil retweeted
David van Dijk (@david_van_dijk)
Major Cell2Sentence update 🎉🔬! We’ve been thrilled to see the attention Cell2Sentence has received from the single-cell community. Now, we’re excited to release our first update of Cell2Sentence (C2S) - a framework to leverage LLMs to train foundational single-cell models, directly in text.

What’s new & out:
Updated preprint with latest results: biorxiv.org/content/10.110…
First full cell model available on the HuggingFace hub: huggingface.co/vandijklab/pyt…
Updated codebase for data transformation & training: github.com/vandijklab/cel…

We now fine-tune language models to generate entire cells, predict combinatorial cell labels, and generate textual data insights directly from cell sentences. We train GPT-2 and Pythia models on a large multi-tissue dataset containing 36M cells from @cellxgene as well as an immune tissue dataset containing 270k cells.

C2S LMs achieve SOTA performance in single-cell data generation. C2S models trained for combinatorial label prediction settings excel in low-data regimes, outperforming single-cell foundation model baselines. We also show that C2S models benefit from natural language pre-training and always outperform models trained from scratch on cell sentences.

C2S provides a straightforward approach to adapting LLMs for single-cell data analysis, leveraging their natural language capabilities to generate and derive insights from single cells. We are convinced that C2S’ approach of integrating data modalities through text is the way forward for single-cell foundation models, from representing multi-omics data to generating clinical insights, all in a human readable format.

We’re excited to start building a community around Cell2Sentence! If you also think that C2S will be the framework for single-cell foundation models, and are interested in contributing, reach out to us! We welcome any collaborations and discussions.

Huge thanks to our collaborator @aminkarbasi and the C2S team (@danielflevine, @sachalevy3, @SyedARizvi5688, @nazreenpm, Xingyu Chen, @dzhang03, @GhadermarziSina, Ruiming Wu, Ivan Vrkic, Anna Zhong, Daphne Raskin, Insu Han, @aho_fonseca, @josueortc) for their hard work on C2S! Special thanks to @rahuldhodapkar, who co-supervises this project.