Philipp Schäfer

26 posts

@psl_schaefer

Computational biologist with background in biochem

Heidelberg · Joined March 2020

398 Following · 87 Followers

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·2 Jul

Postdoc position in our group at @emblebi , analyzing single cell data to better understand and treat neurodegen diseases, collab with @AndrewBassett43 , @mo_lotfollahi, @bayraktar_lab M Strauss and others @OpenTargets . Deadline 20/07. Please share🙏: embl.wd103.myworkdayjobs.com/en-US/EMBL/job…

Philipp Schäfer@psl_schaefer·15 May

@iskander @s_r_constantin What do you consider as independent validation?

alex rubinsteyn@iskander·14 May

@s_r_constantin Aiming for overwhelming effect size with early independent validation could help. Don’t advance drugs that delay tumor growth in mice, you want to cure in all idealized models rather than “statistically significant reduction in X” (aka noise)

alex rubinsteyn@iskander·13 May

Let’s dream a bit. How would you dramatically reorg biotech research / drug development to make dramatically faster progress towards curative therapies? (Don’t say “use AI”, hypothesis/candidate generation under current structure isn’t a bottleneck)

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·12 Dec

We will be at the inaugural #ESSB conference this week in Berlin, presenting with @MariaCHeinz a talk on multi-modal whole brain tumor profiling, @psl_schaefer with @tanevski a poster on analyzing cross-condition/temporal spatial data [1/2]

Philipp Schäfer retweeted

Kaessmann Lab@kaessmannlab·8 Nov

#Postdoc or #PhD position jointly hosted in our lab and that of Alexander Sasse @LXandR_ to investigate the evolution of gene regulation at the single-cell level across primates/mammals using state-of-the-art deep learning approaches: home.kaessmannlab.org/openPositions Please RT!

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·4 Nov

📢 Introducing CORNETO: a unified, knowledge-driven framework for multi-sample network learning & modeling via constrained optimization, supporting a wide variety of network methods and prior knowledge 💻Code: github.com/saezlab/corneto 📜Paper: doi.org/10.1101/2024.1… 🧵👇

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·5 Nov

This was a close collaboration with @schirmerlab, led by @Cels121 and our @PauBadiaM. Congrats to them and all the other co-authors: @roramirezf94, Patricia Sekol, @psl_schaefer, Christian J. Riedl, …

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·1 Oct

👉Consider joining us for a post-doc at @emblebi & @ASTRAZENECAUK as part of the EAZPOD program to study multicellular mechanisms associated with #cancer treatment resistance! #postdoc ebi.ac.uk/research/postd…

EMBL-EBI Jobs@EMBLEBIjobs

Apply to the 2024 EAZPOD Programme at EMBL-EBI & AstraZeneca in Cambridge. Work on cutting-edge computational biology in oncology with world-class scientists & access to state-of-the-art facilities! Learn more & apply: embl.org/jobs/position/… #Postdoc #Oncology

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·2 Sep

📣 The much-revised manuscript of LIANA+, our all-in-one solution to study cell-cell communication from single-cell, spatial, and multi-omics technologies, is now published in @NatureCellBio nature.com/articles/s4155…

Philipp Schäfer@psl_schaefer·8 Jul

@Thom_Wolf @Max90325883

Thomas Wolf@Thom_Wolf·7 Jul

A super impressive AI competition happened last week that many people missed in the noise of the AI world. I happen to know several participants, so let me tell you a bit of the story over Sunday morning coffee.
You probably know the Millennium Prize Problems, where the Clay Institute pledged a US$1 million prize for the first correct solution to each of 7 deep math problems. To date only one of these, the Poincaré conjecture, has been solved, by Grigori Perelman, who famously declined the award (go check Grigori out if you haven't; the guy has a totally based life).
So this new competition, the Artificial Intelligence Math Olympiad (AIMO), also came with a US$1M prize but was only open to AI models (so the humans get the prize for the work of the AI...). It also tackles very challenging but still simpler problems, namely problems at the International Math Olympiad gold level: not yet the frontier of math knowledge, but definitely above what most people, me included, can solve today. The organizing committee of the AIMO is kind of a who's who of highly respected mathematicians, for instance Terence Tao, the famous math prodigy widely regarded as one of the greatest living mathematicians.
Enter our team: Jia Li, Yann Fleuret, and Hélène Evain. After a successful exit from a previous startup (one I happened to know well when I was an IP lawyer in a previous life, but that's another story), they decided to co-found Numina as a non-profit to do open AI4Math. Numina wanted to act as a counterpoint to AI math efforts like DeepMind's, but in a much more open way, with the goal of advancing the use of AI in mathematics and making progress on hard, open problems. Along the way, they managed to recruit the help of some very impressive names in the AI+math world, like Guillaume Lample, co-founder of Mistral, and Stanislas Polu, who formerly pushed math models at OpenAI.
As Jia was participating in the code-model BigCode collaboration with some Hugging Face folks, the idea came up to collaborate and explore how well code models could be used for formal mathematics. For context, olympiad math problems are extremely hard, and the core of the issue is the battle plan you draft to tackle each problem. A first focus of Numina was thus on creating high-quality instruction Chain-of-Thought (CoT) data for competition-level mathematics. Such CoT data has already been used to train models like DeepSeek Math but is very rarely released, so this dataset became an invaluable resource for tackling the challenge.
BigCode's lead Leandro put Jia in touch with the team that trained the Zephyr models at Hugging Face, namely Lewis, Ed, Costa and Kashif, with additional help from Roman and Ben, and the goal became to have a go at training some strong models on the math and code data to tackle the first progress prize of AIMO.
And the training runs started. Jia, being an olympiad coach, was intimately familiar with the difficulty level of these competitions and able to curate a very strong internal validation set to enable model selection (Kaggle submissions are blind). While the dataset construction was being iterated on, Lewis and Ed from Hugging Face focused on training the models and building the inference pipeline for the Kaggle submissions.
As often in competitions, it was an intense journey, with Eureka and Aha moments pushing everyone further. Lewis told me about a couple of them, which totally blew my mind. A tech report is coming, so these are just some "along the way" nuggets that will soon be gathered in a much more comprehensive recipe and report.
Learning to code: The team's submission relied on self-consistency decoding (aka majority voting): generate N candidates per problem and pick the most common solution. But initial models trained on the Numina data only scored around 13/50... they needed a better approach. They then saw the MuMath-Code paper (arxiv.org/abs/2405.07551), which showed you can combine CoT data with code data to get strong models. Jia was able to generate great code execution data from GPT-4 to enable the training of the initial models and get an impressive boost in performance.
Taming the variance: Another Aha moment came when a Kaggle member shared a notebook showing how well DeepSeek models worked with code execution (the model breaks down the problem into steps, and each step is run in Python to reason about the next one). However, when the team tried this notebook they found the method had huge variance (the scores on Kaggle varied from 16/50 to 23/50). When meeting in Paris for a hackathon to fix this issue (like the HF team often does), Ed had the idea to frame the majority voting as a "tree of thoughts" where you'd progressively grow and prune a tree of candidate solutions (arxiv.org/abs/2305.10601). This had an impressive impact on the variance and enabled them to be much more confident in their submissions (which showed in how the model ended up performing extremely well on the test set versus the validation set).
Overcoming compute constraints: The Kaggle submissions had to run on 2xT4s in under 9h, which is really hard because FA2 doesn't work and you can't use bfloat16 either. The team explored quantization methods like AWQ and GPTQ, finding that 8-bit quantization of a 7B model with GPTQ was best.
Looking at the data: A large part of the focus was also on checking the GPT-4 datasets for quality (and fixing them), as they quickly discovered that GPT-4 was prone to hallucinations and to failing to correctly interpret the code output. Fixing data issues in the final week led to a significant boost in performance.
Final push: The results were really amazing and the model climbed to 1st place. Even more, while it was nearly tied for first place on the public validation leaderboard (28 solved challenges versus 27 for second place), it really shone when tested on the private test leaderboard, where it won by a wide margin, solving 29 challenges versus 22 for the second team. As Terence Tao himself put it, this is "higher than expected".
Maybe what's even more impressive about this competition, besides the level of math these models are already capable of, is how resource-constrained the participants actually were, having to run inference in a short amount of time on T4s, which only lets us imagine how powerful these models will become in the coming months. The time seems ripe for GenAI to have some impact in science, and it's probably one of the most exciting things AI will bring us in the coming 1-2 years: accelerating human development and tackling all the real-world problems science is able to tackle.
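The self-consistency decoding (majority voting) step mentioned in the thread can be sketched roughly as follows. This is a minimal illustration, not the team's actual pipeline: `majority_vote`, `self_consistency`, and the `sample_answer` callable are hypothetical names, and the sampler is assumed to return `None` when a generation yields no parsable answer.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(candidates: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the most common non-None answer, or None if there is none.

    Ties are broken in favour of the answer that appeared first.
    """
    valid = [c for c in candidates if c is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


def self_consistency(
    sample_answer: Callable[[str], Optional[Hashable]],  # hypothetical model call
    problem: str,
    n: int,
) -> Optional[Hashable]:
    """Draw n candidate answers for one problem and keep the most common one."""
    return majority_vote([sample_answer(problem) for _ in range(n)])


if __name__ == "__main__":
    # Stand-in sampler that replays a fixed list of answers instead of a model.
    fake_answers = iter([42, 7, 42, None, 13])
    print(self_consistency(lambda _problem: next(fake_answers), "toy problem", 5))
```

Voting only over final answers discards the reasoning traces, which is why a sampler with high variance still hurts: the vote can only be as stable as the distribution of answers it is drawn from.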

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·3 Jul

Ever wondered which method to use to infer kinase activities from phosphoproteomics data? 💻💭In collaboration with the Zhang lab @bcmhouston, we performed a comprehensive evaluation of kinase activity inference tools to help answer that question tinyurl.com/yc56axsv 🧵⬇️

Philipp Schäfer retweeted

Jovan Tanevski@tanevski·11 Jun

🚨 We are looking for a PhD student 🚨 to join the Tanevski Lab and @DenisSchapiro Lab at the Heidelberg University Hospital, working at the intersection of machine learning and spatial omics as part of the Translational Spatial Profiling Center. Please RT karriere.klinikum.uni-heidelberg.de/index.php?ac=j…

Philipp Schäfer retweeted

EMBL-EBI@emblebi·2 May

We welcome Julio Saez-Rodriguez @JulioSaezRod as our new Head of Research! Associate Director for EMBL-EBI Services Jo McEntyre @jomcentyre has been promoted to Deputy Director of EMBL-EBI & Rolf Apweiler is stepping back to Associate Director as he looks towards retirement.

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·13 Mar

Local heterogeneity of tissues can define their function and predict clinical outcomes. We introduce Kasumi 💻biorxiv.org/content/10.110… to identify spatially localized neighborhoods of intra and intercellular relationships from spatial omics persistent across samples & conditions

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·28 Feb

1) Are you interested in computational approaches to study the immune system 🔎🦠by integrating single-cell + spatial multi-omics data with prior knowledge? Check our review 📖out now in @NatImmunol nature.com/articles/s4159… 🧵

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·3 Nov

We are recruiting a PhD student on machine learning methods to analyze spatially resolved omics data in human disease within @ELLISforEurope #ELLISPhD programme

ELLIS@ELLISforEurope

The portal is open: Our #ELLISPhD Program is now accepting applications! Apply by November 15 to work with leading #AI labs across Europe and choose your advisors among 200 top #machinelearning researchers! #JoinELLISforEurope #PhD #PhDProgram #ML ellis.eu/news/ellis-phd…

Philipp Schäfer@psl_schaefer·15 Oct

@yun_s_song @k_mikulik

Yun S. Song@yun_s_song·14 Oct

We recently posted a preprint describing GPN-MSA, a DNA language model that leverages whole-genome alignments across multiple species while taking only a few hours to train. This thread summarizes its performance on the human genome. doi.org/10.1101/2023.1… 1/12

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·22 Aug

📣 Introducing LIANA+: an all-in-one cell-cell communication (CCC) framework 🧬 Many tools exist, yet they use various syntaxes and are tailored for a specific purpose LIANA+ harmonises and extends existing methods, enabling their synergistic applications tinyurl.com/lianaplus

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·25 Jul

Integrating single-cell & spatial omics with different resolutions is key to understand tissue structure & function 🧬🔬 We had tackled this using multi-objective optimization - we just updated our paper and implemented an #rstats package arxiv.org/abs/2301.01682

Philipp Schäfer retweeted

Saez-Rodriguez Group@saezlab·21 Jun

📄 Knowledge graphs (KGs) are a key technology in biomedicine, but can be hard to create/share/reuse. To make KGs as accessible as possible we developed the #opensource framework BioCypher: nature.com/articles/s4158…, now peer-reviewed @NatureBiotech. 🧵👇 1/

Philipp Schäfer retweeted

Kaessmann Lab@kaessmannlab·8 Dec

#Postdoc + #PhD positions available in our lab in a new endeavor - funded by the NOMIS Foundation and supported by @TreutleinLab, @GrayCampLab and Svante Pääbo - to unravel the evolutionary impact of de novo genes in primates! Details here: zmbh.uni-heidelberg.de/Kaessmann/open… Please RT!
