Hen Davidov

140 posts

@hendav136

Rhodes Scholar, StatML CDT @UniOfOxford | @Forbes 30U30 2025 | Ex- @AmazonScience , @TechnionLive

Joined March 2022

407 Following · 226 Followers

Pinned Tweet

Hen Davidov@hendav136·6 May

Recently accepted to ICML 2026: Knowing when to quit. We refuse to complete failing LLM generations as they unfold, token by token, instead of judging the prompt up front or scanning the final output. The method utilizes a 2-layer probe to estimate expected correctness.

English

14.1K
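A minimal sketch of the idea in the pinned tweet, not the paper's implementation: a small 2-layer probe reads a per-token hidden state, outputs an estimated probability that the generation will end up correct, and decoding stops once that estimate falls below a threshold. The names `probe_score` and `generate_with_early_refusal`, the ReLU/sigmoid choice, the threshold rule, and the toy weights are all assumptions made for illustration.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple


def probe_score(
    h: Sequence[float],
    w1: Sequence[Sequence[float]],
    b1: Sequence[float],
    w2: Sequence[float],
    b2: float,
) -> float:
    """Hypothetical 2-layer probe: ReLU hidden layer, then a sigmoid output."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * a for w, a in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def generate_with_early_refusal(
    steps: Iterable[Tuple[str, Sequence[float]]],
    score_fn: Callable[[Sequence[float]], float],
    threshold: float,
) -> Tuple[List[str], bool]:
    """Consume (token, hidden_state) pairs; stop when the estimate drops too low.

    Returns the tokens kept so far and a flag saying whether we refused early.
    """
    tokens: List[str] = []
    for token, hidden_state in steps:
        if score_fn(hidden_state) < threshold:
            return tokens, True  # refuse to continue this generation
        tokens.append(token)
    return tokens, False


# Toy weights and hidden states, chosen by hand purely for the example.
W1 = [[1.0, 0.0], [0.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [1.0, -1.0]
B2 = 0.0

toy_steps = [("a", [2.0, 0.0]), ("b", [1.0, 0.0]), ("c", [0.0, 2.0]), ("d", [2.0, 0.0])]
kept, refused = generate_with_early_refusal(
    toy_steps, lambda h: probe_score(h, W1, B1, W2, B2), threshold=0.5
)
```

In a real setup the hidden states would come from the language model during decoding and the probe weights would be trained against verified correctness labels; both are stubbed out here.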

Hen Davidov@hendav136·13h

@JoshPurtell Wait but if they use the value function (critic) trained on each completed trajectory verified reward, what difference does it make the split into subtraces? Am I missing a prm here?

English

Josh@JoshPurtell·1d

Valhalla! OSS is liberated from the GRPO grift at last No hate to DeepSeek for innovating on it as a solid baseline for math. But, good riddance

Z.ai@Zai_org

Introducing GLM-5.2: Frontier Intelligence, Open Weights - Significant improvements in coding and agentic tasks - Strong long-horizon capabilities with a 1M context window - Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency - MIT-licensed open weights - Same API pricing as GLM-5.1 Tech Blog: z.ai/blog/glm-5.2 Weights: huggingface.co/zai-org/GLM-5.2 API: docs.z.ai/guides/llm/glm… Coding Plan: z.ai/subscribe Chat: chat.z.ai

English

287

60.2K

Hen Davidov@hendav136·1d

@saranormous See ya there!

English

Hen Davidov@hendav136·2d

@yoavgo A bit lower than the chance of a pardon

Hebrew

248

(((ل()(ل() 'yoav))))👾@yoavgo·2d

What are the chances he resigns?

Hebrew

6.1K

Hen Davidov@hendav136·6d

@ReddyAarya36823 @noclearnogga Lol dope

English

Hen Davidov@hendav136·8 Jun

Hen, first year PhD, pretraining an LLM with academia compute for the first time. Wish me luck.

English

301

44.2K

Hen Davidov@hendav136·6d

@redtachyon The Fable nerf came right after the first tweet

English

Ariel@redtachyon·11 Jun

@hendav136 What?

English

1.1K

Hen Davidov retweeted

Hen Davidov@hendav136·10 Jun

Someone at anthropic saw my tweet I guess? Can't explain the timing any other way

Hen Davidov@hendav136

Hen, first year PhD, pretraining an LLM with academia compute for the first time. Wish me luck.

English

15K

Hen Davidov@hendav136·9 Jun

@ThakurAryancha4 Best of luck with that!

English

186

the_king_in_yellow@ThakurAryancha4·9 Jun

@hendav136 Good luck dude I am also currently doing patent research in Ai specially edge ai

English

136

Hen Davidov@hendav136·9 Jun

@ThakurAryancha4 Starting with a tiny gpt3-esque 100m param, and building up on the positive signals

English

709

the_king_in_yellow@ThakurAryancha4·9 Jun

@hendav136 Which base model ?

English

686

Hen Davidov@hendav136·9 Jun

@noclearnogga Let's start by changing your profile pic from a dude with a swastika on his forehead?

English

354

Noclear Nogga@noclearnogga·9 Jun

@hendav136 bro, im studying Physics but considering switching to CS, which i love, but I'm worried AI may automate much of what a CS degree teaches. My interests are AI theory, robotics and maybe quantum computing. If you were starting today, what would you do? I'd like working in deep tech

English

442

Hen Davidov@hendav136·9 Jun

@mesken_stefan Thank you!!

English

727

Stefan Mesken@mesken_stefan·9 Jun

@hendav136 You got this 💪

English

797

Hen Davidov@hendav136·9 Jun

@ruth_rallph Thank you!

English

447

Ruth Ralph@ruth_rallph·9 Jun

@hendav136 All the best Hen!

English

500

Hen Davidov@hendav136·9 Jun

@JerrryKun I have a transformers arch bet in mind, hoping to make a dent in this possibly saturated line of work

English

1.3K

Zhikun Xu@JerrryKun·9 Jun

@hendav136 🫡 (pretraining for what?

English

1.4K

Hen Davidov@hendav136·9 Jun

@gradascetic Less than you'd expect. Compute is kinda siloed off in different labs. I mainly use my lab's gpus

English

1.9K

edward@gradascetic·9 Jun

@hendav136 How much compute does ox stats have now?

English

Hen Davidov@hendav136·9 Jun

@ZacharyBamberg1 Thank you! Right now I'm just working off of a llama style recipe, iterating on my arch bet

English

Zachary Bamberger@ZacharyBamberg1·8 Jun

@hendav136 Happy to compare notes if you’re feeling confident nonetheless 😅 (It helps to pre-train decoder-only models with existing kernels — avoid pre-training encoder-decoders like the plague)

English

Hen Davidov@hendav136·9 Jun

@yoav_gelberg I'm not in the kindergarten bro lets be real

English

1.1K

Yoav Gelberg@yoav_gelberg·8 Jun

@hendav136 Don’t forget to qk norm

English

1.4K

Hen Davidov@hendav136·5 Jun

@Besteuler Thank you!

English

Weiyang Liu@Besteuler·4 Jun

@hendav136 You can check Appendix C in our paper. The actual cost depends on whether you use 2nd momentum in Pion. Without it, Pion will be more efficient than both Muon and AdamW.

English

Weiyang Liu@Besteuler·13 May

🚀 We are very excited to introduce Pion, a spectrum-preserving optimizer for LLM training. Pion shows strong training stability in practice. This is the project that we have been working on since POET. The central idea is to turn POET/POET-X (spherelab.ai/poet; spherelab.ai/poetx) into an easy-to-use optimizer. Instead of additive updates like Adam & Muon, Pion takes a different route by updating each weight matrix via coupled left & right orthogonal transformations, keeping its weight spectrum stable throughout training. This different update mechanism is directly inspired by the empirical effectiveness of Orthogonal Finetuning (oft.wyliu.com; boft.wyliu.com) and POET/POET-X, with roots tracing back to minimum-energy training methods such as MHE (arxiv.org/abs/1805.09298) and OPT (opt-training.github.io). Pion's key features: ✨ Competitive on LLM pretraining, SFT & RLVR ✨ μP-compatible by construction ✨ Stably trains ultra-deep LLMs and even normalization-free LLMs, where AdamW & Muon diverge 🌐 Project: spherelab.ai/pion 📜 Paper: arxiv.org/abs/2605.12492

English

120

14.7K
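The spectrum claim in the Pion announcement above rests on a standard linear-algebra fact: multiplying a matrix by orthogonal matrices on the left and right leaves its singular values unchanged, while an additive update generally does not. The sketch below only checks that fact on a 2x2 example using plane rotations as the orthogonal factors. It is not Pion's update rule, and the helper names (`orthogonal_update`, `additive_update`, `singular_values_2x2`) are made up for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rotation(theta: float) -> Matrix:
    """2x2 rotation matrix, which is orthogonal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def singular_values_2x2(w: Matrix) -> Tuple[float, float]:
    """Closed form from s1^2 + s2^2 = ||W||_F^2 and s1 * s2 = |det W|."""
    frob2 = sum(x * x for row in w for x in row)
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    s1 = math.sqrt((frob2 + disc) / 2.0)
    s2 = math.sqrt(max((frob2 - disc) / 2.0, 0.0))
    return s1, s2


def orthogonal_update(w: Matrix, theta_left: float, theta_right: float) -> Matrix:
    """Illustrative multiplicative update W -> R(theta_left) W R(theta_right)."""
    return matmul(matmul(rotation(theta_left), w), rotation(theta_right))


def additive_update(w: Matrix, delta: Matrix) -> Matrix:
    """Illustrative additive update W -> W + delta."""
    return [[w[i][j] + delta[i][j] for j in range(2)] for i in range(2)]


W = [[3.0, 1.0], [0.5, 2.0]]
before = singular_values_2x2(W)
after_orthogonal = singular_values_2x2(orthogonal_update(W, 0.3, -1.1))
after_additive = singular_values_2x2(additive_update(W, [[0.1, 0.0], [0.0, 0.1]]))
```

Here `after_orthogonal` matches `before` up to floating-point error, while `after_additive` differs. How Pion chooses its orthogonal factors is described in the linked paper, not here.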

Hen Davidov@hendav136·1 Jun

God I love science sometimes

Eric Topol@EricTopol

There hasn't previously been a treatment vs pancreatic cancer this successful. Striking improved (a > doubling) survival results @NEJM and @ASCO today with daraxonrasib, which also became available via an FDA approved early access program and began shipping to physicians this week @RevMedicines nejm.org/doi/full/10.10…

English

1.1K
