

Introducing OpenReward.

🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime

Link and thread below.



Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗 Full report: github.com/MoonshotAI/Att…
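A minimal PyTorch sketch of the core idea as the thread describes it: each layer's residual path becomes an input-dependent, attention-weighted mix over all preceding layers' outputs instead of a uniform sum. Names like `AttnResidualBlock` and `f` are illustrative, not the repo's API, and the Block AttnRes compression is omitted here:

```python
import torch
import torch.nn as nn

class AttnResidualBlock(nn.Module):
    """One layer whose residual stream attends over all earlier layers'
    outputs rather than accumulating them uniformly (illustrative sketch)."""
    def __init__(self, d_model: int):
        super().__init__()
        # stand-in for the layer's main transform (attention/MLP in practice)
        self.f = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.q = nn.Linear(d_model, d_model)  # query from the current state
        self.k = nn.Linear(d_model, d_model)  # keys from past layer outputs
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, history: list) -> torch.Tensor:
        # history: outputs of all preceding layers, each [batch, seq, d_model]
        h = torch.stack(history, dim=2)            # [batch, seq, depth, d_model]
        q = self.q(x).unsqueeze(2)                 # [batch, seq, 1, d_model]
        k = self.k(h)                              # [batch, seq, depth, d_model]
        # attention over the depth axis: which past layers to retrieve from
        w = torch.softmax((q * k).sum(-1) * self.scale, dim=-1)  # [batch, seq, depth]
        residual = (w.unsqueeze(-1) * h).sum(dim=2)  # input-dependent residual mix
        return residual + self.f(x)

# usage: keep a running list of layer outputs; each layer attends over it
layers = nn.ModuleList(AttnResidualBlock(64) for _ in range(4))
x = torch.randn(2, 16, 64)
history = [x]
for layer in layers:
    x = layer(x, history)
    history.append(x)
print(x.shape)  # torch.Size([2, 16, 64])
```

With uniform residuals the depth-wise weights are effectively all ones; here the softmax lets the network down-weight stale states, which is how attending over depth would mitigate the dilution the thread mentions.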





Out-of-context reasoning is one of the most fascinating developments in the science of how LLMs work. This primer by @OwainEvans_UK, one of the main discoverers of the phenomenon, is a great introduction.


1 million token context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.


I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period -- and it's not too shabby at pushing back the LIMITs of retrieval either...


After almost 10 years of near nonstop grind, I’m taking 2 months of paternity leave to support my hero of a wife and welcome our twin daughters. @huggingface is in great hands with the team and @julien_c acting as interim CEO. Hope to return a changed man, to an even stronger HF, and to an AI field that’s more open and collaborative than ever!


i open-sourced autokernel -- autoresearch for GPU kernels

you give it any pytorch model. it profiles the model, finds the bottleneck kernels, writes triton replacements, and runs experiments overnight. edit one file, benchmark, keep or revert, repeat forever. same loop as @karpathy autoresearch, applied to kernel optimization

95 experiments. 18 TFLOPS → 187 TFLOPS. 1.31x vs cuBLAS. all autonomous

9 kernel types (matmul, flash attention, fused mlp, layernorm, rmsnorm, softmax, rope, cross entropy, reduce). amdahl's law decides what to optimize next. 5-stage correctness checks before any speedup counts

the agent reads program.md (the "research org code"), edits kernel.py, runs bench.py, and either keeps or reverts. ~40 experiments/hour. ~320 overnight

ships with self-contained GPT-2, LLaMA, and BERT definitions so you don't need the transformers library to get started

github.com/RightNow-AI/au…
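The keep-or-revert gate at the heart of that loop can be sketched roughly like this. `bench` and `keep_or_revert` are illustrative names, not the repo's actual bench.py, and a single allclose check stands in for its 5-stage correctness pipeline:

```python
import time
import torch

def _sync():
    # wait for pending GPU work so timings are honest (no-op on CPU)
    if torch.cuda.is_available():
        torch.cuda.synchronize()

def bench(fn, *args, warmup: int = 10, iters: int = 50) -> float:
    """Median wall-clock seconds per call (illustrative harness)."""
    for _ in range(warmup):
        fn(*args)
    _sync()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        _sync()
        times.append(time.perf_counter() - t0)
    return sorted(times)[iters // 2]

def keep_or_revert(baseline, candidate, *args, atol: float = 1e-2):
    """Gate a candidate kernel: it must match the baseline numerically
    before any speedup counts, then beat it on the clock; else revert."""
    if not torch.allclose(baseline(*args), candidate(*args), atol=atol):
        return baseline, "revert: output mismatch"
    t_base, t_cand = bench(baseline, *args), bench(candidate, *args)
    if t_cand < t_base:
        return candidate, f"keep: {t_base / t_cand:.2f}x speedup"
    return baseline, "revert: slower"

# usage: compare a reference matmul against a would-be replacement
a, b = torch.randn(256, 256), torch.randn(256, 256)
winner, verdict = keep_or_revert(torch.matmul, lambda x, y: x @ y, a, b)
print(verdict)
```

Checking correctness before timing matters here: an autonomous agent editing kernel.py overnight will happily "speed up" a kernel by breaking it unless wrong outputs are rejected before the benchmark ever runs.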
