Joe Davison

2.1K posts

@joeddav

AI Research & Eng. @huggingface alumn.

Lehi, UT 🏔 Joined January 2009
1.1K Following · 3K Followers
Joe Davison retweeted
Nathan Lambert @natolambert
Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released:
- Welcome video
- Lecture 1: Overview of RLHF & Post-training
- Lecture 2: IFT, Reward Models, Rejection Sampling
- Lecture 3: RL Math
- Lecture 4: RL Implementation

I'm going to add question & answer videos throughout the lecture to go deeper on topics that need it, and potentially cover some topics that are too recent and in flux to go in print. I expect 10-15 videos in total over the next few months.

At the same time, development around the code for the book is picking up. It's a great time to build the foundation for post-training methods. YT playlist and course landing page below.
Replies: 31 · Retweets: 80 · Likes: 613 · Views: 36.1K
Joe Davison retweeted
clem 🤗 @ClementDelangue
Introducing Kernels on the Hugging Face Hub ✨

What if shipping a GPU kernel was as easy as pushing a model?
- Pre-compiled for your exact GPU, PyTorch & OS
- Multiple kernel versions coexist in one process
- torch.compile compatible
- 1.7x–2.5x speedups over PyTorch baselines
Replies: 46 · Retweets: 122 · Likes: 906 · Views: 58.9K
Joe Davison retweeted
Adam Zweiger @AdamZweiger
We introduce a new approach for fast and high-quality context compaction in latent space. Attention Matching (AM) achieves 50× compaction in seconds with little performance loss, substantially outperforming summarization and other baselines.
Replies: 23 · Retweets: 147 · Likes: 942 · Views: 128.9K
Joe Davison retweeted
Nathan Lambert @natolambert
Really excited for the things @rosstaylor90 has been building to go out into the world. He's been one of the people I can always rely on to have non-cope takes on what we need to do to make the open ecosystem great. What a great time in RL.
General Reasoning @GenReasoning

Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.

Replies: 7 · Retweets: 18 · Likes: 197 · Views: 27.6K
Joe Davison retweeted
Benjamin Todd @ben_j_todd
Opus 4.6 is hugely better at Pokemon:
• Opus 4.0 took 1,000 hours to get half way through
• Opus 4.5 could almost finish in 1,000 hours
• Opus 4.6 was another 10x faster!
Replies: 28 · Retweets: 51 · Likes: 766 · Views: 123.6K
Joe Davison retweeted
Perturb.ai @perturbai_tx
Introducing PerturbAI.

Today we announced our emergence from stealth with the release of the world’s largest in vivo CRISPR data engine, interrogating the effects of thousands of genetic perturbations across 8 million cells throughout the whole brain.

This dataset represents a new category of biological data: organism-level, circuit-resolved causal genomics leading to novel targets and therapeutics. By combining scalable in vivo CRISPR perturbation with AI, we model biological systems at unprecedented resolution and simulate therapeutic interventions before committing to expensive downstream development.

We’re grateful to our collaborators at @NVIDIAHealth and @10xGenomics for helping make this landmark dataset possible.

Read More: perturb.ai/news

#CRISPR #AI #DrugDiscovery #FunctionalGenomics #Biotech
Replies: 7 · Retweets: 77 · Likes: 364 · Views: 60.8K
Joe Davison retweeted
Kimi.ai @Kimi_Moonshot
Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗 Full report: github.com/MoonshotAI/Att…
Replies: 334 · Retweets: 2K · Likes: 13.5K · Views: 5M
Joe Davison retweeted
Luca Soldaini 🎀 @soldni
wait why Opus got a blog but not Golden Gate Claude
Replies: 1 · Retweets: 2 · Likes: 13 · Views: 2K
Joe Davison @joeddav
@natolambert What’s your read on that Sonnet 4.5->4.6 leap? Genuine long context capability leap, narrow benchmark-specific phase change, or something else?
Replies: 0 · Retweets: 0 · Likes: 0 · Views: 135
Joe Davison retweeted
Nathan Lambert @natolambert
World will converge on 3 types of models
1. Closed frontier (Ant, OAI, Gemini)
2. Open frontier (2-3 labs, much consolidation coming)
3. Open small / tool (fairly empty now)

The open frontier will be far from the closed frontier, but way cheaper. Other statements are cope.
Replies: 43 · Retweets: 31 · Likes: 482 · Views: 36.6K
Joe Davison retweeted
Omar Khattab @lateinteraction
The Gemini Embedding 2 baseline here is.. 2 days old. Was just being celebrated and is now outperformed by a median of 14% and up to 91 points. If I didn't kind of know how powerful scaling ColBERTs and ColPalis can be compared to a single-vector model, I'd be in disbelief!
Ben Clavié @bclavie

I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period -- and it's not too shabby at pushing back the LIMITs of retrieval either...

Replies: 13 · Retweets: 46 · Likes: 692 · Views: 91.2K
clem 🤗 @ClementDelangue
I’m back. The girls and their superhero of a mom are doing great 😍😍😍 What did I miss?
clem 🤗 @ClementDelangue

After almost 10 years of near nonstop grind, I’m taking 2 months of paternity leave to support my hero of a wife and welcome our twin daughters. @huggingface is in great hands with the team and @julien_c acting as interim CEO. Hope to return a changed man, to an even stronger HF, and to an AI field that’s more open and collaborative than ever!

Replies: 99 · Retweets: 8 · Likes: 1.1K · Views: 103.8K
Joe Davison retweeted
AT @AliesTaha
was skeptical but gave it a shot because @karpathy
anyways
2x kernel perf (fp4 matmul)
3 minutes of work (1 prompt)
triton beat cutlass (?!)
Jaber @Akashi203

i open-sourced autokernel -- autoresearch for GPU kernels

you give it any pytorch model. it profiles the model, finds the bottleneck kernels, writes triton replacements, and runs experiments overnight. edit one file, benchmark, keep or revert, repeat forever. same loop as @karpathy autoresearch, applied to kernel optimization

95 experiments. 18 TFLOPS → 187 TFLOPS. 1.31x vs cuBLAS. all autonomous

9 kernel types (matmul, flash attention, fused mlp, layernorm, rmsnorm, softmax, rope, cross entropy, reduce). amdahl's law decides what to optimize next. 5-stage correctness checks before any speedup counts

the agent reads program.md (the "research org code"), edits kernel.py, runs bench.py, and either keeps or reverts. ~40 experiments/hour. ~320 overnight

ships with self-contained GPT-2, LLaMA, and BERT definitions so you don't need the transformers library to get started

github.com/RightNow-AI/au…

Replies: 13 · Retweets: 12 · Likes: 244 · Views: 39.6K
Joe Davison retweeted
Will Knight @willknight
Scoop from me: Nvidia will spend a total of $26 billion over the next five years building the world's best open source models. America is back in the open source AI race! wired.com/story/nvidia-i…
Replies: 82 · Retweets: 224 · Likes: 1.9K · Views: 689.4K
Joe Davison retweeted
Florian Brand @xeophon
time to look at your data, anon
Replies: 21 · Retweets: 10 · Likes: 224 · Views: 21.8K