Benjamin Warner
@benjamin_warner
618 posts

Research @SophontAI. Previously answerdotai. Vaccines save lives.

Joined September 2011
484 Following · 2.4K Followers
Pinned Tweet
Benjamin Warner @benjamin_warner
Today we released ModernBERT, the first encoder to reach SOTA on most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times faster than NomicBERT & GTE on long context.
2 replies · 12 reposts · 80 likes · 11.3K views
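ModernBERT ships as standard Hugging Face checkpoints, so trying it is a few lines. A minimal sketch, assuming the base checkpoint is published as answerdotai/ModernBERT-base on the Hub and a transformers release recent enough to include the ModernBERT architecture:

```python
from transformers import pipeline

# Hedged sketch: masked-token prediction with ModernBERT. Assumes the
# checkpoint id answerdotai/ModernBERT-base and a transformers version
# recent enough to include the ModernBERT architecture.
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for pred in fill("Encoders like ModernBERT are a great fit for [MASK] tasks."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```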
Benjamin Warner @benjamin_warner
@waydegilliam Almost everything other than UX. It's better at GPU kernel writing, debugging, you name it. GPT-5.4 is better than 5.3 at inferring intent, but Codex is still more literal than Claude, so if you don't change how you prompt Codex, it'll probably seem worse than CC.
0 replies · 0 reposts · 1 like · 52 views
Benjamin Warner retweeted
Labomen @labomen001
@cursor_ai Here's the graph with the same data, but plotted against the actual output cost for each (Composer 1.5 output is $17.5 per the Cursor docs). This doesn't account for the long-context pricing tiers, though: Opus 4.6 >200K, GPT 5.4 >272K, Gemini 3.1 >200K.
4 replies · 10 reposts · 113 likes · 28.3K views
Benjamin Warner retweeted
Ben Clavié @bclavie
I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period -- and it's not too shabby at pushing back the LIMITs of retrieval either...
Mixedbread @mixedbreadai

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.

37 replies · 41 reposts · 410 likes · 138.3K views
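The announcement doesn't give a Hub path or API, so the snippet below is only a sketch of how a Sentence Transformers-compatible release of Wholembed v3 would be queried; the mixedbread-ai/wholembed-v3 id is a hypothetical placeholder.

```python
from sentence_transformers import SentenceTransformer

# The model id below is a hypothetical placeholder -- swap in whatever
# path Mixedbread actually publishes for Wholembed v3.
model = SentenceTransformer("mixedbread-ai/wholembed-v3")
docs = ["Quarterly revenue report (PDF).", "Podcast episode on encoders."]
query_emb = model.encode("financial results")
doc_embs = model.encode(docs)
print(model.similarity(query_emb, doc_embs))  # cosine similarity by default
```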
Benjamin Warner retweeted
Benjamin Warner @benjamin_warner
GPT 5.4 in Codex needs to be able to delegate simpler coding work to 5.3-Spark.
0 replies · 0 reposts · 1 like · 199 views
Mario Filho @mariofilhoml
My timeline is split between people who believe 5.4 XHigh is the best thing ever, and people who believe 5.4 High is enough. A minority don't like 5.4 at all.
1 reply · 0 reposts · 0 likes · 174 views
Benjamin Warner retweeted
PyTorch @PyTorch
FlexAttention now has a FlashAttention-4 backend. FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000+ repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now. We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads. 🖇️ Read our latest blog here: hubs.la/Q045FHPh0 No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI
12 replies · 98 reposts · 731 likes · 99.6K views
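The programming model the post refers to is unchanged: you express the attention variant as a score_mod and let the compiler fuse it, now (per the post) into a FlashAttention-4 kernel on Hopper/Blackwell. A minimal sketch of a relative-position-bias variant using the public torch.nn.attention.flex_attention API; how the new backend is selected is covered in the linked blog, so nothing backend-specific is assumed here.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# A custom attention variant expressed as a score_mod: add a relative-
# position bias to every query/key score. FlexAttention fuses this into
# the generated kernel instead of materializing a bias matrix.
def relative_bias(score, b, h, q_idx, kv_idx):
    return score + (q_idx - kv_idx)

q, k, v = (
    torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
    for _ in range(3)
)
out = torch.compile(flex_attention)(q, k, v, score_mod=relative_bias)
```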
Shizhe Diao @shizhediao
Time to upgrade your pretraining dataset. Instead of FineWeb-EDU / DCLM / X, try ClimbMix-400B. 📄 Paper: arxiv.org/pdf/2504.13161 📦 Data: huggingface.co/datasets/nvidi… CLIMBMix uses a clustering-based iterative data mixture to improve pretraining efficiency and data quality. Would love to see the community experiment with it and push it further 🚀
Shizhe Diao @shizhediao

Nemotron-CLIMBMix is becoming the default recipe in the nanochat speedrun. During the Time-to-GPT-2 Leaderboard experiments started by @karpathy, the community revisited CLIMBMix and found that it delivers by far the single biggest improvement to nanochat’s GPT-2 speedrun time. It’s incredibly rewarding to see the idea validated and adopted by the community. Huge thanks to everyone who experimented with it and pushed it forward 🚀 github.com/karpathy/nanoc…

3 replies · 15 reposts · 165 likes · 27K views
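The dataset link in the tweet is truncated; assuming it resolves to a standard datasets repo under the nvidia org (the id below is a guess, adjust as needed), sampling it without downloading 400B tokens looks like:

```python
from datasets import load_dataset

# The tweet's Hub link is truncated, so the dataset id here is a guess;
# point it at the actual repo under the nvidia org.
ds = load_dataset("nvidia/ClimbMix", split="train", streaming=True)
for row in ds.take(2):  # stream a couple of rows, no full download
    print(row)
```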
Benjamin Warner @benjamin_warner
New optimizer paper and library that looks to be an improvement on the low-precision error correction I've used in optimi and on the quantization used for optimizers in bitsandbytes et al.
Davis Blalock @davisblalock

🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc., that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 arxiv.org/abs/2602.23349 A bunch of cool ideas make this possible: [1/n]

1 reply · 1 repost · 2 likes · 368 views
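For context on the baseline Benjamin is comparing against: optimi's low-precision error correction is Kahan summation over the weight update, keeping a small compensation buffer that re-injects the bits bfloat16 rounding throws away. A minimal sketch of that idea with plain SGD (not FlashOptim's method, whose API isn't shown beyond `pip install flashoptim`):

```python
import torch

# Kahan-summation error correction for a bfloat16 SGD step: `comp` keeps
# the low-order bits each rounded parameter update loses and re-injects
# them on the next step, so tiny updates aren't silently dropped.
@torch.no_grad()
def kahan_sgd_step(param, grad, comp, lr=0.1):
    update = grad.mul(-lr).add_(comp)         # include previously lost bits
    new_param = param + update                # bf16 addition rounds here
    comp.copy_(update - (new_param - param))  # capture what rounding dropped
    param.copy_(new_param)

p = torch.zeros(4, dtype=torch.bfloat16)
c = torch.zeros_like(p)                       # compensation buffer
for _ in range(1000):
    kahan_sgd_step(p, torch.full_like(p, 1e-3), c)
print(p)  # ≈ -0.1; plain bf16 accumulation stalls well short of this
```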
Benjamin Warner @benjamin_warner
@wightmanr @BlancheMinerva Given the referenced attack was with a previous generation of Claude and the best open models are approaching (or have already reached) that level of performance, I don't see how one can come to any conclusion other than it's just a matter of time.
0 replies · 0 reposts · 1 like · 56 views
Ross Wightman @wightmanr
There are already a number of pen-test-oriented LLM attack orchestration demos/tools, like HexStrike (open source) and Cobalt Strike + LLM. And aside from being used to orchestrate attacks, LLM coding tools are definitely at a level where they can help build attack tools, especially for those who may have been lacking in coding ability but not imagination.
2 replies · 0 reposts · 12 likes · 1.2K views
Stella Biderman @BlancheMinerva
It's very common for people to claim that open LLMs will be used to commit cyber attacks at massive scale. What public evidence is there for this claim? The best (and one of the only) accounts I've seen of a cyber LLM attack was done using Claude anthropic.com/news/disruptin…
10 replies · 3 reposts · 38 likes · 6.5K views
Benjamin Warner retweeted
Alexis Gallagher @alexisgallagher
I am thrilled and honored that Sparky and I were selected winners for NVIDIA GTC Golden Ticket. Here's how he received the news.
NVIDIA GTC @NVIDIAGTC

Congratulations to our #NVIDIAGTC Golden Ticket winners 🎉: @alexisgallagher Brandon I. Hans B. Julia S. Lluís D. Marco D. Tarique S. You’re headed to GTC! We’ll be reaching out soon with next steps to claim your prize. Thank you to our partners for collaborating with NVIDIA on the 2026 Golden Ticket Developer Contest: @huggingface / @pollenrobotics, @ollama, @ethroboticsclub, and @googlecloud. Stay tuned for one more winner reveal 👀

13 replies · 14 reposts · 106 likes · 14.7K views