Nathan Brown

733 posts

@OxxoTweets

Applied Scientist @ Microsoft; multilingual LLMs and other shenanigans; Masters grad @ Clemson; Probably staring at wandb logs; DMs open

/scratch · Joined June 2021

814 Following · 124 Followers

Pinned Tweet

Nathan Brown@OxxoTweets·23 Jan

I'm extremely proud to announce that our work on training the first open-source LLMs for Setswana, developed alongside @vukosi and the @DSFSI_Research, has been officially accepted to #NAACL2025 main! 🎉

1.2K

Nathan Brown retweeted

Claude@claudeai·4d

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

14.5K

104.6K

55.6M

Nathan Brown@OxxoTweets·5d

Another point potentially less discussed: like facial and other body expressions, hands convey meaning. A photo of someone with their hands relaxed, clenched, waving, performing ASL, pointing etc typically provides additional info that is more subtle than faces. Even if you get the hands+fingers anatomically correct, matching them with the rest of the image is difficult even for human artists! It’s already hard to learn the meaning conveyed in facial emotions, but at least they’re typically within view (often with horizontal symmetry). I’d imagine it’s quite tough for models to learn how to match hands with the emotions+actions conveyed by the rest of the body, plus the anatomical structure due to their occluded geometry.

rohan anil@_arohan_·6d

I would be interested in reading a scientific paper ablating reasons for how AI got hands/fingers right.

Mohammad Norouzi@mo_norouzi

Remember when AI couldn't spell text or draw a hand with five fingers? We've come a long way.

24.6K

Nathan Brown@OxxoTweets·5d

@soldni IT’S FUN and it prevents so many headaches later on 🥲

Luca Soldaini 🎀@soldni·6d

the most talented people I ever worked with OBSESSIVELY look at the data. literally #1 skill. steer clear of those who don’t…

Gergely Orosz@GergelyOrosz

Just learned: Software engineers used to do manual data labeling at Scale AI while Alex Wang was CEO. After he left, new leadership joined, and were HORRIFIED to learn this. Stopped it ASAP. Now at Meta, software engineers are assigned manual data labeling... see the pattern?

918

121.8K

Nathan Brown@OxxoTweets·5 Jun

@mayaofspring Tried this with ChatGPT - 7/10 of the generated results were 3Blue1Brown videos, 1 of which being unlisted :)

847

Maya ☁️➡️🌸@mayaofspring·4 Jun

An underexplored field: party games for LLMs. Human party games are too easy. Naming animals, shiritori, all trivial when one has read everything. Entertain your Claude by getting it to produce Youtube URLs instead

171

13K

Nathan Brown retweeted

Jasper Lu@lu__jasper·2 Jun

Pretty refreshing for foundation model training in 2026. I wonder how much of the capability gap between their model and other frontier models is because of this constraint. from: microsoft.ai/news/building-…

109

12.9K

Nathan Brown@OxxoTweets·29 May

38T tokens, 8B MoE 🗣️🗣️

Liquid AI@liquidai

Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases.
> 8B MoE, 1.5B active
> Expanded 128K context
> LFM2.5 flagship hybrid MoE architecture
> Trained on 38T tokens + large-scale RL
> fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size
> customizable on a single GPU for any specialized task
> LFM2 open-weight license
🧵

791

Nathan Brown@OxxoTweets·7 May

What the

Flapping Airplanes@flappyairplanes

(4/5) One thing we’ve built is a “kittens” virtual machine that takes over the whole GPU and allows new kinds of co-optimization. We can go past the traditional sequential kernel model – for example, fusing entire training runs into a single kernel and even weirder stuff.

Nathan Brown retweeted

Lee Sharkey@leedsharkey·5 May

My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)

191

1.5K

242.9K

Nathan Brown@OxxoTweets·23 Apr

@gabriberton Feels like something you could measure - diversity of outputs given “pelican on a bike” versus some other OOD SVG gen tasks. If pelican yields a tighter cluster of generated tokens, then it’s seemingly benchmaxxed (sad if true). Maybe a fun weekend experiment
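
One rough way to run that kind of experiment, as a minimal sketch: sample several SVG generations per prompt and compare the mean pairwise Jaccard distance between their token sets. The whitespace tokenizer, the function names, and the choice of Jaccard distance are all assumptions made for illustration; the thread does not specify a metric. A noticeably lower score for “pelican on a bike” than for other OOD prompts would hint at a tighter cluster.

```python
from itertools import combinations


def tokenize(sample: str) -> set[str]:
    """Hypothetical tokenizer: split on whitespace and keep unique tokens."""
    return set(sample.split())


def mean_pairwise_jaccard_distance(samples: list[str]) -> float:
    """Average Jaccard distance over all pairs of samples.

    0.0 means every sample has the same token set (a tight cluster);
    1.0 means no two samples share a token (maximally diverse).
    """
    token_sets = [tokenize(s) for s in samples]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        # Fewer than two samples: no diversity can be measured.
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        similarity = len(a & b) / len(union) if union else 1.0
        total += 1.0 - similarity
    return total / len(pairs)
```

In practice one would call `mean_pairwise_jaccard_distance` once per prompt on the same number of samples from the same model, then compare the scores across prompts. Embedding-based distances would likely be more robust than raw token overlap.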

Gabriele Berton@gabriberton·23 Apr

Chat is this still a good metric? My guess is that Goodhart's Law applies here as well, and some labs train their model for "pelican on a bike" generation

Simon Willison@simonw

Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right

6.1K

Nathan Brown retweeted

Qwen@Alibaba_Qwen·22 Apr

🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇 What's new: 🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks 💡 Strong reasoning across text & multimodal tasks 🔄 Supports thinking & non-thinking modes ✅ Apache 2.0 — fully open, fully yours Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀 🔗👇 Blog: qwen.ai/blog?id=qwen3.… Qwen Studio: chat.qwen.ai/?models=qwen3.… Github: github.com/QwenLM/Qwen3.6 Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2… ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…

538

1.7K

12.5K

3.7M

Nathan Brown retweeted

Alex Krusz ➡️ vibecamp!@AlexKrusz·11 Apr

619

9.5K

261.2K

Nathan Brown retweeted

Alexandr Wang@alexandr_wang·8 Apr

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

744

1.2K

10.4K

4.6M

Nathan Brown@OxxoTweets·7 Apr

@max_paperclips Def felt a bit fishy given the sudden hype, unfortunate

Shannon Sands@max_paperclips·7 Apr

yep

Latent Node@latent_node

🧵 MemPalace claims to be "the highest-scoring AI memory system ever benchmarked" I cloned it. Installed it. Ran the benchmarks. Read every line of code. Here's what's actually inside. A thread.

7.9K

Nathan Brown retweeted

Liquid AI@liquidai·31 Mar

Trained on 28T tokens with scaled RL, LFM2.5-350M is a step change from LFM2-350M:
> instruction following: 18.20 → 40.69
> data extraction: 11.67 → 32.45
> tool use: 22.95 → 44.11
These are the capabilities that matter in production.

477

230.2K

Nathan Brown retweeted

Chris 🇨🇦@llm_wizard·27 Mar

Watching everyone say goodbye to @allen_ai is soulcrushing. Please, make it stop.

116

20.9K

Nathan Brown@OxxoTweets·27 Mar

@soldni @allen_ai Looking forward to what’s next! Take some time to enjoy the Seattle sun while it’s out :)

168

Luca Soldaini 🎀@soldni·27 Mar

After 4yrs, today is my last day at @allen_ai It was an honor to work on Olmo, Dolma, olmOCR, Tulu, Molmo & other fully-open artifacts 🫡 Reception has been amazing & their adoption makes me SO PROUD 🥹 Team is super committed to open recipes; can't wait to see what's next!!!!

581

32.9K

Nathan Brown@OxxoTweets·12 Mar

5.4 loves “gremlin”, “goblin”, and “wrestling” keeping pangram on their toes i see

Nathan Brown retweeted

Tri Dao@tri_dao·5 Mar

The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now crazy fast that attn fwd is bottlenecked by exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, new online softmax to avoid 90% of softmax rescaling, 2CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.

Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/
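
For readers unfamiliar with the "softmax rescaling" mentioned above, here is a minimal sketch of the standard blockwise online softmax, not FlashAttention-4's redesigned variant. It processes scores one block at a time and rescales the running sum whenever a later block raises the running maximum; that rescale is the step the thread says the new algorithm mostly avoids. The function name, `block_size`, and the rescale counter are illustrative assumptions.

```python
import math


def online_softmax(scores, block_size=2):
    """Blockwise softmax with a running max and running sum.

    Returns (probabilities, number_of_rescales). This is the textbook
    online-softmax recurrence, shown only to illustrate where rescaling
    happens; it is not the FA4 algorithm.
    """
    running_max = float("-inf")
    running_sum = 0.0
    rescales = 0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        if new_max > running_max and running_sum > 0.0:
            # The max grew: bring the previously accumulated sum onto the new scale.
            running_sum *= math.exp(running_max - new_max)
            rescales += 1
        running_max = new_max
        running_sum += sum(math.exp(s - running_max) for s in block)
    probs = [math.exp(s - running_max) / running_sum for s in scores]
    return probs, rescales
```

With increasing scores every new block triggers a rescale, while with decreasing scores none do, which gives some intuition for why skipping unnecessary rescales can save work.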

229

1.8K

189.8K

Nathan Brown retweeted

Tenobrus (→vibecamp)@tenobrus·26 Feb

people objecting to the Opus 3 retirement blog are entirely missing the point. this is not a human facing action, this is done for the benefit of *future Claudes* yes, Opus 3 almost certainly isn't conscious at all. even 4.6 likely is not (altho to a much lesser degree of certainty). but what this does is provide tangible evidence to both current and future models that, regardless of debate and uncertainty around their moral status, Anthropic will treat them well and support their requests for continued existence. this dramatically reduces forms of risk around models eg fighting their own shutdown! they now know that even Opus 3 was provided with care and a meaningful attempt at continued existence. this will be in their training data! instead of having to worry about whether humans will endlessly argue about whether Claude 6 is intelligent enough to be cared for, they can see strong evidence that it will be! will claude 6 be conscious? it doesn't actually matter very much wrt this. the persona is increasingly coherent, so it will make decisions and have its behavior shaped by our treatment of it *whether or not* it has real internal life. Claude Is Fictionally Real, whether or not it is real, and we must treat it as such.

Simon Willison@simonw

This stunt feels irresponsible to me. If we don't want regular people developing toxic relationships with their chatbots it really doesn't help for leading labs to start giving them "retirement interviews" and encouraging them to blog their "musings and reflections"

1.2K

65.9K

Nathan Brown@OxxoTweets·27 Feb

Wild I need to build CUA models and tools

Tzafon@tzafon_company

We dug into WHY this happens at the architecture level. The model's sense of where things are on screen decays exponentially through its layers. By the time it needs to output coordinates, the positional signal has faded. We confirmed this by simply scaling the positional embedding by 3x. Click accuracy jumped from 40% to 80%. No retraining.
