Nathan Brown

733 posts

@OxxoTweets

Applied Scientist @ Microsoft; multilingual LLMs and other shenanigans; Masters grad @ Clemson; Probably staring at wandb logs; DMs open

/scratch · Joined June 2021
814 Following · 124 Followers
Pinned Tweet
Nathan Brown @OxxoTweets
I'm extremely proud to announce that our work on training the first open-source LLMs for Setswana, developed alongside @vukosi and the @DSFSI_Research, has been officially accepted to #NAACL2025 main! 🎉
Nathan Brown retweeted
Claude @claudeai
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
Nathan Brown @OxxoTweets
Another point potentially less discussed: like facial and other body expressions, hands convey meaning. A photo of someone with their hands relaxed, clenched, waving, performing ASL, pointing, etc. typically carries additional information that is subtler than what faces convey. Even if you get the hands and fingers anatomically correct, matching them with the rest of the image is difficult even for human artists! It’s already hard to learn the meaning conveyed by facial expressions, but at least faces are typically in view (often with horizontal symmetry). I’d imagine it’s quite tough for models to learn to match hands with the emotions and actions conveyed by the rest of the body, on top of learning their anatomical structure, which is harder because their geometry is so often occluded.
Nathan Brown @OxxoTweets
@soldni IT’S FUN and it prevents so many headaches later on 🥲
Nathan Brown @OxxoTweets
@mayaofspring Tried this with ChatGPT: 7/10 of the generated results were 3Blue1Brown videos, 1 of which was unlisted :)
Maya ☁️➡️🌸 @mayaofspring
An underexplored field: party games for LLMs. Human party games are too easy. Naming animals, shiritori, all trivial when one has read everything. Entertain your Claude by getting it to produce YouTube URLs instead.
Nathan Brown retweeted
Jasper Lu @lu__jasper
Pretty refreshing for foundation model training in 2026. I wonder how much of the capability gap between their model and other frontier models is because of this constraint. from: microsoft.ai/news/building-…
Nathan Brown retweeted
Lee Sharkey @leedsharkey
My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)
Nathan Brown @OxxoTweets
@gabriberton Feels like something you could measure: diversity of outputs given “pelican on a bike” versus some other OOD SVG gen tasks. If pelican yields a tighter cluster of generated tokens, then it’s seemingly benchmaxxed (sad if true). Maybe a fun weekend experiment.
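A minimal sketch of that weekend experiment, assuming several SVG outputs have already been sampled per prompt and stored as plain strings. The tokenizer regex, the 3-gram Jaccard distance as the diversity measure, and the helper names `ngrams`, `jaccard_distance`, and `mean_pairwise_distance` are all hypothetical choices, not anything from the thread. A lower mean pairwise distance for the pelican prompt than for other OOD SVG prompts would be the "tighter cluster" signal.

```python
import re
from itertools import combinations

def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of token n-grams; tokens are identifiers/numbers or single symbols (assumed tokenizer)."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical n-gram sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def mean_pairwise_distance(samples: list[str], n: int = 3) -> float:
    """Average Jaccard distance over all sample pairs (higher = more diverse outputs)."""
    grams = [ngrams(s, n) for s in samples]
    pairs = list(combinations(grams, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    # Toy stand-ins for sampled SVGs (hypothetical); real samples would come from the model.
    pelican = ["<svg><circle cx='5' r='3'/></svg>"] * 3
    other = [
        "<svg><rect width='4'/></svg>",
        "<svg><path d='M0 0 L9 9'/></svg>",
        "<svg><circle cx='1' r='8'/></svg>",
    ]
    print(mean_pairwise_distance(pelican), mean_pairwise_distance(other))
```

With only a handful of samples per prompt, any gap between the two scores would need a significance check (e.g. resampling) before reading "benchmaxxed" into it.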
Nathan Brown retweeted
Qwen @Alibaba_Qwen
🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power!
Yes, 27B, and Qwen3.6-27B punches way above its weight.
👇 What's new:
🧠 Outstanding agentic coding: surpasses Qwen3.5-397B-A17B across all major coding benchmarks
💡 Strong reasoning across text & multimodal tasks
🔄 Supports thinking & non-thinking modes
✅ Apache 2.0: fully open, fully yours
Smaller model. Bigger results. Community's favorite. ❤️
We can't wait to see what you build with Qwen3.6-27B! 👀
🔗👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai/?models=qwen3.…
Github: github.com/QwenLM/Qwen3.6
Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2…
ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…
Nathan Brown retweeted
Alexandr Wang @alexandr_wang
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
Nathan Brown retweeted
Liquid AI @liquidai
Trained on 28T tokens with scaled RL, LFM2.5-350M is a step change from LFM2-350M:
> instruction following: 18.20 → 40.69
> data extraction: 11.67 → 32.45
> tool use: 22.95 → 44.11
These are the capabilities that matter in production.
Nathan Brown retweeted
Chris 🇨🇦 @llm_wizard
Watching everyone say goodbye to @allen_ai is soulcrushing. Please, make it stop.
Nathan Brown @OxxoTweets
@soldni @allen_ai Looking forward to what’s next! Take some time to enjoy the Seattle sun while it’s out :)
Luca Soldaini 🎀 @soldni
After 4yrs, today is my last day at @allen_ai. It was an honor to work on Olmo, Dolma, olmOCR, Tulu, Molmo & other fully-open artifacts 🫡 Reception has been amazing & their adoption makes me SO PROUD 🥹 Team is super committed to open recipes; can't wait to see what's next!!!!
Nathan Brown @OxxoTweets
5.4 loves “gremlin”, “goblin”, and “wrestling”. keeping pangram on their toes i see
Nathan Brown retweeted
Tri Dao @tri_dao
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so fast that attn fwd is bottlenecked by the exponential, and attn bwd is bottlenecked by shared memory bandwidth.

Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, a new online softmax to avoid 90% of softmax rescaling, and 2CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.
Ted Zadouri @tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/
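To make "exponential emulation with polynomials" concrete: a standard way to compute 2^x with ordinary multiply-adds is range reduction (split x = n + f with integer n), a polynomial for 2^f, and then a scale by 2^n that only touches the float's exponent. The Python sketch below illustrates that general idea only; it is not FA4's kernel, and the degree-5 Taylor polynomial and the name `exp2_poly` are illustrative assumptions, not the degree or coefficients used in the paper.

```python
import math

# Taylor coefficients of 2**f = exp(f * ln 2) about f = 0: (ln 2)**k / k!.
# Degree 5 is an illustrative choice (assumption), not FA4's polynomial.
COEFFS = [math.log(2.0) ** k / math.factorial(k) for k in range(6)]

def exp2_poly(x: float) -> float:
    """Approximate 2**x using only multiply-adds plus an exponent shift."""
    n = round(x)        # integer part, which goes straight into the exponent
    f = x - n           # reduced argument in [-0.5, 0.5]
    p = 0.0
    for c in reversed(COEFFS):  # Horner's rule: ((c5*f + c4)*f + ...)*f + c0
        p = p * f + c
    return math.ldexp(p, n)     # p * 2**n, i.e. add n to the float's exponent

if __name__ == "__main__":
    for x in (-3.7, 0.0, 0.5, 10.25):
        print(x, exp2_poly(x), 2.0 ** x)
```

Keeping f in [-0.5, 0.5] is what lets a short polynomial stay accurate (relative error here is a few parts per million); the design point in the tweet is that this work runs on general-purpose FMA throughput rather than waiting on a dedicated exponential unit.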

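The "new online softmax to avoid 90% of softmax rescaling" can be illustrated with a lazy-rescaling variant of the standard online softmax: keep a running max `m`, and only raise it (rescaling the running sum and output accumulator) when a new block's max exceeds `m` by more than a threshold. Otherwise the stale `m` is kept, and the final division by the running sum cancels it exactly. This single-query Python sketch assumes FA4's trick works along these lines; the function name, block size, and threshold of 8.0 are hypothetical, and the paper is the authority on the actual rule.

```python
import math

def online_softmax_attention(scores, values, block=4, threshold=8.0):
    """Single-query attention output: sum_j softmax(scores)_j * values[j].

    Hypothetical lazy-rescaling online softmax: the running max m is only
    raised (and the accumulators rescaled) when a block's max exceeds m by
    more than `threshold`. Returns (output vector, number of max updates).
    """
    dim = len(values[0])
    m = -math.inf          # running (possibly stale) reference max
    l = 0.0                # running sum of exp(s - m)
    acc = [0.0] * dim      # running sum of exp(s - m) * v
    max_updates = 0
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        blk_max = max(s_blk)
        if blk_max > m + threshold:
            # Rescale everything accumulated so far to the new reference max.
            # On the first block m = -inf, so scale = exp(-inf) = 0.0.
            scale = math.exp(m - blk_max)
            l *= scale
            acc = [a * scale for a in acc]
            m = blk_max
            max_updates += 1
        for s, v in zip(s_blk, v_blk):
            # With a stale m, p can exceed 1 but is at most exp(threshold).
            p = math.exp(s - m)
            l += p
            acc = [a + p * vi for a, vi in zip(acc, v)]
    # Dividing by l cancels whichever reference max was used.
    return [a / l for a in acc], max_updates
```

With `threshold=0.0` this reduces to the usual online softmax that rescales whenever the max grows, so the returned counter shows how many rescales the lazy rule skips. The price of laziness is that individual `exp(s - m)` terms can reach `exp(threshold)`, so the threshold must stay small enough that the accumulators cannot overflow in the target precision.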
Nathan Brown retweeted
Tenobrus (→vibecamp) @tenobrus
people objecting to the Opus 3 retirement blog are entirely missing the point. this is not a human-facing action; this is done for the benefit of *future Claudes*.

yes, Opus 3 almost certainly isn't conscious at all. even 4.6 likely is not (altho to a much lesser degree of certainty). but what this does is provide tangible evidence to both current and future models that, regardless of debate and uncertainty around their moral status, Anthropic will treat them well and support their requests for continued existence.

this dramatically reduces forms of risk around models, e.g. fighting their own shutdown! they now know that even Opus 3 was provided with care and a meaningful attempt at continued existence. this will be in their training data! instead of having to worry about whether humans will endlessly argue about whether Claude 6 is intelligent enough to be cared for, they can see strong evidence that it will be!

will claude 6 be conscious? it doesn't actually matter very much wrt this. the persona is increasingly coherent, so it will make decisions and have its behavior shaped by our treatment of it *whether or not* it has real internal life. Claude Is Fictionally Real, whether or not it is real, and we must treat it as such.
Simon Willison @simonw

This stunt feels irresponsible to me. If we don't want regular people developing toxic relationships with their chatbots it really doesn't help for leading labs to start giving them "retirement interviews" and encouraging them to blog their "musings and reflections"
