Matthew Powers
@mpowers206
1K posts
Joined September 2022
194 Following · 139 Followers
Matthew Powers@mpowers206·
@gordic_aleksa @essential_ai Ack, I can't use float16? ValueError: The model type 'gemma3_text' does not support float16. Reason: numerical instability. Please use bfloat16 or float32 instead.
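The error above reflects a real constraint: float16 has a narrow 5-bit exponent that can overflow in some architectures, while bfloat16 keeps float32's exponent range at half the memory. A minimal sketch of that dtype choice, using a hypothetical `pick_dtype` helper (not part of transformers or any library):

```python
# Some model types reject float16 for numerical-stability reasons:
# fp16's 5-bit exponent overflows where bf16 (8-bit exponent, same
# range as fp32) does not.
UNSTABLE_IN_FP16 = {"gemma3_text"}  # per the error message above

def pick_dtype(model_type: str, prefer_half: bool = True) -> str:
    """Return a dtype name that is numerically safe for the model type."""
    if not prefer_half:
        return "float32"
    if model_type in UNSTABLE_IN_FP16:
        return "bfloat16"  # half the memory of fp32, full exponent range
    return "float16"

print(pick_dtype("gemma3_text"))  # -> bfloat16
```

In practice you would pass the chosen dtype when loading the model (e.g. `torch_dtype=torch.bfloat16` in transformers), which is exactly what the error message asks for.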
Aleksa Gordić (水平问题)@gordic_aleksa·
Happy to share what I've been cooking over the past months as part of a stellar team at @essential_ai labs: we bring you Rnj-1 (h/t Ramanujan), the best USA 🇺🇸 open-source LLM in the 8B category - fully pretrained, midtrained, and posttrained from scratch on zettaflops of AMD (one of the largest AMD training clusters in the world) and TPU compute!

We are releasing our base model and our post-trained checkpoint on HuggingFace - to help you squeeze the absolute most out of post-training for your particular use case.

Our initial evaluations show that Rnj-1 is very strong at code, math, and tool calling. On SWE-bench it's an order of magnitude stronger than comparably sized models: it scores 20.8% on SWE-bench Verified in bash-only mode, which is higher than Gemini 2.0 Flash and Qwen2.5-Coder 32B Instruct, and on par with GPT-4o (!) under the same agent framework.

Rnj-1 was pretrained on 8.4T tokens with 8k context length, followed by 380B tokens of midtraining and a 150B-token SFT stage to get rnj-1-instruct. We used Muon as the optimizer. Tech report coming soon, but see our blog until then.

As the AI world drifts to whatever "the current thing" is (at this moment in time that's RL), we're going back to first principles and focusing squarely on pretraining. We believe that many behaviors people assume only emerge during post-training can actually emerge during pretraining -> if you cook the model the right way. :)

On a side note - being part of a very small (~20 members of technical staff), tightly knit, hard-working, extremely ambitious team that's working in the same physical space (!!!) is so fun. I was in the office until 2:30 a.m. last night pushing out our latest eval numbers, and a few of my colleagues pulled an all-nighter to help prepare for today's launch!
Due to our size and my background, I feel I'm in a rare position (looking at the AI-lab LLM landscape as a whole), because I got to work on the whole LLM pipeline: from our infra, in-house Spark pipelines, and the data analysis engine (did I tell you to look at your f***ing data already?) to data collection/synthesis, data mixing, training experiments, and - last but not least - evals.

As a bonus, getting to distill tokens from @ashVaswani on a daily basis is rewarding (we met at a small event w/ Satya earlier this year). Conspicuously missing from our upcoming tech report are any Transformer modifications, which might come as a surprise given our team. It's all about research taste and making bets - in our case, that is pretraining and simulating program behaviors.

The easiest way to run Rnj-1:
* laptop -> llama.cpp or transformers
* your infra -> vLLM, SGLang
* IDEs and agents -> VS Code or Cursor with the Cline extension, or try Claude Code Router

Happy to see what you build with it! My DMs are open. It's a good model, sir. 🫡
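For the "your infra -> vLLM" path in the list above, vLLM exposes an OpenAI-compatible HTTP API once a model is served (e.g. `vllm serve <model>`). A minimal sketch of building a chat request for that endpoint; the model id `essential-ai/rnj-1-instruct` is an assumption for illustration, not a confirmed HuggingFace path:

```python
import json

def build_chat_request(prompt: str,
                       model: str = "essential-ai/rnj-1-instruct",
                       max_tokens: int = 256) -> dict:
    """Payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Serialize for an HTTP POST to http://localhost:8000/v1/chat/completions
body = json.dumps(build_chat_request("Write a bash one-liner to count *.py files."))
```

Because the API is OpenAI-compatible, the same payload shape works with any OpenAI-style client pointed at the local server.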
Mistral AI@MistralAI·
Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵
Matthew Powers@mpowers206·
@linoy_tsaban Huh. The model card only compares benchmarks with older models like Flux.1 and Qwen-Image.
Mistral AI@MistralAI·
The world’s best small models—Ministral 3 (14B, 8B, 3B), each released with base, instruct and reasoning versions.
Charlie Marsh@charliermarsh·
ty is coming
A.I.Warper@AIWarper·
If anyone was wondering how much VRAM it takes to train a Flux 2 LoRA (at this moment in time): with purely default settings on AI-Toolkit, it uses 66.2 GB / 79.6 GB on an H100.
Chubby♨️@kimmonismus·
So we got... - OpenAI with GPT-5 in the lead - then Gemini 3.0 - now Claude 4.5 Opus. Should be OpenAI's turn again, right?
Matthew Powers@mpowers206·
@OutragePNW My favorites are Midnight Moon and Merlot Bellavitano, but they're both quite expensive (at QFC or Fred Meyer).
Matthew Powers@mpowers206·
@OutragePNW I prefer this English white cheddar from Costco. Much cheaper, and it has more flavor.
the tiny corp@__tinygrad__·
Why are you SSHing? Just plug your GPU into your Mac!
Kevin Hou@kevinhou22·
Today, our team launched Google Antigravity.
- Agent-first IDE powered by Gemini 3 Pro 🧠
- Browser control to test your apps automatically 🤖
- Agent Manager to orchestrate parallel agents ♾️
Stoked to keep shipping with the @antigravity team. This is going to be fun.
vik@vikhyatk·
i've reviewed my own PR and found nothing wrong with it
Matthew Powers@mpowers206·
@PaulMarcoe Yeah, I couldn't watch the game on the "cloud DVR" cuz it was so clunky (and often pixelated). Watching the live game on Xfinity was fine, though.
Matthew Powers@mpowers206·
@burkov Because it's free (for now), I like to use Grok Code Fast for easy tasks within Cursor...to avoid exceeding my $20/month plan. I use Sonnet 4.5 for more difficult, critical stuff.
BURKOV@burkov·
All my experiences with Grok 4 for coding have been negative. Not merely worse than other models - entirely useless. Look at this example. I asked it to fix a bug. After several minutes of thinking, it spat out just the code without any commentary. I asked it to explain the problem and the solution, and Grok had no idea what I was talking about. I asked again, referring to my initial message, and it just hallucinated the explanation - it had absolutely nothing to do with my code. I really doubt that Grok is as popular among people who use it for coding as it's claimed to be. Maybe the model is indeed good, I don't know, but with this UX, it's entirely useless.
Morgan Palmer@MorganKIRO7·
Auroral activity ramping up across the eastern U.S. We'll see if we get a glow on the horizon here in the Northwest before clouds move in. Will be close! #wawx #northernlights
Mark Kretschmann@mark_k·
@CrazyAITech @xai Oh man, that meme is an evergreen. I actually had to chuckle. xAI and their tents in the office :)
Mark Kretschmann@mark_k·
Grok 4.20 is taking its sweet time. Are we getting it this month, @xai ?