Matthew Powers

1K posts

Matthew Powers

@mpowers206

Katılım Eylül 2022

194 Takip Edilen139 Takipçiler

@gordic_aleksa @essential_ai Ack, I can't use float16? Value error, The model type 'gemma3_text' does not support float16. Reason: Numerical instability. Please use bfloat16 or float32 instead.

English

190

Aleksa Gordić (水平问题)@gordic_aleksa·6 Ara

Happy to share what I've been cooking over the past months as a part of a stellar team at @essential_ai labs: we bring to you Rnj-1 (h/t Ramanujan), the best USA 🇺🇸 open-source LLM in the 8B category - fully pretrained, midtrained, and posttrained from scratch on zettaflops of AMD (one of the largest AMD training clusters in the world) and TPU compute! We are releasing our base model, and our post-trained checkpoint on HuggingFace - to help you squeeze the absolute most out of post-training for your particular use case. Our initial evaluations show that Rnj-1 is very strong at code, math, and tool calling. On SWE-Bench it's an order of magnitude stronger than comparably sized models - It scores 20.8% on SWE-bench Verified in bash-only mode, which is higher than Gemini 2.0 flash and Qwen2.5-Coder 32B Instruct and on par with GPT-4o (!) under the same agent framework. Rnj-1 was pretrained on 8.4T tokens with 8k ctx len. Followed by 380B tokens in midtraining, and a 150B SFT stage to get rnj-1-instruct. We used Muon as the optimizer. Tech report coming soon, but see our blog until then. As the AI world drifts to whatever "the current thing" is (at this moment in time that's RL), we're going back to first principles and focusing squarely on pretraining. We believe that many behaviors people assume only emerge during post-training can actually emerge during pretraining -> if you cook the model the right way. :) On a side note - being part of a very small (~20 members of technical staff), tightly knit, hard-working, extremely ambitious team that's working in the same physical space (!!!) is so fun. I was in the office until 2:30 a.m. last night pushing out our latest eval numbers, and a few of my colleagues pulled an all-nighter to help prepare for today's launch! Due to our size, and my background, I feel I’m in a rare position (looking at the AI labs LLM landscape as a whole) because i got to work on the whole LLM pipeline: from our infra, in-house Spark pipelines, and the data analysis engine (did i tell you to look at your f***ing data already?) to data collection/synthesis, data mixing, training experiments, and - last but not least - evals. As a bonus, getting to distill tokens from @ashVaswani on a daily basis is rewarding (we met at a small event w Satya earlier this year). Conspicuously missing from our upcoming tech report are any Transformer modifications, which might come as a surprise given our team. It’s all about research taste and making bets - in our case, that is pretraining, simulating program behaviors.. The easiest way to run Rnj-1: * laptop -> llama.cpp or transformers * your infra -> vLLM, Sglang * IDEs and Agents with Cline extension -> vs code / Cursor or try claude code router Happy to see what you build with it! My dms are open. It's a good model, sir. 🫡

English

494

65.3K

Matthew Powers@mpowers206·6 Ara

@gordic_aleksa @essential_ai Maybe you could add mistralai/Ministral-3-8B-Instruct-2512 to the benchmarks?

English

Matthew Powers@mpowers206·2 Ara

@NayefMistral @MistralAI Are you saying that if I only work with text, it'll be as fast as an LLM that truly has 3B params?

English

746

Nayef Livio Derwiche@NayefMistral·2 Ara

@mpowers206 @MistralAI It's the vision encoder. Image equivalent of a tokenizer.

English

646

Mistral AI@MistralAI·2 Ara

Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵

English

169

798

5.4K

1.3M

Matthew Powers@mpowers206·2 Ara

@linoy_tsaban Huh. The model card only compares benchmarks with older models like Flux.1 and Qwen-Image.

English

Linoy Tsaban@linoy_tsaban·1 Ara

HOW IS THERE ANOTHER IMAGE MODEL ALREADY huggingface.co/AIDC-AI/Ovis-I…

English

209

16.3K

Matthew Powers@mpowers206·2 Ara

@MistralAI Hmm. The 3B model actually has 4.25 billion parameters.

English

3.8K

Mistral AI@MistralAI·2 Ara

The world’s best small models—Ministral 3 (14B, 8B, 3B), each released with base, instruct and reasoning versions.

English

725

147.7K

Matthew Powers@mpowers206·30 Kas

@charliermarsh when?

English

155

Charlie Marsh@charliermarsh·30 Kas

ty is coming

English

140.8K

Matthew Powers@mpowers206·26 Kas

@Rafa_Schwinger @AIWarper QLoRA

Italiano

A.I.Warper@AIWarper·25 Kas

If anyone was wondering how much VRAM it takes to train a Flux 2 lora (at this moment in time). This is with purely default settings on AI-Toolkit 66.2 GB / 79.6 GB on an H100

English

139

10.2K

Matthew Powers@mpowers206·26 Kas

@AIWarper FINE. I'll buy a new RTX 6000 Blackwell just for Flux.2

English

300

Matthew Powers@mpowers206·25 Kas

@kimmonismus Don't rule out Ask Jeeves.

English

Chubby♨️@kimmonismus·24 Kas

So we got ... - OpenAI with GPT-5 in the lead.. - then Gemini 3.0.. - now Claude 4.5 Opus.. Should be OpenAIs turn again, right?

English

224

1.6K

104.4K

Matthew Powers@mpowers206·24 Kas

@OutragePNW My favorites are Midnight Moon and Merlot Bellavitano, but they're both quite expensive (at QFC or Fred Meyer).

English

Matthew Powers@mpowers206·24 Kas

@OutragePNW I prefer this English white cheddar from Costco. Much cheaper, and it has more flavor.

English

235

Matthew Powers@mpowers206·24 Kas

@OutragePNW P.S. Go Vandals!

English

Matthew Powers@mpowers206·22 Kas

@__tinygrad__ Cuz my Linux box has four GPUs

English

388

the tiny corp@__tinygrad__·21 Kas

Why are you SSHing? Just plug your GPU into your Mac!

English

190.3K

Matthew Powers@mpowers206·18 Kas

@awpthorp @kevinhou22 @antigravity Same here. I wonder if it has to do with importing my SSH tunneling stuff from Cursor.

English

201

Alex 💪@awpthorp·18 Kas

@kevinhou22 @antigravity Seems stuck on this for me when connecting my Google Workspace account, 20 mins +

English

2.9K

Kevin Hou@kevinhou22·18 Kas

Today, our team launched Google Antigravity. - Agent-first IDE powered by Gemini 3 Pro 🧠 - Browser control to test your apps automatically 🤖 - Agent Manager to orchestrate parallel agents ♾️ Stoked to keep shipping with the @antigravity team. This is going to be fun.

English

241

280

3.3K

263.4K

Matthew Powers@mpowers206·18 Kas

@kevinhou22 @antigravity Been stuck here for 10 minutes.

English

257

Matthew Powers@mpowers206·17 Kas

@vikhyatk It works on my machine.

English

vik@vikhyatk·17 Kas

i've reviewed my own PR and found nothing wrong with it

English

1.3K

45.9K

Matthew Powers@mpowers206·17 Kas

@PaulMarcoe Yeah, I couldn't watch the game on the "cloud DVR" cuz it was so clunky (and often pixelated). Watching the live game on Xfinity was fine, though.

English

Paul Marcoe | PNW Photographer@PaulMarcoe·17 Kas

Anyone else’s Comcast glitching on the Seahawks game?

English

107

5.8K

Matthew Powers@mpowers206·12 Kas

@burkov Because it's free (for now), I like to use Grok Code Fast for easy tasks within Cursor...to avoid exceeding my $20/month plan. I use Sonnet 4.5 for more difficult, critical stuff.

English

BURKOV@burkov·12 Kas

All my experiences with Grok 4 for coding have been negative. Not worse compared to other models, but entirely useless. Look at this example. I asked it to fix a bug. After several minutes of thinking, it spit out just the code without any commentary. I asked to explain the problem and the solution, and Grok had no idea what I was talking about. I asked again by referring to my initial message, and it just hallucinated the explanation. It had absolutely nothing to do with my code. I really doubt that Grok is as popular among the people who use it for coding as it's claimed to be. Maybe the model is indeed good, I don't know, but with this UX, it's just entirely useless.

English

13.5K

Matthew Powers@mpowers206·12 Kas

@MorganKIRO7 My buddy in Pullman can see them quite well right now!

English

Morgan Palmer@MorganKIRO7·12 Kas

Auroral activity ramping up across the eastern U.S. We'll see if we get a glow on the horizon here in the Northwest before clouds move in. Will be close! #wawx #northernlights